Number of Matching Subsequences: Bucket Method vs Binary Search

Introduction

Problem: Number of Matching Subsequences (LC 792)

Given a string s and an array of strings words, return the number of words[i] that is a subsequence of s.

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.

Example:

Input: s = "abcde", words = ["a","bb","acd","ace"]
Output: 3
Explanation: "a", "acd", "ace" are subsequences of "abcde"

Why Interviewers Love This Problem

Tests optimization thinking: The naive O(|s| × |words|) approach will TLE
Multiple valid solutions: Bucket method vs Binary Search - both are valid
Data structure design: How you organize data affects performance
Real-world relevance: Similar to autocomplete, text matching, DNA sequence analysis

Core Intuition

The naive approach checks each word against s using two pointers - O(|s|) per word. When words is large, this becomes too slow.

Key insight: Instead of checking each word independently, we can:

Bucket Method: Process all words simultaneously as we iterate through s once
Binary Search: Preprocess s to enable O(log|s|) character lookup per character in word

Interactive Visualizer

Subsequence Matching Visualizer

s = "abcde", words = ["acd","ace","bb"]

Step 1/0:

Solution 1: Bucket/Queue Method

Concept

Think of this as an event-driven or publish-subscribe system:

Algorithm Concept	Software Architecture Analogy
26 buckets	Message Queues/Topics - one topic per letter
Word enters bucket	Subscribe - word subscribes to its next needed character
Iterate through s	Publisher - each character is a published message
Process & move	Consumer - on match, either complete or re-subscribe to next topic

Algorithm Steps

1. Initialize 26 buckets (one per letter)
2. Place each word in bucket of its FIRST character
   - Store: (wordIndex, currentMatchPosition)
   
3. For each character c in string s:
   - Take all entries from bucket[c]
   - For each entry:
     - Advance position by 1
     - If position == word.length → word is matched! count++
     - Else → move to bucket of NEXT character to match

4. Return count

Visual Walkthrough

s = "abcde", words = ["acd", "ace"]

Initial:
  bucket['a'] = [{0,0}, {1,0}]   // acd@0, ace@0

Process 'a':
  - {0,0} → acd matched 'a', next='c' → bucket['c'] = [{0,1}]
  - {1,0} → ace matched 'a', next='c' → bucket['c'] = [{0,1}, {1,1}]

Process 'b':
  - bucket['b'] empty, skip

Process 'c':
  - {0,1} → acd matched 'c', next='d' → bucket['d'] = [{0,2}]
  - {1,1} → ace matched 'c', next='e' → bucket['e'] = [{1,2}]

Process 'd':
  - {0,2} → acd matched 'd', pos=3 == len("acd") ✓ count=1

Process 'e':
  - {1,2} → ace matched 'e', pos=3 == len("ace") ✓ count=2

Result: 2

Code

Complexity Analysis

Time: $O(|s| + \sum|words[i]|)$ $O (∣ s ∣ + \sum ∣ w or d s [i] ∣)$
- Iterate through s once: O(|s|)
- Each character in each word processed exactly once: O(Σ|words[i]|)
Space: $O(|words| + \sum|words[i]|)$ $O (∣ w or d s ∣ + \sum ∣ w or d s [i] ∣)$
- Store (wordIndex, position) pairs for all words

Solution 2: Binary Search Method

Concept

Preprocess s to build a position index, then for each word, use binary search to find characters in order.

s = "abcab"

Position Index:
  'a' → [0, 3]    // 'a' appears at positions 0 and 3
  'b' → [1, 4]    // 'b' appears at positions 1 and 4
  'c' → [2]       // 'c' appears at position 2

For word “cab”:

Find ‘c’ at position > -1 → found at 2, lastPos = 2
Find ‘a’ at position > 2 → found at 3, lastPos = 3
Find ‘b’ at position > 3 → found at 4, lastPos = 4
All characters found → “cab” IS a subsequence ✓

Binary Search: Find First Position > Target

positions = [0, 3, 7, 12, 15]
target = 5

Binary search for first value > 5:

  [0, 3, 7, 12, 15]
   L        M     R     mid=7 > 5, move R
   
  [0, 3, 7, 12, 15]
   L  M  R            mid=3 ≤ 5, move L
   
  [0, 3, 7, 12, 15]
      L  R            mid=7 > 5, move R
      M
      
  [0, 3, 7, 12, 15]
      R  L            L > R, done!
         ↑
      return L=2, positions[2]=7 is first value > 5

Code

Complexity Analysis

Time: $O(|s| + \sum(|word[i]| \times \log|s|))$ $O (∣ s ∣ + \sum (∣ w or d [i] ∣ \times lo g ∣ s ∣))$
- Preprocess s: O(|s|)
- For each word, binary search for each character: O(|word| × log|s|)
Space: $O(|s|)$ $O (∣ s ∣)$
- Store position index for all characters in s

Method Comparison

Aspect	Bucket Method	Binary Search
Time Complexity	$O(\\|s\\| + \sum\\|words[i]\\|)$	$O(\\|s\\| + \sum(\\|word[i]\\| \times \log\\|s\\|))$
Space Complexity	$O(\\|words\\| + \sum\\|words[i]\\|)$	$O(\\|s\\|)$
Best When	Many words, single query	Fixed `s`, multiple query batches
Preprocessing	None	Build position index
Constant Factor	Lower	Higher (due to log factor)

When to Use Which?

Bucket Method: One-time query with many words - processes everything in one pass
Binary Search: When s is fixed and you need to query different word sets multiple times

Edge Cases

Empty string s: Return 0 (no word can be a subsequence)
Empty words array: Return 0
Word longer than s: Cannot be a subsequence
Character not in s: Word cannot be a subsequence
Duplicate words: Count each occurrence separately

Common Mistakes

Off-by-one in binary search: Searching for > lastPos vs >= lastPos
Forgetting to clear bucket: In bucket method, must process and remove entries
Using wrong comparison: positions[mid] > target not >=
Not handling empty position list: Character might not exist in s

Interview Tips

Start with brute force: Explain O(|s| × |words|) two-pointer approach
Identify bottleneck: “We’re repeating work by scanning s for each word”
Propose optimization: Either batch process (bucket) or preprocess (binary search)
Trade-off discussion: Explain when each method is preferred
Code one solution: Pick the one you’re more comfortable with

LC 392: Is Subsequence - Basic version, single word check
LC 1055: Shortest Way to Form String 🔒 - Greedy with binary search
LC 522: Longest Uncommon Subsequence II - Subsequence comparison

Summary

Key Takeaway	Details
Pattern	Batch processing OR preprocess + query
Bucket Method	Event-driven, process all words in single `s` scan
Binary Search	Precompute positions, O(log n) per character lookup
Binary Search Template	Find first position > target, return `left`
Interview Strategy	Know both, implement your stronger one, discuss trade-offs

Both solutions demonstrate important algorithmic thinking:

Bucket: Avoid redundant work by processing in parallel
Binary Search: Trade preprocessing time for faster queries

Introduction

Problem: Number of Matching Subsequences (LC 792)

Why Interviewers Love This Problem

Core Intuition

Interactive Visualizer

Subsequence Matching Visualizer

Solution 1: Bucket/Queue Method

Concept

Algorithm Steps

Visual Walkthrough

Code

Complexity Analysis

Solution 2: Binary Search Method

Concept

Binary Search: Find First Position > Target

Code

Complexity Analysis

Method Comparison

When to Use Which?

Edge Cases

Common Mistakes

Interview Tips

Related Problems

Summary