Number of Matching Subsequences: Bucket Method vs Binary Search
Master two efficient approaches to solve the Matching Subsequences problem - Bucket/Queue method with O(|s| + Σ|words|) and Binary Search with O(|s| + Σ|word|×log|s|). Interactive visualizations and complete code templates.
Introduction
Problem: Number of Matching Subsequences (LC 792)
Given a string s and an array of strings words, return the number of words[i] that is a subsequence of s.
A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.
Example:
Input: s = "abcde", words = ["a","bb","acd","ace"]
Output: 3
Explanation: "a", "acd", "ace" are subsequences of "abcde"
Why Interviewers Love This Problem
- Tests optimization thinking: The naive O(|s| × |words|) approach will TLE
- Multiple valid solutions: Bucket method vs Binary Search - both are valid
- Data structure design: How you organize data affects performance
- Real-world relevance: Similar to autocomplete, text matching, DNA sequence analysis
Core Intuition
The naive approach checks each word against s using two pointers - O(|s|) per word. When words is large, this becomes too slow.
Key insight: Instead of checking each word independently, we can:
- Bucket Method: Process all words simultaneously as we iterate through
sonce - Binary Search: Preprocess
sto enable O(log|s|) character lookup per character in word
Interactive Visualizer
Subsequence Matching Visualizer
s = "abcde", words = ["acd","ace","bb"]Solution 1: Bucket/Queue Method
Concept
Think of this as an event-driven or publish-subscribe system:
| Algorithm Concept | Software Architecture Analogy |
|---|---|
| 26 buckets | Message Queues/Topics - one topic per letter |
| Word enters bucket | Subscribe - word subscribes to its next needed character |
| Iterate through s | Publisher - each character is a published message |
| Process & move | Consumer - on match, either complete or re-subscribe to next topic |
Algorithm Steps
1. Initialize 26 buckets (one per letter)
2. Place each word in bucket of its FIRST character
- Store: (wordIndex, currentMatchPosition)
3. For each character c in string s:
- Take all entries from bucket[c]
- For each entry:
- Advance position by 1
- If position == word.length → word is matched! count++
- Else → move to bucket of NEXT character to match
4. Return count
Visual Walkthrough
s = "abcde", words = ["acd", "ace"]
Initial:
bucket['a'] = [{0,0}, {1,0}] // acd@0, ace@0
Process 'a':
- {0,0} → acd matched 'a', next='c' → bucket['c'] = [{0,1}]
- {1,0} → ace matched 'a', next='c' → bucket['c'] = [{0,1}, {1,1}]
Process 'b':
- bucket['b'] empty, skip
Process 'c':
- {0,1} → acd matched 'c', next='d' → bucket['d'] = [{0,2}]
- {1,1} → ace matched 'c', next='e' → bucket['e'] = [{1,2}]
Process 'd':
- {0,2} → acd matched 'd', pos=3 == len("acd") ✓ count=1
Process 'e':
- {1,2} → ace matched 'e', pos=3 == len("ace") ✓ count=2
Result: 2
Code
Complexity Analysis
- Time:
- Iterate through
sonce: O(|s|) - Each character in each word processed exactly once: O(Σ|words[i]|)
- Iterate through
- Space:
- Store (wordIndex, position) pairs for all words
Solution 2: Binary Search Method
Concept
Preprocess s to build a position index, then for each word, use binary search to find characters in order.
s = "abcab"
Position Index:
'a' → [0, 3] // 'a' appears at positions 0 and 3
'b' → [1, 4] // 'b' appears at positions 1 and 4
'c' → [2] // 'c' appears at position 2
For word “cab”:
- Find ‘c’ at position > -1 → found at 2, lastPos = 2
- Find ‘a’ at position > 2 → found at 3, lastPos = 3
- Find ‘b’ at position > 3 → found at 4, lastPos = 4
- All characters found → “cab” IS a subsequence ✓
Binary Search: Find First Position > Target
positions = [0, 3, 7, 12, 15]
target = 5
Binary search for first value > 5:
[0, 3, 7, 12, 15]
L M R mid=7 > 5, move R
[0, 3, 7, 12, 15]
L M R mid=3 ≤ 5, move L
[0, 3, 7, 12, 15]
L R mid=7 > 5, move R
M
[0, 3, 7, 12, 15]
R L L > R, done!
↑
return L=2, positions[2]=7 is first value > 5
Code
Complexity Analysis
- Time:
- Preprocess
s: O(|s|) - For each word, binary search for each character: O(|word| × log|s|)
- Preprocess
- Space:
- Store position index for all characters in
s
- Store position index for all characters in
Method Comparison
| Aspect | Bucket Method | Binary Search |
|---|---|---|
| Time Complexity | ||
| Space Complexity | ||
| Best When | Many words, single query | Fixed s, multiple query batches |
| Preprocessing | None | Build position index |
| Constant Factor | Lower | Higher (due to log factor) |
When to Use Which?
- Bucket Method: One-time query with many words - processes everything in one pass
- Binary Search: When
sis fixed and you need to query different word sets multiple times
Edge Cases
- Empty string s: Return 0 (no word can be a subsequence)
- Empty words array: Return 0
- Word longer than s: Cannot be a subsequence
- Character not in s: Word cannot be a subsequence
- Duplicate words: Count each occurrence separately
Common Mistakes
- Off-by-one in binary search: Searching for
> lastPosvs>= lastPos - Forgetting to clear bucket: In bucket method, must process and remove entries
- Using wrong comparison:
positions[mid] > targetnot>= - Not handling empty position list: Character might not exist in
s
Interview Tips
- Start with brute force: Explain O(|s| × |words|) two-pointer approach
- Identify bottleneck: “We’re repeating work by scanning
sfor each word” - Propose optimization: Either batch process (bucket) or preprocess (binary search)
- Trade-off discussion: Explain when each method is preferred
- Code one solution: Pick the one you’re more comfortable with
Related Problems
- LC 392: Is Subsequence - Basic version, single word check
- LC 1055: Shortest Way to Form String 🔒 - Greedy with binary search
- LC 522: Longest Uncommon Subsequence II - Subsequence comparison
Summary
| Key Takeaway | Details |
|---|---|
| Pattern | Batch processing OR preprocess + query |
| Bucket Method | Event-driven, process all words in single s scan |
| Binary Search | Precompute positions, O(log n) per character lookup |
| Binary Search Template | Find first position > target, return left |
| Interview Strategy | Know both, implement your stronger one, discuss trade-offs |
Both solutions demonstrate important algorithmic thinking:
- Bucket: Avoid redundant work by processing in parallel
- Binary Search: Trade preprocessing time for faster queries