English Jackey

Number of Matching Subsequences: Bucket Method vs Binary Search

Master two efficient approaches to solve the Matching Subsequences problem - Bucket/Queue method with O(|s| + Σ|words|) and Binary Search with O(|s| + Σ|word|×log|s|). Interactive visualizations and complete code templates.

#algorithm #string #binary-search #subsequence #leetcode #interview

Introduction

Problem: Number of Matching Subsequences (LC 792)

Given a string s and an array of strings words, return the number of words[i] that is a subsequence of s.

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.

Example:

Input: s = "abcde", words = ["a","bb","acd","ace"]
Output: 3
Explanation: "a", "acd", "ace" are subsequences of "abcde"

Why Interviewers Love This Problem

  1. Tests optimization thinking: The naive O(|s| × |words|) approach will TLE
  2. Multiple valid solutions: Bucket method vs Binary Search - both are valid
  3. Data structure design: How you organize data affects performance
  4. Real-world relevance: Similar to autocomplete, text matching, DNA sequence analysis

Core Intuition

The naive approach checks each word against s using two pointers - O(|s|) per word. When words is large, this becomes too slow.

Key insight: Instead of checking each word independently, we can:

  1. Bucket Method: Process all words simultaneously as we iterate through s once
  2. Binary Search: Preprocess s to enable O(log|s|) character lookup per character in word

Interactive Visualizer

Subsequence Matching Visualizer

s = "abcde", words = ["acd","ace","bb"]
Step 1/0:

Solution 1: Bucket/Queue Method

Concept

Think of this as an event-driven or publish-subscribe system:

Algorithm ConceptSoftware Architecture Analogy
26 bucketsMessage Queues/Topics - one topic per letter
Word enters bucketSubscribe - word subscribes to its next needed character
Iterate through sPublisher - each character is a published message
Process & moveConsumer - on match, either complete or re-subscribe to next topic

Algorithm Steps

1. Initialize 26 buckets (one per letter)
2. Place each word in bucket of its FIRST character
   - Store: (wordIndex, currentMatchPosition)
   
3. For each character c in string s:
   - Take all entries from bucket[c]
   - For each entry:
     - Advance position by 1
     - If position == word.length → word is matched! count++
     - Else → move to bucket of NEXT character to match

4. Return count

Visual Walkthrough

s = "abcde", words = ["acd", "ace"]

Initial:
  bucket['a'] = [{0,0}, {1,0}]   // acd@0, ace@0

Process 'a':
  - {0,0} → acd matched 'a', next='c' → bucket['c'] = [{0,1}]
  - {1,0} → ace matched 'a', next='c' → bucket['c'] = [{0,1}, {1,1}]

Process 'b':
  - bucket['b'] empty, skip

Process 'c':
  - {0,1} → acd matched 'c', next='d' → bucket['d'] = [{0,2}]
  - {1,1} → ace matched 'c', next='e' → bucket['e'] = [{1,2}]

Process 'd':
  - {0,2} → acd matched 'd', pos=3 == len("acd") ✓ count=1

Process 'e':
  - {1,2} → ace matched 'e', pos=3 == len("ace") ✓ count=2

Result: 2

Code

Loading...

Complexity Analysis

  • Time: O(s+words[i])O(|s| + \sum|words[i]|)
    • Iterate through s once: O(|s|)
    • Each character in each word processed exactly once: O(Σ|words[i]|)
  • Space: O(words+words[i])O(|words| + \sum|words[i]|)
    • Store (wordIndex, position) pairs for all words

Solution 2: Binary Search Method

Concept

Preprocess s to build a position index, then for each word, use binary search to find characters in order.

s = "abcab"

Position Index:
  'a' → [0, 3]    // 'a' appears at positions 0 and 3
  'b' → [1, 4]    // 'b' appears at positions 1 and 4
  'c' → [2]       // 'c' appears at position 2

For word “cab”:

  1. Find ‘c’ at position > -1 → found at 2, lastPos = 2
  2. Find ‘a’ at position > 2 → found at 3, lastPos = 3
  3. Find ‘b’ at position > 3 → found at 4, lastPos = 4
  4. All characters found → “cab” IS a subsequence ✓

Binary Search: Find First Position > Target

positions = [0, 3, 7, 12, 15]
target = 5

Binary search for first value > 5:

  [0, 3, 7, 12, 15]
   L        M     R     mid=7 > 5, move R
   
  [0, 3, 7, 12, 15]
   L  M  R            mid=3 ≤ 5, move L
   
  [0, 3, 7, 12, 15]
      L  R            mid=7 > 5, move R
      M
      
  [0, 3, 7, 12, 15]
      R  L            L > R, done!

      return L=2, positions[2]=7 is first value > 5

Code

Loading...

Complexity Analysis

  • Time: O(s+(word[i]×logs))O(|s| + \sum(|word[i]| \times \log|s|))
    • Preprocess s: O(|s|)
    • For each word, binary search for each character: O(|word| × log|s|)
  • Space: O(s)O(|s|)
    • Store position index for all characters in s

Method Comparison

AspectBucket MethodBinary Search
Time ComplexityO(s+words[i])O(\|s\| + \sum\|words[i]\|)O(s+(word[i]×logs))O(\|s\| + \sum(\|word[i]\| \times \log\|s\|))
Space ComplexityO(words+words[i])O(\|words\| + \sum\|words[i]\|)O(s)O(\|s\|)
Best WhenMany words, single queryFixed s, multiple query batches
PreprocessingNoneBuild position index
Constant FactorLowerHigher (due to log factor)

When to Use Which?

  • Bucket Method: One-time query with many words - processes everything in one pass
  • Binary Search: When s is fixed and you need to query different word sets multiple times

Edge Cases

  1. Empty string s: Return 0 (no word can be a subsequence)
  2. Empty words array: Return 0
  3. Word longer than s: Cannot be a subsequence
  4. Character not in s: Word cannot be a subsequence
  5. Duplicate words: Count each occurrence separately

Common Mistakes

  1. Off-by-one in binary search: Searching for > lastPos vs >= lastPos
  2. Forgetting to clear bucket: In bucket method, must process and remove entries
  3. Using wrong comparison: positions[mid] > target not >=
  4. Not handling empty position list: Character might not exist in s

Interview Tips

  1. Start with brute force: Explain O(|s| × |words|) two-pointer approach
  2. Identify bottleneck: “We’re repeating work by scanning s for each word”
  3. Propose optimization: Either batch process (bucket) or preprocess (binary search)
  4. Trade-off discussion: Explain when each method is preferred
  5. Code one solution: Pick the one you’re more comfortable with


Summary

Key TakeawayDetails
PatternBatch processing OR preprocess + query
Bucket MethodEvent-driven, process all words in single s scan
Binary SearchPrecompute positions, O(log n) per character lookup
Binary Search TemplateFind first position > target, return left
Interview StrategyKnow both, implement your stronger one, discuss trade-offs

Both solutions demonstrate important algorithmic thinking:

  • Bucket: Avoid redundant work by processing in parallel
  • Binary Search: Trade preprocessing time for faster queries