Union-Find (Disjoint Set Union): Efficiently Handle Dynamic Connectivity
Master the Union-Find data structure with path compression and union by rank optimizations. Includes interactive visualizations, LeetCode problems, and interview tips.
Union-Find (Disjoint Set Union): Efficiently Handle Dynamic Connectivity
Union-Find (also known as Disjoint Set Union or DSU) is a data structure designed to handle dynamic connectivity problems efficiently. It can quickly determine whether two elements belong to the same set and merge two sets together.
Why Interviewers Love Union-Find
In FAANG interviews, Union-Find is a popular topic because it tests:
- Data structure design: How to represent a tree structure with arrays
- Optimization thinking: Path compression and union by rank techniques
- Algorithm application: Recognizing when Union-Find can solve a problem
- Complexity analysis: Understanding amortized complexity
Common applications: Graph connectivity, Minimum Spanning Tree (Kruskal), Social networks, Equivalence class problems
Core Concepts: A Software Architecture Perspective
With your software architecture experience, think of Union-Find as an organizational hierarchy management system:
Company Org Chart Analogy:
CEO (root node)
├── CTO (intermediate node)
│ ├── Dev Manager
│ │ ├── Developer A
│ │ └── Developer B
│ └── QA Manager
│ └── Tester C
└── CFO
└── Accountant D
Find(DeveloperA) = Find the CEO (top leader)
Union(CompanyA, CompanyB) = Merge two companies
IsConnected(A, B) = Are two people in the same company?
Core operations:
- Find(x): Find the representative element (root) of x’s set
- Union(x, y): Merge the sets containing x and y
- IsConnected(x, y): Check if x and y are in the same set
Interactive Visualization
Watch how Union-Find builds connected components step by step. In this example, we have 6 nodes (friends in a social network) and need to process 5 friendship connections:
Social Network: Building Friend Groups
Goal: Connect 6 people through friendships. Dashed lines = pending connections, solid green = established friendships.
Initialize 6 elements, each in its own set. parent[i] = i for all nodes.
Parent Array
Rank Array
Operations to Process
How to use this visualization:
- Initial state: All 6 nodes are isolated (each person is their own friend group)
- Dashed lines: Show connections we need to establish
- Click “Next”: Watch each union operation connect nodes
- Solid green lines: Friendships successfully established
- Final result: Two friend groups merge into one large group
Data Structure Design
Core Data
parent[] array: parent[i] is the parent of node i
rank[] array: rank[i] is the rank (approximate height) of tree rooted at i
Initialization
Initial state (n=5):
parent: [0, 1, 2, 3, 4] ← Each element points to itself
rank: [0, 0, 0, 0, 0] ← Initial rank is 0
Logical structure:
0 1 2 3 4 ← 5 independent sets (forest)
Two Key Optimizations
1. Path Compression
Problem: If tree degenerates into a linked list, Find becomes O(n)
Solution: During Find, make all nodes along the path point directly to root
Before path compression: After path compression:
0 0
| /|\ \
1 1 2 3 4
|
2
|
3
|
4
find(4) process:
4 → 3 → 2 → 1 → 0 (find root)
Then point 4, 3, 2, 1 all directly to 0
2. Union by Rank
Problem: Random merging can create very tall trees
Solution: Always attach the shorter tree under the taller one
Union(A, B):
Case 1: rank[A] < rank[B]
A goes under B, rank unchanged
Case 2: rank[A] > rank[B]
B goes under A, rank unchanged
Case 3: rank[A] == rank[B]
B goes under A, rank[A]++
Why it works:
Keeps tree height as small as possible, ensuring Find is efficient
Code Template
Key Points:
- Path compression is simplest with recursion:
parent[x] = find(parent[x]) - Union by rank is implemented in
Unionby comparing root ranks countvariable is optional, used to track number of components
Problem 1: Number of Provinces (LC 547)
Problem Statement
Given an n x n adjacency matrix isConnected where isConnected[i][j] = 1 means city i and city j are directly connected. Return the number of provinces (connected components).
Approach
This is a classic Union-Find application:
- Traverse the adjacency matrix
- If two cities are connected, Union them
- Return the number of connected components
Example:
isConnected = [[1,1,0],
[1,1,0],
[0,0,1]]
Cities 0 and 1 are connected → Union(0, 1)
City 2 is independent
Result: 2 provinces
Code Implementation
Complexity Analysis
- Time Complexity:
- Need to traverse entire adjacency matrix
- Each Union/Find operation is nearly O(1)
- Space Complexity: for Union-Find arrays
Problem 2: Redundant Connection (LC 684)
Problem Statement
A tree is a connected graph with no cycles. Given a tree with n nodes (numbered 1 to n) with one additional edge added. Find and return the redundant edge.
Approach
Key insight: When adding an edge, if both endpoints are already connected, that edge is redundant
Example: edges = [[1,2], [1,3], [2,3]]
Process [1,2]: 1 and 2 not connected → Union(1, 2)
Process [1,3]: 1 and 3 not connected → Union(1, 3)
Process [2,3]: 2 and 3 already connected → This is the redundant edge!
Redundant Connection Visualization
Initialize 4 elements, each in its own set. parent[i] = i for all nodes.
Parent Array
Rank Array
Operations to Process
Code Implementation
Complexity Analysis
- Time Complexity:
- Space Complexity:
Interview Points
- Process edges in order: Problem guarantees returning the last redundant edge
- Nodes are 1-indexed: Union-Find size needs to be n+1
- Possible follow-up: What if it’s a directed graph? (LC 685)
Problem 3: Accounts Merge (LC 721)
Problem Statement
Given a list of accounts where each account contains a name and several emails. If two accounts share the same email, they belong to the same person. Merge all accounts.
Approach
This is a classic equivalence class problem:
- Same email = same person: When encountering the same email, Union the accounts
- Build mapping: email → account index
- Collect results: Find all emails for each connected component
Example:
accounts = [
["John", "john@mail.com", "john00@mail.com"], // Account 0
["John", "johnnybravo@mail.com"], // Account 1
["John", "john@mail.com", "john_newyork@mail.com"], // Account 2
["Mary", "mary@mail.com"] // Account 3
]
Account 0 and Account 2 both have "john@mail.com" → Union(0, 2)
Result:
- John: john00@mail.com, john@mail.com, john_newyork@mail.com
- John: johnnybravo@mail.com
- Mary: mary@mail.com
Code Implementation
Complexity Analysis
Let N = number of accounts, K = average emails per account:
- Time Complexity:
- Union operations:
- Sorting emails:
- Space Complexity:
Problem 4: Longest Consecutive Sequence (LC 128)
Problem Statement
Given an unsorted array of integers nums, find the length of the longest consecutive elements sequence. Required O(n) time complexity.
Approach
Think of consecutive numbers as sets to be merged:
- For each number
num, traverse the array - Check if
num-1andnum+1exist - If they exist, Union them
- Find the largest connected component
Example: nums = [100, 4, 200, 1, 3, 2]
Process 100: 99 and 101 don't exist
Process 4: 3 and 5 don't exist
Process 200: 199 and 201 don't exist
Process 1: 0 doesn't exist, 2 doesn't exist
Process 3: 2 not seen yet, 4 exists → Union(3, 4)
Process 2: 1 exists → Union(2, 1), 3 exists → Union(2, 3)
Finally {1, 2, 3, 4} are in same set, length = 4
Code Implementation
Complexity Analysis
- Time Complexity:
- Space Complexity:
Note: The HashSet solution is more elegant for this problem (only counting from sequence starts), but Union-Find demonstrates its versatility.
Deep Dive: Complexity Analysis
Time Complexity
| Operation | No Optimization | Path Compression Only | Rank Only | Both |
|---|---|---|---|---|
| Find | O(n) | O(log n) amortized | O(log n) | O(α(n)) |
| Union | O(n) | O(log n) amortized | O(log n) | O(α(n)) |
is the inverse Ackermann function, which grows extremely slowly:
α(1) = 1
α(4) = 2
α(65536) = 3
α(2^65536) = 4
For any practical input, α(n) ≤ 4
So it's effectively O(1)
Space Complexity
- parent[] array: O(n)
- rank[] array: O(n)
- Total space: O(n)
Edge Cases & Common Mistakes
1. Node Indexing Issues
// ❌ Wrong: Nodes start at 1 but array size is only n
UnionFind uf = new UnionFind(n);
uf.union(1, 2); // May cause out of bounds!
// ✅ Correct: Allocate one extra space
UnionFind uf = new UnionFind(n + 1);
2. Forgetting to Check Connection
// ❌ Wrong: Directly union, can't detect cycles
uf.union(u, v);
// ✅ Correct: Check first
if (!uf.isConnected(u, v)) {
uf.union(u, v);
}
3. Path Compression Implementation
// ❌ Wrong: Only compress one level
public int find(int x) {
if (parent[x] != x) {
parent[x] = parent[parent[x]]; // Only jump one level
}
return parent[x];
}
// ✅ Correct: Full compression
public int find(int x) {
if (parent[x] != x) {
parent[x] = find(parent[x]); // Recursively compress entire path
}
return parent[x];
}
When to Use Union-Find
✅ Good Use Cases
- Checking graph connectivity
- Counting connected components
- Detecting cycles in undirected graphs
- Dynamic connectivity queries (edges being added)
- Equivalence class merging problems
❌ Not Suitable For
- Need to delete edges (Union-Find doesn’t support efficient deletion)
- Need to find shortest paths
- Need to traverse entire connected components
- Directed graph strongly connected components
Comparison with Other Algorithms
| Scenario | Union-Find | DFS/BFS |
|---|---|---|
| Static graph connectivity | ✅ Faster | ✅ Works |
| Dynamic edge addition | ✅ Best | ❌ Need rebuild |
| Edge deletion | ❌ Not supported | ✅ Can rebuild |
| Find path | ❌ Only knows if connected | ✅ Can record path |
| Shortest path | ❌ | ✅ BFS |
Interview Tips
How to Explain to Interviewers
- Draw diagrams: Show tree structure, demonstrate Union and Find
- Emphasize optimizations: Proactively mention path compression and union by rank
- Analyze complexity: Explain that α(n) is effectively O(1)
Common Follow-up Questions
-
Why rank instead of size?
- Both work; rank is an upper bound on tree height
- Using size (set size) also guarantees O(log n)
-
Can you support deletion?
- Standard Union-Find doesn’t support it
- Can use persistent Union-Find or offline processing
-
How to optimize space?
- If only need connectivity, can omit rank array
- But affects worst-case complexity
Practice Problems
Beginner
Intermediate
- LC 721: Accounts Merge
- LC 128: Longest Consecutive Sequence
- LC 990: Satisfiability of Equality Equations
- LC 1319: Number of Operations to Make Network Connected
Advanced
- LC 685: Redundant Connection II (Directed graph)
- LC 778: Swim in Rising Water
- LC 765: Couples Holding Hands
Summary
- Union-Find = Forest: Use parent array to represent parent-child relationships
- Both optimizations are essential: Path compression + union by rank → O(α(n))
- Use cases: Dynamic connectivity, counting components, cycle detection
- Interview essentials: Know the template, application scenarios, complexity analysis
Remember this mantra: Find compresses path, Union ranks and merges smart
Once you master Union-Find, you’ll discover many graph problems can be solved efficiently with it!