Union-Find (Disjoint Set Union): Efficiently Handle Dynamic Connectivity

Union-Find (also known as Disjoint Set Union or DSU) is a data structure designed to handle dynamic connectivity problems efficiently. It can quickly determine whether two elements belong to the same set and merge two sets together.

Why Interviewers Love Union-Find

In FAANG interviews, Union-Find is a popular topic because it tests:

Data structure design: How to represent a tree structure with arrays
Optimization thinking: Path compression and union by rank techniques
Algorithm application: Recognizing when Union-Find can solve a problem
Complexity analysis: Understanding amortized complexity $\alpha(n)$

Common applications: Graph connectivity, Minimum Spanning Tree (Kruskal), Social networks, Equivalence class problems

Core Concepts: A Software Architecture Perspective

With your software architecture experience, think of Union-Find as an organizational hierarchy management system:

Company Org Chart Analogy:

CEO (root node)
├── CTO (intermediate node)
│   ├── Dev Manager
│   │   ├── Developer A
│   │   └── Developer B
│   └── QA Manager
│       └── Tester C
└── CFO
    └── Accountant D

Find(DeveloperA) = Find the CEO (top leader)
Union(CompanyA, CompanyB) = Merge two companies
IsConnected(A, B) = Are two people in the same company?

Core operations:

Find(x): Find the representative element (root) of x’s set
Union(x, y): Merge the sets containing x and y
IsConnected(x, y): Check if x and y are in the same set

Interactive Visualization

Watch how Union-Find builds connected components step by step. In this example, we have 6 nodes (friends in a social network) and need to process 5 friendship connections:

Social Network: Building Friend Groups

Goal: Connect 6 people through friendships. Dashed lines = pending connections, solid green = established friendships.

Step 0 of 16Connected Components: 6

Initialize 6 elements, each in its own set. parent[i] = i for all nodes.

Pending

Connected

Root

Current

Parent Array

0: 0

1: 1

2: 2

3: 3

4: 4

5: 5

Rank Array

0: 0

1: 0

2: 0

3: 0

4: 0

5: 0

Operations to Process

union(0, 1)

union(1, 2)

union(3, 4)

union(4, 5)

union(2, 4)

How to use this visualization:

Initial state: All 6 nodes are isolated (each person is their own friend group)
Dashed lines: Show connections we need to establish
Click “Next”: Watch each union operation connect nodes
Solid green lines: Friendships successfully established
Final result: Two friend groups merge into one large group

Data Structure Design

Core Data

parent[] array: parent[i] is the parent of node i
rank[]   array: rank[i] is the rank (approximate height) of tree rooted at i

Initialization

Initial state (n=5):

parent: [0, 1, 2, 3, 4]  ← Each element points to itself
rank:   [0, 0, 0, 0, 0]  ← Initial rank is 0

Logical structure:
  0    1    2    3    4   ← 5 independent sets (forest)

Two Key Optimizations

1. Path Compression

Problem: If tree degenerates into a linked list, Find becomes O(n)

Solution: During Find, make all nodes along the path point directly to root

Before path compression:        After path compression:
        0                              0
        |                           /|\ \
        1                          1 2 3 4
        |
        2
        |
        3
        |
        4

find(4) process:
4 → 3 → 2 → 1 → 0 (find root)
Then point 4, 3, 2, 1 all directly to 0

2. Union by Rank

Problem: Random merging can create very tall trees

Solution: Always attach the shorter tree under the taller one

Union(A, B):

Case 1: rank[A] < rank[B]
    A goes under B, rank unchanged

Case 2: rank[A] > rank[B]
    B goes under A, rank unchanged

Case 3: rank[A] == rank[B]
    B goes under A, rank[A]++

Why it works:
Keeps tree height as small as possible, ensuring Find is efficient

Code Template

Key Points:

Path compression is simplest with recursion: parent[x] = find(parent[x])
Union by rank is implemented in Union by comparing root ranks
count variable is optional, used to track number of components

Problem 1: Number of Provinces (LC 547)

Problem Statement

Given an n x n adjacency matrix isConnected where isConnected[i][j] = 1 means city i and city j are directly connected. Return the number of provinces (connected components).

Approach

This is a classic Union-Find application:

Traverse the adjacency matrix
If two cities are connected, Union them
Return the number of connected components

Example:
isConnected = [[1,1,0],
               [1,1,0],
               [0,0,1]]

Cities 0 and 1 are connected → Union(0, 1)
City 2 is independent

Result: 2 provinces

Code Implementation

Complexity Analysis

Time Complexity: $O(n^2 \cdot \alpha(n)) \approx O(n^2)$ $O (n^{2} \cdot α (n)) \approx O (n^{2})$
- Need to traverse entire adjacency matrix
- Each Union/Find operation is nearly O(1)
Space Complexity: $O(n)$ for Union-Find arrays

Problem 2: Redundant Connection (LC 684)

Problem Statement

A tree is a connected graph with no cycles. Given a tree with n nodes (numbered 1 to n) with one additional edge added. Find and return the redundant edge.

Approach

Key insight: When adding an edge, if both endpoints are already connected, that edge is redundant

Example: edges = [[1,2], [1,3], [2,3]]

Process [1,2]: 1 and 2 not connected → Union(1, 2)
Process [1,3]: 1 and 3 not connected → Union(1, 3)
Process [2,3]: 2 and 3 already connected → This is the redundant edge!

Redundant Connection Visualization

Step 0 of 10Connected Components: 4

Initialize 4 elements, each in its own set. parent[i] = i for all nodes.

Pending

Connected

Root

Current

Parent Array

0: 0

1: 1

2: 2

3: 3

Rank Array

0: 0

1: 0

2: 0

3: 0

Operations to Process

union(1, 2)

union(1, 3)

union(2, 3)

Code Implementation

Complexity Analysis

Time Complexity: $O(n \cdot \alpha(n)) \approx O(n)$
Space Complexity: $O(n)$

Interview Points

Process edges in order: Problem guarantees returning the last redundant edge
Nodes are 1-indexed: Union-Find size needs to be n+1
Possible follow-up: What if it’s a directed graph? (LC 685)

Problem 3: Accounts Merge (LC 721)

Problem Statement

Given a list of accounts where each account contains a name and several emails. If two accounts share the same email, they belong to the same person. Merge all accounts.

Approach

This is a classic equivalence class problem:

Same email = same person: When encountering the same email, Union the accounts
Build mapping: email → account index
Collect results: Find all emails for each connected component

Example:
accounts = [
  ["John", "john@mail.com", "john00@mail.com"],  // Account 0
  ["John", "johnnybravo@mail.com"],               // Account 1
  ["John", "john@mail.com", "john_newyork@mail.com"], // Account 2
  ["Mary", "mary@mail.com"]                       // Account 3
]

Account 0 and Account 2 both have "john@mail.com" → Union(0, 2)

Result:
- John: john00@mail.com, john@mail.com, john_newyork@mail.com
- John: johnnybravo@mail.com
- Mary: mary@mail.com

Code Implementation

Complexity Analysis

Let N = number of accounts, K = average emails per account:

Time Complexity: $O(NK \cdot \alpha(N) + NK \cdot \log(NK))$ $O (N K \cdot α (N) + N K \cdot lo g (N K))$
- Union operations: $O(NK \cdot \alpha(N))$
- Sorting emails: $O(NK \cdot \log(NK))$
Space Complexity: $O(NK)$

Problem 4: Longest Consecutive Sequence (LC 128)

Problem Statement

Given an unsorted array of integers nums, find the length of the longest consecutive elements sequence. Required O(n) time complexity.

Approach

Think of consecutive numbers as sets to be merged:

For each number num, traverse the array
Check if num-1 and num+1 exist
If they exist, Union them
Find the largest connected component

Example: nums = [100, 4, 200, 1, 3, 2]

Process 100: 99 and 101 don't exist
Process 4: 3 and 5 don't exist
Process 200: 199 and 201 don't exist
Process 1: 0 doesn't exist, 2 doesn't exist
Process 3: 2 not seen yet, 4 exists → Union(3, 4)
Process 2: 1 exists → Union(2, 1), 3 exists → Union(2, 3)

Finally {1, 2, 3, 4} are in same set, length = 4

Code Implementation

Complexity Analysis

Time Complexity: $O(n \cdot \alpha(n)) \approx O(n)$
Space Complexity: $O(n)$

Note: The HashSet solution is more elegant for this problem (only counting from sequence starts), but Union-Find demonstrates its versatility.

Deep Dive: Complexity Analysis

Time Complexity

Operation	No Optimization	Path Compression Only	Rank Only	Both
Find	O(n)	O(log n) amortized	O(log n)	O(α(n))
Union	O(n)	O(log n) amortized	O(log n)	O(α(n))

$\alpha(n)$ is the inverse Ackermann function, which grows extremely slowly:

α(1) = 1
α(4) = 2
α(65536) = 3
α(2^65536) = 4

For any practical input, α(n) ≤ 4
So it's effectively O(1)

Space Complexity

parent[] array: O(n)
rank[] array: O(n)
Total space: O(n)

Edge Cases & Common Mistakes

1. Node Indexing Issues

// ❌ Wrong: Nodes start at 1 but array size is only n
UnionFind uf = new UnionFind(n);
uf.union(1, 2); // May cause out of bounds!

// ✅ Correct: Allocate one extra space
UnionFind uf = new UnionFind(n + 1);

2. Forgetting to Check Connection

// ❌ Wrong: Directly union, can't detect cycles
uf.union(u, v);

// ✅ Correct: Check first
if (!uf.isConnected(u, v)) {
    uf.union(u, v);
}

3. Path Compression Implementation

// ❌ Wrong: Only compress one level
public int find(int x) {
    if (parent[x] != x) {
        parent[x] = parent[parent[x]]; // Only jump one level
    }
    return parent[x];
}

// ✅ Correct: Full compression
public int find(int x) {
    if (parent[x] != x) {
        parent[x] = find(parent[x]); // Recursively compress entire path
    }
    return parent[x];
}

When to Use Union-Find

✅ Good Use Cases

Checking graph connectivity
Counting connected components
Detecting cycles in undirected graphs
Dynamic connectivity queries (edges being added)
Equivalence class merging problems

❌ Not Suitable For

Need to delete edges (Union-Find doesn’t support efficient deletion)
Need to find shortest paths
Need to traverse entire connected components
Directed graph strongly connected components

Comparison with Other Algorithms

Scenario	Union-Find	DFS/BFS
Static graph connectivity	✅ Faster	✅ Works
Dynamic edge addition	✅ Best	❌ Need rebuild
Edge deletion	❌ Not supported	✅ Can rebuild
Find path	❌ Only knows if connected	✅ Can record path
Shortest path	❌	✅ BFS

Interview Tips

How to Explain to Interviewers

Draw diagrams: Show tree structure, demonstrate Union and Find
Emphasize optimizations: Proactively mention path compression and union by rank
Analyze complexity: Explain that α(n) is effectively O(1)

Common Follow-up Questions

Why rank instead of size?
- Both work; rank is an upper bound on tree height
- Using size (set size) also guarantees O(log n)
Can you support deletion?
- Standard Union-Find doesn’t support it
- Can use persistent Union-Find or offline processing
How to optimize space?
- If only need connectivity, can omit rank array
- But affects worst-case complexity

Practice Problems

Beginner

Intermediate

Advanced

LC 685: Redundant Connection II (Directed graph)
LC 778: Swim in Rising Water
LC 765: Couples Holding Hands

Summary

Union-Find = Forest: Use parent array to represent parent-child relationships
Both optimizations are essential: Path compression + union by rank → O(α(n))
Use cases: Dynamic connectivity, counting components, cycle detection
Interview essentials: Know the template, application scenarios, complexity analysis

Remember this mantra: Find compresses path, Union ranks and merges smart

Once you master Union-Find, you’ll discover many graph problems can be solved efficiently with it!