English Jackey

Union-Find (Disjoint Set Union): Efficiently Handle Dynamic Connectivity

Master the Union-Find data structure with path compression and union by rank optimizations. Includes interactive visualizations, LeetCode problems, and interview tips.

#algorithm #union-find #data-structure #graph #visualization #leetcode #interview

Union-Find (Disjoint Set Union): Efficiently Handle Dynamic Connectivity

Union-Find (also known as Disjoint Set Union or DSU) is a data structure designed to handle dynamic connectivity problems efficiently. It can quickly determine whether two elements belong to the same set and merge two sets together.

Why Interviewers Love Union-Find

In FAANG interviews, Union-Find is a popular topic because it tests:

  1. Data structure design: How to represent a tree structure with arrays
  2. Optimization thinking: Path compression and union by rank techniques
  3. Algorithm application: Recognizing when Union-Find can solve a problem
  4. Complexity analysis: Understanding amortized complexity α(n)\alpha(n)

Common applications: Graph connectivity, Minimum Spanning Tree (Kruskal), Social networks, Equivalence class problems

Core Concepts: A Software Architecture Perspective

With your software architecture experience, think of Union-Find as an organizational hierarchy management system:

Company Org Chart Analogy:

CEO (root node)
├── CTO (intermediate node)
│   ├── Dev Manager
│   │   ├── Developer A
│   │   └── Developer B
│   └── QA Manager
│       └── Tester C
└── CFO
    └── Accountant D

Find(DeveloperA) = Find the CEO (top leader)
Union(CompanyA, CompanyB) = Merge two companies
IsConnected(A, B) = Are two people in the same company?

Core operations:

  • Find(x): Find the representative element (root) of x’s set
  • Union(x, y): Merge the sets containing x and y
  • IsConnected(x, y): Check if x and y are in the same set

Interactive Visualization

Watch how Union-Find builds connected components step by step. In this example, we have 6 nodes (friends in a social network) and need to process 5 friendship connections:

Social Network: Building Friend Groups

Goal: Connect 6 people through friendships. Dashed lines = pending connections, solid green = established friendships.

Step 0 of 16Connected Components: 6

Initialize 6 elements, each in its own set. parent[i] = i for all nodes.

012345
Pending
Connected
Root
Current

Parent Array

0: 0
1: 1
2: 2
3: 3
4: 4
5: 5

Rank Array

0: 0
1: 0
2: 0
3: 0
4: 0
5: 0

Operations to Process

union(0, 1)
union(1, 2)
union(3, 4)
union(4, 5)
union(2, 4)

How to use this visualization:

  1. Initial state: All 6 nodes are isolated (each person is their own friend group)
  2. Dashed lines: Show connections we need to establish
  3. Click “Next”: Watch each union operation connect nodes
  4. Solid green lines: Friendships successfully established
  5. Final result: Two friend groups merge into one large group

Data Structure Design

Core Data

parent[] array: parent[i] is the parent of node i
rank[]   array: rank[i] is the rank (approximate height) of tree rooted at i

Initialization

Initial state (n=5):

parent: [0, 1, 2, 3, 4]  ← Each element points to itself
rank:   [0, 0, 0, 0, 0]  ← Initial rank is 0

Logical structure:
  0    1    2    3    4   ← 5 independent sets (forest)

Two Key Optimizations

1. Path Compression

Problem: If tree degenerates into a linked list, Find becomes O(n)

Solution: During Find, make all nodes along the path point directly to root

Before path compression:        After path compression:
        0                              0
        |                           /|\ \
        1                          1 2 3 4
        |
        2
        |
        3
        |
        4

find(4) process:
4 → 3 → 2 → 1 → 0 (find root)
Then point 4, 3, 2, 1 all directly to 0

2. Union by Rank

Problem: Random merging can create very tall trees

Solution: Always attach the shorter tree under the taller one

Union(A, B):

Case 1: rank[A] < rank[B]
    A goes under B, rank unchanged

Case 2: rank[A] > rank[B]
    B goes under A, rank unchanged

Case 3: rank[A] == rank[B]
    B goes under A, rank[A]++

Why it works:
Keeps tree height as small as possible, ensuring Find is efficient

Code Template

Loading...

Key Points:

  1. Path compression is simplest with recursion: parent[x] = find(parent[x])
  2. Union by rank is implemented in Union by comparing root ranks
  3. count variable is optional, used to track number of components

Problem 1: Number of Provinces (LC 547)

Problem Statement

Given an n x n adjacency matrix isConnected where isConnected[i][j] = 1 means city i and city j are directly connected. Return the number of provinces (connected components).

Approach

This is a classic Union-Find application:

  1. Traverse the adjacency matrix
  2. If two cities are connected, Union them
  3. Return the number of connected components
Example:
isConnected = [[1,1,0],
               [1,1,0],
               [0,0,1]]

Cities 0 and 1 are connected → Union(0, 1)
City 2 is independent

Result: 2 provinces

Code Implementation

Loading...

Complexity Analysis

  • Time Complexity: O(n2α(n))O(n2)O(n^2 \cdot \alpha(n)) \approx O(n^2)
    • Need to traverse entire adjacency matrix
    • Each Union/Find operation is nearly O(1)
  • Space Complexity: O(n)O(n) for Union-Find arrays

Problem 2: Redundant Connection (LC 684)

Problem Statement

A tree is a connected graph with no cycles. Given a tree with n nodes (numbered 1 to n) with one additional edge added. Find and return the redundant edge.

Approach

Key insight: When adding an edge, if both endpoints are already connected, that edge is redundant

Example: edges = [[1,2], [1,3], [2,3]]

Process [1,2]: 1 and 2 not connected → Union(1, 2)
Process [1,3]: 1 and 3 not connected → Union(1, 3)
Process [2,3]: 2 and 3 already connected → This is the redundant edge!

Redundant Connection Visualization

Step 0 of 10Connected Components: 4

Initialize 4 elements, each in its own set. parent[i] = i for all nodes.

0123
Pending
Connected
Root
Current

Parent Array

0: 0
1: 1
2: 2
3: 3

Rank Array

0: 0
1: 0
2: 0
3: 0

Operations to Process

union(1, 2)
union(1, 3)
union(2, 3)

Code Implementation

Loading...

Complexity Analysis

  • Time Complexity: O(nα(n))O(n)O(n \cdot \alpha(n)) \approx O(n)
  • Space Complexity: O(n)O(n)

Interview Points

  1. Process edges in order: Problem guarantees returning the last redundant edge
  2. Nodes are 1-indexed: Union-Find size needs to be n+1
  3. Possible follow-up: What if it’s a directed graph? (LC 685)

Problem 3: Accounts Merge (LC 721)

Problem Statement

Given a list of accounts where each account contains a name and several emails. If two accounts share the same email, they belong to the same person. Merge all accounts.

Approach

This is a classic equivalence class problem:

  1. Same email = same person: When encountering the same email, Union the accounts
  2. Build mapping: email → account index
  3. Collect results: Find all emails for each connected component
Example:
accounts = [
  ["John", "john@mail.com", "john00@mail.com"],  // Account 0
  ["John", "johnnybravo@mail.com"],               // Account 1
  ["John", "john@mail.com", "john_newyork@mail.com"], // Account 2
  ["Mary", "mary@mail.com"]                       // Account 3
]

Account 0 and Account 2 both have "john@mail.com" → Union(0, 2)

Result:
- John: john00@mail.com, john@mail.com, john_newyork@mail.com
- John: johnnybravo@mail.com
- Mary: mary@mail.com

Code Implementation

Loading...

Complexity Analysis

Let N = number of accounts, K = average emails per account:

  • Time Complexity: O(NKα(N)+NKlog(NK))O(NK \cdot \alpha(N) + NK \cdot \log(NK))
    • Union operations: O(NKα(N))O(NK \cdot \alpha(N))
    • Sorting emails: O(NKlog(NK))O(NK \cdot \log(NK))
  • Space Complexity: O(NK)O(NK)

Problem 4: Longest Consecutive Sequence (LC 128)

Problem Statement

Given an unsorted array of integers nums, find the length of the longest consecutive elements sequence. Required O(n) time complexity.

Approach

Think of consecutive numbers as sets to be merged:

  1. For each number num, traverse the array
  2. Check if num-1 and num+1 exist
  3. If they exist, Union them
  4. Find the largest connected component
Example: nums = [100, 4, 200, 1, 3, 2]

Process 100: 99 and 101 don't exist
Process 4: 3 and 5 don't exist
Process 200: 199 and 201 don't exist
Process 1: 0 doesn't exist, 2 doesn't exist
Process 3: 2 not seen yet, 4 exists → Union(3, 4)
Process 2: 1 exists → Union(2, 1), 3 exists → Union(2, 3)

Finally {1, 2, 3, 4} are in same set, length = 4

Code Implementation

Loading...

Complexity Analysis

  • Time Complexity: O(nα(n))O(n)O(n \cdot \alpha(n)) \approx O(n)
  • Space Complexity: O(n)O(n)

Note: The HashSet solution is more elegant for this problem (only counting from sequence starts), but Union-Find demonstrates its versatility.


Deep Dive: Complexity Analysis

Time Complexity

OperationNo OptimizationPath Compression OnlyRank OnlyBoth
FindO(n)O(log n) amortizedO(log n)O(α(n))
UnionO(n)O(log n) amortizedO(log n)O(α(n))

α(n)\alpha(n) is the inverse Ackermann function, which grows extremely slowly:

α(1) = 1
α(4) = 2
α(65536) = 3
α(2^65536) = 4

For any practical input, α(n) ≤ 4
So it's effectively O(1)

Space Complexity

  • parent[] array: O(n)
  • rank[] array: O(n)
  • Total space: O(n)

Edge Cases & Common Mistakes

1. Node Indexing Issues

// ❌ Wrong: Nodes start at 1 but array size is only n
UnionFind uf = new UnionFind(n);
uf.union(1, 2); // May cause out of bounds!

// ✅ Correct: Allocate one extra space
UnionFind uf = new UnionFind(n + 1);

2. Forgetting to Check Connection

// ❌ Wrong: Directly union, can't detect cycles
uf.union(u, v);

// ✅ Correct: Check first
if (!uf.isConnected(u, v)) {
    uf.union(u, v);
}

3. Path Compression Implementation

// ❌ Wrong: Only compress one level
public int find(int x) {
    if (parent[x] != x) {
        parent[x] = parent[parent[x]]; // Only jump one level
    }
    return parent[x];
}

// ✅ Correct: Full compression
public int find(int x) {
    if (parent[x] != x) {
        parent[x] = find(parent[x]); // Recursively compress entire path
    }
    return parent[x];
}

When to Use Union-Find

✅ Good Use Cases

  • Checking graph connectivity
  • Counting connected components
  • Detecting cycles in undirected graphs
  • Dynamic connectivity queries (edges being added)
  • Equivalence class merging problems

❌ Not Suitable For

  • Need to delete edges (Union-Find doesn’t support efficient deletion)
  • Need to find shortest paths
  • Need to traverse entire connected components
  • Directed graph strongly connected components

Comparison with Other Algorithms

ScenarioUnion-FindDFS/BFS
Static graph connectivity✅ Faster✅ Works
Dynamic edge addition✅ Best❌ Need rebuild
Edge deletion❌ Not supported✅ Can rebuild
Find path❌ Only knows if connected✅ Can record path
Shortest path✅ BFS

Interview Tips

How to Explain to Interviewers

  1. Draw diagrams: Show tree structure, demonstrate Union and Find
  2. Emphasize optimizations: Proactively mention path compression and union by rank
  3. Analyze complexity: Explain that α(n) is effectively O(1)

Common Follow-up Questions

  1. Why rank instead of size?

    • Both work; rank is an upper bound on tree height
    • Using size (set size) also guarantees O(log n)
  2. Can you support deletion?

    • Standard Union-Find doesn’t support it
    • Can use persistent Union-Find or offline processing
  3. How to optimize space?

    • If only need connectivity, can omit rank array
    • But affects worst-case complexity

Practice Problems

Beginner

Intermediate

Advanced


Summary

  1. Union-Find = Forest: Use parent array to represent parent-child relationships
  2. Both optimizations are essential: Path compression + union by rank → O(α(n))
  3. Use cases: Dynamic connectivity, counting components, cycle detection
  4. Interview essentials: Know the template, application scenarios, complexity analysis

Remember this mantra: Find compresses path, Union ranks and merges smart


Once you master Union-Find, you’ll discover many graph problems can be solved efficiently with it!