Graph Data Structure - Complete Guide with DFS & BFS Traversal
Master graph data structures with interactive visualizations. Learn adjacency list vs matrix representations, implement DFS and BFS traversals in Java, Python, JavaScript. Complete guide for coding interviews at FAANG and top tech companies.
What is a Graph?
A graph is a fundamental data structure consisting of:
- Vertices (Nodes): The entities or points in the graph
- Edges: Connections between vertices
Graphs are used to model relationships, networks, hierarchies, and paths. They’re essential for solving problems like:
- Social network connections (Facebook friends)
- Road networks and navigation (Google Maps)
- Web page links (Google PageRank)
- Dependency resolution (npm packages)
- Recommendation systems (Netflix, Amazon)
Why Interviewers Love Graphs
Graph problems appear in 40-50% of FAANG interviews because they test:
- Multiple algorithmic patterns: DFS, BFS, shortest path, cycle detection
- Data structure design: Choosing the right representation
- Complex problem-solving: Real-world applications
- Code organization: Clean traversal logic
Graph Types
Directed vs Undirected
Undirected: Directed:
A --- B A ──> B
| | | ↑
C --- D └──> C
- Undirected: Edges have no direction (friendships on Facebook)
- Directed: Edges have direction (Twitter follows)
Weighted vs Unweighted
Weighted: Unweighted:
A --5-- B A --- B
| | | |
3 2 C --- D
| |
C --1-- D
- Weighted: Edges have values (road distances)
- Unweighted: All edges equal (connections exist or not)
Cyclic vs Acyclic
Cyclic: Acyclic (DAG):
A → B A → B
↓ ↓ ↓ ↓
C ← D C → D
- Cyclic: Contains cycles
- Acyclic: No cycles (DAG = Directed Acyclic Graph)
Graph Representations
There are two main ways to represent graphs: Adjacency List and Adjacency Matrix.
Comparison Table
| Feature | Adjacency List | Adjacency Matrix |
|---|---|---|
| Space | ||
| Check edge | ||
| Find neighbors | ||
| Add edge | ||
| Best for | Sparse graphs | Dense graphs, quick edge lookup |
| Interview use | ✅ 90% of problems | Rare (specific use cases) |
When to Use Each
Use Adjacency List when:
- Graph is sparse (few edges):
- Need to iterate through neighbors frequently
- This is the default for 90% of interview problems
Use Adjacency Matrix when:
- Graph is dense:
- Need edge existence checks
- Working with weighted graphs with many operations
- Problem explicitly requires it
Adjacency List Implementation
Most common representation in interviews!
Adjacency List Visualization
Graph with edges: (0,1), (0,2), (1,3), (1,4), (2,4)
Adjacency List:
0: [1, 2]
1: [0, 3, 4]
2: [0, 4]
3: [1]
4: [1, 2]
Visual:
0
/ \
1 2
/ \ /
3 4
Adjacency Matrix Implementation
Less common, but useful for dense graphs.
Adjacency Matrix Visualization
Same graph as above:
Adjacency Matrix (5×5):
0 1 2 3 4
0 [0, 1, 1, 0, 0]
1 [1, 0, 0, 1, 1]
2 [1, 0, 0, 0, 1]
3 [0, 1, 0, 0, 0]
4 [0, 1, 1, 0, 0]
1 means edge exists, 0 means no edge
For weighted graphs, use weight instead of 1
Graph DFS (Depth-First Search)
DFS explores as far as possible along each branch before backtracking.
DFS Pattern
1. Start at a node
2. Mark it visited
3. Recursively visit an unvisited neighbor
4. Backtrack when no unvisited neighbors
5. Repeat until all reachable nodes visited
Interactive DFS Visualization
DFS Visualization
Adjacency List
Stack
Visited
DFS Implementation
DFS Traversal Example
Graph: DFS from 0:
0 1. Visit: 0
/|\ 2. Visit: 1 (first edge from 0)
1 2 3 3. Visit: 4 (from 1)
|\ /| 4. Try 5 (from 4)
| 4 | 5. Visit: 5
| \| 6. Backtrack to 1 (no more neighbors)
\-5 7. Backtrack to 0
8. Visit: 2 (second edge from 0)
9. Try 4 (from 2) - ALREADY VISITED! Skip
10. Backtrack to 0
11. Visit: 3 (third edge from 0)
12. Try 5 (from 3) - ALREADY VISITED! Skip
Order: 0 → 1 → 4 → 5 → 2 → 3
Key DFS behaviors shown:
✓ Node 0 has 3 edges - explores each branch completely
✓ Node 4 has 2 incoming edges (1→4, 2→4) - visited check prevents revisiting
✓ Node 5 has 2 incoming edges (4→5, 3→5) - visited check prevents revisiting
✓ Backtracking after exhausting each branch
Stack state:
0: [0]
1: [0, 1] (go to 1)
2: [0, 1, 3] (go to 3)
3: [0, 1] (backtrack from 3)
4: [0, 1, 4] (go to 4)
5: [0, 1, 4, 5] (go to 5)
6: [0, 1, 4] (backtrack from 5)
...
Graph BFS (Breadth-First Search)
BFS explores all neighbors at the current depth before moving deeper.
BFS Pattern
1. Start at a node, add to queue
2. While queue not empty:
a. Dequeue a node
b. Mark it visited
c. Enqueue all unvisited neighbors
3. Repeat until queue empty
Interactive BFS Visualization
BFS Visualization
Adjacency List
Queue
Visited
BFS Implementation
BFS Traversal Example
Graph: BFS from 0:
0 Level 0: 0
/ \ Level 1: 1, 2
1 2 Level 2: 3, 4
/| | Level 3: 5
3 4 5
| |
\-/
Order: 0 → 1 → 2 → 3 → 4 → 5
Queue state:
0: [0]
1: [1, 2] (dequeued 0, enqueued 1, 2)
2: [2, 3, 4] (dequeued 1, enqueued 3, 4)
3: [3, 4] (dequeued 2, 4 already in queue)
4: [4, 5] (dequeued 3, enqueued 5)
5: [5] (dequeued 4, 5 already in queue)
6: [] (dequeued 5)
DFS vs BFS Comparison
| Aspect | DFS | BFS |
|---|---|---|
| Data Structure | Stack (recursion or explicit) | Queue |
| Order | Go deep first | Go wide first |
| Space | (height) | (max width) |
| Time | ||
| Use for | Paths, cycles, backtracking | Shortest path, levels |
| Interview frequency | ⭐⭐⭐⭐⭐ Very common | ⭐⭐⭐⭐⭐ Very common |
Common Graph Problems
Problem 1: Number of Islands (LC 200)
Problem: Given a 2D grid of ‘1’s (land) and ‘0’s (water), count islands.
Pattern: DFS/BFS on implicit graph (grid as graph)
Complexity: time, space (recursion stack)
Why is Time Complexity O(M × N)? Detailed Analysis
At first glance, this might seem confusing:
- The outer double loop visits all cells:
- Each DFS call might visit many cells
- Does this mean ? No! ❌
Key Insight: Each cell is visited at most once!
The Correct Analysis
1. Outer Loop Cost
for (int i = 0; i < M; i++) { // M iterations
for (int j = 0; j < N; j++) { // N iterations
if (grid[i][j] == '1') {
dfs(grid, i, j);
}
}
}
This loop visits all cells:
2. Total DFS Calls (Amortized Analysis)
The key is to analyze total DFS calls across the entire execution:
private void dfs(char[][] grid, int i, int j) {
if (grid[i][j] == '0') {
return; // ← Already visited, return immediately
}
grid[i][j] = '0'; // ← Mark as visited (critical!)
dfs(grid, i + 1, j);
dfs(grid, i - 1, j);
dfs(grid, i, j + 1);
dfs(grid, i, j - 1);
}
Critical observations:
- When we visit a cell, we immediately mark it as
'0' - Once marked, it will never be visited again
- Each cell is processed exactly once by DFS
3. Concrete Example
Consider a 3×4 grid (M=3, N=4):
[
['1', '1', '0', '0'],
['1', '0', '0', '1'],
['0', '0', '1', '1']
]
Execution trace:
Step 1: Scan reaches grid[0][0] = '1'
→ Call dfs(0,0)
→ DFS visits: (0,0) → (0,1) → (1,0)
→ All marked as '0'
→ Total: 3 cells visited
Step 2: Continue scan, grid[0][1] now '0', skip
Step 3: Continue scan, grid[1][0] now '0', skip
Step 4: Scan reaches grid[1][3] = '1'
→ Call dfs(1,3)
→ DFS visits: (1,3)
→ Marked as '0'
→ Total: 1 cell visited
Step 5: Scan reaches grid[2][2] = '1'
→ Call dfs(2,2)
→ DFS visits: (2,2) → (2,3)
→ Marked as '0'
→ Total: 2 cells visited
Statistics:
- Outer loop iterations: 12 cells (3×4)
- Total DFS visits: 3 + 1 + 2 = 6 cells (exactly the number of ‘1’s)
- Each cell visited exactly once by DFS
4. Time Complexity Formula
Total Time = Outer Loop Time + All DFS Calls Time
Outer Loop:
- Visit M×N cells
- Each check: O(1)
- Total: O(M×N)
All DFS Calls (across entire execution):
- Each cell visited at most once
- Each visit: O(1) work (mark + 4 recursive calls)
- Total: O(M×N)
Total Time = O(M×N) + O(M×N) = O(M×N)
Why NOT ?
❌ Wrong thinking: “Each DFS might visit cells, and we call DFS times”
✅ Correct thinking: “Although we may initiate multiple DFS calls, all DFS calls combined visit each cell at most once”
Analogy: It’s like painting an array:
- Outer loop scans all elements:
- When we find an unpainted element, paint it and adjacent ones
- Multiple “paint operations”, but each element painted once
- Total time: , not
Space Complexity: (worst case)
Worst case: entire grid is ‘1’ (one giant island)
[
['1', '1', '1'],
['1', '1', '1'],
['1', '1', '1']
]
DFS recursion depth can reach in a snake-like path from top-left to bottom-right.
Summary:
- ✅ Each cell marked at most once
- ✅ Marked cells never revisited
- ✅ Amortized analysis: all DFS calls visit each cell once
- ✅ Linear time, not quadratic!
Alternative Solution: BFS Approach
While DFS is intuitive for this problem, BFS also works perfectly and may be easier to understand for some people.
Key Differences:
- DFS: Uses recursion (implicit stack), explores deep into one island
- BFS: Uses explicit queue, explores island level by level
BFS Execution Example:
Grid: Execution:
['1', '1', '0'] 1. Find '1' at (0,0), count=1
['1', '0', '0'] 2. BFS from (0,0):
['0', '0', '1'] Queue: [(0,0)]
Process (0,0), mark as '0', add neighbors
Queue: [(0,1), (1,0)]
Process (0,1), mark as '0'
Queue: [(1,0)]
Process (1,0), mark as '0'
Queue: [] (BFS complete)
3. Continue scanning...
4. Find '1' at (2,2), count=2
5. BFS from (2,2):
Queue: [(2,2)]
Process (2,2), mark as '0'
Queue: [] (BFS complete)
Result: 2 islands
DFS vs BFS for Number of Islands:
| Aspect | DFS | BFS |
|---|---|---|
| Implementation | Simpler (recursion) | Slightly more code (queue) |
| Memory | worst case (recursion) | typical |
| Order | Go deep (zigzag) | Level by level (expanding) |
| Stack overflow risk | Yes (very large grids) | No |
| Interview preference | ⭐⭐⭐⭐⭐ More common | ⭐⭐⭐⭐ Also good |
When to use BFS:
- Large grids where DFS might cause stack overflow
- Need to find shortest path (though not needed here)
- Prefer iterative solutions over recursion
When to use DFS:
- Simpler implementation preferred
- Recursion is natural for the problem
- Grid size is reasonable (< 1000×1000)
Both have same time complexity:
Key BFS insight:
- Mark cells as visited immediately when adding to queue (not when processing)
- This prevents adding same cell multiple times to queue
- Without this, cells could be added 4 times (from 4 neighbors)!
Problem 2: Clone Graph (LC 133)
Problem: Deep copy an undirected graph.
Pattern: DFS/BFS with hashmap for tracking clones
Complexity: time, space
Problem 3: Course Schedule (LC 207)
Problem: Detect if courses can be finished (cycle detection in directed graph).
Pattern: DFS cycle detection or topological sort
Complexity: time, space
Time & Space Complexity Summary
Representations
| Operation | Adjacency List | Adjacency Matrix |
|---|---|---|
| Space | ||
| Check edge | ||
| Find neighbors | ||
| Add edge |
Traversals
| Algorithm | Time | Space | Best For |
|---|---|---|---|
| DFS | Paths, cycles, connectivity | ||
| BFS | Shortest path, levels |
Where:
- = number of vertices
- = number of edges
- = height of graph (DFS stack depth)
- = maximum width (max nodes in BFS queue)
Interview Tips & Common Mistakes
✅ Best Practices
-
Always clarify the graph type:
- Directed or undirected?
- Weighted or unweighted?
- Can it have cycles?
-
Choose the right representation:
- Default to adjacency list (90% of cases)
- Use matrix only if explicitly beneficial
-
DFS vs BFS decision:
- Use BFS for shortest path, level-order
- Use DFS for paths, cycles, backtracking
-
Track visited nodes:
- Use
Set<Integer>orboolean[] - Mark visited before adding to queue (BFS)
- Mark visited when entering node (DFS)
- Use
❌ Common Mistakes
- Forgetting to mark nodes as visited → infinite loops
- Marking visited too late in BFS → nodes added multiple times
- Not handling disconnected components → missing parts of graph
- Using wrong representation → inefficient solution
- Not checking for null/empty graph → runtime errors
Interview Strategy
When you see a graph problem:
1. Clarify: directed? weighted? cycles?
2. Choose representation (usually adjacency list)
3. Identify pattern:
- Connectivity → DFS
- Shortest path → BFS
- Cycle detection → DFS with states
- Topological sort → DFS or BFS (Kahn's)
4. Implement with visited tracking
5. Test edge cases: empty, single node, disconnected
Common Graph Patterns
Pattern 1: Graph Traversal
When: Need to visit all nodes, check connectivity Use: DFS or BFS Examples: Number of Islands, Connected Components
Pattern 2: Shortest Path
When: Find minimum distance/steps Use: BFS (unweighted), Dijkstra (weighted) Examples: Word Ladder, Shortest Bridge
Pattern 3: Cycle Detection
When: Check for circular dependencies Use: DFS with 3 states (unvisited, visiting, visited) Examples: Course Schedule, Detect Cycle
Pattern 4: Topological Sort
When: Order tasks with dependencies Use: DFS (post-order) or BFS (Kahn’s algorithm) Examples: Course Schedule II, Alien Dictionary
Pattern 5: Union-Find
When: Dynamic connectivity, grouping Use: Union-Find (Disjoint Set Union) Examples: Number of Provinces, Redundant Connection
When NOT to Use Graphs
- Simple linear traversal → Use array/list
- Hierarchical data with clear parent-child → Use tree
- Key-value lookups → Use hash map
- Sequential processing → Use queue/stack
Key Takeaways for Interviews
- Adjacency list is the default representation (90% of problems)
- DFS = Stack, BFS = Queue (memorize both templates)
- Always track visited nodes to avoid infinite loops
- BFS finds shortest path in unweighted graphs
- DFS is better for paths, cycles, and backtracking
- Time complexity is for both traversals
- Space complexity:
- DFS: (height/depth)
- BFS: (max width)
Final Summary
Graphs are fundamental data structures that model relationships and connections. Mastering graphs requires:
- Understanding representations: Adjacency list (default) vs matrix
- Knowing traversal algorithms: DFS (depth-first) and BFS (breadth-first)
- Recognizing patterns: Connectivity, shortest path, cycles, topological sort
- Practice common problems: Start with LC 200, 133, 207
Remember: Most graph problems in interviews use adjacency lists and either DFS or BFS. Get comfortable with both traversal patterns, and you’ll handle 90% of graph questions confidently!
Next steps:
- Practice basic traversal problems
- Learn advanced algorithms: Dijkstra, Union-Find, Topological Sort
- Study graph-specific patterns: bipartite, strongly connected components
Happy coding! 🚀