English Walking in Code

Graph Data Structure - Complete Guide with DFS & BFS Traversal

Master graph data structures with interactive visualizations. Learn adjacency list vs matrix representations, implement DFS and BFS traversals in Java, Python, JavaScript. Complete guide for coding interviews at FAANG and top tech companies.

#algorithm #graph #dfs #bfs #data-structure #visualization #interview #leetcode

What is a Graph?

A graph is a fundamental data structure consisting of:

  • Vertices (Nodes): The entities or points in the graph
  • Edges: Connections between vertices

Graphs are used to model relationships, networks, hierarchies, and paths. They’re essential for solving problems like:

  • Social network connections (Facebook friends)
  • Road networks and navigation (Google Maps)
  • Web page links (Google PageRank)
  • Dependency resolution (npm packages)
  • Recommendation systems (Netflix, Amazon)

Why Interviewers Love Graphs

Graph problems appear in 40-50% of FAANG interviews because they test:

  1. Multiple algorithmic patterns: DFS, BFS, shortest path, cycle detection
  2. Data structure design: Choosing the right representation
  3. Complex problem-solving: Real-world applications
  4. Code organization: Clean traversal logic

Graph Types

Directed vs Undirected

Undirected:          Directed:
   A --- B              A ──> B
   |     |              |     ↑
   C --- D              └──> C
  • Undirected: Edges have no direction (friendships on Facebook)
  • Directed: Edges have direction (Twitter follows)

Weighted vs Unweighted

Weighted:            Unweighted:
   A --5-- B            A --- B
   |       |            |     |
   3       2            C --- D
   |       |
   C --1-- D
  • Weighted: Edges have values (road distances)
  • Unweighted: All edges equal (connections exist or not)

Cyclic vs Acyclic

Cyclic:              Acyclic (DAG):
   A → B                A → B
   ↓   ↓                ↓   ↓
   C ← D                C → D
  • Cyclic: Contains cycles
  • Acyclic: No cycles (DAG = Directed Acyclic Graph)

Graph Representations

There are two main ways to represent graphs: Adjacency List and Adjacency Matrix.

Comparison Table

FeatureAdjacency ListAdjacency Matrix
SpaceO(V+E)O(V + E)O(V2)O(V^2)
Check edgeO(degree)O(\text{degree})O(1)O(1)
Find neighborsO(degree)O(\text{degree})O(V)O(V)
Add edgeO(1)O(1)O(1)O(1)
Best forSparse graphsDense graphs, quick edge lookup
Interview use90% of problemsRare (specific use cases)

When to Use Each

Use Adjacency List when:

  • Graph is sparse (few edges): EV2E \ll V^2
  • Need to iterate through neighbors frequently
  • This is the default for 90% of interview problems

Use Adjacency Matrix when:

  • Graph is dense: EV2E \approx V^2
  • Need O(1)O(1) edge existence checks
  • Working with weighted graphs with many operations
  • Problem explicitly requires it

Adjacency List Implementation

Most common representation in interviews!

Loading...

Adjacency List Visualization

Graph with edges: (0,1), (0,2), (1,3), (1,4), (2,4)

Adjacency List:
  0: [1, 2]
  1: [0, 3, 4]
  2: [0, 4]
  3: [1]
  4: [1, 2]

Visual:
    0
   / \
  1   2
 / \ /
3   4

Adjacency Matrix Implementation

Less common, but useful for dense graphs.

Loading...

Adjacency Matrix Visualization

Same graph as above:

Adjacency Matrix (5×5):
     0  1  2  3  4
  0 [0, 1, 1, 0, 0]
  1 [1, 0, 0, 1, 1]
  2 [1, 0, 0, 0, 1]
  3 [0, 1, 0, 0, 0]
  4 [0, 1, 1, 0, 0]

1 means edge exists, 0 means no edge
For weighted graphs, use weight instead of 1

DFS explores as far as possible along each branch before backtracking.

DFS Pattern

1. Start at a node
2. Mark it visited
3. Recursively visit an unvisited neighbor
4. Backtrack when no unvisited neighbors
5. Repeat until all reachable nodes visited

Interactive DFS Visualization

DFS Visualization

Step 0 of 12Initialize DFS from node 0
012345
Current
Visited
In Path

Adjacency List

0: [1, 2, 3]
1: [0, 4]
2: [0, 4]
3: [0, 5]
4: [1, 2, 5]
5: [3, 4]

Stack

[0]

Visited

{}

DFS Implementation

Loading...

DFS Traversal Example

Graph:       DFS from 0:
    0            1. Visit: 0
   /|\           2. Visit: 1 (first edge from 0)
  1 2 3          3. Visit: 4 (from 1)
  |\ /|          4. Try 5 (from 4)
  | 4 |          5. Visit: 5
  |  \|          6. Backtrack to 1 (no more neighbors)
   \-5           7. Backtrack to 0
                 8. Visit: 2 (second edge from 0)
                 9. Try 4 (from 2) - ALREADY VISITED! Skip
                 10. Backtrack to 0
                 11. Visit: 3 (third edge from 0)
                 12. Try 5 (from 3) - ALREADY VISITED! Skip

Order: 0 → 1 → 4 → 5 → 2 → 3

Key DFS behaviors shown:
✓ Node 0 has 3 edges - explores each branch completely
✓ Node 4 has 2 incoming edges (1→4, 2→4) - visited check prevents revisiting
✓ Node 5 has 2 incoming edges (4→5, 3→5) - visited check prevents revisiting
✓ Backtracking after exhausting each branch

Stack state:
0:  [0]
1:  [0, 1]      (go to 1)
2:  [0, 1, 3]   (go to 3)
3:  [0, 1]      (backtrack from 3)
4:  [0, 1, 4]   (go to 4)
5:  [0, 1, 4, 5] (go to 5)
6:  [0, 1, 4]   (backtrack from 5)
...

BFS explores all neighbors at the current depth before moving deeper.

BFS Pattern

1. Start at a node, add to queue
2. While queue not empty:
   a. Dequeue a node
   b. Mark it visited
   c. Enqueue all unvisited neighbors
3. Repeat until queue empty

Interactive BFS Visualization

BFS Visualization

Step 0 of 12Initialize BFS from node 0
012345
Current
Visited
In Path

Adjacency List

0: [1, 2]
1: [0, 3, 4]
2: [0, 4]
3: [1, 5]
4: [1, 2, 5]
5: [3, 4]

Queue

[0]

Visited

{0}

BFS Implementation

Loading...

BFS Traversal Example

Graph:       BFS from 0:
    0            Level 0: 0
   / \           Level 1: 1, 2
  1   2          Level 2: 3, 4
 /|   |          Level 3: 5
3 4   5
  |   |
   \-/

Order: 0 → 1 → 2 → 3 → 4 → 5

Queue state:
0:  [0]
1:  [1, 2]      (dequeued 0, enqueued 1, 2)
2:  [2, 3, 4]   (dequeued 1, enqueued 3, 4)
3:  [3, 4]      (dequeued 2, 4 already in queue)
4:  [4, 5]      (dequeued 3, enqueued 5)
5:  [5]         (dequeued 4, 5 already in queue)
6:  []          (dequeued 5)

DFS vs BFS Comparison

AspectDFSBFS
Data StructureStack (recursion or explicit)Queue
OrderGo deep firstGo wide first
SpaceO(H)O(H) (height)O(W)O(W) (max width)
TimeO(V+E)O(V + E)O(V+E)O(V + E)
Use forPaths, cycles, backtrackingShortest path, levels
Interview frequency⭐⭐⭐⭐⭐ Very common⭐⭐⭐⭐⭐ Very common

Common Graph Problems

Problem 1: Number of Islands (LC 200)

Problem: Given a 2D grid of ‘1’s (land) and ‘0’s (water), count islands.

Pattern: DFS/BFS on implicit graph (grid as graph)

Loading...

Complexity: O(M×N)O(M \times N) time, O(M×N)O(M \times N) space (recursion stack)

Why is Time Complexity O(M × N)? Detailed Analysis

At first glance, this might seem confusing:

  • The outer double loop visits all cells: O(M×N)O(M \times N)
  • Each DFS call might visit many cells
  • Does this mean O((M×N)2)O((M \times N)^2)? No!

Key Insight: Each cell is visited at most once!

The Correct Analysis

1. Outer Loop Cost

for (int i = 0; i < M; i++) {           // M iterations
    for (int j = 0; j < N; j++) {       // N iterations
        if (grid[i][j] == '1') {
            dfs(grid, i, j);
        }
    }
}

This loop visits all M×NM \times N cells: O(M×N)O(M \times N)

2. Total DFS Calls (Amortized Analysis)

The key is to analyze total DFS calls across the entire execution:

private void dfs(char[][] grid, int i, int j) {
    if (grid[i][j] == '0') {
        return;  // ← Already visited, return immediately
    }
    
    grid[i][j] = '0'; // ← Mark as visited (critical!)
    
    dfs(grid, i + 1, j);
    dfs(grid, i - 1, j);
    dfs(grid, i, j + 1);
    dfs(grid, i, j - 1);
}

Critical observations:

  • When we visit a cell, we immediately mark it as '0'
  • Once marked, it will never be visited again
  • Each cell is processed exactly once by DFS

3. Concrete Example

Consider a 3×4 grid (M=3, N=4):

[
  ['1', '1', '0', '0'],
  ['1', '0', '0', '1'],
  ['0', '0', '1', '1']
]

Execution trace:

Step 1: Scan reaches grid[0][0] = '1'
  → Call dfs(0,0)
  → DFS visits: (0,0) → (0,1) → (1,0)
  → All marked as '0'
  → Total: 3 cells visited

Step 2: Continue scan, grid[0][1] now '0', skip
Step 3: Continue scan, grid[1][0] now '0', skip
Step 4: Scan reaches grid[1][3] = '1'
  → Call dfs(1,3)
  → DFS visits: (1,3)
  → Marked as '0'
  → Total: 1 cell visited

Step 5: Scan reaches grid[2][2] = '1'
  → Call dfs(2,2)
  → DFS visits: (2,2) → (2,3)
  → Marked as '0'
  → Total: 2 cells visited

Statistics:

  • Outer loop iterations: 12 cells (3×4)
  • Total DFS visits: 3 + 1 + 2 = 6 cells (exactly the number of ‘1’s)
  • Each cell visited exactly once by DFS

4. Time Complexity Formula

Total Time = Outer Loop Time + All DFS Calls Time

Outer Loop:
  - Visit M×N cells
  - Each check: O(1)
  - Total: O(M×N)

All DFS Calls (across entire execution):
  - Each cell visited at most once
  - Each visit: O(1) work (mark + 4 recursive calls)
  - Total: O(M×N)

Total Time = O(M×N) + O(M×N) = O(M×N)

Why NOT O((M×N)2)O((M \times N)^2)?

Wrong thinking: “Each DFS might visit O(M×N)O(M \times N) cells, and we call DFS M×NM \times N times”

Correct thinking: “Although we may initiate multiple DFS calls, all DFS calls combined visit each cell at most once

Analogy: It’s like painting an array:

  • Outer loop scans all elements: O(N)O(N)
  • When we find an unpainted element, paint it and adjacent ones
  • Multiple “paint operations”, but each element painted once
  • Total time: O(N)O(N), not O(N2)O(N^2)

Space Complexity: O(M×N)O(M \times N) (worst case)

Worst case: entire grid is ‘1’ (one giant island)

[
  ['1', '1', '1'],
  ['1', '1', '1'],
  ['1', '1', '1']
]

DFS recursion depth can reach O(M×N)O(M \times N) in a snake-like path from top-left to bottom-right.

Summary:

  • ✅ Each cell marked at most once
  • ✅ Marked cells never revisited
  • ✅ Amortized analysis: all DFS calls visit each cell once
  • ✅ Linear time, not quadratic!

Alternative Solution: BFS Approach

While DFS is intuitive for this problem, BFS also works perfectly and may be easier to understand for some people.

Key Differences:

  • DFS: Uses recursion (implicit stack), explores deep into one island
  • BFS: Uses explicit queue, explores island level by level
Loading...

BFS Execution Example:

Grid:                   Execution:
['1', '1', '0']         1. Find '1' at (0,0), count=1
['1', '0', '0']         2. BFS from (0,0):
['0', '0', '1']            Queue: [(0,0)]
                           Process (0,0), mark as '0', add neighbors
                           Queue: [(0,1), (1,0)]
                           Process (0,1), mark as '0'
                           Queue: [(1,0)]
                           Process (1,0), mark as '0'
                           Queue: []  (BFS complete)
                        3. Continue scanning...
                        4. Find '1' at (2,2), count=2
                        5. BFS from (2,2):
                           Queue: [(2,2)]
                           Process (2,2), mark as '0'
                           Queue: []  (BFS complete)
                        
Result: 2 islands

DFS vs BFS for Number of Islands:

AspectDFSBFS
ImplementationSimpler (recursion)Slightly more code (queue)
MemoryO(M×N)O(M \times N) worst case (recursion)O(min(M,N))O(\min(M, N)) typical
OrderGo deep (zigzag)Level by level (expanding)
Stack overflow riskYes (very large grids)No
Interview preference⭐⭐⭐⭐⭐ More common⭐⭐⭐⭐ Also good

When to use BFS:

  • Large grids where DFS might cause stack overflow
  • Need to find shortest path (though not needed here)
  • Prefer iterative solutions over recursion

When to use DFS:

  • Simpler implementation preferred
  • Recursion is natural for the problem
  • Grid size is reasonable (< 1000×1000)

Both have same time complexity: O(M×N)O(M \times N)

Key BFS insight:

  • Mark cells as visited immediately when adding to queue (not when processing)
  • This prevents adding same cell multiple times to queue
  • Without this, cells could be added 4 times (from 4 neighbors)!

Problem 2: Clone Graph (LC 133)

Problem: Deep copy an undirected graph.

Pattern: DFS/BFS with hashmap for tracking clones

Loading...

Complexity: O(V+E)O(V + E) time, O(V)O(V) space

Problem 3: Course Schedule (LC 207)

Problem: Detect if courses can be finished (cycle detection in directed graph).

Pattern: DFS cycle detection or topological sort

Loading...

Complexity: O(V+E)O(V + E) time, O(V)O(V) space

Time & Space Complexity Summary

Representations

OperationAdjacency ListAdjacency Matrix
SpaceO(V+E)O(V + E)O(V2)O(V^2)
Check edgeO(degree)O(\text{degree})O(1)O(1)
Find neighborsO(degree)O(\text{degree})O(V)O(V)
Add edgeO(1)O(1)O(1)O(1)

Traversals

AlgorithmTimeSpaceBest For
DFSO(V+E)O(V + E)O(H)O(H)Paths, cycles, connectivity
BFSO(V+E)O(V + E)O(W)O(W)Shortest path, levels

Where:

  • VV = number of vertices
  • EE = number of edges
  • HH = height of graph (DFS stack depth)
  • WW = maximum width (max nodes in BFS queue)

Interview Tips & Common Mistakes

✅ Best Practices

  1. Always clarify the graph type:

    • Directed or undirected?
    • Weighted or unweighted?
    • Can it have cycles?
  2. Choose the right representation:

    • Default to adjacency list (90% of cases)
    • Use matrix only if explicitly beneficial
  3. DFS vs BFS decision:

    • Use BFS for shortest path, level-order
    • Use DFS for paths, cycles, backtracking
  4. Track visited nodes:

    • Use Set<Integer> or boolean[]
    • Mark visited before adding to queue (BFS)
    • Mark visited when entering node (DFS)

❌ Common Mistakes

  1. Forgetting to mark nodes as visited → infinite loops
  2. Marking visited too late in BFS → nodes added multiple times
  3. Not handling disconnected components → missing parts of graph
  4. Using wrong representation → inefficient solution
  5. Not checking for null/empty graph → runtime errors

Interview Strategy

When you see a graph problem:

1. Clarify: directed? weighted? cycles?
2. Choose representation (usually adjacency list)
3. Identify pattern:
   - Connectivity → DFS
   - Shortest path → BFS
   - Cycle detection → DFS with states
   - Topological sort → DFS or BFS (Kahn's)
4. Implement with visited tracking
5. Test edge cases: empty, single node, disconnected

Common Graph Patterns

Pattern 1: Graph Traversal

When: Need to visit all nodes, check connectivity Use: DFS or BFS Examples: Number of Islands, Connected Components

Pattern 2: Shortest Path

When: Find minimum distance/steps Use: BFS (unweighted), Dijkstra (weighted) Examples: Word Ladder, Shortest Bridge

Pattern 3: Cycle Detection

When: Check for circular dependencies Use: DFS with 3 states (unvisited, visiting, visited) Examples: Course Schedule, Detect Cycle

Pattern 4: Topological Sort

When: Order tasks with dependencies Use: DFS (post-order) or BFS (Kahn’s algorithm) Examples: Course Schedule II, Alien Dictionary

Pattern 5: Union-Find

When: Dynamic connectivity, grouping Use: Union-Find (Disjoint Set Union) Examples: Number of Provinces, Redundant Connection

When NOT to Use Graphs

  • Simple linear traversal → Use array/list
  • Hierarchical data with clear parent-child → Use tree
  • Key-value lookups → Use hash map
  • Sequential processing → Use queue/stack

Key Takeaways for Interviews

  1. Adjacency list is the default representation (90% of problems)
  2. DFS = Stack, BFS = Queue (memorize both templates)
  3. Always track visited nodes to avoid infinite loops
  4. BFS finds shortest path in unweighted graphs
  5. DFS is better for paths, cycles, and backtracking
  6. Time complexity is O(V+E)O(V + E) for both traversals
  7. Space complexity:
    • DFS: O(H)O(H) (height/depth)
    • BFS: O(W)O(W) (max width)

Final Summary

Graphs are fundamental data structures that model relationships and connections. Mastering graphs requires:

  1. Understanding representations: Adjacency list (default) vs matrix
  2. Knowing traversal algorithms: DFS (depth-first) and BFS (breadth-first)
  3. Recognizing patterns: Connectivity, shortest path, cycles, topological sort
  4. Practice common problems: Start with LC 200, 133, 207

Remember: Most graph problems in interviews use adjacency lists and either DFS or BFS. Get comfortable with both traversal patterns, and you’ll handle 90% of graph questions confidently!

Next steps:

  • Practice basic traversal problems
  • Learn advanced algorithms: Dijkstra, Union-Find, Topological Sort
  • Study graph-specific patterns: bipartite, strongly connected components

Happy coding! 🚀