Graph Data Structure - Complete Guide with DFS & BFS Traversal

What is a Graph?

A graph is a fundamental data structure consisting of:

Vertices (Nodes): The entities or points in the graph
Edges: Connections between vertices

Graphs are used to model relationships, networks, hierarchies, and paths. They’re essential for solving problems like:

Social network connections (Facebook friends)
Road networks and navigation (Google Maps)
Web page links (Google PageRank)
Dependency resolution (npm packages)
Recommendation systems (Netflix, Amazon)

Why Interviewers Love Graphs

Graph problems appear in 40-50% of FAANG interviews because they test:

Multiple algorithmic patterns: DFS, BFS, shortest path, cycle detection
Data structure design: Choosing the right representation
Complex problem-solving: Real-world applications
Code organization: Clean traversal logic

Graph Types

Directed vs Undirected

Undirected:          Directed:
   A --- B              A ──> B
   |     |              |     ↑
   C --- D              └──> C

Undirected: Edges have no direction (friendships on Facebook)
Directed: Edges have direction (Twitter follows)

Weighted vs Unweighted

Weighted:            Unweighted:
   A --5-- B            A --- B
   |       |            |     |
   3       2            C --- D
   |       |
   C --1-- D

Weighted: Edges have values (road distances)
Unweighted: All edges equal (connections exist or not)

Cyclic vs Acyclic

Cyclic:              Acyclic (DAG):
   A → B                A → B
   ↓   ↓                ↓   ↓
   C ← D                C → D

Cyclic: Contains cycles
Acyclic: No cycles (DAG = Directed Acyclic Graph)

Graph Representations

There are two main ways to represent graphs: Adjacency List and Adjacency Matrix.

Comparison Table

Feature	Adjacency List	Adjacency Matrix
Space	$O(V + E)$	$O(V^2)$
Check edge	$O(\text{degree})$	$O(1)$
Find neighbors	$O(\text{degree})$	$O(V)$
Add edge	$O(1)$	$O(1)$
Best for	Sparse graphs	Dense graphs, quick edge lookup
Interview use	✅ 90% of problems	Rare (specific use cases)

When to Use Each

Use Adjacency List when:

Graph is sparse (few edges): $E \ll V^2$
Need to iterate through neighbors frequently
This is the default for 90% of interview problems

Use Adjacency Matrix when:

Graph is dense: $E \approx V^2$
Need $O(1)$ edge existence checks
Working with weighted graphs with many operations
Problem explicitly requires it

Adjacency List Implementation

Most common representation in interviews!

Adjacency List Visualization

Graph with edges: (0,1), (0,2), (1,3), (1,4), (2,4)

Adjacency List:
  0: [1, 2]
  1: [0, 3, 4]
  2: [0, 4]
  3: [1]
  4: [1, 2]

Visual:
    0
   / \
  1   2
 / \ /
3   4

Adjacency Matrix Implementation

Less common, but useful for dense graphs.

Adjacency Matrix Visualization

Same graph as above:

Adjacency Matrix (5×5):
     0  1  2  3  4
  0 [0, 1, 1, 0, 0]
  1 [1, 0, 0, 1, 1]
  2 [1, 0, 0, 0, 1]
  3 [0, 1, 0, 0, 0]
  4 [0, 1, 1, 0, 0]

1 means edge exists, 0 means no edge
For weighted graphs, use weight instead of 1

Graph DFS (Depth-First Search)

DFS explores as far as possible along each branch before backtracking.

DFS Pattern

1. Start at a node
2. Mark it visited
3. Recursively visit an unvisited neighbor
4. Backtrack when no unvisited neighbors
5. Repeat until all reachable nodes visited

Interactive DFS Visualization

DFS Visualization

Step 0 of 12Initialize DFS from node 0

Current

Visited

In Path

Adjacency List

0: [1, 2, 3]

1: [0, 4]

2: [0, 4]

3: [0, 5]

4: [1, 2, 5]

5: [3, 4]

Stack

[0]

Visited

{}

DFS Implementation

DFS Traversal Example

Graph:       DFS from 0:
    0            1. Visit: 0
   /|\           2. Visit: 1 (first edge from 0)
  1 2 3          3. Visit: 4 (from 1)
  |\ /|          4. Try 5 (from 4)
  | 4 |          5. Visit: 5
  |  \|          6. Backtrack to 1 (no more neighbors)
   \-5           7. Backtrack to 0
                 8. Visit: 2 (second edge from 0)
                 9. Try 4 (from 2) - ALREADY VISITED! Skip
                 10. Backtrack to 0
                 11. Visit: 3 (third edge from 0)
                 12. Try 5 (from 3) - ALREADY VISITED! Skip

Order: 0 → 1 → 4 → 5 → 2 → 3

Key DFS behaviors shown:
✓ Node 0 has 3 edges - explores each branch completely
✓ Node 4 has 2 incoming edges (1→4, 2→4) - visited check prevents revisiting
✓ Node 5 has 2 incoming edges (4→5, 3→5) - visited check prevents revisiting
✓ Backtracking after exhausting each branch

Stack state:
0:  [0]
1:  [0, 1]      (go to 1)
2:  [0, 1, 3]   (go to 3)
3:  [0, 1]      (backtrack from 3)
4:  [0, 1, 4]   (go to 4)
5:  [0, 1, 4, 5] (go to 5)
6:  [0, 1, 4]   (backtrack from 5)
...

Graph BFS (Breadth-First Search)

BFS explores all neighbors at the current depth before moving deeper.

BFS Pattern

1. Start at a node, add to queue
2. While queue not empty:
   a. Dequeue a node
   b. Mark it visited
   c. Enqueue all unvisited neighbors
3. Repeat until queue empty

Interactive BFS Visualization

BFS Visualization

Step 0 of 12Initialize BFS from node 0

Current

Visited

In Path

Adjacency List

0: [1, 2]

1: [0, 3, 4]

2: [0, 4]

3: [1, 5]

4: [1, 2, 5]

5: [3, 4]

Queue

[0]

Visited

{0}

BFS Implementation

BFS Traversal Example

Graph:       BFS from 0:
    0            Level 0: 0
   / \           Level 1: 1, 2
  1   2          Level 2: 3, 4
 /|   |          Level 3: 5
3 4   5
  |   |
   \-/

Order: 0 → 1 → 2 → 3 → 4 → 5

Queue state:
0:  [0]
1:  [1, 2]      (dequeued 0, enqueued 1, 2)
2:  [2, 3, 4]   (dequeued 1, enqueued 3, 4)
3:  [3, 4]      (dequeued 2, 4 already in queue)
4:  [4, 5]      (dequeued 3, enqueued 5)
5:  [5]         (dequeued 4, 5 already in queue)
6:  []          (dequeued 5)

DFS vs BFS Comparison

Aspect	DFS	BFS
Data Structure	Stack (recursion or explicit)	Queue
Order	Go deep first	Go wide first
Space	$O(H)$ (height)	$O(W)$ (max width)
Time	$O(V + E)$	$O(V + E)$
Use for	Paths, cycles, backtracking	Shortest path, levels
Interview frequency	⭐⭐⭐⭐⭐ Very common	⭐⭐⭐⭐⭐ Very common

Common Graph Problems

Problem 1: Number of Islands (LC 200)

Problem: Given a 2D grid of ‘1’s (land) and ‘0’s (water), count islands.

Pattern: DFS/BFS on implicit graph (grid as graph)

Complexity: $O(M \times N)$ time, $O(M \times N)$ space (recursion stack)

Why is Time Complexity O(M × N)? Detailed Analysis

At first glance, this might seem confusing:

The outer double loop visits all cells: $O(M \times N)$
Each DFS call might visit many cells
Does this mean $O((M \times N)^2)$ ? No! ❌

Key Insight: Each cell is visited at most once!

The Correct Analysis

1. Outer Loop Cost

for (int i = 0; i < M; i++) {           // M iterations
    for (int j = 0; j < N; j++) {       // N iterations
        if (grid[i][j] == '1') {
            dfs(grid, i, j);
        }
    }
}

This loop visits all $M \times N$ cells: $O(M \times N)$

2. Total DFS Calls (Amortized Analysis)

The key is to analyze total DFS calls across the entire execution:

private void dfs(char[][] grid, int i, int j) {
    if (grid[i][j] == '0') {
        return;  // ← Already visited, return immediately
    }
    
    grid[i][j] = '0'; // ← Mark as visited (critical!)
    
    dfs(grid, i + 1, j);
    dfs(grid, i - 1, j);
    dfs(grid, i, j + 1);
    dfs(grid, i, j - 1);
}

Critical observations:

When we visit a cell, we immediately mark it as '0'
Once marked, it will never be visited again
Each cell is processed exactly once by DFS

3. Concrete Example

Consider a 3×4 grid (M=3, N=4):

[
  ['1', '1', '0', '0'],
  ['1', '0', '0', '1'],
  ['0', '0', '1', '1']
]

Execution trace:

Step 1: Scan reaches grid[0][0] = '1'
  → Call dfs(0,0)
  → DFS visits: (0,0) → (0,1) → (1,0)
  → All marked as '0'
  → Total: 3 cells visited

Step 2: Continue scan, grid[0][1] now '0', skip
Step 3: Continue scan, grid[1][0] now '0', skip
Step 4: Scan reaches grid[1][3] = '1'
  → Call dfs(1,3)
  → DFS visits: (1,3)
  → Marked as '0'
  → Total: 1 cell visited

Step 5: Scan reaches grid[2][2] = '1'
  → Call dfs(2,2)
  → DFS visits: (2,2) → (2,3)
  → Marked as '0'
  → Total: 2 cells visited

Statistics:

Outer loop iterations: 12 cells (3×4)
Total DFS visits: 3 + 1 + 2 = 6 cells (exactly the number of ‘1’s)
Each cell visited exactly once by DFS

4. Time Complexity Formula

Total Time = Outer Loop Time + All DFS Calls Time

Outer Loop:
  - Visit M×N cells
  - Each check: O(1)
  - Total: O(M×N)

All DFS Calls (across entire execution):
  - Each cell visited at most once
  - Each visit: O(1) work (mark + 4 recursive calls)
  - Total: O(M×N)

Total Time = O(M×N) + O(M×N) = O(M×N)

Why NOT $O((M \times N)^2)$ ?

❌ Wrong thinking: “Each DFS might visit $O(M \times N)$ cells, and we call DFS $M \times N$ times”

✅ Correct thinking: “Although we may initiate multiple DFS calls, all DFS calls combined visit each cell at most once”

Analogy: It’s like painting an array:

Outer loop scans all elements: $O(N)$
When we find an unpainted element, paint it and adjacent ones
Multiple “paint operations”, but each element painted once
Total time: $O(N)$ , not $O(N^2)$

Space Complexity: $O(M \times N)$ (worst case)

Worst case: entire grid is ‘1’ (one giant island)

[
  ['1', '1', '1'],
  ['1', '1', '1'],
  ['1', '1', '1']
]

DFS recursion depth can reach $O(M \times N)$ in a snake-like path from top-left to bottom-right.

Summary:

✅ Each cell marked at most once
✅ Marked cells never revisited
✅ Amortized analysis: all DFS calls visit each cell once
✅ Linear time, not quadratic!

Alternative Solution: BFS Approach

While DFS is intuitive for this problem, BFS also works perfectly and may be easier to understand for some people.

Key Differences:

DFS: Uses recursion (implicit stack), explores deep into one island
BFS: Uses explicit queue, explores island level by level

BFS Execution Example:

Grid:                   Execution:
['1', '1', '0']         1. Find '1' at (0,0), count=1
['1', '0', '0']         2. BFS from (0,0):
['0', '0', '1']            Queue: [(0,0)]
                           Process (0,0), mark as '0', add neighbors
                           Queue: [(0,1), (1,0)]
                           Process (0,1), mark as '0'
                           Queue: [(1,0)]
                           Process (1,0), mark as '0'
                           Queue: []  (BFS complete)
                        3. Continue scanning...
                        4. Find '1' at (2,2), count=2
                        5. BFS from (2,2):
                           Queue: [(2,2)]
                           Process (2,2), mark as '0'
                           Queue: []  (BFS complete)
                        
Result: 2 islands

DFS vs BFS for Number of Islands:

Aspect	DFS	BFS
Implementation	Simpler (recursion)	Slightly more code (queue)
Memory	$O(M \times N)$ worst case (recursion)	$O(\min(M, N))$ typical
Order	Go deep (zigzag)	Level by level (expanding)
Stack overflow risk	Yes (very large grids)	No
Interview preference	⭐⭐⭐⭐⭐ More common	⭐⭐⭐⭐ Also good

When to use BFS:

Large grids where DFS might cause stack overflow
Need to find shortest path (though not needed here)
Prefer iterative solutions over recursion

When to use DFS:

Simpler implementation preferred
Recursion is natural for the problem
Grid size is reasonable (< 1000×1000)

Both have same time complexity: $O(M \times N)$

Key BFS insight:

Mark cells as visited immediately when adding to queue (not when processing)
This prevents adding same cell multiple times to queue
Without this, cells could be added 4 times (from 4 neighbors)!

Problem 2: Clone Graph (LC 133)

Problem: Deep copy an undirected graph.

Pattern: DFS/BFS with hashmap for tracking clones

Complexity: $O(V + E)$ time, $O(V)$ space

Problem 3: Course Schedule (LC 207)

Problem: Detect if courses can be finished (cycle detection in directed graph).

Pattern: DFS cycle detection or topological sort

Complexity: $O(V + E)$ time, $O(V)$ space

Time & Space Complexity Summary

Representations

Operation	Adjacency List	Adjacency Matrix
Space	$O(V + E)$	$O(V^2)$
Check edge	$O(\text{degree})$	$O(1)$
Find neighbors	$O(\text{degree})$	$O(V)$
Add edge	$O(1)$	$O(1)$

Traversals

Algorithm	Time	Space	Best For
DFS	$O(V + E)$	$O(H)$	Paths, cycles, connectivity
BFS	$O(V + E)$	$O(W)$	Shortest path, levels

Where:

$V$ = number of vertices
$E$ = number of edges
$H$ = height of graph (DFS stack depth)
$W$ = maximum width (max nodes in BFS queue)

Interview Tips & Common Mistakes

✅ Best Practices

Always clarify the graph type:
- Directed or undirected?
- Weighted or unweighted?
- Can it have cycles?
Choose the right representation:
- Default to adjacency list (90% of cases)
- Use matrix only if explicitly beneficial
DFS vs BFS decision:
- Use BFS for shortest path, level-order
- Use DFS for paths, cycles, backtracking
Track visited nodes:
- Use Set<Integer> or boolean[]
- Mark visited before adding to queue (BFS)
- Mark visited when entering node (DFS)

❌ Common Mistakes

Forgetting to mark nodes as visited → infinite loops
Marking visited too late in BFS → nodes added multiple times
Not handling disconnected components → missing parts of graph
Using wrong representation → inefficient solution
Not checking for null/empty graph → runtime errors

Interview Strategy

When you see a graph problem:

1. Clarify: directed? weighted? cycles?
2. Choose representation (usually adjacency list)
3. Identify pattern:
   - Connectivity → DFS
   - Shortest path → BFS
   - Cycle detection → DFS with states
   - Topological sort → DFS or BFS (Kahn's)
4. Implement with visited tracking
5. Test edge cases: empty, single node, disconnected

Common Graph Patterns

Pattern 1: Graph Traversal

When: Need to visit all nodes, check connectivity Use: DFS or BFS Examples: Number of Islands, Connected Components

Pattern 2: Shortest Path

When: Find minimum distance/steps Use: BFS (unweighted), Dijkstra (weighted) Examples: Word Ladder, Shortest Bridge

Pattern 3: Cycle Detection

When: Check for circular dependencies Use: DFS with 3 states (unvisited, visiting, visited) Examples: Course Schedule, Detect Cycle

Pattern 4: Topological Sort

When: Order tasks with dependencies Use: DFS (post-order) or BFS (Kahn’s algorithm) Examples: Course Schedule II, Alien Dictionary

Pattern 5: Union-Find

When: Dynamic connectivity, grouping Use: Union-Find (Disjoint Set Union) Examples: Number of Provinces, Redundant Connection

When NOT to Use Graphs

Simple linear traversal → Use array/list
Hierarchical data with clear parent-child → Use tree
Key-value lookups → Use hash map
Sequential processing → Use queue/stack

Key Takeaways for Interviews

Adjacency list is the default representation (90% of problems)
DFS = Stack, BFS = Queue (memorize both templates)
Always track visited nodes to avoid infinite loops
BFS finds shortest path in unweighted graphs
DFS is better for paths, cycles, and backtracking
Time complexity is $O(V + E)$ for both traversals
Space complexity:
- DFS: $O(H)$ (height/depth)
- BFS: $O(W)$ (max width)

Final Summary

Graphs are fundamental data structures that model relationships and connections. Mastering graphs requires:

Understanding representations: Adjacency list (default) vs matrix
Knowing traversal algorithms: DFS (depth-first) and BFS (breadth-first)
Recognizing patterns: Connectivity, shortest path, cycles, topological sort
Practice common problems: Start with LC 200, 133, 207

Remember: Most graph problems in interviews use adjacency lists and either DFS or BFS. Get comfortable with both traversal patterns, and you’ll handle 90% of graph questions confidently!

Next steps:

Practice basic traversal problems
Learn advanced algorithms: Dijkstra, Union-Find, Topological Sort
Study graph-specific patterns: bipartite, strongly connected components

Happy coding! 🚀