English Jackey

Tri-Color Marking Algorithm: Complete Guide to Garbage Collection Marking

Master the tri-color marking algorithm with interactive visualizations. Learn how garbage collectors mark reachable objects using white, gray, and black colors with step-by-step animations.

#garbage-collection #algorithm #visualization #interactive-learning #memory-management

Tri-Color Marking Algorithm: Complete Guide to Garbage Collection Marking

The tri-color marking algorithm is a fundamental technique in modern garbage collectors (GC). It elegantly solves the problem of identifying which objects in memory are still reachable and which can be safely collected. This algorithm is used in many production systems, from the Go runtime to the V8 JavaScript engine.

Drawing from my 11+ years of experience at Microsoft, Booking.com, and Alibaba working with high-performance systems, I’ll walk you through this algorithm with interactive visualizations that make the abstract concept concrete.

Interactive Visualization

Let’s start by seeing the algorithm in action. Try playing with different examples to understand how objects transition through the three colors:

What is the Tri-Color Marking Algorithm?

The tri-color marking algorithm is a marking strategy used during the mark phase of garbage collection. It categorizes objects into three sets using colors:

  • White Set: Objects that haven’t been visited yet. After marking completes, white objects are considered garbage.
  • Gray Set: Objects that have been discovered but whose references haven’t been fully explored yet.
  • Black Set: Objects that have been fully explored—they are reachable, and all their references have been processed.

Key Invariant

The algorithm maintains a crucial invariant throughout execution:

No black object can point to a white object directly.

This invariant ensures correctness: if a black object (already scanned) pointed to a white object (not yet scanned), that white object might be incorrectly collected as garbage.

How the Algorithm Works

Phase 1: Initialization

All objects in the heap start in the white set. This represents the “innocent until proven guilty” approach—every object is potentially garbage until we prove it’s reachable.

// Pseudo-code for initialization
function initialize(heap) {
  for (object in heap) {
    object.color = WHITE;
  }
}

Phase 2: Mark Roots

The root set contains all directly accessible objects—typically:

  • Global variables
  • Stack variables
  • CPU registers

These roots are marked gray because we know they’re reachable, but we haven’t scanned their references yet.

function markRoots(roots) {
  for (root in roots) {
    root.color = GRAY;
    graySet.add(root);
  }
}

Phase 3: Process Gray Objects

This is the main loop of the algorithm. We repeatedly:

  1. Pick a gray object from the gray set
  2. For each reference in that object:
    • If the referenced object is white, mark it gray
  3. Mark the current object black (fully processed)
function processGrayObjects() {
  while (graySet.isNotEmpty()) {
    let current = graySet.removeAny();
    
    // Scan all references
    for (reference in current.references) {
      if (reference.color == WHITE) {
        reference.color = GRAY;
        graySet.add(reference);
      }
    }
    
    // Mark current as fully processed
    current.color = BLACK;
    blackSet.add(current);
  }
}

Phase 4: Sweep

After marking completes, all white objects are unreachable and can be collected:

function sweep(heap) {
  for (object in heap) {
    if (object.color == WHITE) {
      // This object is garbage
      freeMemory(object);
    } else {
      // Reset color for next GC cycle
      object.color = WHITE;
    }
  }
}

Step-by-Step Example

Let’s trace through a simple example with objects A → B → C and an isolated cycle D ↔ E:

Initial State:
Root → A → B → C
D ↔ E (isolated cycle)

All objects: WHITE

Step 1: Mark root A as GRAY

A: GRAY
B, C, D, E: WHITE
Gray Set: [A]

Step 2: Process A, discover B

A: BLACK (scanned)
B: GRAY (discovered)
C, D, E: WHITE
Gray Set: [B]

Step 3: Process B, discover C

A: BLACK
B: BLACK (scanned)
C: GRAY (discovered)
D, E: WHITE
Gray Set: [C]

Step 4: Process C

A, B, C: BLACK
D, E: WHITE (unreachable!)
Gray Set: [] (empty)

Result: D and E remain white—they form an isolated cycle unreachable from roots. They can be collected as garbage.

Why Use Three Colors?

You might wonder: why not just use two colors (reachable/unreachable)?

The three-color scheme provides several advantages:

1. Incremental Collection

The gray set represents the “wavefront” of the marking process. We can:

  • Process a few gray objects
  • Pause to let the program (mutator) run
  • Resume processing later

This enables incremental GC that doesn’t pause the program for long periods.

2. Concurrent Collection

With proper synchronization, the GC can run concurrently with the program:

  • The program might create new objects or change references
  • The gray set helps track what still needs scanning
  • The invariant prevents collecting reachable objects

3. Clear Progress Tracking

The three sets clearly show:

  • White: Not yet explored
  • Gray: Currently exploring (work in progress)
  • Black: Fully explored (work completed)

Handling Concurrent Mutations

When the GC runs concurrently with the program (mutator), the mutator might:

  1. Create new objects: New objects start white or gray
  2. Update references: This could violate our invariant!

The Problem

Consider this scenario:

Initial:
Black → Gray → White

Mutator updates:
Black → White (now points to white!)
Gray → null (removes original path)

Now a black object points to a white object, violating our invariant! The white object might be incorrectly collected.

Solutions

Two main approaches fix this:

Write Barrier (Dijkstra’s Algorithm)

When a black object gains a reference to a white object, mark the white object gray:

function writeBarrier(blackObject, newReference) {
  if (blackObject.color == BLACK && newReference.color == WHITE) {
    newReference.color = GRAY;
    graySet.add(newReference);
  }
}

Read Barrier (Yuasa’s Algorithm)

When a reference is removed, if it points to a white object, mark that object gray:

function readBarrier(object, oldReference) {
  if (oldReference.color == WHITE) {
    oldReference.color = GRAY;
    graySet.add(oldReference);
  }
}

Real-World Implementations

Go Runtime

Go’s garbage collector uses tri-color marking with:

  • Concurrent marking (GC runs alongside goroutines)
  • Write barriers to maintain invariants
  • Pacer that adjusts GC frequency based on allocation rate
// Simplified Go GC pseudocode
func gcMark() {
    // Stop the world briefly
    stopTheWorld()
    
    // Mark roots
    scanRoots()
    
    // Start the world, mark concurrently
    startTheWorld()
    
    // Process gray objects concurrently
    for hasGrayObjects() {
        processGrayObjects()
    }
}

V8 JavaScript Engine

V8 uses tri-color marking in its:

  • Major GC (Mark-Compact)
  • Incremental marking
  • Concurrent marking

Java HotSpot

Some Java collectors like G1 and ZGC use tri-color marking for concurrent and incremental collection.

Complexity Analysis

Time Complexity

  • Marking phase: O(R + E) where:
    • R = number of reachable objects
    • E = number of references between reachable objects

Each object is visited once, each reference is traversed once.

Space Complexity

  • Gray set: O(W) where W is the maximum “width” of the object graph
    • In the worst case (a long chain), W = 1
    • In the best case (a star pattern), W = R - 1

Advantages and Disadvantages

Advantages ✅

  1. Incremental: Can pause and resume marking
  2. Concurrent: Can run alongside the program
  3. Precise: Correctly identifies all reachable objects
  4. Scalable: Time proportional to reachable objects (not total heap)

Disadvantages ❌

  1. Overhead: Requires tracking three sets
  2. Barriers: Write/read barriers add execution cost
  3. Complexity: More complex than stop-the-world approaches
  4. Floating garbage: Concurrent marking might miss some garbage (collected next cycle)

Common Interview Questions

Q1: Why can’t we just use two colors?

Two colors (marked/unmarked) don’t provide enough information for incremental or concurrent collection. The gray color represents the “frontier” of objects we’re actively exploring, allowing us to pause and resume.

Q2: How does this handle cycles?

The algorithm naturally handles cycles. Even if objects reference each other, they’ll only become gray if reachable from roots. Isolated cycles remain white and are collected.

Q3: What’s the difference from reference counting?

Reference counting:

  • Tracks number of references to each object
  • Cannot handle cycles
  • Immediate collection

Tri-color marking:

  • Traces reachability from roots
  • Handles cycles naturally
  • Batch collection

Practice Problems

Problem 1: Trace the Algorithm

Given this object graph:

Root1 → A → C
Root2 → B → C
D → E → D (cycle)

Trace the algorithm step by step. Which objects are collected?

Solution
Initial: All WHITE

Step 1: Mark roots
A: GRAY, B: GRAY
Others: WHITE

Step 2: Process A
A: BLACK, C: GRAY (discovered from A)
B: GRAY, D, E: WHITE

Step 3: Process B
A, B: BLACK
C: already GRAY
D, E: WHITE

Step 4: Process C
A, B, C: BLACK
D, E: WHITE

Result: D and E are garbage (isolated cycle)

Problem 2: Design a Write Barrier

Implement a write barrier for the tri-color marking algorithm that maintains the invariant “no black object points to a white object.”

Solution
function writeBarrier(
  sourceObj: Object,
  field: string,
  targetObj: Object
) {
  // Perform the write
  sourceObj[field] = targetObj;
  
  // If source is black and target is white,
  // mark target gray to maintain invariant
  if (sourceObj.color === 'BLACK' && targetObj.color === 'WHITE') {
    targetObj.color = 'GRAY';
    graySet.add(targetObj);
  }
}

Multi-Language Implementations

JavaScript/TypeScript

class TriColorMarking {
  private whiteSet = new Set<number>();
  private graySet = new Set<number>();
  private blackSet = new Set<number>();
  
  mark(roots: number[], graph: Map<number, number[]>): Set<number> {
    // Initialize all as white
    for (const node of graph.keys()) {
      this.whiteSet.add(node);
    }
    
    // Mark roots gray
    for (const root of roots) {
      this.whiteSet.delete(root);
      this.graySet.add(root);
    }
    
    // Process gray objects
    while (this.graySet.size > 0) {
      const current = this.graySet.values().next().value;
      this.graySet.delete(current);
      
      // Mark references
      const refs = graph.get(current) || [];
      for (const ref of refs) {
        if (this.whiteSet.has(ref)) {
          this.whiteSet.delete(ref);
          this.graySet.add(ref);
        }
      }
      
      // Mark current black
      this.blackSet.add(current);
    }
    
    // Return garbage (remaining white objects)
    return this.whiteSet;
  }
}

Python

class TriColorMarking:
    def __init__(self):
        self.white_set = set()
        self.gray_set = set()
        self.black_set = set()
    
    def mark(self, roots, graph):
        """
        Mark reachable objects using tri-color algorithm.
        Returns set of garbage (white) objects.
        """
        # Initialize all as white
        self.white_set = set(graph.keys())
        
        # Mark roots gray
        for root in roots:
            if root in self.white_set:
                self.white_set.remove(root)
                self.gray_set.add(root)
        
        # Process gray objects
        while self.gray_set:
            current = self.gray_set.pop()
            
            # Mark references
            for ref in graph.get(current, []):
                if ref in self.white_set:
                    self.white_set.remove(ref)
                    self.gray_set.add(ref)
            
            # Mark current black
            self.black_set.add(current)
        
        # Return garbage
        return self.white_set

Java

public class TriColorMarking {
    private Set<Integer> whiteSet = new HashSet<>();
    private Set<Integer> graySet = new HashSet<>();
    private Set<Integer> blackSet = new HashSet<>();
    
    public Set<Integer> mark(
        List<Integer> roots,
        Map<Integer, List<Integer>> graph
    ) {
        // Initialize all as white
        whiteSet.addAll(graph.keySet());
        
        // Mark roots gray
        for (int root : roots) {
            if (whiteSet.contains(root)) {
                whiteSet.remove(root);
                graySet.add(root);
            }
        }
        
        // Process gray objects
        while (!graySet.isEmpty()) {
            int current = graySet.iterator().next();
            graySet.remove(current);
            
            // Mark references
            List<Integer> refs = graph.getOrDefault(
                current,
                Collections.emptyList()
            );
            for (int ref : refs) {
                if (whiteSet.contains(ref)) {
                    whiteSet.remove(ref);
                    graySet.add(ref);
                }
            }
            
            // Mark current black
            blackSet.add(current);
        }
        
        // Return garbage
        return whiteSet;
    }
}

Performance Optimization Tips

1. Gray Set Implementation

Use a work-stealing deque for the gray set in parallel collectors:

class WorkStealingDeque<T> {
  private items: T[] = [];
  
  pushBottom(item: T) {
    this.items.push(item);
  }
  
  popBottom(): T | undefined {
    return this.items.pop();
  }
  
  popTop(): T | undefined {
    return this.items.shift();
  }
}

2. Prefetching

Prefetch object data when processing gray objects to reduce cache misses:

// C++ example
while (!graySet.empty()) {
    auto current = graySet.front();
    graySet.pop();
    
    // Prefetch next objects
    for (auto& ref : current->references) {
        __builtin_prefetch(ref);
    }
    
    // Process current
    processObject(current);
}

3. Parallel Marking

Multiple threads can mark concurrently using thread-local gray sets:

function parallelMark(roots: number[], graph: Map<number, number[]>) {
  const workers = [];
  const localGraySets = [];
  
  // Distribute roots among workers
  for (let i = 0; i < NUM_WORKERS; i++) {
    localGraySets[i] = new Set();
    workers[i] = new Worker(() => {
      while (hasWork()) {
        processLocalGraySet(localGraySets[i]);
      }
    });
  }
}

Conclusion

The tri-color marking algorithm is a elegant and powerful technique for garbage collection. Its ability to run incrementally and concurrently makes it ideal for modern systems that can’t tolerate long GC pauses.

Key takeaways:

  • Three colors track marking progress: white (unvisited), gray (frontier), black (completed)
  • Invariant: No black → white pointers prevents collecting reachable objects
  • Barriers (write/read) maintain correctness during concurrent execution
  • Widely used in Go, V8, Java, and many other production systems

The interactive visualization at the top of this article lets you explore different scenarios. Try creating your own object graphs to deepen your understanding!

Further Reading


Want to explore more algorithm visualizations? Check out our other interactive tutorials on topological sort, binary search, and graph traversal!