GC
Common Garbage Collection Strategies
Reference Counting Algorithm
The reference counting algorithm is one of the simplest garbage collection algorithms. Its basic idea is to add a reference count field to an object. Whenever a reference to the object is created, the count is incremented. When a reference becomes invalid, the count is decremented. When the count reaches zero, it indicates that the object is no longer in use and can be reclaimed.
Pros and Cons
Pros:
- No need for traversal
- It doesn't require traversing from root nodes, making it relatively easy to find references.
- Immediate garbage collection
- Each object always knows its reference count. When the count reaches zero, it immediately connects itself to a free list for later reclamation.
- Minimizes program pause time
- Garbage collection is triggered when the mutator updates reference counts, avoiding long pauses due to memory exhaustion.
Cons:
- Cannot handle circular references
- Requires modifying counters for every reference count change, resulting in additional overhead
- Needs extra space to store counters
Tracing Garbage Collection Algorithms
Tracing garbage collection algorithms come in three main strategies:
- Mark-Sweep
- Mark-Compact
- Mark-Copying
Note
All three strategies involve a "stop the world" pause during execution.
Mark-Sweep Algorithm
How it works:
- Start from root objects and recursively traverse all reachable objects, marking them as live objects.
- Scan all objects in the heap, reclaiming unmarked objects.
Pros and Cons
Pros:
- Can handle circular references
- No need for extra space to store counters
Cons:
- Generates fragmentation during the sweep phase, leading to memory fragmentation. This may cause subsequent garbage collection due to inability to find contiguous memory for object allocation.
- Unstable execution efficiency
Mark-Copying Algorithm
How it works:
- Start from root objects and recursively traverse all reachable objects, marking them as live objects.
- Divide the heap into two equal areas: the "from" space and the "to" space.
- During program execution, only place objects in the "from" space. When the "from" space is full, perform garbage collection. Traverse all objects in the "from" space, identify live objects, move them to the "to" space, and then clear the "from" space. Finally, swap the roles of the two areas (making the "to" space the new "from" space and vice versa).
Pros and Cons
Pros:
- Solves memory fragmentation
- Each garbage collection cycle moves live objects to the "to" space, ensuring contiguous storage.
- Relatively high execution efficiency
- Only copying live objects and batch clearing unlive objects results in shorter time requirements and higher throughput.
- Enables fast memory allocation
- Contiguous memory allows efficient pointer-based memory allocation compared to other algorithms that use free lists.
Cons:
- Low space utilization
- Only half of the memory space can be used for objects.
- Inefficient recursion
- Recursive traversal and copying of all reachable objects may be less efficient than iteration, and additional stack overhead may lead to memory overflow.
Mark-Compact Algorithm
How it works:
- Start from root objects and recursively traverse all reachable objects, marking them as live objects.
- Move live objects to one end of the heap and then reclaim unmarked objects.
Pros and Cons
Pros:
- High space utilization
- Compared to the Mark-Copying algorithm, it achieves better space utilization without wasting half of the memory.
Cons:
- Lower execution efficiency
- Moving live objects to one end of the heap requires three traversal passes, taking more time. For a large number of objects, the pause time may be longer than the other two strategies.
Comparison of Three Garbage Collection Strategies
Throughput: Mark-Compact > Mark-Sweep > Copying Memory Utilization: Mark-Compact > Mark-Sweep > Copying Memory Fragmentation: Mark-Compact = Copying > Mark-Sweep
Garbage Collection in Golang
Tri-Color Marking Algorithm
The tri-color marking algorithm improves upon the mark-sweep algorithm by breaking down its two phases (marking and sweeping) into three phases (marking, termination of marking, and sweeping), reducing the stop-the-world (STW) time.
Three Object Colors
Color | Object State | Description |
---|---|---|
White | Unvisited | Object has not been visited, may be garbage |
Gray | In Progress | Object has been visited, but not its children |
Black | Completed | Object has been visited, along with its children |
Ultimately, white objects are reclaimed.
How It Works
- At the start of garbage collection, root objects are marked as gray.
- From the gray objects, one object is chosen and marked as black, and its children are marked as gray.
- All white objects pointed to by black objects are marked as gray.
- Repeat steps 2 and 3 until there are no gray objects left.
- Clean up all white objects.
What If There's No STW?
In practice, even with the tri-color marking algorithm, the STW time can still be relatively long. However, if we avoid STW, the program might continue running during the marking and sweeping phases. This could lead to changes in object state, causing the garbage collector to incorrectly mark objects, resulting in collection errors.
As shown in the diagram above, if we traverse objects A and D, and before reaching B, D adds a reference to C while B removes its reference to C, then after garbage collection, C could become white and be mistakenly collected.
Barrier Techniques
To address the issues mentioned above, Golang
introduced barrier techniques that allow the garbage collector to be notified when object states change.
Important
To ensure correctness in concurrent or incremental marking algorithms, we need to achieve one of the following tri-color invariants:
- Strong Tri-Color Invariant: During the marking phase, black objects do not point to white objects.
- Weak Tri-Color Invariant: During the marking phase, white objects (G) pointed to by black objects must have a path through one or more white objects to reach another white object (G).
As shown in the diagram above, if A
adds a reference to D
, we need to add another reference from E
to D
to maintain the weak tri-color invariant.
Barrier Techniques Used in Golang
- Write Barrier (Insertion Barrier)
- Delete Barrier
Write Barrier (Insertion Barrier)
In Golang
, when an object A
adds a reference to another object B
, a reference to B
is inserted into A
's reference list, and B
is marked as gray.
[!warning] The write barrier only takes effect for heap objects, not stack objects, primarily due to performance considerations.
As shown in the initial conditions (Figure 1), A
belongs to stack data, and F
belongs to heap data. In Figure 2, references are added simultaneously from A
to D
and from F
to H
. Due to the write barrier, H
becomes gray, while D
remains white since it is not in the heap. After scanning (Figure 4), H
is marked as black, and D
is marked as white. At this point, an STW phase is initiated to rescan stack objects and mark D
as black.
Delete Barrier
In Golang
, when an object A
removes a reference to another object B
, a reference to B
is removed from A
's reference list. If B
is white, it is marked as gray.
The reason for marking white objects as gray during deletion is that white objects may have subsequent references. Not marking them as gray could result in subsequent objects not being scanned.
Hybrid Write Barrier
The write barrier and delete barrier have the following drawbacks:
- The write barrier still requires an additional STW phase after scanning to re-scan stack objects.
- The delete barrier has lower precision; at the start of collection, an STW phase is needed to rescan stack objects and record an initial snapshot to protect all live objects at that moment.
To address these issues, Golang
introduced the hybrid write barrier, which combines aspects of both insertion and deletion barriers. It notifies the garbage collector when object states change.
How It Works
- At the start of garbage collection, all stack objects are scanned and marked as black (without a second scan).
- Any objects created on the stack during garbage collection are immediately marked as black, avoiding the need for a second scan.
- Any deleted objects during garbage collection are marked as gray.
- Any objects created during garbage collection are marked as gray.