.NET Garbage Collection, Here We Go!

Written by anand-gupta | Published 2020/07/20


This post discusses the aggressive, hungry nature of the garbage collection process in .NET, along with a concept often referred to as eager root collection.
I deliberately say garbage collection process rather than garbage collector, because the behavior discussed here is actually the work of the JIT compiler, not the garbage collector. This distinction matters because it runs against the popular notion of the garbage collector's role. Still, this JIT behavior contributes to the garbage collection process by assisting the garbage collector, as we will see.
Consider the following code:
using System;
using System.Runtime.CompilerServices;

class Program
{
    static void Main(string[] args)
    {
        Tiger tiger = new Tiger();
        tiger.Run();
        var age = tiger.GetAge(2020);
        Console.WriteLine($"Tiger's age is {age}");
        Console.ReadLine();
    }
}

public class Tiger
{
    public int YearOfBirth { get { return 2010; } }

    ~Tiger()
    {
        Console.WriteLine("Tiger Dead");
    }

    public void Run()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Console.WriteLine("Tiger is running");
    }

    public int GetAge(int currentYear)
    {
        return currentYear - YearOfBirth;
    }
}
Can you guess what the output will be?
Well, the answer is: it depends!
If we run this in Debug mode (non-optimized code), the output is:
Tiger is running
Tiger's age is 10
If we run this in Release mode (optimized code), the output is:
Tiger Dead
Tiger is running
Tiger's age is 10
Surprised?
How can the tiger be running after it is dead?
This is where JIT optimization comes in. The JIT is a very smart compiler that compiles MSIL code to the native machine code executed by the processor. It knows exactly which objects are “live” at every point in the code, and it uses this knowledge to aggressively identify references that are no longer in use, which in turn tells the garbage collector which objects in a method are eligible for collection. The JIT records the live roots held in stack slots and registers in a structure called GCInfo and makes it available to the GC whenever a collection runs.
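The idea of a reference being live only up to its last use can be observed with a small sketch like the one below. It is an illustration, not part of the original sample: the class name LivenessDemo and the local names are hypothetical, and the AggressiveOptimization attribute assumes .NET Core 3.0 or later, where a method may otherwise run unoptimized code on its first call because of tiered compilation. The result is not guaranteed; it depends on the runtime, the build configuration, and the JIT's decisions.

```csharp
using System;
using System.Runtime.CompilerServices;

class LivenessDemo
{
    // Assumption: requests fully optimized code up front (.NET Core 3.0+),
    // so the JIT tracks precise lifetimes for the locals in this method.
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    static void Main()
    {
        var data = new byte[1024];
        // A weak reference observes the object without keeping it alive.
        var weak = new WeakReference(data);

        // 'data' is never read after this point, so an optimizing JIT
        // is free to stop reporting it as a live root.
        GC.Collect();

        // Typically True in a Debug build and often False in an optimized
        // build, but neither outcome is guaranteed.
        Console.WriteLine($"Still alive after GC: {weak.IsAlive}");
    }
}
```

Comparing the printed value between a Debug and a Release build gives a rough feel for how differently the two report live roots.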
Now, how do we reconcile this description of the JIT with our code?
At the call to tiger.Run() in Main(), the JIT considers the tiger reference to be no longer live. When the garbage collector is invoked by the three GC calls inside Run(), the tiger object is collected by the GC, since the JIT has informed it that tiger is not a live root. As part of the garbage collection, the finalizer (destructor) of the Tiger instance is called by the finalizer thread, and you see the output 'Tiger Dead' before the rest of the program executes.
Now, the question is: why does the JIT think the tiger object is not live at the call to tiger.Run(), when we are in fact executing a method of that very object? Not only that, we use the tiger object again on the next line to call tiger.GetAge(), after Run() has returned. How can we call a method on an object that has already been collected by the GC and whose finalizer has already run?
In summary, from a lexical scope point of view, the tiger object is used at both the tiger.Run() call and the tiger.GetAge() call, and yet the JIT believes it is not in use. How can that be? Remember I said the JIT is a smart (and very aggressive) compiler? The JIT knows that while Run() executes, it does not use the tiger object in any way. The statements inside Run() can execute without an actual reference to the tiger object, so the JIT does not report tiger as a live root.
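To see why the body of Run() matters, here is a hedged variant in which Run() does read instance state after the collections. The _name field, the constructor, and the name passed to it are hypothetical additions made only for this sketch, and the AggressiveOptimization attribute assumes .NET Core 3.0 or later. Because the object is still needed after the GC calls, it should stay reachable for the duration of Run(), so 'Tiger Dead' should not appear before the running message.

```csharp
using System;
using System.Runtime.CompilerServices;

public class Tiger
{
    // Hypothetical instance field, not in the original sample.
    private readonly string _name;

    public Tiger(string name) { _name = name; }

    ~Tiger()
    {
        Console.WriteLine("Tiger Dead");
    }

    public void Run()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        // Reading _name here means 'this' is still in use after the
        // collections, so it must remain a live root across them.
        Console.WriteLine($"{_name} is running");
    }
}

class Program
{
    // Assumption: asks for optimized code on the first call (.NET Core 3.0+).
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    static void Main()
    {
        new Tiger("Sher Khan").Run();
    }
}
```

Whether 'Tiger Dead' is printed at all when the process exits depends on the runtime, so the sketch makes no claim about that.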
But wait! That only explains why the tiger object is not needed inside Run(). What about the tiger.GetAge() call that follows it? Are we not using the tiger object to invoke GetAge()? Not just that, GetAge() also reads the instance property YearOfBirth, so it seems to need the tiger object in order to execute. How, then, can the JIT decide that tiger is not a live root inside Run()? This is the magic of method inlining. Remember, I said the JIT is a smart beast? Based on a set of inlining rules (which estimate whether inlining a method will bring any benefit), the JIT may decide to inline certain methods. The exact rules are beyond the scope of this post, but you may refer to this post if you are curious. Inlining means that instead of making a method call, the JIT places the statements of the target method directly in the calling method, “inline” at the point of invocation. It may also perform further optimizations (such as replacing YearOfBirth with the constant 2010) to make this possible. In the case of the GetAge() invocation, the JIT decides to inline it, which means that instead of:
var age = tiger.GetAge(2020);
the JIT emits code equivalent to
var age = 10; // the arithmetic difference of 2020 and 2010
and the above statement obviously does not require the tiger object.
Hence, the JIT can safely identify the tiger object as not live and report this to the garbage collector. It does so by maintaining GCInfo for every JIT-compiled method. The GCInfo of a method tells the garbage collector which references are to be treated as live roots. The GC uses this information during its mark phase. Objects left unmarked are deemed “garbage” and are collected by the garbage collector in its sweep phase.
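One way to check the role of inlining is to forbid it for GetAge() and compare the output with the original Release run. The sketch below is the original sample with two attributes added; it is an experiment under stated assumptions, not a recommended way to manage object lifetime. MethodImplOptions.NoInlining is a standard option, while AggressiveOptimization on Main() assumes .NET Core 3.0 or later. With a real call to GetAge(), Main() has to keep the tiger reference to pass it as the receiver, so the object should stay live across Run().

```csharp
using System;
using System.Runtime.CompilerServices;

class Program
{
    // Assumption: asks for optimized code on the first call (.NET Core 3.0+).
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
    static void Main()
    {
        Tiger tiger = new Tiger();
        tiger.Run();
        // A genuine call now: 'tiger' is needed here as the receiver,
        // so it is still in use after Run() returns.
        int age = tiger.GetAge(2020);
        Console.WriteLine($"Tiger's age is {age}");
    }
}

public class Tiger
{
    public int YearOfBirth { get { return 2010; } }

    ~Tiger()
    {
        Console.WriteLine("Tiger Dead");
    }

    public void Run()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Console.WriteLine("Tiger is running");
    }

    // Prevents the JIT from folding this call into a constant in Main().
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int GetAge(int currentYear)
    {
        return currentYear - YearOfBirth;
    }
}
```

If 'Tiger Dead' no longer appears before 'Tiger is running' in an optimized build, that is consistent with the explanation above: the early collection depended on GetAge() being inlined away.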
If you are curious, below is the JIT-optimized code for the Main() method in Release mode. Notice the instruction at offset L0040, where a hardcoded value of 0xa is used (0xa is the hexadecimal form of 10). There is no call to the Tiger.GetAge() method. In the Debug mode JIT code (not shown here for brevity), you will see an instruction such as call Tiger.GetAge(Int32), similar to the call Tiger.Run() you can see below:
Program.Main(System.String[])
    L0000: sub rsp, 0x48
    L0004: xor eax, eax
    L0006: mov [rsp+0x28], rax
    L000b: mov [rsp+0x30], rax
    L0010: mov [rsp+0x38], rax
    L0015: mov [rsp+0x40], rax
    L001a: mov rcx, 0x7ff965bed120
    L0024: call 0x00007ff9bc52d7a0
    L0029: mov rcx, rax
    L002c: call Tiger.Run()
    L0031: mov rcx, 0x7ff95cabb1e8
    L003b: call 0x00007ff9bc6178f0
    L0040: mov dword ptr [rax+8], 0xa
    L0047: xor r8d, r8d
    L004a: mov rdx, 0x1d3d8f31388
    L0054: mov rdx, [rdx]
    L0057: mov rcx, 0x1d3d9587158
    L0061: mov rcx, [rcx]
    L0064: lea r9, [rsp+0x28]
    L0069: mov [r9], rax
    L006c: mov [r9+8], r8
    L0070: mov [r9+0x10], r8
    L0074: mov [r9+0x18], rdx
    L0078: lea r8, [rsp+0x28]
    L007d: mov rdx, rcx
    L0080: xor ecx, ecx
    L0082: call System.String.FormatHelper(System.IFormatProvider, System.String, System.ParamsArray)
    L0087: mov rcx, rax
    L008a: call System.Console.WriteLine(System.String)
    L008f: call System.Console.ReadLine()
    L0094: nop
    L0095: add rsp, 0x48
    L0099: ret
Conclusion
Garbage collection behavior can differ between optimized code (Release build) and unoptimized code (Debug build). The difference can come from eager collection of objects in optimized code, which is a result of the smart and aggressive nature of the JIT compiler in an optimized build. The JIT may also inline a method where it deems this appropriate, and apply any other optimizations it sees fit while compiling MSIL to native instructions. The JIT maintains GCInfo, which tells the GC which roots are live at any given point in a method. The GC uses this information to perform its mark phase starting from the live roots, treats the remaining unmarked objects as garbage, and collects them in the sweep phase.
In the next post, we will discuss how to extend the lifetime of an object.


Published by HackerNoon on 2020/07/20