
Getting Rid of Garbage in Java

by Azamat Nurkhojayev, September 12th, 2022

Too Long; Didn't Read

Garbage collection is the process of reclaiming runtime memory by destroying unused objects. When an object is no longer referenced, it is assumed to be dead and no longer needed. Most garbage collectors in Java implement a generational garbage collection strategy that classifies objects by age.

What is garbage collection, why is it needed, and how does it work?

Garbage collection is the process of reclaiming runtime memory by destroying unused objects. Every application needs memory to run, but computer memory is limited. Therefore, it is important to clear out old, unused data to make room for new data.


The main purpose of garbage collection is to free heap memory by destroying objects that are no longer referenced. When no live reference points to an object (more precisely, when the running program can no longer reach it), the object is assumed to be dead and no longer needed. Thus, the memory it occupies can be reclaimed.

Java memory structure

Native Memory - all available system memory.


Heap - the portion of native memory reserved for the Java heap. This is where the JVM stores objects. It is shared by all application threads. The size of this memory area is configurable using the -Xms (initial size) and -Xmx (maximum size) options.
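The standard java.lang.Runtime API offers a quick way to see the heap limits a running JVM actually received. The sketch below prints the maximum, current, and free heap size. The class name HeapInfo is only an illustration. Try running it with different flags, for example java -Xms64m -Xmx256m HeapInfo, to see the effect.

```java
public class HeapInfo {

    private static final long MB = 1024 * 1024;

    /** Returns {max, total, free} heap sizes in bytes as reported by the running JVM. */
    public static long[] heapSizes() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.maxMemory(), rt.totalMemory(), rt.freeMemory() };
    }

    public static void main(String[] args) {
        long[] sizes = heapSizes();
        // maxMemory(): the most the heap may grow to (close to -Xmx; Long.MAX_VALUE if there is no limit)
        System.out.println("max heap:   " + sizes[0] / MB + " MB");
        // totalMemory(): heap currently reserved by the JVM (starts near -Xms and can grow up to the max)
        System.out.println("total heap: " + sizes[1] / MB + " MB");
        // freeMemory(): the unused part of totalMemory()
        System.out.println("free heap:  " + sizes[2] / MB + " MB");
    }
}
```

The reported numbers may differ slightly from the flag values. The JVM rounds sizes to its alignment, and some collectors leave one survivor space out of these figures.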


Stack - used to store local variables and the frames of method calls. Each thread has its own stack.
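Every thread gets its own stack, so running out of stack in one thread does not affect the others. The sketch below runs a recursive method on a separate thread created with a requested stack size. It counts how many frames fit before a StackOverflowError. StackDepth and Probe are hypothetical names. The stack size passed to the Thread constructor is only a hint that the JVM may ignore. The default thread stack size can be set with -Xss (for example, -Xss512k).

```java
public class StackDepth {

    /** Recurses on its own thread and records how deep it got before that thread's stack ran out. */
    static final class Probe implements Runnable {
        int depth; // written by the probe thread, read by the caller after join()

        private void recurse() {
            depth++;   // every call pushes one more frame onto this thread's stack
            recurse();
        }

        @Override
        public void run() {
            try {
                recurse();
            } catch (StackOverflowError e) {
                // Only this thread's stack is exhausted; other threads keep their own stacks.
            }
        }
    }

    /** Runs a Probe on a new thread; stackSizeBytes is a hint the JVM may ignore. */
    public static int measure(long stackSizeBytes) {
        Probe probe = new Probe();
        Thread thread = new Thread(null, probe, "stack-probe", stackSizeBytes);
        thread.start();
        try {
            thread.join(); // join() also makes probe.depth visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt; the result may be incomplete
        }
        return probe.depth;
    }

    public static void main(String[] args) {
        System.out.println("256 KB stack: " + measure(256 * 1024) + " frames");
        System.out.println("  4 MB stack: " + measure(4 * 1024 * 1024) + " frames");
    }
}
```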


Metaspace - this memory stores class metadata and is also shared by all threads. (In HotSpot, the values of static variables are kept on the heap, inside the class's java.lang.Class object, not in metaspace.) Because metaspace is part of native memory, by default it can grow until the available native memory runs out. The upper limit on the amount of memory used for metaspace can be configured using the -XX:MaxMetaspaceSize flag.


PermGen (Permanent Generation) - was present until Java 7. Starting with Java 8, it was replaced by the Metaspace area.


CodeCache - the JIT compiler compiles frequently executed bytecode into native machine code and caches it here for faster execution. This is also part of native memory. Its maximum size is set with -XX:ReservedCodeCacheSize.
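The java.lang.management API lets you inspect these areas at runtime. The sketch below prints every memory pool the JVM reports, together with its type (heap or non-heap). The class name MemoryPools is illustrative. Pool names depend on the JDK version and the collector. On recent HotSpot versions you can typically expect non-heap pools such as Metaspace and one or more code cache pools, plus heap pools for the Eden, survivor and old spaces.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class MemoryPools {

    /** Counts the memory pools of the given type (HEAP or NON_HEAP) reported by this JVM. */
    public static int countPools(MemoryType type) {
        int count = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            long max = usage.getMax(); // -1 means no maximum is defined
            System.out.printf("%-16s %-35s used=%,d KB, max=%s%n",
                    pool.getType(), pool.getName(), usage.getUsed() / 1024,
                    max < 0 ? "undefined" : String.format("%,d KB", max / 1024));
        }
    }
}
```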

Garbage collection

Garbage collection in Java is an automatic process. The programmer does not need to select objects and delete them.


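Garbage collection is based on the Mark & Sweep algorithm, usually followed by a compaction step. The process consists of three stages:


  1. Mark. In the first step, the GC starts from the GC roots (for example, local variables on thread stacks and static fields), follows references, and marks every live object it reaches (objects that are still in use). In the simplest collectors, program execution is suspended during this stage. Such a pause is called "Stop the World".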


  2. Sweep. At this step, the memory occupied by objects that were not marked in the previous step is reclaimed.


  3. Compact. Objects that survive the cleanup are moved to a single contiguous block of memory. This reduces heap fragmentation and makes it easier and faster to allocate new objects.
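A toy model can illustrate the three stages. The sketch below is not how the JVM implements them. It uses a hypothetical MarkSweepCompact class in which a Java List plays the role of the heap (list index = address), and a separate list of roots stands in for stack variables and static fields. The model also shows that an unreachable cycle (d and e referencing each other) gets collected. Liveness is decided by reachability from the roots, not by whether some reference to the object exists.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of mark, sweep and compact; list indexes play the role of heap addresses. */
public class MarkSweepCompact {

    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing references
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    final List<Obj> heap = new ArrayList<>();  // null slots are free memory ("holes")
    final List<Obj> roots = new ArrayList<>(); // stand-ins for stack variables, static fields, ...

    Obj allocate(String name) {
        Obj o = new Obj(name);
        heap.add(o); // new objects go right after the last used slot
        return o;
    }

    /** 1. Mark: follow references from the roots and mark everything reachable. */
    void mark() {
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (o.marked) continue;   // already visited (this also handles cycles)
            o.marked = true;
            work.addAll(o.refs);
        }
    }

    /** 2. Sweep: free every slot whose object was not marked. Returns the number of freed objects. */
    int sweep() {
        int freed = 0;
        for (int i = 0; i < heap.size(); i++) {
            Obj o = heap.get(i);
            if (o != null && !o.marked) {
                heap.set(i, null);    // leaves a hole: the heap is now fragmented
                freed++;
            }
        }
        return freed;
    }

    /** 3. Compact: slide survivors to the start so the free space becomes one block at the end. */
    void compact() {
        int next = 0;
        for (int i = 0; i < heap.size(); i++) {
            Obj o = heap.get(i);
            if (o != null) heap.set(next++, o);
        }
        heap.subList(next, heap.size()).clear(); // everything from 'next' onward is now free
        for (Obj o : heap) o.marked = false;     // reset marks for the next collection
    }

    int collect() {
        mark();
        int freed = sweep();
        compact();
        return freed;
    }

    List<String> liveNames() {
        List<String> names = new ArrayList<>();
        for (Obj o : heap) names.add(o.name);
        return names;
    }

    static String runDemo() {
        MarkSweepCompact gc = new MarkSweepCompact();
        Obj a = gc.allocate("a");
        gc.allocate("b");                 // b: nothing ever references it
        Obj c = gc.allocate("c");
        Obj d = gc.allocate("d");
        Obj e = gc.allocate("e");
        gc.roots.add(a);                  // a is referenced from a "stack variable"
        a.refs.add(c);                    // c is reachable through a
        d.refs.add(e);                    // d and e reference each other,
        e.refs.add(d);                    // but no root can reach them
        int freed = gc.collect();
        return "freed " + freed + ", heap now " + gc.liveNames();
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints: freed 3, heap now [a, c]
    }
}
```

In a real collector, compaction must also update every reference to a moved object. In this toy model Java references stand in for addresses, so that step is not needed. After compaction, all free space sits past the end of the list, so allocating a new object is a simple append. This is why compaction makes allocation cheap.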

Object Generations

What is object generation?

Garbage collectors in Java implement a generational garbage collection strategy that classifies objects by age.


To optimize garbage collection, heap memory is further divided into four areas: Eden, S0 and S1 (which together form the Young Generation), and the Old Generation. Objects are placed in these areas based on their age (how many garbage collections they have survived).


  1. Young Generation. This is where new objects are created. The young generation area is divided into three sections: Eden and two survivor spaces, S0 and S1.


  2. Old Generation. This is where long-lived objects end up.
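Here is a small arithmetic sketch that makes the proportions concrete. It shows how the HotSpot flags -XX:NewRatio (old : young) and -XX:SurvivorRatio (Eden : one survivor space) divide a heap. The example uses NewRatio=2 and SurvivorRatio=8, the usual starting values for the Serial and Parallel collectors. G1 sizes its generations adaptively, and real JVMs also round sizes to their alignment, so treat the results as approximations. The GenerationSizes class is hypothetical.

```java
public class GenerationSizes {

    /**
     * Approximate split of a heap of heapMb megabytes (hypothetical helper).
     * newRatio mirrors -XX:NewRatio (old / young), survivorRatio mirrors -XX:SurvivorRatio (eden / one survivor).
     * Returns {young, old, eden, survivorEach} in MB.
     */
    public static long[] split(long heapMb, int newRatio, int survivorRatio) {
        long young = heapMb / (newRatio + 1);        // young : old = 1 : newRatio
        long old = heapMb - young;
        long survivor = young / (survivorRatio + 2); // eden : S0 : S1 = survivorRatio : 1 : 1
        long eden = young - 2 * survivor;
        return new long[] { young, old, eden, survivor };
    }

    public static void main(String[] args) {
        long[] s = split(300, 2, 8);
        System.out.printf("young=%d MB (eden=%d, S0=%d, S1=%d), old=%d MB%n", s[0], s[2], s[3], s[3], s[1]);
    }
}
```

With these values, a 300 MB heap gives a 100 MB young generation (80 MB Eden and 10 MB per survivor space) and a 200 MB old generation. At any moment, the empty survivor space leaves about 10% of the young generation unused.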

What is Stop the World?

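During a Stop the World pause, all application threads are halted. Once the GC finishes that phase, the application resumes its work. In the simplest collectors, this pause covers the whole mark stage. Every HotSpot collector has at least some Stop the World pauses. Concurrent collectors (CMS, G1, ZGC), however, do much of their marking while the application keeps running, which keeps the pauses short.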

What is the generational hypothesis?

As mentioned earlier, generations are used to optimize the mark and sweep stages. The generational hypothesis says the following:


  1. Most objects don't last long.
  2. If an object survives, it will most likely live for a long time.
  3. Collecting an area that consists mostly of garbage is fast. Marking only has to visit live objects, so the fewer survivors there are, the less work the collector does.

Thus, the generation-based garbage collection algorithm looks like this:


  1. New objects are created in the Eden area. Survivor areas (S0, S1) are currently empty.
  2. When the Eden area fills up, a Minor GC occurs. Minor GC is a process in which mark and sweep operations are performed on the young generation.
  3. After Minor GC, live objects are moved to one of the survivor spaces (for example, S0). Dead objects are completely removed.
  4. As the application runs, the Eden space fills up with new objects again. At the next Minor GC, Eden and S0 are cleared. This time the surviving objects from both areas are moved to S1, and their age is incremented (marking that they survived another garbage collection).
  5. At the next Minor GC, the process is repeated, but this time the roles of the survivor spaces are swapped. Live objects move to S0 and their age increases. The Eden and S1 areas are cleared.
  6. Objects are copied back and forth between the survivor spaces until they reach an age threshold (a certain number of survived Minor GCs, configurable with -XX:MaxTenuringThreshold) or until the survivor space runs out of room. These objects are then promoted (copied) to the Old Generation.
  7. Major GC. With Major GC, the mark and sweep steps are performed for the Old Generation. Major GC is slower than Minor GC because the Old Generation is usually larger and consists mostly of live objects.
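A few dozen lines are enough to simulate the Minor GC steps above. The sketch below is a toy model, not the HotSpot implementation. Several parts of it are assumptions made for illustration: the GenerationalSim class, its tiny capacities (counted in objects rather than bytes), and the alive flag that stands in for "the program still holds a reference". In the model, Eden fills up and a Minor GC copies live objects into the empty survivor space, incrementing their age. The survivor spaces then swap roles. Objects that reach the tenuring threshold, or that do not fit, are promoted to the Old Generation. Major GC is not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a young-generation copying collector with two survivor spaces. */
public class GenerationalSim {

    static final class Obj {
        final String name;
        boolean alive = true; // false once the program drops its last reference
        int age;              // number of Minor GCs survived
        Obj(String name) { this.name = name; }
    }

    final int edenCapacity;
    final int survivorCapacity;
    final int tenuringThreshold;          // plays the role of -XX:MaxTenuringThreshold
    final List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();   // survivor space currently holding objects (S0 or S1)
    List<Obj> to = new ArrayList<>();     // survivor space that is empty between collections
    final List<Obj> old = new ArrayList<>();
    int minorGcCount;

    GenerationalSim(int edenCapacity, int survivorCapacity, int tenuringThreshold) {
        this.edenCapacity = edenCapacity;
        this.survivorCapacity = survivorCapacity;
        this.tenuringThreshold = tenuringThreshold;
    }

    /** New objects go to Eden; a full Eden triggers a Minor GC first. */
    Obj allocate(String name) {
        if (eden.size() >= edenCapacity) {
            minorGc();
        }
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    /** Minor GC: copy live objects out of Eden and the 'from' survivor space, then swap roles. */
    void minorGc() {
        minorGcCount++;
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.alive) {
                continue;                 // dead objects are simply not copied
            }
            o.age++;
            if (o.age >= tenuringThreshold || to.size() >= survivorCapacity) {
                old.add(o);               // promotion: old enough, or no room in the survivor space
            } else {
                to.add(o);
            }
        }
        eden.clear();
        from.clear();
        List<Obj> swap = from;            // the emptied space becomes the next 'to' space
        from = to;
        to = swap;
    }

    static List<String> names(List<Obj> objs) {
        List<String> names = new ArrayList<>();
        for (Obj o : objs) names.add(o.name);
        return names;
    }

    static String runDemo() {
        GenerationalSim heap = new GenerationalSim(4, 2, 3);
        Obj session = heap.allocate("session"); // long-lived: stays referenced (alive) the whole time
        for (int i = 0; i < 12; i++) {
            Obj tmp = heap.allocate("tmp" + i);
            tmp.alive = false;                   // short-lived: dropped right away
        }
        return "minor GCs=" + heap.minorGcCount
                + ", eden=" + names(heap.eden)
                + ", survivor=" + names(heap.from)
                + ", old=" + names(heap.old);
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Running main prints minor GCs=3, eden=[tmp11], survivor=[], old=[session]. The short-lived objects never leave Eden. The long-lived session object moves between the survivor spaces and is promoted after its third Minor GC.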

Benefits of using generations

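Minor GC works on a smaller part of the heap (with the Serial and Parallel collectors, the Young Generation is about one third of the heap by default, see -XX:NewRatio). The marking step is efficient because the area is small and consists mostly of dead objects.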

Disadvantages of using generations

At any given time, one of the Survivor spaces (S0 or S1) is empty and unused.

Garbage Collector Types

Serial

Uses one thread.


Advantages

Efficient, because there is no overhead from coordinating several GC threads.


When to use

Single processor machines. Working with small datasets.


Flags to Enable

-XX:+UseSerialGC

Parallel

Uses multiple threads.


Advantages

Multithreading speeds up garbage collection.


When to use

Peak performance is a priority. GC pauses of one second or more are acceptable. Working with medium and large data sets. For applications running on multiprocessor or multi-threaded hardware.


Flags to Enable

-XX:+UseParallelGC

CMS

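CMS (Concurrent Mark Sweep) is known as a mostly concurrent, low-pause collector: most of its marking and sweeping runs at the same time as the application. It was deprecated in JDK 9 and removed in JDK 14.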


Advantages

Minimizes pause times, which matters for many applications. The price is additional CPU load and, often, lower overall throughput.


When to use

The collector may be suitable for applications that use a large amount of long-lived data.


Flags to Enable

-XX:+UseConcMarkSweepGC

G1 (Garbage First)

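A mostly concurrent collector: some of the expensive work, such as marking the whole heap, is done concurrently with the application. G1 splits the heap into equal-sized regions and collects the regions with the most garbage first, which gives it its name. It has been the default collector since JDK 9.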


Advantages

It can be used both on small systems and on large ones with a large number of processors and a large amount of memory.


When to use

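When response time matters more than overall throughput and GC pauses must be kept shorter than about one second. G1 aims to meet a configurable pause-time goal (-XX:MaxGCPauseMillis).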


Flags to Enable

-XX:+UseG1GC

ZGC

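Performs all expensive work concurrently with the application, so pauses stay very short (typically a few milliseconds or less). ZGC has been production-ready since JDK 15; earlier versions offered it only as an experimental feature.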


Advantages

Low latency.


When to use

Response time is prioritized.


Flags to Enable

-XX:+UseZGC
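You can ask the JVM which collector a flag actually selected through java.lang.management. The sketch below allocates some short-lived garbage. It then prints each GarbageCollectorMXBean with its collection count and accumulated collection time. GcInfo is a made-up class name. Run it with different flags, for example java -XX:+UseSerialGC GcInfo or java -XX:+UseG1GC GcInfo. Bean names vary by collector and JDK version. For example, G1 usually reports G1 Young Generation and G1 Old Generation, and the Parallel collector usually reports PS Scavenge and PS MarkSweep. These correspond to Minor and Major GC. Note that -XX:+UseConcMarkSweepGC is no longer supported on JDK 14 and later.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcInfo {

    /** Names of the collector beans the running JVM reports (often one for young, one for old collections). */
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Allocate plenty of short-lived arrays so that some young collections are likely to run.
        long bytes = 0;
        for (int i = 0; i < 2_000_000; i++) {
            byte[] chunk = new byte[1024];
            bytes += chunk.length;
        }
        System.out.println("allocated about " + bytes / (1024 * 1024) + " MB of short-lived arrays");

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for this collector.
            System.out.printf("%-25s collections=%d, accumulated time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```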