Thursday 9 October 2014

Garbage collection in Java

Background

Memory management is an important aspect of any programming language. In languages like C++ you have to write the code (for example, destructors) that cleans up the memory used by your application. In Java the JVM handles memory cleanup, so programmers rarely have to manage it by hand. In this post we will see how garbage collection is done in Java.


Basics

  1. The Heap is the memory space in the JVM where Objects are created. Consider the following statement -

    Animal animal = new Dog(); 

    Let's break it down into two statements - declaration and initialization.

    First we declare a reference to an Object of type Animal, i.e. Animal animal;  This simply creates a reference on the Stack (assuming animal is a local variable). This reference can point to an actual Object of type Animal or of any of its subclasses (by polymorphism).

    Then we initialize it, i.e. animal = new Dog();  Now the reference on the Stack points to an actual object of type Dog that lives in the Heap.

    To conclude this point: Objects are created on the Heap, local references live on the Stack, and references point to Objects in the Heap.

  2. Heap space is used for dynamic allocation of memory. When a JVM instance is created, an initial heap is reserved (its initial and maximum sizes are configurable). As Objects are created, space from this heap is used, and it is freed again when Objects are garbage collected.

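  3. If the Heap fills up and the garbage collector cannot free enough space, the JVM throws java.lang.OutOfMemoryError. If nothing handles the error, the thread that hit it terminates, which in practice usually brings the application down. It's what we call a crash.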
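
The declaration and initialization steps from point 1 can be sketched as below. The Animal and Dog classes and the sound() method are hypothetical, made up only to mirror the example statement; this is a minimal illustration, not a statement about how any particular JVM lays out its stack frames.

```java
class ReferenceDemo {
    static String describe() {
        Animal animal;          // declaration: only a local reference, no object yet
        animal = new Dog();     // initialization: a Dog object is allocated on the heap
        Animal alias = animal;  // a second reference to the same heap object
        // Both references point at one Dog, so the overridden method runs.
        return animal.sound() + " " + (alias == animal) + " "
                + animal.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "Woof true Dog"
    }
}

// Hypothetical classes named after the example in the text.
class Animal {
    String sound() { return "..."; }
}

class Dog extends Animal {
    @Override
    String sound() { return "Woof"; }
}
```

Copying a reference (alias above) never copies the object; it only adds another way to reach the same object on the heap.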

Garbage Collection

The JVM keeps track of live Objects and discards the ones that are not. How does the JVM figure out which Objects are live? There are various algorithms that a JVM can use; we will get to them. At a high level, you can think of it as the JVM removing all objects that are no longer referenced or reachable by your application code. But if an Object that is not reachable by the application code gets garbage collected, how do we define what makes up our current application code, so that we can work out which Objects are reachable and which are not?


There are special objects called GC roots that are always reachable by the application code. All objects that can be reached via these roots are alive. The rest can be garbage collected. A simple Java application has the following GC roots - 

  1. Local variables [kept live from Stack]
  2. Live threads
  3. Static variables
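
As a rough illustration of a GC root, the sketch below holds an object in a static variable and probes it with a WeakReference after requesting a collection. The class and field names are hypothetical. Only the static-variable case is checked, because that is the one whose outcome does not depend on collector timing.

```java
import java.lang.ref.WeakReference;

class GcRootsDemo {
    // A static variable acts as a GC root for as long as its class stays loaded.
    static Object staticRoot = new Object();

    static boolean survivesWhileRooted() {
        // A weak reference does not keep its target alive by itself.
        WeakReference<Object> probe = new WeakReference<>(staticRoot);
        System.gc(); // only a request; the JVM may ignore it
        // The target is still strongly reachable from the static field,
        // so the weak reference cannot have been cleared.
        return probe.get() != null;
    }

    public static void main(String[] args) {
        // Local variables of running methods and live threads are roots too;
        // this sketch only checks the static-variable case.
        System.out.println("rooted object alive: " + survivesWhileRooted());
    }
}
```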



Eligibility for garbage Collection

In the last section we said that all Objects that are not reachable from a GC root are potential candidates for garbage collection. Some general cases in which an Object becomes eligible for GC -

  1. When you explicitly set its reference to null. Recalling the first point from the Basics section, setting animal = null; makes the Dog object eligible for GC (provided no other reference points to it).
  2. When an Object is created in a method or a block and the program leaves that scope (technically speaking, the Object's reference is popped off the Stack), that object becomes eligible for GC, unless a reference to it has escaped the scope.
  3. If an Object is eligible for GC, all objects that this parent Object references also become eligible for GC, unless of course the child Objects are referenced via some other GC root.

NOTE : GC threads are daemon threads run by the JVM according to its GC algorithm. They may run in parallel with the live application threads, or may lead to a Stop the World event in which all application threads are suspended while GC happens (this typically happens during a full or Major GC, when the old generation area is full).
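
The three cases above can be probed with weak references, as in the sketch below. Dog and Collar are hypothetical classes. Whether the "collected" flags come out true depends on the JVM honouring the System.gc() requests, so no output is promised for them; eligibility only means the collector is allowed to reclaim the object.

```java
import java.lang.ref.WeakReference;

class EligibilityDemo {
    // Hypothetical classes for illustration.
    static class Collar { }
    static class Dog { Collar collar = new Collar(); }

    // Case 2: the only strong reference is a local that dies with the method.
    static WeakReference<Object> createInScope() {
        Object temp = new Object();
        return new WeakReference<>(temp);
    }

    // Requests GC a few times and reports whether the probe was cleared.
    static boolean collected(WeakReference<?> probe) {
        for (int i = 0; i < 10 && probe.get() != null; i++) {
            System.gc(); // a request only, hence the retry loop
        }
        return probe.get() == null;
    }

    // Returns {reachable before nulling, dog collected, collar collected}.
    static boolean[] run() {
        Dog dog = new Dog();
        WeakReference<Dog> dogProbe = new WeakReference<>(dog);
        WeakReference<Collar> collarProbe = new WeakReference<>(dog.collar);
        boolean before = dogProbe.get() != null;
        dog = null; // case 1; the collar was reachable only through the dog (case 3)
        return new boolean[] { before, collected(dogProbe), collected(collarProbe) };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("reachable before null: " + r[0]);
        System.out.println("dog collected: " + r[1] + ", collar collected: " + r[2]);
        System.out.println("scoped object collected: " + collected(createInScope()));
    }
}
```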

How are Cyclic references handled?

So let's say you have Object A that has a reference to Object B, and B has a reference to A. So basically both have live references to each other. Will they be GCed? That depends. If either Object is reachable from the GC roots, neither will be eligible for GC; if neither is reachable, both will be eligible for GC. 

In short,
A cyclic reference on its own does not keep Objects alive. If Object A references Object B, Object B references Object A, and neither has any other live reference, then both A and B are eligible for garbage collection.

[See the non-reachable Objects in the picture above]
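
A small sketch of the two situations, using a hypothetical Node class with a single field. The rooted case is deterministic. The unrooted case relies on the JVM acting on System.gc() requests, so its result is printed without a promised value.

```java
import java.lang.ref.WeakReference;

class CycleDemo {
    // Hypothetical class: each node points at the other one.
    static class Node { Node other; }

    static Node root; // a static field, i.e. a GC root

    // Builds A <-> B and returns A.
    static Node makeCycle() {
        Node a = new Node();
        Node b = new Node();
        a.other = b;
        b.other = a;
        return a;
    }

    // The cycle hangs off a GC root, so B stays reachable via root -> A -> B.
    static boolean rootedCycleSurvives() {
        root = makeCycle();
        WeakReference<Node> probe = new WeakReference<>(root.other);
        System.gc();
        return probe.get() != null;
    }

    // Nothing outside the cycle refers to it, so both nodes are eligible.
    static boolean unrootedCycleCollected() {
        WeakReference<Node> probe = new WeakReference<>(makeCycle());
        for (int i = 0; i < 10 && probe.get() != null; i++) {
            System.gc(); // a request only
        }
        return probe.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("rooted cycle survives: " + rootedCycleSurvives());
        System.out.println("unrooted cycle collected: " + unrootedCycleCollected());
    }
}
```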

Mark-and-Sweep Algorithm

To determine which objects are no longer in use, the JVM intermittently runs what is very aptly called a mark-and-sweep algorithm. It works as follows
  1. The algorithm traverses all object references, starting with the GC roots, and marks every object found as alive.
  2. All of the heap memory that is not occupied by marked objects is reclaimed. It is simply marked as free, essentially swept free of unused objects.
So if an object is not reachable from the GC roots (even if it is self-referenced or cyclic-referenced), it will be subject to garbage collection.
Of course, the reverse also holds: if the programmer forgets to drop a reference to an object that is no longer needed, the object stays reachable and is never collected, which shows up as a memory leak.
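
The two phases can be sketched as a toy simulation over an object graph. This is not how HotSpot is implemented; the "heap" here is just a hypothetical map from an object name to the names it references, and the method names are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MarkSweepSketch {
    // Mark phase: visit everything reachable from the roots.
    static Set<String> mark(Map<String, List<String>> heap, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (marked.add(obj)) { // first visit: follow its outgoing references
                pending.addAll(heap.getOrDefault(obj, List.<String>of()));
            }
        }
        return marked;
    }

    // Sweep phase: whatever is on the heap but unmarked is garbage.
    static Set<String> sweep(Map<String, List<String>> heap, Set<String> marked) {
        Set<String> garbage = new TreeSet<>(heap.keySet());
        garbage.removeAll(marked);
        return garbage;
    }

    public static void main(String[] args) {
        Map<String, List<String>> heap = Map.of(
                "root", List.of("x"),
                "x", List.<String>of(),
                "a", List.of("b"),   // a and b reference each other ...
                "b", List.of("a"));  // ... but no root reaches them
        Set<String> marked = mark(heap, List.of("root"));
        System.out.println(sweep(heap, marked)); // prints "[a, b]"
    }
}
```

Because marking starts from the roots rather than counting references, the a/b cycle is never visited and is swept like any other unreachable object.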



How Garbage Collection works in the Heap?

To understand this section you first need to understand how the JVM lays out its heap. Heap space is mainly divided into 3 sections - 

  1. Young generation
  2. Tenured or Old generation
  3. Permanent generation

Young generation is further subdivided into -
  1. Eden space
  2. Survivor1 (S1)
  3. Survivor2 (S2)



Here is what happens -

  1. When your Objects are created, they are in fact created in the Young generation (Eden space).
  2. When Objects are garbage collected from the Young generation it is termed a Minor GC. Though this is also a Stop the World event, its impact is usually small, assuming a high infant mortality rate: most newly created objects have a very short lifespan and become unreachable early, so very few objects need to be moved to one of the survivor spaces or to the old generation.
  3. Also note that at any point in time only one of the survivor spaces is occupied (the other is empty).
  4. So on each minor GC, objects with no live reference are removed from Eden and from the occupied Survivor space. The surviving ones are moved to the empty survivor space, and the source survivor space is freed.
  5. Finally, when Objects have survived multiple minor GC cycles they are moved to the Tenured or Old generation. This is typically based on age, i.e. the number of minor GC cycles the object has survived in the young generation.
  6. When objects in the Old generation are subjected to garbage collection we call it a Major GC. This is often much slower because it involves all live objects.
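
The steps above can be sketched as a toy model. Everything in it is an assumption made for illustration: objects are plain names, liveness is passed in as a set, and the tenuring threshold of 2 is arbitrary. A real JVM copies memory and tunes the threshold itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GenerationalSketch {
    static final int TENURING_THRESHOLD = 2; // hypothetical value

    final List<String> eden = new ArrayList<>();
    List<String> fromSurvivor = new ArrayList<>(); // the occupied survivor space
    List<String> toSurvivor = new ArrayList<>();   // empty between collections
    final List<String> old = new ArrayList<>();
    final Map<String, Integer> age = new HashMap<>();

    // Step 1: new objects start in Eden with age 0.
    void allocate(String name) {
        eden.add(name);
        age.put(name, 0);
    }

    // Steps 2 to 5: one minor GC over Eden and the occupied survivor space.
    void minorGc(Set<String> live) {
        List<String> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (String obj : candidates) {
            if (!live.contains(obj)) {
                age.remove(obj); // unreachable: simply not copied anywhere
                continue;
            }
            int newAge = age.merge(obj, 1, Integer::sum);
            if (newAge >= TENURING_THRESHOLD) {
                old.add(obj);        // survived long enough: promoted
            } else {
                toSurvivor.add(obj); // copied to the empty survivor space
            }
        }
        eden.clear();
        fromSurvivor.clear();
        // The two survivor spaces swap roles for the next collection.
        List<String> tmp = fromSurvivor;
        fromSurvivor = toSurvivor;
        toSurvivor = tmp;
    }

    public static void main(String[] args) {
        GenerationalSketch heap = new GenerationalSketch();
        heap.allocate("cache");
        heap.allocate("temp");
        heap.minorGc(Set.of("cache")); // temp dies young, cache moves to a survivor space
        heap.minorGc(Set.of("cache")); // cache reaches the threshold and is promoted
        System.out.println(heap.fromSurvivor + " " + heap.old); // prints "[] [cache]"
    }
}
```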

Note that in the classic collectors both major and minor GCs are Stop the World events, meaning all currently running Java threads are paused while GC is performed (concurrent collectors do part of the old generation work alongside the application threads). Since minor GC deals with short-lived Objects it is fast and has little effect on the process. It is the Major or full GC that hurts process performance. Programmers should try to minimize the number of full GCs.


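Also note that the permanent generation is where program metadata goes - classes, static variables, the String pool etc. Objects in the perm gen area are garbage collected during a full GC. In Java 8 the permanent generation was removed, and class metadata now lives in Metaspace, in native memory.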
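
To see which memory areas and collectors your own JVM actually uses, you can ask the standard java.lang.management API, as in the sketch below. The pool and collector names it prints depend on the JVM version and the collector in use, so no output is shown; the class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

class MemoryPoolsDemo {
    // Lists each memory pool with its type (HEAP or NON_HEAP).
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName() + " (" + pool.getType() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        poolNames().forEach(System.out::println);
        // Each collector reports how many collections it has run so far.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
        }
    }
}
```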


JVM arguments for controlling Heap Size



Detailed arguments can be checked on the Oracle website: Java HotSpot VM Options.
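
As a quick way to see the effect of heap sizing arguments, the sketch below prints the limits the running JVM reports through the Runtime class. The launch line in the comment uses -Xms (initial heap size) and -Xmx (maximum heap size) with arbitrary example values; the class name is hypothetical.

```java
class HeapSizeDemo {
    // Returns {max, total, free} heap sizes in bytes as reported by the JVM.
    static long[] heapSizesInBytes() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.maxMemory(), rt.totalMemory(), rt.freeMemory() };
    }

    public static void main(String[] args) {
        // Example launch (sizes are arbitrary): java -Xms64m -Xmx256m HeapSizeDemo.java
        long[] sizes = heapSizesInBytes();
        long mb = 1024 * 1024;
        System.out.println("max heap:   " + sizes[0] / mb + " MB");
        System.out.println("total heap: " + sizes[1] / mb + " MB");
        System.out.println("free heap:  " + sizes[2] / mb + " MB");
    }
}
```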

Important points

  1. Heap space is allocated when the JVM instance is created. Space for Objects is allocated and deallocated dynamically.
  2. The Heap is divided into Young, Old and Permanent generations.
  3. Objects are created in the Eden space of the Young gen and subsequently moved to the survivor spaces and then to the Old generation.
  4. The Permanent generation is the space that stores class metadata, static variables, the String pool etc.
  5. We must always aim to reduce the frequency of full or major GCs, as they affect the application's performance.
  6. The young generation consists of Eden plus two survivor spaces. Objects are initially allocated in Eden. One survivor space is empty at any time, and serves as the destination of the next copying collection of any live objects in Eden and the other survivor space. Objects are copied between survivor spaces in this way until they are old enough to be tenured, i.e. copied to the tenured generation.
  7. There is no way to force garbage collection. Methods like System.gc() and Runtime.getRuntime().gc() simply request that the JVM perform GC; the JVM may choose to ignore the request.
  8. Before an Object is garbage collected, its finalize() method may be called (you can see this method in the Object class). If you want to perform any cleanup of your own, you can override this method and add your logic, but the JVM does not guarantee when, or even whether, it runs, and finalize() has been deprecated since Java 9.
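
Points 7 and 8 can be tried out with the sketch below. The Resource class and the counter are hypothetical. Since both the GC request and finalization are at the JVM's discretion, the printed count may well be 0, which is exactly why cleanup logic should not depend on finalize().

```java
import java.util.concurrent.atomic.AtomicInteger;

class FinalizeDemo {
    static final AtomicInteger finalized = new AtomicInteger();

    // Hypothetical class that counts how often its finalizer actually ran.
    static class Resource {
        @Override
        @SuppressWarnings({"deprecation", "removal"})
        protected void finalize() throws Throwable {
            try {
                finalized.incrementAndGet(); // custom cleanup would go here
            } finally {
                super.finalize();
            }
        }
    }

    // Both calls only ask the JVM to collect; neither forces it.
    static void requestGc() {
        System.gc();
        Runtime.getRuntime().gc();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 1000; i++) {
            new Resource(); // unreachable right after construction
        }
        requestGc();
        Thread.sleep(100); // give the finalizer thread a chance, with no guarantee
        System.out.println("finalizers run so far: " + finalized.get());
    }
}
```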
