Sunday, October 5, 2014

WebSphere Garbage Collection


The mark and sweep algorithm is suited to applications that prioritize throughput, but the application pauses each time the GC runs.  Generational GC is good for applications that create a large number of objects, use them, and destroy them within a short interval.  Young objects are kept in the nursery, where a minor GC takes place regularly.  Older objects are migrated to the old generation space, on which a mark and sweep GC is performed.  This method improves performance and reduces fragmentation.

The mark and sweep method needs to acquire exclusive access to the JVM, which means all application threads are stopped (STW = stop the world).

In the mark phase, all live objects are marked.  All unreachable objects are considered garbage.  The process of marking all reachable objects is called tracing.  Tracing starts from the roots: thread stacks, static objects, and local and global JNI references.
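To make tracing concrete, here is a minimal sketch of a mark phase over a toy object graph.  The Node class and the mark method are hypothetical names made up for illustration; the real JVM works on raw heap memory and mark bits, not on Java collections.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy model of the mark phase; not how the JVM represents objects.
class MarkSketch {
    // Hypothetical heap object: a name plus its outgoing references.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Trace from the roots and return the set of reachable (live) nodes.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked =
                Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Node n = work.pop();
            if (marked.add(n)) {       // first visit: mark the object
                work.addAll(n.refs);   // then queue everything it references
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        a.refs.add(b);   // a -> b
        c.refs.add(c);   // c only references itself and is not a root
        Set<Node> live = mark(Arrays.asList(a));
        System.out.println("b live: " + live.contains(b));
        System.out.println("c live: " + live.contains(c));
    }
}
```

Anything that never ends up in the marked set, such as c here, is unreachable from the roots and is therefore garbage, even though it still holds a reference to itself.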

Parallel mark uses N-1 helper threads to trace in parallel, where N is the number of processors.  One application thread acts as the master coordinating agent.  Parallel marking is turned on by default and is controlled by the -Xgcthreads parameter.  To turn it off, set -Xgcthreads1.
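The thread arithmetic can be sketched as follows.  The helperThreads method is a hypothetical helper that only restates the rule above (one master, the rest helpers); it does not query or configure the collector.

```java
// Illustrates the N-1 rule from the text; it does not talk to the GC.
class GcThreadsSketch {
    // With gcThreads threads in total, one is the master and the rest are helpers.
    static int helperThreads(int gcThreads) {
        if (gcThreads < 1) {
            throw new IllegalArgumentException("at least one GC thread is required");
        }
        return gcThreads - 1;
    }

    public static void main(String[] args) {
        // Default per the text: N equals the number of processors.
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println("processors=" + n + " helpers=" + helperThreads(n));
        // With a single GC thread there are no helpers, so marking is not parallel.
        System.out.println("single thread helpers=" + helperThreads(1));
    }
}
```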

Concurrent mark performs the tracing concurrently with the application's activities.  It asks each application thread to scan its own stack.  Tracing is done by a low-priority background thread and by the application threads themselves whenever they perform a heap lock allocation (i.e. an allocation that needs to acquire an exclusive lock on the heap to serialize access).  Concurrent mark reduces the GC pause and makes the pause time more consistent by spreading the tracing work across normal application activity.  Because the application has to perform some of the tracing, it runs slightly longer and throughput is slightly reduced.  Concurrent mark is controlled by the -Xgcpolicy parameter: "optthruput" disables it and "optavgpause" enables it.
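One way to check which policy a running JVM was started with is to look at its input arguments.  This is only a sketch: policyFrom and concurrentMark are hypothetical helpers, taking the last -Xgcpolicy occurrence is an assumption about precedence, and only the two policies named above are mapped.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

class GcPolicySketch {
    // Returns the value of the last -Xgcpolicy:<name> argument, or null if none is present.
    // Picking the last occurrence is an assumption, not documented behavior.
    static String policyFrom(List<String> jvmArgs) {
        final String prefix = "-Xgcpolicy:";
        String policy = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                policy = arg.substring(prefix.length());
            }
        }
        return policy;
    }

    // Per the text: optavgpause enables concurrent mark, optthruput disables it.
    // Any other policy (or none) is left undecided because the text does not cover it.
    static Optional<Boolean> concurrentMark(String policy) {
        if ("optavgpause".equals(policy)) {
            return Optional.of(true);
        }
        if ("optthruput".equals(policy)) {
            return Optional.of(false);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        String policy = policyFrom(jvmArgs);
        System.out.println("policy=" + policy + " concurrentMark=" + concurrentMark(policy));
    }
}
```

Started with -Xgcpolicy:optavgpause on the command line, the sketch reports that concurrent mark was requested; with no policy argument it reports an empty result rather than guessing the default.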

When the mark phase completes, the mark bit vector identifies the location of all live objects in the heap.  One bit in the mark bit vector represents 8 bytes of heap.  To avoid filling the free pool with many small chunks, only chunks of 512 bytes or more are reclaimed.  The minimum chunk size on 64-bit platforms is 768 bytes.  Chunks that are not reclaimed are called "dark matter"; they are recovered later, when the adjacent objects are freed and the combined free space is large enough.
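The reclaim rule can be sketched with a plain BitSet standing in for the mark bit vector.  As a simplifying assumption, a set bit here means "this 8-byte slot is live", and SweepSketch and reclaimable are hypothetical names, not JVM internals.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class SweepSketch {
    static final int BYTES_PER_BIT = 8;      // one mark bit covers 8 heap bytes
    static final int MIN_CHUNK_BYTES = 512;  // 768 on 64-bit platforms, per the text

    // Simplification: a set bit means "this 8-byte slot is live".
    // Returns {offsetBytes, lengthBytes} for every free run big enough to reclaim.
    static List<long[]> reclaimable(BitSet liveSlots, int totalSlots, int minChunkBytes) {
        List<long[]> chunks = new ArrayList<>();
        int start = liveSlots.nextClearBit(0);
        while (start < totalSlots) {
            int next = liveSlots.nextSetBit(start);
            int end = (next < 0 || next > totalSlots) ? totalSlots : next;
            long bytes = (long) (end - start) * BYTES_PER_BIT;
            if (bytes >= minChunkBytes) {
                chunks.add(new long[] { (long) start * BYTES_PER_BIT, bytes });
            }
            // Runs below the minimum are skipped: they stay behind as "dark matter".
            start = liveSlots.nextClearBit(end);
        }
        return chunks;
    }

    public static void main(String[] args) {
        int totalSlots = 2048 / BYTES_PER_BIT;  // a 2048-byte toy heap = 256 slots
        BitSet live = new BitSet(totalSlots);
        live.set(0, 10);    // 80 live bytes
        live.set(20, 30);   // 80 live bytes, leaving an 80-byte gap before them
        for (long[] chunk : reclaimable(live, totalSlots, MIN_CHUNK_BYTES)) {
            System.out.println("offset=" + chunk[0] + " bytes=" + chunk[1]);
        }
        // prints "offset=240 bytes=1808"; the 80-byte gap is too small to reclaim
    }
}
```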

Parallel bitwise sweep speeds up the sweep by using the same set of helper threads used for parallel mark.  Each helper thread sweeps one 256 KB section of the heap at a time.
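The way the heap is carved into 256 KB sections and handed to workers can be sketched with a thread pool.  This is an assumption-laden illustration: ParallelSweepSketch is a hypothetical class, the pool stands in for the GC helper threads, and the per-section "sweep" only reports the section length instead of examining mark bits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSweepSketch {
    static final long SECTION_BYTES = 256 * 1024;  // 256 KB per section, per the text

    // Number of sections needed to cover the heap (the last one may be partial).
    static long sectionCount(long heapBytes) {
        return (heapBytes + SECTION_BYTES - 1) / SECTION_BYTES;
    }

    // Submits one task per section and returns the total number of bytes covered.
    static long sweep(long heapBytes, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            long sections = sectionCount(heapBytes);
            for (long s = 0; s < sections; s++) {
                final long start = s * SECTION_BYTES;
                final long end = Math.min(start + SECTION_BYTES, heapBytes);
                // Placeholder work: a real sweep would scan the mark bits for this range.
                Callable<Long> task = () -> end - start;
                results.add(pool.submit(task));
            }
            long covered = 0;
            for (Future<Long> f : results) {
                covered += f.get();
            }
            return covered;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024 + 1;  // 1 MB plus one byte forces a fifth, partial section
        System.out.println("sections=" + sectionCount(heap) + " covered=" + sweep(heap, 3));
    }
}
```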

Concurrent sweep, like concurrent mark, reduces average pause time.  It shares the same mark map with concurrent mark, so the two activities are mutually exclusive.
