Java Application Memory Usage and Analysis

The Java Virtual Machine (JVM) runs standalone applications and many key enterprise applications like monolithic application servers, API Gateways and microservices. Understanding and tuning an application begins with understanding the technology running it. Here is a quick overview of JVM memory management.

JVM Memory:

  • Stack and Heap form the memory used by a Java Application
  • Each execution thread uses its own Stack – it starts with the ‘main’ function and holds the functions it calls, the primitives they create and the references to objects used in these functions
  • All the objects live in the Heap – the heap is bigger
  • Stack memory management (cleaning old unused stuff) is done using a Last In First Out (LIFO) strategy
  • Heap memory management is more complex since this is where objects are created and cleaning them up requires care
  • You use command line (CLI) arguments to control the sizes and the algorithms used for managing Java memory
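
For illustration only, the overall heap sizes are typically set with flags like the ones below – the values and the jar name are made-up placeholders, not recommendations:

      java -Xms2g -Xmx2g -Xmn768m -jar my-app.jar

Here -Xms and -Xmx set the initial and maximum Heap size, and -Xmn sets the size of the Young (New) Generation.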

Java Memory Management:

  • Memory management is the process of cleaning up unused items in Stack and Heap
  • Not cleaning up will eventually halt the application as the fixed memory is used up and an out-of-memory or a stack-overflow exception occurs
  • The process of cleaning JVM memory is called “Garbage Collection” a.k.a “GC”
  • The Stack is managed using a simple LIFO strategy
  • The Heap is managed using one or more algorithms which can be specified by Command Line arguments
  • The Heap Collection Algorithms include Serial, ParNew, Parallel Scavenge, CMS, Serial Old (MSC), Parallel Old and G1 (G1GC)
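
As a quick, hedged reference (HotSpot, JDK 7/8 era), each of these collectors is selected with a JVM flag:

      -XX:+UseSerialGC           Serial (young) with Serial Old (MSC)
      -XX:+UseParNewGC           ParNew (young)
      -XX:+UseParallelGC         Parallel Scavenge (young)
      -XX:+UseParallelOldGC      Parallel Old
      -XX:+UseConcMarkSweepGC    CMS (old, usually paired with ParNew)
      -XX:+UseG1GC               G1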

I like to use the “Fridge” analogy – think about how leftovers go into the fridge and how last week’s leftovers, fresh veggies and that weird stuff from long, long ago get cleaned out … what is your strategy? JVM Garbage Collection Algorithms follow a similar concept while working with a few more constraints (how do you clean the fridge while your roommate/partner/husband/wife is/are taking food out?)

 

Java Heap Memory:

  • The GC Algorithms divide the heap into age based partitions
  • The process of cleanup or collection uses age as a factor
  • The two main partitions are “Young or New Generation” and “Old or Tenured Generation” space
  • The “Young or New Generation” space is further divided into an Eden space and 2 Survivor partitions
  • The sizes of the Eden and Survivor partitions are also controlled by command line arguments
  • We can control the GC Algorithm performance for our use-case and performance requirements through command line arguments that specify the sizes of the Young, Old and Survivor spaces (a hedged example follows below)
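
The values below are placeholders, but these are the typical HotSpot flags for shaping the generations:

      -XX:NewRatio=2               Old generation is 2x the size of the Young generation
      -XX:SurvivorRatio=8          Eden is 8x the size of each Survivor space
      -XX:MaxTenuringThreshold=15  Minor collections an object must survive before promotion to the Old generation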

For example, consider applications that do not have long lived objects – stateless web applications, API Gateway products, etc. These need a larger Young or New Generation space with a strategy to age objects slowly (long tenuring). They would use either the ParNew or CMS algorithm to do Garbage Collection – if there are more than 2 CPU cores available (i.e. extra threads for the collector) then the application can benefit more from the CMS algorithm. A hedged example command line for such a case follows.
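
The sizes and the jar name below are assumptions for illustration, not tested recommendations:

      java -Xmx2g -Xmn1g -XX:MaxTenuringThreshold=15 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -jar stateless-api.jar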

The picture below gives a view of how the Heap section of the Java application’s memory is divided. There are partitions based on the age of “Objects”, and stuff that is very old and unused eventually gets cleaned up.

[Image: JVM Heap.png – the generational partitions of the Java heap]

The Heap and Stack memory can be viewed at runtime using tools like Oracle’s JRockit Mission Control (now part of the JDK). We can also do very long term analysis of the memory and the garbage collection process using Garbage Collection (GC) logs and free tools that parse them.

One of the key resources to analyse in the JVM is memory usage and Garbage Collection. The process of cleaning unused objects from JVM memory is called “Garbage Collection (GC)”; details of how this works are provided here: Java Garbage Collectors
Tools:

Oracle JRockit Mission Control is now free with the JDK!

  • OSX : “/Library/Java/JavaVirtualMachines/{JDK}/Contents/Home/bin/”
  • Windows: “JDK_HOME/bin/”

The GC Viewer tool can be downloaded from here (see the GitHub link in the GC log analysis section below)

Memory Analysis Tools:

  • Runtime memory analysis with Oracle JRockit Mission Control (JRMC)
  • Garbage Collection (GC) logs analysed with the “GC Viewer” tool

Oracle JRockit Mission Control (JRMC)

  • Available for SPARC, Linux and Windows only
  • Download here: http://www.oracle.com/technetwork/java/javase/downloads/java-archive-downloads-jrockit-2192437.html
  • Usage Examples:
    • High-level CPU and Heap usage, along with details on memory fragmentation
    • Detailed view of the Heap – Tenured, Young Eden, Young Survivor etc.
    • The “Flight Recorder” tool starts a recording session over a duration for deep analysis at a later time
    • Thread-level socket read/write details
    • GC Pause – Young and Tenured map
    • Detailed thread analysis

 

Garbage Collection log analysis

  1. Problem: Often it is not possible to run a profiler at runtime because
    1. Running on the same host uses up resources
    2. Running against a remote host produces JMX connectivity issues due to firewalls etc.
  2. Solution:
    1. Enable GC Logging using the following JVM command line arguments (a complete example launch command is sketched after this list)
      -XX:+PrintGCDetails
      -XX:+PrintGCDateStamps
      -XX:+PrintTenuringDistribution
      -Xloggc:%GC_LOG_HOME%/logs/gc.log
    2. Ensure that GC_LOG_HOME is not on a remote host (i.e. there is no network overhead in writing GC logs)
    3. Analyse logs using a tool
      1. Example: GC Viewer https://github.com/chewiebug/GCViewer
  3. Using the GC Viewer Tool to analyse GC Logs
    1. Download the tool
      1. https://github.com/chewiebug/GCViewer
    2. Import a GC log file
    3. View Summary
      1. Gives a summary of the Garbage Collection
      2. Total number of concurrent collections, minor and major collections
      3. Usage Scenario:
        “Your application runs fine but occasionally you see slowed performance … it could be that a Serial Garbage Collector is running, which stops all processing (threads) while it cleans memory. Use the Summary view in GC Viewer to look for long pauses under Full GC Pause”
    4. View Detailed Heap Info
      1. Gives a view of the growing heap sizes in Young and Old generation over time (horizontal axis)
      2. Moving the slide bar moves time along and the “zig-zag” patterns rise and fall with rising memory usage and clearing
      3. Vertical lines show instances when “collections” of the “garbage” take place
        1. Many closely spaced lines indicate frequent collection (and interruption to the application if collections are not parallel) – bad, this means frequent minor collections
        2. Tall and thick lines indicate longer collections (usually a Full GC) – bad, this means longer Full GC pauses
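
Putting the GC logging flags from the Solution step together, a hedged example launch command would look like this (the jar name is a placeholder):

      java -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -Xloggc:%GC_LOG_HOME%/logs/gc.log -jar my-app.jar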

 

API Performance Testing with Gatling

The Problem

“Our API Gateway is falling over and we expect a 6-fold increase in our client base and a 10-fold increase in requests, our backend service is scaling and performing well. Please help us understand and fix the API Gateway” 

Tasks

It was pretty clear we had to run a series of performance tests simulating the current and new user load, then apply fixes to the product (API Gateway), rinse and repeat until we met the client’s objectives.

So the key tasks were:

  1. Do some Baseline tests to compare the API Gateway Proxy to the Backend API
  2. Use tools to monitor the API Gateway under load
  3. Tune the API Gateway to Scale under Increasing Load
  4. Repeat until finely tuned

My experience with API performance testing was limited (Java Clients) so I reached out to my colleagues and got a leg up from our co-founder (smart techie) who created a simple “Maven-Gatling-Scala Project” in Eclipse to enable me to run these tests.

Gatling + Eclipse + Maven

  • What is Gatling? Read more here: http://gatling.io/docs/2.1.7/quickstart.html
  • Creating an Eclipse Project Structure
    • Create a Maven Project
    • Create a Scala Test under /src/test/scala
    • Create gatling.conf, recorder.conf and a logback.xml under /src/test/resources
    • Begin writing your Performance Test Scenario
  • How do we create Performance Test Scenarios?
    • Engine.scala – must extend the Scala App trait
    • Recorder.scala – must also extend the Scala App trait
    • Test – Must extend Gatling’s Simulation; we created an abstract class and extended it in our test cases so we could reuse some common variables (a minimal sketch follows after this list)
    • gatling.conf – This file contains important configuration which determines where the test results go, how the HTTP calls are made, etc. See more details here: http://gatling.io/docs/2.0.0-RC2/general/configuration.html
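
Below is a minimal, hedged sketch of this structure for Gatling 2.x – the base class name, base URL and API path are placeholders I have made up, not the project’s actual code:

      import io.gatling.core.Predef._
      import io.gatling.http.Predef._

      // Shared abstract base class holding common variables (hypothetical names)
      abstract class AbstractAPISimulation extends Simulation {
        val baseUrl  = sys.props.getOrElse("targetBaseUrl", "http://localhost:8080") // placeholder
        val httpConf = http.baseURL(baseUrl).acceptHeader("application/json")
      }

      // One concrete test case extending the shared base class
      class MyAPIOneTest extends AbstractAPISimulation {
        val scn = scenario("API One smoke test")
          .exec(http("get api one").get("/api/one")) // placeholder endpoint

        setUp(scn.inject(atOnceUsers(2))).protocols(httpConf)
      }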

Here is a screenshot of my workspace

[Screenshot: Eclipse workspace with the Maven-Gatling-Scala project structure]

Executing the Gatling Scala Test

  • Use the Maven Gatling Plugin – see https://github.com/gatling/gatling-maven-plugin-demo
  • Make sure your pom file has the gatling-maven-plugin artifact as shown in the screen shot below
  • Use the option “-DSimulationClass” to specify the Test you want to run (For Example: -DSimulationClass=MyAPIOneTest)
  • Use the option “-Dgatling.core.outputDirectoryBaseName” to name the folder your reports are written to (an example command follows below)
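
As a hedged example – this assumes your pom wires the SimulationClass property through to the gatling-maven-plugin as described above, and that the class and report folder names are your own:

      mvn gatling:execute -DSimulationClass=MyAPIOneTest -Dgatling.core.outputDirectoryBaseName=myapione-baseline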

[Screenshot: pom.xml with the gatling-maven-plugin configuration]


Creating A Performance Test Scenario

Our starting point for any sort of technical analysis of the API Gateway was to study what it did over a varying load and build a baseline. Also, this had to be done by invoking a few of the APIs during the load to simulate varying requests per second (for example: one API is invoked every 5 seconds while another is invoked every 10 seconds).

After reading the documentation for Gatling and trying out a couple of simple tests, we were able to replicate our requirement of ‘n’ users with 2 scenarios, each executing HTTP invocations against a different API call with a different ‘pause’ time calculated from the ‘once every x seconds’ requirement. Each scenario took ‘n/2’ users to simulate the ‘agents’ / ‘connections’ making the different calls, and the two scenarios were run in parallel.

  • We used a simple spreadsheet to put down our test ranges – what we would vary, what we would keep constant – along with our expectations for the total number of requests during the test duration and the Requests Per Second (RPS) we expected Gatling to achieve per test
  • We used the first few rows of the spreadsheet to do a dry run, then matched the Gatling report results to our expectations in the spreadsheet and confirmed we were on the right track – that is, as we increased the user base, the RPS numbers and the way the APIs were invoked matched our expectations
  • We then coded the “mvn gatling:execute” calls into a run script which would take in User count, Test Duration and Hold times as arguments and also coded a Test Suite Run Script to run the first script using values from the Spreadsheet Rows
  • We collected the Gatling reports into a reports folder and served it via a simple HTTP server
  • We did extensive analysis of the Baseline Results, monitored our target product using tools (VisualVM, Oracle JRockit), did various tuning (that is another blog post) and re-ran the tests until we were able to see the product scale better under increasing load

Performance Test Setup

The setup has 3 AWS instances created to replicate the Backend API and the API Gateway and to host the performance tester (the Gatling code). We also use Node.js to serve the Gatling reports through a static folder mapping, that is, during each run the Gatling reports are generated into a ‘reports’ folder mapped under the ‘public’ folder of a simple HTTP server written with the Node/Express framework.

[Diagram: Perf Test Setup – Gatling host, API Gateway and Backend API on separate AWS instances]

API Performance Testing Toolkit

The Scala code for the Gatling tests is as follows – notice that “testScenario” and “testScenario2” pause for 3.40 seconds and 6.80 seconds respectively, and the Gatling setUp runs these two scenarios in parallel. See http://gatling.io/docs/2.0.0/general/simulation_setup.html

[Screenshot: Scala source for the Gatling simulation with testScenario and testScenario2]
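
Since the code above is only a screenshot, here is a rough, hedged sketch of the same pattern – the class name, host and API paths are placeholders; only the pause values, the ‘n/2’ user split and the parallel setUp come from the description above:

      import io.gatling.core.Predef._
      import io.gatling.http.Predef._
      import scala.concurrent.duration._

      class GatewayBaselineSimulation extends Simulation { // hypothetical name

        // Run parameters passed in from the run script as -D system properties (names assumed)
        val users           = sys.props.getOrElse("users", "10").toInt
        val durationSeconds = sys.props.getOrElse("durationSeconds", "300").toInt

        val httpConf = http.baseURL("http://api-gateway.example.com") // placeholder host

        // Scenario 1: each user calls the first API roughly once every 3.4 seconds
        val testScenario = scenario("API One")
          .during(durationSeconds.seconds) {
            exec(http("api one").get("/api/one")).pause(3400.milliseconds)
          }

        // Scenario 2: each user calls the second API roughly once every 6.8 seconds
        val testScenario2 = scenario("API Two")
          .during(durationSeconds.seconds) {
            exec(http("api two").get("/api/two")).pause(6800.milliseconds)
          }

        // Both scenarios run in parallel, each taking half of the simulated users
        setUp(
          testScenario.inject(rampUsers(users / 2) over (60.seconds)),
          testScenario2.inject(rampUsers(users / 2) over (60.seconds))
        ).protocols(httpConf)
      }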

Results

  • A 2-user test reveals we coded the scenario correctly and calls are made to the different APIs at the differing intervals, as shown below

[Screenshots: Gatling report for the 2-user test showing the two APIs being invoked at different intervals]

  • As we build out our scenarios, we watch the users ramp up as expected, and the performance of the backend and the gateway degrades as expected with more and more users
    • Baseline testing

[Screenshot: baseline test results]

    • Before tuning

[Screenshot: results before tuning]

  • Finally, we use this information to tune the components (change the Garbage Collector, add a load balancer etc.) and retest until we meet and exceed the desired results
    • After tuning

[Screenshot: results after tuning]

Conclusion

  • Scala/Gatling

High Performance Computing Comes to the Enterprise – Oracle’s Exalogic

Oracle’s Exalogic….

is a hardware platform that outperforms the competition with features like a 40 Gb/sec InfiniBand network link, 30 x86 compute nodes, 360 Xeon cores (2.93 GHz), 2.8 TB DRAM and 960 GB SSD in a full rack. Phew!

Ref: Oracle’s Whitepaper on Exalogic

You can “google” it … search for “Oracle Exalogic” and learn more about the beast, but in short this is a platform that is not only optimized to perform well, but also designed to use fewer resources. So, for example, the power consumption is really low and this is a very green option. Or so says the “Kool-Aid label”.

Application Architects have always fretted over network latency, I/O bottlenecks and general hardware issues. While classical “Computer Science” recommends/insists that the optimization lies in application and algorithmic efficiency, the reality in enterprise environments is that “Information Processing” applications are often (let’s assume) optimized, yet it is hardware issues that cause more problems. Sure, there is no replacement for SQL tuning, code instrumentation etc., but if you are an enterprise invested in a lot of COTS applications – you just want the damn thing to run! Often the “damn thing” does want to run, but it has limited resources and the “scaling” of these resources is not optimized.

This is especially true for 3-tier applications which, despite being optimized (no “select *” queries or bad sort loops), have to run on hardware that performs great in isolation but, when clustered, does not scale as well as High Performance Computing applications do. Why is that?

The problem lies…

in the common protocols used to move data around. Ethernet, and TCP/IP over it, has been the standard for making computers and the applications in them “talk”. Let’s just say that this hardware and protocol can be optimized quite a bit! Well, that’s what has happened with Exalogic and Exadata.

Thanks to some fresh thinking on Oracle’s part, their acquisition of Sun Microsystems, improvements in the Java language (some of the New I/O stuff) and high performance InfiniBand network switches … there is a new hardware platform in town which is juiced up (can I say “pimped out”) to perform!

My joy stems from the fact that Oracle is bringing optimizations employed in High Performance Computing to enterprise hardware – the use of collective communication APIs like Scatter/Gather to improve application I/O throughput and latency (fact: 10 Gbps Ethernet has an MTU of 1.5K, while InfiniBand uses a 64K MTU for the IP-over-InfiniBand protocol and a 32K MTU or more for the Sockets Direct Protocol).


Personally, all this ties in very well with my background in High Performance Computing (see my Master’s in Computer Science report on High Performance Unified Parallel C (UPC) Collectives for Linux/Myrinet Platforms, done at Michigan Tech with Dr. Steve Seidel) … and my experience in Enterprise Application development/architecture.


…here’s my description of Scatter and Gather collective communication written in 2004:

Broadcast using memory pinning and remote DMA access in Linux (basically the network card can access user space directly and do gets and puts)