Java performance extract

Benchmarks:

  • micro
  • meso
  • macro
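A micro benchmark is only meaningful once the code has been JIT-compiled, so it needs a warm-up phase. Below is a minimal hand-rolled sketch of that idea; the class `MicroBenchSketch` and its method are hypothetical names for illustration, and a dedicated harness such as JMH is usually a better choice for real measurements.

```java
import java.util.function.IntSupplier;

// Hypothetical helper for illustration, not part of any library.
final class MicroBenchSketch {
    // Results are written here so the JIT cannot discard the measured work as dead code.
    static volatile int sink;

    // Runs the task `warmup` times untimed, then returns the average
    // nanoseconds per call over `measured` timed runs.
    static double averageNanos(IntSupplier task, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) {
            sink = task.getAsInt();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            sink = task.getAsInt();
        }
        return (System.nanoTime() - start) / (double) measured;
    }

    public static void main(String[] args) {
        // 20,000 warm-up calls is an assumption chosen to exceed typical compile thresholds.
        double nanos = averageNanos(() -> Integer.toString(12345).hashCode(), 20_000, 100_000);
        System.out.printf("about %.1f ns per call%n", nanos);
    }
}
```

The numbers from a loop like this are rough; they are useful for comparing two variants on the same machine, not as absolute figures.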

Tools:

  • non-Java: CPU, disk, network
  • Java tools: flags, heap, GC

Profilers: sampling, instrumentation, native

JIT: client (default compile threshold is 1500), server (default compile threshold is 10000), tiered compilation

Tuning JIT:

  • code cache
  • compilation thresholds – how many times code will be interpreted before it gets compiled
  • printing compilation logs
  • compilation threads (the number can be adjusted)
  • inlining (size limit for inlining a hot method: 325 bytes of bytecode by default)
  • escape analysis: very effective, but can expose bugs in improperly synchronized code
  • de-optimisation
  • tiered compilation levels
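Most of the items above are set with HotSpot command-line flags (for example `-XX:+PrintCompilation` or `-XX:ReservedCodeCacheSize`). Before tuning, it can help to see how much time the JIT actually spends compiling. A small sketch using the standard `java.lang.management` API follows; the class name `JitInfoSketch` is made up for illustration.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration.
final class JitInfoSketch {
    // Returns the JIT compiler name and, when the JVM supports it,
    // the accumulated compilation time.
    static String describe() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler (interpreted only)";
        }
        String text = jit.getName();
        if (jit.isCompilationTimeMonitoringSupported()) {
            text += ", total compilation time " + jit.getTotalCompilationTime() + " ms";
        }
        return text;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```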

GC: serial, throughput (parallel), concurrent (CMS), G1

GC generation: new (eden, survivor spaces), old
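The generations show up as memory pools in a running JVM, and each collector reports which pools it manages. The sketch below lists them through the standard management beans; pool and collector names depend on the JVM and the selected GC, so no particular output is assumed. `GcLayoutSketch` is a hypothetical name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration.
final class GcLayoutSketch {
    // Collects the names of all heap memory pools (eden, survivor, old, or the
    // equivalents of the active collector).
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("heap pools: " + heapPoolNames());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " manages " + Arrays.toString(gc.getMemoryPoolNames())
                    + ", collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}
```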

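All collectors stop the world while collecting the new generation (eden). For the old generation, the serial and throughput collectors also stop the world (lower CPU consumption), while CMS and G1 can do most of the work without a stop-the-world pause (higher CPU consumption)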

Serial GC (32-bit JVM, single-core machine or Windows) – for client:

  • single threaded
  • stop-the-world for new or old generation processing

Throughput (Unix, multi-core, 64-bit):

  • multi-threaded
  • stop-the-world for new or old generation processing

CMS:

  • multiple threads for new generation
  • for old generation, one or more background threads scan for objects to free without stopping the application, but the old generation remains fragmented
  • a stop-the-world pause still happens, though rarely, to defragment the old generation heap; usually when there is no space left to allocate a new object

G1:

  • for large heaps (more than 4GB), divides the heap into regions

System.gc() by default triggers a stop-the-world full collection for all types of GC

Tuning GC:

  • sizing heap (too small: GC runs too often; too big: the OS may swap memory to disk)
  • sizing generations
  • PermGen/Metaspace: holds information about loaded classes; resizing it is expensive, so it is better to set its size at startup
  • controlling the number of GC threads
  • adaptive sizing (should be turned on)
  • large objects
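Heap sizing is normally done with the `-Xms` and `-Xmx` flags. To check what the JVM actually got and how much of it is in use, something like the following sketch can be logged at startup or periodically; `HeapSizeSketch` is a hypothetical name and the MB rounding is only for readability.

```java
// Hypothetical helper for illustration.
final class HeapSizeSketch {
    // Bytes currently occupied by objects (live or not yet collected).
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // One-line summary: used heap, heap currently committed by the JVM,
    // and the maximum the heap may grow to.
    static String summary() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        return "used=" + usedBytes() / mb + "MB"
                + ", committed=" + rt.totalMemory() / mb + "MB"
                + ", max=" + rt.maxMemory() / mb + "MB";
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

If the used value stays close to the maximum and GC runs constantly, the heap is probably too small; a maximum above the available physical RAM risks swapping.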

Tuning threads:

  • pool size
  • thread stack size
  • avoid synchronization
  • thread priorities
  • adjusting spinning
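As an illustration of the first and third items, here is a sketch that sizes a fixed pool to the number of cores and counts with `LongAdder` instead of a synchronized counter. The pool size is an assumption that suits CPU-bound tasks; I/O-bound work usually wants more threads. `ThreadTuningSketch` is a hypothetical name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper for illustration.
final class ThreadTuningSketch {
    // Runs `tasks` jobs on a fixed pool; each job increments a shared counter
    // `incrementsPerTask` times. Returns the final count.
    static long countInParallel(int tasks, int incrementsPerTask) {
        // Assumption: one thread per core is a reasonable start for CPU-bound work.
        int poolSize = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // LongAdder spreads contended updates over several cells, so no lock is needed.
        LongAdder counter = new LongAdder();
        for (int t = 0; t < tasks; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerTask; i++) {
                    counter.increment();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(countInParallel(8, 1_000));
    }
}
```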

Persistence:

  • choose the right driver (try different ones)
  • prepared statements and statement pooling
  • connection pools
  • transaction pools
  • cached queries
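The idea behind a connection pool is simply to create expensive resources once and hand them out repeatedly. The sketch below shows the mechanism with a generic, made-up `ResourcePoolSketch` class; it is not how any particular JDBC pool is implemented, and a real application would normally use an existing pooling library, which also handles validation, timeouts and leaks.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical fixed-size pool for illustration only.
final class ResourcePoolSketch<T> {
    private final BlockingQueue<T> idle;

    // Creates all resources up front, so the cost is paid once at startup.
    ResourcePoolSketch(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    // Blocks until a resource is free, then hands it out.
    T borrow() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Returns a resource to the pool; throws IllegalStateException if more
    // resources are released than the pool can hold.
    void release(T resource) {
        idle.add(resource);
    }

    int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        // Object stands in for a database connection here.
        ResourcePoolSketch<Object> pool = new ResourcePoolSketch<>(2, Object::new);
        Object conn = pool.borrow();
        try {
            System.out.println("idle while in use: " + pool.idleCount());
        } finally {
            pool.release(conn);
        }
    }
}
```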

Other optimisations:

  • reuse Random
  • JNI is not a solution for performance
  • Exceptions are not always an issue
  • One-line string concatenation is faster than multiline
  • Lambdas and anonymous classes have the same performance, but lambdas load faster
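Two of these points in code form, as a rough sketch with made-up names: one shared `Random` instead of a new instance per call, and a single concatenation expression instead of several `+=` statements, each of which builds an intermediate string. In multi-threaded code `ThreadLocalRandom.current()` is the usual alternative to a shared `Random`.

```java
import java.util.Random;

// Hypothetical helper for illustration.
final class SmallOptimisationsSketch {
    // Created once and reused; constructing a Random per call is wasted work.
    private static final Random SHARED = new Random();

    // Returns a value from 1 to 6 using the shared generator.
    static int nextDie() {
        return 1 + SHARED.nextInt(6);
    }

    // One expression: the compiler can build the result in a single pass.
    static String oneLine(String a, String b, String c) {
        return a + ":" + b + ":" + c;
    }

    // Same result, but every += creates a new intermediate String.
    static String multiLine(String a, String b, String c) {
        String s = a;
        s += ":";
        s += b;
        s += ":";
        s += c;
        return s;
    }

    public static void main(String[] args) {
        System.out.println(oneLine("x", "y", "z") + " roll=" + nextDie());
    }
}
```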

About DmitryKrinitsyn
Software developer and muay thai adept
