Benchmarks

Charles Oliver Nutter edited this page Jun 22, 2023 · 3 revisions

Because JRuby runs on the JVM and has various settings for different optimizations, benchmarking it requires a bit more care than benchmarking non-optimizing implementations of Ruby. This document tries to describe the basics of getting good benchmark numbers with JRuby.

Benchmarking with JRuby

Quick recommendations

  • Always try to use the most recent version of the JVM. Newer versions generally perform better and have newer optimizations.
  • Benchmark JRuby with invokedynamic enabled, either by passing -Xcompile.invokedynamic to JRuby (directly or via the JRUBY_OPTS environment variable) or by passing -Djruby.compile.invokedynamic to the JVM.
  • Run sufficient iterations for the application to warm up and results to level off.
  • Try other GC modes on the JVM, such as the parallel collector, which is enabled by passing -XX:+UseParallelGC to the JVM (or -J-XX:+UseParallelGC when passed to JRuby). The parallel collector currently provides the best throughput on JRuby.
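One way to check that results have leveled off is to time the same workload repeatedly and compare the per-iteration numbers. The following is a minimal sketch, not a JRuby team tool: `workload` and `time_iterations` are hypothetical names, and the iteration count of 10 is an arbitrary assumption that most real benchmarks will need to raise.

```ruby
require 'benchmark'

# Hypothetical workload; substitute the code you actually want to measure.
def workload(n)
  (1..n).reduce(0) { |sum, i| sum + i * i }
end

# Run the given block `iterations` times and return the wall-clock
# seconds taken by each run, in order.
def time_iterations(iterations)
  Array.new(iterations) { Benchmark.realtime { yield } }
end

times = time_iterations(10) { workload(100_000) }
times.each_with_index do |seconds, i|
  puts format('iteration %2d: %.4fs', i + 1, seconds)
end
```

If the later iterations are still getting faster, the code is probably still warming up and the iteration count should be increased before any numbers are compared.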

Ensure your Ruby code is compiling

Pass -Xjit.logging to JRuby to monitor the JIT and confirm that it is compiling your code. Methods that fail to JIT-compile will perform significantly worse. Such compilation failures should be reported to the JRuby team.

Other guides on the web can provide additional recommendations for tuning and benchmarking JVM-based applications.

Monitor GC overhead

The JVM has many mechanisms for monitoring the garbage collector. The simplest of these – for HotSpot-based JVMs like the OpenJDK builds maintained by Oracle, Red Hat, Amazon, Twitter, Microsoft, and Azul – is to pass -XX:+PrintGCDetails to the JVM (prefixed with -J if passed to JRuby). On JDK 9 and later, this flag is deprecated in favor of unified logging via -Xlog:gc*.

Excessive GC may indicate a problem or missing optimization in JRuby, or it may indicate an area of excess allocation in your application.

Note also that the JVM will try to use a large amount of memory to give its GC room to work, so a direct comparison of JRuby's memory footprint without GC tuning will be misleading. You can use JVM flags like -Xmx<size> to set a smaller maximum heap, but smaller heaps may spend more time in GC.

Avoid heavy IO

It is also important to avoid heavy IO (reading/writing to files, sockets, or console) unless you're actually benchmarking IO or it's necessary for the code you are benchmarking. IO skews execution performance tremendously and can produce results that vary based on many system-level factors. IO is also sometimes slower in JRuby, so benchmarks of execution performance will be inaccurate when heavy IO is involved.
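A simple way to keep IO out of the measurement is to collect results in memory inside the timed region and report them only afterwards. This is a sketch under that assumption; `fib` is a hypothetical stand-in for the code being benchmarked.

```ruby
require 'benchmark'

# Hypothetical CPU-bound work standing in for the code under test.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

results = []
elapsed = Benchmark.realtime do
  # Collect results in memory; no puts or file writes inside the timed region.
  20.times { |i| results << fib(i) }
end

# Report once, after timing has finished.
puts "computed #{results.size} values in #{format('%.4f', elapsed)}s"
puts results.last
```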

Watch for excess exception throwing and thread-based fibers

Excessive throwing of exceptions will greatly reduce performance on JRuby, due to the high cost of generating a stack trace on the JVM. You can monitor stack trace generation for exceptions and Kernel#caller by passing -Xlog.backtraces to JRuby.
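To illustrate the kind of code worth looking for, the sketch below compares a lookup that raises on every miss with one that returns a default instead. `PRICES`, `price_with_rescue`, and `price_with_default` are hypothetical names, and the actual difference will depend on your JVM, JRuby version, and stack depth.

```ruby
require 'benchmark'

PRICES = { 'apple' => 3, 'pear' => 5 }.freeze

# Exception-based control flow: every miss raises KeyError,
# which has to capture a backtrace before it is rescued.
def price_with_rescue(name)
  PRICES.fetch(name)
rescue KeyError
  0
end

# Same result without raising: a miss simply returns the default.
def price_with_default(name)
  PRICES.fetch(name, 0)
end

# Half of these names are missing from PRICES.
names = %w[apple kiwi pear plum] * 5_000

with_rescue  = Benchmark.realtime { names.each { |n| price_with_rescue(n) } }
with_default = Benchmark.realtime { names.each { |n| price_with_default(n) } }

puts format('rescue:  %.4fs', with_rescue)
puts format('default: %.4fs', with_default)
```

Running the first variant with -Xlog.backtraces should show a backtrace being generated for each miss.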

Fibers can also hinder performance on pre-Loom JVMs, where JRuby implements them with native threads. Excessive fiber creation is usually the problem when a process has hundreds of idle threads, and thread-based fibers are slower to start up and context-switch than fibers based on Loom's virtual threads. Virtual threads will be used automatically on a Loom-based JVM with virtual threading enabled (such as via the --enable-preview JVM flag).

Utilize multiple cores

JRuby is capable of utilizing all CPU cores in your system thanks to the JVM's excellent support for threads. For benchmarks where CRuby would use multiple processes, try to utilize multiple threads in the same JRuby/JVM instance.
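As a rough sketch of what this can look like, the example below splits a CPU-bound task across several threads in one process. `count_primes` and `parallel_count_primes` are hypothetical helpers, and the thread count of 4 is an assumption you would normally match to the number of available cores. The same code runs on CRuby, but there the threads will not execute Ruby code in parallel.

```ruby
# Hypothetical CPU-bound task: count the primes in a range by trial division.
def count_primes(range)
  range.count { |n| n > 1 && (2..Integer.sqrt(n)).none? { |d| (n % d).zero? } }
end

# Split 1..limit into one contiguous slice per thread, count each slice
# in its own thread, and add up the per-thread results.
def parallel_count_primes(limit, thread_count)
  slice = (limit.to_f / thread_count).ceil
  threads = (0...thread_count).map do |i|
    first = i * slice + 1
    last = [first + slice - 1, limit].min
    Thread.new { count_primes(first..last) }
  end
  threads.sum(&:value)
end

puts parallel_count_primes(20_000, 4)
```

Each thread works on its own slice and shares no mutable state, so no locking is needed; `Thread#value` joins the thread and returns the block's result.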

Report results and issues

The JRuby team is standing by to help diagnose performance problems! If your application does not perform as well on JRuby as you expect, let the team know with a message or issue and we will help you find the problem.
