
BumbleBench and HumbleBench

Shelley Lambert edited this page Jul 26, 2018 · 2 revisions

BumbleBench is a microbenchmark tool intended to make it as easy as possible to avoid common pitfalls when microbenchmarking Java. It is intended to make sure that test runs spend most of their time running the desired piece of code compiled at the highest possible level of quality. This is surprisingly tricky in an environment with dynamic compilation employing aggressive speculative optimizations.

The name "BumbleBench" derives from the manner in which the tool varies the iteration count of the benchmark's main loop in order to determine the highest iteration count that can be completed within a given target duration. The target score vacillates around the estimated maximum achievable score, alternating between low and high target scores in an attempt to converge on the actual achievable score, while remaining sensitive to variations in performance that can occur due to effects like jit compilation occurring during the run.

Quick start

The 30-second introduction

What: BumbleBench is a JAR file containing two classes (MicroBench and MiniBench) that you can extend to implement your microbenchmark by writing its inner loop. BumbleBench automatically adjusts the iteration count of your loop to measure how many iterations can complete within a given target duration.

When: BumbleBench is appropriate any time you have a workload that you would like to run many times in order to gauge its speed.

Where: BumbleBench is on GitHub.

Why: The priority for BumbleBench is ease of use. The intent is that you can read this Quick Start guide in five minutes, and then write a microbenchmark, and it will behave the way you want.

How to write tests: Write a benchmark class that extends MicroBench or MiniBench, add it to the jar file, and run java -jar BumbleBench.jar [Benchmark]. Additional options can be set using the java -D option, or by providing a .properties file alongside your benchmark class.

How to build your jar: You can build a jar from source in the RTC client; use its build.xml to run an ant build.

Who: Message the #testing Slack channel if you need help.

Running a benchmark

You can list the available benchmarks with this command:

 java -jar BumbleBench.jar

Then you can choose one of the benchmarks, and run it with this command:

 java -jar BumbleBench.jar [Benchmark name]

BumbleBench options are set using -DBumbleBench.xxx=yyy. To see a list of all available options for your benchmark, set the listOptions option:

 java -DBumbleBench.listOptions -jar BumbleBench.jar [Benchmark name]

(For boolean options, the =yyy part can be omitted.)
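Under the hood, options passed with -D are ordinary Java system properties. A minimal sketch of how such a setting can be read, with a fallback default (the helper name intOption and its behaviour are illustrative assumptions, not BumbleBench's actual option API):

```java
public final class OptionDemo {
    // Hypothetical helper: read a -DBumbleBench.<name>=<value> setting,
    // falling back to a default when the property was not supplied
    static int intOption(String name, int defaultValue) {
        String v = System.getProperty("BumbleBench." + name);
        return (v == null) ? defaultValue : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        // Simulates passing -DBumbleBench.targetDuration=2 on the command line
        System.setProperty("BumbleBench.targetDuration", "2");
        System.out.println(intOption("targetDuration", 1)); // 2: explicitly set
        System.out.println(intOption("somethingElse", 42)); // 42: falls back to the default
    }
}
```

Reading settings this way is what lets them be stored in static final fields, which the JIT can treat as constants.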

Profiling

BumbleBench has a facility to finish its run with a single extra-long batch, during which you can run a profiling tool like tprof. To enable this feature, use -DBumbleBench.longBatchSeconds=nnn. This will cause BumbleBench to wait for you to hit Enter, and then perform one batch of the specified duration:

 -= BumbleBench series 2 version 3.2 running net.adoptopenjdk.bumblebench.examples.TrigBench  Mon Sep 29 11:22:26 EDT 2014 =-
 
               Target    Est     Uncert% MaxPeak Peak    Peak%   %paused
     0.0s:  >! 110       120.0    24.0   110     110     470.0
     0.0s:  >! 134.4     148.8    28.8   134.4   134.4   490.1
     0.0s:  >! 170.2     191.7    34.6   170.2   170.2   513.7
   TrigBench score: 1.1853785E7 (11.85M)
       uncertainty:   0.8%
 
    -- LONG BATCH --
 Press <Enter> to begin a 30-second batch...
 
 Running for 30 seconds...
 ...done.

Multi-threaded runs

BumbleBench offers a way to run multiple copies of the same benchmark in multiple threads.

The basic technique uses -DBumbleBench.parallelInstances=N. This causes BumbleBench to create N instances of your benchmark class and run them in N threads. Each instance of your benchmark is controlled using a pair of BlockingQueues that communicate target and result scores for each batch to an instance of ParallelBench, which manages the threads and aggregates the results.

There are three settings for the aggregationStyle option, which controls how the benchmark score is computed:

AVERAGE: The score is the arithmetic mean of all scores achieved by the threads. This is the default. If the benchmark scales perfectly, then you should get the same score with parallelInstances as without it.

SUM: The score is the total of all scores achieved by the threads.

MIN: The score is the lowest score achieved by any of the benchmark threads. This is effectively the score achieved by the longest-running thread.
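The three aggregation styles reduce the per-thread scores to a single number. A self-contained sketch of the arithmetic (the class and method names here are illustrative, not ParallelBench's actual API):

```java
import java.util.Arrays;

public final class AggregationDemo {
    enum AggregationStyle { AVERAGE, SUM, MIN }

    // Reduce per-thread scores to one benchmark score (illustrative logic)
    static double aggregate(AggregationStyle style, double[] threadScores) {
        switch (style) {
            case SUM: return Arrays.stream(threadScores).sum();
            case MIN: return Arrays.stream(threadScores).min().orElse(Double.NaN);
            default:  return Arrays.stream(threadScores).average().orElse(Double.NaN); // AVERAGE
        }
    }

    public static void main(String[] args) {
        double[] scores = { 10.0, 12.0, 8.0 };  // scores from three parallel instances
        System.out.println(aggregate(AggregationStyle.AVERAGE, scores)); // 10.0
        System.out.println(aggregate(AggregationStyle.SUM, scores));     // 30.0
        System.out.println(aggregate(AggregationStyle.MIN, scores));     // 8.0
    }
}
```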

Class-per-instance runs

If your benchmark class has mutable static fields, you may still be able to run parallel instances. If you specify the option flag classPerInstance, BumbleBench will load and initialize your benchmark class N times, each with its own separate copy of the static fields.

This mode is not recommended, because it can cause the JIT to do weird things like compiling the exact same code repeatedly for each instance. It's meant as a workaround to get easy parallel runs of a benchmark that was not designed with parallelism in mind. If you care about parallel performance measurement, it's better to write the benchmark to use instance variables instead of statics.

Note that only the main benchmark class is loaded multiple times. If your benchmark uses other global data, it probably needs to be modified in order to run in parallel.

Writing a benchmark

The basic MicroBench looks like this:

    protected long doBatch(long numIterations) throws InterruptedException {
       for (long i = 0; i < numIterations; i++) {
          // WORKLOAD GOES HERE
       }
       return numIterations;
    }

The basic MiniBench is a little more complex, but a surprising number of benchmarks make good use of its nested loop structure:

    protected int maxIterationsPerLoop() { return 1234567; } // Max allowable value of numIterationsPerLoop

    protected long doBatch(long numLoops, int numIterationsPerLoop) throws InterruptedException {
       // YOU CAN ALLOCATE SOME RESOURCES OR DO OTHER SETUP HERE ...
       for (long i = 0; i < numLoops; i++) {
          // ... AND HERE.
          for (int j = 0; j < numIterationsPerLoop; j++) {
             startTimer();
             // WORKLOAD GOES HERE - preferably just a call to a method that does the actual work
             pauseTimer();
          }
       }
       return numLoops * numIterationsPerLoop;
    }

Briefly:

  • For a MicroBench:
    • Implement long doBatch(long numIterations) to do the requested number of iterations and return the number of iterations performed.
    • By default, the timer is always running. If you have portions of your workload you don't want timed, you'll need to call pauseTimer() and startTimer() around those portions.
  • For a MiniBench:
    • Implement long doBatch(long numLoops, int numIterationsPerLoop) to do the logic for one batch and return the number of iterations performed.
    • Implement int maxIterationsPerLoop() to indicate the limit on the numIterationsPerLoop parameter for doBatch.
    • By default, the timer is not running when doBatch begins. Call startTimer() and pauseTimer() around the portions of the batch logic that you want to measure.

Random hints

  • Make your benchmark class final, and make as many as possible of its fields final
  • Implement your setup parameters as final static fields initialized using the option method
  • Try to set up your test so it scales properly, i.e., the score should be largely invariant with respect to the setup parameters, including targetDuration and the number of iterations per batch.
  • Put the timed portion of a MiniBench in its own method. That makes it easier to study in isolation. The size of the inner loop should be arranged to make the call overhead insignificant, and anyway, it can be measured by disabling inlining.
  • Use locals instead of fields wherever convenient. You can even use a local to privatize a field before startTimer if you want.
  • Naming:
    • static final fields should use UPPERCASE_NAMES
    • other fields (static or instance fields) should begin with an underscore to distinguish them from locals, and otherwise use camelCase
  • Import classes, not whole packages
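The "privatize a field" hint above can be sketched as follows. This is a minimal illustration of the pattern (the class and field names are invented for the example, not taken from BumbleBench):

```java
public final class LocalsDemo {
    double _seed = 0.1;  // instance field, underscore-prefixed per the naming hints above

    // Copy the field into a local before the hot loop and write it back afterward,
    // so the loop body touches only locals
    double run(long numIterations) {
        double v = _seed;                      // one field read, outside the loop
        for (long i = 0; i < numIterations; i++)
            v = 4.6 * Math.sin(v);             // only local variables in the loop
        _seed = v;                             // one field write, after the loop
        return v;
    }

    public static void main(String[] args) {
        LocalsDemo d = new LocalsDemo();
        System.out.println(d.run(1000) == d._seed); // true: field updated once, after the loop
    }
}
```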

HumbleBench

HumbleBench is a subclass of MiniBench intended to facilitate benchmarking at low optimization levels by throttling the real workload. By default HumbleBench will try to run doBatch for a third of a percent of the time. That should be enough to compile at warm, but not at hot. However, it's important to confirm with a verbose log that methods have been compiled at the expected level.

    @Override protected final void setup(int numIterations) { // optional
       // YOU CAN ALLOCATE SOME RESOURCES OR DO OTHER SETUP HERE
    }

    public static class Workload extends AbstractWorkload {
       @Override public void doBatch(HumbleBench bench, int numIterations) {
          for (int i = 0; i < numIterations; i++) {
             // WORKLOAD GOES HERE
          }
       }
    }

See net.adoptopenjdk.bumblebench.humble.SameStringsEqualsBench for an example.

HumbleBench has two main options:

  • loadFactor=N: Spend 1/N of the time in the workload method (default 300)
  • fanout=N: Load N copies of the workload to allow shorter batches (default 12)
Rough correspondence between optimization level and loadFactor:

  • warm: loadFactor=300
  • hot: loadFactor=50
  • scorching: loadFactor=1

If a workload makes calls that are not inlined, fanout will be ineffective at reducing the time spent in the callee. For this reason, fanout must not be set too high in workloads where inlining is expected at warm; otherwise, the callee's high fan-in will prevent inlining.

There is a heuristic for setting the default batchTargetDuration to avoid overlong batches. If overriding this default, note that targetIncludesPauses defaults to false under HumbleBench.

Overview

The Motivation

Java microbenchmarks are notoriously hard to write, and can easily end up measuring the wrong thing. BumbleBench is maintained by people experienced in Java performance tuning, and is designed to make microbenchmarking less error-prone.

The goal is to allow people to write microbenchmarks like they do in a static language, with one or two little loops that just run the desired workload. In Java that typically does not work well, but BumbleBench is designed to make it work as well as possible.

The Gist

BumbleBench allows people to write microbenchmarks by implementing a single method called doBatch:

    protected long doBatch(long numIterations) throws InterruptedException {
       for (long i = 0; i < numIterations; i++) {
          // WORKLOAD GOES HERE
       }
       return numIterations;
    }

BumbleBench calls this method repeatedly with various values of numIterations, with the objective of making the doBatch call last for a specified target duration (one second by default). It alternates between passing numIterations values it believes are too low and too high to finish within the target duration; these are called "lowball" and "highball" guesses. Whenever a guess is right, the "uncertainty" gets smaller, causing BumbleBench to attempt lowball and highball values closer and closer together. When a guess is wrong, the uncertainty increases. (Uncertainty can also increase if a lowball guess is very low, or a highball guess is very high.)
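The narrowing idea can be modeled as a simple bisection: each batch that finishes in time raises the known-achievable bound, and each batch that misses the deadline lowers the known-unachievable bound. This is a deliberately simplified model (the real heuristic alternates lowball and highball targets and widens the range on surprises); trueRate, converge, and all names here are illustrative:

```java
public final class ConvergenceDemo {
    // Bisect toward the highest achievable score: successes raise the lower bound,
    // failures lower the upper bound (simplified; BumbleBench's real heuristic
    // also grows the uncertainty when a guess is surprising)
    static double converge(double trueRate, double upperGuess, int batches) {
        double achievable = 0, unachievable = upperGuess;
        for (int i = 0; i < batches; i++) {
            double target = (achievable + unachievable) / 2;  // next guess between the bounds
            if (target <= trueRate) achievable = target;      // the batch finished in time
            else                    unachievable = target;    // the batch missed the deadline
        }
        return achievable;
    }

    public static void main(String[] args) {
        // A workload that can truly do 1,000,000 iterations per target duration
        System.out.println(Math.round(converge(1_000_000, 4_000_000, 50))); // 1000000
    }
}
```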

The output

The output of a run looks like this:

              Target    Est     Uncert% MaxPeak Peak    Peak%   %paused
    0.0s:  >! 110       110.0K   24.0   110     110     470.0
    0.0s:  >! 123.2K    3.850M   28.8   123.2K  123.2K  1172.2
    0.4s:  >! 4.404M    11.29M   34.6   4.404M  4.404M  1529.8
    1.6s:  <  13.24M    11.73M   20.7   4.404M  4.404M  1529.8
    2.5s:  >  10.52M    11.74M   12.4   10.52M  10.52M  1616.8
    3.5s:  <  12.47M    11.57M    7.5   10.52M  10.52M  1616.8
    4.5s:  >  11.14M    11.70M    4.5   11.14M  11.14M  1622.6
    5.5s:  <  11.97M    11.80M    2.7   11.14M  11.14M  1622.6
    6.5s:  >  11.64M    11.86M    1.6   11.64M  11.64M  1627.0
    7.5s:  <  11.95M    11.80M    1.0   11.64M  11.64M  1627.0
    8.5s:  >  11.74M    11.78M    0.6   11.74M  11.74M  1627.9
    9.5s:  <  11.81M    11.75M    0.3   11.74M  11.74M  1627.9
   10.5s:  >  11.73M    11.79M    0.2   11.74M  11.74M  1627.9
   -- ballpark --
   11.5s:  <  11.80M    11.70M    0.3   11.74M  11.74M  1627.9
   12.5s:  >  11.68M    11.77M    0.3   11.74M  11.74M  1627.9
   13.5s:  >! 11.78M    11.84M    0.4   11.78M  11.78M  1628.2
   14.5s:  >! 11.86M    11.89M    0.4   11.86M  11.86M  1628.9
   15.5s:  <  11.91M    11.81M    0.5   11.86M  11.86M  1628.9
   16.5s:  <! 11.78M    11.69M    0.6   11.86M  -∞      --
   17.5s:  >  11.66M    11.84M    0.7   11.86M  11.66M  1627.1
   18.6s:  <  11.88M    11.80M    0.4   11.86M  11.66M  1627.1
   19.6s:  <! 11.77M    11.45M    0.5   11.86M  11.66M  1627.1
   20.6s:  >  11.42M    11.76M    0.6   11.86M  11.66M  1627.1
   21.6s:  <  11.80M    11.79M    0.4   11.86M  11.66M  1627.1
   22.6s:  <! 11.76M    11.69M    0.5   11.86M  11.66M  1627.1
   23.6s:  >  11.67M    11.77M    0.6   11.86M  11.67M  1627.2
   24.6s:  <  11.80M    11.66M    0.7   11.86M  11.67M  1627.2
   25.6s:  <! 11.63M    11.59M    0.8   11.86M  -∞      --
   26.6s:  >  11.54M    11.59M    0.5   11.86M  11.54M  1626.2
   27.6s:  >! 11.62M    11.75M    0.6   11.86M  11.62M  1626.8
   28.6s:  >! 11.78M    11.78M    0.7   11.86M  11.78M  1628.2
   29.6s:  >! 11.82M    11.83M    0.8   11.86M  11.82M  1628.6
   30.6s:  <  11.88M    11.85M    0.5   11.86M  11.82M  1628.6
   31.6s:  <! 11.82M    11.81M    0.6   11.86M  -∞      --
   32.6s:  >  11.77M    11.84M    0.4   11.86M  11.77M  1628.1
   33.6s:  <  11.86M    11.78M    0.4   11.86M  11.77M  1628.1
   34.6s:  >  11.76M    11.79M    0.3   11.86M  11.77M  1628.1
   35.6s:  >! 11.81M    11.82M    0.3   11.86M  11.81M  1628.4
   36.6s:  <  11.84M    11.81M    0.2   11.86M  11.81M  1628.4
   37.6s:  >  11.80M    11.86M    0.2   11.86M  11.81M  1628.4
   38.6s:  <  11.87M    11.86M    0.1   11.86M  11.81M  1628.4
   39.6s:  >  11.86M    11.86M    0.1   11.86M  11.86M  1628.8
   40.6s:  >! 11.86M    11.91M    0.1   11.86M  11.86M  1628.9
   41.6s:  <  11.91M    11.77M    0.1   11.86M  11.86M  1628.9
   42.6s:  >  11.77M    11.81M    0.1   11.86M  11.86M  1628.9
   -- finale --
   43.6s:  >! 11.82M    11.83M    0.2   11.86M  11.86M  1628.9
   44.6s:  <  11.84M    11.83M    0.1   11.86M  -∞      --
   45.6s:  <! 11.82M    11.80M    0.1   11.86M  -∞      --
   46.6s:  <! 11.79M    11.69M    0.1   11.86M  -∞      --
   47.6s:  >  11.68M    11.88M    0.2   11.86M  11.68M  1627.3
   48.6s:  <  11.89M    11.77M    0.2   11.86M  11.68M  1627.3
   49.6s:  >  11.76M    11.77M    0.1   11.86M  11.76M  1628.0
   50.6s:  <  11.78M    11.65M    0.1   11.86M  11.76M  1628.0
   51.6s:  >  11.64M    11.70M    0.2   11.86M  11.76M  1628.0
   52.6s:  >! 11.71M    11.83M    0.2   11.86M  11.76M  1628.0
   53.6s:  <  11.84M    11.80M    0.3   11.86M  11.76M  1628.0
   54.6s:  >  11.78M    11.88M    0.3   11.86M  11.78M  1628.2
   55.6s:  <  11.90M    11.75M    0.4   11.86M  11.78M  1628.2
   56.6s:  <! 11.73M    11.67M    0.4   11.86M  -∞      --
   57.6s:  >  11.65M    11.78M    0.5   11.86M  11.65M  1627.1
 
  TrigBench score: 1.1859834E7 (11.86M 1628.9%)
      uncertainty:   0.5%

There's one line per batch (literally a call to doBatch). The columns are:

Target: The number of iterations requested for this batch

Est: The estimated throughput achieved, based on the actual number of iterations performed and the actual duration

Uncert%: The current uncertainty, or difference between lowball and highball estimates

MaxPeak: The largest successful numIterations value ever recorded

Peak: The largest successful numIterations value in the current run, where a "run" ends at an unsuccessful lowball batch

Peak%: The "Peak" value expressed in log points, to help with mental math

%paused: The fraction of the run where the timer was paused. If the timer was never paused, this will be blank.
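The "log points" in the Peak% column appear to be 100 times the natural logarithm of the score; the sample output above is consistent with that reading (110 gives 470.0, and the final 11.86M score gives 1628.9). A quick check, under that assumption:

```java
public final class LogPointsDemo {
    // Assumed conversion: log points = 100 * ln(score)
    static double logPoints(double score) { return 100.0 * Math.log(score); }

    public static void main(String[] args) {
        // Rounded to one decimal place, matching the sample output columns
        System.out.println(Math.round(logPoints(110) * 10) / 10.0);         // 470.0
        System.out.println(Math.round(logPoints(1.1859834e7) * 10) / 10.0); // 1628.9
    }
}
```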

In practical terms:

  • "Target" isn't terribly useful under normal circumstances.
  • "Est" usually best characterizes the current instantaneous performance of the benchmark.
  • "Uncert%" indicates the precision of the measurements.
  • "MaxPeak" is usually the best "overall score" indicator for the benchmark.
  • "Peak" is like "MaxPeak", but if performance suffers a drop during the run, "Peak" will reflect that while "MaxPeak" will ignore it.
In addition, there are a number of symbols on the left-hand side:

  • <: failure: the doBatch call finished after the target duration had elapsed
  • >: success: the doBatch call finished before the target duration had elapsed
  • !: the result was surprising (a lowball run failed, or a highball run succeeded)
  • ?: the workload has declined to provide throughput estimates, opting to give just "pass" and "fail" results

There are two additional lines that indicate different "phases" of BumbleBench:

-- ballpark --: At this point, BumbleBench has some confidence the score is near its eventual final value. It could be wrong, of course, so it runs the benchmark for some additional time to make sure the score is stable.

-- finale --: At this point, BumbleBench is ready to finish up. However, if the current score has regressed from a prior max-peak value, then the benchmark may actually be unable to sustain the measured max-peak score. To avoid egregiously temporary max-peak scores, BumbleBench reduces the max-peak score to the current peak score, and runs for a small amount of additional time to allow the score to regain its prior max-peak if it can.

Some Design Highlights

  1. Configuration is all through -D property settings, instead of command line arguments, allowing the use of static final fields (which are readily optimized) that can be altered without recompiling the program. Try -DBumbleBench.listOptions to see all the options available.
  2. All microbenchmark kernel logic can go into the single doBatch method. This encourages the use of local variables, which are readily optimized. In particular, BumbleBench offers pauseTimer() and startTimer() methods that make it easy to record the timing of just a portion of the doBatch logic, further encouraging people to put the whole kernel in one method.
  3. The BumbleBench design encourages a style of one benchmark per program. Running multiple benchmarks in succession in a single Java program can cause phase changes in common infrastructural classes, distorting the performance of the later benchmarks.
  4. doBatch can return the number of iterations actually performed. If it is awkward to perform the actual desired number of iterations, doBatch can perform a few more or a few less without harming the accuracy of the timing results. For instance, if the kernel has two nested loops of M and N iterations respectively, then it can return M*N even if that is not exactly equal to numIterations.
  5. Benchmark classes are instantiated, so a benchmark can record state in the benchmark class instance. This allows for state that outlives the doBatch call, yet does not require mutable static variables. This is the preferred way to achieve the multi-threaded runs described above.
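Point 4 above can be sketched concretely: a kernel with a fixed-size inner loop performs roughly the requested work and reports the count it actually did. This is a self-contained illustration with invented names, not code from BumbleBench:

```java
public final class NestedLoopDemo {
    static final int INNER = 1000;   // fixed inner-loop length

    // Performs roughly numIterations units of work and returns the exact count done;
    // the return value may differ slightly from the request without harming timing accuracy
    static long doBatch(long numIterations) {
        long outer = Math.max(1, numIterations / INNER);  // round to whole inner loops
        double v = 0.1;
        for (long i = 0; i < outer; i++)
            for (int j = 0; j < INNER; j++)
                v = 4.6 * Math.sin(v);                    // placeholder workload
        return outer * INNER;                             // M*N, the count actually performed
    }

    public static void main(String[] args) {
        System.out.println(doBatch(12_345));  // 12000: 12 whole inner loops of 1000
    }
}
```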

Illustration

This section is confusing. Don't read it unless you want a deep understanding of exactly what BumbleBench measures.
This all may seem quite abstract and arbitrary. Hopefully, a picture will help. Here is a graph showing how BumbleBench's heuristics operate. It is a section from near the end of an actual BumbleBench run.

The blue dashed lines indicate the range in which BumbleBench believes the benchmark score lies at any given time, and the orange line shows the estimated score reported by the workload. This is a highly variable workload, so sometimes the score is between the dashed lines, and sometimes it is not. Each batch that runs is represented by an asterisk.

The key to understanding BumbleBench's approach is that BumbleBench does not trust estimated scores; that is, the orange line does not directly affect the final benchmark score. Rather, BumbleBench is only interested in successful batches, which are those that were able to finish the requested number of iterations before the deadline elapsed. Thus, graphically speaking, the final score reported always corresponds to the location of some asterisk that lies under the orange line.

The distance between the dashed lines is the "uncertainty". BumbleBench continually tries to reduce its uncertainty about the score, which means it wants to minimize the distance between the dashed lines while keeping the orange line in between them. To bring the dashed lines closer together, BumbleBench alternately attempts to raise the bottom line and lower the top line. It does so by requesting batches (represented by the asterisks) at various scores. An asterisk on the bottom line is a "lowball" batch, and an asterisk on the top line is a "highball" batch.

The yellow line is the so-called "peak" score. That indicates the highest successful batch, meaning the highest asterisk that was under the orange line. For example, right before 34 seconds, there is a blue asterisk under the orange line. This indicates a successful batch, and since that batch was above the most recent peak score, this established a new peak score, moving the yellow line up to the asterisk. This happens again at 46.6, 49, and 50 seconds.

This "peak" score, however is designed to be sensitive to slowdowns in the benchmark. When a batch fails, and the target score of that batch was below the peak score, then BumbleBench concludes that the performance may have dropped, and attempts to measure this by resetting the peak score. You can see this occurring just after 44 seconds: the orange line came in under the asterisk, meaning the batch failed, and the asterisk was under the most recent location of the yellow line. In response, BumbleBench reset the peak score, causing the yellow line to fall from its previous location to -infinity. It recovered again just after 46 seconds, when a batch succeeded (the orange line rose above the asterisk), causing the yellow line to jump back up to the location of the asterisk.

The green line is the so-called "maxPeak" score. It maintains the highest observed value of the peak score, and so it never decreases. The only exception to this is when the "finale" phase begins, which occurred at 42.6 seconds in the graph. To guard against the case in which the benchmark peaks early and then never again achieves its highest score, BumbleBench adjusts maxPeak so it equals the peak score at the start of the finale; that is, it brings the green line down to meet the yellow line. Aside from this one exception, the green line never decreases; it can be seen increasing at 50 seconds, for example.

It is the final value of the green line, the maxPeak score, that is eventually reported as the benchmark score.
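The peak/maxPeak rules described above can be modeled compactly. This is a simplified model of the update rules as this page describes them, not the actual implementation: a successful batch above the current peak raises it; a failed batch below the peak resets the peak to -infinity; maxPeak tracks the peak's high-water mark.

```java
public final class PeakDemo {
    double peak = Double.NEGATIVE_INFINITY;
    double maxPeak = Double.NEGATIVE_INFINITY;

    // Apply one batch result to the peak/maxPeak state (simplified model)
    void record(double target, boolean succeeded) {
        if (succeeded) {
            if (target > peak) peak = target;     // new highest successful batch
        } else if (target < peak) {
            peak = Double.NEGATIVE_INFINITY;      // possible slowdown: reset and re-prove the peak
        }
        if (peak > maxPeak) maxPeak = peak;       // high-water mark never falls (outside the finale)
    }

    public static void main(String[] args) {
        PeakDemo d = new PeakDemo();
        d.record(11.74, true);   // success raises peak to 11.74
        d.record(11.86, true);   // new peak and maxPeak: 11.86
        d.record(11.78, false);  // failure below the peak resets peak
        System.out.println(d.peak);     // -Infinity
        System.out.println(d.maxPeak);  // 11.86: the high-water mark survives
        d.record(11.66, true);
        System.out.println(d.peak);     // 11.66: the peak recovers on the next success
    }
}
```

This mirrors the 16.5s reset and 17.5s recovery visible in the sample output above.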

Tutorial

THIS IS A WORK IN PROGRESS

TrigBench

The simplest "hello world" example is a tiny microbenchmark that measures Math.sin. It can be found in net/adoptopenjdk/bumblebench/examples/TrigBench.java.

 public final class TrigBench extends MicroBench {
    protected long doBatch(long numIterations) throws InterruptedException {
       double argument = 0.1;
       for (long i = 0; i < numIterations; i++)
          argument = 4.6 * Math.sin(argument); // Chaos!
       return numIterations;
    }
 }

You can compile this file, then add it to BumbleBench.jar using this command:

 jar -uf BumbleBench.jar net/adoptopenjdk/bumblebench/examples/TrigBench.class

Then run it like this:

 java -jar BumbleBench.jar TrigBench

You should see output like this:

> java -jar BumbleBench.jar TrigBench

              Target    Est     Uncert% MaxPeak Peak    Peak%   %paused
    0.0s:  >! 110       110.0K   24.0   110     110     470.0
    0.0s:  >! 123.2K    3.850M   28.8   123.2K  123.2K  1172.2
    0.4s:  >! 4.404M    11.29M   34.6   4.404M  4.404M  1529.8
    1.6s:  <  13.24M    11.73M   20.7   4.404M  4.404M  1529.8
    2.5s:  >  10.52M    11.74M   12.4   10.52M  10.52M  1616.8
    3.5s:  <  12.47M    11.57M    7.5   10.52M  10.52M  1616.8
    4.5s:  >  11.14M    11.70M    4.5   11.14M  11.14M  1622.6
    5.5s:  <  11.97M    11.80M    2.7   11.14M  11.14M  1622.6
    6.5s:  >  11.64M    11.86M    1.6   11.64M  11.64M  1627.0
    7.5s:  <  11.95M    11.80M    1.0   11.64M  11.64M  1627.0
    8.5s:  >  11.74M    11.78M    0.6   11.74M  11.74M  1627.9
    9.5s:  <  11.81M    11.75M    0.3   11.74M  11.74M  1627.9
   10.5s:  >  11.73M    11.79M    0.2   11.74M  11.74M  1627.9
   -- ballpark --
   11.5s:  <  11.80M    11.70M    0.3   11.74M  11.74M  1627.9
   12.5s:  >  11.68M    11.77M    0.3   11.74M  11.74M  1627.9
   13.5s:  >! 11.78M    11.84M    0.4   11.78M  11.78M  1628.2
   14.5s:  >! 11.86M    11.89M    0.4   11.86M  11.86M  1628.9
   15.5s:  <  11.91M    11.81M    0.5   11.86M  11.86M  1628.9
   16.5s:  <! 11.78M    11.69M    0.6   11.86M  -∞      --
   17.5s:  >  11.66M    11.84M    0.7   11.86M  11.66M  1627.1
   18.6s:  <  11.88M    11.80M    0.4   11.86M  11.66M  1627.1
   19.6s:  <! 11.77M    11.45M    0.5   11.86M  11.66M  1627.1
   20.6s:  >  11.42M    11.76M    0.6   11.86M  11.66M  1627.1
   21.6s:  <  11.80M    11.79M    0.4   11.86M  11.66M  1627.1
   22.6s:  <! 11.76M    11.69M    0.5   11.86M  11.66M  1627.1
   23.6s:  >  11.67M    11.77M    0.6   11.86M  11.67M  1627.2
   24.6s:  <  11.80M    11.66M    0.7   11.86M  11.67M  1627.2
   25.6s:  <! 11.63M    11.59M    0.8   11.86M  -∞      --
   26.6s:  >  11.54M    11.59M    0.5   11.86M  11.54M  1626.2
   27.6s:  >! 11.62M    11.75M    0.6   11.86M  11.62M  1626.8
   28.6s:  >! 11.78M    11.78M    0.7   11.86M  11.78M  1628.2
   29.6s:  >! 11.82M    11.83M    0.8   11.86M  11.82M  1628.6
   30.6s:  <  11.88M    11.85M    0.5   11.86M  11.82M  1628.6
   31.6s:  <! 11.82M    11.81M    0.6   11.86M  -∞      --
   32.6s:  >  11.77M    11.84M    0.4   11.86M  11.77M  1628.1
   33.6s:  <  11.86M    11.78M    0.4   11.86M  11.77M  1628.1
   34.6s:  >  11.76M    11.79M    0.3   11.86M  11.77M  1628.1
   35.6s:  >! 11.81M    11.82M    0.3   11.86M  11.81M  1628.4
   36.6s:  <  11.84M    11.81M    0.2   11.86M  11.81M  1628.4
   37.6s:  >  11.80M    11.86M    0.2   11.86M  11.81M  1628.4
   38.6s:  <  11.87M    11.86M    0.1   11.86M  11.81M  1628.4
   39.6s:  >  11.86M    11.86M    0.1   11.86M  11.86M  1628.8
   40.6s:  >! 11.86M    11.91M    0.1   11.86M  11.86M  1628.9
   41.6s:  <  11.91M    11.77M    0.1   11.86M  11.86M  1628.9
   42.6s:  >  11.77M    11.81M    0.1   11.86M  11.86M  1628.9
   -- finale --
   43.6s:  >! 11.82M    11.83M    0.2   11.86M  11.86M  1628.9
   44.6s:  <  11.84M    11.83M    0.1   11.86M  -∞      --
   45.6s:  <! 11.82M    11.80M    0.1   11.86M  -∞      --
   46.6s:  <! 11.79M    11.69M    0.1   11.86M  -∞      --
   47.6s:  >  11.68M    11.88M    0.2   11.86M  11.68M  1627.3
   48.6s:  <  11.89M    11.77M    0.2   11.86M  11.68M  1627.3
   49.6s:  >  11.76M    11.77M    0.1   11.86M  11.76M  1628.0
   50.6s:  <  11.78M    11.65M    0.1   11.86M  11.76M  1628.0
   51.6s:  >  11.64M    11.70M    0.2   11.86M  11.76M  1628.0
   52.6s:  >! 11.71M    11.83M    0.2   11.86M  11.76M  1628.0
   53.6s:  <  11.84M    11.80M    0.3   11.86M  11.76M  1628.0
   54.6s:  >  11.78M    11.88M    0.3   11.86M  11.78M  1628.2
   55.6s:  <  11.90M    11.75M    0.4   11.86M  11.78M  1628.2
   56.6s:  <! 11.73M    11.67M    0.4   11.86M  -∞      --
   57.6s:  >  11.65M    11.78M    0.5   11.86M  11.65M  1627.1
 
  TrigBench score: 1.1859834E7 (11.86M 1628.9%)
      uncertainty:   0.5%

This output contains a succession of lines, each representing a single batch (literally a call to TrigBench.doBatch).

EmptyBench

 -= BumbleBench series 2 version 3.1 running net.adoptopenjdk.bumblebench.examples.EmptyBench  Sun Sep 14 23:36:39 EDT 2014 =-
 
              Target    Est     Uncert% MaxPeak Peak    Peak%   %paused
    0.0s:  >! 110       120.0    24.0   110     110     470.0
    0.0s:  >! 134.4     148.8    28.8   134.4   134.4   490.1
    0.0s:  >! 170.2     191.7    34.6   170.2   170.2   513.7
    0.0s:  >! 224.8     257.9    40.0   224.8   224.8   541.5
    0.0s:  >! 309.5     361.0    40.0   309.5   309.5   573.5
    0.0s:  >! 433.3     505.5    40.0   433.3   433.3   607.1
    0.0s:  >! 606.6     707.7    40.0   606.6   606.6   640.8
    0.0s:  >! 849.2     990.7    40.0   849.2   849.2   674.4
    0.0s:  >! 1189      1387     40.0   1189    1189    708.1
    0.0s:  >! 1664      1942     40.0   1664    1664    741.7
    0.0s:  >! 2330      2719     40.0   2330    2330    775.4
    0.0s:  >! 3262      3806     40.0   3262    3262    809.0
    0.0s:  >! 4567      5328     40.0   4567    4567    842.7
    0.0s:  >! 6394      7460     40.0   6394    6394    876.3
    0.0s:  >! 8952      10.44K   40.0   8952    8952    910.0
    0.0s:  >! 12.53K    14.62K   40.0   12.53K  12.53K  943.6
    0.0s:  >! 17.54K    20.47K   40.0   17.54K  17.54K  977.3
    0.0s:  >! 24.56K    28.66K   40.0   24.56K  24.56K  1010.9
    0.0s:  >! 34.39K    40.12K   40.0   34.39K  34.39K  1044.5
  EmptyBench score: Infinity (∞)
       uncertainty:   0.0%

BumbleBench correctly reports that this benchmark runs infinitely fast: no matter how many iterations are requested, a batch finishes long before the target time has elapsed.

MORE TO COME

FAQ

How can I change the default value of one of BumbleBench's built-in options?

Put your custom settings in a .properties file with the same name as your benchmark's class, and put it in the benchmark directory inside BumbleBench.jar. (See net/adoptopenjdk/bumblebench/examples/TardyBench.properties for example.)

How can I tell whether the timer is running?

You probably don't want to. If you want the timer paused, just call pauseTimer, and if you want it running, call startTimer. These methods are idempotent, so it's harmless to call them if the timer is already in the desired state. Adding code to check the current state would be overly complicated, unnecessary, and could distort performance measurements.

If you really must check, then call isTimerPaused, but first you must promise that you have read the preceding paragraph.
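A toy model of the idempotent timer semantics described above, to make the behaviour concrete. This is not BumbleBench's implementation; it assumes a non-negative System.nanoTime() and uses -1 as a "paused" sentinel purely for illustration:

```java
public final class TimerDemo {
    long measuredNanos = 0;
    long startedAt = -1;   // -1 means the timer is paused (toy sentinel)

    void startTimer() {                       // no-op if already running (idempotent)
        if (startedAt < 0) startedAt = System.nanoTime();
    }

    void pauseTimer() {                       // no-op if already paused (idempotent)
        if (startedAt >= 0) {
            measuredNanos += System.nanoTime() - startedAt;
            startedAt = -1;
        }
    }

    boolean isTimerPaused() { return startedAt < 0; }

    public static void main(String[] args) {
        TimerDemo t = new TimerDemo();
        t.pauseTimer();                // already paused: harmless
        t.startTimer();
        t.startTimer();                // already running: harmless, does not reset the start
        long sum = 0;
        for (int i = 0; i < 100_000; i++) sum += i;   // some timed work
        t.pauseTimer();
        System.out.println(t.isTimerPaused());        // true
        System.out.println(t.measuredNanos >= 0);     // true
        if (sum < 0) System.out.println(sum);         // keep the loop from being optimized away
    }
}
```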

Could BumbleBench offer a per-batch initialization method called before doBatch?

No. This is intentional. Having initialization in another method would require information to be passed to doBatch in fields. Locals are faster than fields, so having doBatch load data from fields could distort your measurements.

Instead of an initialization method, BumbleBench encourages you to call pauseTimer and startTimer around any logic in doBatch that you do not wish to measure. Of course, these methods can also distort measurements if you do them too often, but they do so in a very predictable way (each of these acts like a call to System.currentTimeMillis), and they are normally amortized into insignificance as long as you don't call them in your workload's inner loop.

Initialization to be performed just once, however, is another matter. For adjustable settings, use the option method to set a static final field. To acquire and release resources (like opening and closing a network socket), you can override bumbleMain, adding your own logic before and after calling super.bumbleMain().
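The acquire-then-release pattern around super.bumbleMain() can be sketched with a stub superclass standing in for the real harness. The stub, its method body, and the SocketBench name are all hypothetical; the real bumbleMain signature may differ:

```java
// Stub standing in for BumbleBench's harness superclass (hypothetical)
abstract class StubHarness {
    void bumbleMain() throws Exception { /* the harness would run timed batches here */ }
}

final class SocketBench extends StubHarness {
    boolean opened, released;

    @Override void bumbleMain() throws Exception {
        opened = true;              // acquire resources (e.g. open a socket) before the run
        try {
            super.bumbleMain();     // let the harness do its measurements
        } finally {
            released = true;        // release resources afterward, even on failure
        }
    }

    public static void main(String[] args) throws Exception {
        SocketBench b = new SocketBench();
        b.bumbleMain();
        System.out.println(b.opened + " " + b.released);  // true true
    }
}
```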

Could BumbleBench offer processing of command line arguments?

No. This is intentional. Final static fields can be folded away by the JIT compiler as though their values had been hard-coded into the program. In contrast, variables (even local variables) with values derived from command line options cannot be folded away, and are more likely to distort your measurements.

Adjustable benchmark parameters should use static final fields initialized using the BumbleBench option method.

Variations on a benchmark can be written as subclasses. BumbleBench even goes to some effort to support a dot syntax A.B if you'd like to implement your benchmark variation as an inner subclass B of a common superclass A. (See net/adoptopenjdk/bumblebench/lambda/DispatchBench.java for an example.) We recommend that all classes in the MicroBench and MiniBench hierarchy be either abstract or final; this helps the JIT optimize the benchmark harness so the measurement is focused on your workload.