Run microbenchmarks in Elm.
Here's a sample, benchmarking `Array`:
```elm
import Array
import Benchmark exposing (..)


suite : Benchmark
suite =
    let
        sampleArray =
            Array.initialize 100 identity
    in
    describe "Array"
        [ -- nest as many descriptions as you like
          describe "slice"
            [ benchmark "from the beginning" <|
                \_ -> Array.slice 50 100 sampleArray
            , benchmark "from the end" <|
                \_ -> Array.slice 0 50 sampleArray
            ]
        ]
```
This code uses a few common functions:

- `describe` to organize benchmarks
- `benchmark` to run benchmarks
For a more thorough overview, I wrote an introduction to elm-benchmark. Please note that the article was written for a previous version, so some details may have changed slightly.
Here are the commands (with explanation) that you should run to get started:

```sh
mkdir benchmarks                           # create a benchmarks directory
elm install elm-explorations/benchmark     # get this project, including the browser runner
```

Then add your benchmarks in `benchmarks/YourBenchmarkName.elm`.
`Benchmark.Runner` provides `program`, which takes a `Benchmark` and runs it in the browser.
To run the sample above, you would do:
```elm
import Benchmark.Runner exposing (BenchmarkProgram, program)


main : BenchmarkProgram
main =
    program suite
```
Compile it (for example with `elm make`) and open the output in your browser to start the benchmarking run.
Some general principles:
- Don't compare raw values from different machines.
- When you're working on speeding up a function, keep the old implementation around and use `compare` to measure your progress (see the sketch after this list).
- "As always, if you see numbers that look wildly out of whack, you shouldn’t rejoice that you have magically achieved fast performance—be skeptical and investigate!" – Bryan O'Sullivan
Goodness of fit is a measure of how well our prediction fits the measurements we have collected. You want this to be as close to 100% as possible. In `benchmark`:
- 99% is great
- 95% is okay
- 90% is not great, consider closing other programs on your computer and re-running
- 80% and below, the result has been highly affected by outliers. Please do not trust the results when goodness of fit is this low.
`benchmark` will eventually incorporate this advice into the reporting interface. See Issue #4.
For more, see Wikipedia: Goodness of Fit.
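For intuition, goodness of fit here plays the role of the usual R² statistic: the closer the measured times sit to the fitted trend line, the closer it is to 100%. Below is a rough sketch of that calculation, not the library's actual code, assuming you already have a predicted and an observed total time for each bucket:

```elm
-- Rough sketch of an R²-style goodness-of-fit calculation over
-- (predicted, observed) pairs; not the library's actual implementation.
goodnessOfFit : List ( Float, Float ) -> Float
goodnessOfFit predictedAndObserved =
    let
        observed =
            List.map Tuple.second predictedAndObserved

        mean =
            List.sum observed / toFloat (max 1 (List.length observed))

        -- variation the prediction fails to explain
        residualSS =
            predictedAndObserved
                |> List.map (\( predicted, actual ) -> (actual - predicted) ^ 2)
                |> List.sum

        -- total variation in the observations
        totalSS =
            observed
                |> List.map (\actual -> (actual - mean) ^ 2)
                |> List.sum
    in
    if totalSS == 0 then
        1

    else
        1 - residualSS / totalSS
```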
The benchmarks in a run can look like they are all being run the same number of times. They're not, but they look like it because we interleave runs and only update the UI after collecting one sample of each. Keep reading for more on why we do this!
When we measure the speed of your code, we take the following steps:
- We warm up the JIT so that we can get a good measurement.
- We measure how many runs of the function will fit into a small but measurable timespan.
- We collect multiples of this amount until we have enough to create a trend. We can do this because running a benchmark twice should take twice as long as running it once, so we can make a reliable prediction by splitting sample sizes among a number of buckets (sketched below).
- Once we have enough, we present our prediction of runs per second on your computer, in this configuration, now. We try to be as consistent as possible, but be aware that the environment matters a lot!
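To make the trend idea concrete, here is a rough sketch, not the library's actual code, of how runs per second could be predicted from bucketed samples: fit a least-squares line through the origin to (number of runs, total time) points and invert its slope.

```elm
-- Rough sketch of predicting runs per second from bucketed samples;
-- not the library's actual implementation. Each sample pairs the
-- number of runs in a bucket with the total time that bucket took (ms).
runsPerSecond : List { runs : Int, totalMs : Float } -> Float
runsPerSecond samples =
    let
        -- least-squares slope of a line through the origin:
        -- slope = sum(x * y) / sum(x * x), giving ms per run
        sumXY =
            samples
                |> List.map (\s -> toFloat s.runs * s.totalMs)
                |> List.sum

        sumXX =
            samples
                |> List.map (\s -> toFloat s.runs * toFloat s.runs)
                |> List.sum

        msPerRun =
            if sumXX == 0 then
                0

            else
                sumXY / sumXX
    in
    if msPerRun == 0 then
        0

    else
        1000 / msPerRun
```

For example, `runsPerSecond [ { runs = 100, totalMs = 2.0 }, { runs = 200, totalMs = 4.1 } ]` predicts roughly 49,000 runs per second.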
If the run contains multiple benchmarks, we interleave sampling between them. This means that, given three benchmarks, we take one sample of each and continue in that pattern until they are complete.

We do this because the system might be busy with other work when running the first benchmark, but give its full attention to the second and third. That would make the first look artificially slower than the others, giving us misleading data!

By interleaving samples, we spread this offset among all the benchmarks. This creates a more even playing field and gives us better data.
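As a small illustration of that schedule (again a sketch, not the library's internals), a round-robin ordering over benchmark names might look like this:

```elm
-- Sketch of an interleaved sampling order; not the library's internals.
-- interleavedOrder 2 [ "a", "b", "c" ]
--     == [ "a", "b", "c", "a", "b", "c" ]
-- rather than [ "a", "a", "b", "b", "c", "c" ]
interleavedOrder : Int -> List String -> List String
interleavedOrder rounds benchmarks =
    List.concat (List.repeat rounds benchmarks)
```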
This package builds on prior art:

- Bryan O'Sullivan's Criterion, from which we take our prediction technique.
- Gary Bernhardt's Readygo, from which we take interleaved runs.
- Special thanks to Luke Westby for both the initial implementation of the measurement code and starting the conversations that lead to this library.
`benchmark` is licensed under the 3-Clause BSD License.