Getting Started with Core_bench
Core_bench is a micro-benchmarking library for OCaml that can measure execution costs of operations that take from 1ns to about 100ms. It aims to measure such short-lived computations precisely while accounting for delayed GC costs and for noise introduced by other activity on the system.
The easiest way to get started is using an example:
    open Core
    open Core_bench

    let main () =
      Random.self_init ();
      let x = Random.float 10.0 in
      let y = Random.float 10.0 in
      Command.run (Bench.make_command [
        Bench.Test.create ~name:"Float add" (fun () ->
          ignore (x +. y));
        Bench.Test.create ~name:"Float mul" (fun () ->
          ignore (x *. y));
        Bench.Test.create ~name:"Float div" (fun () ->
          ignore (x /. y));
      ])

    let () = main ()
When compiled this gives you an executable:
    $ ./z.exe
    Estimated testing time 30s (3 benchmarks x 10s). Change using -quota SECS.
    ┌───────────┬──────────┬─────────┬────────────┐
    │ Name      │ Time/Run │ mWd/Run │ Percentage │
    ├───────────┼──────────┼─────────┼────────────┤
    │ Float add │   2.53ns │   2.00w │     41.04% │
    │ Float mul │   2.50ns │   2.00w │     40.63% │
    │ Float div │   6.16ns │   2.00w │    100.00% │
    └───────────┴──────────┴─────────┴────────────┘
If any of the functions had allocated words on the major heap (mjWd) or caused promotions, the corresponding columns would be displayed automatically. In general, a column that has no significant values is not displayed. The options you will most often want to change are the -q flag, which sets the time quota per test, and the set of columns to display.
In the simplest case, a benchmark is just a name and a unit -> unit thunk:
Bench.Test.create ~name:"Float add" (fun () -> ignore (x +. y));
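Because the thunk is what gets run repeatedly, anything that should not be measured (such as building input data) belongs outside it, captured by the closure, the way x and y are computed before the tests in the first example. The following is a minimal stdlib-only sketch of that idea, not Core_bench code: `run_many`, `make_thunk` and the two counters are hypothetical names introduced only for illustration.

```ocaml
(* Hypothetical stand-in for a benchmark harness: it just calls the
   thunk [runs] times. Core_bench's real sampling is more involved. *)
let run_many ~runs (thunk : unit -> unit) =
  for _i = 1 to runs do
    thunk ()
  done

(* Counters that record how often each part executes. They exist only
   to make the point visible; a real benchmark would not include them. *)
let setup_calls = ref 0
let thunk_calls = ref 0

(* Setup happens once, when the thunk is built. The returned closure
   captures [data] and is the only part that runs repeatedly. *)
let make_thunk () : unit -> unit =
  incr setup_calls;
  let data = List.init 1_000 (fun i -> i) in
  fun () ->
    incr thunk_calls;
    ignore (Sys.opaque_identity (List.rev data))

let () =
  let thunk = make_thunk () in
  run_many ~runs:100 thunk;
  Printf.printf "setup ran %d time(s), thunk ran %d time(s)\n"
    !setup_calls !thunk_calls
```

With Core_bench, the closure returned by `make_thunk ()` plays the role of the function passed to Bench.Test.create.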
One can also create indexed benchmarks, which can be helpful in understanding non-linearities in the execution profiles of functions. For example:
    open Core
    open Core_bench

    let main () =
      Command.run (Bench.make_command [
        Bench.Test.create_indexed
          ~name:"Array.create"
          ~args:[1;10;100;200;300;400]
          (fun len -> Staged.stage (fun () -> ignore (Array.create ~len 0)));
      ])

    let () = main ()
which produces:
    $ ./z.exe -q 3
    Estimated testing time 18s (6 benchmarks x 3s). Change using -quota SECS.
    ┌──────────────────┬────────────┬─────────┬──────────┬────────────┐
    │ Name             │   Time/Run │ mWd/Run │ mjWd/Run │ Percentage │
    ├──────────────────┼────────────┼─────────┼──────────┼────────────┤
    │ Array.create:1   │    26.60ns │   2.00w │          │      0.99% │
    │ Array.create:10  │    35.29ns │  11.00w │          │      1.31% │
    │ Array.create:100 │   108.39ns │ 101.00w │          │      4.03% │
    │ Array.create:200 │   178.45ns │ 201.00w │          │      6.64% │
    │ Array.create:300 │ 1_996.86ns │         │  301.00w │     74.25% │
    │ Array.create:400 │ 2_689.28ns │         │  401.00w │    100.00% │
    └──────────────────┴────────────┴─────────┴──────────┴────────────┘
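The jump in the table between lengths 200 and 300, where allocation moves from the mWd column to mjWd, is consistent with the standard OCaml runtime allocating blocks larger than 256 words directly on the major heap. A rough way to observe this without Core_bench is to read the Gc counters from the standard library around a single allocation. This is only a sketch: `array_allocation` is a hypothetical helper, it uses the stdlib `Array.make` rather than Core's `Array.create`, and the exact word counts may vary slightly between bytecode and native builds.

```ocaml
(* Returns (minor_words, major_words) allocated while creating an int
   array of [len] elements. Each count includes the block header word.
   The major counter also includes promoted words, so a minor collection
   happening in between would inflate it; that is unlikely in a program
   this small. *)
let array_allocation len =
  let _, _, major_before = Gc.counters () in
  let minor_before = Gc.minor_words () in
  let arr = Array.make len 0 in
  let minor_after = Gc.minor_words () in
  let _, _, major_after = Gc.counters () in
  (* Keep the array alive so the allocation is not optimised away. *)
  ignore (Sys.opaque_identity arr);
  (minor_after -. minor_before, major_after -. major_before)

let () =
  List.iter
    (fun len ->
      let minor, major = array_allocation len in
      Printf.printf "len=%d minor=%.0fw major=%.0fw\n" len minor major)
    [1; 10; 100; 200; 300; 400]
```

Allocating on the major heap is more expensive than bumping the minor heap pointer, which would explain why Time/Run grows by roughly an order of magnitude at the same point.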
Core_bench produces self-documenting executables. This documentation closely corresponds to the functionality exposed through the .mli file and is a great way to interactively explore what the various options do. At the time of this writing, -? displays:
    Benchmark for Float add, Float mul, Float div

      z.exe [COLUMN ...]

    Columns that can be specified are:
        time       - Number of nano secs taken.
        cycles     - Number of CPU cycles (RDTSC) taken.
        alloc      - Allocation of major, minor and promoted words.
        gc         - Show major and minor collections per 1000 runs.
        percentage - Relative execution time as a percentage.
        speedup    - Relative execution cost as a speedup.
        samples    - Number of samples collected for profiling.

    Columns with no significant values will not be displayed. The
    following columns will be displayed by default:
        time alloc percentage

    Error Estimates
    ===============
    To display error estimates, prefix the column name (or
    regression) with a '+'. Example +time.

    (1) R^2 is the fraction of the variance of the responder (such as
    runtime) that is accounted for by the predictors (such as number of
    runs). More informally, it describes how good a fit we're getting,
    with R^2 = 1 indicating a perfect fit and R^2 = 0 indicating a
    horrible fit. Also see:
    http://en.wikipedia.org/wiki/Coefficient_of_determination

    (2) Bootstrapping is used to compute 95% confidence intervals
    for each estimate.

    Because we expect runtime to be very highly correlated with number
    of runs, values very close to 1 are typical; an R^2 value for
    'time' that is less than 0.99 should cause some suspicion, and a
    value less than 0.9 probably indicates either a shortage of data or
    that the data is erroneous or peculiar in some way.

    Specifying additional regressions
    =================================
    The builtin in columns encode common analysis that apply to most
    functions. Bench allows the user to specify custom analysis to help
    understand relationships specific to a particular function using the
    flag "-regression" . It is worth noting that this feature requires
    some understanding of both linear regression and how various quatities
    relate to each other in the OCaml runtime.

    To specify a regression one must specify the responder variable and a
    command separated list of predictor variables.

    For example: +Time:Run,mjGC,Comp

    which asks bench to estimate execution time using three predictors
    namely the number of runs, major GCs and compaction stats and display
    error estimates. Drop the prefix '+' to suppress error estimation. The
    variables available for regression include:
        Time  - Time
        Cycls - Cycles
        Run   - Runs per sampled batch
        mGC   - Minor Collections
        mjGC  - Major Collections
        Comp  - Compactions
        mWd   - Minor Words
        mjWd  - Major Words
        Prom  - Promoted Words
        One   - Constant predictor for estimating measurement overhead

    === flags ===

      [-all-values]         Show all column values, including very small ones.
      [-ascii]              Display data in simple ascii based tables.
      [-ci-absolute]        Display 95% confidence interval in absolute numbers
      [-clear-columns]      Don't display default columns. Only show user
                            specified ones.
      [-display STYLE]      Table style (short, tall, line, blank or column).
                            Default short.
      [-fork]               Fork and run each benchmark in separate child-process
      [-geometric SCALE]    Use geometric sampling. (default 1.01)
      [-linear INCREMENT]   Use linear sampling to explore number of runs,
                            example 1.
      [-load FILE]          Analyze previously saved data files and don't run
                            tests. [-load] can be specified multiple times.
      [-no-compactions]     Disable GC compactions.
      [-overheads]          Show measurement overheads, when applicable.
      [-quota SECS]         Time quota allowed per test (default 10s).
      [-reduced-bootstrap]  Reduce the number of bootstrapping iterations
      [-regression REGR]    Specify additional regressions (See -? help).
      [-save]               Save benchmark data to .txt files.
      [-stabilize-gc]       Stabilize GC between each sample capture.
      [-v]                  High verbosity level.
      [-width WIDTH]        width limit on column display (default 200).
      [-build-info]         print info about this build and exit
      [-version]            print the version of this build and exit
      [-help]               print this help text and exit
                            (alias: -?)
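To make the regression and R^2 terminology in the help text more concrete, here is a minimal stdlib-only sketch of ordinary least squares with a single predictor (runs per batch) and a constant term. It is an illustration under stated assumptions, not Core_bench's implementation: `fit_line` and `samples` are hypothetical names, and the sample data is synthetic, generated from time = 50 + 2.5 * runs with no noise.

```ocaml
(* Ordinary least squares for y = intercept + slope * x.
   Returns (intercept, slope, r_squared). *)
let fit_line (points : (float * float) list) =
  let n = float_of_int (List.length points) in
  let sum f = List.fold_left (fun acc p -> acc +. f p) 0.0 points in
  let mean_x = sum fst /. n in
  let mean_y = sum snd /. n in
  let sxy = sum (fun (x, y) -> (x -. mean_x) *. (y -. mean_y)) in
  let sxx = sum (fun (x, _) -> (x -. mean_x) ** 2.0) in
  let slope = sxy /. sxx in
  let intercept = mean_y -. (slope *. mean_x) in
  (* R^2 = 1 - (residual sum of squares / total sum of squares). *)
  let ss_res =
    sum (fun (x, y) -> (y -. (intercept +. (slope *. x))) ** 2.0)
  in
  let ss_tot = sum (fun (_, y) -> (y -. mean_y) ** 2.0) in
  (intercept, slope, 1.0 -. (ss_res /. ss_tot))

(* Synthetic samples: (runs in the batch, total batch time in ns). *)
let samples =
  List.map
    (fun runs -> (float_of_int runs, 50.0 +. (2.5 *. float_of_int runs)))
    [1; 2; 4; 8; 16; 32; 64]

let () =
  let intercept, slope, r_squared = fit_line samples in
  Printf.printf "overhead ~ %.2fns, time/run ~ %.2fns, R^2 = %.4f\n"
    intercept slope r_squared
```

Here the slope plays the role of the Time/Run estimate, the intercept corresponds to what the help text calls the constant predictor (One), and an R^2 very close to 1 reflects that the synthetic data is perfectly linear. Real measurements are noisy, which is why the help text suggests treating an R^2 below 0.99 for time with suspicion.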