Reproducing

We split this into three steps:

1. Getting the software environment.
2. Running the benchmarks.
3. Running the analysis.

Getting the software environment with Nix

The Nix package manager^[See https://nixos.org/guides/how-nix-works] is a user-level package manager available on many platforms. Nix can build every package from source and also consults a binary cache; this is more resilient than Docker Hub because if the binary cache goes down or drops packages for any reason, the client can still build them from source, so long as the upstream projects don't disappear from GitHub. We considered creating a Docker image, but BenchExec, the tool we use to get consistent running times, manipulates cgroups and systemd, and we did not have enough time to work out how to run it in Docker or Podman.

Install Nix with:

$ curl -f -L https://install.determinate.systems/nix | sh -s -- install

This installer also enables the "flakes" and "nix-command" experimental features. If you installed Nix by another method, see this page to enable flakes and the nix-command.

Next, clone the repository and change into the benchmark directory:

$ git clone https://github.com/charmoniumQ/PROBE
$ cd PROBE/benchmark

Then use Nix to activate the development environment; this may take ten minutes because it builds the necessary dependencies. The command starts a new interactive shell in the proper environment. From that shell, you will have all the dependencies needed to run the steps below.

$ nix develop

Extra steps

Test rr by running nix shell '.#rr' --command rr record result/bin/ls. If this reports an error regarding kernel.perf_event_paranoid, follow its advice and confirm that doing so resolves the error.
- AMD Zen CPUs may require extra setup.
- rr may require extra setup in virtual machines.
- Finally, AMD CPUs are hit-or-miss with rr because of its low-level machine-code operations. This is a known weakness of rr. If you can't get rr to work and you can't get access to an Intel CPU, you may disable rr by removing it from working at the bottom of prov_collectors.py. You will still be able to collect statistics on the other (more portable) provenance collectors.
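
Before running the rr test above, it can help to check the kernel setting directly. The following is a minimal sketch, assuming a Linux host that exposes the sysctl at /proc/sys/kernel/perf_event_paranoid; the helper names are hypothetical and not part of this repository. It relies on rr's own error message, which asks for a value of 1 or lower.

```python
from pathlib import Path
from typing import Optional

# Standard Linux location of the kernel.perf_event_paranoid sysctl.
PARANOID_PATH = Path("/proc/sys/kernel/perf_event_paranoid")


def rr_perf_access_ok(level: int) -> bool:
    """rr asks for kernel.perf_event_paranoid <= 1 in its error message."""
    return level <= 1


def read_paranoid_level(path: Path = PARANOID_PATH) -> Optional[int]:
    """Return the current setting, or None if it cannot be read (e.g. non-Linux)."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    level = read_paranoid_level()
    if level is None:
        print("Could not read", PARANOID_PATH)
    elif rr_perf_access_ok(level):
        print(f"perf_event_paranoid={level}: rr should be able to use perf counters")
    else:
        print(f"perf_event_paranoid={level}: too restrictive for rr; "
              "consider: sudo sysctl kernel.perf_event_paranoid=1")
```

This only checks the sysctl; it says nothing about the AMD or virtual-machine caveats listed above, so the rr record test remains the authoritative check.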

Running the benchmarks

Note that we use the Python from our software environment, not from the system, to improve determinism. We wrote a front-end called runner.py to run the scripts.

Run it with --help for more information. Briefly, it takes --collectors, --workloads, and --iterations options, which specify what to run and how many times. For the paper, we ran

$ ./runner.py \
    --collectors run-for-usenix \
    --workloads run-for-usenix \
    --iterations 5 \
    --verbose

Multiple --collectors and --workloads options can be given; for example,

$ ./result/bin/python runner.py \
    --collectors noprov \
    --collectors strace \
    --workloads lmbench \
    --workloads postmark

See the bottom of prov_collectors.py and workloads.py for the name-mapping.
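
To illustrate how repeated --collectors and --workloads options can accumulate into lists, here is a minimal sketch using Python's standard argparse module. It is not the actual runner.py, whose option handling may differ; the parser below is a hypothetical stand-in that only mirrors the option names mentioned above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real runner.py may define its options differently.
    parser = argparse.ArgumentParser(prog="runner.py")
    # action="append" collects every occurrence of the option into a list.
    parser.add_argument("--collectors", action="append", default=[])
    parser.add_argument("--workloads", action="append", default=[])
    parser.add_argument("--iterations", type=int, default=1)
    return parser


args = build_parser().parse_args(
    ["--collectors", "noprov", "--collectors", "strace",
     "--workloads", "lmbench", "--workloads", "postmark"]
)
print(args.collectors)  # ['noprov', 'strace']
print(args.workloads)   # ['lmbench', 'postmark']
```

Each name would then be looked up in the name-mapping to find the collectors and workloads to run.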

Running the analysis

The analysis is written in a Jupyter notebook called notebooks/cross-val.ipynb. It begins by checking for anomalies in the data. We've automated this as much as possible, but please sanity-check the graphs before proceeding.

The notebook can be launched from our software environment by:

$ cd notebooks
$ jupyter notebook
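
The notebook's own anomaly checks are not reproduced here. Purely as an illustration of the kind of check that can complement a visual inspection, the sketch below flags outlying running times using the median absolute deviation. The function name, the threshold of 3.5, and the sample data are all assumptions, not taken from the notebook.

```python
import statistics
from typing import List, Sequence


def flag_outliers(times: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return indices of values whose modified z-score exceeds the threshold."""
    median = statistics.median(times)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(t - median) for t in times)
    if mad == 0:
        return []  # All values (nearly) identical; nothing to flag.
    return [
        i for i, t in enumerate(times)
        if 0.6745 * abs(t - median) / mad > threshold
    ]


# Made-up running times in seconds; the last one is suspiciously slow.
sample = [10.1, 10.3, 9.9, 10.2, 25.0]
print(flag_outliers(sample))  # [4]
```

A flagged iteration is a prompt to look at the corresponding graph and logs, not a reason to discard the data automatically.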

Adding new benchmark items or provenance collectors

For new benchmark items:

1. Go to workloads.py.
2. Write a new workload class that implements the Workload interface.
3. Add an instance of your workload class to WORKLOAD_GROUPS.
4. Call ./result/bin/runner (see above) with --workloads workload_name, where workload_name is the lowercased name of your workload class or the name of a group containing your workload class.

For new provenance collectors:

1. Go to prov_collectors.py.
2. Write a new class that implements the Collector interface.
3. Add an instance of your collector class to COLLECTORS.
4. Call ./result/bin/runner (see above) with --collectors collector_name, where collector_name is the lowercased name of your collector class or the name of a group containing your collector class.

Note that the attribute nix_packages, in both cases, contains a list of strings that reference packages defined in the package outputs for the current architecture in flake.nix. Using Nix to build our software environment ensures that all architectures and POSIX platforms can reproducibly build the relevant software environments.
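
The steps above can be sketched as follows for a new workload. This is a hedged illustration only: the Workload base class, its command method, the build_groups helper, the "all" group, and the ".#coreutils" package string are hypothetical stand-ins, and the real interface and WORKLOAD_GROUPS structure in the repository may look different. Only the nix_packages attribute and the lowercased-class-name rule come from the text above.

```python
from typing import Dict, List


class Workload:
    """Hypothetical stand-in for the repository's Workload interface."""

    # Assumed format: strings naming package outputs defined in flake.nix.
    nix_packages: List[str] = []

    @property
    def name(self) -> str:
        # Mirrors the rule above: select by the lowercased class name.
        return type(self).__name__.lower()

    def command(self) -> List[str]:
        # Hypothetical method; the real interface may look different.
        raise NotImplementedError


class SleepBench(Workload):
    nix_packages = [".#coreutils"]  # illustrative package reference

    def command(self) -> List[str]:
        return ["sleep", "0.1"]


def build_groups(workloads: List[Workload]) -> Dict[str, List[Workload]]:
    """Map each lowercased class name to a one-element group, plus an 'all' group."""
    groups: Dict[str, List[Workload]] = {w.name: [w] for w in workloads}
    groups["all"] = list(workloads)  # hypothetical group name
    return groups


WORKLOAD_GROUPS = build_groups([SleepBench()])
selected = WORKLOAD_GROUPS["sleepbench"]
print([w.name for w in selected], selected[0].nix_packages)
```

A new collector would follow the same pattern, with COLLECTORS playing the role of WORKLOAD_GROUPS.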

Command used

$ ./runner.py --workloads small-calib --collectors fast   --iterations 1 --warmups 30
$ ./runner.py --workloads big-calib   --collectors noprov --iterations 1 --warmups 5  --append
$ ./runner.py --workloads fast        --collectors fast   --iterations 3 --warmups 3  --append
