Evaluating system-level provenance tools for practical use

  • Paper directory
  • Computational provenance := how did this file get produced? What binaries, data, libraries, and other files got used? What about the computational provenance of those files?
  • System-level provenance collects this data without knowing anything about the underlying programs (treating them as black boxes), just by looking at syscalls or the like.
  • This paper is a lit review of provenance systems

Provenance research presentation to GNU/Linux User’s Group

Provenance presentation to UBC

  • Presentation on Google Drive

Measuring provenance overheads

  • Paper directory
  • Take provenance systems and benchmarks from the lit review, apply all prov systems to all benchmarks
  • Reproducing: See REPRODUCING.md
  • Code directory
    • prov_collectors.py contains “provenance collectors”
    • workloads.py contains the “workloads”; the workloads have a “setup” and a “run” phase (see the sketch after this list). For example, “setup” may download stuff (we don’t want to time the setup; that would just benchmark the internet service provider), whereas “run” does the compile (we want to time only that).
    • runner.py will select certain collectors and workloads; if it succeeds, the results get stored in .cache/, so subsequent executions with the same arguments will return instantly
    • experiment.py contains the logic to run experiments (especially cleaning up after them)
    • run_exec_wrapper.py knows how to execute commands in a “clean” environment and cgroup
    • Stats-larger.ipynb contains the process for extracting statistics from the workload runs using Bayesian inference
    • flake.nix contains the Nix expressions which describe the environment in which everything runs
    • result/ directory contains the result of building flake.nix; all binaries and executables should come from result/ in order for the experiment to be reproducible
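
As a rough illustration of the setup/run split described above, a workload might look like the following sketch (the class name, URL, and method signatures are placeholders, not the actual interface in workloads.py):

#+begin_src python
# Minimal sketch of the setup/run split; the real interfaces live in workloads.py.
import pathlib
import subprocess
import urllib.request


class ExampleCompileWorkload:
    """Hypothetical workload: download sources in setup(), compile in run()."""

    url = "https://example.com/project.tar.gz"  # placeholder URL

    def setup(self, work_dir: pathlib.Path) -> None:
        # Untimed: network access and unpacking happen here, so run() measures only the build.
        tarball = work_dir / "project.tar.gz"
        urllib.request.urlretrieve(self.url, tarball)
        subprocess.run(["tar", "-xzf", str(tarball), "-C", str(work_dir)], check=True)

    def run(self, work_dir: pathlib.Path) -> None:
        # Timed: only the compile, executed under whichever provenance collector is selected.
        subprocess.run(["make", "-C", str(work_dir / "project")], check=True)
#+end_src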

Rapid review

Redo rapid review with snowballing

Include record/replay terms

  • Add sciunit
  • Add reprozip
  • Add DetTrace
  • Add CDE
  • Add Burrito
  • Add Sumatra

Get workloads to work

Get Apache to compile

  • We need to get src_sh{./result/bin/python runner.py apache} to work

Cannot find pcre-config

  • I invoke src_sh{./configure --with-pcre-config=/path/to/pcre-config}, and ./configure still complains (“no pcre-config found”).
  • I ended up patching with httpd-configure.patch.

lber.h not found

  • /nix/store/2z0hshv096hhavariih722pckw5v150v-apr-util-1.6.3-dev/include/apr_ldap.h:79:10: fatal error: lber.h: No such file or directory

Get Spack workloads to compile

  • We need to get src_sh{./result/bin/python runner.py spack} to work
  • See docstring of SpackInstall in workloads.py.
  • Spack installs a target package (call it $spec) and all of $spec’s dependencies. Then it removes $spec, while leaving the dependencies.
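
Roughly, that corresponds to the sketch below, assuming the usual setup/run split (the class shown is illustrative; the real logic is the SpackInstall class in workloads.py):

#+begin_src python
# Sketch only: the actual implementation is SpackInstall in workloads.py.
import subprocess


class SpackInstallSketch:
    def __init__(self, spec: str) -> None:
        self.spec = spec  # e.g. "hdf5"; the spec here is a placeholder

    def setup(self) -> None:
        # Untimed: install $spec once so that all of its dependencies get built,
        # then uninstall only $spec itself, leaving the dependencies in place.
        subprocess.run(["spack", "install", self.spec], check=True)
        subprocess.run(["spack", "uninstall", "--yes-to-all", self.spec], check=True)

    def run(self) -> None:
        # Timed: reinstall just $spec; its dependencies are already present.
        subprocess.run(["spack", "install", self.spec], check=True)
#+end_src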

Write a Workload class for Apache + ApacheBench

  • Compiling Apache is an interesting benchmark, but so is running Apache under a predefined request load.
  • We should write a new class called ApacheLoad that installs Apache in its setup() (for simplicity, we won’t reuse the version we built earlier), downloads ApacheBench, and in its run() drives the server with the request load, using only tools from result/ or .work/ (sketched below).
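
A hedged sketch of what ApacheLoad could look like (the install paths, port, and ab arguments are placeholders):

#+begin_src python
# Illustrative sketch of the proposed ApacheLoad workload; paths and flags are placeholders.
import pathlib
import subprocess
import time


class ApacheLoad:
    def setup(self, work_dir: pathlib.Path) -> None:
        # Untimed: install httpd and ApacheBench (ab) into .work/, using only tools
        # from result/; the actual install steps are omitted here.
        ...

    def run(self, work_dir: pathlib.Path) -> None:
        # Timed: start the server, replay a fixed request load with ab, then shut down.
        httpd = work_dir / "httpd" / "bin" / "httpd"  # placeholder path
        ab = work_dir / "httpd" / "bin" / "ab"        # placeholder path
        server = subprocess.Popen([str(httpd), "-d", str(work_dir / "httpd"), "-X"])
        try:
            time.sleep(1)  # crude wait for the server to come up
            subprocess.run(
                [str(ab), "-n", "10000", "-c", "10", "http://localhost:8080/"],
                check=True,  # the port depends on the generated httpd.conf
            )
        finally:
            server.terminate()
            server.wait()
#+end_src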

Compile Linux benchmark

  • Write a class that compiles the Linux kernel (just the kernel, no user-space software), using only tools from result/.
  • The benchmark should use a specific pin of the Linux kernel and set kernel build options. Both should be customizable and set by files that are checked into Git. However, the Linux source tree should not be checked into Git (see build Apache, where I download the source code in setup() and cache it for future use).
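
A sketch of the intended shape (the kernel version, config file name, and make targets are assumptions; the real pin and options should come from checked-in files):

#+begin_src python
# Sketch of the proposed kernel-compile workload; versions and file names are assumptions.
import pathlib
import shutil
import subprocess
import urllib.request

KERNEL_VERSION = "6.6.1"  # hypothetical pin; should be read from a checked-in file
KERNEL_URL = f"https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-{KERNEL_VERSION}.tar.xz"


class LinuxKernelCompile:
    def setup(self, work_dir: pathlib.Path) -> None:
        # Untimed: download and unpack the pinned source tree (cached, never checked
        # into Git), then apply the checked-in build configuration.
        tarball = work_dir / f"linux-{KERNEL_VERSION}.tar.xz"
        if not tarball.exists():
            urllib.request.urlretrieve(KERNEL_URL, tarball)
        subprocess.run(["tar", "-xf", str(tarball), "-C", str(work_dir)], check=True)
        src = work_dir / f"linux-{KERNEL_VERSION}"
        shutil.copy("kernel.config", src / ".config")  # hypothetical checked-in config file
        subprocess.run(["make", "-C", str(src), "olddefconfig"], check=True)

    def run(self, work_dir: pathlib.Path) -> None:
        # Timed: build only the kernel image, using tools from result/ on PATH.
        src = work_dir / f"linux-{KERNEL_VERSION}"
        subprocess.run(["make", "-C", str(src), "-j8", "vmlinux"], check=True)
#+end_src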

Postmark workload

lmbench benchmark

Write a ProFTPD benchmark

Refactor BLAST workloads

  • It should be easy to run a large, consistent set of many different BLAST apps.
  • Maybe have 1-minute, 10-minute, and 60-minute configurations that are randomly selected but fixed (see the sketch below).
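
One way to get configurations that are randomly selected but stable across runs is to seed the sampler with a constant, as in this sketch (the app list and subset sizes are placeholders):

#+begin_src python
# Sketch: pick a random but reproducible subset of BLAST apps for each time budget.
import random

BLAST_APPS = ["blastn", "blastp", "blastx", "tblastn", "tblastx"]  # illustrative list


def fixed_config(time_budget_minutes: int) -> list[str]:
    # Seeding with the time budget makes the selection look random but stay fixed,
    # so the 1-, 10-, and 60-minute configurations never silently change.
    rng = random.Random(time_budget_minutes)
    n_apps = {1: 1, 10: 3, 60: 5}[time_budget_minutes]
    return rng.sample(BLAST_APPS, n_apps)
#+end_src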

Create mercurial/VCS workload

[#A] Workflow benchmarks

[#A] ML benchmarks

[#A] Simulation benchmarks

[#A] Filebench benchmark

[#A] Shellbench

https://github.com/shellspec/shellbench

[#A] Include xz in workload

BACKLOG Make browser benchmarks

BACKLOG SSH

https://github.com/LineRate/ssh-perf

BACKLOG THTTPD and cherokee

http://www.acme.com/software/thttpd/
https://github.com/larryhe/tinyhttpd
https://github.com/mendsley/tinyhttp
https://cherokee-project.com/

BACKLOG SPEC CPU 2006

BACKLOG Create CVS workload

BACKLOG VIC

BACKLOG FIE

BACKLOG Run xSDK codes

BACKLOG Investigate Sysbench

BACKLOG investigate BT-IO

https://www.nas.nasa.gov/software/npb.html

Make API easier to use

Write run.py

  • Just runs one workload
  • --setup, --main, --teardown
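
A minimal sketch of the proposed run.py (the flag names come from the list above; the workload registry is hypothetical):

#+begin_src python
# Sketch of the proposed run.py: run exactly one workload, one phase at a time.
import argparse

import workloads  # assumes workloads.py exposes a name -> workload mapping


def main() -> None:
    parser = argparse.ArgumentParser(description="Run a single workload")
    parser.add_argument("workload", help="name of the workload to run")
    parser.add_argument("--setup", action="store_true", help="run the setup phase")
    parser.add_argument("--main", action="store_true", help="run the timed main phase")
    parser.add_argument("--teardown", action="store_true", help="run the teardown phase")
    args = parser.parse_args()

    workload = workloads.WORKLOADS[args.workload]  # hypothetical registry
    if args.setup:
        workload.setup()
    if args.main:
        workload.run()
    if args.teardown:
        workload.teardown()


if __name__ == "__main__":
    main()
#+end_src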

Refactor runner.py

  • Change to run_store_analyze.py
  • runner.py mixes code for selecting benchmarks and prov collectors with code for summarizing statistical outputs.
  • Use --benchmarks and --collectors to form a grid
  • Accept --iterations, --seed, --fail-first
  • Accept --analysis $foo
  • Should have an --option to import external workloads and prov_collectors
  • Should have --re-run, which removes .cache/results_* and .cache/$hash
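
A sketch of how those options could form the grid in the proposed run_store_analyze.py (flag names from the list above; everything else is illustrative):

#+begin_src python
# Sketch: the cross product of --collectors, --benchmarks, and iterations is the grid.
import argparse
import itertools

parser = argparse.ArgumentParser()
parser.add_argument("--benchmarks", nargs="+", default=["all"])
parser.add_argument("--collectors", nargs="+", default=["all"])
parser.add_argument("--iterations", type=int, default=1)
parser.add_argument("--seed", type=int, default=0)
parser.add_argument("--fail-first", action="store_true")
parser.add_argument("--analysis", default=None)
parser.add_argument("--import", dest="imports", nargs="*", default=[],
                    help="modules providing external workloads/prov_collectors")
parser.add_argument("--re-run", action="store_true",
                    help="remove .cache/results_* and .cache/$hash before running")
args = parser.parse_args()

grid = itertools.product(args.collectors, args.benchmarks, range(args.iterations))
for collector, benchmark, iteration in grid:
    # run(collector, benchmark, iteration) would go here, caching results in .cache/
    print(collector, benchmark, iteration)
#+end_src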

Refactor stats.py

  • Analyses should be plain functions of type Callable[[pandas.DataFrame], None] (see the sketch below)
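
That is, each analysis would be a plain function over the results table, roughly like this (the column names are assumptions):

#+begin_src python
# Sketch: an "analysis" is just a Callable[[pandas.DataFrame], None].
from typing import Callable

import pandas

Analysis = Callable[[pandas.DataFrame], None]


def print_mean_walltime(results: pandas.DataFrame) -> None:
    # Assumes columns named "collector" and "walltime"; purely illustrative.
    print(results.groupby("collector")["walltime"].mean())


ANALYSES: list[Analysis] = [print_mean_walltime]
#+end_src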

[#A] Allow classes to specify Nix packages

  • setup() should run nix build and add the result to PATH (sketched below)
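
Roughly, setup() could shell out to nix build and prepend the resulting bin/ directories to PATH, as in this sketch (assumes a recent Nix CLI with --print-out-paths; the flake attribute is a placeholder):

#+begin_src python
# Sketch: let a workload declare Nix packages; setup() builds them and extends PATH.
import os
import subprocess


def add_nix_packages(attrs: list[str]) -> None:
    for attr in attrs:  # e.g. [".#apacheHttpd"]; attribute names are hypothetical
        out_path = subprocess.run(
            ["nix", "build", attr, "--no-link", "--print-out-paths"],
            check=True, capture_output=True, text=True,
        ).stdout.strip()
        os.environ["PATH"] = f"{out_path}/bin:" + os.environ["PATH"]
#+end_src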

Refactor workloads.py

  • Should accept a tempdir
  • Should be smaller
  • Should have teardown

Refactor run_exec_wrapper.py

  • Should fail gracefully when cgroups are not available, or even degrade to using no containers
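
A sketch of the intended degradation (the writability check is simplistic and does not distinguish cgroup v1 from v2):

#+begin_src python
# Sketch: degrade to no isolation when cgroups are not usable, instead of failing hard.
import os
import pathlib
import subprocess


def run_in_clean_env(cmd: list[str]) -> None:
    cgroup_root = pathlib.Path("/sys/fs/cgroup")
    if cgroup_root.is_dir() and os.access(cgroup_root, os.W_OK):
        run_in_cgroup(cmd)  # stand-in for the existing cgroup/container path
    else:
        # Graceful fallback: still run the command, just without resource isolation.
        print("warning: cgroups unavailable; running without a container")
        subprocess.run(cmd, check=True)


def run_in_cgroup(cmd: list[str]) -> None:
    raise NotImplementedError("placeholder for the existing logic in run_exec_wrapper.py")
#+end_src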

Document user interface

Make easier to install

[#C] Package Python code for PyPI using Poetry

Provenance collectors

Fix Sciunits

  • We need to get src_sh{./result/bin/python runner.py sciunit} to work.
  • Sciunit is a Python package which depends on a binary called ptu.
  • Sciunit says “sciunit: /nix/store/7x6rlzd7dqmsa474j8ilc306wlmjb8bp-python3-3.10.13-env/lib/python3.10/site-packages/sciunit2/libexec/ptu: No such file or directory”, but on my system, that file does exist! Why can’t sciunits find it?
  • Answer: That file does exist, but it is an ELF binary whose “interpreter” is set to /lib64/linux-something.so, and that interpreter does not exist. I replaced this copy of ptu with the Nix-built copy of ptu.
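
One way to diagnose this kind of failure is to print the ELF interpreter and check whether it exists, e.g. with patchelf (a sketch of the diagnosis only, not of the fix described above):

#+begin_src python
# Sketch: check whether an ELF binary's interpreter actually exists on this system.
import pathlib
import subprocess


def check_interpreter(binary: str) -> None:
    interp = subprocess.run(
        ["patchelf", "--print-interpreter", binary],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    print(f"{binary} wants interpreter {interp} (exists: {pathlib.Path(interp).exists()})")


# e.g. check_interpreter(".../site-packages/sciunit2/libexec/ptu")
#+end_src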

Fix sciunit

Fix strace unparsable lines

Fix rr to measure storage overhead

Package CARE

https://proot-me.github.io/care/

Package/write-up PTU

[#A] Debug PTU

[#A] Research Parrot

[#C] Write BPF trace

  • We need to write a basic prov collector for BPF trace. The collector should log the files read and written by the process and all child processes. Start by writing prov.bt (see the sketch below).
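
A possible starting point, with a minimal bpftrace program embedded in a Python wrapper so it can sit next to the other collectors (it only logs openat system-wide; filtering to the traced process tree and covering reads/writes is follow-up work):

#+begin_src python
# Sketch: a minimal BPF-based collector that logs openat() calls system-wide.
import subprocess

PROV_BT = r"""
tracepoint:syscalls:sys_enter_openat
{
    printf("%d %d %s\n", pid, tid, str(args->filename));
}
"""


def start_collector(log_path: str) -> subprocess.Popen:
    # Requires root (or the relevant BPF capabilities); one line per openat goes to log_path.
    log = open(log_path, "w")
    return subprocess.Popen(["bpftrace", "-e", PROV_BT], stdout=log)
#+end_src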

[#C] Fix Spade+FUSE

  • We need to get src_sh{./result/bin/python runner.py spade_fuse} to work.

[#C] Get SPADE Neo4J database to work

  • src_sh{./result/bin/spade start && echo "add storage Neo4J $PWD/db" | ./result/bin/spade control}
  • Currently, that fails with “Adding storage Neo4J… error: Unable to find/load class”
  • The log can be found in ~/.local/share/SPADE/current.log.
  • ~/.local/share/SPADE/lib/neo4j-community/lib/*.jar contains the Neo4J classes. I believe these are on the classpath; however, SPADE appears to run under a different Java version (or something similar) that refuses to load those jars.

BACKLOG discuss VAMSA

BACKLOG Build CentOS packages

Stats

Measure arithmetic intensity for each

  • IO calls / CPU sec, where CPU sec is itself a random variable
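
Since CPU seconds is itself a random variable, the ratio is a distribution too and can be formed per posterior sample rather than from point estimates, roughly like this (the distributions below are stand-ins for the model's actual posterior samples):

#+begin_src python
# Sketch: treat arithmetic intensity as a distribution by dividing per posterior sample.
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for posterior samples produced by the Bayesian model in Stats-larger.ipynb.
io_calls = rng.poisson(50_000, size=4_000)               # samples of the IO-call count
cpu_sec = rng.gamma(shape=100, scale=0.05, size=4_000)   # samples of CPU seconds

intensity = io_calls / cpu_sec                            # samples of IO calls per CPU second
print(np.percentile(intensity, [2.5, 50, 97.5]))          # a 95% credible interval
#+end_src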

Measure slowdown as a function of arithmetic intensity

[#C] Count dynamic instructions in entire program

  • IO calls / 1M dynamic instruction

Plot IO vs CPU sec

Plot confidence interval of slowdown per arithmetic intensity

Evaluate prediction based on arithmetic intensity

  • slowdown(prov_collector) * cpu_to_wall_time(workload) * runtime(workload) ~ runtime(workload, prov_collector)
  • What is the expected percent error?
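
A sketch of how the expected percent error could be computed from that model (the column names are assumptions):

#+begin_src python
# Sketch: percent error of the multiplicative prediction against measured runtimes.
import pandas as pd


def percent_error(df: pd.DataFrame) -> pd.Series:
    # Assumed columns: slowdown, cpu_to_wall_time, runtime, runtime_with_collector.
    predicted = df["slowdown"] * df["cpu_to_wall_time"] * df["runtime"]
    actual = df["runtime_with_collector"]
    return 100 * (predicted - actual).abs() / actual


# The expected percent error is then roughly percent_error(results).mean().
#+end_src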

Characterize benchmarks and benchmark classes by syscall breakdown

BACKLOG Revise bayesian model to use benchmark class

  • How many classes and benchmarks does one need?

BACKLOG Use G-means or X-means to learn the number of clusters

Writing

Write introduction

Write background

Write literature rapid review section

Write benchmark and prov collector collection

Revise introduction (60)

  • Smoosh Motivation and Background together
  • Lead with the problem
  • 1 problem -> provenance (vs perf overhead) -> 3 other problems solved -> 3 ways to gather

Explain how strace, ltrace, fsatrace, rr got to be there

Explain how Sciunits, ReproZip got to be there

Describe experimental results

[#B] Explain the capabilities/features of each prov tracer

  • Table of capabilities (vDSO)

Discussion

  • What provenance methods are most promising?
  • Threats to validity
  • Mathematical model
  • Few of the tools are applicable to comp sci due to methods
  • How many work for distributed systems
  • How to handle network
  • Microbenchmarks vs. applications?
  • Non-negative linear regression

[#B] Story-telling

  • Gaps in prior work re comp sci
  • Stakeholder perspectives:
    • Tool developers, users, facilities people
  • Long-term archiving of an execution, such that it is re-executable
  • Definition of I/O? I/O includes things like the username and clock_gettime

Conclusion

Threats to validity

Background

Page-limit

Reproducibility appendix

  • Need Intel CPU?

Why not VMs?

BACKLOG Record/replay reproducibility with library interposition

  • Paper directory
  • Record/replay is an easier way to get reproducibility than Docker/Nix/etc.
  • Use library interpositioning to make a record/replay tool that is faster than other record/replay tools

Get global state vars

Vars