
ipyflow/nbsafety-experiments


This repository has two main functionalities: scraping and replaying. You don’t need to deal with scraping if you use the already-scraped traces.sqlite database. Beyond that, two scripts are worth noting: one that replays a single notebook session, and one that replays a whole set of sessions after filtering them on some criteria.
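If you are working from the pre-scraped database, a quick way to see what it contains is to list its tables with Python's built-in sqlite3 module. This is a minimal sketch: `list_tables` is a hypothetical helper, and it only queries SQLite's own `sqlite_master` catalog, so it assumes nothing about the names or schemas of the scraped tables.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in an open SQLite connection, sorted by name."""
    # sqlite_master is SQLite's built-in catalog of schema objects.
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]
```

For example, `list_tables(sqlite3.connect("traces.sqlite"))` lists the tables in the scraped database. Note that `sqlite3.connect` silently creates a new, empty database if the path does not exist, so an empty result usually means the path is wrong.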

Replaying a single session

replay-session.py replays a single notebook session, identified by a trace_id (essentially the repository) and a session_id (the session within that repository). It handles things like timeouts, figuring out which packages need to be installed, and converting Python 2 code to Python 3 with the 2to3 tool. It also counts the number of exceptions raised during replay; it is probably worth filtering out sessions where more than ~5-10% of cell executions raise an exception.

The script also contains a fair amount of ancillary, nbsafety-specific code, such as counting how often the user picks a stale cell or a refresher cell for re-execution; if you only need the replay functionality and are not replicating the nbsafety results, this code can simply be deleted. Note that the script assumes the traces.sqlite database contains the tables replay_stats and replay_exception_stats, whose schemas must be created manually; PyCharm's SQLite connector works well for this.
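The exception-rate filter suggested above can be sketched as follows. This is a hypothetical, simplified illustration, not the logic in replay-session.py: it runs each cell's source with `exec()` in one shared namespace (the real script replays inside a notebook environment and also handles timeouts, package installation, and 2to3 conversion), counts cells that raise, and keeps the session only if the exception rate is at or below a threshold. The names `ReplayResult`, `replay_cells`, and `keep_session` are invented for this example, and the 5% default is an assumption taken from the low end of the ~5-10% range.

```python
from dataclasses import dataclass


@dataclass
class ReplayResult:
    """Counts collected while replaying one session (hypothetical structure)."""
    num_cells: int = 0
    num_exceptions: int = 0

    @property
    def exception_rate(self) -> float:
        # Guard against empty sessions to avoid dividing by zero.
        return self.num_exceptions / self.num_cells if self.num_cells else 0.0


def replay_cells(cell_sources: list[str]) -> ReplayResult:
    """Execute cells in order in one shared namespace, counting failing cells.

    Only a stand-in for real replay: running untrusted code with exec()
    is unsafe outside a sandbox.
    """
    namespace: dict = {}
    result = ReplayResult()
    for source in cell_sources:
        result.num_cells += 1
        try:
            exec(source, namespace)
        except Exception:
            # Count the failure and keep replaying the remaining cells.
            result.num_exceptions += 1
    return result


def keep_session(result: ReplayResult, max_exception_rate: float = 0.05) -> bool:
    """Keep a session only if its fraction of failing cell executions is small."""
    return result.exception_rate <= max_exception_rate
```

For example, replaying `["x = 1", "y = x + 1", "undefined_name", "z = y * 2"]` counts one exception (a `NameError`) in four cells, an exception rate of 25%, so `keep_session` returns `False`.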

Replaying all sessions satisfying filtering criteria

run-replay-experiments.py runs every session through a filtering process and replays the ones that pass. Many of the filtering criteria were specified manually after nonsensical sessions turned up during replay. The script also accepts a --version argument: if you specify a version used before, it skips sessions that were already replayed; if you specify a new version, it starts from scratch. There are also some nbsafety-specific parameters:

  • --naive-refresher-computation: a baseline used in the paper.
  • --forward-only-propagation: used to measure the utility of highlights when new ones are created only in later cells (spatially, relative to the currently executing cell) rather than in both earlier and later cells.
  • --no-nbsafety: used to measure how much faster replay is without nbsafety, i.e., how large the nbsafety overhead is.
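The command-line options and the --version resume behaviour described above can be illustrated with a small sketch. Only the four flag names come from this README; everything else is hypothetical. The real run-replay-experiments.py may parse its arguments and record progress differently, and the types and defaults below are guesses. Representing a session as a `(trace_id, session_id)` pair and completed work as a set of `(trace_id, session_id, version)` tuples is an assumption made purely for illustration.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    # Flag names come from the README; types and defaults here are guesses.
    parser = argparse.ArgumentParser(description="Replay filtered notebook sessions")
    parser.add_argument(
        "--version",
        required=True,
        help="experiment version: reuse it to resume, change it to start over",
    )
    parser.add_argument("--naive-refresher-computation", action="store_true")
    parser.add_argument("--forward-only-propagation", action="store_true")
    parser.add_argument("--no-nbsafety", action="store_true")
    return parser.parse_args(argv)


def sessions_to_replay(candidates, completed, version):
    """Return the candidate sessions not yet replayed under `version`.

    candidates: iterable of (trace_id, session_id) pairs that passed the filters.
    completed: set of (trace_id, session_id, version) tuples already replayed.
    A brand-new version matches nothing in `completed`, so every candidate
    is returned, i.e. the run starts from scratch.
    """
    return [(t, s) for (t, s) in candidates if (t, s, version) not in completed]
```

Keying completed work on the version as well as the session is what makes "same version resumes, new version restarts" fall out of a single membership check.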

When replaying these sessions, it is a good idea to do so in a chroot environment or a Docker container, since the sessions run untrusted code that sometimes does fairly strange things.
