Benchmark performance overhead of spy #3541

Open
olevski opened this issue Jun 27, 2023 · 1 comment

Comments

@olevski
Member

olevski commented Jun 27, 2023

We should have an idea of how much overhead using something like spy represents.

For this purpose I propose two tests:

  1. Overhead for an optimized script/command. @Panaetius mentions he has something in Rust for reading many files at once.
  2. Overhead for a Python ML model. Here it would be nice to test out training an image classification model. We can either get a simple one from the academic team or find one online. For example, the CIFAR-10 dataset is ~200 MB and can easily be fed into a reference image recognition CNN.

The goal is to compare the increase in execution time from running with spy relative to running without it.
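A minimal sketch of how that comparison could be measured, assuming both variants can be started as plain commands. The helper names `time_command` and `relative_overhead` are made up for illustration, and `wrapped_cmd` is only a placeholder: the actual spy-enabled invocation is not specified in this issue and has to be filled in.

```python
import subprocess
import sys
import time
from statistics import median


def time_command(cmd, repeats=5):
    """Run `cmd` `repeats` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        durations.append(time.perf_counter() - start)
    return durations


def relative_overhead(baseline_s, wrapped_s):
    """Fractional increase in run time, e.g. 0.5 means 50% slower."""
    return (wrapped_s - baseline_s) / baseline_s


if __name__ == "__main__":
    # Stand-in workload; swap in the Rust binary or the training script.
    baseline_cmd = [sys.executable, "-c", "sum(range(10**6))"]
    # PLACEHOLDER (assumption): replace with the same workload run under spy.
    wrapped_cmd = baseline_cmd

    baseline = median(time_command(baseline_cmd))
    wrapped = median(time_command(wrapped_cmd))
    print(f"baseline: {baseline:.3f}s  wrapped: {wrapped:.3f}s")
    print(f"relative overhead: {relative_overhead(baseline, wrapped):.1%}")
```

Using the median over a few repeats keeps a single slow run (cold cache, background load) from dominating the result; for the very short Rust command, more repeats are probably needed to get a stable number.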

@olevski olevski added kind/enhancement status/triage Issue needs to be triaged labels Jun 27, 2023
@Panaetius
Member

I used the IMDb dataset from https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz, which is around 80 MB but contains ~100k text files.
The Rust code used for preprocessing it just creates nice CSV files out of it; I ran it once for train and once for test.
It only takes 0.1 s to run on my machine, but renku would then take on the order of a day to check whether the data is pulled from LFS (this was created to have that addressed) and to create metadata.
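
For anyone who wants a comparable many-small-files workload without the Rust tool, here is a rough Python sketch of the same kind of preprocessing. It is not the Rust code mentioned above. It assumes the extracted archive has one directory per split (`train`, `test`) with `pos` and `neg` subdirectories of `.txt` files; the function name `reviews_to_csv` and the CSV columns are invented for illustration.

```python
import csv
from pathlib import Path


def reviews_to_csv(split_dir, out_csv, labels=("pos", "neg")):
    """Collect every .txt file under split_dir/<label>/ into one CSV file.

    Returns the number of files written. The directory layout is an
    assumption about the extracted dataset, not something stated in this issue.
    """
    split_dir = Path(split_dir)
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["label", "filename", "text"])
        for label in labels:
            # sorted() keeps the output deterministic across runs
            for path in sorted((split_dir / label).glob("*.txt")):
                writer.writerow([label, path.name, path.read_text(encoding="utf-8")])
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical paths: run once for train and once for test.
    for split in ("train", "test"):
        n = reviews_to_csv(Path("aclImdb") / split, f"{split}.csv")
        print(f"{split}: wrote {n} rows")
```

The point of the workload is that it reads a very large number of inputs in very little time, so any per-file tracking cost shows up clearly in the measured overhead.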

@Panaetius Panaetius removed the status/triage Issue needs to be triaged label Jul 5, 2023
Projects
Status: Backlog