
Benchmarking #57

Open
AhmetCanSolak opened this issue Feb 16, 2023 · 4 comments
Labels
CI (Continuous integration), enhancement (New feature or request), performance (Speed and memory usage of the code)

Comments

@AhmetCanSolak
Contributor

We need to set up benchmarking infrastructure that we can run in different contexts to help us make more educated decisions about performance. I'm inclined to use asv (https://github.com/airspeed-velocity/asv), but first I wanted to hear more opinions. I'd like to hear your thoughts both on benchmarking frameworks and on which aspects of iohub you wish to see benchmarked.
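For context, a minimal sketch of what an asv benchmark module could look like (the file layout, class, and the direct zarr write as a stand-in for iohub I/O are all my assumptions, not decided API; asv discovers `time_*` and `peakmem_*` methods by name):

```python
# Minimal sketch of an asv benchmark module (hypothetical benchmarks/benchmarks.py).
# asv discovers these by name: time_* methods are timed, peakmem_* methods
# report peak memory. The direct zarr write is a stand-in for iohub I/O calls.
import shutil
import tempfile

import numpy as np
import zarr


class WriteSuite:
    def setup(self):
        self.tmpdir = tempfile.mkdtemp()
        # ~64 MiB of random uint16 data, shaped like a small (T, C, Z, Y, X) stack
        self.data = np.random.randint(
            0, 2**16, size=(1, 1, 32, 1024, 1024), dtype=np.uint16
        )

    def teardown(self):
        shutil.rmtree(self.tmpdir)

    def time_write(self):
        z = zarr.open(
            f"{self.tmpdir}/bench.zarr",
            mode="w",
            shape=self.data.shape,
            chunks=(1, 1, 1, 1024, 1024),
            dtype=self.data.dtype,
        )
        z[:] = self.data

    def peakmem_write(self):
        self.time_write()
```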

@AhmetCanSolak added the enhancement (New feature or request) label on Feb 16, 2023
@JoOkuma
Member

JoOkuma commented Feb 16, 2023

I think benchmarking compression algorithms might be interesting. However, there are already several benchmarks of this on the web; the only additional information we would get is whether some algorithms perform better on images, and whether performance differs between fluorescence and brightfield images.
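A hedged sketch of that comparison (the synthetic arrays below are stand-ins I made up for sparse fluorescence and dense brightfield data, not real datasets):

```python
# Compare Blosc codecs on synthetic stand-ins for fluorescence (sparse) and
# brightfield (dense) images. Real microscopy data would replace these arrays.
import numpy as np
from numcodecs import Blosc

rng = np.random.default_rng(0)

# Fluorescence-like: mostly dark background with sparse bright spots.
fluor = rng.poisson(2, size=(1024, 1024)).astype(np.uint16)
fluor[rng.random(fluor.shape) > 0.999] = 40000

# Brightfield-like: bright mean with moderate noise everywhere.
bf = rng.normal(30000, 1000, size=(1024, 1024)).astype(np.uint16)

for name, img in [("fluorescence", fluor), ("brightfield", bf)]:
    for cname in ["zstd", "lz4", "blosclz"]:
        codec = Blosc(cname=cname, clevel=1, shuffle=Blosc.BITSHUFFLE)
        ratio = img.nbytes / len(codec.encode(img))
        print(f"{name:12s} {cname:8s} ratio={ratio:.1f}")
```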

I have mixed feelings about benchmarking chunk size: I think it is more dependent on the processing step and dataset, and a good benchmark would be very difficult to integrate into CI because it would require different storage backends and applications.

I'm happy with asv, but I don't know other benchmarking frameworks.

@ziw-liu
Collaborator

ziw-liu commented Feb 16, 2023

I haven't been exposed to frameworks other than asv either.

There were some manual benchmarking results from the waveorder.io days comparing different Blosc algorithms (speed and ratio) on a single 500 MB dataset, which informed the choice of the default compressor that iohub inherited:

https://github.com/czbiohub/iohub/blob/2a48cf597612e72716ea1e6dc9281845e052d6c1/iohub/ngff.py#L386-L393
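For readers without the link handy, a hedged illustration of a numcodecs Blosc default along those lines (the actual cname/clevel/shuffle values are the ones in the linked `iohub/ngff.py`, not necessarily these):

```python
# Hedged illustration only; the real parameter values live in iohub/ngff.py
# at the permalink above.
from numcodecs import Blosc

DEFAULT_COMPRESSOR = Blosc(cname="zstd", clevel=1, shuffle=Blosc.BITSHUFFLE)
```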

It would be interesting to see how the sparsity and patterns of BF, fluorescence, and mixed images affect the results; we could potentially recommend different compression schemes for different datasets.


By @camFoltz: [benchmark plots comparing compression level 1 and compression level 9]

@mattersoflight
Collaborator

mattersoflight commented Feb 17, 2023

Are you thinking of benchmarks that should run during CI to catch performance gains or regressions? If so, I'd suggest timing the write and read of a 1 GB random array so we can evaluate how different dependencies (zarr-python vs. tensorstore) affect performance with a single process or multiple processes.
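A rough sketch of that timing harness (assuming the zarr-python backend, a local placeholder path, and an arbitrary chunk shape; a tensorstore variant would swap the open/read/write calls but keep the timers):

```python
# Time writing and reading back a 1 GiB random array with zarr-python.
import time

import numpy as np
import zarr

# 1024 x 1024 x 128 float64 = 1 GiB
data = np.random.default_rng(0).random((1024, 1024, 128))

t0 = time.perf_counter()
z = zarr.open(
    "write_bench.zarr",
    mode="w",
    shape=data.shape,
    chunks=(64, 1024, 128),
    dtype=data.dtype,
)
z[:] = data
t_write = time.perf_counter() - t0

t0 = time.perf_counter()
_ = zarr.open("write_bench.zarr", mode="r")[:]
t_read = time.perf_counter() - t0

print(f"write: {t_write:.2f} s, read: {t_read:.2f} s")
```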

Separately, we do need to know I/O performance as a function of chunk size and of compression for our HPC infrastructure, specifically when using ESS or scratch space from the compute nodes. This is needed to make sound choices about how to run different pipelines. These benchmarks need not (and should not) run on the CI servers. It would also be useful to evaluate the speed and compression ratios for different data modalities; that also need not run during CI.
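A non-CI sketch of the chunk-size sweep (the store path, array shape, and chunk shapes are placeholders; on HPC the path would point at ESS or node-local scratch instead):

```python
# Sweep chunk shapes over the same array and time each write.
import shutil
import time

import numpy as np
import zarr

data = np.random.default_rng(0).integers(
    0, 2**16, size=(64, 2048, 2048), dtype=np.uint16
)  # ~512 MiB

for chunks in [(1, 2048, 2048), (8, 512, 512), (64, 256, 256)]:
    t0 = time.perf_counter()
    z = zarr.open(
        "chunk_bench.zarr", mode="w", shape=data.shape, chunks=chunks, dtype=data.dtype
    )
    z[:] = data
    print(f"chunks={chunks}: {time.perf_counter() - t0:.2f} s")
    shutil.rmtree("chunk_bench.zarr")
```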

@JoOkuma
Member

JoOkuma commented Feb 17, 2023

Got it, @mattersoflight. I was only thinking about CI benchmarks.
I think you're right with your suggestion of non-CI benchmarks.

@ziw-liu added the performance (Speed and memory usage of the code) and CI (Continuous integration) labels on Feb 24, 2023