Choice of Zarr compressor #1661
Comments
All the Zarr files I have made from Parcels have been compressed. You do not need to set a compressor; the default is to compress. For efficiency, I have found choosing the right chunk size to be more important.
I mean the type of compression that is used (various compressors are available in zarr-developers/numcodecs). I agree about chunk size, but it would be good to also investigate the compression algorithm, as I think we just rely on the default, which may not be the best fit for our data.
With all due respect, I get the impression you missed the point of the paper linked above. The high I/O time is attributed largely to loading the fieldsets. Correct me if I am mistaken, but that has nothing to do with the Zarr compressor of the particle file, does it? I hope your benchmarking approach gives you the answers you seek.
Thanks for clarifying! :) I haven't had the time yet to fully go through the paper. I've updated the description here to match.
The code that writes the `zarr` file in `particlefile.py` doesn't appear to set a compressor. Setting a compressor can significantly help in trading off compute for storage, or vice versa [1]. The default chosen by zarr is the Blosc compressor, which is a "meta-compressor" (using different algorithms under the hood). Perhaps it's worth investigating other compressors to see if one is better suited to our simulation output.

@erikvansebille do you know if zarr compressors have been a topic of discussion before?
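A rough sketch of how such a comparison could be set up is shown here. It uses only the Python standard library, so it covers `zlib`, `bz2` and `lzma` (which numcodecs also wraps) but not Blosc itself, and the trajectory data is synthetic and made up for illustration. The helper names `synthetic_trajectory` and `benchmark` are hypothetical and not part of Parcels or zarr.

```python
import bz2
import lzma
import math
import struct
import time
import zlib


def synthetic_trajectory(n_obs=20_000):
    """Return packed float32 bytes for a smooth, made-up longitude series."""
    values = [10.0 + 0.001 * i + 0.05 * math.sin(i / 50.0) for i in range(n_obs)]
    return struct.pack(f"<{n_obs}f", *values)


def benchmark(raw, codecs):
    """Compress and decompress `raw` with each codec; record ratio and timing."""
    results = {}
    for name, (compress, decompress) in codecs.items():
        start = time.perf_counter()
        packed = compress(raw)
        elapsed = time.perf_counter() - start
        # Round-trip check: a lossless codec must return the input unchanged.
        assert decompress(packed) == raw
        results[name] = {"ratio": len(raw) / len(packed), "seconds": elapsed}
    return results


# Standard-library stand-ins for the corresponding numcodecs codecs.
CODECS = {
    "zlib": (lambda b: zlib.compress(b, 6), zlib.decompress),
    "bz2": (lambda b: bz2.compress(b, 9), bz2.decompress),
    "lzma": (lambda b: lzma.compress(b), lzma.decompress),
}

raw = synthetic_trajectory()
results = benchmark(raw, CODECS)
for name, stats in results.items():
    print(f"{name:5s} ratio={stats['ratio']:.2f} time={stats['seconds'] * 1e3:.1f} ms")
```

A meaningful benchmark would replace the synthetic series with real particle output and would also measure decompression time, since that is what matters when reading the file back for analysis.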