Enable compression for hash_table #210

Open
ArvidJB opened this issue Oct 19, 2021 · 7 comments

@ArvidJB
Collaborator

ArvidJB commented Oct 19, 2021

As a follow up to #205, can we enable compression for the hash_table datasets?

Ideally this should be configurable in some way, maybe as an argument like `VersionedHDF5File(f, hash_table_compression='lzf')`? If this turns out to be difficult, it's also okay to use some default compression for all `hash_table` datasets.
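
A minimal sketch of the proposed user-facing API (the `hash_table_compression` keyword is hypothetical; it does not exist yet and is only the shape being suggested above):

```python
import h5py
from versioned_hdf5 import VersionedHDF5File

with h5py.File('data.h5', 'a') as f:
    # Hypothetical keyword: 'hash_table_compression' is the proposal above,
    # not an existing VersionedHDF5File argument.
    vf = VersionedHDF5File(f, hash_table_compression='lzf')
```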

@ArvidJB
Collaborator Author

ArvidJB commented Oct 19, 2021

Also, the `hash_table` chunk sizes should be configurable. I see that `Hashtable` does take a `chunk_size` argument, but unfortunately that's not exposed to the user.
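
For illustration, a sketch of constructing the internal class with a custom chunk size. The module path and constructor signature here are assumptions; only the class name and the existence of a `chunk_size` argument come from the comment above:

```python
import h5py
from versioned_hdf5.hashtable import Hashtable  # module path is an assumption

with h5py.File('data.h5', 'a') as f:
    # Assumed signature: the issue says Hashtable accepts a chunk_size
    # argument internally, but there is currently no public hook to set it.
    h = Hashtable(f, 'some_dataset', chunk_size=4096)
```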

@asmeurer
Collaborator

The hash table looks the same regardless of what is in the dataset. So I would think it's better to just find a compression that works and use it everywhere.

@ArvidJB
Collaborator Author

ArvidJB commented Oct 19, 2021

Both `'gzip'` and `'lzf'` seem to work and are part of the h5py install:
https://docs.h5py.org/en/stable/high/dataset.html#lossless-compression-filters
We usually pick `'lzf'` for everything other than `dtype='object'`. Maybe we can just go with `'lzf'` as the default?
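
For context, both filters are used through the standard h5py dataset-creation API (this much is documented h5py behavior, not specific to versioned-hdf5):

```python
import h5py

with h5py.File('hashes.h5', 'w') as f:
    # Both filters ship with h5py: lzf is fast with modest ratios, gzip is
    # slower but compresses better (levels 0-9 via compression_opts).
    f.create_dataset('ht_lzf', shape=(1024, 32), dtype='u1',
                     compression='lzf')
    f.create_dataset('ht_gzip', shape=(1024, 32), dtype='u1',
                     compression='gzip', compression_opts=4)
```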

@quarl
Member

quarl commented Oct 19, 2021

lzf sounds good, let's go with that for now.

Aaron:
We are trying to meet a tight deadline and need a release with this fix by tomorrow.
Is it better for you to make the change, or for Arvid to?

@asmeurer
Collaborator

> We are trying to meet a tight deadline and need a release with this fix by tomorrow.

Sorry, I didn't see this comment until just now. I have a fix at #211. I will merge it and make a release as soon as the tests pass.

@asmeurer
Collaborator

I didn't make the chunk size configurable yet. Hopefully that isn't also something you need done urgently. I'm actually not even sure if the hashtable dataset needs to be chunked at all. I might need to play with this.
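
One relevant HDF5 detail for the chunking question: compression filters only apply to chunked datasets, so a compressed hash table has to stay chunked (h5py picks chunks automatically if a filter is requested without an explicit chunk shape). A quick way to inspect what a file actually ended up with; the internal path below is an assumption about versioned-hdf5's layout:

```python
import h5py

with h5py.File('data.h5', 'r') as f:
    # Path is a guess at where versioned-hdf5 stores its hash tables.
    ds = f['_version_data/some_dataset/hash_table']
    print(ds.chunks, ds.compression)  # e.g. (4096, 2) 'lzf'
```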

@ArvidJB
Collaborator Author

ArvidJB commented Oct 19, 2021

Thanks, the compression alone already helps us a lot!
And thanks for turning this around and making a release so quickly!
