Enable compression for hash_table #210

Open
ArvidJB opened this issue Oct 19, 2021 · 7 comments

@ArvidJB
Collaborator

ArvidJB commented Oct 19, 2021

As a follow up to #205, can we enable compression for the hash_table datasets?

Ideally this should be configurable in some way, maybe as an argument like `VersionedHDF5File(f, hash_table_compression='lzf')`? If this turns out to be difficult, it's also okay to use some default compression for all `hash_table` datasets.
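
A minimal sketch of the proposed user-facing API (the `hash_table_compression` keyword is hypothetical; it does not exist yet and is only the shape being suggested above):

```python
import h5py
from versioned_hdf5 import VersionedHDF5File

with h5py.File('data.h5', 'a') as f:
    # Hypothetical keyword: 'hash_table_compression' is the proposal above,
    # not an existing VersionedHDF5File argument.
    vf = VersionedHDF5File(f, hash_table_compression='lzf')
```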

@ArvidJB
Collaborator Author

ArvidJB commented Oct 19, 2021

Also, the `hash_table` chunk sizes should be configurable. I see that `Hashtable` does take a `chunk_size` argument, but unfortunately that's not exposed to the user.
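
For illustration, a sketch of constructing the internal class with a custom chunk size. The module path and constructor signature here are assumptions; only the class name and the existence of a `chunk_size` argument come from the comment above:

```python
import h5py
from versioned_hdf5.hashtable import Hashtable  # module path is an assumption

with h5py.File('data.h5', 'a') as f:
    # Assumed signature: the issue says Hashtable accepts a chunk_size
    # argument internally, but there is currently no public hook to set it.
    h = Hashtable(f, 'some_dataset', chunk_size=4096)
```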

@asmeurer
Collaborator

The hash table looks the same regardless of what is in the dataset. So I would think it's better to just find a compression that works and use it everywhere.

@ArvidJB
Collaborator Author

ArvidJB commented Oct 19, 2021

Both `'gzip'` and `'lzf'` seem to work and are part of the h5py install:
https://docs.h5py.org/en/stable/high/dataset.html#lossless-compression-filters
We usually pick `'lzf'` for everything other than `dtype='object'`. Maybe we can just go with `'lzf'` as the default?
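
For context, both filters are used through the standard h5py dataset-creation API (this much is documented h5py behavior, not specific to versioned-hdf5):

```python
import h5py

with h5py.File('hashes.h5', 'w') as f:
    # Both filters ship with h5py: lzf is fast with modest ratios, gzip is
    # slower but compresses better (levels 0-9 via compression_opts).
    f.create_dataset('ht_lzf', shape=(1024, 32), dtype='u1',
                     compression='lzf')
    f.create_dataset('ht_gzip', shape=(1024, 32), dtype='u1',
                     compression='gzip', compression_opts=4)
```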

@quarl
Member

quarl commented Oct 19, 2021

lzf sounds good, let's go with that for now.

Aaron:
We are trying to meet a tight deadline and need a release with this fix by tomorrow.
Is it better for you to make the change, or for Arvid to?

@asmeurer
Collaborator

> We are trying to meet a tight deadline and need a release with this fix by tomorrow.

Sorry, I didn't see this comment until just now. I have a fix at #211. I will merge it and make a release as soon as the tests pass.

@asmeurer
Collaborator

I didn't make the chunk size configurable yet. Hopefully that isn't also something you need done urgently. I'm actually not even sure if the hashtable dataset needs to be chunked at all. I might need to play with this.
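
One relevant HDF5 detail for the chunking question: compression filters only apply to chunked datasets, so a compressed hash table has to stay chunked (h5py picks chunks automatically if a filter is requested without an explicit chunk shape). A quick way to inspect what a file actually ended up with; the internal path below is an assumption about versioned-hdf5's layout:

```python
import h5py

with h5py.File('data.h5', 'r') as f:
    # Path is a guess at where versioned-hdf5 stores its hash tables.
    ds = f['_version_data/some_dataset/hash_table']
    print(ds.chunks, ds.compression)  # e.g. (4096, 2) 'lzf'
```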

@ArvidJB
Collaborator Author

ArvidJB commented Oct 19, 2021

Thanks, the compression alone already helps us a lot!
And thanks for turning this around and making a release so quickly!
