ATL03 Cloud Optimized Samples #398
betolink started this conversation in Show and tell
-
I just noticed that most of the rechunked versions are not readable out of the box with xarray (via h5netcdf). I get a long stack-trace exception, and the last lines indicate that some object references for the group cannot be resolved:

```
File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File /srv/conda/envs/notebook/lib/python3.11/site-packages/h5py/_hl/group.py:353, in Group.__getitem__(self, name)
    350 """ Open an object in the file """
    352 if isinstance(name, h5r.Reference):
--> 353     oid = h5r.dereference(name, self.id)
    354     if oid is None:
    355         raise ValueError("Invalid HDF5 object reference")

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()
File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()
File h5py/h5r.pyx:83, in h5py.h5r.dereference()

KeyError: 'Unable to open object by token (bad object header version number)'
```
-
Dimension scales are causing the problem. I failed to re-reference the dimension scales correctly. Let me see if I can figure out how to do that in a sensible way...
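One way to rebuild the dimension-scale references after rewriting datasets is h5py's dimension-scale API (`make_scale` / `attach_scale`). A minimal sketch below; the dataset names are illustrative stand-ins, not the real ATL03 group layout:

```python
import h5py

# Create a small file that mimics a rechunked dataset plus its coordinate
# variable (names here are made up for illustration).
with h5py.File("rechunked.h5", "w") as f:
    f.create_dataset("delta_time", data=[0.0, 1.0, 2.0], chunks=(3,))
    f.create_dataset("h_ph", data=[10.0, 11.0, 12.0], chunks=(3,))

    # Mark the coordinate as a dimension scale and attach it. This creates
    # fresh object references in the new file; stale references copied over
    # from the original file are what trigger the dereference KeyError above.
    f["delta_time"].make_scale("delta_time")
    f["h_ph"].dims[0].attach_scale(f["delta_time"])

# Verify the scale is attached.
with h5py.File("rechunked.h5", "r") as f:
    print([s.name for s in f["h_ph"].dims[0].values()])  # ['/delta_time']
```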
-
I have a few ATL03 files that were cloud optimized and put in a CryoCloud-accessible S3 bucket. I wonder if they could be used to benchmark access in some of the SlideRule workflows, or just to test that we can open them in a performant way. Access could go through h5coro or h5py; if h5py is used, we'll have to adjust some parameters to let the library know we are opening a file with a different file layout, see reference gist.
So we have 3 different sizes, and each file comes in 3 different flavors:

- `page-only-8mb.h5`: repacked with the paged aggregation strategy. This affects only the metadata, consolidating it and forcing the library to use a fixed page size for remote requests.
- `rechunked-100k-page-8mb.h5`: same page aggregation of the metadata, but the `heights` group was also repacked, meaning `h_ph` is now stored in chunks of 100k items instead of 10k, with the same level-6 compression (not a lot of compression).
- `rechunked-100k-page-8mb-repacked.h5`: same as above, but I ran the `h5repack` command at the end to reduce wasted internal space. Mixed results on that: `h5stat` still indicates a lot of unaccounted space in the files after we rechunk them.

I'll run some benchmarks on these files again and will post them here. Again, if they could be tested with SlideRule that would be great.
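For reference, the repack invocations for the three flavors would look roughly like this. A sketch using documented `h5repack` options (`-S` file-space strategy, `-G` page size, `-l` chunk layout); the input filename and the exact list of rechunked datasets are placeholders:

```shell
# Flavor 1: consolidate metadata into fixed 8 MiB pages (metadata only).
h5repack -S PAGE -G 8388608 ATL03_input.h5 page-only-8mb.h5

# Flavor 2: same paging, plus rechunk the photon heights to 100k items
# (shown for one beam; repeat -l for the other beam groups as needed).
h5repack -S PAGE -G 8388608 \
  -l /gt1l/heights/h_ph:CHUNK=100000 \
  ATL03_input.h5 rechunked-100k-page-8mb.h5

# Flavor 3: one more h5repack pass on the result, to try to reclaim the
# unaccounted internal free space that h5stat reports.
h5repack rechunked-100k-page-8mb.h5 rechunked-100k-page-8mb-repacked.h5
```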