Open on-disk kerchunk references as a virtual dataset #118

Closed · TomNicholas opened this issue May 16, 2024 · 6 comments

Labels: references generation (Reading byte ranges from archival files)

Comments

TomNicholas (Member) commented May 16, 2024

It might be useful to be able to open an existing kerchunk json/parquet reference file as a virtual dataset, e.g. to make changes to it before writing it back out.

This is essentially the kerchunk version of suggestion (2) here #63 (comment).

This should be really easy to implement: we already have a function for doing it (dataset_from_kerchunk_refs); we just have to teach open_virtual_dataset that existing kerchunk json/parquet files are also valid filetypes to pass in.
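A rough sketch of what that dispatch inside open_virtual_dataset might look like (only open_virtual_dataset and dataset_from_kerchunk_refs are named in this issue; the "kerchunk_json" filetype string, the import path, and the JSON loading shown here are assumptions for illustration):

```python
# Sketch only: the filetype string and the import path for
# dataset_from_kerchunk_refs are assumptions; only the two function names
# come from this issue.
import json

from virtualizarr.kerchunk import dataset_from_kerchunk_refs  # assumed location


def open_virtual_dataset(filepath: str, filetype: str | None = None, **kwargs):
    if filetype == "kerchunk_json":
        # The file already contains kerchunk references, so there are no
        # byte ranges to scan - just load the refs and convert them into a
        # ManifestArray-backed virtual dataset.
        with open(filepath) as f:
            refs = json.load(f)
        return dataset_from_kerchunk_refs(refs)

    # ... otherwise fall back to the existing readers that generate
    # references by reading byte ranges from the archival file itself.
    raise NotImplementedError
```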

TomNicholas added the references generation label on May 16, 2024
jsignell (Contributor) commented

I can take a crack at this one.

TomNicholas (Member, Author) commented

You might have to fight @norlandrhagen haha

jsignell (Contributor) commented

Oh! I can back off :) I do think there is an argument to be made for having different methods (open_virtual_dataset vs. open_as_virtual_dataset, or something like that) depending on whether the input is kerchunk-style refs or actual data.

norlandrhagen (Collaborator) commented

Sorry @jsignell! I should have mentioned it in this issue :) If I open a PR, would you mind taking a look at it?

TomNicholas (Member, Author) commented May 16, 2024

> I do think there is an argument to be made for having different methods (open_virtual_dataset vs. open_as_virtual_dataset, or something like that) depending on whether the input is kerchunk-style refs or actual data.

Yeah, this is an interesting question. The same thing will arise for Zarr stores too: should there be a different function for opening zarr arrays backed by chunk manifests vs. zarr arrays backed by actual bytes on disk in the store? I think in that context it would be confusing to have two functions, especially as "mixed" zarr stores are possible (and useful).

jsignell (Contributor) commented

> should there be a different function for opening zarr arrays backed by chunk manifests vs. zarr arrays backed by actual bytes on disk in the store? I think in that context it would be confusing to have two functions, especially as "mixed" zarr stores are possible (and useful).

I would say it makes sense to my brain to have separate functions for "just reading" vs. "doing work", so that I can form an expectation of how long something will take to run. But I would expect open_as_virtual_dataset to accept either a legacy file, a kerchunk reference file, a zarr store, or a reference-backed ("ref") zarr store.
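For illustration only, the catch-all signature described here might look like the stub below; open_as_virtual_dataset is just the placeholder name floated earlier in the thread, and the filetype strings are invented:

```python
# Placeholder stub only: open_as_virtual_dataset is a hypothetical name from
# this thread, and all filetype strings here are invented for illustration.
def open_as_virtual_dataset(path: str, filetype: str):
    if filetype in ("netcdf4", "grib"):
        # Legacy/archival file: generate references by scanning bytes ("doing work").
        ...
    elif filetype in ("kerchunk_json", "kerchunk_parquet"):
        # Pre-existing kerchunk references: no scanning needed ("just reading").
        ...
    elif filetype in ("zarr", "manifest_zarr"):
        # A zarr store, whether backed by actual chunks or by chunk manifests.
        ...
    else:
        raise ValueError(f"unrecognised filetype: {filetype}")
```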
