FAQ updates #266

Merged: 6 commits, merged Oct 22, 2024
41 changes: 25 additions & 16 deletions docs/api.rst
@@ -5,21 +5,13 @@ API Reference
.. currentmodule:: virtualizarr

VirtualiZarr has a small API surface, because most of the complexity is handled by xarray functions like ``xarray.concat`` and ``xarray.merge``.
Users can use xarray for every step apart from reading and serializing virtual references.

User API
========

Reading
-------

.. currentmodule:: virtualizarr.backend
.. autosummary::
@@ -30,7 +22,7 @@ Reading


Serialization
-------------

.. currentmodule:: virtualizarr.accessor
.. autosummary::
@@ -41,9 +33,8 @@ Serialization
   VirtualiZarrDatasetAccessor.to_zarr
   VirtualiZarrDatasetAccessor.to_icechunk


Rewriting
---------

.. currentmodule:: virtualizarr.accessor
.. autosummary::
@@ -52,9 +43,27 @@ Rewriting

   VirtualiZarrDatasetAccessor.rename_paths

Developer API
=============

If you want to write a new reader to create virtual references pointing to a custom file format, you will need to use VirtualiZarr's internal classes.

Manifests
---------

VirtualiZarr uses these classes to store virtual references internally.

.. currentmodule:: virtualizarr.manifests
.. autosummary::
   :nosignatures:
   :toctree: generated/

   ChunkManifest
   ManifestArray


Array API
---------

VirtualiZarr's :py:class:`~virtualizarr.ManifestArray` objects support a limited subset of the Python Array API standard in :py:mod:`virtualizarr.manifests.array_api`.

28 changes: 28 additions & 0 deletions docs/faq.md
@@ -68,3 +68,31 @@ We have a lot of ideas, including:
- [Generating references without kerchunk](https://github.com/zarr-developers/VirtualiZarr/issues/78)

If you see other opportunities then we would love to hear your ideas!

## Is this compatible with Icechunk?

Yes! VirtualiZarr allows you to ingest data as virtual references and write those references into an Icechunk Store. See the [Icechunk documentation on creating virtual datasets](https://icechunk.io/icechunk-python/virtual/#creating-a-virtual-dataset-with-virtualizarr).

## I already have Kerchunk references. Do I have to redo that work?

No. You can open the Kerchunk-formatted references you already have directly with VirtualiZarr, then re-save them in a new format such as [Icechunk](https://icechunk.io/):

```python
from virtualizarr import open_virtual_dataset

vds = open_virtual_dataset('refs.json')
# vds = open_virtual_dataset('refs.parq') # kerchunk parquet files are supported too

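# `icechunkstore` is assumed to be an already-created, writable Icechunk store (see the Icechunk docs)
vds.virtualize.to_icechunk(icechunkstore)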
```

## Can I add a new reader for my custom file format?

There are a lot of legacy file formats which could potentially be represented as virtual zarr references (see [this issue](https://github.com/zarr-developers/VirtualiZarr/issues/218) for some examples). VirtualiZarr ships with some readers for common formats (e.g. netCDF/HDF5), but you may want to write your own reader for some other file format.

VirtualiZarr is designed to make this as straightforward as possible. If you want to write your own reader, [this comment](https://github.com/zarr-developers/VirtualiZarr/issues/262#issuecomment-2429968244) will be helpful.
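
At its core, a reader has to work out where each chunk's bytes live: a file path, a byte offset and a byte length for every chunk key. As an illustration, the sketch below indexes a made-up binary format ("TOYF": a 16-byte header holding a magic string, the row and column counts and the rows per chunk, followed by an uncompressed little-endian float64 array in C order). The format, the helper names `write_toy_file`/`read_chunk_entries` and the `{"path", "offset", "length"}` dict layout are assumptions for this example only; a real reader would still need to wrap such entries in VirtualiZarr's `ChunkManifest`/`ManifestArray` classes with matching Zarr metadata, as described in the comment linked above.

```python
import struct
import tempfile
from pathlib import Path

# Hypothetical "TOYF" header: 4-byte magic, then nrows, ncols, rows_per_chunk as uint32
HEADER_FMT = "<4sIII"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 16 bytes; "<" disables padding
ITEMSIZE = 8  # payload is uncompressed little-endian float64


def write_toy_file(path: str, nrows: int, ncols: int, rows_per_chunk: int) -> None:
    """Create a small TOYF file so the reader has something to index."""
    with open(path, "wb") as f:
        f.write(struct.pack(HEADER_FMT, b"TOYF", nrows, ncols, rows_per_chunk))
        f.write(struct.pack(f"<{nrows * ncols}d", *map(float, range(nrows * ncols))))


def read_chunk_entries(path: str) -> dict:
    """Map chunk keys like "1.0" to byte ranges, reading only the header."""
    with open(path, "rb") as f:
        magic, nrows, ncols, rows_per_chunk = struct.unpack(HEADER_FMT, f.read(HEADER_SIZE))
    if magic != b"TOYF":
        raise ValueError(f"{path} is not a TOYF file")
    if nrows % rows_per_chunk:
        # Zarr stores edge chunks at full chunk size, so this sketch requires even division
        raise ValueError("nrows must be a multiple of rows_per_chunk")

    chunk_nbytes = rows_per_chunk * ncols * ITEMSIZE
    uri = Path(path).resolve().as_uri()
    return {
        # One chunk per block of rows; the column axis is a single chunk (index 0)
        f"{i}.0": {"path": uri, "offset": HEADER_SIZE + i * chunk_nbytes, "length": chunk_nbytes}
        for i in range(nrows // rows_per_chunk)
    }


with tempfile.TemporaryDirectory() as tmp:
    fname = str(Path(tmp) / "example.toy")
    write_toy_file(fname, nrows=4, ncols=3, rows_per_chunk=2)
    entries = read_chunk_entries(fname)

print(entries["1.0"]["offset"], entries["1.0"]["length"])  # 64 48
```

Because only the header is read, indexing cost does not depend on the size of the array payload, which is what makes virtual references cheap to generate.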

You can also use this approach to write a reader that starts from a kerchunk-formatted virtual references dict.
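
Starting from kerchunk references mostly means sorting the `refs` mapping into Zarr metadata keys (such as `.zarray`) and per-variable chunk references. The helper below is a hypothetical plain-Python sketch, not VirtualiZarr's own kerchunk reader. It assumes chunk references are `[url, offset, length]` lists and that version-1 references are nested under a `"refs"` key; it ignores version-1 `templates`/`gen` entries and only collects (rather than handles) inlined data and whole-file references. The bucket URL is made up.

```python
import json


def kerchunk_refs_to_entries(refs_dict: dict) -> tuple[dict, list]:
    """Group [url, offset, length] chunk references by variable name."""
    # Version 1 nests the references under "refs"; version 0 is the flat mapping itself
    refs = refs_dict.get("refs", refs_dict)
    manifests: dict = {}
    skipped: list = []
    for key, value in refs.items():
        var, _, chunk_key = key.rpartition("/")
        if chunk_key.startswith("."):
            continue  # Zarr metadata such as .zgroup, .zarray, .zattrs
        if isinstance(value, list) and len(value) == 3:
            url, offset, length = value
            manifests.setdefault(var, {})[chunk_key] = {
                "path": url,
                "offset": offset,
                "length": length,
            }
        else:
            skipped.append(key)  # inlined bytes or whole-file refs; not handled here
    return manifests, skipped


zarray = {"shape": [4, 4], "chunks": [2, 4], "dtype": "<f8", "compressor": None,
          "filters": None, "fill_value": None, "order": "C", "zarr_format": 2}
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "air/.zarray": json.dumps(zarray),
        "air/0.0": ["s3://example-bucket/air.nc", 8192, 64],
        "air/1.0": ["s3://example-bucket/air.nc", 8256, 64],
        "lat/0": "base64:AAAAAAAAAAA=",
    },
}

manifests, skipped = kerchunk_refs_to_entries(refs)
print(manifests["air"]["1.0"])
print(skipped)  # ['lat/0']
```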

Currently, calling your new reader from `virtualizarr.open_virtual_dataset` requires opening a PR to this repository, but we plan to generalize this system so that third-party libraries can plug in via an entrypoint (see [issue #245](https://github.com/zarr-developers/VirtualiZarr/issues/245)).