DOC: Missing page on layers of Zarr abstractions #2956
Labels: documentation, help wanted
Describe the issue linked to the documentation
tl;dr: A page is missing from the Zarr docs which distinguishes between the different layers of specs, APIs, ABCs, formats etc.
It took me a year of working closely with zarr (via VirtualiZarr) to fully understand that "Zarr" is a multi-layered thing, which you can opt in to and out of at many different levels.
What is zarr?
My current understanding is that "Zarr" encompasses all of these things:
Store subclasses, which are key-value stores with a standardized API implementing the spec, but otherwise can do whatever they like behind the scenes, including not writing using the "native zarr" format,
Store implementations, which generally do write using the "native zarr" format,
Store subclasses, allowing python client code to treat many storage systems as interchangeable.
I feel this nuance and hierarchy is not clearly documented anywhere.
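To make the layering concrete, here is a toy sketch of how I picture it. Every name in it (`ToyStore`, `ToyMemoryStore`, `ToyReadOnlyVirtualStore`, `read_chunk`) is hypothetical and invented for illustration; none of this is zarr-python's actual API. It only shows the idea of an abstract key-value interface, two concrete implementations (one of which never stores anything in a "native" layout), and client code that depends on the interface alone.

```python
from abc import ABC, abstractmethod


class ToyStore(ABC):
    """Hypothetical stand-in for an abstract key-value store interface."""

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def set(self, key: str, value: bytes) -> None: ...


class ToyMemoryStore(ToyStore):
    """Concrete store that keeps exactly the bytes it is given."""

    def __init__(self) -> None:
        self._data = {}  # maps key -> bytes

    def get(self, key: str) -> bytes:
        return self._data[key]

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value


class ToyReadOnlyVirtualStore(ToyStore):
    """Concrete store that satisfies the same interface without persisting
    anything: values are produced on demand from the key."""

    def get(self, key: str) -> bytes:
        return f"generated:{key}".encode()

    def set(self, key: str, value: bytes) -> None:
        raise NotImplementedError("this toy store is read-only")


def read_chunk(store: ToyStore, key: str) -> bytes:
    """Client-side helper: relies only on the abstract interface."""
    return store.get(key)


memory = ToyMemoryStore()
memory.set("c/0", b"\x00\x01")
print(read_chunk(memory, "c/0"))
print(read_chunk(ToyReadOnlyVirtualStore(), "c/0"))
```

The point of the sketch is that `read_chunk` cannot tell the two stores apart, which is the kind of interchangeability I mean, and also why the on-disk format and the store interface are separate layers.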
"Zarr" projects and how they fit in
It matters because there are now many projects which opt in to some of these layers but not others. For example:
zarr.abc provides (3),
zarr.storage provides (4), which writes to local and object storage using (1) but without explicitly noting that,
zarr.api provides (5), and can only interact with implementations of (3), such as (4),
zarr-js) generally use (1) as their format, following (2), and take vague inspiration from (4) and (5).
ManifestStore class is a concrete implementation of (3), but it eschews (1), with the aim of allowing access to an extensible set of non-zarr data formats on disk via (5).
zarrs has done) would follow (2) and the Icechunk spec, potentially with API inspiration from (3) and (5) but otherwise nothing else formal.
coordinates attribute it adds),
Store implementations).
Current docs
The main zarr homepage only says
which is true but insufficient.
This is a problem because it leads to confusion as to what zarr "is" - for example many people understandably but mistakenly think that (1) is Zarr, and that (2) is the specification for this file format. It also makes it harder for potential contributors (e.g. @nenb) to place their ideas within the framework of the zarr project.
I think this separation of layers is awesome, I just wish I could have understood it a year earlier via reading the docs, instead of having to have it explained to me one-to-one by the likes of @d-v-b, @jhamman, and @rabernat.
cc also @maxrjones @paraseba
Suggested fix for documentation
We should have a new page on the main Zarr docs explaining these layers, and a page on the zarr-python docs explaining how it fits into this framework. Other projects such as Icechunk and VirtualiZarr can then more easily explain their relationship to Zarr.