zarr-developers · TomNicholas · Apr 4, 2025
diff --git a/_data/navigation.yml b/_data/navigation.yml
@@ -19,7 +19,19 @@ sidebar:
         url: "#sponsorship"
       - title: "Videos"
         url: "#videos"
-  - title: Subpages
+  - title: Technical
+    children:
+      - title: "Components"
+        url: '/components'
+      - title: "Flexibility"
+        url: '/flexibility'
+      - title: "Implementations"
+        url: '/implementations'
+      - title: "Specification"
+        url: https://zarr-specs.readthedocs.io/
+      - title: "ZEPs"
+        url: '/zeps'
+  - title: Community
     children:
       - title: "Adopters"
         url: "/adopters"
@@ -31,13 +43,7 @@ sidebar:
         url: '/conventions'
       - title: "Datasets"
         url: '/datasets'
-      - title: "Implementations"
-        url: '/implementations'
       - title: "Office Hours"
         url: "/office-hours"
       - title: "Slides"
-        url: "/slides"
-      - title: "Specification"
-        url: https://zarr-specs.readthedocs.io/
-      - title: "ZEPs"
-        url: '/zeps'
+        url: "/slides"
diff --git a/components/index.md b/components/index.md
@@ -0,0 +1,46 @@
+---
+layout: single
+author_profile: false
+title: Zarr Components
+sidebar:
+  title: "Components"
+  nav: sidebar
+---
+
+Zarr consists of several components, both abstract and concrete. 
+These span both the physical storage layer and the conceptual structural layer. 
+Zarr-related projects all use the Zarr Protocol (and hence data model), described by the [Zarr Specification](https://zarr-specs.readthedocs.io/), but otherwise may choose to implement other layers however they wish.
+
+## Abstract components
+
+These abstract components together describe what type of data can be stored in zarr, and how to store it, without assuming you are working in a particular programming language, or with a particular storage system.
+
+**Protocol**: All zarr-related projects use the Zarr Protocol, described in the [Zarr Specification](https://zarr-specs.readthedocs.io/), which allows transfer of chunked array data and metadata between devices (or between memory regions of the same device). 
+The protocol works by serializing and de-serializing array data as byte streams and storing both this data and accompanying metadata via an [Abstract Key-Value Store Interface](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#abstract-store-interface). 
+A system of [Codecs](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#chunk-encoding) is used to describe the encoding and serialization steps.
+
+**Data Model**: The specification's description of the [Stored Representation](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#stored-representation) implies a particular data model, based on the [HDF Abstract Data Model](https://support.hdfgroup.org/documentation/hdf5/latest/_h5_d_m__u_g.html). 
+It consists of a hierarchical tree of groups and arrays, with optional arbitrary metadata at every node. This model is completely domain-agnostic.
+
+**Format**: If the keys in the abstract key-value store interface are mapped unaltered to paths in a POSIX filesystem or prefixes in object storage, the data written to disk will follow the "Native Zarr Format". 
+Most, but not all, zarr implementations will serialize to this format.
+
+**Extensions**: Zarr provides a core set of generally-useful features, but extensions to this core are encouraged. These might take the form of domain-specific [metadata conventions](https://zarr.dev/conventions/), new codecs, or additions to the data model via [extension points](https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#extension-points). These can be abstract, or enforced by implementations or client libraries however they like, but generally should be opt-in.
+
+## Concrete components
+
+Concrete implementations of the abstract components can be written in any language. 
+The canonical reference implementation is [Zarr-Python](https://github.com/zarr-developers/zarr-python), but there are many [other implementations](https://zarr.dev/implementations/). 
+Zarr-Python contains reference examples of useful constructs that can be re-implemented in other languages.
+
+**Abstract Base Classes**: Zarr-python's [`zarr.abc`](https://zarr.readthedocs.io/en/stable/api/zarr/abc/index.html) module contains abstract base classes enforcing a particular python realization of the specification's Key-Value Store interface, using a `Store` ABC, which is based on a `MutableMapping`-like API. 
+This component is concrete in the sense that it is implemented in a specific programming language, and enforces particular syntax for getting and setting values in a key-value store.
+
+**Store Implementations**: Zarr-python's [`zarr.storage`](https://zarr.readthedocs.io/en/stable/api/zarr/abc/index.html) module contains concrete implementations of the `Store` ABC for interacting with particular storage systems.
+The zarr-python store implementations which write to local filesystems or object storage write data in the Native Zarr Format. 
+It's expected that most users of zarr from python will just use one of these implementations.
+
+**User API**: Zarr-python's [`zarr.api`](https://zarr.readthedocs.io/en/stable/api/zarr/abc/index.html) module contains functions and classes for interacting with any concrete implementation of the `zarr.abc.Store` interface. 
+This allows user applications to use a standard zarr API to read and write from a variety of common storage systems.
+
+These various components allow for a huge amount of [flexibility](https://zarr.dev/flexibility/).
diff --git a/flexibility/index.md b/flexibility/index.md
@@ -0,0 +1,58 @@
+---
+layout: single
+author_profile: false
+title: Zarr's Flexibility
+sidebar:
+  title: "Flexibility"
+  nav: sidebar
+---
+
+One of Zarr's greatest strengths is its flexibility, or "hackability".
+This largely comes from the separation of distinct [Zarr Components](https://zarr.dev/components/), but a range of other properties make zarr flexible too.
+
+## Types of flexibility
+
+This flexibility comes in several forms:
+- The Zarr protocol is device agnostic.
+- The Zarr data model is domain agnostic.
+- Key-value stores are an almost universal abstraction in data systems, and so can almost always be mapped to existing system interfaces.
+- The Zarr format on-disk is extremely simple.
+- Storing each chunk under a different key allows implementations to scale their IO throughput in a variety of simple ways.
+- The reference Zarr implementation is written in Python, a very hackable language, with ABCs you can use when creating new store implementations.
+- Components are separated: the protocol, file format, standard API, ABC, and store implementations are all separate.
+- There is no requirement to use more than one zarr component - individual projects can achieve powerful functionality by intelligently using only some of the Zarr components.
+- You can define your own codecs.
+- You are free to create your own domain-specific metadata standard and enforce it upon zarr stores however you like.
+- Zarr v3 has nascent support for other extension points, including defining your own type of chunk grid, data types, and more.
+- [Zarr Enhancement Proposals](https://zarr.dev/zeps/) (or "ZEPs") provide a mechanism for enhancing or adding to the specification in a community-standardized way.
+
+## Examples
+
+Here are a few zarr-related software projects, which each make use of a selected subset of different zarr components to achieve interesting functionality. 
+These particular projects are more than simply zarr implementations written in a different language (you can find a [list of implementations here](https://zarr.dev/implementations/)).
+
+- **MongoDBStore** is a concrete store implementation in python, which stores values in a MongoDB NoSQL database under zarr keys. 
+It is therefore spec-compliant, and can be interacted with via the zarr-python user API, but does not write data in the native zarr format.
+
+- [**VirtualiZarr**](https://github.com/zarr-developers/VirtualiZarr) provides a concrete store implementation in python (the `ManifestStore`) which stores, inside "chunk manifests", references to the locations and byte ranges of chunks that reside inside files stored in other binary formats such as netCDF. 
+These references are generated by "readers", which do the job of parsing the file structure and mapping the contents to the zarr data model. 
+VirtualiZarr therefore eschews the native zarr format but still provides spec-compliant access to non-zarr-formatted data using zarr-python's API, without duplicating the original data.
+The manifests effectively act as an indirection layer between the zarr-spec-compliant key interface, and the actual location of the chunks in storage.
+
+- [**NCZarr**](https://docs.unidata.ucar.edu/nug/current/nczarr_head.html) and [**Lindi**](https://github.com/NeurodataWithoutBorders/lindi) can both in some sense be considered as the opposite of VirtualiZarr - they allow interacting with zarr-formatted data on disk via a non-zarr API. 
+Lindi maps zarr's data model to the HDF data model and allows access via the `h5py` library through the [`LindiH5pyFile`](https://github.com/NeurodataWithoutBorders/lindi/blob/b125c111880dd830f2911c1bc2084b2de94f6d71/lindi/LindiH5pyFile/LindiH5pyFile.py#L28) class. 
+[NCZarr](https://docs.unidata.ucar.edu/nug/current/nczarr_head.html) allows interacting with zarr-formatted data via the netcdf-c library. 
+Note that both libraries implement optional additional optimizations that go beyond the zarr specification and on-disk format, a practice that is not recommended.
+
+- [**Tensorstore**](https://github.com/google/tensorstore) is a general storage library written in C++ that can write to the Zarr format (so is a spec-compliant non-python "native" store implementation) but also to other array formats such as N5.
+As it can write to multiple different storage systems, it effectively has its own set of concrete store implementations.
+Additional features are provided, notably using an Optionally-Cooperative Distributed B+Tree (OCDBT) on top of a base key-value store to implement ACID transactions. 
+It still stores all data using the native Zarr Format, but versions keys at the store level.
+
+- [**Icechunk**](https://icechunk.io/) is a cloud-native tensor storage engine which also provides ACID transactions, but does so via indirection between a zarr-spec-compliant key-value store interface and a specialized non-zarr-native storage layout on-disk (for which Icechunk has its own format specification). 
+Whilst the core icechunk client is written in Rust, the `icechunk-python` client implements a concrete subclass of the zarr-python `Store` ABC. 
+Therefore libraries such as [xarray](https://xarray.dev/) can use the zarr-python user API to read and write to icechunk stores, effectively treating them as version-controlled zarr stores. 
+Icechunk also integrates with VirtualiZarr as a serialization format for byte range references. 
+Together they allow data stored in non-zarr formats to be committed to a persistent icechunk store and read back later via the zarr-python API without duplicating the original data chunks.
+
+We also have a full list of [zarr implementations](https://zarr.dev/implementations/).
diff --git a/index.md b/index.md
@@ -32,28 +32,28 @@ can be represented as a key-value store, including most commonly POSIX file
 systems and cloud object storage but also zip files as well as relational and
 document databases.
 
-See the following GitHub repositories for more information:
-
-* [Zarr Python](https://github.com/zarr-developers/zarr)
-* [Zarr Specs](https://github.com/zarr-developers/zarr-specs)
-* [Numcodecs](https://github.com/zarr-developers/numcodecs)
-* [Z5](https://github.com/constantinpape/z5)
-* [N5](https://github.com/saalfeldlab/n5)
-* [Zarr.jl](https://github.com/meggart/Zarr.jl)
-* [ndarray.scala](https://github.com/lasersonlab/ndarray.scala)
+For more details read about the various [components of Zarr](https://zarr.dev/components/), 
+see the canonical [Zarr-Python](https://github.com/zarr-developers/zarr-python) implementation, 
+or look through [other Zarr implementations](https://zarr.dev/implementations/) for one in your preferred language.
 
 ## Applications
 
-* Simple and fast serialization of NumPy-like arrays, accessible from languages including Python, C, C++, Rust, Javascript and Java
-* Multi-scale n-dimensional image storage, e.g. in light and electron microscopy
-* Geospatial rasters, e.g. following the NetCDF / CF metadata conventions
+* Multi-scale n-dimensional image storage, e.g. in light and electron microscopy.
+* Genomics data, e.g. for quantitative and population genetics.
+* Gridded scientific data in various domains, such as CFD or Plasma Physics.
+* Geospatial rasters, e.g. following the NetCDF data model.
+* Checkpointing ML model weights.
 
 ## Features
 
+* Serialize NumPy-like arrays in a simple and fast way.
+* Access from languages including Python, C, C++, Rust, JavaScript and Java.
 * Chunk multi-dimensional arrays along any dimension.
+* Compress array chunks via an extensible system of compressors.
 * Store arrays in memory, on disk, inside a Zip file, on S3, etc.
 * Read and write arrays concurrently from multiple threads or processes.
 * Organize arrays into hierarchies via annotatable groups.
+* Extend easily thanks to the [flexible design](https://zarr.dev/flexibility/).
 
 ## Sponsorship