DOC: zarr spec v3: adds optional dimensions and the "netZDF" format #276

Closed
wants to merge 2 commits into from
101 changes: 97 additions & 4 deletions docs/spec/v2.rst
@@ -1,6 +1,6 @@
.. _spec_v2:

-Zarr storage specification version 2
+Zarr storage specification version 3
====================================

This document provides a technical specification of the protocol and format
@@ -78,6 +78,14 @@ filters
filters are to be applied. Each codec configuration object MUST contain a
``"id"`` key identifying the codec to be used.

The following keys MAY be present:

dimensions
A list of string or ``null`` values providing optional names for each ofthe

Review comment (Member): s/ofthe/of the/

array's dimensions. If provided, the list MUST have length equal to the
number of array dimensions. If omitted, the array MUST be treated
equivalently to providing dimensions as a list of all ``null`` values.

Other keys MUST NOT be present within the metadata object.
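
The rule for the array-level ``dimensions`` key can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name ``normalize_array_dimensions`` is hypothetical and not part of any Zarr library, and the metadata is assumed to be already parsed from ``.zarray`` into a dict.

```python
import json


def normalize_array_dimensions(meta):
    """Return one dimension name (or None) per axis of a parsed .zarray dict.

    Hypothetical helper for illustration; not part of the zarr package.
    """
    ndim = len(meta["shape"])
    if "dimensions" not in meta:
        # Omitted key: treated the same as a list of all-null names.
        return [None] * ndim
    dims = meta["dimensions"]
    if not isinstance(dims, list):
        raise ValueError("dimensions must be a list")
    if len(dims) != ndim:
        raise ValueError(
            "dimensions has length %d but the array has %d dimensions"
            % (len(dims), ndim)
        )
    for name in dims:
        if name is not None and not isinstance(name, str):
            raise ValueError("dimension names must be strings or null")
    return list(dims)


meta = json.loads('{"shape": [10000, 10000], "dimensions": ["row", "column"]}')
print(normalize_array_dimensions(meta))
print(normalize_array_dimensions({"shape": [20, 20]}))
```

Returning a list of ``None`` values for the omitted case lets calling code handle named and unnamed arrays through a single path.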

For example, the JSON object below defines a 2-dimensional array of 64-bit
@@ -98,6 +106,10 @@ using the Blosc compression library prior to storage::
"clevel": 5,
"shuffle": 1
},
"dimensions": [
"row",
"column"
],
"dtype": "<f8",
"fill_value": "NaN",
"filters": [
@@ -108,7 +120,7 @@ using the Blosc compression library prior to storage::
10000,
10000
],
-"zarr_format": 2
+"zarr_format": 3
}

.. _spec_v2_array_dtype:
@@ -284,6 +296,20 @@ zarr_format
An integer defining the version of the storage specification to which the
array store adheres.

The following keys are OPTIONAL:

dimensions
A JSON object defining a map from string dimension names to integer sizes.
All arrays in a group or its descendants with dimension names MUST have
matching size along their named dimensions, unless any of those dimensions
are overridden by dimensions in a descendant group.
netzdf
An optional boolean indicating whether arrays within the group and its
descendants adhere to the more restrictive "netZDF" file-format (detailed
below), in which dimensions are REQUIRED for all arrays. If omitted,
software SHOULD NOT make assumptions about whether or not dimensions can be
found on all arrays.

Other keys MUST NOT be present within the metadata object.

The members of a group are arrays and groups stored under logical paths that
@@ -312,6 +338,66 @@ For example, the JSON object below encodes three attributes named
"baz": [1, 2, 3, 4]
}

.. _spec_v2_dimensions:

Dimensions
----------

Groups and arrays can be associated with optional dimension names. This feature
is intended to facilitate self-described datasets.

Setting dimensions on groups is an OPTIONAL way to indicate that arrays that
reuse the same dimension have a consistent size. When a dimension is set on
a group, the size of that dimension on every array inside the group is
REQUIRED to match. This includes arrays inside descendant groups, unless the
dimension is explicitly overridden by dimensions on a descendant group.

For example, the JSON objects below describe a hierarchy of arrays, where the
dimension ``x`` has size 1000 on the array ``foo`` and size 2000 on the array
``nested/bar``::

.zgroup:
{
"dimensions": {"x": 1000},
...
}
foo/.zarray:
{
"dimensions": ["x"],
"shape": [1000],
...
}
nested/.zgroup:
{
"dimensions": {"x": 2000},
...
}
nested/bar/.zarray:
{
"dimensions": ["x"],
"shape": [2000],
...
}

If dimensions were removed from ``nested/.zgroup`` then the array store would
be invalid, because the size of dimension ``x`` on the array ``nested/bar``
would be inconsistent with the size of that dimension in the root group.
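
A minimal Python sketch of this lookup is given below. It assumes the store is represented as a plain dict mapping keys such as ``nested/bar/.zarray`` to already-parsed JSON metadata; the function names are hypothetical and not part of any Zarr library. For each named dimension of an array, the nearest ancestor group that defines the name determines the expected size.

```python
def ancestor_group_keys(array_key):
    """Yield .zgroup keys from the nearest ancestor group up to the root."""
    parts = array_key.split("/")[:-2]  # drop the array name and ".zarray"
    while True:
        yield "/".join(parts + [".zgroup"])
        if not parts:
            return
        parts = parts[:-1]


def resolve_dimension_size(store, array_key, name):
    """Return the size set by the nearest ancestor group, or None if unset."""
    for group_key in ancestor_group_keys(array_key):
        sizes = store.get(group_key, {}).get("dimensions", {})
        if name in sizes:
            return sizes[name]
    return None


def find_size_conflicts(store):
    """Return (array key, dimension name, actual size, expected size) tuples."""
    conflicts = []
    for key, meta in sorted(store.items()):
        if key != ".zarray" and not key.endswith("/.zarray"):
            continue
        names = meta.get("dimensions", [])
        for name, size in zip(names, meta["shape"]):
            if name is None:
                continue  # unnamed dimensions are unconstrained
            expected = resolve_dimension_size(store, key, name)
            if expected is not None and expected != size:
                conflicts.append((key, name, size, expected))
    return conflicts


# The hierarchy from the example, with unrelated keys left out.
store = {
    ".zgroup": {"dimensions": {"x": 1000}},
    "foo/.zarray": {"dimensions": ["x"], "shape": [1000]},
    "nested/.zgroup": {"dimensions": {"x": 2000}},
    "nested/bar/.zarray": {"dimensions": ["x"], "shape": [2000]},
}
print(find_size_conflicts(store))

# Dropping the override on the "nested" group exposes the root group's size.
broken = dict(store)
broken["nested/.zgroup"] = {}
print(find_size_conflicts(broken))
```

Walking from the nearest group outward is one way to implement the override rule: the first group that defines the name wins, so sizes set further up the hierarchy are never consulted.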

.. _spec_v2_netzdf:

NetZDF
------

NetZDF is a more restricted variant of the Zarr storage format, with the
following changes:

* The "dimensions" field is REQUIRED for all arrays.
* All entries in "dimensions" on arrays MUST be strings: ``null`` dimensions are
not allowed.
* Every dimension on an array MUST be found on dimensions in an ancestor group.
* The "netzdf" field is REQUIRED, with a value of ``true``, on all groups
that obey the netZDF spec.
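
The restrictions listed above could be checked along the following lines. This is a sketch only: ``netzdf_violations`` is a hypothetical name, the store is assumed to be a dict of already-parsed metadata keyed by paths such as ``foo/.zarray``, and the check assumes that every group in the store is meant to conform to netZDF.

```python
def netzdf_violations(store):
    """Return human-readable netZDF violations for a dict of parsed metadata.

    Hypothetical helper for illustration; not part of the zarr package.
    """
    problems = []
    for key, meta in sorted(store.items()):
        parts = key.split("/")
        kind = parts[-1]
        if kind == ".zgroup":
            # Assumption: the whole store claims netZDF conformance.
            if meta.get("netzdf") is not True:
                problems.append("%s: netzdf must be true" % key)
        elif kind == ".zarray":
            names = meta.get("dimensions")
            if names is None:
                problems.append("%s: dimensions is required" % key)
                continue
            # Collect every dimension name defined by an ancestor group.
            known = set()
            prefix = parts[:-2]
            while True:
                group = store.get("/".join(prefix + [".zgroup"]), {})
                known.update(group.get("dimensions", {}))
                if not prefix:
                    break
                prefix = prefix[:-1]
            for name in names:
                if not isinstance(name, str):
                    problems.append("%s: null dimension not allowed" % key)
                elif name not in known:
                    problems.append(
                        "%s: dimension %r not on an ancestor group" % (key, name)
                    )
    return problems


example = {
    ".zgroup": {"netzdf": True, "dimensions": {"x": 1000}},
    "foo/.zarray": {"dimensions": ["x"], "shape": [1000]},
    "bar/.zarray": {"dimensions": [None], "shape": [5]},
    "baz/.zarray": {"shape": [5]},
}
for problem in netzdf_violations(example):
    print(problem)
```

Collecting all problems into a list, rather than raising on the first one, makes it easier to report everything that keeps a store from being valid netZDF in a single pass.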

.. _spec_v2_examples:

Examples
@@ -358,7 +444,7 @@ Inspect the array metadata::
20,
20
],
-"zarr_format": 2
+"zarr_format": 3
}

Chunks are initialized on demand. E.g., set some data::
@@ -433,7 +519,7 @@ Inspect the group metadata::

>>> print(open('data/group.zarr/.zgroup').read())
{
-"zarr_format": 2
+"zarr_format": 3
}

Create a sub-group::
@@ -495,6 +581,13 @@ What has been stored::
Changes
-------

Version 3 changes
~~~~~~~~~~~~~~~~~~~

* Optional support for named dimensions on arrays and groups.
* Added a description of the more restrictive "netZDF" format, inspired by the
`netCDF <https://www.unidata.ucar.edu/netcdf>`_ data model.

Version 2 clarifications
~~~~~~~~~~~~~~~~~~~~~~~~
