DOC: zarr spec v3: adds optional dimensions and the "netZDF" format #276

Closed
96 changes: 92 additions & 4 deletions docs/spec/v2.rst
@@ -1,6 +1,6 @@
.. _spec_v2:

-Zarr storage specification version 2
+Zarr storage specification version 3
====================================

This document provides a technical specification of the protocol and format
@@ -78,6 +78,14 @@ filters
filters are to be applied. Each codec configuration object MUST contain a
``"id"`` key identifying the codec to be used.

The following keys MAY be present:

dimensions
A list of string or ``null`` values providing optional names for each ofthe
Member

s/ofthe/of the/

array's dimensions. If provided, the list MUST have length equal to the
number of array dimensions. If omitted, the array MUST be treated
equivalently to providing dimensions as a list of all ``null`` values.

Other keys MUST NOT be present within the metadata object.
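
The rules for the array-level ``dimensions`` key can be sketched in Python as follows. This is only an illustration under stated assumptions: ``normalize_dimensions`` is a hypothetical helper, not part of any Zarr library, and it takes the already-parsed ``.zarray`` metadata as a ``dict``.

```python
def normalize_dimensions(metadata):
    """Return one dimension name (or None) per array axis.

    Hypothetical helper for illustration; `metadata` is the parsed
    array metadata object and is assumed to contain a "shape" list.
    """
    ndim = len(metadata["shape"])
    dims = metadata.get("dimensions")
    if dims is None:
        # An omitted key is treated like a list of all null values.
        return [None] * ndim
    if len(dims) != ndim:
        raise ValueError(
            "dimensions has length %d but the array has %d dimensions"
            % (len(dims), ndim)
        )
    for name in dims:
        # Each entry must be a string or null (None after JSON parsing).
        if name is not None and not isinstance(name, str):
            raise TypeError("dimension names must be strings or null")
    return list(dims)
```

Normalizing the omitted case to a list of ``None`` values lets downstream code treat arrays with and without the key uniformly.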

For example, the JSON object below defines a 2-dimensional array of 64-bit
@@ -98,6 +106,10 @@ using the Blosc compression library prior to storage::
"clevel": 5,
"shuffle": 1
},
"dimensions": [
"row",
"column"
],
"dtype": "<f8",
"fill_value": "NaN",
"filters": [
@@ -108,7 +120,7 @@ using the Blosc compression library prior to storage::
10000,
10000
],
-"zarr_format": 2
+"zarr_format": 3
}

.. _spec_v2_array_dtype:
@@ -284,6 +296,20 @@ zarr_format
An integer defining the version of the storage specification to which the
array store adheres.

The following keys are OPTIONAL:

dimensions
A JSON object defining a map from string dimension names to integer sizes.
All arrays in a group or its descendants with dimension names MUST have
matching size along their named dimensions, unless any of those dimensions
are overridden by dimensions in a descendant group.
netzdf
An optional boolean indicating whether arrays within the group and its
descendants adhere to the more restrictive "netZDF" file format (detailed
below), in which dimensions are REQUIRED for all arrays. If omitted,
software SHOULD NOT make assumptions about whether or not dimensions can be
found on all arrays.

Other keys MUST NOT be present within the metadata object.
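
As a rough illustration of the group metadata rules above, the following Python sketch checks a parsed ``.zgroup`` object. ``validate_group_metadata`` and ``ALLOWED_GROUP_KEYS`` are hypothetical names introduced here, and the sketch assumes ``zarr_format`` is the only required key.

```python
# Keys permitted in group metadata under this proposal (assumption:
# "zarr_format" is the only required key).
ALLOWED_GROUP_KEYS = {"zarr_format", "dimensions", "netzdf"}


def validate_group_metadata(metadata):
    """Raise ValueError if a parsed group metadata dict breaks the rules."""
    extra = set(metadata) - ALLOWED_GROUP_KEYS
    if extra:
        raise ValueError("unexpected keys: %s" % sorted(extra))
    if "zarr_format" not in metadata:
        raise ValueError("zarr_format is required")
    for name, size in metadata.get("dimensions", {}).items():
        # Dimension names map to integer sizes; bool is rejected explicitly
        # because it is a subclass of int in Python.
        if not isinstance(name, str):
            raise ValueError("dimension names must be strings")
        if isinstance(size, bool) or not isinstance(size, int):
            raise ValueError("dimension sizes must be integers")
    if "netzdf" in metadata and not isinstance(metadata["netzdf"], bool):
        raise ValueError("netzdf must be a boolean")


# A group that declares one dimension and opts in to netZDF.
validate_group_metadata(
    {"zarr_format": 3, "dimensions": {"x": 1000}, "netzdf": True}
)
```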

The members of a group are arrays and groups stored under logical paths that
@@ -312,6 +338,61 @@ For example, the JSON object below encodes three attributes named
"baz": [1, 2, 3, 4]
}

.. _spec_v2_dimensions:

Dimensions
----------

Groups and arrays can be associated with optional dimension names. This feature
is intended to facilitate self-described datasets.

Dimensions are required to be consistent. Any dimension set on an array
(any non-``null`` value) MUST also be defined on an ancestor group. Dimension
sizes can be overwritten in descendant groups, but the size of each named
dimension on an array MUST match the size of that dimension on the most direct
ancestor group on which it is defined.
Contributor Author

I think I'm going to change this, to make group dimensions and consistency entirely optional:

If dimensions are set in a group, their sizes on all contained arrays
are REQUIRED to be consistent. Dimension sizes can be overwritten
in descendant groups, but the size of each named dimension (any
non-`null` value) on an array MUST match the size of that dimension
on the most direct ancestor group on which it is defined.


For example, the JSON objects below describe a hierarchy of arrays, where the
dimension ``x`` has size 1000 on the array ``foo`` and size 2000 on the array
``nested/bar``::

.zgroup:
{
"dimensions": {"x": 1000},
...
}
foo/.zarray:
{
"dimensions": ["x"],
"shape": [1000],
...
}
nested/.zgroup:
{
"dimensions": {"x": 2000},
...
}
nested/bar/.zarray:
{
"dimensions": ["x"],
"shape": [2000],
...
}
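
The lookup of a dimension size on the most direct ancestor group could be sketched like this in Python. The function names, the ``group_dimensions`` mapping (group path to that group's ``dimensions`` object, with ``""`` as the root), and the ``strict`` flag are all assumptions made for illustration. With ``strict=True`` a named dimension must be defined on some ancestor group, as the text above requires; ``strict=False`` skips names that no ancestor defines.

```python
import posixpath


def resolve_dimension_size(group_dimensions, array_path, name):
    """Return the size of `name` on the nearest ancestor group, or None."""
    path = posixpath.dirname(array_path)
    while True:
        sizes = group_dimensions.get(path, {})
        if name in sizes:
            return sizes[name]
        if path == "":
            # Reached the root group without finding a definition.
            return None
        path = posixpath.dirname(path)


def check_array_dimensions(group_dimensions, array_path, metadata, strict=True):
    """Raise ValueError if the array's named dimensions are inconsistent."""
    shape = metadata["shape"]
    names = metadata.get("dimensions") or [None] * len(shape)
    for name, size in zip(names, shape):
        if name is None:
            continue  # unnamed dimensions are never checked
        expected = resolve_dimension_size(group_dimensions, array_path, name)
        if expected is None:
            if strict:
                raise ValueError("dimension %r is not defined on any group" % name)
            continue
        if expected != size:
            raise ValueError(
                "dimension %r has size %d, expected %d" % (name, size, expected)
            )


# The hierarchy from the example above: "x" is redefined in the nested group.
group_dimensions = {"": {"x": 1000}, "nested": {"x": 2000}}
check_array_dimensions(group_dimensions, "foo", {"dimensions": ["x"], "shape": [1000]})
check_array_dimensions(
    group_dimensions, "nested/bar", {"dimensions": ["x"], "shape": [2000]}
)
```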

.. _spec_v2_netzdf:

NetZDF
------

NetZDF is a more restricted variant of the Zarr storage format, with the
following changes:

* The "dimensions" field is REQUIRED for all arrays.
* All entries in "dimensions" on arrays MUST be strings: ``null`` dimensions are
not allowed.
* The "netzdf" field is REQUIRED, with a value of ``true``, on all groups that
obey the netZDF spec.

.. _spec_v2_examples:

Examples
@@ -358,7 +439,7 @@ Inspect the array metadata::
20,
20
],
-"zarr_format": 2
+"zarr_format": 3
}

Chunks are initialized on demand. E.g., set some data::
@@ -433,7 +514,7 @@ Inspect the group metadata::

>>> print(open('data/group.zarr/.zgroup').read())
{
-"zarr_format": 2
+"zarr_format": 3
}

Create a sub-group::
@@ -495,6 +576,13 @@ What has been stored::
Changes
-------

Version 3 changes
~~~~~~~~~~~~~~~~~~~

* Optional support for named dimensions on arrays and groups.
* Added a description of the more restrictive "netZDF" format, inspired by the
`netCDF <https://www.unidata.ucar.edu/netcdf>`_ data model.

Version 2 clarifications
~~~~~~~~~~~~~~~~~~~~~~~~
