update ZEP4 based on feedback
rabernat committed Jun 29, 2023
1 parent bed09e0 commit 85b14eb
Showing 1 changed file with 77 additions and 4 deletions.
81 changes: 77 additions & 4 deletions draft/ZEP0004.md
@@ -58,6 +58,9 @@ Conventions can also help users switch between different storage libraries more
Since Zarr and HDF5 implement nearly identical data models, a single convention can be applied to both formats.
This allows downstream software to maintain better separation of concerns between storage and domain-specific logic.

Conventions are modular and composable. A single group or array can conform to multiple conventions.


## Usage and Impact

We demonstrate the usage and impact with a simple example: unit encoding.
@@ -77,7 +80,7 @@ z = zarr.open(
'example_with_units.zarr', mode='w', shape=(10000, 10000), chunks=(1000, 1000), dtype='f4'
)
z.attrs['units'] = 'm^2'
z.attrs['zarr_convention'] = "units-v1"
z.attrs['zarr_conventions'] = ["units-v1"]
```

Reading back the data with zarr-python would work just fine. However, reading the data with a unit-aware package would enable automatic decoding.
@@ -103,14 +106,84 @@ None of this requires any awareness of units on part of Zarr itself.
Conventions by definition are fully backwards and forwards compatible with all versions of Zarr, including both V2 and V3.
They require no changes to the spec or extensions and can be ignored by Zarr implementations.

Existing Zarr data conforming to _ad-hoc_ conventions that predate this ZEP can be supported via the **legacy conventions** mechanism described below.
New Zarr data written after this ZEP has been accepted should use the **explicit conventions** approach.

## Detailed description

This ZEP itself describes the _process_ by which conventions may be proposed, discussed, accepted, and published by the Zarr community.
This ZEP itself describes the structure and process by which conventions may be proposed, discussed, accepted, and published by the Zarr community.
This process is intended to be much more lightweight and informal than a spec change, which requires a ZEP.

If this ZEP is accepted, Conventions will be added to the <https://zarr-specs.readthedocs.io/> website as a new top-level heading
Conventions will be described by a _convention document_.
Convention documents will be added to the <https://zarr-specs.readthedocs.io/> website as a new top-level heading
and a corresponding folder will be created in the `zarr-developers/zarr-specs` repo.
A conventions template, which accompanies this ZEP, will be added to that folder.
A convention document template, which accompanies this ZEP, will be added to that folder.
The template is intentionally simple.

### Identifying a Convention

In its convention document, a convention must declare itself as either an **explicit convention** or a **legacy convention**.

#### Explicit Convention

The preferred way of identifying the presence of a convention in a Zarr group or array is via the attribute `zarr_conventions`.
This attribute must be an array of strings; each string is the identifier of one convention.
Multiple conventions may be present.

For example, a group metadata JSON document with conventions present might look like this:

```
{
    "zarr_format": 3,
    "node_type": "group",
    "attributes": {
        "zarr_conventions": ["units-v1", "foo"]
    }
}
```

where `units-v1` and `foo` are the convention identifiers.
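
As a rough illustration of how a reader might discover explicit conventions, the sketch below parses a metadata document with the standard library only. The function name `declared_conventions` is hypothetical and not part of any Zarr implementation; it assumes the V3 layout shown above, where user attributes live under the `attributes` key.

```python
import json


def declared_conventions(metadata_json: str) -> list:
    """Return the convention identifiers declared in a node's metadata document.

    Hypothetical helper: assumes attributes are stored under "attributes".
    """
    metadata = json.loads(metadata_json)
    conventions = metadata.get("attributes", {}).get("zarr_conventions", [])
    # The attribute is required to be an array of strings.
    if not isinstance(conventions, list) or not all(
        isinstance(item, str) for item in conventions
    ):
        raise ValueError("zarr_conventions must be an array of strings")
    return conventions


doc = """
{
    "zarr_format": 3,
    "node_type": "group",
    "attributes": {"zarr_conventions": ["units-v1", "foo"]}
}
"""
print(declared_conventions(doc))
```

A node without the attribute yields an empty list, so software that does not recognize any of the listed identifiers can simply ignore them.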

#### Legacy Convention

A legacy convention is a convention already in use that predates this ZEP.
Data conforming to legacy conventions will not have the `zarr_conventions` attribute.
The convention document must therefore specify how software can identify the presence of the convention through a series of rules or tests.

For those comfortable with the terminology, legacy conventions can be thought of as a "conformance class" and a corresponding "conformance test".

An example of a legacy convention might be existing Zarr data written following [CF Conventions](https://cfconventions.org/Data/cf-conventions/cf-conventions-1.10/cf-conventions.html).
Such data will have the group attribute `Conventions` set to the value `CF-1.10` (or perhaps a different version number).
This forms the basis for a test for whether the group conforms to the convention.
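
A conformance test of this kind could be sketched as follows. This is only an illustration, not a test defined by this ZEP or by CF: the function name `detect_cf_version` is hypothetical, and the sketch assumes the attribute may be spelled either `Conventions` (as in the CF specification) or `conventions`, with a value containing `CF-<major>.<minor>`.

```python
import re
from typing import Optional, Tuple

# Assumed pattern for the CF version string, e.g. "CF-1.10".
_CF_PATTERN = re.compile(r"CF-(\d+)\.(\d+)")


def detect_cf_version(attrs: dict) -> Optional[Tuple[int, int]]:
    """Return (major, minor) if the group attributes declare CF, else None."""
    for key in ("Conventions", "conventions"):
        value = attrs.get(key)
        if isinstance(value, str):
            match = _CF_PATTERN.search(value)
            if match:
                return int(match.group(1)), int(match.group(2))
    return None


print(detect_cf_version({"Conventions": "CF-1.10"}))
print(detect_cf_version({"units": "m^2"}))
```

Returning the parsed version rather than a boolean lets the same test cover "or perhaps a different version number".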

#### Namespacing

Conventions may choose to store their attributes in a specific namespace.
This ZEP does not specify how namespacing works; that is up to the convention.
For example, the namespace may be specified as a prefix on attributes, e.g.

```
{
"attributes": {"units-v1:units": "m^2"}
}
```

or via a nested JSON object, e.g.

```
{
"attributes": {"units-v1": {"units": "m^2"}}
}
```

The use of namespacing is optional and is up to the convention to decide.
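
To make the two styles concrete, here is a minimal sketch of a reader that accepts either one. It is an assumption for illustration only: the helper `read_units` is hypothetical, and a real convention would normally pick a single style in its convention document rather than support both.

```python
from typing import Optional


def read_units(attrs: dict, convention: str = "units-v1") -> Optional[str]:
    """Look up the units value under either namespacing style."""
    # Prefix style: {"units-v1:units": "m^2"}
    prefixed = attrs.get(f"{convention}:units")
    if prefixed is not None:
        return prefixed
    # Nested style: {"units-v1": {"units": "m^2"}}
    nested = attrs.get(convention)
    if isinstance(nested, dict):
        return nested.get("units")
    return None


print(read_units({"units-v1:units": "m^2"}))
print(read_units({"units-v1": {"units": "m^2"}}))
```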

#### Versioning

There may be multiple versions of a convention.
It is recommended for a convention to explicitly declare its version.
For an explicit convention, the version identifier may be encoded into the convention identifier string, but this is not required.
The convention document should specify how to identify the convention version.
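
For an explicit convention that does encode its version in the identifier, parsing might look like the sketch below. The `<name>-v<integer>` pattern is an assumption inferred from the `units-v1` example; this ZEP does not mandate any identifier format, and `split_identifier` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumed identifier shape: "<name>-v<integer>", e.g. "units-v1".
_IDENTIFIER = re.compile(r"^(?P<name>.+)-v(?P<version>\d+)$")


def split_identifier(identifier: str) -> Tuple[str, Optional[int]]:
    """Split an identifier into (name, version); version is None if absent."""
    match = _IDENTIFIER.match(identifier)
    if match is None:
        return identifier, None
    return match.group("name"), int(match.group("version"))


print(split_identifier("units-v1"))
print(split_identifier("foo"))
```

Identifiers that do not match the assumed pattern are returned unchanged with no version, which leaves version discovery to whatever the convention document specifies.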

### New Convention Process
