Replace chunk_memory_layout with transpose codec
jbms committed Nov 30, 2022
1 parent a242b32 commit 0778cdf
Showing 2 changed files with 52 additions and 49 deletions.
45 changes: 45 additions & 0 deletions docs/codecs.rst
@@ -211,6 +211,51 @@ type is encoded as a 4-byte big endian two's complement integer, and the
representation of the data type, which is always little endian, is used
instead.

.. _transpose-codec:

Transpose
---------

Codec URI:
https://purl.org/zarr/spec/codec/transpose

Permutes the dimensions of the chunk array.

Configuration parameters
~~~~~~~~~~~~~~~~~~~~~~~~

order:
Required. Must be one of:

- An array of integers specifying a permutation of ``0``, ``1``, ...,
``n-1``, where ``n`` is the number of dimensions in the decoded chunk
representation provided as input to this codec.
- The string ``"C"``, equivalent to specifying the identity permutation
``0``, ``1``, ..., ``n-1``. This makes the codec a no-op.
- The string ``"F"``, equivalent to specifying the permutation ``n-1``, ...,
``1``, ``0``.
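
As an illustration only, a ``codecs`` metadata entry selecting this codec might be built as in the sketch below. The entry layout (``type`` plus ``configuration``) is assumed from the gzip example in the core specification, the URI is used exactly as given above with no version suffix, and the helper name ``transpose_codec_entry`` is hypothetical.

```python
import json

# Codec URI as stated in this section (no version suffix is given here).
TRANSPOSE_URI = "https://purl.org/zarr/spec/codec/transpose"


def transpose_codec_entry(order):
    """Build a hypothetical ``codecs`` entry for the transpose codec.

    ``order`` is "C", "F", or a sequence of integers that must be a
    permutation of 0, 1, ..., n-1.
    """
    if order not in ("C", "F"):
        order = [int(k) for k in order]
        if sorted(order) != list(range(len(order))):
            raise ValueError("order must be a permutation of 0..n-1")
    return {"type": TRANSPOSE_URI, "configuration": {"order": order}}


# Example: move the last dimension of a 3-d chunk to the front.
print(json.dumps(transpose_codec_entry([2, 0, 1]), indent=4))
```

Whether ``n`` matches the number of dimensions of the chunk can only be checked once the decoded chunk representation is known, so this sketch validates only that the list is a permutation.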

Format and algorithm
~~~~~~~~~~~~~~~~~~~~

The decoded chunk representation to which this codec is applied must be an
array. Implementations must fail if this codec is specified immediately after
another codec that produces a byte string as its encoded representation.

Given a chunk array ``A`` with shape ``A_shape`` as the decoded representation,
the encoded representation is an array ``B`` with the same data type as ``A``
and shape ``B_shape``, where:

- ``B_shape[i] = A_shape[order[i]]`` for all dimension indices ``i``, and
- ``B[B_pos] = A[A_pos]``, where ``B_pos[i] = A_pos[order[i]]``, for all chunk
positions ``A_pos`` and dimension indices ``i``.

.. note::

Implementations of this codec may simply construct a virtual view that
represents the transposed result, and avoid physically transposing the
in-memory representation when possible.
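
The definition above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: it assumes the chunk is held as a flat list in C (row-major) order, and the function names ``resolve_order``, ``flat_index``, ``transpose_encode`` and ``transpose_decode`` are hypothetical. Unlike the virtual view mentioned in the note, it copies every element.

```python
from itertools import product


def resolve_order(order, ndim):
    """Turn "C", "F", or an explicit list into a permutation of range(ndim)."""
    if order == "C":
        return list(range(ndim))
    if order == "F":
        return list(range(ndim - 1, -1, -1))
    perm = list(order)
    if sorted(perm) != list(range(ndim)):
        raise ValueError("order must be a permutation of 0..n-1")
    return perm


def flat_index(pos, shape):
    """Offset of position ``pos`` in a C-order flat list of the given shape."""
    idx = 0
    for p, s in zip(pos, shape):
        idx = idx * s + p
    return idx


def transpose_encode(a, a_shape, order):
    """Return (b, b_shape) with B_shape[i] = A_shape[order[i]]."""
    perm = resolve_order(order, len(a_shape))
    b_shape = [a_shape[k] for k in perm]
    b = [None] * len(a)
    for a_pos in product(*(range(s) for s in a_shape)):
        # B_pos[i] = A_pos[order[i]]
        b_pos = [a_pos[k] for k in perm]
        b[flat_index(b_pos, b_shape)] = a[flat_index(a_pos, a_shape)]
    return b, b_shape


def transpose_decode(b, b_shape, order):
    """Undo transpose_encode by applying the inverse permutation."""
    perm = resolve_order(order, len(b_shape))
    inverse = [0] * len(perm)
    for i, k in enumerate(perm):
        inverse[k] = i
    return transpose_encode(b, b_shape, inverse)


# A 2x3 chunk [[0, 1, 2], [3, 4, 5]] transposed with order "F".
b, b_shape = transpose_encode([0, 1, 2, 3, 4, 5], [2, 3], "F")
print(b_shape, b)  # -> [3, 2] [0, 3, 1, 4, 2, 5]
```

Decoding applies the inverse permutation, so ``transpose_decode(*transpose_encode(a, shape, order), order)`` returns the original data and shape.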

Deprecated codecs
=================

56 changes: 7 additions & 49 deletions docs/core/v3.0.rst
@@ -750,16 +750,8 @@ To encode or decode a chunk, the encoded and decoded representations for each
codec in the chain must first be determined as follows:

1. The initial decoded representation, ``decoded_representation[0]``, is a
multi-dimensional array with the same data type as the zarr array, and a
shape determined according to the value of ``chunk_memory_layout`` as
follows:

- If ``chunk_memory_layout`` is equal to ``"C"``, the shape is equal to the
chunk shape.
- If ``chunk_memory_layout`` is equal to ``"F"``, the shape is equal to the
chunk shape, with the dimension order reversed.
- If ``chunk_memory_layout`` is defined by an extension, the extension
defines the shape.
multi-dimensional array with the same data type as the zarr array, and shape
equal to the chunk shape.

2. For each codec ``i``, the encoded representation is equal to the decoded
representation ``decoded_representation[i+1]`` of the next codec, and is
@@ -789,17 +781,9 @@ Encoding procedure
Based on the computed ``decoded_representations`` list, a chunk is encoded using
the following procedure:

1. The chunk array ``A`` (with a shape equal to the chunk shape, and data type
equal to the zarr array data type) is logically transformed into the initial
*encoded chunk* ``EC[0]`` of the type specified by
``decoded_representation[0]`` according to the ``chunk_memory_layout`` as
follows:

- If ``chunk_memory_layout`` is equal to ``"C"``, ``EC[0]`` equals ``A`` (no
transformation).
- If ``chunk_memory_layout`` is equal to ``"F"``, the dimension order is reversed.
- If ``chunk_memory_layout`` is defined by an extension, the extension
defines the transformation to perform.
1. The initial *encoded chunk* ``EC[0]`` of the type specified by
``decoded_representation[0]`` is equal to the chunk array ``A`` (with a shape
equal to the chunk shape, and data type equal to the zarr array data type).

2. For each codec ``codecs[i]`` in ``codecs``, ``EC[i+1] :=
codecs[i].encode(EC[i])``.
@@ -827,14 +811,7 @@ the following procedure:
3. For each codec ``codecs[i]`` in ``codecs``, iterating in reverse order,
``EC[i] := codecs[i].decode(EC[i+1], decoded_representation[i])``.

4. The chunk array ``A`` is computed from ``EC[0]`` according to the
``chunk_memory_layout`` as follows:

- If ``chunk_memory_layout`` is equal to ``"C"``, ``A`` equals ``EC[0]`` (no
transformation).
- If ``chunk_memory_layout`` is equal to ``"F"``, the dimension order is reversed.
- If ``chunk_memory_layout`` is defined by an extension, the extension
defines the transformation to perform.
4. The chunk array ``A`` is equal to ``EC[0]``.
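
The simplified procedures, with ``EC[0]`` now equal to the chunk array ``A``, can be sketched as below. The sketch covers only the steps visible in these hunks; the toy codecs ``Reverse`` and ``Prefix`` and the functions ``encode_chunk`` and ``decode_chunk`` are hypothetical stand-ins, and the ``decoded_representation`` entries are placeholders rather than real representation objects.

```python
class Reverse:
    """Toy codec (an assumption for illustration): reverses the sequence."""

    def encode(self, chunk):
        return chunk[::-1]

    def decode(self, encoded, decoded_representation):
        return encoded[::-1]


class Prefix:
    """Toy codec (an assumption for illustration): prepends a marker element."""

    def encode(self, chunk):
        return [0] + chunk

    def decode(self, encoded, decoded_representation):
        return encoded[1:]


def encode_chunk(a, codecs):
    ec = a  # EC[0] is equal to the chunk array A
    for codec in codecs:
        ec = codec.encode(ec)  # EC[i+1] := codecs[i].encode(EC[i])
    return ec


def decode_chunk(encoded, codecs, decoded_representation):
    ec = encoded
    for i in reversed(range(len(codecs))):
        # EC[i] := codecs[i].decode(EC[i+1], decoded_representation[i])
        ec = codecs[i].decode(ec, decoded_representation[i])
    return ec  # the chunk array A is equal to EC[0]


codecs = [Reverse(), Prefix()]
representations = [None, None]  # placeholders for this sketch
encoded = encode_chunk([1, 2, 3], codecs)
print(encoded)  # -> [0, 3, 2, 1]
print(decode_chunk(encoded, codecs, representations))  # -> [1, 2, 3]
```

Because no layout transformation remains outside the chain, any dimension reordering has to appear as an explicit codec in ``codecs``.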

Specifying codecs
-----------------
@@ -1068,22 +1045,6 @@ following mandatory names:
the specification. The ``type`` is required and the value is
defined by the extension.

``chunk_memory_layout``
^^^^^^^^^^^^^^^^^^^^^^^

The internal memory layout of the chunks. Use the value "C" to
indicate `C contiguous memory layout`_ or "F" to indicate
`F contiguous memory layout`_ as defined in this specification.

The ``chunk_memory_layout`` value is an extension point and may be
defined by an extension. If the chunk memory layout type
is defined by an extension, then the value must be an
object containing the names ``extension`` and ``type``. The
``extension`` is required and the value must be a URI that
identifies the extension and dereferences to a
human-readable representation of the specification. The ``type`` is
required and the value is defined by the extension.

``fill_value``
^^^^^^^^^^^^^^

@@ -1182,7 +1143,6 @@ compressed using gzip compression prior to storage::
"chunk_shape": [1000, 100],
"separator" : "/"
},
"chunk_memory_layout": "C",
"codecs": [{
"type": "https://purl.org/zarr/spec/codec/gzip/1.0",
"configuration": {
@@ -1213,7 +1173,6 @@ chunking as above, but using an extension data type::
"chunk_shape": [1000, 100],
"separator" : "/"
},
"chunk_memory_layout": "C",
"codecs": [{
"type": "https://purl.org/zarr/spec/codec/gzip/1.0",
"configuration": {
@@ -1229,7 +1188,7 @@ chunking as above, but using an extension data type::
comparison with spec v2,
``dtype`` has been renamed to ``data_type``,
``chunks`` has been renamed to ``chunk_grid``,
``order`` has been renamed to ``chunk_memory_layout``,
``order`` has been replaced by the ``transpose`` codec,
the separate ``filters`` and ``compressor`` fields have been combined into the single ``codecs`` field,
``zarr_format`` has been removed,

@@ -1735,7 +1694,6 @@ generic ``extensions`` in `entry point metadata`_ ``must_underst
array ``extensions`` in `Array metadata`_ ``must_understand``
data type `data_type`_ no ``fallback``
chunk grid `chunk_grid`_ always
chunk memory layout `chunk_memory_layout`_ always
storage transformer `storage_transformers`_ always
======================= ========================================= =====================
