From 0778cdf26396c0f17003d9cbd39f3e7a0d090252 Mon Sep 17 00:00:00 2001
From: Jeremy Maitin-Shepard
Date: Wed, 30 Nov 2022 14:37:55 -0800
Subject: [PATCH] Replace chunk_memory_layout with transpose codec

---
 docs/codecs.rst    | 45 +++++++++++++++++++++++++++++++++++++
 docs/core/v3.0.rst | 56 ++++++----------------------------------------
 2 files changed, 52 insertions(+), 49 deletions(-)

diff --git a/docs/codecs.rst b/docs/codecs.rst
index c170a0b3..4e5f056b 100644
--- a/docs/codecs.rst
+++ b/docs/codecs.rst
@@ -211,6 +211,51 @@ type is encoded as a 4-byte big endian two's complement integer, and the
 representation of the data type, which is always little endian, is used
 instead.
 
+.. _transpose-codec:
+
+Transpose
+---------
+
+Codec URI:
+    https://purl.org/zarr/spec/codec/transpose
+
+Permutes the dimensions of the chunk array.
+
+Configuration parameters
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+order:
+    Required. Must be one of:
+
+    - An array of integers specifying a permutation of ``0``, ``1``, ...,
+      ``n-1``, where ``n`` is the number of dimensions in the decoded chunk
+      representation provided as input to this codec.
+    - The string ``"C"``, equivalent to specifying the identity permutation
+      ``0``, ``1``, ..., ``n-1``. This makes the codec a no-op.
+    - The string ``"F"``, equivalent to specifying the permutation ``n-1``, ...,
+      ``1``, ``0``.
+
+Format and algorithm
+~~~~~~~~~~~~~~~~~~~~
+
+The decoded chunk representation to which this codec is applied must be an
+array. Implementations must fail if this codec is specified immediately after
+another codec that produces a byte string as its encoded representation.
+
+Given a chunk array ``A`` with shape ``A_shape`` as the decoded representation,
+the encoded representation is an array ``B`` with the same data type as ``A``
+and shape ``B_shape``, where:
+
+- ``B_shape[i] = A_shape[order[i]]`` for all dimension indices ``i``, and
+- ``B[B_pos] = A[A_pos]``, where ``B_pos[i] = A_pos[order[i]]``, for all chunk
+  positions ``A_pos`` and dimension indices ``i``.
+
+.. note::
+
+   Implementations of this codec may simply construct a virtual view that
+   represents the transposed result, and avoid physically transposing the
+   in-memory representation when possible.
+
 Deprecated codecs
 =================
 
diff --git a/docs/core/v3.0.rst b/docs/core/v3.0.rst
index 064426a5..8e7068df 100644
--- a/docs/core/v3.0.rst
+++ b/docs/core/v3.0.rst
@@ -750,16 +750,8 @@ To encode or decode a chunk, the encoded and decoded representations for each
 codec in the chain must first be determined as follows:
 
 1. The initial decoded representation, ``decoded_representation[0]`` is
-   multi-dimensional array with the same data type as the zarr array, and a
-   shape determined according to the value of ``chunk_memory_layout`` as
-   follows:
-
-   - If ``chunk_memory_layout`` is equal to ``"C"``, the shape is equal to the
-     chunk shape.
-   - If ``chunk_memory_layout`` is equal to ``"F"``, the shape is equal to the
-     chunk shape, with the dimension order reversed.
-   - If ``chunk_memory_layout`` is defined by an extension, the extension
-     defines the shape.
+   a multi-dimensional array with the same data type as the zarr array, and
+   shape equal to the chunk shape.
 
 2. For each codec ``i``, the encoded representation is equal to the decoded
    representation ``decoded_representation[i+1]`` of the next codec, and is
@@ -789,17 +781,9 @@ Encoding procedure
 Based on the computed ``decoded_representations`` list, a chunk is encoded
 using the following procedure:
 
-1. The chunk array ``A`` (with a shape equal to the chunk shape, and data type
-   equal to the zarr array data type) is logically transformed into the initial
-   *encoded chunk* ``EC[0]`` of the type specified by
-   ``decoded_representation[0]`` according to the ``chunk_memory_layout`` as
-   follows:
-
-   - If ``chunk_memory_layout`` is equal to ``"C"``, ``EC[0]`` equals ``A`` (no
-     transformation).
-   - If ``chunk_memory_layout`` is equal to ``"F"``, the dimension order is reversed.
-   - If ``chunk_memory_layout`` is defined by an extension, the extension
-     defines the transformation to perform.
+1. The initial *encoded chunk* ``EC[0]`` of the type specified by
+   ``decoded_representation[0]`` is equal to the chunk array ``A`` (with a shape
+   equal to the chunk shape, and data type equal to the zarr array data type).
 
 2. For each codec ``codecs[i]`` in ``codecs``,
    ``EC[i+1] := codecs[i].encode(EC[i])``.
@@ -827,14 +811,7 @@ the following procedure:
 3. For each codec ``codecs[i]`` in ``codecs``, iterating in reverse order,
    ``EC[i] := codecs[i].decode(EC[i+1], decoded_representation[i])``.
 
-4. The chunk array ``A`` is computed from ``EC[0]`` according to the
-   ``chunk_memory_layout`` as follows:
-
-   - If ``chunk_memory_layout`` is equal to ``"C"``, ``A`` equals ``EC[0]`` (no
-     transformation).
-   - If ``chunk_memory_layout`` is equal to ``"F"``, the dimension order is reversed.
-   - If ``chunk_memory_layout`` is defined by an extension, the extension
-     defines the transformation to perform.
+4. The chunk array ``A`` is equal to ``EC[0]``.
 
 Specifying codecs
 -----------------
@@ -1068,22 +1045,6 @@ following mandatory names:
     the specification. The ``type`` is required and the value is defined
     by the extension.
 
-``chunk_memory_layout``
-^^^^^^^^^^^^^^^^^^^^^^^
-
-    The internal memory layout of the chunks. Use the value "C" to
-    indicate `C contiguous memory layout`_ or "F" to indicate
-    `F contiguous memory layout`_ as defined in this specification.
-
-    The ``chunk_memory_layout`` value is an extension point and may be
-    defined by an extension. If the chunk memory layout type
-    is defined by an extension, then the value must be an
-    object containing the names ``extension`` and ``type``. The
-    ``extension`` is required and the value must be a URI that
-    identifies the extension and dereferences to a
-    human-readable representation of the specification. The ``type`` is
-    required and the value is defined by the extension.
-
 ``fill_value``
 ^^^^^^^^^^^^^^
 
@@ -1182,7 +1143,6 @@ compressed using gzip compression prior to storage::
             "chunk_shape": [1000, 100],
             "separator" : "/"
         },
-        "chunk_memory_layout": "C",
         "codecs": [{
             "type": "https://purl.org/zarr/spec/codec/gzip/1.0",
             "configuration": {
@@ -1213,7 +1173,6 @@ chunking as above, but using an extension data type::
             "chunk_shape": [1000, 100],
             "separator" : "/"
         },
-        "chunk_memory_layout": "C",
         "codecs": [{
             "type": "https://purl.org/zarr/spec/codec/gzip/1.0",
             "configuration": {
@@ -1229,7 +1188,7 @@ chunking as above, but using an extension data type::
     comparison with spec v2,
     ``dtype`` has been renamed to ``data_type``,
     ``chunks`` has been renamed to ``chunk_grid``,
-    ``order`` has been renamed to ``chunk_memory_layout``,
+    ``order`` has been replaced by the ``transpose`` codec,
     the separate ``filters`` and ``compressor`` fields been combined into
     the single ``codecs`` field,
     ``zarr_format`` has been removed,
@@ -1735,7 +1694,6 @@ generic ``extensions`` in `entry point metadata`_ ``must_underst
 array                   ``extensions`` in `Array metadata`_       ``must_understand``
 data type               `data_type`_                              no ``fallback``
 chunk grid              `chunk_grid`_                             always
-chunk memory layout     `chunk_memory_layout`_                    always
 storage transformer     `storage_transformers`_                   always
 ======================= ========================================= =====================
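The encoding rule in the new Transpose section (``B_shape[i] = A_shape[order[i]]`` and ``B[B_pos] = A[A_pos]`` with ``B_pos[i] = A_pos[order[i]]``) can be checked with a small pure-Python sketch. It is illustrative only. It assumes a chunk is held as a flat row-major (C order) list plus a shape. The helper names ``resolve_order``, ``transpose_encode`` and ``transpose_decode`` are hypothetical and are not part of zarr or any other library. Decoding is sketched by applying the inverse permutation.

```python
from itertools import product
from typing import List, Sequence, Tuple, Union

# Hypothetical helpers illustrating the transpose codec rule; not a zarr API.
Order = Union[str, Sequence[int]]


def resolve_order(order: Order, ndim: int) -> Tuple[int, ...]:
    """Map the ``order`` configuration value to an explicit permutation."""
    if order == "C":
        return tuple(range(ndim))              # identity: the codec is a no-op
    if order == "F":
        return tuple(range(ndim - 1, -1, -1))  # n-1, ..., 1, 0
    perm = tuple(order)
    if sorted(perm) != list(range(ndim)):
        raise ValueError(f"{perm!r} is not a permutation of 0..{ndim - 1}")
    return perm


def flat_index(pos: Sequence[int], shape: Sequence[int]) -> int:
    """Row-major (C order) offset of ``pos`` within an array of ``shape``."""
    index = 0
    for p, s in zip(pos, shape):
        index = index * s + p
    return index


def transpose_encode(data: List, shape: Sequence[int],
                     order: Order) -> Tuple[list, Tuple[int, ...]]:
    """Encode A -> B with B_shape[i] = A_shape[order[i]] and
    B[B_pos] = A[A_pos], where B_pos[i] = A_pos[order[i]].
    Assumes len(data) equals the product of ``shape``."""
    perm = resolve_order(order, len(shape))
    b_shape = tuple(shape[perm[i]] for i in range(len(shape)))
    b_data = [None] * len(data)
    for a_pos in product(*(range(s) for s in shape)):
        b_pos = tuple(a_pos[perm[i]] for i in range(len(shape)))
        b_data[flat_index(b_pos, b_shape)] = data[flat_index(a_pos, shape)]
    return b_data, b_shape


def transpose_decode(data: List, shape: Sequence[int],
                     order: Order) -> Tuple[list, Tuple[int, ...]]:
    """Invert transpose_encode; ``shape`` is the encoded shape B_shape."""
    perm = resolve_order(order, len(shape))
    inverse = [0] * len(perm)
    for i, p in enumerate(perm):
        inverse[p] = i  # A_pos[perm[i]] = B_pos[i]
    return transpose_encode(data, shape, inverse)


if __name__ == "__main__":
    a_data, a_shape = list(range(6)), (2, 3)  # A[i][j] == 3 * i + j
    b_data, b_shape = transpose_encode(a_data, a_shape, "F")
    print(b_data, b_shape)  # [0, 3, 1, 4, 2, 5] (3, 2)
    assert transpose_decode(b_data, b_shape, "F") == (a_data, tuple(a_shape))
```

With NumPy, ``numpy.transpose(a, axes=order)`` computes the same mapping: axis ``i`` of the result is axis ``order[i]`` of the input. NumPy returns a view rather than a copy whenever possible, which matches the note in the patch about avoiding a physical transpose.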