Replace chunk_memory_layout with transpose codec
jbms committed Dec 3, 2022
1 parent 89f92f9 commit 28bf3a1
Showing 2 changed files with 111 additions and 110 deletions.
103 changes: 103 additions & 0 deletions docs/codecs/transpose/v1.0.rst
@@ -0,0 +1,103 @@
.. _transpose-codec-v1:

=============================
Transpose codec (version 1.0)
=============================

**Editor's draft 26 July 2019**

Specification URI:
    https://purl.org/zarr/spec/codecs/transpose/1.0
Corresponding ZEP:
    `ZEP 1 — Zarr specification version 3 <https://zarr.dev/zeps/draft/ZEP0001.html>`_
Issue tracking:
    `GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/codec>`_
Suggest an edit for this spec:
    `GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/main/docs/codecs/transpose/v1.0.rst>`_

Copyright 2020 `Zarr core development team
<https://github.com/orgs/zarr-developers/teams/core-devs>`_. This work
is licensed under a `Creative Commons Attribution 3.0 Unported License
<https://creativecommons.org/licenses/by/3.0/>`_.

----


Abstract
========

Defines a codec that permutes the dimensions of the chunk array.


Status of this document
=======================

.. warning::
    This document is a draft for review and subject to change.
    It will become final when the `Zarr Enhancement Proposal (ZEP) 1 <https://zarr.dev/zeps/draft/ZEP0001.html>`_
    is approved via the `ZEP process <https://zarr.dev/zeps/active/ZEP0000.html>`_.


Document conventions
====================

Conformance requirements are expressed with a combination of
descriptive assertions and [RFC2119]_ terminology. The key words
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
parts of this document are to be interpreted as described in
[RFC2119]_. However, for readability, these words do not appear in all
uppercase letters in this specification.

All of the text of this specification is normative except sections
explicitly marked as non-normative, examples, and notes. Examples in
this specification are introduced with the words "for example".


Configuration parameters
========================

order:
    Required. Must be one of:

    - An array of integers specifying a permutation of ``0``, ``1``, ...,
      ``n-1``, where ``n`` is the number of dimensions in the decoded chunk
      representation provided as input to this codec.
    - The string ``"C"``, equivalent to specifying the identity permutation
      ``0``, ``1``, ..., ``n-1``. This makes the codec a no-op.
    - The string ``"F"``, equivalent to specifying the permutation ``n-1``, ...,
      ``1``, ``0``, which reverses the dimension order.
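
For example, an implementation might resolve ``order`` into an explicit
permutation before using it. The following non-normative Python sketch
assumes the configuration value has already been parsed from JSON into a
string or a list of integers; the function name ``normalize_order`` is
hypothetical and not defined by this specification.

.. code-block:: python

    from typing import List, Sequence, Union


    def normalize_order(order: Union[str, Sequence[int]], ndim: int) -> List[int]:
        """Resolve the ``order`` configuration value to an explicit permutation.

        Hypothetical helper; ``ndim`` is the number of dimensions of the
        decoded chunk representation that is input to this codec.
        """
        if order == "C":
            # Identity permutation: the codec is a no-op.
            return list(range(ndim))
        if order == "F":
            # Reverse the dimension order: n-1, ..., 1, 0.
            return list(range(ndim - 1, -1, -1))
        if isinstance(order, str):
            raise ValueError(f"unsupported order string: {order!r}")
        perm = list(order)
        if any(isinstance(i, bool) or not isinstance(i, int) for i in perm):
            raise ValueError("order must contain only integers")
        # A valid permutation contains each of 0, 1, ..., ndim-1 exactly once.
        if sorted(perm) != list(range(ndim)):
            raise ValueError(f"{perm} is not a permutation of 0..{ndim - 1}")
        return perm

With this helper, ``normalize_order("F", 3)`` yields ``[2, 1, 0]``, while
``normalize_order([0, 0, 1], 3)`` raises ``ValueError`` because ``0`` is
repeated and ``2`` is missing.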

Format and algorithm
====================

The decoded chunk representation to which this codec is applied must be an
array. Implementations must fail if this codec is specified immediately after
another codec that produces a byte string as its encoded representation.
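
For example, this rule can be checked when the list of codecs is parsed.
The following non-normative Python sketch models each codec only by a
short name and the kind of encoded representation it produces (``"array"``
or ``"bytes"``). This model and the names ``check_transpose_placement``,
``"transpose"``, and ``"to_bytes"`` are hypothetical simplifications; they
are not how codecs are identified in array metadata.

.. code-block:: python

    from typing import List, Tuple

    # Hypothetical model: (codec name, kind of encoded representation it
    # produces), listed in encoding order. Kinds are "array" or "bytes".
    CodecInfo = Tuple[str, str]


    def check_transpose_placement(codecs: List[CodecInfo]) -> None:
        """Raise ValueError if a transpose codec would receive a byte string."""
        # The initial decoded representation is always the chunk array.
        current = "array"
        for name, produces in codecs:
            if name == "transpose" and current != "array":
                raise ValueError(
                    "transpose codec cannot follow a codec that produces bytes")
            current = produces

Under this model, ``[("transpose", "array"), ("to_bytes", "bytes")]`` is
accepted, whereas the same two entries in the opposite order raise
``ValueError``.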

Given a chunk array ``A`` with shape ``A_shape`` as the decoded representation,
the encoded representation is an array ``B`` with the same data type as ``A``
and shape ``B_shape``, where:

- ``B_shape[i] = A_shape[order[i]]`` for all dimension indices ``i``, and
- ``B[B_pos] = A[A_pos]``, where ``B_pos[i] = A_pos[order[i]]``, for all chunk
positions ``A_pos`` and dimension indices ``i``.
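
For example, the following non-normative Python sketch applies the two
rules above literally. To stay free of third-party packages it stores each
array as a flat list in C (row-major) order together with its shape; the
names ``encode``, ``decode``, ``transpose``, and ``c_strides`` are
hypothetical. Decoding is shown as a transpose by the inverse permutation
``inverse``, where ``inverse[order[i]] = i``; this follows from the rules
above rather than being stated separately by them.

.. code-block:: python

    import itertools
    from typing import List, Sequence, Tuple


    def c_strides(shape: Sequence[int]) -> List[int]:
        """Element strides of a flat C-order (row-major) buffer with this shape."""
        strides = [1] * len(shape)
        for i in range(len(shape) - 2, -1, -1):
            strides[i] = strides[i + 1] * shape[i + 1]
        return strides


    def transpose(data: Sequence, shape: Sequence[int],
                  order: Sequence[int]) -> Tuple[list, List[int]]:
        """Return the flat C-order data and shape of the permuted array."""
        out_shape = [shape[o] for o in order]  # B_shape[i] = A_shape[order[i]]
        strides = c_strides(shape)
        out = []
        # Visit output positions in C order, so ``out`` is itself row-major.
        for out_pos in itertools.product(*(range(n) for n in out_shape)):
            # B_pos[i] = A_pos[order[i]]: input axis order[i] takes out_pos[i].
            in_pos = [0] * len(shape)
            for i, o in enumerate(order):
                in_pos[o] = out_pos[i]
            out.append(data[sum(p * s for p, s in zip(in_pos, strides))])
        return out, out_shape


    def encode(a_data: Sequence, a_shape: Sequence[int],
               order: Sequence[int]) -> Tuple[list, List[int]]:
        """Encoded representation B of the decoded chunk array A."""
        return transpose(a_data, a_shape, order)


    def decode(b_data: Sequence, b_shape: Sequence[int],
               order: Sequence[int]) -> Tuple[list, List[int]]:
        """Recover A from B by transposing with the inverse permutation."""
        inverse = [0] * len(order)
        for i, o in enumerate(order):
            inverse[o] = i
        return transpose(b_data, b_shape, inverse)

For a chunk ``A = [[0, 1, 2], [3, 4, 5]]`` (flat data ``[0, 1, 2, 3, 4, 5]``,
shape ``[2, 3]``), ``encode`` with ``order = [1, 0]`` (equivalent to ``"F"``
here) returns the flat data ``[0, 3, 1, 4, 2, 5]`` with shape ``[3, 2]``.
Read in C order, this is the column-major element sequence (0, 0), (1, 0),
(0, 1), (1, 1), (0, 2), (1, 2) of ``A``, matching the former F-style
(column-major) chunk memory layout.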

.. note::

Implementations of this codec may simply construct a virtual view that
represents the transposed result, and avoid physically transposing the
in-memory representation when possible.
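
For example, such a virtual view can be built by permuting the strides of
the input array instead of moving any elements. The following
non-normative Python sketch assumes the input array is stored as a flat
list in C (row-major) order; the class name ``TransposedView`` is
hypothetical.

.. code-block:: python

    from typing import List, Sequence


    class TransposedView:
        """Read-only view of B = transpose(A, order) over A's flat buffer.

        No elements are copied: the view only permutes A's shape and strides.
        """

        def __init__(self, data: Sequence, shape: Sequence[int],
                     order: Sequence[int]):
            a_strides = [1] * len(shape)
            for i in range(len(shape) - 2, -1, -1):
                a_strides[i] = a_strides[i + 1] * shape[i + 1]
            self._data = data
            self.shape: List[int] = [shape[o] for o in order]
            # One step along B's axis i is one step along A's axis order[i].
            self._strides = [a_strides[o] for o in order]

        def __getitem__(self, pos: Sequence[int]):
            if len(pos) != len(self.shape):
                raise IndexError("wrong number of indices")
            offset = 0
            for p, n, s in zip(pos, self.shape, self._strides):
                if not 0 <= p < n:
                    raise IndexError("index out of bounds")
                offset += p * s
            return self._data[offset]

Indexing ``TransposedView(list(range(6)), [2, 3], [1, 0])`` at ``(0, 1)``
reads offset ``3`` of the underlying list, i.e. ``A[1, 0]``, without copying
the data.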

References
==========

.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
   Requirement Levels. March 1997. Best Current Practice. URL:
   https://tools.ietf.org/html/rfc2119


Change log
==========

No changes yet.
118 changes: 8 additions & 110 deletions docs/core/v3.0.rst
@@ -347,19 +347,6 @@ The following figure illustrates the first part of the terminology:
 extensions may define other grid types such as
 rectilinear grids.
 
-.. _memory layout:
-.. _memory layouts:
-
-*Memory layout*
-
-An array_ is associated with a memory layout which defines how to
-construct a binary representation of a single chunk_ by organising
-the binary values of the elements_ within the chunk_ into a single
-contiguous sequence of bytes. This specification defines two types
-of memory layout based on "C" (row-major) and "F" (column-major)
-ordering of elements_, but extensions may define other
-memory layouts.
-
 .. _metadata document:
 .. _metadata documents:
 
@@ -369,8 +356,7 @@ The following figure illustrates the first part of the terminology:
 which is a machine-readable document containing essential
 processing information about the node. For example, an array_
 metadata document will specify the number of dimensions_, shape_,
-`data type`_, grid_, `memory layout`_ and codec_ for that
-array_.
+`data type`_, grid_, and codec_ for that array_.
 
 Groups can have an optional metadata document which provides extra
 information about a group.
@@ -698,52 +684,6 @@ arbitrary length in a "negative" direction along any dimension.
 ``0-16, 30-31``. When writing such chunks it is recommended to use the current fill value
 for elements outside the bounds of the array.
 
-Chunk memory layouts
-====================
-
-An array has a memory layout, which defines the way that the binary
-values of the array elements are organised within each chunk to form a
-contiguous sequence of bytes. This contiguous binary representation of
-a chunk is then the input to the array's chunk encoding pipeline,
-described in later sections. Typically, when reading data, an
-implementation will load this binary representation into a contiguous
-memory buffer to allow direct access to array elements without having
-to copy data.
-
-The core specification defines two types of contiguous memory
-layout. However, extensions may define other memory
-layouts. Note that there may be an interdependency between memory
-layouts and data types, such that certain memory layouts may only be
-applicable to arrays with certain data types.
-
-Row-major (C-style) memory layout
----------------------------------
-
-In this memory layout, the binary values of the array elements are
-organised into a sequence such that the last dimension of the array is
-the fastest changing dimension, also known as "row-major" order. This
-layout is only applicable to arrays with fixed size data types.
-
-For example, for a two-dimensional array with chunk shape (`dy`, `dx`),
-the binary values for a given chunk are taken from chunk elements in
-the order (0, 0), (0, 1), (0, 2), ..., (`dy` - 1, `dx` - 3), (`dy` - 1, `dx` -
-2), (`dy` - 1, `dx` - 1).
-
-Column-major (F-style) memory layout
-------------------------------------
-
-In this memory layout, the binary values of the array elements are
-organised into a sequence such that the first dimension of the array
-is the fastest changing dimension, also known as "column-major"
-order. This layout is only applicable to arrays with fixed size data
-types.
-
-For example, for a two-dimensional array with chunk shape (`dy`,
-`dx`), the binary values for a given chunk are taken from chunk
-elements in the order (0, 0), (1, 0), (2, 0), ..., (`dy` - 3, `dx` -
-1), (`dy` - 2, `dx` - 1), (`dy` - 1, `dx` - 1).
-
-
 Chunk encoding
 ==============
 
@@ -757,16 +697,8 @@ To encode or decode a chunk, the encoded and decoded representations for each
 codec in the chain must first be determined as follows:
 
 1. The initial decoded representation, ``decoded_representation[0]`` is
-multi-dimensional array with the same data type as the zarr array, and a
-shape determined according to the value of ``chunk_memory_layout`` as
-follows:
-
-- If ``chunk_memory_layout`` is equal to ``"C"``, the shape is equal to the
-chunk shape.
-- If ``chunk_memory_layout`` is equal to ``"F"``, the shape is equal to the
-chunk shape, with the dimension order reversed.
-- If ``chunk_memory_layout`` is defined by an extension, the extension
-defines the shape.
+a multi-dimensional array with the same data type as the zarr array, and shape
+equal to the chunk shape.
 
 2. For each codec ``i``, the encoded representation is equal to the decoded
 representation ``decoded_representation[i+1]`` of the next codec, and is
@@ -796,17 +728,9 @@ Encoding procedure
 Based on the computed ``decoded_representations`` list, a chunk is encoded using
 the following procedure:
 
-1. The chunk array ``A`` (with a shape equal to the chunk shape, and data type
-equal to the zarr array data type) is logically transformed into the initial
-*encoded chunk* ``EC[0]`` of the type specified by
-``decoded_representation[0]`` according to the ``chunk_memory_layout`` as
-follows:
-
-- If ``chunk_memory_layout`` is equal to ``"C"``, ``EC[0]`` equals ``A`` (no
-transformation).
-- If ``chunk_memory_layout`` is equal to ``"F"``, the dimension order is reversed.
-- If ``chunk_memory_layout`` is defined by an extension, the extension
-defines the transformation to perform.
+1. The initial *encoded chunk* ``EC[0]`` of the type specified by
+``decoded_representation[0]`` is equal to the chunk array ``A`` (with a shape
+equal to the chunk shape, and data type equal to the zarr array data type).
 
 2. For each codec ``codecs[i]`` in ``codecs``, ``EC[i+1] :=
 codecs[i].encode(EC[i])``.
@@ -834,14 +758,7 @@ the following procedure:
 3. For each codec ``codecs[i]`` in ``codecs``, iterating in reverse order,
 ``EC[i] := codecs[i].decode(EC[i+1], decoded_representation[i])``.
 
-4. The chunk array ``A`` is computed from ``EC[0]`` according to the
-``chunk_memory_layout`` as follows:
-
-- If ``chunk_memory_layout`` is equal to ``"C"``, ``A`` equals ``EC[0]`` (no
-transformation).
-- If ``chunk_memory_layout`` is equal to ``"F"``, the dimension order is reversed.
-- If ``chunk_memory_layout`` is defined by an extension, the extension
-defines the transformation to perform.
+4. The chunk array ``A`` is equal to ``EC[0]``.
 
 Specifying codecs
 -----------------
@@ -1091,22 +1008,6 @@ following mandatory names:
 the specification. The ``type`` is required and the value is
 defined by the extension.
 
-``chunk_memory_layout``
-^^^^^^^^^^^^^^^^^^^^^^^
-
-The internal memory layout of the chunks. Use the value "C" to
-indicate `C contiguous memory layout`_ or "F" to indicate
-`F contiguous memory layout`_ as defined in this specification.
-
-The ``chunk_memory_layout`` value is an extension point and may be
-defined by an extension. If the chunk memory layout type
-is defined by an extension, then the value must be an
-object containing the names ``extension`` and ``type``. The
-``extension`` is required and the value must be a URI that
-identifies the extension and dereferences to a
-human-readable representation of the specification. The ``type`` is
-required and the value is defined by the extension.
-
 ``fill_value``
 ^^^^^^^^^^^^^^
 
@@ -1207,7 +1108,6 @@ compressed using gzip compression prior to storage::
 "chunk_shape": [1000, 100],
 "separator" : "/"
 },
-"chunk_memory_layout": "C",
 "codecs": [{
 "type": "https://purl.org/zarr/spec/codecs/gzip/1.0",
 "configuration": {
@@ -1238,7 +1138,6 @@ chunking as above, but using an extension data type::
 "chunk_shape": [1000, 100],
 "separator" : "/"
 },
-"chunk_memory_layout": "C",
 "codecs": [{
 "type": "https://purl.org/zarr/spec/codecs/gzip/1.0",
 "configuration": {
@@ -1254,7 +1153,7 @@ chunking as above, but using an extension data type::
 comparison with spec v2,
 ``dtype`` has been renamed to ``data_type``,
 ``chunks`` has been renamed to ``chunk_grid``,
-``order`` has been renamed to ``chunk_memory_layout``,
+``order`` has been replaced by the ``transpose`` codec,
 the separate ``filters`` and ``compressor`` fields been combined into the single ``codecs`` field,
 ``zarr_format`` has been removed,
 
@@ -1767,7 +1666,6 @@ metadata encoding ``metadata_encoding`` in `entry point metadata`_ always
 array ``extensions`` in `Array metadata`_ ``must_understand``
 data type `data_type`_ no ``fallback``
 chunk grid `chunk_grid`_ always
-chunk memory layout `chunk_memory_layout`_ always
 storage transformer `storage_transformers`_ always
 ======================= ================================================ =====================
 