Commit a9ebbf2

committed
restore single-integer chunk edge length declaration
1 parent bee5f0a commit a9ebbf2

1 file changed

+94
-19
lines changed

chunk-grids/rectilinear/README.md

Lines changed: 94 additions & 19 deletions
Original file line number | Diff line number | Diff line change
@@ -1,40 +1,104 @@
11
# Rectilinear chunk grid
22

3+
## Abstract
4+
5+
This document defines a `chunk_grid` object to support rectilinear chunk grids. A rectilinear grid
6+
is a grid parametrized by a sequence of elements per axis, where each sequence of elements may be
7+
irregularly spaced. From a chunking perspective, a rectilinear grid is defined by a sequence of
8+
(potentially) variable-length intervals, or chunk edge lengths, for each axis of an array.
9+
10+
## Indexing
11+
12+
The following diagram illustrates a rectilinear chunk grid. The chunk edge lengths are not to scale.
13+
14+
```text
15+
               24                  14
16+
    ┌───────────────────────┬──────────────┐
17+
    │                       │              │
18+
    │                       │              │
19+
    │       chunk 0,0       │  chunk 0,1   │
20+
16  │                       │              │
21+
    │                       │              │
22+
    │                       │              │
23+
    │                       │              │
24+
    ├───────────────────────┼──────────────┤
25+
    │                       │              │
26+
    │                       │              │
27+
10  │       chunk 1,0       │  chunk 1,1   │
28+
    │                       │              │
29+
    │                       │              │
30+
    └───────────────────────┴──────────────┘
31+
```
32+
33+
Every array index resolves to a specific chunk, which can be identified by its index in the chunk
34+
grid, and an index *within* that chunk, which we refer to here as the "chunk index".
35+
36+
In this example, the array index `(15, 36)` resolves to the chunk grid index `(0, 1)` and the
37+
chunk index `(15, 12)`.
38+
39+
More generally, given a tuple of tuples of edge lengths `L` and an array index `idx`, the `nth`
40+
element of `idx` (denoted `idx[n]`) maps to a chunk grid index by applying the following procedure:
41+
compute the cumulative sum `C` of the edge lengths in `L[n]`, i.e.
42+
`C := (L[n][0], L[n][0] + L[n][1], ...)`. The chunk grid index for
43+
`idx[n]` is given by the zero-based index of the first element of `C` that strictly exceeds `idx[n]`.
44+
45+
Once the chunk grid index `k` is resolved, the chunk index *within* that chunk can be determined by
46+
subtracting `C[k-1]` (the cumulative sum at the previous chunk grid index) if `k > 0`, or 0 otherwise, from
47+
`idx[n]`.
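
The lookup described in this section can be sketched in Python as follows. This is an illustration only, not part of the specification: the function name `resolve_index` is hypothetical, array indices are assumed to be zero-based, and the search returns the first cumulative sum that is strictly greater than the index, so that an index equal to a cumulative sum falls at offset 0 of the next chunk.

```python
from bisect import bisect_right
from itertools import accumulate


def resolve_index(edge_lengths, idx):
    """Map an array index to (chunk grid index, index within that chunk).

    edge_lengths: one sequence of explicit chunk edge lengths per axis.
    idx: zero-based array index, one integer per axis.
    """
    grid_index = []
    within_chunk = []
    for lengths, i in zip(edge_lengths, idx):
        # Cumulative sums mark the exclusive end of each chunk along this axis.
        ends = list(accumulate(lengths))
        if not 0 <= i < ends[-1]:
            raise IndexError(f"index {i} is outside the chunk grid")
        # Position of the first chunk whose exclusive end is greater than i.
        k = bisect_right(ends, i)
        # The chunk starts where the previous chunk ends (0 for the first chunk).
        start = ends[k - 1] if k > 0 else 0
        grid_index.append(k)
        within_chunk.append(i - start)
    return tuple(grid_index), tuple(within_chunk)


# Edge lengths (16, 10) along axis 0 and (24, 14) along axis 1.
edges = ((16, 10), (24, 14))
print(resolve_index(edges, (15, 36)))  # ((0, 1), (15, 12))
print(resolve_index(edges, (16, 24)))  # ((1, 1), (0, 0))
```

A binary search over the cumulative sums keeps the lookup logarithmic in the number of chunks per axis; a linear scan would give the same result.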
48+
349
## Metadata
450

551
| field | type | required |
652
| - | - | - |
7-
| `"name"` | Literal `"rectilinear"` | yes |
8-
| `"configuration"` | [#configuration][] | yes |
53+
| `name` | Literal `"rectilinear"` | yes |
54+
| `configuration` | [configuration](#configuration) | yes |
955

1056
### Configuration
1157

1258
| field | type | required | notes |
1359
| - | - | - | - |
14-
| `"kind"` | Literal `"inline"` | yes | |
15-
| `"chunk_shapes"` | array of [Chunk edge lengths](#chunk-edge-lengths) | yes | The length of `"chunk_shapes"` MUST match the number of dimensions of the array.
60+
| `kind` | Literal `"inline"` | yes | see [kinds of encodings](#kinds-of-encodings) |
61+
| `chunk_shapes` | array of [Chunk edge lengths](#chunk-edge-lengths) | yes | The length of `chunk_shapes` MUST equal the number of dimensions of the array.
62+
63+
#### Kinds of encodings
64+
65+
This specification defines a single permitted value for the `kind` field, namely the string
66+
`"inline"`. Additions to this specification could define new permitted values for the `kind` field
67+
which could define new semantics for the `chunk_shapes` field.
1668

1769
#### Chunk edge lengths
1870

19-
The edge lengths of the chunks along an array axis `A` are represented by an array that can contain two types of elements:
20-
- an integer that explicitly denotes an edge length.
21-
- an array that denotes a [run-length encoded](#run-length-encoding) sequence of integers, each of which denotes an edge length.
71+
The edge lengths of the chunks for an array axis with length `L` can be declared in two ways.
2272

23-
The sum of the edge lengths MUST match the length of the array along the axis `A`.
73+
- as an integer
74+
75+
A single integer defines the step size of a regular 1-dimensional grid.
76+
77+
To convert a single integer `m` into a sequence of explicit chunk edge lengths for an array axis
78+
with length `L`, repeat the integer `m` until it defines a sequence with a sum greater than or equal to `L`.
79+
80+
For example, if `L` is 10, and `m` is 3, the explicit list of chunk lengths is `[3, 3, 3, 3]`.
81+
82+
- as an array that can contain two types of elements:
83+
- an integer that explicitly denotes an edge length.
84+
- an array that denotes a [run-length encoded](#run-length-encoding) sequence of integers,
85+
each of which denotes an edge length.
86+
87+
The sum of the edge lengths MUST equal or exceed `L`. Overflowing `L` by multiple chunks is
88+
permitted.
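
The expansion of the single-integer form can be sketched in Python as below. The function name `expand_regular` is hypothetical, and rejecting non-positive values of `m` is an assumption made for this sketch rather than a rule stated in this document.

```python
def expand_regular(m, axis_length):
    """Expand a single-integer declaration into explicit chunk edge lengths."""
    if m <= 0:
        # Assumption of this sketch: edge lengths are positive integers.
        raise ValueError("chunk edge length must be a positive integer")
    # Ceiling division: the smallest repeat count whose total covers the axis.
    count = -(-axis_length // m)
    return [m] * count


print(expand_regular(3, 10))  # [3, 3, 3, 3]
print(expand_regular(4, 6))   # [4, 4]
```

The last chunk may extend past the end of the axis, which is consistent with the rule that the sum of the edge lengths equals or exceeds `L`.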
2489

2590
#### Run-length encoding
2691

2792
This specification defines a JSON representation for run-length encoded sequences.
2893

29-
A run-length encoded sequence of `N` repetitions of some value `V` is denoted by the length-2 JSON array `[V, N]`.
94+
A run-length encoded sequence of `N` repetitions of some value `V` is denoted by the JSON array `[V, N]`. Both `V` and `N` MUST be integers.
3095

3196
For example, the sequence `[1, 1, 1, 1, 1]` becomes `[1, 5]` after applying this run-length encoding.
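
A decoder for the array form, where plain integers and run-length encoded pairs may be mixed, could look like the following Python sketch. The function name `expand_edge_lengths` is hypothetical and the error handling is illustrative only.

```python
def expand_edge_lengths(elements):
    """Expand an array-form declaration into explicit chunk edge lengths.

    Each element is either an integer (a single edge length) or a
    run-length encoded pair [V, N], meaning N repetitions of V.
    """
    explicit = []
    for element in elements:
        if isinstance(element, int):
            explicit.append(element)
        else:
            value, count = element  # fails if the element is not a pair
            if not (isinstance(value, int) and isinstance(count, int)):
                raise TypeError("both V and N must be integers")
            explicit.extend([value] * count)
    return explicit


print(expand_edge_lengths([[1, 5]]))        # [1, 1, 1, 1, 1]
print(expand_edge_lengths([[1, 3], 3]))     # [1, 1, 1, 3]
print(expand_edge_lengths([1, [2, 1], 3]))  # [1, 2, 3]
```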
3297

33-
## Resolving
34-
3598
## Example
3699

37-
This example demonstrates 5 different ways of specifying a rectilinear chunk grid for an array with shape `(6, 6, 6, 6, 6)`.
100+
This example demonstrates different ways of declaring the edge lengths for a rectilinear chunk grid
101+
via the `chunk_shapes` field, for an array with shape `(6, 6, 6, 6, 6)`.
38102

39103
```javascript
40104
{
@@ -45,22 +109,33 @@ This example demonstrates 5 different ways of specifying a rectilinear chunk gri
45109
"configuration": {
46110
"kind": "inline",
47111
"chunk_shapes": [
48-
[[2, 3]], // expands to [2, 2, 2]
49-
[[1, 6]], // expands to [1, 1, 1, 1, 1, 1]
50-
[1, [2, 1], 3], // expands to [1, 2, 3]
51-
[[1, 3], 3], // expands to [1, 1, 1, 3]
52-
[6] // expands to [6]
112+
4, // integer. expands to [4, 4]
113+
[1, 2, 3], // explicit list of edge lengths. expands to itself.
114+
[[4, 2]], // run-length encoded. expands to [4, 4].
115+
[[1, 3], 3], // run-length encoded and explicit list. expands to [1, 1, 1, 3]
116+
[4, 4, 4] // explicit list with overflow chunks
53117
]
54118
}
55119
}
56120
}
57121
```
58122

123+
## Compatibility with other chunk grids
124+
125+
A rectilinear grid is a generalization of a regular grid (a grid of regularly-spaced elements). Any
126+
[regular chunk grid](https://zarr-specs.readthedocs.io/en/latest/v3/chunk-grids/regular-grid/index.html)
127+
can be converted losslessly to a rectilinear chunk grid.
128+
129+
The simplest procedure is to copy the
130+
`chunk_shape` field of the regular chunk grid and assign it to the `chunk_shapes` field of the
131+
rectilinear chunk grid.
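
As a rough illustration of that procedure, the following Python sketch builds rectilinear chunk grid metadata from regular chunk grid metadata held as a dictionary. The function name `regular_to_rectilinear` is hypothetical; the input layout follows the regular chunk grid metadata of Zarr V3 (`name` and `configuration.chunk_shape`).

```python
def regular_to_rectilinear(regular_grid):
    """Build rectilinear chunk grid metadata from regular chunk grid metadata."""
    chunk_shape = regular_grid["configuration"]["chunk_shape"]
    return {
        "name": "rectilinear",
        "configuration": {
            "kind": "inline",
            # Each integer is read as a single-integer edge length declaration.
            "chunk_shapes": list(chunk_shape),
        },
    }


regular = {"name": "regular", "configuration": {"chunk_shape": [10, 20]}}
print(regular_to_rectilinear(regular))
```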
132+
59133
## Prior work
60134

61-
A scheme for rectilinear chunking was proposed in a [Zarr extension proposal](https://zarr.dev/zeps/draft/ZEP0003.html) (ZEP). The specification presented here builds on the ZEP 3 proposal and adapts it to the Zarr V3.
135+
A scheme for rectilinear chunking was proposed in a
136+
Zarr extension proposal (ZEP) called [ZEP 0003](https://zarr.dev/zeps/draft/ZEP0003.html).
137+
The specification presented here builds on the ZEP 0003 proposal and adapts it to Zarr V3.
62138

63139
Key differences between this specification and ZEP 0003:
64140
- This specification adds run-length encoding for integer sequences.
65141
- This specification uses the key `"chunk_shapes"` in the `configuration` field, while ZEP 0003 uses the key `"chunk_shape"`.
66-
- Zep 0003 defines a meaning for single-integer elements of its `chunk_shape` metadata: `"chunk_shape" : [10]` declares a sequence of chunks with length 10 repeated to match the shape of the array. While convenient, we avoid the single-integer form here because it ambiguously handles chunks at the end of an array.
