Self contained arrays #274

Open
DennisHeimbigner opened this issue Oct 29, 2023 · 3 comments

Comments

@DennisHeimbigner

I get the impression that there is a hidden assumption in the spec that all the information about an array
must be defined with the array. Two examples:
1. named_dimensions are specific to a single array and have no meaning outside that array's direct metadata.
2. data type extensions must be repeated with each array that uses them.
There are probably others that I have not yet spotted.
In any case, if this is a hidden assumption, then it should be made explicit.

@DennisHeimbigner
Author

The discussion in Issue #273
seems to validate my assumption that each array is completely independent of all other arrays
and that its metadata is completely self-contained, including named dimensions and all extensions.
I recall a discussion about this in a Zarr meeting a long time ago. The reason for this assumption
is to support parallel processing at the array level, so that a processor need not access any other metadata
and can operate on the array completely independently.
One consequence should be that groups serve only as namespaces and that the group's zarr.info
should not need to exist.
The one flaw in this is group-level attributes. It is unclear what a group-level attribute is for.
If it impacts the processing of an array in any way, then it violates the self-contained nature of arrays.
If it is there to provide some documentation of the file, then it should be sufficient to have attributes
only in the top-level group. Further, these attributes should have no consequence for processing an array.
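
To make the self-contained reading concrete, here is a minimal sketch in Python of a worker that derives everything it needs from one array's metadata document and never consults a group. The metadata values, the dimension names, and the helper names `chunk_grid_shape` and `describe` are invented for illustration; the field layout is only assumed to follow the v3 array metadata style.

```python
import json

# Hypothetical array metadata document (values invented for this sketch).
ARRAY_METADATA = json.loads("""
{
  "zarr_format": 3,
  "node_type": "array",
  "shape": [100, 250],
  "data_type": "int32",
  "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [10, 100]}},
  "chunk_key_encoding": {"name": "default", "configuration": {"separator": "/"}},
  "fill_value": 0,
  "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
  "dimension_names": ["y", "x"],
  "attributes": {}
}
""")


def chunk_grid_shape(meta):
    """Number of chunks along each dimension, using only this array's metadata."""
    shape = meta["shape"]
    chunk_shape = meta["chunk_grid"]["configuration"]["chunk_shape"]
    # Ceiling division: a partially filled edge chunk still counts as a chunk.
    return [-(-s // c) for s, c in zip(shape, chunk_shape)]


def describe(meta):
    """Summarise the array without reading any group or sibling metadata."""
    return {
        "dims": dict(zip(meta["dimension_names"], meta["shape"])),
        "grid": chunk_grid_shape(meta),
        "data_type": meta["data_type"],
    }


print(describe(ARRAY_METADATA))
```

Nothing in `describe` takes a group as input, which is the property the parallel-processing argument relies on.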

@LDeakin

LDeakin commented Oct 31, 2023

The spec says that array/group attributes are intended for storage of arbitrary user metadata. Attributes are not intended to change how arrays are encoded. If they did, that would not be supported by other implementations.
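
One way to picture this separation is the sketch below, which compares two metadata documents that differ only in their attributes. The list `ENCODING_KEYS` and the helper `encoding_view` are hypothetical and not taken from the spec; the list is an assumption about which fields matter for decoding and is not meant to be exhaustive.

```python
# Keys assumed, for this sketch only, to determine how chunks are laid out and decoded.
ENCODING_KEYS = (
    "shape",
    "data_type",
    "chunk_grid",
    "chunk_key_encoding",
    "fill_value",
    "codecs",
)


def encoding_view(meta):
    """Return only the fields a decoder would look at; attributes are left out."""
    return {key: meta[key] for key in ENCODING_KEYS if key in meta}


# Two hypothetical arrays with identical encoding but different user attributes.
meta_a = {
    "shape": [8],
    "data_type": "float64",
    "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [4]}},
    "chunk_key_encoding": {"name": "default", "configuration": {"separator": "/"}},
    "fill_value": 0.0,
    "codecs": [{"name": "bytes", "configuration": {"endian": "little"}}],
    "attributes": {"units": "K"},
}
meta_b = dict(meta_a, attributes={"units": "degC", "note": "recalibrated"})

# The attributes differ, but the decoder-relevant view is the same.
print(encoding_view(meta_a) == encoding_view(meta_b))
```

Under that assumption, any implementation can decode both arrays identically while treating the attributes as opaque user data.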

@zoj613
Contributor

zoj613 commented May 15, 2024

I have a question: how do implementations of the Zarr spec represent the underlying array data? Are chunks just normal in-memory array objects (e.g. a NumPy array in Python)? If so, what if a chunk is too big to fit in memory? Or is array manipulation only ever done by updating the metadata file? I'm interested in implementing the v3 spec in a functional language that currently doesn't have an implementation. I tried reading the spec, but it does not seem to give any example of how array chunks are represented in practice.
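
As a rough illustration of one possible representation, and not a description of any particular implementation, the sketch below treats each chunk as a blob of encoded bytes in a key-value store, decodes it into an ordinary in-memory sequence, and processes one chunk at a time so that only a single chunk needs to fit in memory. The store contents, the 1-D layout, and the helper names are assumptions made for this example; it uses only the Python standard library.

```python
import struct

CHUNK_LEN = 4   # hypothetical chunk_shape [4] for a 1-D int32 array
ARRAY_LEN = 10  # hypothetical array shape [10]

# Hypothetical store mapping chunk keys to encoded bytes. Each value is four
# little-endian int32 numbers; the last chunk is assumed to be stored at full
# size, with padding values beyond the end of the array.
store = {
    "c/0": struct.pack("<4i", 0, 1, 2, 3),
    "c/1": struct.pack("<4i", 4, 5, 6, 7),
    "c/2": struct.pack("<4i", 8, 9, 0, 0),
}


def decode_chunk(raw):
    """Decode one chunk's bytes into a plain in-memory list of ints."""
    return list(struct.unpack(f"<{CHUNK_LEN}i", raw))


def iter_chunks(store, array_len):
    """Yield the valid part of each chunk in order, holding one chunk at a time."""
    n_chunks = -(-array_len // CHUNK_LEN)  # ceiling division
    for i in range(n_chunks):
        values = decode_chunk(store[f"c/{i}"])
        valid = min(CHUNK_LEN, array_len - i * CHUNK_LEN)
        yield values[:valid]  # drop padding past the end of the array


def chunked_sum(store, array_len):
    """Reduce over the whole array without materialising it all at once."""
    return sum(sum(chunk) for chunk in iter_chunks(store, array_len))


print(chunked_sum(store, ARRAY_LEN))
```

In this picture the chunk shape is what bounds memory use, so a chunk that is too large to hold would be addressed by choosing a smaller chunk shape rather than by a different in-memory representation.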
