Why do storage transformers need "type" separate from "configuration" #191
Comments
Good point, I just carried this design over from other extension points. This also touches on the question of whether all extension points (e.g. dtype, transformers) also need an entry in the
Those are the different points in my mind, trying to order it a bit:
As an aside, I again get the feeling that I could use lots of examples to help make these types of decisions.
Here's a proposal for a more coherent terminology and config: Zarr has extension points, which allow adding new functionality without changing the core specification. Those are
Additionally, there will be metadata conventions (zarr-developers/zeps#28) (also for groups and arrays), which do not contain functionality needed by zarr implementations themselves, but by higher-level libs and apps. Specific extension points should preferably be used over the more generic "extensions", which can be used if none of the others match. An array config with all the different types of extension points could look like this:
{
"shape": [10000, 1000],
"data_type": { // string or extension point object with fallback
"name": "datetime",
"configuration": {
"unit": "ns"
},
"fallback": "int64"
},
"chunk_grid": {
"type": { // string or extension point object
"name": "hexagonal",
"configuration": {
"origin": "…"
}
},
"chunk_shape": [1000, 100],
"separator" : "/"
},
"codecs": [ // list of extension point objects
{
"name": "gzip",
"configuration": {
"level": 1
}
}
],
"storage_transformers": [ // list of extension point objects
{
"name": "sharding",
"configuration": {
"type": "indexed",
"chunks_per_shard": [2, 2]
}
}
],
"fill_value": null,
"extensions": [ // list of extension objects with must_understand
{
"name": "my_extension",
"must_understand": false,
"configuration": {
"foo": "bar"
}
}
],
"attributes": {}
}
This is slightly different from the current version, but uses more coherent extension point objects. The
{
"chunk_grid": {
"name": "hexagonal",
"configuration": {
"chunk_shape": [1000, 100],
"origin": "…",
"separator" : "/"
}
}
}
This would remove one level and might make more sense if any future chunk grids do not have a chunk_shape as currently defined. The behavior when an extension point is needed to be able to read array chunk data is the same as now:
For groups the
What do you think about this @jbms @joshmoore @rabernat @WardF @jakirkham? Happy to discuss this later in the ZEP meeting.
@jbms brought up that we might also use the top-level object of the metadata for extensions instead of the
Thanks, I think this is moving in the right direction. I think it would be nice to be as consistent as possible, so that it is easier to remember when writing manually. In particular, rather than have a mix of "type" and "name", just always use one key.
Yep, meant to use
I think for storage transformers it might be better to just use a single name, like "name", rather than both name and type.
Yep, in this case the type is just a part of the transformer-specific config, but I agree that we can probably drop it.
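To make the "always use one key" idea concrete, here is a minimal Python sketch of how a reader might treat every extension point entry uniformly, whether it is written as a plain string or as an object with "name" and "configuration". The helper name normalize_extension_point is hypothetical and not part of any Zarr implementation; it only assumes the object layout shown in the proposal above.

```python
from typing import Any, Dict, Tuple, Union

# An entry is either a bare name ("int64") or an object with a "name" key
# and an optional "configuration" key, as in the proposal above.
ExtensionPoint = Union[str, Dict[str, Any]]


def normalize_extension_point(entry: ExtensionPoint) -> Tuple[str, Dict[str, Any]]:
    """Return (name, configuration) for a string or object entry.

    Hypothetical helper for illustration only.
    """
    if isinstance(entry, str):
        # A bare string carries no configuration.
        return entry, {}
    if not isinstance(entry, dict) or "name" not in entry:
        raise ValueError(f"not an extension point object: {entry!r}")
    # Copy the configuration so callers cannot mutate the metadata document.
    return entry["name"], dict(entry.get("configuration", {}))


name, config = normalize_extension_point(
    {"name": "sharding", "configuration": {"type": "indexed", "chunks_per_shard": [2, 2]}}
)
print(name, config)
```

With a single "name" key, transformer-specific settings such as "type" are simply whatever the named transformer chooses to read from its configuration.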
This is what V3 currently says about how to specify storage transformers
zarr-specs/docs/core/v3.0.rst
Lines 1165 to 1174 in b509f14
Why can't we just put "type" inside "configuration"? That just seems simpler. Plus, it may not make sense to define "type" for some storage transformers. That means "type" is a transformer-specific configuration parameter anyway.
cc @jstriebel
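The suggestion to keep "type" inside "configuration" can be sketched as a small metadata rewrite. This is only an illustration: the input layout (a top-level "type" next to "configuration") is assumed from the issue title, the "name" key is borrowed from the proposal in the comments, and fold_type_into_configuration is a hypothetical helper rather than anything defined by the spec.

```python
from typing import Any, Dict


def fold_type_into_configuration(transformer: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy in which a top-level "type" lives inside "configuration".

    Hypothetical helper; the input layout is an assumption.
    """
    # Keep every key except the top-level "type".
    result = {key: value for key, value in transformer.items() if key != "type"}
    config = dict(transformer.get("configuration", {}))
    if "type" in transformer:
        # Refuse to silently overwrite a different value already in the config.
        if "type" in config and config["type"] != transformer["type"]:
            raise ValueError("conflicting 'type' values")
        config["type"] = transformer["type"]
    result["configuration"] = config
    return result


# Assumed layout with "type" separate from "configuration".
separate = {
    "name": "sharding",
    "type": "indexed",
    "configuration": {"chunks_per_shard": [2, 2]},
}
folded = fold_type_into_configuration(separate)
print(folded)
```

A transformer that has no notion of "type" passes through unchanged apart from gaining an empty "configuration" if it had none.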