Add ZEP 8 (URL syntax) draft #48
@normanrz Please take a look. |
@MSanKeys963 Looks like there is an issue with the docs build that is unrelated to this PR. |
@martindurant Would appreciate your perspective on this --- I imagine you might say that we should just use fsspec syntax instead, though. |
Well indeed, I could say "why invent another"; although translating between |
While standardizing a URL scheme has benefits on its own, I think the main benefit/motivation for this ZEP is the formalization of Zip stores. Essentially, to comply with this ZEP, implementations need to implement zip stores. Maybe that should be written out more explicitly? |
While this ZEP was prompted by our discussion about zip stores, my intention was that we standardize on the syntax for various protocols, but that implementations would choose which ones to support. I think we could also push implementations to support zip format, but I'm not sure I want to tie that to this URL syntax proposal. |
@bogovicj this might also be relevant for your OME transformations proposal. |
@jbms: I have added #51 to fix the RTD build. Can you please update your PR? |
title: ZEP0008
description: URL syntax
parent: draft ZEPs
nav_order: 1
nav_order: 1
nav_order: 8
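Thanks @jbms for putting this together! There are a few situations I came up with for which I'm not sure what the relative URL should be.
What does it look like to use ..: to "go up" multiple levels? Is this correct / valid?
Base URL: gs://bucket/0.zip|zip:a|zarr3:i
Relative URL: ..:..:1.zip|zip:b|zarr3:ii
Resolved URL: gs://bucket/1.zip|zip:b|zarr3:ii
Is it correct / valid to use .. in the "path part" of relative URL, after a ..:?
Base URL: gs://bucket/0/a/i.zarr|zarr3:foo
Relative URL: ..:../b/i.zarr|zarr3:foo
Resolved URL: gs://bucket/0/b/i.zarr|zarr3:foo
If one needs to add an adapter in a relative way, how does one go about it? For example:
Base URL: gs://bucket/0/a/i.zarr
Desired Resolved URL: gs://bucket/0/a/i.zarr|zarr3:foo
Which, if any, of these do you think should be used? Are any of these invalid?
- .|zarr3:foo (clearest to me)
- |zarr3:foo
- zarr3:foo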
|
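One more thing: We've found it useful to be able to reference a particular part of the attributes stored in json with a URL. For example, for this zarr3 zarr.json
{
"zarr_format": 3,
"node_type": "array",
"shape": [10000, 1000],
"dimension_names": ["rows", "columns"],
"data_type": "float64",
"chunk_grid": {
"name": "regular",
"configuration": {
"chunk_shape": [1000, 100]
}
},
"chunk_key_encoding": {
"name": "default",
"configuration": {
"separator": "/"
}
},
"codecs": [{
"name": "gzip",
"configuration": {
"level": 1
}
}],
"fill_value": "NaN",
"attributes": {
"foo": 42,
"bar": "apples",
"baz": [1, 2, 3, 4]
}
}
- /attributes/baz[0] points to 1
- /shape points to [10000, 1000]
- /chunk_grid/configuration points to { "chunk_shape": [1000, 100] }
Could you envision adding an attributes: or zarr.json:, or similar adapter, that enables this?
For example: gs://bucket/0.zip|zip:a|zarr3:i|zarr.json:attributes/foo
A specific use case: I often re-use and reference transformations. These are described by metadata (not arrays), so referencing the specific metadata is helpful.
For example, if this were adopted, something like this would not be uncommon in my workflows:
{
"type" : "sequence",
"transformations" : [
{ "url" : "..:/localTransformations|zarr.json:/transform[1]" },
{ "url" : "gs://bucket/path/to/templateTransformation.zarr|zarr3:sharedTransforms|zarr.json:/transform[0]" }
]
}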
|
On Tue, Nov 14, 2023, 05:53 John Bogovic ***@***.***> wrote:
Thanks @jbms <https://github.com/jbms> for putting this together! There
are a few situations I came up with for which I'm not sure what the
relative URL should be
What does it look like to use ..: to "go up" multiple levels?
Is this correct / valid?
Base URL: gs://bucket/0.zip|zip:a|zarr3:i
Relative URL: ..:..:1.zip|zip:b|zarr3:ii
Resolved URL: gs://bucket/1.zip|zip:b|zarr3:ii
I was imagining that the relative url would be:
`|..|..:1.zip|zip:b|zarr3:ii`
The part after the | is always the scheme, and a scheme of .. is needed to
get to the parent store.
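To make the "part after the | is always the scheme" rule concrete, here is a minimal Python sketch that splits a pipe-delimited URL into (scheme, rest) pairs. The helper name `split_pipeline` is hypothetical and is not part of the ZEP or of any library. It assumes `|` never appears unescaped inside a component, and it only parses; it does not attempt to resolve a relative URL against a base.

```python
def split_pipeline(url: str) -> list[tuple[str, str]]:
    """Split a pipe-delimited URL into (scheme, rest) pairs.

    Hypothetical helper for illustration only; assumes "|" is never
    used unescaped inside a component.
    """
    parts = []
    for component in url.split("|"):
        scheme, sep, rest = component.partition(":")
        if not sep:
            # No colon: treat the whole component as a bare scheme, e.g. "..".
            scheme, rest = component, ""
        parts.append((scheme, rest))
    return parts


absolute = split_pipeline("gs://bucket/0.zip|zip:a|zarr3:i")
relative = split_pipeline("|..|..:1.zip|zip:b|zarr3:ii")
```

For the relative form, the leading `|` yields an empty first component, followed by a bare `..` and then `..` carrying the path `1.zip`. How those components are resolved is for the ZEP text to define.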
Is it correct / valid to use .. in the "path part" of relative URL, after
a ..:?
Base URL: gs://bucket/0/a/i.zarr|zarr3:foo
Relative URL: ..:../b/i.zarr|zarr3:foo
Resolved URL: gs://bucket/0/b/i.zarr|zarr3:foo
If one needs to add an adapter in a relative way, how does one go about it?
For example:
Base URL: gs://bucket/0/a/i.zarr
Desired Resolved URL: gs://bucket/0/a/i.zarr|zarr3:foo
Which, if any, of these do you think should be used? Are any of these
invalid?
- .|zarr3:foo (clearest to me)
- |zarr3:foo
- zarr3:foo
I was imagining `|zarr3:foo`
The existing standard interpretation of a relative url of `.` means to
strip everything after the last slash, and we should be consistent with
that. Therefore if the base url were specified as
`gs://bucket/0/a/i.zarr/` then `.|zarr3:foo` would also be valid, but
probably should not be preferred.
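The existing behavior of `.` can be checked with Python's standard `urllib.parse.urljoin`. This is only a sketch of ordinary relative-reference resolution, not of the proposed pipe syntax. The `https://example.com` base is a stand-in, used on the assumption that `urljoin` does not apply relative resolution to schemes it does not know, such as `gs`.

```python
from urllib.parse import urljoin

# Stand-in bases; "https" is used because urljoin only resolves
# relative references for schemes it recognizes.
base_without_slash = "https://example.com/0/a/i.zarr"
base_with_slash = "https://example.com/0/a/i.zarr/"

# "." drops everything after the last slash of the base path.
resolved_without_slash = urljoin(base_without_slash, ".")
resolved_with_slash = urljoin(base_with_slash, ".")

print(resolved_without_slash)  # https://example.com/0/a/
print(resolved_with_slash)     # https://example.com/0/a/i.zarr/
```

Only the base with the trailing slash keeps `i.zarr` in the result, which is why `.|zarr3:foo` would be valid only in that case.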
…
|
On Tue, Nov 14, 2023, 07:21 John Bogovic ***@***.***> wrote:
One more thing:
We've found it useful to be able to reference a particular part of the
attributes stored in json
with a URL. For example, for
this zarr3 zarr.json
{
"zarr_format": 3,
"node_type": "array",
"shape": [10000, 1000],
"dimension_names": ["rows", "columns"],
"data_type": "float64",
"chunk_grid": {
"name": "regular",
"configuration": {
"chunk_shape": [1000, 100]
}
},
"chunk_key_encoding": {
"name": "default",
"configuration": {
"separator": "/"
}
},
"codecs": [{
"name": "gzip",
"configuration": {
"level": 1
}
}],
"fill_value": "NaN",
"attributes": {
"foo": 42,
"bar": "apples",
"baz": [1, 2, 3, 4]
}
}
- /attributes/baz[0] points to 1
- /shape points to [10000, 1000]
- /chunk_grid/configuration points to { "chunk_shape": [1000, 100] }
Could you envision adding an attributes: or zarr.json:, or similar
adapter, that enables this?
Yes, having a scheme for accessing an attribute sounds like a good idea.
One option would be a specific scheme for zarr attributes, like zarr3a, e.g:
"gs://bucket/0.zip|zip:a|zarr3:i|zarr3a:/foo"
or
"gs://bucket/0.zip|zip:a/i|zarr3a:/foo"
Another option would be a json scheme for accessing any json file, e.g.:
"gs://bucket/0.zip|zip:a|zarr3:i/zarr.json|json:/attributes/foo"
Then there is the question of what syntax to use for specifying the path
within the json document. A natural choice would be the existing json
pointer syntax (https://datatracker.ietf.org/doc/html/rfc6901), e.g.
"/transform/1". The json pointer syntax does use an unusual escaping
syntax for handling member names containing "/": for example, if you have
an object like:
{"foo/bar": 10, "foo~bar": 11}
then to access the 10 value you use a json pointer of "/foo~1bar", and to
access the 11 value you use a json pointer of "/foo~0bar".
In my opinion this escaping mechanism is rather unfortunate since it is
easy to forget the meaning of "~0" and "~1", but it isn't an issue if you
can avoid using "/" or "~" in member names.
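To illustrate the escaping rules, here is a minimal Python sketch of JSON pointer (RFC 6901) lookup. The function name `resolve_pointer` is hypothetical, and the sketch skips details a full implementation would need, such as the `-` array token and strict index validation. The sample document combines the object above with an `attributes` member like the one in the zarr.json example.

```python
def resolve_pointer(doc, pointer: str):
    """Look up a value in parsed JSON using an RFC 6901 style pointer.

    Hypothetical helper for illustration; not a complete implementation.
    """
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON pointer must start with '/'")
    node = doc
    for token in pointer[1:].split("/"):
        # Unescape "~1" before "~0", as RFC 6901 requires.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(node, list):
            node = node[int(token)]  # array elements are addressed by index
        else:
            node = node[token]
    return node


doc = {"foo/bar": 10, "foo~bar": 11, "attributes": {"baz": [1, 2, 3, 4]}}

slash_member = resolve_pointer(doc, "/foo~1bar")
tilde_member = resolve_pointer(doc, "/foo~0bar")
first_baz = resolve_pointer(doc, "/attributes/baz/0")
```

In JSON pointer syntax an array element is written `/attributes/baz/0` rather than `/attributes/baz[0]`.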
… For example: gs://bucket/0.zip|zip:a|zarr3:i|zarr.json:attributes/foo
A specific use case: I often re-use and reference transformations. Since
these are described by metadata (not arrays),
referencing the specific metadata is helpful.
For example, if this were adopted, something like this would not be uncommon
in my workflows:
{
"type" : "sequence",
"transformations" : [
{ "url" : "..:/localTransformations|zarr.json:/transform[1]" },
{ "url" : "gs://bucket/path/to/templateTransformation.zarr|zarr3:sharedTransforms|zarr.json:/transform[0]" }
]
}
|
- An fsspec URL may be accepted by existing URL parsers/matchers not
specifically designed for fsspec.
- Because the interpreation the `::` delimiter within an fsspec URL differs from
- Because the interpreation the `::` delimiter within an fsspec URL differs from
- Because the interpretation of the `::` delimiter within an fsspec URL differs from
No description provided.