Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paths as URIs #243

Merged
merged 78 commits into from
Dec 3, 2024
Merged

Paths as URIs #243

merged 78 commits into from
Dec 3, 2024

Conversation

TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented Oct 2, 2024

This PR closes #242 at the data model level - all paths are coerced to absolute URIs (i.e. file:///directory/test.nc or s3://bucket/test.nc) as they go into the Manifest.

As this forbids constructing manifests using relative paths, it requires minor changes to many tests (e.g. test.nc-> /test.nc). It also will require slightly more invasive changes to any tests that involve kerchunk references.

Sub-tasks:

@TomNicholas TomNicholas marked this pull request as ready for review October 19, 2024 02:33
@TomNicholas
Copy link
Member Author

What should we do with the paths in kerchunk references? Are they are always meant as absolute? I guess we should assume they are absolute, unless they have ./ or ../ at the start? Would fsspec ever produce relative paths like that?

cc @martindurant

@martindurant
Copy link
Member

Are they are always meant as absolute?

They are always meant "as interpreted by the target filesystem". The nature of that filesystem might be implied by the protocol of a path alone, but commonly additional arguments are also required. This means, that relative paths do work if the target happens to be the local filesystem (file://), but I think of the other filesystems, only ssh supports this concept at all. I would not expect this to be meaningful for basically any practical case.

Note that the dir:// filesystem adds prefixes to URLs for any filesystem, if that's useful at all.

@martindurant
Copy link
Member

(I am happy to require absolute paths even if it makes some tests slightly more verbose)

@TomNicholas
Copy link
Member Author

Thanks @martindurant !

The nature of that filesystem might be implied by the protocol of a path alone, but commonly additional arguments are also required.

But is the nature of the filesystem explicitly recorded in the kerchunk references format anywhere? Obviously if the prefix is explicit (e.g. s3://) then you know but otherwise?

that relative paths do work

I guess we should assume they are absolute, unless they have ./ or ../ at the start?

Would this approach work then?

(I am happy to require absolute paths even if it makes some tests slightly more verbose)

This might be helpful if the above approach doesn't work.

@martindurant
Copy link
Member

is the nature of the filesystem explicitly recorded in the kerchunk references format

No. The original intention was to have these in the "templates", but in practice, the remote_protocol, remote_options and fss arguments to ReferenceFileSystem are used (and often encoded in Intake prescriptions) in cases of ambiguity.

@TomNicholas TomNicholas mentioned this pull request Nov 15, 2024
20 tasks
@TomNicholas
Copy link
Member Author

On cloudpathlib, is upath any use?

I think that would have the same problem as cloudpathlib - I don't necessarily want these paths to have any connection to an actual storage layer, I just want to strictly validate and manipulate their forms.

Do you really need a wrapper?

Turns out I didn't need a wrapper, I can get away with just more if...else logic for each of the possible types of paths. It's slightly less neat but it avoids the dependency issues.

@TomNicholas
Copy link
Member Author

TomNicholas commented Dec 3, 2024

This now works!

It now builds on #323 instead of #318, which allowed me test handling of relative paths for dmrpp.

The typing CI is failing but only with errors inside the hdf reader (cc @sharkinsspatial), in lines I didn't touch, which I don't understand, so I'm going to punt on that so I can merge this (because it blocks earth-mover/icechunk#402).

@TomNicholas TomNicholas merged commit a08b495 into main Dec 3, 2024
10 of 11 checks passed
@TomNicholas TomNicholas deleted the paths_as_uris branch December 3, 2024 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DMR++ internals Kerchunk Relating to the kerchunk library / specification itself
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Forbid relative paths, and use file URI scheme internally?
5 participants