Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify rule for colon parsing between URI-path and object-path #1025

Closed
jpivarski opened this issue Nov 15, 2023 · 1 comment · Fixed by #1028
Closed

Simplify rule for colon parsing between URI-path and object-path #1025

jpivarski opened this issue Nov 15, 2023 · 1 comment · Fixed by #1028
Assignees
Labels
feature New feature or request

Comments

@jpivarski
Copy link
Member

This comes from #1016 (review), but I've copied it here to make it a real issue.

@lobis:

Right now I like this solution (not 100% sure it's going to work though and not implemented in this PR):

  • Uproot can only be used to open files with .root extension.
  • Object is specified after the root file in the urlpath (not necessarily at the end). For example: "simplecache::zip://uproot-issue121.root:Events/MET_pt::file:///tmp/pytest-of-runner/pytest-0/test_fsspec_zip0/uproot-issue121.root.zip". In this case the object is Events/MET_pt.
  • Find the object by searching for a filename ending in .root followed by single :. (Will this be robust enough?).

@jpivarski:

We can't assume that all ROOT files end with a .root extension, but we could limit the applicability of the colon-parsing to just those files that do. (The colon-parsing is supposed to be a convenience. If you have a weird situation—a ROOT file not ending in .root—you don't deserve a convenience!)

The rule for what we do has to be simple enough to communicate and hopefully guess or intuit. Although the rules are complicated now, they're made that way to correspond to intuition. But how about the following?

  1. If the string contains .root:, the colon of the last .root: is used to split between URI-path and object-path.
  2. Otherwise, the whole string is interpreted as a URI-path, every time.

This breaking change would have to go in when the minor version changed. Ideally, we would need to have warned users about this ahead of time, but I don't see a way to do that.

The above would not remove the existing rules that pathlib.Path is entirely interpreted as a URI-path, every time, and the {"uri-path": "object-path"} syntax would still be recognized.

@lobis:

I agree with those rules, they are simple enough and should be robust (we'll see if they can handle all cases, thankfully you cannot name a windows drive ".root"...).

  • Find .root followed by single : (but not by ::!)
  • There shouldn't be any restriction to object paths except they cannot contain : (in that case use the dict notation).
  • I think it makes more sense to apply the rule to the first .root: instead of the last given how fsspec chains urls, but I cannot think of any reasonable case where it would trigger more than once (a zip file having a .root extension instead of .zip...). In this case for instance matching the first appearance would make it work.

In any case I will make a separate PR for this into main-fsspec (which will be live next minor version).

Otherwise do you @nsmith- @jpivarski have any other comments regarding this PR? I'm not planning on further changes.

I'll add that the reason I suggested the last .root: is because it might be part of a directory name. If someone's working on a system with a strangely named directory, there's not a lot they can do about it (manually creating symlinks is more work than I'm considering reasonable). Since : aren't allowed to appear in object-paths (for the sake of colon-parsing), it's the last colon that matters.

@jpivarski jpivarski added the feature New feature or request label Nov 15, 2023
@lobis lobis self-assigned this Nov 15, 2023
@lobis lobis linked a pull request Nov 16, 2023 that will close this issue
@lobis
Copy link
Collaborator

lobis commented Nov 16, 2023

Fixed in #1028 (will take some time to go into main).

@lobis lobis closed this as completed Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
2 participants