Right now I like this solution (not 100% sure it's going to work though and not implemented in this PR):
Uproot can only be used to open files with a .root extension.
Object is specified after the root file in the urlpath (not necessarily at the end). For example: "simplecache::zip://uproot-issue121.root:Events/MET_pt::file:///tmp/pytest-of-runner/pytest-0/test_fsspec_zip0/uproot-issue121.root.zip". In this case the object is Events/MET_pt.
Find the object by searching for a filename ending in .root followed by a single : (will this be robust enough?).
We can't assume that all ROOT files end with a .root extension, but we could limit the applicability of the colon-parsing to just those files that do. (The colon-parsing is supposed to be a convenience. If you have a weird situation—a ROOT file not ending in .root—you don't deserve a convenience!)
The rule for what we do has to be simple enough to communicate and hopefully guess or intuit. Although the rules are complicated now, they're made that way to correspond to intuition. But how about the following?
If the string contains .root:, the colon of the last .root: is used to split between URI-path and object-path.
Otherwise, the whole string is interpreted as a URI-path, every time.
This breaking change would have to go in when the minor version changes. Ideally, we would have warned users about this ahead of time, but I don't see a way to do that.
The above would not remove the existing rules that pathlib.Path is entirely interpreted as a URI-path, every time, and the {"uri-path": "object-path"} syntax would still be recognized.
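To make the proposal concrete, here is a minimal sketch of the three cases described above (a plain string, a pathlib.Path, and the dict form). The function name `split_uri_and_object` is hypothetical and is not Uproot's actual implementation; it also assumes a dict with exactly one item.

```python
from pathlib import Path


def split_uri_and_object(source):
    """Hypothetical helper: return (uri_path, object_path or None)."""
    # A pathlib.Path is interpreted entirely as a URI-path, every time.
    if isinstance(source, Path):
        return str(source), None

    # {"uri-path": "object-path"}: explicit form, no colon-parsing at all.
    # Assumption: the dict holds exactly one item.
    if isinstance(source, dict):
        ((uri_path, object_path),) = source.items()
        return str(uri_path), object_path

    # Plain string: split at the colon of the last ".root:", if there is one.
    marker = ".root:"
    index = source.rfind(marker)
    if index == -1:
        # No ".root:" anywhere, so the whole string is a URI-path.
        return source, None
    colon = index + len(marker) - 1  # position of the ":" itself
    return source[:colon], source[colon + 1:]
```

Under this sketch, a directory that happens to be named like `dir.root:odd` is harmless, because only the last `.root:` splits the string. The sketch does not treat `::` specially, so fsspec-style chained URLs would need an extra check.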
I agree with those rules: they are simple enough and should be robust (we'll see if they can handle all cases; thankfully, you cannot name a Windows drive ".root"...).
Find .root followed by a single : (but not by ::!)
There shouldn't be any restriction on object paths except that they cannot contain : (in that case, use the dict notation).
I think it makes more sense to apply the rule to the first .root: instead of the last, given how fsspec chains URLs, but I cannot think of any reasonable case where it would trigger more than once (a zip file having a .root extension instead of .zip...). In this case, for instance, matching the first appearance would make it work.
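As a rough sketch of how the "single `:` but not `::`" rule could interact with chained URLs, the snippet below uses a regular expression with a negative lookahead. The function name `split_chained` and the `use_last` flag are hypothetical, added only to compare the first-match and last-match variants. Ending the object path at the next `::` is an inference from the rule that object paths cannot contain `:`.

```python
import re

# ".root" followed by exactly one colon: ".root::" must not match.
_ROOT_COLON = re.compile(r"\.root:(?!:)")


def split_chained(urlpath, use_last=True):
    """Hypothetical helper: return (uri_path, object_path or None)."""
    matches = list(_ROOT_COLON.finditer(urlpath))
    if not matches:
        return urlpath, None
    match = matches[-1] if use_last else matches[0]
    colon = match.end() - 1  # index of the single ":"
    head, rest = urlpath[:colon], urlpath[colon + 1:]
    # The object path stops at the next "::" (the fsspec chain separator),
    # so the remaining links of the chain are glued back onto the URI-path.
    object_path, sep, tail = rest.partition("::")
    return head + sep + tail, object_path


url = (
    "simplecache::zip://uproot-issue121.root:Events/MET_pt"
    "::file:///tmp/pytest-of-runner/pytest-0/test_fsspec_zip0/uproot-issue121.root.zip"
)
uri_path, object_path = split_chained(url)
print(object_path)  # Events/MET_pt
```

For the example URL both variants agree, since `.root:` followed by a single colon occurs only once. They differ only for strings such as `/data/dir.root:odd/file.root:Events`, where the first-match variant would split inside the directory name.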
In any case I will make a separate PR for this into main-fsspec (which will be live next minor version).
Otherwise, do you (@nsmith-, @jpivarski) have any other comments regarding this PR? I'm not planning on further changes.
I'll add that the reason I suggested the last .root: is that an earlier .root: might be part of a directory name. If someone's working on a system with a strangely named directory, there's not a lot they can do about it (manually creating symlinks is more work than I'm considering reasonable). Since colons aren't allowed to appear in object-paths (for the sake of colon-parsing), it's the last colon that matters.
This comes from #1016 (review), but I've copied it here to make it a real issue.