"source"
encoding for datasets opened from fsspec
objects
#8923
Could use `getattr(filename_or_obj, "path", filename_or_obj)` to avoid `isinstance` checks.
Without knowing much (I generally |
Shouldn't |
the main use case is indeed to extract additional data, which you'd do immediately after
As far as I can tell, they only convert path-likes to strings (which these objects are not: they are file-like, not path-like). Are you suggesting we should change that?
I think this is fine, but our long-term goal is to delete |
My impression of that discussion was that we wanted to either return the encoding in a separate object, or somehow remove the encoding after the first operation (i.e. not carry it around). Either way would be fine with me, since I would still have access to it immediately after opening.
Would a dataset with this in its encoding round-trip without error? It would be good to test that.
I'm not opposed to adding an explicit test (since I can't find any existing one right now), but if it would cause problems we'd also have those with string paths / urls – and those have been working just fine since long ago. As far as I can tell, |
Ah, thanks. My mistake, I thought we were sticking in the fsspec object, not just the path.
as far as I can tell, we could write anything in that encoding ( |
When opening files from path-like objects (`str`, `pathlib.Path`), the backend machinery (`_dataset_from_backend_dataset`) sets the `"source"` encoding. This is useful if we need the original path for additional processing, like writing to a similarly named file, or to extract additional metadata. This would be useful as well when using `fsspec` to open remote files.

In this PR, I'm extracting the `path` attribute that most `fsspec` objects have to set that value. I've considered using `isinstance` checks instead of the `getattr`-with-default, but the list of potential classes is too big to be practical (at least 4 classes just within `fsspec` itself).

If this sounds like a good idea, I'll update the documentation of the `"source"` encoding to mention this feature.
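
To make the `getattr`-with-default idea concrete, here is a minimal sketch of how the lookup could behave. It is not the code from this PR: `source_from` and `FakeOpenFile` are hypothetical names invented for illustration, and `FakeOpenFile` only stands in for an `fsspec` file object that exposes a `path` attribute.

```python
import io
import os
from pathlib import Path


class FakeOpenFile:
    """Hypothetical stand-in for an fsspec file object exposing ``path``."""

    def __init__(self, path):
        self.path = path


def source_from(filename_or_obj):
    """Return a string usable as the "source" encoding, or None.

    Hypothetical helper: objects with a ``path`` attribute contribute that
    attribute; anything else is considered as-is.
    """
    # getattr-with-default avoids enumerating every file-like class.
    path = getattr(filename_or_obj, "path", filename_or_obj)
    # Only strings and path-likes make sense as a source; skip the rest.
    if isinstance(path, (str, os.PathLike)):
        return os.fspath(path)
    return None


print(source_from("s3://bucket/data.nc"))
print(source_from(Path("example.nc")))
print(source_from(FakeOpenFile("bucket/data.nc")))
print(source_from(io.BytesIO(b"")))
```

In this sketch, plain strings and `pathlib.Path` objects pass through unchanged, an object with a `path` attribute yields that attribute, and a file-like object without one (such as `io.BytesIO`) yields no source at all.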