Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AbstractBufferedFile.__fspath__ breaks dask's PyArrow.orc test #104

Closed
TomAugspurger opened this issue Aug 13, 2019 · 8 comments · Fixed by #105
Closed

AbstractBufferedFile.__fspath__ breaks dask's PyArrow.orc test #104

TomAugspurger opened this issue Aug 13, 2019 · 8 comments · Fixed by #105

Comments

@TomAugspurger
Copy link
Collaborator

With fsspec master, the following dask test fails

$ pytest dask/dataframe/io/tests/test_orc.py::test_orc_with_backend 

The pyarrow error message isn't really helpful. I've traced it down to AbstractBufferedFile.fspath. Removing that, the test passes.

diff --git a/fsspec/spec.py b/fsspec/spec.py
index 0b66f99..6567ad9 100644
--- a/fsspec/spec.py
+++ b/fsspec/spec.py
@@ -1162,10 +1162,6 @@ class AbstractBufferedFile(io.IOBase):
 
     def __str__(self):
         return "<File-like object %s, %s>" % (type(self.fs).__name__, self.path)
-
-    def __fspath__(self):
-        return self.fs.protocol + "://" + self.path
-
     __repr__ = __str__
 
     def __enter__(self):

Does fspath make sense on this object? Is it actually present on the filesystem, or just in memory?

xref dask/dask#5267

@martindurant
Copy link
Member

Drat, that is annoying. fspath was specifically asked for, and it makes a lot of sense in the context of fsspec: it gives back the URL with which you can access this file. pyarrow should really not be using it to determine whether the file is local or not (the behaviour of fspath is not defined for non local files) - they should use .fileno() instead.
Not sure what to do about it yet.

@TomAugspurger
Copy link
Collaborator Author

Hmm does __fspath__ say anything about remote vs. local? I don't see anything in the PEP.

@TomAugspurger
Copy link
Collaborator Author

It's not quite clear to me, but should we interpret the the filesystem in "Return the file system path representation of the object." from https://docs.python.org/3/library/os.html#os.PathLike.__fspath__ as "the local filesystem"? I'm not sure what else it could be understood as.

@martindurant
Copy link
Member

Yeah, so it's less that clear. If only people hadn't started using Path objects...
You think it should be removed?

@TomAugspurger
Copy link
Collaborator Author

Based on my (possibly incorrect) reading of the PEP, __fspath__() should return the path to the file on the local file system.

You say you have a .fileno() that pyarrow should be using. Does every file descriptor have an associated path?

@martindurant
Copy link
Member

you have a .fileno() that pyarrow should be using

The standard python file object has this. Actually, all IOBase derivatives have this (including fsspec ones like s3fs files), and it returns UnsupportedOperation where it isn't a real local file. They also have the path attribute; but fspath was created so you could use paths and Paths interchangeably.

Does every file descriptor have an associated path?

As opposed to something ephemeral like a socket? I don't know, but it would be good to find out if pyarrow would use a file descriptor or not.

@TomAugspurger
Copy link
Collaborator Author

I think the fact that io.BytesIO doesn't implement __fspath__ hints that it's meant for files that are local to the filesystem.

In [11]: import os

In [12]: f = io.BytesIO(b'')

In [13]: os.fspath(f)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-4431b4b0ce22> in <module>
----> 1 os.fspath(f)

TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO

@martindurant
Copy link
Member

OK, OK: I'll remove it again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants