Partial support for filesystems in FileSet #374
First steps to add filesystem support in fsspec.
It turns out that glob.glob adds a trailing / for directories, whereas FileSystem.glob does not. Add this trailing / manually for directories, as further processing expects this.
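The trailing-slash adjustment described above could look roughly like the following sketch. The helper name `mark_directories` and its parameters are hypothetical and not taken from the PR; it uses `os.path.isdir` on a temporary directory so that it runs without fsspec. For a remote file system one would presumably pass that file system's own `isdir` method instead, which is an assumption here.

```python
import os
import tempfile


def mark_directories(paths, isdir=os.path.isdir, sep="/"):
    """Return paths with sep appended to every directory that lacks it.

    Hypothetical helper: isdir is any callable that tells whether a path
    is a directory, so the same logic could serve a non-local file system.
    """
    return [
        path + sep if isdir(path) and not path.endswith(sep) else path
        for path in paths
    ]


# Build a tiny tree: one subdirectory and one regular file.
with tempfile.TemporaryDirectory() as base:
    subdir = os.path.join(base, "2020")
    regular = os.path.join(base, "file.nc")
    os.mkdir(subdir)
    open(regular, "w").close()
    # Only the directory receives the trailing separator.
    marked = mark_directories([subdir, regular])
```

Passing the directory test as a callable keeps the helper independent of where the paths live, which is the property the later processing relies on.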
Because Bill and his friends decided to use the forward slash for parameters because they didn't have directories anyway before I was even born, they're making life harder for us.
I don't know why it still fails on Windows despite replacing / by os.sep.
Add unit tests to specifically test that other filesystems work with fileset (not passing yet).
Make necessary adaptations to actually support other file systems. Not all file systems have a root (s3fs, zipfile do not), and if they don't, then the FileHandler's insistence on absolute paths means it's not possible for any file to exist on such a system. The tests pass on my system on Python 3.8.
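A minimal illustration of why an absolute-path requirement excludes rootless file systems, using only `posixpath`. The function `requires_absolute` and the bucket-style path are hypothetical stand-ins, not code from typhon.

```python
import posixpath


def requires_absolute(path):
    """Mimic a check that only accepts absolute paths (hypothetical)."""
    return posixpath.isabs(path)


local_style = "/data/2020/file.nc"       # rooted path
bucket_style = "my-bucket/2020/file.nc"  # hypothetical bucket/key path, no root

# The rooted path passes the check, the bucket-style path never can.
checks = {p: requires_absolute(p) for p in (local_style, bucket_style)}
```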
This still fails on Windows (I don't know why) and it doesn't work as intended with S3FileSystem (it seems globbing behaves differently there):
Finding files is extremely slow because globbing in s3fs is extremely slow, probably due to the problem reported at fsspec/s3fs#378.
Note that at present the s3fs example only works with s3fs 0.5.0 and not with the latest released s3fs 0.5.1, due to the bug reported at fsspec/s3fs#378. Hopefully that problem will be fixed so that this searching works with s3fs 0.5.2 or 0.6.
I don't understand why this raises IndexError on Windows and I don't have a Windows machine with Python set up, but maybe adding it to the exception list makes this pass. Not sure if it would.
Remove unused imports and convert a static method to a regular method to account for the need of self.file_system.
Can anyone with access to a Windows Python installation shed light on why the tests may be failing on Windows? Did the tests actually succeed on Windows prior to this PR?
Fix one more hardcoded /, replacing or extending it by os.sep. I don't know what happens if on Windows one interacts with remote filesystems such as FTP or S3FS (/ or \?), so test for both.
Hi @gerritholl! I can test it on a Windows machine, but not before some time later next week since I don't have a setup ready.
Meanwhile, fsspec/s3fs#379 was merged, so the typhon functionality and the documented example should work with the latest s3fs master or the next release.
This seems to be caused by inconsistent handling of path separators. One place I found to be inconsistent is in
Ah, that makes sense. If we search on a remote file system like this PR intends to support, then the assumption that the file system uses |
In a fileset, accept \\ to match / on Windows. The generic nature of AbstractFileSystem means that os.sep may not match the separator used on the filesystem. fsspec solves this by always using /, even on Windows, but forcing this would break backward compatibility, therefore accept both.
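One way to accept both separators when matching paths is sketched below. This is a hedged illustration rather than the implementation in the PR: the function `separator_tolerant` is hypothetical and builds a regular expression from a literal path in which every separator matches either `/` or `\`.

```python
import re


def separator_tolerant(literal_path):
    """Compile a regex where each / or \\ in literal_path matches either one.

    Hypothetical helper: everything between separators is matched literally.
    """
    parts = re.split(r"[/\\]", literal_path)
    return re.compile(r"[/\\]".join(re.escape(part) for part in parts))


regex = separator_tolerant("data/2020/file.nc")

matches_posix = regex.fullmatch("data/2020/file.nc") is not None
matches_windows = regex.fullmatch("data\\2020\\file.nc") is not None
matches_other = regex.fullmatch("data-2020-file.nc") is not None
```

Accepting both characters avoids forcing `/` on existing Windows users while still matching the paths that fsspec returns.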
With filesystem support, filesystems always use / even on Windows, therefore use posixpath.join to construct test cases.
Replace separator in base testdir because fsspec always uses normal separators, not Windows ones.
On Windows posixpath.abspath still gives \\... so replace explicitly.
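The difference between the two join functions, and the explicit replacement mentioned in the last commit message, can be illustrated with the standard library alone. `ntpath` is used here to reproduce Windows behaviour on any platform; the helper `to_forward_slashes` is a hypothetical name.

```python
import ntpath
import posixpath


def to_forward_slashes(path):
    """Replace Windows separators explicitly (hypothetical helper)."""
    return path.replace("\\", "/")


# posixpath.join always inserts /, regardless of the platform.
posix_joined = posixpath.join("testdir", "2020", "file.nc")

# ntpath.join is what os.path.join resolves to on Windows.
windows_joined = ntpath.join("testdir", "2020", "file.nc")

# A path that already contains backslashes has to be converted explicitly.
converted = to_forward_slashes(ntpath.join("C:\\tmp", "testdir"))
```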
With all the commits to make it work on Windows it has regrettably become somewhat ugly, and it may break backward compatibility despite the tests passing. Can Windows users please comment on whether it breaks their workflow?
I don't know if there are any Windows users of Typhon at the moment, so don't expect an answer to your question. We mainly added Windows as a supported platform because Typhon is a Python-only package and it seemed like the right thing to do to be good Python citizens.
Looks good in general 👍 I only left some minor comments that you may address or ignore ;)
typhon/files/handlers/common.py
Outdated
def _ensure_local_filesystem(self, file_info):
    if not isinstance(file_info.file_system, LocalFileSystem):
        raise NotImplementedError(
            f"File handler {str(type(self).__name__):s} can only "
I guess type(...).__name__ is always a string.
True, simplified.
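The simplification follows from `__name__` already being a `str`, so both the `str()` call and the `:s` format specifier are redundant. A small check, with `ExampleHandler` as a hypothetical stand-in class:

```python
class ExampleHandler:
    """Stand-in class; the name is hypothetical."""


handler = ExampleHandler()
name = type(handler).__name__  # already a str

# Both spellings produce the same text; the second one is simply shorter.
verbose = f"File handler {str(name):s} can only"
simple = f"File handler {name} can only"
```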
  def __repr__(self):
      return f"FileInfo(\n '{self.path}',\n" \
             f" times={self.times},\n" \
-            f" attr={self.attr}\n)"
+            f" attr={self.attr},\n" \
+            f" fs={self.file_system})"
The __repr__ method is used to represent an object in containers (e.g. a list of FileInfo), so this implementation might be a bit wordy. But this is just a fine-tuning comment that you can happily ignore.
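To show what the review comment is getting at, here is a sketch of one possible split between a compact `__repr__` and a detailed `__str__`. The class `FileInfoSketch` is a hypothetical stand-in and not typhon's `FileInfo`; whether such a split suits the library is a design choice left to the maintainers.

```python
class FileInfoSketch:
    """Hypothetical stand-in for a file-info object."""

    def __init__(self, path, times=None, attr=None):
        self.path = path
        self.times = times or []
        self.attr = attr or {}

    def __repr__(self):
        # Single line, so a list of these objects stays compact.
        return f"FileInfoSketch({self.path!r})"

    def __str__(self):
        # The detailed, multi-line form is still available through print().
        return (
            f"FileInfoSketch(\n  {self.path!r},\n"
            f"  times={self.times},\n  attr={self.attr}\n)"
        )


# Containers use __repr__ for their elements, not __str__.
listing = repr([FileInfoSketch("a.nc"), FileInfoSketch("b.nc")])
```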
I thought I had write access here, but apparently not (anymore?) as no "merge" button shows up ("Only those with write access to this repository can merge pull requests.").
Merges can only be done by maintainers, who are currently Lukas and me.
This PR adds partial support for fsspec AbstractFileSystem implementations in FileSet, in particular for finding files. An example using s3fs:
Gives: