WIP: Support fsspec mutable mapping objects in zarr.open #2774
base: main
try:  # type: ignore[unreachable]
    import fsspec

    if isinstance(store_like, fsspec.mapping.FSMap):
        if path:
            raise TypeError(
                "'path' was provided but is not used for FSMap store_like objects"
            )
        if storage_options:
            raise TypeError(
                "'storage_options' was provided but is not used for FSMap store_like objects"
            )
        store = FsspecStore.from_mapper(store_like, read_only=_read_only)
    else:
        raise (
            TypeError(f"Unsupported type for store_like: '{type(store_like).__name__}'")
        )
except ImportError:
    raise (
        TypeError(f"Unsupported type for store_like: '{type(store_like).__name__}'")
    ) from None
This is all ugly code because I don't want to assume that fsspec is installed or import it at the module level. I'll look into a better approach similar to xarray's module_available code.
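For reference, an availability check in the spirit of the one mentioned above could look roughly like the sketch below. This is not xarray's code: the function body and the `HAS_FSSPEC` flag are assumptions made for illustration, and the check only handles top-level module names.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    # find_spec returns None when no module with that name is installed.
    return importlib.util.find_spec(name) is not None


# Hypothetical module-level flag, evaluated once when the module is imported.
HAS_FSSPEC = module_available("fsspec")
```

With a flag like this, the `isinstance` check against `fsspec.mapping.FSMap` only needs to run when `HAS_FSSPEC` is true, and `fsspec` itself can still be imported lazily.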
could we pack this logic into a stand-alone function that basically narrows its input to a FsspecStore or errors? and maybe we could have some boolean variables in this module that represent if fsspec is present or not, e.g. something like this at the top of the file:
    try:
        import fsspec
        has_fsspec = True
    except ImportError:
        has_fsspec = False
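A rough sketch of how that suggestion might fit together is below. The helper name `narrow_to_fsmap` and its signature are hypothetical. To keep the example runnable without zarr installed, it only narrows to an FSMap and leaves the `FsspecStore.from_mapper(...)` call to the caller.

```python
from __future__ import annotations

from typing import Any

try:
    import fsspec

    has_fsspec = True
except ImportError:
    has_fsspec = False


def narrow_to_fsmap(
    store_like: Any, *, path: str = "", storage_options: dict | None = None
) -> Any:
    """Return store_like unchanged if it is an FSMap, otherwise raise TypeError."""
    # Without fsspec installed, nothing can be an FSMap, so both cases share one error.
    if not has_fsspec or not isinstance(store_like, fsspec.mapping.FSMap):
        raise TypeError(
            f"Unsupported type for store_like: '{type(store_like).__name__}'"
        )
    if path:
        raise TypeError(
            "'path' was provided but is not used for FSMap store_like objects"
        )
    if storage_options:
        raise TypeError(
            "'storage_options' was provided but is not used for FSMap store_like objects"
        )
    return store_like
```

The caller would then do something like `store = FsspecStore.from_mapper(narrow_to_fsmap(store_like, path=path, storage_options=storage_options), read_only=_read_only)`, which removes the duplicated "Unsupported type" branch from the original block.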
def _make_async(fs: AbstractFileSystem) -> AsyncFileSystem:
    try:
        from fsspec.implementations.asyn_wrapper import AsyncFileSystemWrapper

        fs = AsyncFileSystemWrapper(fs)
    except ImportError as e:
        raise ImportError(
            f"The filesystem '{fs}' is synchronous, and the required "
            "AsyncFileSystemWrapper is not available. Upgrade fsspec to version "
            "2024.12.0 or later to enable this functionality."
        ) from e
    return fs
I made this a function because I think it'll be helpful for wrapping sync filesystems for both from_url and from_mapper.
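For readers unfamiliar with what "wrapping a sync filesystem" means here, the general idea can be illustrated with the standard library alone. The sketch below is a generic illustration and not fsspec's `AsyncFileSystemWrapper`; `to_coroutine` and `ToySyncFS` are hypothetical names.

```python
import asyncio
import functools
from typing import Any, Callable


def to_coroutine(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a blocking callable so it can be awaited; the call runs in a worker thread."""

    @functools.wraps(func)
    async def wrapper(*args: Any, **kwargs: Any) -> Any:
        return await asyncio.to_thread(func, *args, **kwargs)

    return wrapper


class ToySyncFS:
    """Hypothetical blocking filesystem that keeps its files in memory."""

    def __init__(self) -> None:
        self._files = {"a/b": b"hello"}

    def cat_file(self, path: str) -> bytes:
        return self._files[path]


async def main() -> bytes:
    fs = ToySyncFS()
    async_cat = to_coroutine(fs.cat_file)  # now awaitable from async code
    return await async_cat("a/b")


result = asyncio.run(main())
```

Centralizing this kind of wrapping in one helper, as `_make_async` does, means `from_url` and `from_mapper` can share the same upgrade message and behavior.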
I recall there being issues with fsmap before, but I confess I don't really know what an fsmap is -- can someone explain what an fsmap is, how it differs from an fsspec filesystem, and why people would use one over the other? I have a vague feeling that it could be useful to have a
FSMap was created specifically for the needs of Zarr. It could have been essentially the same as the v2 FSStore, but it was much quicker to get out and working within dask/fsspec. FSMap is a dict-compatible (mutable-mapping) interface on top of an FS instance, which zarr has worked with since forever, and it ignores some FS functionality such as the file-like API. Making it work with v3 might be complex, because zarr makes async calls and the FSMap interface is blocking, even if the underlying FS is async. That means there will be sync() within sync(), which might still work because zarr maintains its own event loop in a thread separate from fsspec's.
To be clear: this PR does not use the mapper, but constructs a normal store from the mapper's attributes. I support this path.
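To make the "dict-compatible interface on top of an FS instance" description concrete, here is a toy version of the idea using only the standard library. It is not fsspec's FSMap: `ToyFS` and `ToyMap` are hypothetical stand-ins, and a real filesystem would do I/O where this one touches a dict.

```python
from collections.abc import Iterator, MutableMapping


class ToyFS:
    """Hypothetical in-memory stand-in for a filesystem (not an fsspec class)."""

    def __init__(self) -> None:
        self.files = {}  # maps full path -> bytes


class ToyMap(MutableMapping):
    """Dict-like, blocking view of everything stored under `root` in a ToyFS."""

    def __init__(self, root: str, fs: ToyFS) -> None:
        self.root = root.rstrip("/")
        self.fs = fs

    def _full(self, key: str) -> str:
        return f"{self.root}/{key}"

    def __getitem__(self, key: str) -> bytes:
        try:
            return self.fs.files[self._full(key)]
        except KeyError:
            raise KeyError(key) from None

    def __setitem__(self, key: str, value: bytes) -> None:
        self.fs.files[self._full(key)] = bytes(value)

    def __delitem__(self, key: str) -> None:
        try:
            del self.fs.files[self._full(key)]
        except KeyError:
            raise KeyError(key) from None

    def __iter__(self) -> Iterator[str]:
        prefix = self.root + "/"
        return (p[len(prefix):] for p in list(self.fs.files) if p.startswith(prefix))

    def __len__(self) -> int:
        return sum(1 for _ in self)


fs = ToyFS()
m = ToyMap("bucket/data.zarr", fs)
m["zarr.json"] = b"{}"
m["c/0"] = b"\x00\x01"
```

Every method here is synchronous, which is the crux of the v3 concern: an async store cannot await these calls directly. Because the mapping keeps references to its root and filesystem, a store can also be built from those attributes without going through the mapping interface, which is the route this PR takes.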
thanks for the questions and answers, Davis and Martin! IIUC there are three cases that we'd need to account for:
The instance has all the arguments it was made with as attributes, so you can make a new instance with asynchronous=True from those.
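As a sketch of that idea: assuming the constructor arguments are exposed as `storage_args` and `storage_options` attributes (treat those names as an assumption to check against the fsspec version in use), an async twin could be rebuilt as below. `RecordingFS` and `remake_async` are hypothetical names used so the example runs without fsspec.

```python
from typing import Any


class RecordingFS:
    """Hypothetical filesystem that records the arguments it was constructed with."""

    def __init__(
        self, *storage_args: Any, asynchronous: bool = False, **storage_options: Any
    ) -> None:
        self.storage_args = storage_args
        self.storage_options = storage_options
        self.asynchronous = asynchronous


def remake_async(fs: Any) -> Any:
    """Build a new instance of the same class with asynchronous=True."""
    # Override (or add) the asynchronous flag while keeping every other option.
    options = {**fs.storage_options, "asynchronous": True}
    return type(fs)(*fs.storage_args, **options)


sync_fs = RecordingFS("bucket", anon=True)
async_fs = remake_async(sync_fs)
```

The original instance is left untouched, so code that still holds the synchronous filesystem keeps working.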
Is there a reason to bother doing this?
I was thinking based on https://filesystem-spec.readthedocs.io/en/latest/async.html#limitations that LocalStore may be faster since it was designed around providing an async interface.
IMO it's simpler if the path is always fsmap -> fsspecstore
I doubt it. The disk is not really async (at least with the standard syscalls python uses), so none of the async code should make any difference for local reads at all.
This is rough code, but I made some progress on supporting FSMap types and wanted to open a PR for early feedback. This isn't a priority for me, so I'd welcome anyone to take over this PR and/or close it and work in a different PR.
Addresses #2706
TODO:
docs/user-guide/*.rst
changes/