WIP: Support fsspec mutable mapping objects in zarr.open #2774

Draft
wants to merge 1 commit into
base: main

maxrjones
Member

@maxrjones maxrjones commented Jan 28, 2025

This is rough code, but I made some progress on supporting FSMap types and wanted to open a PR for early feedback. This isn't a priority for me, so I'd welcome anyone to take over this PR and/or close it and work in a different PR.

Addresses #2706

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Jan 28, 2025
Comment on lines +315 to +335
try:  # type: ignore[unreachable]
    import fsspec

    if isinstance(store_like, fsspec.mapping.FSMap):
        if path:
            raise TypeError(
                "'path' was provided but is not used for FSMap store_like objects"
            )
        if storage_options:
            raise TypeError(
                "'storage_options' was provided but is not used for FSMap store_like objects"
            )
        store = FsspecStore.from_mapper(store_like, read_only=_read_only)
    else:
        raise (
            TypeError(f"Unsupported type for store_like: '{type(store_like).__name__}'")
        )
except ImportError:
    raise (
        TypeError(f"Unsupported type for store_like: '{type(store_like).__name__}'")
    ) from None
Member Author

This is all ugly code because I don't want to assume that fsspec is installed or import it at the module level. I'll look into a better approach similar to xarray's module_available code.
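For illustration, an availability check in the spirit of the module_available idea mentioned above might look like the following. This is only a sketch: the helper names `module_available` and `is_fsmap` are hypothetical here, and the real xarray helper may take different arguments.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


# Evaluated once at import time; this is a finder lookup, not a full import.
has_fsspec = module_available("fsspec")


def is_fsmap(obj: object) -> bool:
    """Return True only when fsspec is installed and obj is an fsspec FSMap."""
    if not has_fsspec:
        return False
    # Deferred import: only reached when fsspec is known to be installed.
    from fsspec.mapping import FSMap

    return isinstance(obj, FSMap)
```

With a check like this, the calling code would not need a try/except around the isinstance test, and fsspec would still not be imported at module level.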

Contributor

could we pack this logic into a stand-alone function that basically narrows its input to a FsspecStore or errors? and maybe we could have some boolean variables in this module that represent if fsspec is present or not, e.g. something like this at the top of the file:

try:
  import fsspec
  has_fsspec = True
except ImportError:
  has_fsspec = False

Comment on lines +34 to +45
def _make_async(fs: AbstractFileSystem) -> AsyncFileSystem:
    try:
        from fsspec.implementations.asyn_wrapper import AsyncFileSystemWrapper

        fs = AsyncFileSystemWrapper(fs)
    except ImportError as e:
        raise ImportError(
            f"The filesystem '{fs}' is synchronous, and the required "
            "AsyncFileSystemWrapper is not available. Upgrade fsspec to version "
            "2024.12.0 or later to enable this functionality."
        ) from e
    return fs
Member Author

I made this a function because I think it'll be helpful for wrapping sync filesystems for both from_url and from_mapper.

@d-v-b
Contributor

d-v-b commented Jan 28, 2025

I recall there being issues with fsmap before, but I confess I don't really know what an fsmap is -- can someone explain what an fsmap is, how it differs from an fsspec filesystem, and why people would use one over the other?

cc @martindurant

I have a vague feeling that it could be useful to have a Store class that wraps generic MutableMapping instances, and maybe fsmaps could go there, but that requires me knowing more about the user context for fsmap.

@martindurant
Member

FSMap was created specifically for the needs of Zarr, and it could have been essentially the same as the v2 FSStore, but was much quicker to get out and working within dask/fsspec.

FSMap is a dict-compatible interface (mutable-mapping) on top of a FS instance, which zarr worked with since forever and ignores some FS functionality like the file-like API.

To make it work with v3 might be complex, because zarr makes async calls, and the FSMap interface is blocking, even if the underlying FS is async. That means that there will be sync() within sync(), which might still work as zarr maintains its own event loop in a thread separate from fsspec's.
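To make the "sync() within sync()" situation concrete, here is a toy model using only asyncio and threading. It is not how zarr or fsspec implement their loops; `fs_loop`, `zarr_loop`, `fs_cat`, `blocking_getitem` and `store_get` are stand-in names. It only shows why a blocking call made from inside a coroutine can still complete when the two event loops live in different threads.

```python
import asyncio
import threading


def start_loop() -> asyncio.AbstractEventLoop:
    """Start an event loop running forever in its own daemon thread."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


def sync(loop: asyncio.AbstractEventLoop, coro):
    """Run a coroutine on `loop` from another thread and block for the result."""
    return asyncio.run_coroutine_threadsafe(coro, loop).result()


fs_loop = start_loop()    # stands in for the filesystem library's loop
zarr_loop = start_loop()  # stands in for the array library's loop


async def fs_cat(key: str) -> bytes:
    """Pretend async filesystem read."""
    await asyncio.sleep(0)
    return key.encode()


def blocking_getitem(key: str) -> bytes:
    """Dict-style access: blocks the calling thread until fs_cat finishes."""
    return sync(fs_loop, fs_cat(key))


async def store_get(key: str) -> bytes:
    # A blocking call inside a coroutine stalls zarr_loop while it waits,
    # but it does not deadlock because fs_cat runs on fs_loop's thread.
    return blocking_getitem(key)


result = sync(zarr_loop, store_get("zarr.json"))
print(result)  # b'zarr.json'
```

If both calls were scheduled on the same loop, the inner `sync` would wait on a loop that is itself blocked, which is the failure mode the separate threads avoid. The cost is that the outer loop cannot make progress on other tasks while it waits.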

@martindurant
Member

To be clear: this PR does not use the mapper, but constructs a normal store from the mapper's attributes. I support this path.

@maxrjones
Member Author

thanks for the questions and answers, Davis and Martin!

IIUC there are three cases that we'd need to account for:

  1. FSMap wraps an async instance of an async-compatible filesystem
    Solution - as implemented in this PR, extract the wrapped filesystem and use it to open an FsspecStore
  2. FSMap wraps a non-async instance of a non-async-compatible filesystem
    Solution option 1 - extract the wrapped filesystem and wrap in AsyncFileSystemWrapper to open an FsspecStore as in Wrap sync fs for xarray.to_zarr #2533
    Solution option 2 - if it's a "file" protocol, extract the wrapped filesystem and open a LocalStore rather than FsspecStore, for other protocols, wrap in AsyncFileSystemWrapper to open an FsspecStore
  3. FSMap wraps a non-async instance of an async-compatible filesystem
    Solution - this is the case I'm not sure about. Is there a way to convert from a sync instance to an async instance without needing to wrap it in AsyncFileSystemWrapper? @martindurant could you please offer guidance here?
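The three cases above could be told apart with a small dispatch like the one below. This is a sketch under assumptions: it relies on the filesystem exposing `async_impl` and `asynchronous` attributes (read defensively with getattr), and the function name and returned labels are made up for illustration. The stand-in objects are plain namespaces, not real fsspec filesystems.

```python
from types import SimpleNamespace
from typing import Any


def classify_wrapped_fs(fs: Any) -> str:
    """Map a filesystem taken from an FSMap onto one of the three cases."""
    async_impl = getattr(fs, "async_impl", False)      # class supports async
    asynchronous = getattr(fs, "asynchronous", False)  # instance created async
    if async_impl and asynchronous:
        return "use-as-is"          # case 1
    if not async_impl:
        return "wrap-in-async"      # case 2
    return "needs-async-instance"   # case 3


# Stand-ins for the three kinds of filesystem instance.
case1 = SimpleNamespace(async_impl=True, asynchronous=True)
case2 = SimpleNamespace(async_impl=False)
case3 = SimpleNamespace(async_impl=True, asynchronous=False)

for fs in (case1, case2, case3):
    print(classify_wrapped_fs(fs))
```

Keeping the classification separate from the action taken for each case makes it easy to change the handling of case 3 once that question is settled.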

@martindurant
Member

> Is there a way to convert from a sync instance to an async instance without needing to wrap it in AsyncFileSystemWrapper

The instance has all the arguments it was made with as attributes, so you can make a new instance with asynchronous=True from those.

> if it's a "file" protocol, extract the wrapped filesystem and open a LocalStore rather than FsspecStore

Is there a reason to bother doing this?
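The suggestion about rebuilding an instance with asynchronous=True could be sketched roughly as follows. It assumes the instance records its constructor arguments as `storage_args` and `storage_options`; `as_async_instance` is a hypothetical helper name, and `FakeAsyncCapableFS` is a stand-in class so the example runs without fsspec. Whether every real filesystem round-trips cleanly this way is not something this sketch verifies.

```python
from typing import Any


def as_async_instance(fs: Any) -> Any:
    """Rebuild `fs` from its recorded constructor arguments, forcing async mode."""
    options = {**fs.storage_options, "asynchronous": True}
    return type(fs)(*fs.storage_args, **options)


class FakeAsyncCapableFS:
    """Stand-in that records its constructor arguments as attributes."""

    async_impl = True

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        self.storage_args = args
        self.storage_options = kwargs
        self.asynchronous = kwargs.get("asynchronous", False)


sync_fs = FakeAsyncCapableFS("bucket", anon=True)
async_fs = as_async_instance(sync_fs)
print(sync_fs.asynchronous, async_fs.asynchronous)  # False True
```

The original instance is left untouched, so a caller holding the FSMap can keep using it synchronously.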

@maxrjones
Member Author

maxrjones commented Jan 28, 2025

> > if it's a "file" protocol, extract the wrapped filesystem and open a LocalStore rather than FsspecStore
>
> Is there a reason to bother doing this?

I was thinking based on https://filesystem-spec.readthedocs.io/en/latest/async.html#limitations that LocalStore may be faster since it was designed around providing an async interface.

@d-v-b
Contributor

d-v-b commented Jan 28, 2025

IMO it's simpler if the path is always fsmap -> fsspecstore

@martindurant
Member

> LocalStore may be faster since it was designed around providing an async interface.

I doubt it. The disk is not really async (at least with the standard syscalls python uses), so none of the async code should make any difference for local reads at all.
