
multiple zarr files + fsspec.get_mapper #286

Closed
@Mikejmnez


I have a sequence of zarr files distributed across different nodes that I want to read in parallel, while only providing a string (glob-like) path.

The behavior I want to emulate
For netCDF files, we can do this with

url = fsspec.open_local(paths)

where paths is given by

paths = '/directoryA/*/subdirectoryB/*.nc'

such that
len(glob(paths)) == len(url)
e.g. 5 (five nc-files spread across different directories). The resulting url is then passed as an argument to xarray.open_mfdataset.

The problem
zarr files are opened through a mapper (url = fsspec.get_mapper(paths), with url passed as an argument to xarray.open_zarr), and a glob-like path does not work as compactly there as it does with fsspec.open_local() and nc-files. That is, given

paths = '/directoryA/*/subdirectoryB/*'

(where the zarr stores appear as directories) we get

len(fsspec.get_mapper(paths)) == 0

In other words, the mapper comes back empty even though len(glob(paths)) > 0.

One solution is to pass the glob-like path directly to _open_zarr (with suitable modifications to the _open_zarr function, much like xarray.open_mfdataset). I am just wondering whether fsspec.get_mapper(paths) can take a glob-like path string and I simply haven't figured out how yet...
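
For reference, the workaround described above can be sketched as follows. As far as I can tell, fsspec.get_mapper treats its argument as a single root location rather than as a pattern, so the sketch expands the glob first and builds one mapper per store. The helper names expand_store_paths and open_zarr_stores are hypothetical, and combining the datasets with xarray.combine_by_coords is an assumption about how the stores relate to each other (it needs coordinates that the datasets can be ordered by); this is not something fsspec or xarray does automatically.

```python
from glob import glob


def expand_store_paths(pattern):
    """Return the sorted list of local paths matching a glob-like pattern."""
    return sorted(glob(pattern))


def open_zarr_stores(pattern, **combine_kwargs):
    """Open every zarr store matching `pattern` and combine the results.

    Hypothetical helper: neither fsspec nor xarray ships a function by this name.
    """
    # Imported here so that the glob expansion above can be used on its own.
    import fsspec
    import xarray as xr

    stores = expand_store_paths(pattern)
    if not stores:
        raise FileNotFoundError(f"no zarr stores match {pattern!r}")

    # One mapper per store, since each mapper is rooted at exactly one path.
    datasets = [xr.open_zarr(fsspec.get_mapper(store)) for store in stores]

    # Assumption: the stores can be combined along their coordinates.
    return xr.combine_by_coords(datasets, **combine_kwargs)
```

With the layout from this issue, the call would look like open_zarr_stores('/directoryA/*/subdirectoryB/*'). If the stores should instead be stacked along a known dimension, xarray.concat is the alternative to consider.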
