Description
I have a set of zarr stores distributed across different nodes that I want to open in parallel while providing only a single glob-like path string.
The behavior I want to emulate:
For netCDF files, we can do this using
url = fsspec.open_local(paths)
where paths is given by
paths = '/directoryA/*/subdirectoryB/*.nc'
such that
len(glob(paths)) == len(url)
e.g. 5 (five .nc files spread across different directories). The resulting url list
is then passed as an argument to xarray.open_mfdataset.
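For reference, here is a minimal sketch of the netCDF side as I understand it. The helper names `local_netcdf_paths` and `open_netcdf_glob` are hypothetical and only there for illustration; the sketch assumes the files live on a local filesystem and that xarray is installed when the second helper is called.

```python
import fsspec


def local_netcdf_paths(pattern):
    """Expand a glob-like pattern into a list of local file paths via fsspec."""
    paths = fsspec.open_local(pattern)
    # open_local returns a plain string when the pattern contains no wildcard,
    # so normalise to a list in both cases.
    return [paths] if isinstance(paths, str) else list(paths)


def open_netcdf_glob(pattern, **kwargs):
    """Open every file matched by the pattern as one dataset (hypothetical helper)."""
    import xarray as xr  # imported lazily so the path helper works without xarray

    return xr.open_mfdataset(local_netcdf_paths(pattern), **kwargs)
```

With this, `len(local_netcdf_paths(paths))` should match `len(glob(paths))` for the pattern above.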
The problem
Zarr stores are opened through a mapper (url = fsspec.get_mapper(paths), with url
passed as an argument to xarray.open_zarr), and a glob-like path does not work as
compactly there as it does with fsspec.open_local() and .nc files. That is, given
paths = '/directoryA/*/subdirectoryB/*'
(where the zarr stores appear as directories) we get
len(fsspec.get_mapper(paths)) == 0
In other words, the mapper comes back empty even though len(glob(paths)) > 0.
One solution would be to pass the glob-like path directly to _open_zarr
(after modifying _open_zarr appropriately, much like xarray.open_mfdataset).
I am just wondering whether fsspec.get_mapper(paths) can take a glob-like
path string and I simply haven't figured out how yet.
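In the meantime, a possible workaround is to expand the pattern myself and build one mapper per store. This is only a sketch under a few assumptions: `zarr_mappers` and `open_zarr_glob` are hypothetical helper names, the stores are on a filesystem that fsspec can glob (local by default here), and the datasets can be merged with `xarray.combine_by_coords`. As far as I can tell, get_mapper treats its argument as a literal root rather than expanding wildcards, which is why the expansion is done with `fs.glob` first.

```python
import fsspec


def zarr_mappers(pattern, protocol="file", **storage_options):
    """Return one fsspec mapper per zarr store matched by a glob-like pattern."""
    fs = fsspec.filesystem(protocol, **storage_options)
    # Expand the wildcards explicitly; each match is expected to be a store directory.
    stores = sorted(fs.glob(pattern))
    return [fs.get_mapper(store) for store in stores]


def open_zarr_glob(pattern, combine_kwargs=None, **open_kwargs):
    """Open every matched store and combine the results (hypothetical helper)."""
    import xarray as xr  # imported lazily so zarr_mappers works without xarray

    datasets = [xr.open_zarr(m, **open_kwargs) for m in zarr_mappers(pattern)]
    # Assumption: the stores share coordinates that combine_by_coords can align.
    return xr.combine_by_coords(datasets, **(combine_kwargs or {}))
```

This gives `len(zarr_mappers(paths)) == len(glob(paths))`, but it is less compact than the open_local route, and it does not answer whether get_mapper can do the expansion on its own.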