Support for transpose and moveaxis #1256
Hi @John-P. Thanks for raising this & sorry for the confusion. If you are suggesting additional methods for zarr-python itself, then I'll transfer the issue back. If you are suggesting that the transpositions should be stored in the file format itself, then this would be the place for the issue. cc: @MSanKeys963
xref: #1236
Ah, OK, maybe it is better on zarr-python then; sorry about that. I can repost over there if you want to close this.
It's fine. I'll transfer. (Thanks for bearing with us.)
As noted in the OP, there are already libraries that support this (like Dask). I would add Xarray to that list, and there may be even more as the Array API sees broader adoption. Given that libraries are already solving the computation workflow side and Zarr is focused on the storage side, I think keeping a cleaner separation of concerns (workflow from storage) will yield a better user experience: it is easy to see what to use and where to look, with a clear sense of how to compose the pieces. So I would prefer not to implement this.
I've actually really struggled to use dask or xarray for this. Dask arrays don't seem to work in subprocesses (they just silently hang, although I have never used dask before and may be doing this wrong), and xarray does not seem to be able to handle a zarr array unless it is on disk (e.g. it doesn't work with a tifffile zarr store), in a group, and with special metadata. I am just quite baffled as to how difficult it is to do this. Edit: Any advice on how to actually get this working with dask/xarray or similar would be appreciated, as all of my attempts have run into some critical issue.
Dask is optional. You don't have to use Dask with Xarray.
Definitely not true. We use cloud-based Zarr arrays all the time. If you can share a reproducible example of how you're trying to load data in Xarray, I'd be happy to try to help debug.
One of the major appeals of zarr for me was the ability to read arrays from subprocesses. However, I cannot get this to work with xarray: it simply hangs, even though the documentation suggests this should work if the backend supports multiple processes. It works fine with zarr alone, but when wrapped in xarray it deadlocks. I don't think I am using dask here, unless xarray is doing something under the hood. I am able to get it to work if I do use dask and set up a client, etc., but that seems like a lot of unnecessary complexity just to read the array. Here is a simplified snippet of how the array is loaded:

```python
import tifffile
import xarray as xr
import zarr

path = ...
tiff = tifffile.TiffFile(path)
# Zarr store contains a group with arrays under keys [0, 1, ...]
zarr_tiff_store = tiff.aszarr()
zarr_group = zarr.open(zarr_tiff_store, mode="r")
dataset = xr.open_zarr(zarr_tiff_store, consolidated=False)

# Xarray sets the dtype wrong so I have to copy over from zarr (a bug?)
for key, array in zarr_group.items():
    dataset[key] = dataset[key].astype(array.dtype)

# Normalise axes to be TZYXC
tzyxc_dataset = dataset.copy()
tzyxc_dataset["0"] = tzyxc_dataset["0"].expand_dims(
    dim=[a for a in "TCZYX" if a not in tzyxc_dataset["0"].dims],
)
tzyxc_dataset["0"] = tzyxc_dataset["0"].transpose(
    "T", "Z", "Y", "X", "C"
)
```

If I create this from the file path in a couple of subprocesses, they deadlock when both try to read the array.
I keep encountering situations where I would really like to use transpose or moveaxis as I would with a NumPy array. This is possible by creating a dask array from a zarr array. However, this seems like something that should be part of zarr-python. Is there any interest in implementing this, perhaps in the same or a similar way to how it has been done for dask?
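To make the request concrete, here is a hypothetical sketch of how a lazy transpose could work without Dask. `TransposedView` and `moveaxis_view` are invented names, not zarr-python API. The view stores only an axis permutation. On each read, it routes the selection back to the underlying array's axes, reads just that region, and reorders the result. It needs only NumPy and handles integers and slices only.

```python
import numpy as np


class TransposedView:
    """Hypothetical lazy transpose over any array supporting basic slicing.

    NOT a zarr-python API; a sketch of translating reads on permuted axes
    back to the underlying array.
    """

    def __init__(self, base, axes):
        axes = tuple(int(a) for a in axes)
        if sorted(axes) != list(range(base.ndim)):
            raise ValueError("axes must be a permutation of range(base.ndim)")
        self.base = base
        self.axes = axes  # view axis i corresponds to base axis axes[i]
        self.ndim = base.ndim
        self.shape = tuple(base.shape[a] for a in axes)

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        if len(key) > self.ndim:
            raise IndexError("too many indices")
        for k in key:
            if not isinstance(k, (slice, int, np.integer)):
                raise TypeError("this sketch only supports integers and slices")
        # Pad missing trailing selections with full slices.
        key = key + (slice(None),) * (self.ndim - len(key))
        # Route each view-axis selection to the base axis it came from.
        base_key = [slice(None)] * self.ndim
        for view_axis, base_axis in enumerate(self.axes):
            base_key[base_axis] = key[view_axis]
        # Read only the selected region (for a zarr array, only touched chunks).
        block = np.asarray(self.base[tuple(base_key)])
        # Integer selections drop axes; surviving base axes appear in base order.
        surviving = [b for b in range(self.ndim) if isinstance(base_key[b], slice)]
        wanted = [self.axes[v] for v in range(self.ndim) if isinstance(key[v], slice)]
        return np.transpose(block, [surviving.index(b) for b in wanted])


def moveaxis_view(base, source, destination):
    """Hypothetical lazy equivalent of np.moveaxis for a single axis."""
    source %= base.ndim
    destination %= base.ndim
    order = [a for a in range(base.ndim) if a != source]
    order.insert(destination, source)
    return TransposedView(base, order)


# Usage with a NumPy array; a zarr.Array supports the same basic slicing.
base = np.arange(24).reshape(2, 3, 4)
view = TransposedView(base, (2, 0, 1))
tile = view[1:3, :, 0]
moved = moveaxis_view(base, 0, -1)
```

The design keeps the underlying array untouched and does the permutation only on the small block that is actually read. This is broadly what Dask's lazy graph achieves, without a scheduler.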
EDIT: Just posting this here as the Zarr-python issues page suggests posting new feature proposals here.