I am trying to read a zarr store into an xarray dataset, make a few modifications, and then save the resulting dataset back to the same-named zarr store by overwriting it. However, my data values change in the process of overwriting the zarr store. I have been able to reproduce this behavior without even changing any of the dataset values before trying to overwrite the store.
import xarray as xr

# Create very simple dataset
test_da = xr.DataArray([float(1), float(2)])
test_ds = test_da.to_dataset(name='test_data')
test_ds = test_ds.astype('int8')
test_ds

# Save created dataset as a zarr store
test_ds.to_zarr('test.zarr', mode='w')

# Open the zarr store and retrieve the data
test_retrieve = xr.open_dataset('test.zarr', engine='zarr')

# At this point, the data values are as expected
test_retrieve.compute()

# Save the retrieved data to the same zarr store
test_retrieve.to_zarr('test.zarr', mode='w')

# The values have changed, both in memory and in the zarr store
test_retrieve.compute()
Similar behavior occurs when the dtype is float, but in that case the values are usually (always?) replaced with NaN.
An obvious workaround is to save to a differently named zarr store, delete the original, and then move the new store to the original path, but this creates a lot of overhead since I am working with thousands of zarr stores. Also, the fact that the values change to potentially valid results when using integer data types means that the data corruption could go undetected.
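For reference, here is a minimal sketch of the workaround described above (write to a differently named store, delete the original, move the new store into place), assuming the stores live on a local filesystem. The names `overwrite_store_safely` and `write_fn` are hypothetical helpers made up for this illustration, not part of xarray or zarr.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def overwrite_store_safely(store_path: str, write_fn: Callable[[str], None]) -> None:
    """Write a new store next to `store_path`, then swap it into place.

    `write_fn` is a hypothetical callback that writes the store to the
    path it is given (for example, a wrapper around `Dataset.to_zarr`).
    """
    target = Path(store_path).resolve()
    # Keep the temporary directory beside the target so the final move
    # stays on the same filesystem.
    tmp_parent = tempfile.mkdtemp(prefix=target.name + ".tmp-", dir=target.parent)
    tmp_store = os.path.join(tmp_parent, target.name)
    try:
        # The original store is still intact while write_fn reads from it.
        write_fn(tmp_store)
        # Only remove the original once the new copy is fully written.
        if target.exists():
            shutil.rmtree(target)
        os.replace(tmp_store, target)
    finally:
        # Clean up the temporary parent directory (and any partial write).
        shutil.rmtree(tmp_parent, ignore_errors=True)
```

With the example above, this would be called as `overwrite_store_safely('test.zarr', lambda p: test_retrieve.to_zarr(p, mode='w'))`. It still pays the cost of a full second copy per store, so it only makes the workaround less error-prone; it does not remove the overhead.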