[bug] appending to zarr in object store fails #11
I do see the same thing, even if adding
I would say this bug is consistent with my experience, but I was never able to nail down the problem precisely. I also have a counter-example in which append did work with gcsfs: https://github.com/pangeo-forge/pangeo-smithy/blob/0c31241198db764ffc3b2979ecd2413b26cba5ad/pipeline.py
I found this doc useful when exploring whether this could be a consistency issue: https://cloud.google.com/storage/docs/consistency (TL;DR: GCS lists read-after-write among its strongly consistent operations).
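One quick way to test that claim empirically is to write a unique value to a key and read it straight back, many times, through the same mapper the pipeline uses. The helper below is a hypothetical sketch that works on any `MutableMapping`; a count of zero is consistent with read-after-write consistency but does not prove it.

```python
import uuid
from collections.abc import MutableMapping


def probe_read_after_write(store: MutableMapping, key: str, trials: int = 10) -> int:
    """Hypothetical diagnostic: write a fresh value to `key`, read it back
    immediately, and count how many reads returned something else."""
    stale = 0
    for _ in range(trials):
        expected = uuid.uuid4().hex.encode()  # unique payload per trial
        store[key] = expected
        if store[key] != expected:  # read back right after the write
            stale += 1
    del store[key]  # remove the probe key afterwards
    return stale
```

Against GCS this could be pointed at something like `fsspec.get_mapper("gs://my-bucket/probe")` (the bucket path is a placeholder).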
Am I correct in thinking that the main problem is the
I have been trying to update a key in place, and I find that immediate read-after-write consistency does not hold: you can get previous value(s) back from either the public HTTP link or the www.googleapis.com/download/storage endpoint for a long time. I even found things like
(i.e., never the most recent value, and not even consistent, and this remained true for minutes afterwards, even though the byte count in the storage console had updated). One possible mitigation: when a file is uploaded, the API returns its generation ID, which should be unique. One can request contents by a specific generation ID and/or require that an update only happen under a generation-specific precondition. How could this be coded into gcsfs?
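To make the generation idea concrete, here is a toy in-memory model of the semantics described above: every write gets a new generation number, reads can be pinned to a generation, and writes can carry a "generation must match" precondition (in the GCS JSON API these correspond to the `generation` and `ifGenerationMatch` request parameters, where a match value of 0 means "object must not exist"). `GenerationStore`, `PreconditionFailed` and `append_bytes` are hypothetical names; this is not gcsfs or the GCS client, just a sketch of the compare-and-swap pattern such support would enable.

```python
class PreconditionFailed(Exception):
    """Raised when a write's generation precondition does not hold."""


class GenerationStore:
    """Hypothetical in-memory object store that versions every write."""

    def __init__(self):
        self._versions = {}  # key -> {generation: bytes}
        self._latest = {}    # key -> most recent generation
        self._counter = 0

    def latest_generation(self, key):
        return self._latest.get(key, 0)  # 0 means "object does not exist"

    def put(self, key, data, if_generation_match=None):
        current = self.latest_generation(key)
        if if_generation_match is not None and if_generation_match != current:
            raise PreconditionFailed(f"{key}: expected {if_generation_match}, found {current}")
        self._counter += 1
        self._versions.setdefault(key, {})[self._counter] = data
        self._latest[key] = self._counter
        return self._counter  # the new generation, as an upload response would report

    def get(self, key, generation=None):
        gen = self._latest[key] if generation is None else generation
        return self._versions[key][gen]  # pinned reads return exactly that version


def append_bytes(store, key, suffix, retries=5):
    """Read-modify-write that only commits if nobody wrote in between."""
    for _ in range(retries):
        gen = store.latest_generation(key)
        old = store.get(key, generation=gen) if gen else b""
        try:
            return store.put(key, old + suffix, if_generation_match=gen)
        except PreconditionFailed:
            continue  # lost the race: re-read the newer generation and retry
    raise PreconditionFailed(f"{key}: gave up after {retries} attempts")
```

The precondition turns a silent lost update into an explicit, retryable error, and pinning reads to the generation returned by the last write sidesteps reading an older cached version.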
It is also true that the directory listing can become out of date, which would give you the wrong file size when opening a file, but this is not an issue when you want to read the whole file (which is the zarr case).
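The distinction can be illustrated with a small hypothetical model, loosely patterned on how an fsspec-style file object bounds its reads by the size it got from the listing. `ListingCacheStore` and its methods are made up for illustration: a stale cached size truncates a bounded read, while fetching the whole object never consults the size.

```python
class ListingCacheStore:
    """Hypothetical object store with a separately cached (possibly stale) listing."""

    def __init__(self):
        self.objects = {}  # key -> current bytes
        self.listing = {}  # key -> size seen at the last listing

    def write(self, key, data, refresh_listing=True):
        self.objects[key] = data
        if refresh_listing:
            self.listing[key] = len(data)

    def read_range(self, key, start, end):
        # Like an HTTP Range request: returns bytes [start, end).
        return self.objects[key][start:end]

    def open_and_read(self, key):
        # An opened file trusts the cached size to decide how much to read.
        return self.read_range(key, 0, self.listing[key])

    def read_whole(self, key):
        # A whole-object fetch needs no size, so a stale listing does not matter.
        return self.objects[key]
```

For example, after `write("k", b"abc")` followed by `write("k", b"abcdef", refresh_listing=False)`, `open_and_read("k")` returns only `b"abc"` while `read_whole("k")` returns all six bytes.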
This is resolved by #27, which no longer uses appending. |
I've been trying out the `http_xarray_zarr` pipeline we recently added to this repository. I think I've run into a bug (or invalid assumption) related to rapid appending to Zarr object stores. Here's a simple example:
This raises for the GCS store...
You'll note that the dimensions in the two datasets differ along the time axis (4 vs 36). I first noticed this behavior when using the above-mentioned Prefect.Flow to build a Zarr store incrementally in GCS. I'm leaning toward this being an issue with object-store or filesystem-mapper consistency, but I'm not sure how to diagnose that exactly. I'm hoping @rabernat or @martindurant can provide some guidance here.
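One way to narrow down where a 4-vs-36 mismatch comes from is to read the Zarr metadata keys directly through the same mapper and compare them. The sketch below assumes the Zarr v2 layout, where each array's `.zarray` JSON records its `shape` and consolidated metadata (if used) keeps a copy of it under `.zmetadata`. `compare_shapes` is a hypothetical helper and `time` is just an example variable name. If the two copies disagree, or the values change between repeated reads, that points at stale metadata rather than at the chunk data.

```python
import json
from collections.abc import Mapping


def compare_shapes(store: Mapping, var: str) -> dict:
    """Hypothetical diagnostic: report `var`'s shape from its own `.zarray`
    and, if present, from consolidated `.zmetadata` (Zarr v2 layout)."""
    result = {"zarray": json.loads(store[f"{var}/.zarray"])["shape"]}
    if ".zmetadata" in store:
        consolidated = json.loads(store[".zmetadata"])["metadata"]
        result["zmetadata"] = consolidated[f"{var}/.zarray"]["shape"]
    return result
```

With GCS the store could be something like `fsspec.get_mapper("gs://my-bucket/my-store.zarr")` (a placeholder path), called right after each append.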