-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix zarr append dtype checks #6476
Conversation
This WIP should close #6345 when complete. I'll mark as Ready for review along with an explanatory comment when it's ready. |
Here's a first pass at a solution for #6345. This is a new area for me, so I certainly look forward to feedback from those with more experience. The issue identified in #6345 was lack of support for append-mode Zarr writes for variables with dtype The MRE at the top of the linked issue demonstrated this problem for the case in which a new variable is being added, to which @shoyer replied in #6345 (comment):
In this comment, Stephan also links to the motivating case for the datatype check, which was to prevent truncation of strings of a greater maximum length (e.g. In sum, it seems to me that the requirements for a fix to these datatype checks are as follows:
This PR accomplishes this by leaving the initial checks largely intact, with the following adjustments:
I've incorporated coverage for the above-listed requirements into the test suite. The way the datatype validation is now written, any datatype (not just Thank you in advance to reviewers of my first xarray PR. (Noting also that it looks like the docs build failure is common to other current PRs, and not specific to this PR.) |
Hi @cisaacstern — thanks a lot and welcome to xarray! This looks very coherent, as far as the context I have. Any thoughts from others who know the area better? |
I will take a look soon!
…On Thu, Apr 14, 2022 at 6:23 PM Maximilian Roos ***@***.***> wrote:
Hi @cisaacstern <https://github.com/cisaacstern> — thanks a lot and
welcome to xarray!
This looks very coherent, as far as the context I have. Any thoughts from
others who know the area better?
—
Reply to this email directly, view it on GitHub
<#6476 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJJFVVACXN36KM4XREIBCLVFDAKVANCNFSM5TE2KQ4Q>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@shoyer just wanted to chime in with a bump to say that I'll greatly appreciate your review of this PR when you get a moment. This fix is currently the only blocker for pangeo-forge/staged-recipes#120. (I've just confirmed today that installing xarray from this PR branch resolves the error there, as detailed in bullets 2-3 of pangeo-forge/staged-recipes#120 (comment).) I'm sure you're quite busy, so just want to emphasize how much I appreciate your attention to this, whenever you get a chance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Charles for your contribution here, your gentle reminder and your patience :)
This looks great to me. My only suggestion is to summarize some of the insights from your comments here into code, where they will be easier to find for future readers/editors.
): | ||
# and not re.match('^bytes[1-9]+$', var.dtype.name)): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a brief comment (e.g, based on your comment here: #6476 (comment)) to summarize why it's OK not to check these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for the review!
I've added the requested comment in 9f97f00
Thanks @cisaacstern this is a great first contribution. Hope to see you around again! |
Thanks all for your mentorship on this! Excited to continue contributing. |
* main: (24 commits) Fix overflow issue in decode_cf_datetime for dtypes <= np.uint32 (pydata#6598) Enable flox in GroupBy and resample (pydata#5734) Add setuptools as dependency in ASV benchmark CI (pydata#6609) change polyval dim ordering (pydata#6601) re-add timedelta support for polyval (pydata#6599) Minor Dataset.map docstr clarification (pydata#6595) New inline_array kwarg for open_dataset (pydata#6566) Fix polyval overloads (pydata#6593) Restore old MultiIndex dropping behaviour (pydata#6592) [docs] add Dataset.assign_coords example (pydata#6336) (pydata#6558) Fix zarr append dtype checks (pydata#6476) Add missing space in exception message (pydata#6590) Doc Link to accessors list in extending-xarray.rst (pydata#6587) Fix Dataset/DataArray.isel with drop=True and scalar DataArray indexes (pydata#6579) Add some warnings about rechunking to the docs (pydata#6569) [pre-commit.ci] pre-commit autoupdate (pydata#6584) terminology.rst: fix link to Unidata's "netcdf_dataset_components" (pydata#6583) Allow string formatting of scalar DataArrays (pydata#5981) Fix mypy issues & reenable in tests (pydata#6581) polyval: Use Horner's algorithm + support chunked inputs (pydata#6548) ...
commit 398f1b6 Author: dcherian <deepak@cherian.net> Date: Fri May 20 08:47:56 2022 -0600 Backward compatibility dask commit bde40e4 Merge: 0783df3 4cae8d0 Author: dcherian <deepak@cherian.net> Date: Fri May 20 07:54:48 2022 -0600 Merge branch 'main' into dask-datetime-to-numeric * main: concatenate docs style (pydata#6621) Typing for open_dataset/array/mfdataset and to_netcdf/zarr (pydata#6612) {full,zeros,ones}_like typing (pydata#6611) commit 0783df3 Merge: 5cff4f1 8de7061 Author: dcherian <deepak@cherian.net> Date: Sun May 15 21:03:50 2022 -0600 Merge branch 'main' into dask-datetime-to-numeric * main: (24 commits) Fix overflow issue in decode_cf_datetime for dtypes <= np.uint32 (pydata#6598) Enable flox in GroupBy and resample (pydata#5734) Add setuptools as dependency in ASV benchmark CI (pydata#6609) change polyval dim ordering (pydata#6601) re-add timedelta support for polyval (pydata#6599) Minor Dataset.map docstr clarification (pydata#6595) New inline_array kwarg for open_dataset (pydata#6566) Fix polyval overloads (pydata#6593) Restore old MultiIndex dropping behaviour (pydata#6592) [docs] add Dataset.assign_coords example (pydata#6336) (pydata#6558) Fix zarr append dtype checks (pydata#6476) Add missing space in exception message (pydata#6590) Doc Link to accessors list in extending-xarray.rst (pydata#6587) Fix Dataset/DataArray.isel with drop=True and scalar DataArray indexes (pydata#6579) Add some warnings about rechunking to the docs (pydata#6569) [pre-commit.ci] pre-commit autoupdate (pydata#6584) terminology.rst: fix link to Unidata's "netcdf_dataset_components" (pydata#6583) Allow string formatting of scalar DataArrays (pydata#5981) Fix mypy issues & reenable in tests (pydata#6581) polyval: Use Horner's algorithm + support chunked inputs (pydata#6548) ... commit 5cff4f1 Merge: dfe200d 6144c61 Author: Maximilian Roos <5635139+max-sixty@users.noreply.github.com> Date: Sun May 1 15:16:33 2022 -0700 Merge branch 'main' into dask-datetime-to-numeric commit dfe200d Author: dcherian <deepak@cherian.net> Date: Sun May 1 11:04:03 2022 -0600 Minor cleanup commit 35ed378 Author: dcherian <deepak@cherian.net> Date: Sun May 1 10:57:36 2022 -0600 Support dask arrays in datetime_to_numeric
to_zarr
raisesValueError: Invalid dtype
withmode='a'
(but not withmode='w'
) #6345whats-new.rst
api.rst