-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle _FillValue in variable-length unicode string variables #1802
Conversation
Fixed bug in conventions.py:952 which caused read error on netCDF4 files with variable-length unicode strings with _FillValues. decode_cf_variable checks string_encoding, which was previously only defined for dtype.kind == 'S'. Bug fix defines for dtype.kind in ['S', 'O', 'U'].
hmm. Seems I'm touching on a much larger issue here: Unidata/netcdf4-python#730 The round-trip works for me using a netcdf4 engine once this fix is implemented in conventions.py. There are tests that are ready to demonstrate this in test_backends.py:836-843, but running these tests (by removing the Should these use cases be split up? |
…h strings with _FillValue
Ok this is good to go if you all do want to enable _FillValue for variable-length unicode strings with a netCDF4 backend. Seems like there's a lot of prior work/thinking in this space though so no worries if you want to wait. |
@@ -949,7 +949,7 @@ def decode_cf_variable(name, var, concat_characters=True, mask_and_scale=True, | |||
|
|||
original_dtype = data.dtype | |||
|
|||
if concat_characters and data.dtype.kind == 'S': | |||
if concat_characters and data.dtype.kind in ['U', 'S', 'O']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this fix is quite right. Both stacking characters and decoding bytes -> unicode only make sense if the variable has a NumPy S
dtype (i.e., fixed length bytes).
We do need to fix this bug.... but I think the check below that uses string_encoding
is actually in the wrong place. It makes sense to check _FillValue
in writing/encoding, not reading/decoding.
https://github.com/pydata/xarray/pull/1803/files#r159131975
git diff upstream/master **/*py | flake8 --diff
whats-new.rst
for all changes andapi.rst
for new APIFor testing - I could use some guidance. Not sure if it's worth creating a fixture set or something just for this issue. If so, would that go in test_backends?