Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle _FillValue in variable-length unicode string variables #1802

Closed
wants to merge 5 commits into from
Closed

Handle _FillValue in variable-length unicode string variables #1802

wants to merge 5 commits into from

Conversation

delgadom
Copy link
Contributor

@delgadom delgadom commented Dec 28, 2017

For testing - I could use some guidance. Not sure if it's worth creating a fixture set or something just for this issue. If so, would that go in test_backends?

Fixed bug in conventions.py:952 which caused read error on netCDF4 files with variable-length unicode strings with _FillValues. decode_cf_variable checks string_encoding, which was previously only defined for dtype.kind == 'S'. Bug fix defines for dtype.kind in ['S', 'O', 'U'].
@delgadom
Copy link
Contributor Author

hmm. Seems I'm touching on a much larger issue here: Unidata/netcdf4-python#730

The round-trip works for me using a netcdf4 engine once this fix is implemented in conventions.py. There are tests that are ready to demonstrate this in test_backends.py:836-843, but running these tests (by removing the pytest.raises lines) applies to both netCDF4 and h5netcdf backends.

Should these use cases be split up?

@delgadom
Copy link
Contributor Author

lol. no I'm just walking around in your footsteps @shoyer. I've just enabled the tests you presumably wrote for #1647 & #1648. Curious why variable-length unicode strings with _FillHoles using netCDF4 doesn't currently work in master?

@delgadom
Copy link
Contributor Author

Ok this is good to go if you all do want to enable _FillValue for variable-length unicode strings with a netCDF4 backend. Seems like there's a lot of prior work/thinking in this space though so no worries if you want to wait.

@@ -949,7 +949,7 @@ def decode_cf_variable(name, var, concat_characters=True, mask_and_scale=True,

original_dtype = data.dtype

if concat_characters and data.dtype.kind == 'S':
if concat_characters and data.dtype.kind in ['U', 'S', 'O']:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this fix is quite right. Both stacking characters and decoding bytes -> unicode only make sense if the variable has a NumPy S dtype (i.e., fixed length bytes).

We do need to fix this bug.... but I think the check below that uses string_encoding is actually in the wrong place. It makes sense to check _FillValue in writing/encoding, not reading/decoding.
https://github.com/pydata/xarray/pull/1803/files#r159131975

@delgadom delgadom closed this Jan 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UnboundLocalError when opening netCDF file
2 participants