Handle _FillValue in variable-length unicode string variables #1802

delgadom · 2017-12-28T23:13:54Z

Closes UnboundLocalError when opening netCDF file #1781
Tests added
Tests passed
Passes git diff upstream/master **/*py | flake8 --diff
Fully documented, including whats-new.rst for all changes and api.rst for new API

For testing - I could use some guidance. Not sure if it's worth creating a fixture set or something just for this issue. If so, would that go in test_backends?

Fixed a bug in conventions.py:952 that caused a read error on netCDF4 files containing variable-length unicode strings with a _FillValue. decode_cf_variable checks string_encoding, which was previously only defined when dtype.kind == 'S'. The fix defines it for dtype.kind in ['S', 'O', 'U'].
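
For readers who have not hit this failure mode, here is a minimal sketch of the control-flow pattern described above. It is not xarray's actual code: `decode_sketch`, its parameters, and the `'utf-8'` default are hypothetical stand-ins chosen only to show how a variable assigned in one branch and read in another produces an UnboundLocalError.

```python
def decode_sketch(kind, attributes, handle_all_string_kinds):
    """Hypothetical stand-in for the branching described above (not xarray code)."""
    # Before the fix only kind 'S' defined the variable; the fix widens the set.
    string_kinds = ('S', 'O', 'U') if handle_all_string_kinds else ('S',)

    if kind in string_kinds:
        # Assumed default, purely for illustration.
        string_encoding = attributes.get('_Encoding', 'utf-8')

    if '_FillValue' in attributes:
        # Reads string_encoding, which is unbound unless the branch above ran.
        return string_encoding
    return None


# Old behaviour: an object-dtype variable with a _FillValue fails on read.
try:
    decode_sketch('O', {'_FillValue': ''}, handle_all_string_kinds=False)
except UnboundLocalError as err:
    print('read error:', type(err).__name__)

# With the wider check the same input gets an encoding.
print(decode_sketch('O', {'_FillValue': ''}, handle_all_string_kinds=True))
```

The sketch only reproduces the symptom; whether widening the check is the right fix is a separate question.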

delgadom · 2017-12-29T00:11:53Z

hmm. Seems I'm touching on a much larger issue here: Unidata/netcdf4-python#730

The round-trip works for me using a netcdf4 engine once this fix is implemented in conventions.py. There are tests that are ready to demonstrate this in test_backends.py:836-843, but running these tests (by removing the pytest.raises lines) applies to both netCDF4 and h5netcdf backends.

Should these use cases be split up?

delgadom · 2017-12-29T00:22:57Z

lol. no I'm just walking around in your footsteps @shoyer. I've just enabled the tests you presumably wrote for #1647 & #1648. Curious why variable-length unicode strings with _FillValue using netCDF4 don't currently work in master?

delgadom · 2017-12-29T16:45:37Z

Ok this is good to go if you all do want to enable _FillValue for variable-length unicode strings with a netCDF4 backend. Seems like there's a lot of prior work/thinking in this space though so no worries if you want to wait.

shoyer · 2017-12-31T00:51:34Z

xarray/conventions.py

@@ -949,7 +949,7 @@ def decode_cf_variable(name, var, concat_characters=True, mask_and_scale=True,

    original_dtype = data.dtype

-    if concat_characters and data.dtype.kind == 'S':
+    if concat_characters and data.dtype.kind in ['U', 'S', 'O']:


I don't think this fix is quite right. Both stacking characters and decoding bytes -> unicode only make sense if the variable has a NumPy S dtype (i.e., fixed length bytes).

We do need to fix this bug.... but I think the check below that uses string_encoding is actually in the wrong place. It makes sense to check _FillValue in writing/encoding, not reading/decoding.
https://github.com/pydata/xarray/pull/1803/files#r159131975
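
The dtype distinction drawn in the review comment above can be illustrated with plain NumPy. This is a sketch, not xarray's implementation: the arrays and variable names are made up for illustration, and the `view`/`np.char.decode` calls are just one possible way to stack characters and decode bytes.

```python
import numpy as np

# The three dtype kinds that appear in the diff under review.
fixed_bytes = np.array([b'ab', b'cd'])                 # kind 'S': fixed-length bytes
fixed_unicode = np.array(['ab', 'cd'])                 # kind 'U': fixed-length unicode
vlen_objects = np.array(['ab', 'cde'], dtype=object)   # kind 'O': Python objects

kinds = [a.dtype.kind for a in (fixed_bytes, fixed_unicode, vlen_objects)]
print(kinds)

# "Stacking characters": collapse a trailing axis of single bytes into strings.
chars = np.array([[b'a', b'b'], [b'c', b'd']], dtype='S1')
stacked = chars.view('S2').reshape(chars.shape[:-1])

# "Decoding bytes -> unicode" applies to the 'S' array produced above.
decoded = np.char.decode(stacked, 'utf-8')
print(stacked.dtype.kind, decoded.dtype.kind)
```

Both steps take bytes as input, which is why they apply to kind 'S' only; 'U' and 'O' arrays already hold text or arbitrary Python objects.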

delgadom added 2 commits December 28, 2017 15:02

create entry in whatsnew

1c0ef31

delgadom added 2 commits December 28, 2017 16:19

remove error from netCDF4 backend when writing unicode variable-length strings with _FillValue

521fff5

test roundtrip for unicode variable-length strings with _FillValue for netCDF

f04bdbe

enforce pytest.raises for h5netcdf, passes for netcdf4. update whatsnew

99873ca

shoyer reviewed Dec 31, 2017

delgadom closed this Jan 11, 2018
