Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode/Decode complex fill values #363

Merged
merged 5 commits into from
Feb 26, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions docs/release.rst
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,9 @@ Bug fixes
* Ensure ``DictStore`` contains only ``bytes`` to facilitate comparisons and protect against writes.
By :user:`John Kirkham <jakirkham>`, :issue:`350`

* Test and fix an issue (w.r.t. fill values) when storing complex data to ``Array``.
By :user:`John Kirkham <jakirkham>`, :issue:`363`

* Always use a ``tuple`` when indexing a NumPy ``ndarray``.
By :user:`John Kirkham <jakirkham>`, :issue:`376`

Expand Down
9 changes: 9 additions & 0 deletions zarr/meta.py
Original file line number Diff line number Diff line change
Expand Up @@ -163,6 +163,11 @@ def decode_fill_value(v, dtype):
return np.NINF
else:
return np.array(v, dtype=dtype)[()]
elif dtype.kind in 'c':
v = (decode_fill_value(v[0], dtype.type().real.dtype),
decode_fill_value(v[1], dtype.type().imag.dtype))
v = v[0] + 1j * v[1]
return np.array(v, dtype=dtype)[()]
elif dtype.kind == 'S':
# noinspection PyBroadException
try:
Expand Down Expand Up @@ -201,6 +206,10 @@ def encode_fill_value(v, dtype):
return int(v)
elif dtype.kind == 'b':
return bool(v)
elif dtype.kind in 'c':
v = (encode_fill_value(v.real, dtype.type().real.dtype),
encode_fill_value(v.imag, dtype.type().imag.dtype))
return v
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I take it this encoding / decoding of fill values is necessary is necessary because json does not support serialization of complex data types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right.

This will encode to a JSON list, which is nice for readability. Though we could certainly imagine other ways of handling this (e.g. base64 encoded string).

Any preferences?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems totally reasonable.

elif dtype.kind in 'SV':
v = base64.standard_b64encode(v)
if not PY2: # pragma: py2 no cover
Expand Down
9 changes: 9 additions & 0 deletions zarr/tests/test_core.py
Original file line number Diff line number Diff line change
Expand Up @@ -977,6 +977,15 @@ def test_dtypes(self):
z[:] = a
assert_array_almost_equal(a, z[:])

# complex
for dtype in 'c8', 'c16':
z = self.create_array(shape=10, chunks=3, dtype=dtype)
assert z.dtype == np.dtype(dtype)
a = np.linspace(0, 1, z.shape[0], dtype=dtype)
a -= 1j * a
z[:] = a
assert_array_almost_equal(a, z[:])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the special treatment required for fill values, would it make sense to add an explicit check that complex data with missing values is handled correctly? I see that the metadata encoding is checked below, but not the actual filling procedure (as far as I can tell).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please elaborate a bit on what kind of representation of missing data we are discussing? We handle NaN currently, but maybe this is not what you are thinking about.

Copy link
Contributor

@rabernat rabernat Dec 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is to make sure that missing data (the kind that triggers the fill_value machinery to be activated) is present in the array. For example, by adding

a[0] = np.nan

However, I might be misunderstanding what fill_value does in zarr. Is fill_value only used to represent uninitialized portions of the array? Or are all nans explicitly translated to fill_value before serialization?

Copy link
Member Author

@jakirkham jakirkham Feb 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK fill_value is only used to handle uninitialized chunks. Am not aware of any special behavior around NaNs in Zarr. Does that help?

Edit: IOW this assert_array_almost_equal line is testing the uninitialized chunks are constructed with the fill_value.


# datetime, timedelta
for base_type in 'Mm':
for resolution in 'D', 'us', 'ns':
Expand Down
54 changes: 54 additions & 0 deletions zarr/tests/test_meta.py
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,60 @@ def test_encode_decode_array_2():
assert [df.get_config()] == meta_dec['filters']


def test_encode_decode_array_complex():

# some variations
for k in ['c8', 'c16']:
compressor = Blosc(cname='lz4', clevel=3, shuffle=2)
dtype = np.dtype(k)
fill_value = dtype.type(np.nan-1j)
meta = dict(
shape=(100, 100),
chunks=(10, 10),
dtype=dtype,
compressor=compressor.get_config(),
fill_value=fill_value,
order=dtype.char,
filters=[]
)

meta_json = '''{
"chunks": [10, 10],
"compressor": {
"id": "blosc",
"clevel": 3,
"cname": "lz4",
"shuffle": 2,
"blocksize": 0
},
"dtype": "%s",
"fill_value": ["NaN", -1.0],
"filters": [],
"order": "%s",
"shape": [100, 100],
"zarr_format": %s
}''' % (dtype.str, dtype.char, ZARR_FORMAT)

# test encoding
meta_enc = encode_array_metadata(meta)
assert_json_equal(meta_json, meta_enc)

# test decoding
meta_dec = decode_array_metadata(meta_enc)
assert ZARR_FORMAT == meta_dec['zarr_format']
assert meta['shape'] == meta_dec['shape']
assert meta['chunks'] == meta_dec['chunks']
assert meta['dtype'] == meta_dec['dtype']
assert meta['compressor'] == meta_dec['compressor']
assert meta['order'] == meta_dec['order']
# Based off of this SO answer: https://stackoverflow.com/a/49972198
assert np.all(
fill_value.view((np.uint8, fill_value.itemsize)) ==
meta_dec['fill_value'].view((np.uint8, meta_dec['fill_value'].itemsize))
)
assert [] == meta_dec['filters']


def test_encode_decode_array_datetime_timedelta():

# some variations
Expand Down