-
-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for shapes in dtype definitions #296
Conversation
Thank you @onalant. There's some complexities here around structured versus unstructured arrays which I'd like to understand a bit better. In numpy, if you make an unstructured array with a dtype with a shape, the shape from the dtype gets pushed into the shape of the array. E.g.: In [12]: a = np.zeros(100, dtype='(10, 10)<f4')
In [13]: a.shape
Out[13]: (100, 10, 10)
In [14]: a.dtype.descr
Out[14]: [('', '<f4')]
In [15]: a.dtype
Out[15]: dtype('float32') But if you are creating a structured array, then the dtype is preserved, e.g.: In [16]: a = np.zeros(100, dtype=[('foo', '(10, 10)<f4'), ('bar', 'i8')])
In [17]: a.dtype
Out[17]: dtype([('foo', '<f4', (10, 10)), ('bar', '<i8')])
In [18]: a.dtype.descr
Out[18]: [('foo', '<f4', (10, 10)), ('bar', '<i8')]
In [19]: a.shape
Out[19]: (100,) In zarr I think we want to follow this behaviour, or at least support the structured array case (which I think #111 is targeting). Currently I think your PR would fail on the structured array case, and would not emulate numpy behaviour in the unstructured case? (Apologies if I am missing or misunderstanding anything here.) cc @jakirkham |
Re: unstructured array consistency with numpy - The inconsistency was an oversight on my part. My apologies about this. I will push a fix as soon as possible. Re: structured arrays - Here is what I am seeing (with some cursory testing). Initializing an array from a pre-allocated numpy ndarray of a structured dtype functions as I would expect:
Using
This error appears in master for structured dtypes in general (e.g. However, when fill_value=None:
The dtype is preserved after writing to and fetching from disk. Please let me know if I have misunderstood something, or if I can provide any additional information. |
Thanks @onalant, I'll follow up asap. |
Not running into this issue after reinstalling the package. |
@alimanfoo Following up on the PR; is there anything I can do to help? I am working on getting the documentation and additional unit tests |
Thanks @onalant. Tests for the structured array cases would be good. @jakirkham what structured dtype features do you think we need to support? Field with shape. Also should we support nested fields (i.e.., fields within fields)? |
@alimanfoo In 24b56cf you will find metadata encode-decode tests for I have one question, however: which test modules require expansion As an aside, I believe the Travis failure is a consequence of adding Please let me know if any of this is unclear. Thank you for your time. |
@alimanfoo Just wanted to follow up. Is there anything else you need |
Thanks Tarik, I'll take a look asap. Apologies if I am cutting in and out
of radio contact, things are a bit busy for me at the moment.
…On Mon, 10 Sep 2018, 21:26 Tarik Onalan, ***@***.***> wrote:
@alimanfoo <https://github.com/alimanfoo> Just wanted to follow up. Is
there anything else you need
for functionality verification?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#296 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAq8QsoeKPzprhJ-JHM6gfL_TeCu0z1wks5uZsr4gaJpZM4WUupD>
.
|
Hi @onalant, Just took a look through. The implementation in zarr/meta.py looks good. The new tests in zarr/tests/test_meta.py also look good. The tests in zarr/tests/test_crude.py cover some important cases, but it would be better if these could be incorporated into existing dtype tests. In zarr/tests/test_core.py there is a test_dtypes function, it would be great if you could extend this to cover the dtypes currently tested in test_crude.py. Many thanks! |
@alimanfoo Understood. I will move the tests in |
@@ -1631,6 +1627,10 @@ def test_astype(self): | |||
expected = data.astype(astype) | |||
assert_array_equal(expected, z2) | |||
|
|||
def test_unstructured_array(self): | |||
# skip this one, cannot do delta on unstructured array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technicality: numcodecs cannot do deltas on unstructured array dtypes
in string form e.g. "(4, 4)f4"
. Since the expanded dtype is already
tested in test_dtypes
, skip instead.
@alimanfoo Sorry to disturb you, but I want to confirm whether the Would you like me to append this feature to the documentation? |
Sorry to be largely absent here. Skimmed the tests (though haven't looked at any code in detail). The test cases covered seem about right. Basically we want to make sure these cases are covered and likely mixed together in different ways.
Will try to take a closer look, but can't promise anything before Friday unfortunately. |
Hello, I would just like to follow-up on this issue. I am currently Is there anything else I can do to expedite this PR? |
Hi Tarik, I'm away at a conference this week but can follow up next week.
…On Mon, 24 Sep 2018, 19:05 Tarik Onalan, ***@***.***> wrote:
Hello, I would just like to follow-up on this issue. I am currently
slightly bogged down with other work, so I do not think I can write
documentation of the quality I see in the repository. However,
I would be willing to assist in expanding the test suite, if the
current additions are insufficient.
Is there anything else I can do to expedite this PR?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#296 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAq8Qi8t9_oKvTUOfu8QdjJNWwxH-IkLks5ueRDGgaJpZM4WUupD>
.
|
Following up on the PR. My apologies if I am coming across as terse. |
On Tue, 2 Oct 2018 at 22:45, Tarik Onalan ***@***.***> wrote:
Following up on the PR.
My apologies if I am coming across as terse.
Not at all, it is good to keep up momentum! I'll get back to you asap.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @onalant, I very much appreciate the work put into this, especially the thorough testing. It all looks good to me modulo some very minor stylistic comments. CI is passing and coverage is 100% so I think this is basically good to go.
zarr/tests/test_core.py
Outdated
@@ -826,6 +826,27 @@ def test_nchunks_initialized(self): | |||
z[:] = 42 | |||
assert 10 == z.nchunks_initialized | |||
|
|||
def test_unstructured_array(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we rename this "test_array_dtype_shape()" or something like that, just to be a bit more specific about what this test is about.
zarr/tests/test_core.py
Outdated
assert_array_equal(a['bar'], z['bar']) | ||
assert_array_equal(a['baz'], z['baz']) | ||
else: | ||
# BUG(onalant): https://www.github.com/numpy/numpy/issues/11946 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but I don't think you need to sign your comments, and this could be slightly more descriptive. E.g.:
# workaround for numpy bug https://www.github.com/numpy/numpy/issues/11946
zarr/tests/test_core.py
Outdated
assert_array_equal(a['bar'], z['bar']) | ||
assert_array_equal(a['baz'], z['baz']) | ||
else: | ||
# BUG(onalant): https://www.github.com/numpy/numpy/issues/11946 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
zarr/tests/test_meta.py
Outdated
@@ -116,6 +116,92 @@ def test_encode_decode_array_2(): | |||
assert [df.get_config()] == meta_dec['filters'] | |||
|
|||
|
|||
def test_encode_decode_array_unstructured(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we rename to "test_encode_decode_array_dtype_shape()" to be slightly more specific.
zarr/tests/test_meta.py
Outdated
# test decoding | ||
meta_dec = decode_array_metadata(meta_enc) | ||
assert ZARR_FORMAT == meta_dec['zarr_format'] | ||
# NOTE(onalant): https://github.com/zarr-developers/zarr/pull/296#issuecomment-417608487 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to include this link, the sentence in the next comment line explains the rationale.
zarr/tests/test_meta.py
Outdated
# To maintain consistency with numpy unstructured arrays, unpack dimensions into shape. | ||
assert meta['shape'] + meta['dtype'].shape == meta_dec['shape'] | ||
assert meta['chunks'] == meta_dec['chunks'] | ||
# NOTE(onalant): https://github.com/zarr-developers/zarr/pull/296#issuecomment-417608487 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to include this link, the sentence in the next comment line explains the rationale.
zarr/tests/test_meta.py
Outdated
# test decoding | ||
meta_dec = decode_array_metadata(meta_enc) | ||
assert ZARR_FORMAT == meta_dec['zarr_format'] | ||
# NOTE(onalant): https://github.com/zarr-developers/zarr/pull/296#issuecomment-417608487 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to include this link, the sentence in the next comment line explains the rationale.
zarr/tests/test_meta.py
Outdated
# To maintain consistency with numpy unstructured arrays, unpack dimensions into shape. | ||
assert meta['shape'] + meta['dtype'].shape == meta_dec['shape'] | ||
assert meta['chunks'] == meta_dec['chunks'] | ||
# NOTE(onalant): https://github.com/zarr-developers/zarr/pull/296#issuecomment-417608487 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need to include this link, the sentence in the next comment line explains the rationale.
@jakirkham one thought that occurs is that we should probably add some clarification to the storage spec to state how we expect dtypes for structured arrays with subshape and/or nested fields are encoded in array metadata, given that these are now supported. I think this is another case where we are clarifying an area of ambiguity rather than changing anything, and so can edit the spec without requiring a spec version change. @onalant if you are happy to finish up the coding work, I'd be happy to deal with the spec changes and release notes. |
Yeah that sounds reasonable to me as well. It’s probably also worth noting that chunking does not happen within the structured type. IIRC HDF5 handles this the same way. |
@alimanfoo Thank you for the style suggestions. I have made the Remark: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super, many thanks @onalant. I'll push some spec edits to your branch, to keep this all in one place, before merging.
As a result, disable dtype and shape checks in test_encode_decode_array_shape. One drawback is that zarr.zeros and similar functions setting fill_value != None throw exceptions.
This reverts commit 55e725d.
Cases for unstructured, structured, and nested-structured dtypes. Probably not ready for production as-is.
Otherwise, array data would not match shape with unstructured dtype.
Rebased on upstream master, which should fix coverage issues. Working on docs now... |
I've pushed some work on the spec to describe how structured arrays with subshape and/or nested fields are encoded. In the process of playing with this I also found and fixed a small bug that came up when trying to read data from a structured array which had not been initialised with any data yet (and refactored the structured array tests in the process). Just release notes to do now, otherwise should be good to go. |
Merging, thanks @onalant for seeing this one through, much appreciated. |
The objective of this PR is to resolve #111. Metadata encoding and
decoding is modified to accept shaped fields of structured dtypes and
shaped arrays of primitives (e.g.
(10,10)f8
).Tests not utilizing optional requirements pass on Python 2.7 and 3.6. I
am working on setting up my environment with the optional requirements,
and hope to have those tests running very soon. If needed, I can expand
the test suite to more rigorously test the new encoding functionality.
docs/tutorial.rst
will also be updated to include information aboutshaped fields in dtypes.
Thank you for your time and consideration.
TODO:
tox -e docs
)