Supporting structured array records with non-trivial shape #111
Comments
Thanks for raising this. I think the problem is that the encoding/decoding
of a structured dtype to/from the JSON metadata is not yet smart enough to
handle a structured dtype with one or more multi-value fields (not quite
sure what to call them, but I think we know what we're talking about). Some
modification to the encode_dtype and decode_dtype functions should fix it.
It may also require a change to the storage specification to describe the
encoding.
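To make the suggestion concrete, here is a minimal sketch of an encoder/decoder pair that keeps an optional shape as a third element of each field entry. The helper names `encode_descr` and `decode_descr` are hypothetical and are not zarr's actual functions; the sketch also assumes packed dtypes (no custom offsets or titles) and does not handle a bare subarray dtype at the top level.

```python
import json
import numpy as np


def encode_descr(dtype):
    """Turn a NumPy dtype into a JSON-friendly structure (hypothetical helper)."""
    if dtype.fields is None:
        # Simple dtype: a string such as '<u8' is enough.
        return dtype.str
    out = []
    for name in dtype.names:
        field_dtype = dtype.fields[name][0]
        if field_dtype.subdtype is not None:
            # Multi-value field: subdtype is (base_dtype, shape).
            base, shape = field_dtype.subdtype
            out.append([name, encode_descr(base), list(shape)])
        else:
            out.append([name, encode_descr(field_dtype)])
    return out


def decode_descr(d):
    """Inverse of encode_descr; accepts 2- or 3-element field entries."""
    if isinstance(d, str):
        return d
    fields = []
    for item in d:
        name, base = item[0], decode_descr(item[1])
        if len(item) == 3:
            # JSON has no tuples, so the shape list is converted back.
            fields.append((name, base, tuple(item[2])))
        else:
            fields.append((name, base))
    return fields


original = np.dtype([("a", np.uint64), ("b", np.float64, (2,))])
as_json = json.dumps(encode_descr(original))
restored = np.dtype(decode_descr(json.loads(as_json)))
```

In this sketch `restored` compares equal to `original`, so the field shape survives the JSON round trip. Whether a three-element entry is the right on-disk encoding is exactly the kind of question the storage specification would need to settle.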
…On Friday, February 10, 2017, jakirkham ***@***.***> wrote:
With a structured array in NumPy, it is possible to create a record that
has a non-trivial shape of its own in addition to the shape of the recorded
array. An example use case would be a record that holds the centroid of
some object. The centroid has the dimensionality of whatever it came from,
and there is one centroid per record, for however many records there are.
It appears that Zarr does not support this use case ATM. Not sure if that
is a bug/unsupported feature or an intentional restriction.
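As a NumPy-only illustration of the two levels of shape described above (no zarr involved), the following sketch builds a small record array. The field names `label` and `centroid` are made up for the example.

```python
import numpy as np

# Each record carries a scalar label and a 2-element centroid.
centroid_dtype = np.dtype([("label", np.uint64), ("centroid", np.float64, (2,))])

# Three records: the array shape is (3,), independent of the field shape.
records = np.zeros(3, dtype=centroid_dtype)
records["centroid"][0] = [4.0, 6.0]

array_shape = records.shape                      # shape of the record array
field_view_shape = records["centroid"].shape     # array shape + field shape
field_shape = centroid_dtype["centroid"].shape   # shape stored in the dtype

print(array_shape, field_view_shape, field_shape)  # (3,) (3, 2) (2,)
```

The point is that the `(2,)` lives in the dtype rather than in the array shape, which is why it has to be carried through the dtype metadata.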
Example of the error:
In [1]: import numpy
In [2]: import zarr
In [3]: a = numpy.array([(2, [4.0, 6.0]), (3, [5.0, 7.0])], dtype=[("a", numpy.uint64), ("b", numpy.float, (2,))])
In [4]: z = zarr.array(a)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/opt/conda/envs/test/lib/python3.5/site-packages/zarr/meta.py in decode_array_metadata(s)
     23     try:
---> 24         dtype = decode_dtype(meta['dtype'])
     25         fill_value = decode_fill_value(meta['fill_value'], dtype)

/opt/conda/envs/test/lib/python3.5/site-packages/zarr/meta.py in decode_dtype(d)
     78 def decode_dtype(d):
---> 79     d = _decode_dtype_descr(d)
     80     return np.dtype(d)

/opt/conda/envs/test/lib/python3.5/site-packages/zarr/meta.py in _decode_dtype_descr(d)
     73     else:
---> 74         d = [(f, _decode_dtype_descr(v)) for f, v in d]
     75     return d

/opt/conda/envs/test/lib/python3.5/site-packages/zarr/meta.py in <listcomp>(.0)
     73     else:
---> 74         d = [(f, _decode_dtype_descr(v)) for f, v in d]
     75     return d

ValueError: too many values to unpack (expected 2)

During handling of the above exception, another exception occurred:

MetadataError                             Traceback (most recent call last)
<ipython-input-4-3d14a571b8d9> in <module>()
----> 1 z = zarr.array(a)

/opt/conda/envs/test/lib/python3.5/site-packages/zarr/creation.py in array(data, **kwargs)
    306
    307     # instantiate array
--> 308     z = create(**kwargs)
    309
    310     # fill with data

/opt/conda/envs/test/lib/python3.5/site-packages/zarr/creation.py in create(shape, chunks, dtype, compressor, fill_value, order, store, synchronizer, overwrite, path, chunk_store, filters, cache_metadata, **kwargs)
     88     # instantiate array
     89     z = Array(store, path=path, chunk_store=chunk_store,
---> 90               synchronizer=synchronizer, cache_metadata=cache_metadata)
     91
     92     return z

/opt/conda/envs/test/lib/python3.5/site-packages/zarr/core.py in __init__(self, store, path, read_only, chunk_store, synchronizer, cache_metadata)
     98
     99         # initialize metadata
--> 100         self._load_metadata()
    101
    102         # initialize attributes

/opt/conda/envs/test/lib/python3.5/site-packages/zarr/core.py in _load_metadata(self)
    108         """(Re)load metadata from store."""
    109         if self._synchronizer is None:
--> 110             self._load_metadata_nosync()
    111         else:
    112             mkey = self._key_prefix + array_meta_key

/opt/conda/envs/test/lib/python3.5/site-packages/zarr/core.py in _load_metadata_nosync(self)
    123
    124         # decode and store metadata
--> 125         meta = decode_array_metadata(meta_bytes)
    126         self._meta = meta
    127         self._shape = meta['shape']

/opt/conda/envs/test/lib/python3.5/site-packages/zarr/meta.py in decode_array_metadata(s)
     35         )
     36     except Exception as e:
---> 37         raise MetadataError('error decoding metadata: %s' % e)
     38     else:
     39         return meta

MetadataError: error decoding metadata: too many values to unpack (expected 2)
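The inner ValueError in the traceback above can be reproduced without zarr. The sketch below assumes the stored descriptor looks roughly like NumPy's `dtype.descr` after a JSON round trip; that is an assumption about the metadata layout, not something confirmed here. It then applies the same two-name unpacking that appears at line 74 of the traceback.

```python
import json
import numpy as np

dt = np.dtype([("a", np.uint64), ("b", np.float64, (2,))])

# Assumed metadata form: the field "b" yields a 3-element entry
# (name, base dtype, shape) instead of the usual 2-element one.
descr = json.loads(json.dumps(dt.descr))

message = None
try:
    # Same pattern as `for f, v in d` in the traceback: expects pairs only.
    fields = [(f, v) for f, v in descr]
except ValueError as exc:
    message = str(exc)

print(len(descr[0]), len(descr[1]))  # 2 3
print(message)
```

The entry for the multi-value field has three elements, so unpacking it into two names fails with the "too many values to unpack" error.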
Conda environment:
name: test
channels: !!python/tuple
- conda-forge
- defaults
dependencies:
- conda-forge::blas=1.1=openblas
- conda-forge::ca-certificates=2017.1.23=0
- conda-forge::certifi=2017.1.23=py35_0
- conda-forge::decorator=4.0.11=py35_0
- conda-forge::fasteners=0.14.1=py35_2
- conda-forge::ipython=5.2.2=py35_0
- conda-forge::ipython_genutils=0.1.0=py35_0
- conda-forge::monotonic=1.2=py35_1
- conda-forge::ncurses=5.9=10
- conda-forge::numpy=1.12.0=py35_blas_openblas_200
- conda-forge::openblas=0.2.19=0
- conda-forge::openssl=1.0.2h=3
- conda-forge::pexpect=4.2.1=py35_0
- conda-forge::pickleshare=0.7.3=py35_0
- conda-forge::pip=9.0.1=py35_0
- conda-forge::prompt_toolkit=1.0.13=py35_0
- conda-forge::ptyprocess=0.5.1=py35_0
- conda-forge::pygments=2.2.0=py35_0
- conda-forge::python=3.5.3=1
- conda-forge::readline=6.2=0
- conda-forge::setuptools=33.1.0=py35_0
- conda-forge::simplegeneric=0.8.1=py35_0
- conda-forge::six=1.10.0=py35_1
- conda-forge::sqlite=3.13.0=1
- conda-forge::tk=8.5.19=1
- conda-forge::traitlets=4.3.0=py35_0
- conda-forge::wcwidth=0.1.7=py35_0
- conda-forge::wheel=0.29.0=py35_0
- conda-forge::xz=5.2.2=0
- conda-forge::zarr=2.1.4=py35_0
- conda-forge::zlib=1.2.11=0
- libgfortran=3.0.0=1
- pip:
- ipython-genutils==0.1.0
- prompt-toolkit==1.0.13
prefix: /opt/conda/envs/test
--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: alimanfoo@googlemail.com
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
Ok, I wasn't sure whether this was an encoding issue with the metadata only or whether it would affect the storage format as well. IIUC it is a metadata-only issue (with some updated docs).
Yes, I think it should just be a metadata issue. In principle zarr should be
able to handle any numpy dtype (with the caveat of object dtypes, which need
a special filter).
What is the ETA for the 2.3 release? I am hitting this issue and see that it has been resolved in 2.3.
FWIW I'd like to get 2.3 out before the end of January, although I haven't said that out loud yet.
@alimanfoo that would be great.