Exceptions when creating an array that has an object #806

Closed
abergou opened this issue Aug 9, 2021 · 9 comments

Comments

@abergou

abergou commented Aug 9, 2021

I noticed two issues:

  1. Zarr raises an exception when attempting to create an array with a structured dtype that contains an object.
>>> import numcodecs
>>> import zarr
>>> foo = zarr.open('foo')
>>> foo.create('bar', dtype=[('x', float), ('y',object)],  shape=(10, 20), object_codec=numcodecs.Pickle())
TypeError                                 Traceback (most recent call last)
    ...
MetadataError: error decoding metadata: Cannot change data-type for object array.

I think the issue is in the functions encode_fill_value and decode_fill_value. A structured dtype that contains an object reports its kind as 'V', so zarr encodes the fill value with standard_b64encode. If dtype.hasobject is true, however, the fill value should first be pickled and only then base64-encoded.
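The idea above can be sketched roughly as follows. This is a minimal illustration, not zarr's actual code: `encode_fill_value_sketch` and `decode_fill_value_sketch` are hypothetical helpers, and plain `pickle` stands in for whatever object codec the array is configured with.

```python
import base64
import pickle

import numpy as np

# A structured dtype with an object field still reports kind 'V',
# so checking the kind alone cannot tell it apart from a plain record.
dtype = np.dtype([("x", float), ("y", object)])
print(dtype.kind)       # V
print(dtype.hasobject)  # True


def encode_fill_value_sketch(value, dtype):
    """Hypothetical helper: serialize a fill value to an ASCII string."""
    if dtype.hasobject:
        # Object fields have no raw-byte representation; pickle first.
        payload = pickle.dumps(value)
    else:
        # Plain structured values can be dumped as raw bytes.
        payload = np.array(value, dtype=dtype).tobytes()
    return base64.standard_b64encode(payload).decode("ascii")


def decode_fill_value_sketch(text, dtype):
    """Hypothetical helper: inverse of encode_fill_value_sketch."""
    raw = base64.standard_b64decode(text)
    if dtype.hasobject:
        return pickle.loads(raw)
    return np.frombuffer(raw, dtype=dtype)[0]


encoded = encode_fill_value_sketch((0.0, None), dtype)
restored = decode_fill_value_sketch(encoded, dtype)
```

The branch on `dtype.hasobject` is the whole point of the sketch: the raw-bytes path is only safe when no field holds a Python object.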

  2. This is related to item 1: zarr essentially only supports a fill value of None for object arrays:
>>> import numcodecs
>>> import zarr
>>> x = zarr.open('x')
>>> y = x.create('y', shape=(2, 2), dtype='O', fill_value=zarr.Blosc, object_codec=numcodecs.Pickle())
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
    ...
TypeError: Object of type type is not JSON serializable
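The TypeError above is the message Python's standard json module produces when asked to serialize a class. A small standalone check, using only the standard library, shows why None works as a fill value while an arbitrary object does not. `Marker` and `is_json_serializable` are hypothetical names used for illustration only.

```python
import json


class Marker:
    """Hypothetical stand-in for an arbitrary Python class."""


def is_json_serializable(value):
    """Return True if json.dumps accepts the value, False otherwise."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


print(is_json_serializable(None))    # True
print(is_json_serializable(Marker))  # False
```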
  • Value of zarr.__version__: 2.8.3
  • Value of numcodecs.__version__: 0.6.4 -- 0.8.0
  • Version of Python interpreter: 3.7.4 -- 3.9.6
  • Operating system (Linux/Windows/Mac): Mac/Linux
  • How Zarr was installed (e.g., "using pip into virtual environment", or "using conda"): pip or conda

Edit: I cleaned up the second example (I copied and pasted an incorrect reproducer here).

@abergou abergou changed the title from "Exception when creating an array that has an object" to "Exceptions when creating an array that has an object" Aug 9, 2021
@joshmoore
Member

@abergou: and just to confirm, does passing data= at the time of creation work for you?

@abergou
Author

abergou commented Aug 9, 2021

Thanks for the quick reply @joshmoore!

Where do I pass in data=? I don't see it as an argument to Group.create. If I simply pass it into foo.create then I get back the same error as before:

>>> import numcodecs
>>> import numpy
>>> import zarr
>>> foo = zarr.open('foo')
>>> foo.create('bar', dtype=[('x', float), ('y',object)],  shape=(10, 20), object_codec=numcodecs.Pickle(), data=numpy.zeros((10, 20), dtype=[('x', float), ('y', object)]))
TypeError                                 Traceback (most recent call last)
    ...
MetadataError: error decoding metadata: Cannot change data-type for object array.

Quick note: I fixed up the second reproducer I had above. I had a copy paste error that I didn't notice before.

@joshmoore
Member

This was an example that was working for me:

import numpy as np
import zarr

t = [
    ("label-value", int),
    ("r", int),
    ("g", int),
    ("b", int),
    ("a", int),
    ("object-type", "U20"),
    ("object-id", int),
    ("description", "U200")]

data = list()
for x in range(100):
    data.append((1, 100, 100, 100, 100, "Mask", 123456, "some text here"))
    data.append((2, 200, 200, 200, 200, "Mask", 567896, "some more text"))
a = np.array(data, dtype=t)

z = zarr.open("s.zarr")
z.array(name="a", data=a, chunks=(10,))

@abergou
Author

abergou commented Aug 9, 2021

Ah thanks! I missed the Group.array method. For me, passing the data in this way leads to the same MetadataError:

>>> import numcodecs
>>> import numpy
>>> import zarr
>>> data = numpy.zeros((10, 20), dtype=[('x', float), ('y',object)])
>>> foo = zarr.open('foo')
>>> foo.array('bar', data, object_codec=numcodecs.Pickle())
MetadataError                             Traceback (most recent call last)
    ...
MetadataError: error decoding metadata: Cannot change data-type for object array.

@abergou
Author

abergou commented Aug 9, 2021

One more issue related to the above: for some object arrays, zarr can silently change the type of the objects in the array:

>>> import collections
>>> import numcodecs
>>> import zarr
>>> x = zarr.open('x')
>>> y = x.create('y', shape=(2, 2), dtype='O', fill_value=collections.Counter(), object_codec=numcodecs.Pickle())
>>> y[0, 0]
{}
>>> type(y[0, 0])
dict
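One plausible explanation for the behaviour above, assuming the fill value is round-tripped through the array's JSON metadata (this is not confirmed in the thread): Counter is a dict subclass, so the json module accepts it without complaint but hands back a plain dict. A pickle round trip, by contrast, keeps the type. The sketch uses only the standard library.

```python
import collections
import json
import pickle

original = collections.Counter()

# JSON treats any dict subclass as a plain object, so the subclass is lost.
via_json = json.loads(json.dumps(original))
print(type(via_json) is dict)  # True

# Pickle records the class, so the round trip preserves it.
via_pickle = pickle.loads(pickle.dumps(original))
print(type(via_pickle) is collections.Counter)  # True
```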

@abergou
Author

abergou commented Aug 20, 2021

@joshmoore I actually have a patch for this that I'll submit a pull request for imminently.

abergou pushed a commit to abergou/zarr-python that referenced this issue Aug 20, 2021
* Ensures that the fill value of structured arrays that contain objects
  is encoded using object_codec.
@joshmoore
Member

💯

joshmoore added a commit that referenced this issue Aug 30, 2021
* Fix structured arrays that contain objects #806

* Ensures that the fill value of structured arrays that contain objects
  is encoded using object_codec.

* Add test and fix-up to ensure compatibility

* Update docs/release.rst

* Fixup unit tests

Don't specify protocol: makes unit tests pass in python3.7
N5 doesn't support object codecs

* Fixup linting error

Explicitly handle an error condition that can only happen if
encode_fill_value or decode_fill_value are directly called.

* Add encode/decode tests for codecov

* Explicitly import Pickle from numcodecs for mypy

* Migrate test from #702

With thanks to @ombschervister

* Install types-setuptools for CI

Co-authored-by: Attila Bergou <attila@alumni.cmu.edu>
Co-authored-by: Josh Moore <j.a.moore@dundee.ac.uk>
Co-authored-by: jmoore <josh@glencoesoftware.com>
@joshmoore
Member

Closed by #813 (v2.9.4)

@abergou
Author

abergou commented Aug 30, 2021

Thanks @joshmoore !
