can't have numpy datatypes in attributes #156
Thanks for raising this; I'm following the xarray work with interest. Are there any object types other than numpy dtypes that would need special handling when going to/from JSON?
On Sun, 8 Oct 2017 at 05:24, Ryan Abernathey wrote:
We are working on the zarr backend for XArray (pydata/xarray#1528). XArray likes to put all kinds of weird stuff into attributes, including numpy datatypes and even numpy arrays. This is because the netCDF data model (http://www.unidata.ucar.edu/software/netcdf/netcdf/Attributes.html) allows attributes to have all of the same types as variables.
Instead, in zarr, the attributes have to be JSON-serializable. So this doesn't work:
```python
import numpy as np
import zarr

za = zarr.create(shape=(1,), store='tmp_file')
za.attrs['foo'] = np.float32(0)
```
It raises TypeError: Object of type 'float32' is not JSON serializable.
We will need some sort of workaround for this in order to make zarr work as a store for xarray.
--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
Is this still of interest, @rabernat?
I am very interested in this issue. I need to store exact binary values and datetime objects as attributes. To work around the limitations of JSON, I currently encode these attributes as strings and put the burden on the consumer of the data to decode them correctly into the actual data types. This is not ideal. Ideally, any data type that is valid for an array ought to be valid for an attribute (as it is in the netCDF model). This issue seems to be related to #244 and #216. One approach that might address both issues is to allow
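A minimal sketch of the string-encoding workaround described above, using only the standard library. The tagging scheme and the helper names encode_attr/decode_attr are hypothetical, not a zarr API. Floats are stored via float.hex(), which is an exact text form of the underlying binary value. Datetimes are stored as ISO 8601 strings.

```python
import json
from datetime import datetime, timezone

# Hypothetical tagging scheme (not part of zarr): each special value becomes
# a small dict with a type tag, so a consumer can decode it without guessing.
def encode_attr(value):
    """Convert a float or datetime into a JSON-safe, lossless form."""
    if isinstance(value, float):
        # float.hex() is an exact, round-trippable text form of the binary value
        return {"__float_hex__": value.hex()}
    if isinstance(value, datetime):
        # ISO 8601 keeps the UTC offset when the datetime is timezone-aware
        return {"__datetime__": value.isoformat()}
    return value

def decode_attr(value):
    """Inverse of encode_attr; other values pass through unchanged."""
    if isinstance(value, dict):
        if "__float_hex__" in value:
            return float.fromhex(value["__float_hex__"])
        if "__datetime__" in value:
            return datetime.fromisoformat(value["__datetime__"])
    return value

attrs = {
    "offset": 0.1,
    "created": datetime(2017, 10, 8, 5, 24, tzinfo=timezone.utc),
}
text = json.dumps({k: encode_attr(v) for k, v in attrs.items()})
restored = {k: decode_attr(v) for k, v in json.loads(text).items()}
print(restored == attrs)  # True
```

This still leaves the decoding burden on the consumer, which is the drawback raised above. The tags only make the convention explicit and machine-checkable.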
Could you please elaborate on this point a bit? What sorts of things are you imagining storing here?
I would like to store as attributes any of the data types described in the "Data Type Encoding" section of the Zarr specification. Specifically, in my real-world usage, I have encountered inconvenience with attribute values that are
I am also excited by the possibility of storing attributes that are arbitrary objects, such as JSON documents, although I haven't expressly encountered this requirement yet. It is worth noting that, in NetCDF, attribute values are really 1-dimensional arrays:
Sorry for the very long delay.
This is a really great point. It does raise the question, though: would the best way to represent this data be an array whose attributes hold array values, or a group containing many arrays?
I'm using xarray/zarr and also find the restrictions on attributes constraining. json-tricks (https://json-tricks.readthedocs.io/en/latest/) might help: it uses the same API as json and solves many of the common use cases.
Thanks @jewfro-cuban, I didn't know about json-tricks; it looks nice. The encoding format seems generally very sensible, although I guess we'd want to avoid supporting arbitrary class instances, since that is a potential security issue. Is there a way we could just depend on json-tricks, but with
We could always check if that shows up in the result and error out if so.
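One way to do that check is sketched below with the standard library's object_pairs_hook, which sees every decoded JSON object, including nested ones. This sketch assumes json-tricks marks class instances with an "__instance_type__" key. Treat that marker as an assumption, and check FORBIDDEN_KEYS against real json-tricks output. The names safe_loads and FORBIDDEN_KEYS are hypothetical.

```python
import json

# Assumption: the key a json-tricks-style encoder uses for arbitrary class
# instances; verify against the actual format before relying on it.
FORBIDDEN_KEYS = {"__instance_type__"}

def _reject_instances(pairs):
    """object_pairs_hook that refuses to build objects carrying a forbidden marker."""
    for key, _ in pairs:
        if key in FORBIDDEN_KEYS:
            raise ValueError(f"refusing to decode attribute containing {key!r}")
    return dict(pairs)

def safe_loads(text):
    """Parse attribute JSON, erroring out if a class-instance marker shows up anywhere."""
    return json.loads(text, object_pairs_hook=_reject_instances)

print(safe_loads('{"units": "m", "scale": [1, 2]}'))  # {'units': 'm', 'scale': [1, 2]}
```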
I encountered the same problem, and I would like to add that for me it would be enough if I could pass a custom
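The comment is cut off. In Python, the usual hook for custom serialization is a json.JSONEncoder subclass passed via cls=. The sketch below assumes NumPy and shows only the encoder. It does not show how zarr itself would accept one. NumpyJSONEncoder is a hypothetical name.

```python
import json
import numpy as np

class NumpyJSONEncoder(json.JSONEncoder):
    """Fallback encoding for NumPy types that the stdlib encoder rejects."""

    def default(self, obj):
        # NumPy scalars (np.float32, np.int64, np.bool_, ...) -> Python scalars
        if isinstance(obj, np.generic):
            return obj.item()
        # NumPy arrays -> (nested) lists of Python scalars
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # Anything else: the base class raises the usual TypeError
        return super().default(obj)

print(json.dumps({"foo": np.float32(0)}, cls=NumpyJSONEncoder))  # {"foo": 0.0}
```

This only covers the writing side, and the dtype is not preserved. When read back, np.float32(0) comes back as a plain Python float, so restoring the original numpy type would need extra information.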
I was recently hit by this same problem with HDF5 files, which also allow array attributes. For example, from
which are rendered by
When converting from HDF5 to zarr,
Since I have a bunch of files to convert, I implemented a quick fix in miccoli/zarr-python@380ee7c07. I'm not sure if this is of general interest, but if there is enough interest I can open a PR. Open question:
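The sketch below shows one kind of pre-conversion that makes HDF5-style attributes JSON-serializable before they are written to zarr. It works on a plain dict standing in for h5py's attrs and assumes byte strings are UTF-8. The function name to_json_compatible is hypothetical, and this is not the contents of the commit referenced above.

```python
import json
import numpy as np

def to_json_compatible(value):
    """Recursively convert NumPy/HDF5-style attribute values into JSON-safe Python objects."""
    if isinstance(value, str):
        # also catches np.str_, which subclasses str
        return str(value)
    if isinstance(value, bytes):
        # also catches np.bytes_; assumes the bytes are UTF-8 text
        return value.decode("utf-8")
    if isinstance(value, (np.ndarray, np.generic)):
        # tolist() turns arrays into nested lists and numpy scalars into
        # Python scalars; recurse because 'S' arrays yield bytes elements
        return to_json_compatible(value.tolist())
    if isinstance(value, (list, tuple)):
        return [to_json_compatible(v) for v in value]
    if isinstance(value, dict):
        return {str(k): to_json_compatible(v) for k, v in value.items()}
    return value

# Stand-in for what an HDF5 file's attributes might contain
attrs = {
    "units": np.bytes_(b"m"),
    "scale_factor": np.float32(0.5),
    "valid_range": np.array([0, 100], dtype=np.int16),
    "flags": np.array([b"a", b"b"]),
}
print(json.dumps(to_json_compatible(attrs)))
# {"units": "m", "scale_factor": 0.5, "valid_range": [0, 100], "flags": ["a", "b"]}
```

The original dtypes (here int16 and float32) are not recorded, so a reader cannot restore them exactly.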
We are working on the zarr backend for XArray (pydata/xarray#1528). XArray likes to put all kinds of weird stuff into attributes, including numpy datatypes and even numpy arrays. This is because the netCDF data model allows attributes to have all of the same types as variables.
Instead, in zarr, the attributes have to be JSON-serializable, so assigning a numpy scalar such as np.float32(0) to an array's attrs doesn't work.
It raises TypeError: Object of type 'float32' is not JSON serializable.
We will need some sort of workaround for this in order to make zarr work as a store for xarray.