Schemas? #24
Comments
This is super useful info, @ivirshup! Trying to put things together in a very high-level view: do you think what we'd want is something like the
I think so. I think the real challenge is that we want to be able to validate xarray, anndata, and json objects (at least) within the same framework. We want to be able to say things like "the 'regions' from this json are the categorical values of this I think we'll also want to validate existing objects and not define as many classes as
Starting to hit some very real limitations of For example, I cannot say the arrays at property Past that, I wouldn't be able to say: "the value of property 'c' must be a property of a sibling object".
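To make the last constraint concrete, here is a minimal sketch of checking "the value of property 'c' must be a property of a sibling object" in plain Python rather than in a declarative schema. The function name `check_sibling_reference` and the example document are hypothetical and only illustrate the kind of cross-reference rule being discussed; this is not taken from any of the libraries mentioned in this thread.

```python
from collections.abc import Mapping
from typing import Any


def check_sibling_reference(doc: Mapping[str, Any], ref_key: str, sibling_key: str) -> None:
    """Raise ValueError unless doc[ref_key] names a property of doc[sibling_key].

    Hypothetical helper: a cross-reference rule written as ordinary code.
    """
    sibling = doc[sibling_key]
    if not isinstance(sibling, Mapping):
        raise ValueError(f"{sibling_key!r} must be an object")
    value = doc[ref_key]
    if not isinstance(value, str):
        raise ValueError(f"{ref_key!r} must be a string naming a property")
    if value not in sibling:
        raise ValueError(
            f"{ref_key!r} refers to {value!r}, which is not a property of {sibling_key!r}"
        )


# Illustrative document: "c" must name one of the keys of "sibling".
example = {"c": "x", "sibling": {"x": [1, 2, 3], "y": [4, 5, 6]}}
check_sibling_reference(example, "c", "sibling")  # passes silently
```

A rule like this is easy to state in code, but it has to live outside the schema, which is the limitation described above.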
some more thoughts after taking an in-depth look for the IO

Assumptions

1. Types of Elements in SpatialData
- Image `type: Image`
- Regions `type: Union[Labels, Shapes]`
- Labels `type: Labels`
- Shapes `type: Shapes` (base type)
- Polygons `type: Polygons`
- Circles `type: Circles`
- Squares `type: Squares`
- Points `type: Points`
- Tables `type: Tables`

2. Native PyData types v. new spatialdata classes
However, what we really need right now for a data type that represent an element is:
It seems that we can do both points by simply saving the above objects in e.g. User interaction with
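For reference, the element taxonomy listed under point 1 could be sketched as plain Python types. This is only an illustration under assumptions: the class bodies are empty placeholders, and making `Polygons`, `Circles` and `Squares` subclasses of `Shapes` is my reading of "(base type)", not something the list states explicitly.

```python
from typing import Union, get_args


class Image: ...
class Labels: ...
class Shapes: ...            # base type for the concrete shape kinds
class Polygons(Shapes): ...  # assumed to derive from Shapes
class Circles(Shapes): ...   # assumed to derive from Shapes
class Squares(Shapes): ...   # assumed to derive from Shapes
class Points: ...
class Tables: ...

# Regions is not a class of its own, just "either Labels or Shapes".
Regions = Union[Labels, Shapes]


def is_region(element: object) -> bool:
    """Return True if element is one of the types in the Regions union."""
    return isinstance(element, get_args(Regions))
```

With this layout, code can branch on the element type (`isinstance(x, Shapes)`) while `Regions` stays a type alias.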
Thanks for the information, the approach and code looks good to me. If I understood correctly you propose to drop the classes For example it's handy to have types that can be checked and used to act differently depending on the spatial element type. Also in this way there is only the type I would propose for the moment to expose only native types to the users, and to store all the information inside them (axes, transformations, coordinate systems), but for convenience to keep having thin wrappers to them in the code, at least for the moment. So for instance
Short reply for now: @giovp, there's been some previous discussion about this on the zarr gitter.
Thanks for both answers, I think I'll go ahead and do a first implementation of this. @LucaMarconato I think your comment is worth further discussion so I'll create a new separate issue.
Closing, schemas have been implemented.
A number of our "elements" will be data structures defined in different libraries with specific contents. How do we define this programmatically?
This has come up many times, so I'm opening this issue to start collecting thoughts and possible solutions.
Tools/ approaches
- pydantic – the data validation tool so beloved it delayed a Python release.
- pandera – Specifications for pandas dataframes, integrated with pydantic. They would like to be able to validate other numeric objects (e.g. from xarray); see "Abstract out validation logic to support non-pandas dataframes, e.g. spark, dask, etc" (unionai-oss/pandera#381).
- xarray-schema – lets one define schemas for xarray objects. It's mentioned in a number of xarray issues, often in conjunction with pydantic ("Representing & checking Dataset schemas", pydata/xarray#1900).
- spatial-image – The spatial-image package uses xarray-dataclasses to map between an in-memory xarray DataArray and OME-NGFF images.

Requirements
I think we would need to use something fairly extendable. That is, we are using a number of classes including `pd.DataFrame`, `ad.AnnData`, `xr.DataArray`. We should ideally be able to express a schema for all of these using a single framework.
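One way to picture "a single framework" is a single `validate` entry point that dispatches on the type of the object being checked. The sketch below uses `functools.singledispatch` from the standard library. To stay self-contained it registers validators for `dict` and `list` as stand-ins; in practice one would register `pd.DataFrame`, `ad.AnnData` and `xr.DataArray` instead. The schema keys (`required_keys`, `length`) are hypothetical and not taken from any of the tools listed above.

```python
from functools import singledispatch
from typing import Any


@singledispatch
def validate(obj: Any, schema: dict) -> list:
    """Return a list of error messages; an empty list means obj is valid."""
    raise TypeError(f"no validator registered for {type(obj).__name__}")


@validate.register(dict)  # stand-in for a JSON-like object
def _validate_mapping(obj: dict, schema: dict) -> list:
    # Report every required key that is absent from the object.
    return [
        f"missing key {key!r}"
        for key in schema.get("required_keys", [])
        if key not in obj
    ]


@validate.register(list)  # stand-in for an array-like object
def _validate_sequence(obj: list, schema: dict) -> list:
    # Only check the length when the schema asks for one.
    expected = schema.get("length")
    if expected is not None and len(obj) != expected:
        return [f"expected length {expected}, got {len(obj)}"]
    return []


errors = validate({"a": 1}, {"required_keys": ["a", "b"]})
```

Supporting a new container type then only means registering one more function, which is the kind of extensibility the requirement asks for. Whether the schema itself can be shared across types is a separate question this sketch does not address.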