On disk versioning feedback #149

ivirshup · 2023-02-23T16:09:51Z

Two main points of feedback on how versioning is done for the on-disk format. One minor one.

Version the elements separately

I think it makes a lot of sense to version the elements separately. For example, I can see wanting to change how points are encoded, but only that. I don't think we need to bump the version number for tables, shapes, images, just because points changed.

Use something other than `@type`

I think @type is a weird option for reserved keyword since it's not how either anndata or OME specifies the type of an element. How we can do "metadata conventions" in a consistent way has been discussed around zarr quite a few times (e.g. ZEP 4). I'm not completley settled on what should be done, but can suggest some alternatives.

I'll try and figure out which option I like the most ASAP.

Use anndata's `encoding_type`, `encoding_version`

We could just use anndata's encoding_type, encoding_version.

Advantages

consistency with anndata
Could eventually hook into the anndata IO dispatch system

Main disadvantage

The spatial data table is a superset of an AnnData. So if we're now using the same key, what's the type? If it's anndata, the attributes won't be part of the spec. If it's spatialdata-table or ngff:regions_table it's overwriting that it's also an anndata.

That said, the anndata library could just register ngff:regions_table to be read in as an AnnData. This library could even do that registration with:

from anndata._io.specs import _REGISTRY, IOSpec
from anndata._io.specs.methods import read_anndata
import zarr

_REGISTRY.register_read(zarr.Group, IOSpec("ngff:region_table", "0.5"))(read_anndata)

But that's only if you DON'T want to do something custom at read time, e.g. anything with the other attributes.

Use some separate field

This could be something like spatialdata-encoding-type. I personally would like this over @type, since @type is a kinda taken convention, which nothing else here is

Use some separate name-spaced field

I think long term it would make sense to namespace these things. For example:

.attrs = {
	"encoding": {
		"spatialdata": {"type": "spatialdata-table", "0.1"},
		"anndata": {"type": "anndata", "version": "0.1.0"},
		"ngff": {"type": "table", "version": "0.5"}
	}
	...
}

Where spatialdata knows that it's a superset of "anndata", and to check for attrs["encoding"]["anndata"].

While this would eventually require coordination, one could start doing it here. Would probably require more thought as I think this will actually be an issue for arrow tables.

Minor point: also write the software version into the attributes

Something that mudata does (like this), which could be quite helpful for debugging, is to write the library and version into metadata.

It would also be nice for devs to see that one weird file is weird because it was written by a development version.

The text was updated successfully, but these errors were encountered:

LucaMarconato · 2023-02-23T21:03:52Z

Thanks for initiating the discussion.

I agree on the versioning of the elements separately and on having the software version in the attributes. I am not familiar with the rest so I won't comment it.

LucaMarconato · 2024-12-01T12:52:46Z

An update on this:

versioning the elements is now implemented
we ended up using the approach described from Isaac as "Use some separate field". This is specifically how the AnnData table metadata written on disk looks like:

{                                                                                                                                                
    "encoding-type": "anndata",
    "encoding-version": "0.1.0",
    "instance_key": "cell_id",
    "region": "cells",
    "region_key": "region",
    "spatialdata-encoding-type": "ngff:regions_table",
    "version": "0.1"
}

The only open point is the following:

write the software version into the attribute.

LucaMarconato · 2024-12-01T13:12:49Z

Now also the last point is implemented (in #711, which when merged will close this issue). I currently don't write this for reach element (so I don't have to bump the version number for
each of them), but only at the root level. This is how the metadata looks like in the .zattrs:

    "spatialdata_attrs": {
        "spatialdata_software_version": "0.2.5.post1.dev11+gdc4f508.d20241113",
        "version": "0.1"
    }
}

ivirshup added the I/O 💿 label Feb 23, 2023

giovp mentioned this issue Feb 24, 2023

cleanup format #154

Merged

LucaMarconato linked a pull request Dec 1, 2024 that will close this issue

Adding attrs at the SpatialData object level #711

Merged

7 tasks

LucaMarconato closed this as completed in #711 Dec 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

On disk versioning feedback #149

On disk versioning feedback #149

ivirshup commented Feb 23, 2023

LucaMarconato commented Feb 23, 2023 •

edited

Loading

LucaMarconato commented Dec 1, 2024 •

edited

Loading

LucaMarconato commented Dec 1, 2024 •

edited

Loading

On disk versioning feedback #149

On disk versioning feedback #149

Comments

ivirshup commented Feb 23, 2023

Version the elements separately

Use something other than @type

Use anndata's encoding_type, encoding_version

Use some separate field

Use some separate name-spaced field

Minor point: also write the software version into the attributes

LucaMarconato commented Feb 23, 2023 • edited Loading

LucaMarconato commented Dec 1, 2024 • edited Loading

LucaMarconato commented Dec 1, 2024 • edited Loading

Use something other than `@type`

Use anndata's `encoding_type`, `encoding_version`

LucaMarconato commented Feb 23, 2023 •

edited

Loading

LucaMarconato commented Dec 1, 2024 •

edited

Loading

LucaMarconato commented Dec 1, 2024 •

edited

Loading