Add tables to metadata #347

Closed
tcompa opened this issue Mar 9, 2023 · 3 comments
Labels
Tables AnnData and ROI/feature tables

Comments

@tcompa
Collaborator

tcompa commented Mar 9, 2023

As per ome/ngff#64, there should be a tables key in .../tables/.zattrs.

        # The tables group is a container which holds one or multiple tables that are compatible with AnnData.
        │
        │                     # The tables group MAY be in the root of the zarr file.
        ├── .zgroup           # The tables group MAY be in root or in another group.
        │
        ├── .zattrs           # .zattrs MUST contain "tables", which lists the keys of the subgroups that are tables. In this case, the only table is "my_table",
        │                     # hence .zattrs should be equal to { "tables": [ "my_table" ] }.
        │
        └── my_table
            │                     # The table group MAY be in the root of the zarr file.
            ├── .zgroup           # The table group MAY be in root or in another group.
            │
            ├── .zattrs           # .zattrs MUST contain "type", which is set to "ngff:region_table".
            │                     # .zattrs MUST contain "region", which is the path to the data the table is annotating.
            │                     # "region" MUST be a single path (single region) or an array of paths (multiple regions).
            │                     # "region" paths MUST be objects with a key "path" and the path value MUST be a string.
            │                     # .zattrs MUST contain "region_key" if "region" is an array. "region_key" is the key in obs denoting which region a given row corresponds to.
            │                     # .zattrs MAY contain "instance_key", which is the key in obs that denotes which instance in "region" the row corresponds to. If "instance_key" is not provided, the values from the obs .zattrs "_index" key are used.
            │
            ├── X                 # You MAY add a zarr array X.
            │   │                 # X MUST NOT be a complex type (i.e., MUST be a single type).
            │   │                 # X MAY be chunked as the user desires.
            │   ├── .zarray
            │   ├── 0.0
            │   │   ...
            │   └── n.m
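To make the draft rules above concrete, here is a minimal sketch of a checker for the two kinds of .zattrs they describe (the tables group and a single table). It is based only on the rules quoted from ome/ngff#64, not on a released spec or an existing validator; the function names are hypothetical, and `subgroups` is assumed to be the set of subgroup names the caller found inside the tables group.

```python
from typing import Any


def _is_path_object(obj: Any) -> bool:
    # A "region" path must be an object with a string-valued "path" key.
    return isinstance(obj, dict) and isinstance(obj.get("path"), str)


def check_tables_group_attrs(attrs: dict[str, Any], subgroups: set[str]) -> list[str]:
    """Check the .zattrs of a tables group; an empty list means no problems found."""
    tables = attrs.get("tables")
    if not isinstance(tables, list):
        return ['"tables" must be a list of subgroup names']
    # Every listed table should exist as a subgroup of the tables group.
    return [f'{name!r} is listed in "tables" but is not a subgroup'
            for name in tables if name not in subgroups]


def check_table_attrs(attrs: dict[str, Any]) -> list[str]:
    """Check the .zattrs of a single table group against the draft rules."""
    problems = []
    if attrs.get("type") != "ngff:region_table":
        problems.append('"type" must be "ngff:region_table"')
    region = attrs.get("region")
    if isinstance(region, list):
        # Multiple regions: every entry must be a path object ...
        if not all(_is_path_object(r) for r in region):
            problems.append('every "region" entry must be {"path": <str>}')
        # ... and obs needs a column telling which region each row belongs to.
        if "region_key" not in attrs:
            problems.append('"region_key" is required when "region" is an array')
    elif not _is_path_object(region):
        problems.append('"region" must be {"path": <str>} or an array of them')
    # "instance_key" is optional, so it is not checked here.
    return problems


# Example: a table annotating two regions but missing "region_key".
print(check_tables_group_attrs({"tables": ["my_table"]}, {"my_table"}))
# → []
print(check_table_attrs({
    "type": "ngff:region_table",
    "region": [{"path": "labels/a"}, {"path": "labels/b"}],
}))
# → ['"region_key" is required when "region" is an array']
```

Returning a list of problems (rather than raising on the first one) makes it easy to report everything that is wrong with a table in one pass.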
@jluethi
Collaborator

jluethi commented Mar 13, 2023

Let's also check whether the writer we currently use produces the relevant metadata in the obs & var folders, etc. :)

@tcompa
Collaborator Author

tcompa commented Mar 13, 2023

Let's also check whether the writer we currently use produces the relevant metadata in the obs & var folders, etc. :)

We currently write AnnData tables to disk via the anndata.experimental.write_elem function, which (as far as I understand) handles only the table itself.
At the moment, I'm adding the OME-NGFF metadata by hand, by editing the zarr attrs objects directly. This can look like:

        import zarr
        from anndata.experimental import write_elem

        # (1) DEFINE ZARR GROUP
        # (in_path, component, bounding_box_ROI_table_name and bbox_table
        # are defined earlier in the task)
        group_tables = zarr.group(f"{in_path}/{component}/tables/")

        # (2) CHECK FOR NAME CLASHES, before anything is written to disk
        current_tables = group_tables.attrs.get("tables", [])
        if bounding_box_ROI_table_name in current_tables:
            raise ValueError(
                f"{in_path}/{component}/tables/ already includes "
                f"{bounding_box_ROI_table_name=} in {current_tables=}"
            )

        # (3) WRITE ANNDATA TABLE
        write_elem(group_tables, bounding_box_ROI_table_name, bbox_table)

        # (4) WRITE OME-NGFF METADATA (reassign the full list, so that the
        # updated value is written back to .zattrs)
        group_tables.attrs["tables"] = current_tables + [bounding_box_ROI_table_name]
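The same check/write/record sequence could be wrapped in a reusable helper. This is a minimal sketch, not an existing anndata or zarr function: `add_table` is a hypothetical name, and the table writing is passed in as a callback, so that the name check always runs before the table is written and the "tables" list is updated only after the write has succeeded.

```python
from collections.abc import Callable, MutableMapping
from typing import Any


def add_table(
    tables_attrs: MutableMapping[str, Any],
    table_name: str,
    write_table: Callable[[], None],
) -> list[str]:
    """Check the name, write the table, then record it in the "tables" list.

    tables_attrs: attributes of the tables group (e.g. a zarr group's .attrs).
    write_table: callback that actually writes the table (e.g. via write_elem).
    """
    # Copy, so that the stored value is never mutated in place.
    current_tables = list(tables_attrs.get("tables", []))
    # 1) Fail before anything is written if the name is already registered.
    if table_name in current_tables:
        raise ValueError(f"{table_name=} is already listed in {current_tables=}")
    # 2) Write the table itself.
    write_table()
    # 3) Only then update the metadata, reassigning the whole list so the
    #    change goes through __setitem__ (for a zarr group's .attrs, this is
    #    what writes .zattrs back to the store).
    new_tables = current_tables + [table_name]
    tables_attrs["tables"] = new_tables
    return new_tables


# Usage with a plain dict standing in for group_tables.attrs:
attrs: dict[str, Any] = {}
written: list[str] = []
add_table(attrs, "my_table", lambda: written.append("my_table"))
print(attrs)  # → {'tables': ['my_table']}
```

With zarr, the call would presumably look like `add_table(group_tables.attrs, bounding_box_ROI_table_name, lambda: write_elem(group_tables, bounding_box_ROI_table_name, bbox_table))`.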

If we need to add zarr attrs at other hierarchy levels (e.g. inside obs or var), the simplest approach is the same: open the relevant zarr group and edit its attrs.

It seems to me that the documentation of anndata's write_elem function does not mention anything about writing custom zarr attrs.
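For groups at other hierarchy levels (such as obs or var), a minimal sketch of "open the right group and edit its attrs" without going through zarr at all could operate directly on the JSON file. It assumes the zarr v2 on-disk layout shown in the tree above (a directory with `.zgroup` and `.zattrs` files); `update_zattrs` is a hypothetical helper, it is not safe with concurrent writers, and zarr's own `group.attrs.update(...)` achieves the same through the library and is usually preferable.

```python
import json
from pathlib import Path
from typing import Any, Union


def update_zattrs(group_dir: Union[str, Path], new_attrs: dict[str, Any]) -> dict[str, Any]:
    """Merge new_attrs into <group_dir>/.zattrs of a zarr v2 group on disk."""
    group_dir = Path(group_dir)
    # Only touch directories that are actually zarr v2 groups.
    if not (group_dir / ".zgroup").is_file():
        raise FileNotFoundError(f"no .zgroup in {group_dir}, not a zarr group")
    zattrs_path = group_dir / ".zattrs"
    # Start from the existing attributes (if any), so other keys are preserved.
    current = json.loads(zattrs_path.read_text()) if zattrs_path.is_file() else {}
    merged = {**current, **new_attrs}  # new keys win on conflict
    zattrs_path.write_text(json.dumps(merged, indent=4))
    return merged


# Hypothetical call, e.g. for the obs subgroup of a table:
# update_zattrs(f"{in_path}/{component}/tables/my_table/obs", {"some_key": "some_value"})
```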

@jluethi
Collaborator

jluethi commented Mar 13, 2023

Ok. Let's see whether ome-zarr-py comes up with useful wrappers for table writing; otherwise, once the spec is fully defined, we can write those .zattrs manually :)

@tcompa tcompa closed this as completed in 4d1f859 Mar 14, 2023
@jluethi jluethi moved this from Done to Done Archive in Fractal Project Management Apr 18, 2023
@tcompa tcompa added the Tables AnnData and ROI/feature tables label Sep 19, 2023