Discussion: multiple tables in a SpatialData object #43

Closed
kevinyamauchi opened this issue Oct 28, 2022 · 0 comments · Fixed by #455
Labels
API About the API/ UX

This issue discusses whether a SpatialData object should allow multiple tables. The implementation could be a list of tables, a dictionary of tables, or a MuData object containing multiple tables. Below are the notes from the Oct 28, 2022 hackathon in Munich. cc @giovp, @LucaMarconato, @ivirshup.

Single-table

Pro

  • Intuitive compound statements for queries on tables
idx = (sdata.tables['expression'].obs.cell_type == "Neuron") & (sdata.obs.n_counts > 100)
sdata = sdata[idx, :]

Cons

  • We 100% need a SpatialDataContainer
sc = SpatialDataContainer('...')
sdata = sc.to_spatialdata(elements=['/images/...', '...'])

everything = SpatialData('...')
sdata = everything.query.elements(elements=['/images/...', '...'])
subsdata = sdata.query.elements(elements=['/images/...'])
  • Points can't be annotated with a table. As a workaround, expression can be stored inside the Points themselves, but this is not possible for Circles.

Multi-table

Pro

  • No need for SpatialDataContainer
  • Points and Circles are unified, because both can be annotated by a Table
sdata0 = sdata.query.coordinate_system('coordinate_system_name', filter_rows=False)
sdata1 = sdata.query.bounding_box()
sdata1 = sdata.query.polygon('/polygons/annotations')
sdata1 = sdata.query.table(table_key=..., attr="obs", query="""""")  # attr is either "obs" or "var"

For the query-string syntax, see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html

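import pandas as pd
import squidpy as sq

adata = sq.datasets.mibitof()
df = adata.obs
# Select the same rows with a query string and with a boolean mask
out1 = df.query("(point == 23) & (cell_size > 200)")
out2 = df.loc[(df.point == 23) & (df.cell_size > 200)]
# Both approaches should return identical frames
pd.testing.assert_frame_equal(out1, out2)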

Cons

  • Complex selection of multiple tables
idx = (sdata.table.obs.cell_type == "Neuron") & (sdata.table.obs.n_counts > 100)
sdata = sdata.query.table(key='expression', index=idx)
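To make this con concrete, here is a minimal sketch of what a selection spanning two tables might involve. It uses plain dictionaries as stand-ins for tables; the table names, the cell identifiers, and the idea that both tables share an observation key are assumptions for illustration and not part of any SpatialData API.

```python
# Two hypothetical tables keyed by a shared cell identifier (an assumption:
# nothing in the notes states how tables would be aligned).
expression = {
    "cell_1": {"n_counts": 250},
    "cell_2": {"n_counts": 80},
    "cell_3": {"n_counts": 400},
}
annotation = {
    "cell_1": {"cell_type": "Neuron"},
    "cell_2": {"cell_type": "Neuron"},
    "cell_3": {"cell_type": "Astrocyte"},
}

# Each condition is evaluated against its own table...
neurons = {cid for cid, row in annotation.items() if row["cell_type"] == "Neuron"}
high_counts = {cid for cid, row in expression.items() if row["n_counts"] > 100}

# ...and the results must then be aligned explicitly on the shared key.
selected = sorted(neurons & high_counts)
subset = {cid: expression[cid] for cid in selected}
```

With a single table, both conditions could be combined in one boolean mask. With several tables, the alignment step has to happen somewhere, either in user code or inside the query API.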

class SpatialData:
    @property
    def table(self) -> AnnData:
        # Convenience accessor: only unambiguous when exactly one table is stored
        if len(self.tables) == 1:
            k = list(self.tables.keys())[0]
            return self.tables[k]
        else:
            raise ValueError("...")
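The property above can be tried out in isolation with a small stand-in class. This is only a sketch: `MultiTableContainer` is a hypothetical name, the tables are arbitrary Python objects rather than AnnData, and the error message is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class MultiTableContainer:
    """Hypothetical stand-in for an object holding several named tables."""

    tables: Dict[str, Any] = field(default_factory=dict)

    @property
    def table(self) -> Any:
        # Return the only table when the choice is unambiguous.
        if len(self.tables) == 1:
            return next(iter(self.tables.values()))
        # Otherwise force the caller to pick a table by key.
        raise ValueError(
            f"Expected exactly one table, found {len(self.tables)}; "
            "select one explicitly via .tables[key]."
        )


single = MultiTableContainer(tables={"expression": {"n_obs": 3}})
multi = MultiTableContainer(
    tables={"expression": {"n_obs": 3}, "morphology": {"n_obs": 3}}
)

only_table = single.table  # works: exactly one table is stored
try:
    multi.table
    ambiguous_raised = False
except ValueError:
    ambiguous_raised = True  # two tables, so the shortcut refuses to guess
```

This keeps the single-table ergonomics (`obj.table`) for simple datasets, while objects with several tables require an explicit key.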

('A > B' & 'cell_type == "Neuron"')