Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor filter_table_by_elements #701

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from
Draft

Conversation

giovp
Copy link
Member

@giovp giovp commented Sep 5, 2024

I would like to refactor this function in order to support multiple tables.

Features

Let's say I have a region element shapes1, I would like to filter the table or tables that contain instances from that region. This filtering should have the following features:

  • do not copy, but return a view
  • optionally return a mapping table_name -> instances or the same list of tables

@melonora @LucaMarconato I have looked back at the multiple table in-memory design, but I couldn't find a clear description of any constraint or expected behaviour in the conversation or in the design document. In particular, from our previous conversations, I understood that:

  • a table can map to multiple elements (e.g. diff views of the same annotation)
  • a table can have rows that map to an element, and rows that don't map to any element
  • a table can also not map to any element

can two tables map to the same element?
Thanks for the clarification

Copy link

codecov bot commented Sep 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.88%. Comparing base (774b492) to head (1d0519e).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #701      +/-   ##
==========================================
+ Coverage   91.83%   91.88%   +0.05%     
==========================================
  Files          44       44              
  Lines        6781     6780       -1     
==========================================
+ Hits         6227     6230       +3     
+ Misses        554      550       -4     
Files with missing lines Coverage Δ
src/spatialdata/_core/query/relational_query.py 91.26% <100.00%> (+0.60%) ⬆️
src/spatialdata/_core/spatialdata.py 90.94% <100.00%> (+0.05%) ⬆️

... and 1 file with indirect coverage changes

@melonora
Copy link
Collaborator

melonora commented Sep 9, 2024

Hi @giovp, yes 2 tables can map to the same element, that is no problem.

Copy link
Collaborator

@melonora melonora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @giovp. I would like a little bit more clarification why certain functions seem to have been recreated here as I think it duplicates some code.

Also, I think it would be good to be a bit more explicit in the docstrings of how this would differ from the join functions as in both we now have the terminology of filtering without specifying how it differs. I know it is a private function here, but good to have it stated I think.

src/spatialdata/_core/query/relational_query.py Outdated Show resolved Hide resolved
src/spatialdata/_core/query/relational_query.py Outdated Show resolved Hide resolved
Comment on lines -138 to +146
rng = np.random.default_rng(seed=0)
full_sdata["table"].obs["annotated_shapes"] = rng.choice(["circles", "poly"], size=full_sdata["table"].shape[0])
adata = full_sdata["table"]
adata = full_sdata["table"].copy()

circles_instances = full_sdata["circles"].index.values
poly_instances = full_sdata["poly"].index.values

adata = adata[: len(circles_instances) + len(poly_instances), :].copy()
adata.obs["annotated_shapes"] = ["circles"] * len(circles_instances) + ["poly"] * len(poly_instances)
adata.obs["instance_id"] = np.concatenate([circles_instances, poly_instances])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test had quite a big bug. Basically, the table natively from conftest annotates labels, but here it was re used to annotate shapes and circles. Now, both shapes and circles have 5 instances only, and so the table was being filtered only by coordinate system, but this meant that the table had the first five instances mapping to the poly/circles, but then all the other instances also present, which did not map to anything. This is because the filtering was happening with the now removed filter_table_by_element_names which wasn't checking for that.

I will add test so that the filter_table function always return correct tables.

@@ -58,31 +58,6 @@ def get_element_annotators(sdata: SpatialData, element_name: str) -> set[str]:
return table_names


def _filter_table_by_element_names(table: AnnData | None, element_names: str | list[str]) -> AnnData | None:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function is buggy, and it doesn't make sense with the new multiple table design, as it can return wrong tables. I removed it and refactored it in the other filter method. I will add test for this. See other comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants