TableModel().validate() intentionally skips check for unique instance_key values (grouped by region_key values) #715

Open
LucaMarconato opened this issue Sep 28, 2024 · 0 comments


Checking that instance_key values are unique within each region_key group is expensive, so it is performed only during table parsing and not during validation.
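For reference, here is a minimal pandas sketch of what this kind of check amounts to. It is not the actual spatialdata implementation; the helper name `regions_with_duplicate_instances` is made up for illustration, and the example data mirrors the test further down.

```python
import pandas as pd


def regions_with_duplicate_instances(obs: pd.DataFrame, region_key: str, instance_key: str) -> list[str]:
    """Return the regions in which some instance_key value occurs more than once.

    Hypothetical helper, for illustration only.
    """
    # keep=False marks every row whose (region, instance) pair is not unique.
    duplicated = obs.duplicated(subset=[region_key, instance_key], keep=False)
    return sorted(obs.loc[duplicated, region_key].astype(str).unique())


obs = pd.DataFrame(
    {
        "region": ["sample_1"] * 5 + ["sample_2"] * 5,
        "A": [1] * 5 + list(range(5)),  # sample_1 repeats the value 1, sample_2 is unique
    }
)
bad_regions = regions_with_duplicate_instances(obs, "region", "A")
```

The check has to look at every row of the table, which is why it costs more than the other checks on large tables.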

Still, we should add a flag to validate() that enables this check, and turn it on when writing.
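A rough sketch of how such an opt-in flag could behave, written as a standalone function rather than as a change to `TableModel`. The function name `validate_table`, the parameter name `check_unique_instance_keys`, and the full wording of the error messages are all assumptions, not existing API.

```python
import pandas as pd


def validate_table(
    obs: pd.DataFrame,
    region_key: str,
    instance_key: str,
    check_unique_instance_keys: bool = False,  # hypothetical flag name
) -> None:
    """Hypothetical validation routine with an opt-in uniqueness check."""
    # Cheap checks always run.
    for key in (region_key, instance_key):
        if key not in obs.columns:
            raise ValueError(f"Column `{key}` not found in obs.")

    # The expensive check runs only on request, e.g. right before writing.
    if check_unique_instance_keys:
        duplicated = obs.duplicated(subset=[region_key, instance_key], keep=False)
        if duplicated.any():
            regions = ", ".join(sorted(obs.loc[duplicated, region_key].astype(str).unique()))
            raise ValueError(f"Instance key column for region(s) `{regions}` contains duplicate values.")


obs = pd.DataFrame({"region": ["sample_1"] * 3, "A": [1, 1, 2]})
validate_table(obs, "region", "A")  # passes: uniqueness check is skipped by default
try:
    validate_table(obs, "region", "A", check_unique_instance_keys=True)
    raised = False
except ValueError:
    raised = True
```

Defaulting the flag to off keeps routine validation cheap, and the writing code path can pass `True` explicitly.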

We need to update the tests to cover this (something like uncommenting the validate() blocks in the code below).

    @pytest.mark.parametrize("model", [TableModel])
    @pytest.mark.parametrize("region", [["sample_1"] * 5 + ["sample_2"] * 5])
    def test_table_instance_key_values_not_unique(self, model: TableModel, region: str | np.ndarray):
        region_key = "region"
        obs = pd.DataFrame(RNG.integers(0, 100, size=(10, 3)), columns=["A", "B", "C"])
        obs[region_key] = region
        obs["A"] = [1] * 5 + list(range(5))
        adata = AnnData(RNG.normal(size=(10, 2)), obs=obs)

        # check parse fails
        with pytest.raises(ValueError, match=re.escape("Instance key column for region(s) `sample_1`")):
            model.parse(adata, region=region, region_key=region_key, instance_key="A")
        # # check also validate fails
        # adata.uns[TableModel.ATTRS_KEY] = {
        #     TableModel.REGION_KEY: region,
        #     TableModel.REGION_KEY_KEY: region_key,
        #     TableModel.INSTANCE_KEY: "A",
        # }
        # with pytest.raises(ValueError, match=re.escape("Instance key column for region(s) `sample_1`")):
        #     model().validate(adata)
        # del adata.uns[TableModel.ATTRS_KEY]

        adata.obs["A"] = [1] * 10

        # check parse fails
        with pytest.raises(ValueError, match=re.escape("Instance key column for region(s) `sample_1, sample_2`")):
            model.parse(adata, region=region, region_key=region_key, instance_key="A")
        # # check also validate fails
        # adata.uns[TableModel.ATTRS_KEY] = {
        #     TableModel.REGION_KEY: region,
        #     TableModel.REGION_KEY_KEY: region_key,
        #     TableModel.INSTANCE_KEY: "A",
        # }
        # with pytest.raises(ValueError, match=re.escape("Instance key column for region(s) `sample_1, sample_2`")):
        #     model().validate(adata)
        # del adata.uns[TableModel.ATTRS_KEY]