Wrong subsetting for drop groups in sample annotation #244

Closed
c-mertes opened this issue Aug 11, 2021 · 0 comments
Labels
bug Something isn't working

c-mertes (Contributor) commented Aug 11, 2021

When two DROP_GROUPs are used and one name is a substring of the other, the utils.subsetBy function returns wrong results: rows belonging only to the longer group are also selected for the shorter one.

drop/drop/utils.py

Lines 68 to 86 in 8df2d1e

    def subsetBy(df, column, values, exact_match=True):
        """
        Subset by one or more values of different columns from data frame
        :param df: data frame
        :param column: column to subset by
        :param values: values to subset by
        :param exact_match: default True. when False match substrings. Important for subsetting drop groups
        :return: df subset by values and column
        """
        if values is None:
            return df
        elif isinstance(values, str) and exact_match:
            return df[df[column] == values]
        elif not isinstance(values, str) and exact_match:
            return df[df[column].isin(values)]
        elif isinstance(values, str) and not exact_match:
            return df[df[column].str.contains(values)]
        else:
            return df[df[column].str.contains("|".join(values))]

It should always be an exact match and never a fuzzy one. Instead of a plain substring search, we should use a regex that only matches whole entries. Since a cell can hold several entries separated by commas (as in the DROP_GROUP column), we could use something like:

For single string search:

    elif isinstance(values, str) and exact_match:
        # re.escape (needs `import re`) keeps regex metacharacters in the group name literal
        return df[df[column].str.contains("(?:^|,)" + re.escape(values) + "(?:,|$)")]

For multi string search:

    elif not isinstance(values, str) and exact_match:
        # one regex alternative per requested group; re.escape needs `import re`
        return df[df[column].str.contains("(?:^|,)(?:" + "|".join(re.escape(v) for v in values) + ")(?:,|$)")]
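To make the failure mode concrete, here is a minimal, self-contained sketch that contrasts the substring search with the anchored regex. The sample table, the group names, and the helper name subset_by_group are all made up for illustration and are not part of drop; the sketch also assumes entries are separated by a bare comma with no surrounding spaces.

```python
import re

import pandas as pd


def subset_by_group(df, column, values):
    """Keep rows whose comma-separated `column` holds any of `values` as a whole entry.

    Hypothetical helper for illustration only, not the drop implementation.
    """
    if values is None:
        return df
    if isinstance(values, str):
        values = [values]
    # Escape each value so regex metacharacters in group names match literally.
    alternatives = "|".join(re.escape(v) for v in values)
    # Non-capturing groups: an entry must start at the beginning of the cell or
    # after a comma, and end at a comma or at the end of the cell.
    pattern = rf"(?:^|,)(?:{alternatives})(?:,|$)"
    return df[df[column].str.contains(pattern, regex=True, na=False)]


# Toy sample annotation (made-up IDs and groups).
sa = pd.DataFrame({
    "RNA_ID": ["s1", "s2", "s3", "s4"],
    "DROP_GROUP": ["group1", "group1_sub", "group1,other", "other"],
})

# Substring search: also picks up s2, which only belongs to "group1_sub".
substring_hits = sa[sa["DROP_GROUP"].str.contains("group1")]["RNA_ID"].tolist()

# Anchored regex: only rows that list "group1" as a complete entry.
exact_hits = subset_by_group(sa, "DROP_GROUP", "group1")["RNA_ID"].tolist()

# Several groups at once.
multi_hits = subset_by_group(sa, "DROP_GROUP", ["group1_sub", "other"])["RNA_ID"].tolist()

print(substring_hits)  # ['s1', 's2', 's3']
print(exact_hits)      # ['s1', 's3']
print(multi_hits)      # ['s2', 's3', 's4']
```

Normalizing a single string to a one-element list lets both cases share one pattern. If the annotation may contain spaces after the commas, the cells would need to be stripped first or the pattern extended accordingly.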

@mumichae @nickhsmith do you know exactly where this function is used? Only in the SampleAnnotation class?

c-mertes added the bug label Aug 11, 2021
vyepez88 closed this as completed Oct 5, 2021