Filter lineage_mutations densely #90

Open
mindoftea opened this issue Jun 7, 2023 · 2 comments

Comments

@mindoftea
Collaborator

In the lineage_mutations API handler (see /web/handlers/v2/genomics/lineage_mutations.py#L124), the response currently includes "None" values for every lineage/mutation pair that occurs at less than the cutoff frequency (a sparse response). It would be more useful to users of the Python package if the response were dense instead, containing as few "None"s as possible. This could be achieved by changing the frequency cutoff filter so that it removes only the mutations (columns) for which no lineage matching the query is above the cutoff. The values that are "None" in the current response would then be replaced by numeric values below the cutoff; any remaining non-numeric values could be set to zero.
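A minimal sketch of the proposed filter, to make the column-wise rule concrete. This is not the handler's actual code: the function name `filter_dense`, the nested-dict table shape, and the example prevalence values are all assumptions made for illustration.

```python
from typing import Dict, Optional

# Hypothetical shape: {lineage: {mutation: prevalence or None}}
Table = Dict[str, Dict[str, Optional[float]]]


def filter_dense(prevalence: Table, cutoff: float) -> Table:
    """Keep a mutation (column) if at least one lineage reaches the cutoff."""
    kept = sorted(
        {
            mutation
            for row in prevalence.values()
            for mutation, value in row.items()
            if value is not None and value >= cutoff
        }
    )
    dense: Table = {}
    for lineage, row in prevalence.items():
        dense[lineage] = {}
        for mutation in kept:
            value = row.get(mutation)
            # Missing or non-numeric entries become zero, as suggested above.
            dense[lineage][mutation] = 0.0 if value is None else value
    return dense


# Made-up prevalence values, for illustration only.
raw: Table = {
    "BA.1": {"S:N501Y": 0.98, "S:A67V": 0.95, "S:L452R": 0.001},
    "BA.2": {"S:N501Y": 0.97, "S:A67V": 0.02, "S:L452R": None, "S:T19I": 0.99},
}
dense = filter_dense(raw, cutoff=0.8)
```

With these inputs, `S:L452R` is dropped because no lineage reaches the cutoff, `S:A67V` keeps its below-cutoff value of 0.02 for BA.2 instead of "None", and the `S:T19I` entry missing for BA.1 is filled with zero.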

This should be a small change to one file. However, before deploying to prod we need to test whether it could cause any issues for the front end, particularly the lineage comparison tool.

@newgene
Contributor

newgene commented Jun 7, 2023

I think we should either set these "None" values to a valid null value in JSON, or just remove that field from the JSON object completely when it's "None". It should be sufficient to exclude these hits from the response.
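A rough sketch of the two options, assuming the value is a Python `None` at serialization time. The hit's field names (`lineage`, `mutation`, `prevalence`) and the helper `drop_none_fields` are hypothetical, not the API's actual schema.

```python
import json


def drop_none_fields(hit: dict) -> dict:
    """Return a copy of the hit without any fields whose value is None."""
    return {key: value for key, value in hit.items() if value is not None}


# Hypothetical hit; the real response fields may differ.
hit = {"lineage": "BA.2", "mutation": "S:A67V", "prevalence": None}

# Option 1: the standard json module serializes Python None as JSON null.
as_null = json.dumps(hit)

# Option 2: omit the field entirely before serializing.
without_field = json.dumps(drop_none_fields(hit))
```

Here `as_null` contains `"prevalence": null`, while `without_field` has no `prevalence` key at all.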

@mindoftea
Collaborator Author

I think the preferred response for outbreakpy users would be a complete table with no nulls or missing values. Alternatively, we could return nulls and tell downstream users that they need to query again with a lower cutoff frequency to get this data.
