Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor CNV frequencies functions to their own module #665

Merged
merged 25 commits into from
Dec 4, 2024

Conversation

jonbrenas
Copy link
Collaborator

@jonbrenas jonbrenas commented Nov 25, 2024

Resolves #663 and works towards #366.

This passed all tests locally ... despite the fact that some parameters in the functions in the new file were not defined at all which leads be to believe that we never created tests for these functions (maybe a notebook or a legacy tests was used instead). I will try to remedy this.

Note (mostly to myself): The utility functions shared between all frequency analyses are imported from snp_frq.py here (because that's still where they are) but depending on the order in which PRs get merged, this may need to be changed.

@jonbrenas
Copy link
Collaborator Author

jonbrenas commented Nov 25, 2024

I have been trying to move the tests and it results in some gnarly numba errors. @alimanfoo, could I ask for your help?

I think I found a solution (let's hope it doesn't break everything else).

@jonbrenas
Copy link
Collaborator Author

Note: I had to modify the test for drop_invariant checking that the DataFrame without drop_invariant was strictly bigger than the one with it as it turned out that it was not always the case. Because of the way the data is simulated (each (windows, sample) pair gets a random value between -1 and 12), it is possible that every gene is amplified and deleted in at least 1 sample in 1 cohort (meaning that max_af is > 0 for every possible CNV). A quick test on actual data showed that drop_invariant does its job there so I assume it is OK but I am open to other opinions.

@jonbrenas jonbrenas marked this pull request as ready for review December 2, 2024 15:04
@jonbrenas
Copy link
Collaborator Author

Thank you @alimanfoo for the code showing how to generate coverage. It looks like variant_query is indeed the only part that is not covered. The other thing is nobs_mode which is used in snp_frq.py but not in cnv_frq.py. I am wondering if it is a parameter that should be added as well.

@alimanfoo
Copy link
Member

alimanfoo commented Dec 2, 2024

Hi @jonbrenas, out of interest, were you thinking to aim to merge this first, and then #630 afterwards?

Copy link
Member

@alimanfoo alimanfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great, thanks so much @jonbrenas. Couple of suggestions...

malariagen_data/anoph/cnv_frq.py Outdated Show resolved Hide resolved
malariagen_data/anoph/cnv_frq.py Outdated Show resolved Hide resolved
malariagen_data/anoph/cnv_frq.py Outdated Show resolved Hide resolved
malariagen_data/anoph/cnv_frq.py Outdated Show resolved Hide resolved
malariagen_data/anopheles.py Outdated Show resolved Hide resolved
@alimanfoo alimanfoo changed the title Moving CNV frequencies functions to their own file Refactor CNV frequencies functions to their own module Dec 2, 2024
@jonbrenas jonbrenas marked this pull request as draft December 3, 2024 14:48
@jonbrenas
Copy link
Collaborator Author

I still need to add the 'variant_query' test and, maybe, 'nobs_mode`.

@jonbrenas jonbrenas marked this pull request as ready for review December 3, 2024 17:00
Copy link
Member

@alimanfoo alimanfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed a small simplification.

malariagen_data/anoph/cnv_frq.py Outdated Show resolved Hide resolved
malariagen_data/anoph/cnv_frq.py Outdated Show resolved Hide resolved
malariagen_data/anoph/cnv_frq.py Outdated Show resolved Hide resolved
Copy link
Member

@alimanfoo alimanfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looking good!

@alimanfoo
Copy link
Member

alimanfoo commented Dec 3, 2024

(Hi @jonbrenas, feel free to merge if checks pass.) edit: see requested changes below

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jonbrenas, I just realised, now that you have all the new unit tests in the tests/anoph/test_cnv_frq.py module, you can rip out all the old CNV-related tests against real data in the tests/test_ag3.py and tests/test_af1.py modules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw part of the motivation for the refactoring was to move all tests to using simulated data. The CNV-related tests are basically almost all that's remaining of the legacy tests against real data. It will be great to remove them, very close then to completing the mission!

@alimanfoo alimanfoo mentioned this pull request Dec 3, 2024
@jonbrenas jonbrenas requested a review from alimanfoo December 3, 2024 23:32
@jonbrenas
Copy link
Collaborator Author

Merge complete and all legacy cnv tests have been removed.

Copy link
Member

@alimanfoo alimanfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jonbrenas. Looks like the _check_frequency() function in test_ag3.py can also be removed.

@alimanfoo alimanfoo added the BMGF-068808 Work supported by BMGF grant INV-068808 (MalariaGEN 2024-2027). label Dec 4, 2024
Copy link
Member

@alimanfoo alimanfoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks!

@alimanfoo alimanfoo merged commit f5e40ca into master Dec 4, 2024
10 checks passed
@alimanfoo alimanfoo deleted the 663-move-cnv-frequencies branch December 4, 2024 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
BMGF-068808 Work supported by BMGF grant INV-068808 (MalariaGEN 2024-2027).
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cnv_frequencies is treated differently from snp_frequencies and hap_frequencies
2 participants