Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring karyotype #689

Open
jonbrenas opened this issue Dec 6, 2024 · 1 comment
Open

Refactoring karyotype #689

jonbrenas opened this issue Dec 6, 2024 · 1 comment

Comments

@jonbrenas
Copy link
Collaborator

jonbrenas commented Dec 6, 2024

@cclarkson agreed with me that it might be a good idea to refactor karyotype so I move my comments from #652 to here.

The main issue, in my opinion, is that karyotype is a very special case because its code is in ag3.py (because we only have the inversion tags for ag), which means that its tests are in the legacy section (that we would like to get rid of). It also doesn't show up in the docs for the API. My instinct would be to rework it so that:

  1. The tags are stored on gcs instead of within a resources folder of malariagen_data, this would enable us to have them more readily available to users/partners, to version and update them (as we do with site_filters, for instance) or just update them (as we do with the metadata) as they get refined with more data (and the notebook already shows that they can be improved), and to have the structure in place if and when a similar list of tags is generated for funestus (which definitely has inversions), stephensi and darlingi (which probably have some inversions as well).
  2. The function is moved to anoph, same reason as above for future-proofing, even if for a while the function is hidden in funestus or even returns a NotImplementedError: the tags for the inversion {inversion} have not yet been generated
  3. It actually shows up in the docs
  4. It has tests on simulated data to be compliant with the rest of the package
  5. I think it would make sense to have multiple inversions in the same heatmap (the question then is do we modify karyotype to take a list of inversions or do we loop the function in the `karyotype_frequencies[_advanced]" functions)

Any opinions?

@jonbrenas
Copy link
Collaborator Author

This issue is now split into 4 (the last point is left to the person writing the code for the functions (probably myself) to choose): #697, #698, #699, #700.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants