Wish List Nov 2024 #394

emmahodcroft · 2024-11-19T11:32:40Z

Some ideas for CoVariants, priority/order shouldn't be taken as set in stone!

Except this one, high priority:

Stabilize website so there's no more risk of failing to build (may include storing data elsewhere)

Other ideas:

Improve graphs
- Allow zooming (there's some work on this here)
- Possibly make turning variants on/off easier
- Improve the legend (currently shows all null/0 values so it often runs off page)
- We currently have to store & then plot all null/0s or else the graphs don't plot correctly - this makes the files less efficient & leads to the annoying legends mentioned above
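A rough sketch of how the null/0 problem could be handled on the data side, assuming a hypothetical per-variant layout (`{variant: {date: value}}`); the real file format and the names `densify` and `drop_empty_variants` are illustrative only. The idea is to store only non-zero points, fill gaps at plot time, and build the legend from variants that actually have data:

```python
from typing import Dict, List, Optional

# Hypothetical shape: variant name -> one value per plotted date.
Series = Dict[str, List[Optional[float]]]


def densify(sparse: Dict[str, Dict[str, float]], dates: List[str]) -> Series:
    """Expand sparse {variant: {date: value}} data into full per-date lists.

    Missing dates are filled with 0 so every series has the same length,
    which means the zeros no longer need to be stored in the files.
    """
    return {
        name: [points.get(date, 0) for date in dates]
        for name, points in sparse.items()
    }


def drop_empty_variants(series: Series) -> Series:
    """Keep only variants with at least one non-null, non-zero value.

    The result can be used to build the legend, so all-empty variants
    do not push it off the page.
    """
    return {
        name: values
        for name, values in series.items()
        if any(v not in (None, 0) for v in values)
    }
```

Whether the fill step belongs in the backend or in the plotting code is an open question; this only shows that the two concerns (storage and legend) can be separated.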
Allow website to be more customizable
- Allow users to pick from any available variants in Shared Mutations (customize visible columns)
- Allow users to show different genes outside of Spike for Shared Mutations (this information is currently already in clusters.py but could be copied out to different files like it is for Spike currently)
- Allow users to flip between showing the Nextstrain name ("23I") and the Pango name ("BA.2.86") across the website - ideally some kind of toggle at the top (or that moves with scroll) so they could turn this on/off easily anywhere and menus/plots would adjust (not necessary for page text or tables to adjust - I think). These are linked in clusters.py
- Allow users to specify what variants they'd like to show on the home page left-hand menu (lower priority)
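For the Nextstrain/Pango toggle, a minimal sketch of the lookup side, assuming each entry exposes both names. The `CLUSTERS` dict, its field names, and `label_for` are hypothetical; the actual structure of clusters.py may differ:

```python
from typing import Dict

# Hypothetical excerpt; field names are assumptions, not the real clusters.py keys.
CLUSTERS: Dict[str, Dict[str, str]] = {
    "23I": {"nextstrain_name": "23I", "pango_name": "BA.2.86"},
    "example-no-pango": {"nextstrain_name": "example-no-pango"},
}


def label_for(key: str, scheme: str = "nextstrain") -> str:
    """Return the label to show in menus/plots for the chosen naming scheme.

    Falls back to the Nextstrain name when no Pango name is linked.
    """
    entry = CLUSTERS[key]
    if scheme == "pango":
        return entry.get("pango_name") or entry["nextstrain_name"]
    return entry["nextstrain_name"]
```

With something like this, the toggle only needs to change `scheme` in one place and the menus/plots re-read their labels.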
Allow better defining mutations
- I currently manually curate defining mutations for new variants I add; there are no other resources that list these that I know of. These files are found here (only on GitHub)
- I'll continue to manually curate for variants I track, as this ensures the 'big ones' are absolutely 100% correct, but this information would be cool to a) display better & b) automatically generate for all variants (again, not available anywhere that I know of, but incredibly useful)
- Ideally we'd have auto-generated ones for all variants & this would be "overwritten" by a manual file if available (the manual file would display instead if detected)
- Some starting work on this was here, with a very ugly preview of the idea here -- ideas to make this prettier welcome!
- Cornelius would help us with writing the script to generate the files (he would probably generate them somewhere as part of his other workflows and we would pull them in)
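The "manual overrides auto-generated" rule could look roughly like this. The directory layout, the one-JSON-file-per-variant naming, and the function name are all assumptions for illustration:

```python
import json
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def load_defining_mutations(
    variant: str, manual_dir: PathLike, auto_dir: PathLike
) -> Optional[dict]:
    """Load defining mutations, preferring a manually curated file.

    Looks for '<variant>.json' (hypothetical naming) in manual_dir first,
    then in auto_dir. Returns None if neither exists.
    """
    for directory, source in ((manual_dir, "manual"), (auto_dir, "auto")):
        path = Path(directory) / f"{variant}.json"
        if path.is_file():
            data = json.loads(path.read_text(encoding="utf-8"))
            # Record where the data came from so the page can say so.
            return {"source": source, "mutations": data}
    return None
```

Keeping the `source` field would let the page indicate whether a list is curated or auto-generated.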
Integrate better/expand to other pathogens?
- There's a 'frequencies' app by Neher lab that does flu (and in the works, some other stuff) (github) -- we'd need a longer convo to talk about the differences and complexities here but in theory it might be nice to align with them so that CoV plots could eventually be shown on this page, and in theory maybe we could both expand to other viruses

Backend stuff:

Potentially 'freeze' older data
- CoV was tracking variants long before 'Variants of Concern' existed, so I track some individual mutations (a bit of a crazy idea nowadays) and some variants that were never 'official'. This means a lot of overhead:
  - For current variants I simply follow Nextstrain & then can benefit from simply using their classification (already done in the files I receive) to partition into variants
  - However, for older ones they don't have the correct (for me) Nextstrain classification so I need to identify them by checking lists of mutations - this is very inefficient and takes a long time
  - I don't want to change how I currently count or plot the past as a) I do think these pre-variants are potentially genuinely interesting b) many people will have already built stuff expecting this to be stable
  - But it's very unlikely at this point that people are going to be uploading enough new sequences from 2020/2021 that this majorly changes the graphs. Thus, we could 'freeze' early data and not re-calculate older years every time we re-run.
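A sketch of what 'freezing' could mean in practice, assuming counts are kept per date and that a stored copy of the early results exists. The cutoff date, the data shapes, and `update_counts` are hypothetical:

```python
from typing import Callable, Dict, Iterable

Counts = Dict[str, int]  # variant name -> sequence count

FREEZE_BEFORE = "2022-01-01"  # hypothetical cutoff; ISO dates compare as strings


def update_counts(
    frozen: Dict[str, Counts],
    dates: Iterable[str],
    recompute: Callable[[str], Counts],
    freeze_before: str = FREEZE_BEFORE,
) -> Dict[str, Counts]:
    """Reuse stored counts for frozen dates; recompute everything else.

    `recompute` stands in for the expensive mutation-list matching and is
    only called for dates on/after the cutoff or missing from `frozen`.
    """
    result: Dict[str, Counts] = {}
    for date in dates:
        if date < freeze_before and date in frozen:
            result[date] = frozen[date]
        else:
            result[date] = recompute(date)
    return result
```

The plotted output for the early years stays exactly as it is now; only the re-calculation is skipped.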
Improve backend efficiency generally
- With or without the above, running faster would be nice
- Reduce redundancy -- currently mut lists are often in 2-3 places, would be nice to have them in one place and then just get them for where they're needed
- Currently we rely on the display_name from clusters.py far too broadly and for too many things, and this makes things very inflexible and brittle... this should be modified (more convo/digging needed!)
- Adding a new variant could probably be streamlined and made easier
- Set up automatic updates that go to staging?
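One possible direction for the redundancy and display_name points, sketched under the assumption that variants could be keyed by a stable identifier with the display name kept as a presentation-only field. The `Variant` class, `REGISTRY`, and the placeholder values are illustrative, not the current clusters.py layout:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Variant:
    key: str                         # stable identifier for lookups and file names
    display_name: str                # presentation only; can change freely
    mutations: Tuple[str, ...] = ()  # the single copy of the mutation list


# Placeholder entry; real values would come from the existing definitions.
REGISTRY: Dict[str, Variant] = {
    v.key: v
    for v in (
        Variant(
            key="23I",
            display_name="23I (example label)",
            mutations=("placeholder-mutation",),
        ),
    )
}


def mutations_for(key: str) -> List[str]:
    """Fetch the mutation list from the single registry wherever it is needed."""
    return list(REGISTRY[key].mutations)
```

Renaming a variant for display would then not touch lookups, file names, or mutation lists.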


AdvancedCodingMonkey · 2024-12-02T13:27:48Z

Split into separate issues to tackle

emmahodcroft added the enhancement, help wanted, good first issue, and needs triage labels on Nov 19, 2024

AdvancedCodingMonkey removed the help wanted, good first issue, and needs triage labels on Nov 28, 2024

AdvancedCodingMonkey closed this as completed Dec 2, 2024
