Frequencies #497
This is so cool. Thanks for getting this up. I'm totally sold this is a good direction. I want to dig in more, but I had a couple immediate thoughts:
This shows the 2008-2009 stream as persisting to 2018 at 5%. This could be an issue in the underlying frequency calculations of course, but still should be figured out.
@trvrb looking at something like http://localhost:4000/flu/h3n2/ha/12y?c=num_date&dmax=2014-02-23&m=num_date (i.e. max date feb 2014), there's hundreds of tips that have
Streams rise up in frequency from the bottom in 6acacd3
closes #357
The stacking order is now determined by the rise over time, which looks great for genotypes, but a bit weird for date (probably due to the above issues). I'm going to stop development on this for a while until the validity of the tip frequencies is sorted out. Latest version is up on https://auspice-dev.herokuapp.com/flu
Thanks for the adjustments @jameshadfield. Regarding stacking order, it seems most natural to me to keep it in the same order as the color legend. For continuous colorBys like epitope it makes a lot more sense to have it cleanly ordered by value. Right now epitope looks funny to me: I see why you did this for genotypes as currently legend ordering doesn't correspond to appearance, like so: I would suggest keeping frequencies tied exactly to legend ordering but fix legend ordering for genotypes to reflect when particular genotypes were first observed.
@trvrb order now the same as tree legend. Here's the screenshot now.
This property was introduced with the original frequencies work¹ as an anticipated need², but it was never used. Omit it for now to avoid carrying around unnecessary baggage; it can be added back in the future easily if its time comes. I uncovered this while authoring a JSON Schema for the tip-frequencies format.³ ¹ In PR #497 as a7bda1e. ² nextstrain/augur#83 (comment) ³ nextstrain/augur#852
This PR implements some sort of frequency stream graph (the devil is in the details). cc @huddlej @trvrb @rneher
Availability:
git checkout frequencies
bash scripts/get_data.sh
python scripts/convert_augur_frequency_json.py # creates _pivots.json from _frequencies.json
npm run start:local
How are frequencies calculated
The `global_clade:X` entries in the frequencies JSON are extracted into a separate smaller JSON (called `_pivots.json`). For each tip in the tree (not internal branches) the frequencies (i.e. the array with n entries, where n is the number of pivots) are binned by the `colorBy` value (or the bounds if the scale is continuous). Non-visible tips (e.g. via filtering) are not included. For each of these bins, at each pivot point, the frequencies are summed and potentially normalized (see below). This allows the data to reflect the selected `colorBy` as well as any enabled filters.
The frequencies panel
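The binning described under "How are frequencies calculated" can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function name `bin_tip_frequencies`, the tip dictionary keys (`frequencies`, `color_value`, `visible`) and the normalization rule (rescaling so the visible bins sum to 1 at each pivot) are all assumptions made for the example.

```python
from collections import defaultdict


def bin_tip_frequencies(tips, n_pivots, normalize=True):
    """Sum per-tip frequency arrays into one array per colorBy bin.

    Hypothetical data layout (an assumption, not the auspice format):
    each tip is a dict with
      "frequencies": list of n_pivots floats,
      "color_value": the bin the tip falls into for the current colorBy,
      "visible":     False if the tip is filtered out.
    """
    sums = defaultdict(lambda: [0.0] * n_pivots)

    for tip in tips:
        if not tip["visible"]:
            continue  # non-visible tips are not included
        row = sums[tip["color_value"]]
        for i, freq in enumerate(tip["frequencies"]):
            row[i] += freq

    if normalize:
        # Assumed normalization: at each pivot, rescale so bins sum to 1.
        for i in range(n_pivots):
            total = sum(row[i] for row in sums.values())
            if total > 0:
                for row in sums.values():
                    row[i] /= total

    return dict(sums)


example_tips = [
    {"frequencies": [0.2, 0.1], "color_value": "asia", "visible": True},
    {"frequencies": [0.2, 0.3], "color_value": "asia", "visible": True},
    {"frequencies": [0.4, 0.4], "color_value": "europe", "visible": True},
    {"frequencies": [0.2, 0.2], "color_value": "europe", "visible": False},
]
raw = bin_tip_frequencies(example_tips, n_pivots=2, normalize=False)
scaled = bin_tip_frequencies(example_tips, n_pivots=2, normalize=True)
```

For a continuous scale, `color_value` would be the legend interval a tip falls into rather than a category. Because hidden tips are skipped before summing, changing the filters changes both the raw sums and the normalized streams.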
Known bugs / things to do
- genotype doesn't work
- < 0.0002
- `colorBy` of clade / named clades (or some ability to select this) would result in these being different streams in the graph.
- sometimes the stacking order is wrong
Screenshots
12y coloured by date, all tips selected

12y coloured by antigenic advance, all tips selected

same data but restricted to asia

12y coloured by region, all tips selected
