Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clade Specification by AA and Nuc #206

Merged
merged 6 commits into from
Aug 27, 2018
Merged

Clade Specification by AA and Nuc #206

merged 6 commits into from
Aug 27, 2018

Conversation

emmahodcroft
Copy link
Member

This PR should allow specification of clades by nucleotide and/or AA, as was previously done (hopefully) in old augur.

I based this from what I found in the flu builds, but it would be good for someone to check this does all it needs to - particularly for flu! I preserved the method of specifying the mutations by 'gene position alt' to make moving these definitions easier.

One change is that translate now writes out mutations on the root node. For Fasta-input this is empty, but for VCF input it will contain mutations from the reference. This was mostly to make utils read_node_data work - otherwise the tree and aa_muts JSON don't have the same number of nodes. But it has a nice side-effect of being able to see AA (and in particular DRM) mutations that apply to the whole dataset with VCF input - matching what's often shown in TB papers, for example.

All the rest slots nicely into existing auspice functionality (clade_annotation sets node labels, clade_membership can be a colorBy), and if 633 is merged in auspice, the clades specified here can allow zooming on the tree by clade via URL.

I've changed the TB build so that it now uses this rather than the 'trait' 'cluster', and I've added it to the TB test as well - though it doesn't work perfectly as a node cannot be a clade, so it looks a bit off for a couple of nodes.

Clades in TB (rather than cluster, as previously):
image
Clade labels persist even when you change the colorBy.

Copy link
Contributor

@huddlej huddlej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is excellent and very helpful for the migration of flu to modular augur. I haven't run the code yet, but I've included some (sort of pedantic) comments below to improve code readability.

augur/clades.py Outdated
'''
Determines whether a node contains all mutations that define a clade
'''
isClade = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, isClade should be snakecased like is_clade.

augur/clades.py Outdated
isClade = False
for gene, muts in clade_mutations.items():
if (gene in node_muts and node_muts[gene] != []):
prs_muts = [(int(tu[1:-1]), tu[-1]) for tu in node_muts[gene]] #get mutations in right format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is hard to tell what is happening here based on the variable names and operations alone. Can you rename these variables (especially prs_muts) to be more descriptive and include an inline comment above this line with an example of what tu looks like?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comparison with empty list is not ideal -- I am always worried it might be comparing pointers or similar. The != [] isn't actually needed, empty lists evaluate to false.

@@ -170,6 +170,17 @@ def construct_mut(start, pos, end):
def assign_aa_vcf(tree, translations):
aa_muts = {}

#get mutations on the root
root = tree.get_nonterminals()[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use tree.root to get the root node.

augur/clades.py Outdated

def read_in_clade_definitions(clade_file):
'''
Reads in tab-seperated file that defines clades by amino-acid.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring made me think clades couldn't be defined by nucleotides, but seeing the real TB clade definitions clarified this. Could you update the docstring comment and example to mention nucleotides, too?

Copy link
Member

@rneher rneher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me, pending minor improvements.

augur/clades.py Outdated
isClade = False
for gene, muts in clade_mutations.items():
if (gene in node_muts and node_muts[gene] != []):
prs_muts = [(int(tu[1:-1]), tu[-1]) for tu in node_muts[gene]] #get mutations in right format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comparison with empty list is not ideal -- I am always worried it might be comparing pointers or similar. The != [] isn't actually needed, empty lists evaluate to false.

augur/clades.py Outdated
node_data = read_node_data(args.mutations, args.tree)
if node_data is None:
print("ERROR: could not read node data (incl sequences)")
return -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

standard error code would be +1
http://tldp.org/LDP/abs/html/exitcodes.html

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this from translate, so I'll change there as well

@emmahodcroft
Copy link
Member Author

Thank you both - happy to make these changes.

@emmahodcroft
Copy link
Member Author

If all is ok, can we merge this? It has changes to the TB run, so would be easier to merge before I pull TB out to a separate repo.

@rneher rneher merged commit 01eee25 into master Aug 27, 2018
@rneher rneher deleted the clade branch August 27, 2018 07:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants