Gubbins output: internal_5 / internal_6 node? #246

Closed
pneumowidow opened this issue Mar 25, 2019 · 6 comments

Comments

@pneumowidow

Hello,
I'm trying to use Gubbins to identify recombination regions between different strains of Streptococcus pneumoniae that belong to the same clone or single clonal complex. I performed hybrid assembly and used Snippy to generate a core alignment from the contigs. Since Gubbins requires the full genome alignment, I followed the advice of @tseemann and cleaned the full core alignment as follows:
snippy-clean_full_aln core.full.aln > clean.full.aln

Afterwards, I ran Gubbins with the following command:
run_gubbins.py --outgroup Reference -t fasttree --prefix Gubbins_SLV clean.full.aln

And got the following result:
image

Based on my Snippy results, I have 5,747 SNPs in strain CH-216 relative to the Reference, but Gubbins reports 0 total SNPs for this strain. Why don't I have any SNPs here?

I'm also confused about the internal_5 and internal_6 nodes in my output file. According to the Gubbins user manual, these are internal nodes subtended by the branch, but I don't really understand how they relate to my data. Is the internal_5 node the internal branch for strain CH-216, and internal_6 the one for strain CH-266? How can I interpret the SNPs and recombination at the internal nodes together with my data?

Apologies if these seem like pretty straightforward results, but as a non-evolutionary biologist, I really need some guidance here.

Many thanks in advance!

@pneumowidow
Author

Actually, I just figured it out. Sorry.

@ezherman

Could you explain where the 0 SNPs issue came from? I am having the same issue. Any help would be really appreciated!

@nickjcroucher
Owner

The results are provided per branch - there are zero base substitutions on the terminal branch, because they are instead reconstructed as occurring on one of the internal branches. There is a node_labelled output file which labels the internal nodes of the tree.
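To illustrate what "per branch" means here, the following is a minimal sketch in Python on a made-up four-node tree. The node names (`isolate_A`, `isolate_B`, `outgroup`, `internal_1`) and all counts are hypothetical and do not come from any Gubbins output; the point is only that a tip can have 0 substitutions on its own terminal branch while still being separated from another tip by substitutions placed on an internal branch.

```python
# Hypothetical toy tree: child -> parent. Names and counts are illustrative only.
parent = {
    "internal_1": "root",
    "outgroup": "root",
    "isolate_A": "internal_1",
    "isolate_B": "internal_1",
}

# Substitutions reconstructed on the branch leading to each node (made up).
branch_snps = {
    "internal_1": 12,
    "outgroup": 0,
    "isolate_A": 0,   # zero on the terminal branch
    "isolate_B": 3,
}


def path_to_root(node):
    """Return the nodes whose branches lie between `node` and the root."""
    path = []
    while node in parent:
        path.append(node)
        node = parent[node]
    return path


def snps_between(a, b):
    """Sum the per-branch counts along the tree path joining tips a and b."""
    pa, pb = set(path_to_root(a)), set(path_to_root(b))
    # Branches on exactly one of the two root paths make up the a-b path.
    return sum(branch_snps[n] for n in pa ^ pb)


print(snps_between("isolate_A", "outgroup"))
print(snps_between("isolate_A", "isolate_B"))
```

In this toy case `isolate_A` has nothing on its terminal branch, yet the path to `outgroup` still carries the 12 substitutions assigned to `internal_1`.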

@ezherman

Hi @nickjcroucher, thank you for responding and for explaining this. I now realise that I was interpreting the "Total SNPs" wrongly.

Just to sanity check, would a branch with "0 SNPs" be expected for an isolate that is divergent from all the other isolates, as in the image below? Snippy detected SNPs for that isolate, but my understanding is that when Gubbins infers a separate clade with a single isolate, this will always lead to "0 SNPs" as there is only a terminal branch.

image

@nickjcroucher
Owner

Good question. Zero mutations on a terminal branch normally indicates that the isolate has no private mutations, so it is closely related to at least one other isolate. You have highlighted a special case, the outgroup, which descends directly from the root. The branch to your outgroup is artificially split in two by the root. Gubbins puts all the mutations on one of these two components; otherwise they would be randomly split across the two branches, which leads to false negatives and false positives when inferring recombination. You can sum the events over both halves of the root to get the overall divergence of your outgroup from the rest of the tree.
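As a rough sketch of that summation, the Python snippet below adds up the counts for the two branches that meet at the root. The table text, the `Node` column name, the tab delimiter, and the node names are all assumptions made for illustration ("Total SNPs" is the only heading taken from this thread), so check them against your own per-branch output and node-labelled tree before reusing anything.

```python
import csv
import io

# Hypothetical excerpt in the style of a per-branch table. The layout and the
# values are assumptions for illustration, not real Gubbins output.
table = "Node\tTotal SNPs\noutgroup\t0\ninternal_1\t40\nisolate_A\t2\n"


def root_spanning_total(text, root_children, column="Total SNPs"):
    """Sum one column over the branches that descend directly from the root."""
    rows = csv.DictReader(io.StringIO(text), delimiter="\t")
    counts = {row["Node"]: int(row[column]) for row in rows}
    return sum(counts[name] for name in root_children)


# The two halves of the root-spanning branch in this toy example.
total = root_spanning_total(table, ["outgroup", "internal_1"])
print(total)
```

Which two nodes sit directly below the root has to be read off the node-labelled tree; the names passed in here are placeholders.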

@ezherman

I see, makes sense! Thanks so much for clarifying ⭐.
