plot_tree from phyloseq missing branches names #796

Closed
njbayonav opened this issue Jul 25, 2017 · 2 comments

njbayonav commented Jul 25, 2017

Hey all,

I have a dataset that is a subset from the original qiime output. What I did was:

1: Filter those OTUs that I am interested in from the OTU table using qiime:
filter_taxa_from_otu_table.py -i ITS1.biom -o Hymenochaetaceae_ITS1.biom -p f__Hymenochaetaceae

2: Then create a new fasta file that only had sequences from those IDs from the new OTU table:
filter_fasta.py -f rep_set.fna -o Hymenochaetaceae_ITS1.fna -b Hymenochaetaceae_ITS1.biom

3: Then align this fasta file and make the tree:
align_seqs.py -i Hymenochaetaceae_ITS1.fna -m muscle -o Hymenochaetaceae_ITS1/
make_phylogeny.py -i Hymenochaetaceae_ITS1/Hymenochaetaceae_ITS1_aligned.fasta -o Hymenochaetaceae_ITS1_phylo.tre

4: Then import the biom and the tree file into R using phyloseq and plot the tree:
biom="Hymenochaetaceae_ITS1.biom"
tree="Hymenochaetaceae_ITS1_phylo.tre"
biom=import_biom(biom, treefilename = tree, parseFunction=parse_taxonomy_default)
tree=plot_tree(biom, nodelabf=nodeplotblank, ladderize="left", label.tips = "sample_Sample", title = "Hymenochaetaceae_ITS1", text.size = 3, color="Rank7")

And finally, as you can see in the figure, my problem is that the tree is plotted but not all branches have names on them. I guess those might be low-abundance OTUs? But I am not totally sure what is happening.

[Figure: hymenochaetaceae_its1]

I will appreciate any info!
Natalia

njbayonav (Author) commented

I just created a table with the counts of reads of each species within each sample.
merged = merge_samples(biom, "Sample")
genfac = factor(tax_table(merged)[, "Rank7"])
gentab = apply(otu_table(merged), MARGIN = 1, function(x) {
  tapply(x, INDEX = genfac, FUN = sum, na.rm = TRUE, simplify = TRUE)
})
gentab

[Screenshot: table of read counts per species and sample, with colored highlighting]

In the table you will find:
- In purple, samples with no hits and no labels in the tree.
- In yellow, samples with low abundance but with labels in the tree (3004T1).
- In orange, samples with hits, ranging from low to high abundance, but no labels in the tree.
- In blue, samples with a high number of hits and labels in the tree.

Then, I have more questions. I don't understand:

  1. How samples without hits passed either of the two filters described above: filter_taxa_from_otu_table.py or filter_fasta.py.
  2. Why some branches are labeled even though they have low absolute abundance, while others with high abundance are not. What is the criterion?
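
For readers following the tabulation code in this comment, here is a minimal base R sketch of what the `apply` + `tapply` combination computes. The count matrix and the species labels are made-up toy values (not the thread's data); `species` stands in for the `Rank7` column, and rows are samples, as they are after `merge_samples`.

```r
# Toy sample-by-OTU count matrix (hypothetical values); rows are samples.
counts <- matrix(
  c(5, 0, 2, 0,
    1, 0, 4, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("S1", "S2"), paste0("OTU", 1:4))
)

# Hypothetical species label for each OTU (stand-in for the Rank7 column).
species <- factor(c("sp_A", "sp_A", "sp_B", "sp_C"))

# For each sample (row), sum the OTU counts within each species.
# The result has one row per species and one column per sample.
species_tab <- apply(counts, MARGIN = 1, function(x) {
  tapply(x, INDEX = species, FUN = sum, na.rm = TRUE)
})

# Species whose total across all samples is zero (no reads anywhere).
no_hits <- names(which(rowSums(species_tab) == 0))
print(species_tab)
print(no_hits)
```

Rows that sum to zero in such a table are species with no reads in any sample, which is one way to spot entries like the purple ones described above.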

njbayonav (Author) commented

I noticed that the sequences corresponded to reference sequences that were not in my dataset, so I filtered those out.
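
The thread does not show how the filtering was done. One possible way to do it inside phyloseq is sketched below, under the assumption that the unwanted tips are OTUs with zero total reads. The toy matrix, the random tree, and the object names (`ps`, `ps_obs`, `zero_taxa`) are hypothetical; only `taxa_sums`, `taxa_names`, `prune_taxa`, and `ntaxa` are phyloseq functions, and `rtree` comes from ape.

```r
library(phyloseq)
library(ape)

# Toy data (hypothetical): 5 OTUs x 3 samples. OTU4 and OTU5 have no reads,
# standing in for reference sequences that are absent from the dataset.
counts <- matrix(
  c(10, 0, 3,
     0, 7, 0,
     2, 2, 2,
     0, 0, 0,
     0, 0, 0),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:5), paste0("S", 1:3))
)
set.seed(1)
tree <- rtree(5, tip.label = rownames(counts))  # random tree over the same OTU IDs
ps <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), phy_tree(tree))

# OTUs (tree tips) with zero reads across all samples.
zero_taxa <- taxa_names(ps)[taxa_sums(ps) == 0]

# Keep only OTUs observed at least once; the tree is pruned with the table.
ps_obs <- prune_taxa(taxa_sums(ps) > 0, ps)

print(zero_taxa)
print(ntaxa(ps_obs))
```

On real data, the same two lines (`taxa_sums(...) == 0` and `prune_taxa(...)`) could be applied to the imported object before calling `plot_tree`, so that only tips with reads are drawn.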
