[BUG] The number of splits in a bin containing all splits does not equal the number of items from the interactive interface #2373

Open
mschecht opened this issue Nov 11, 2024 · 2 comments

@mschecht
Contributor

Short description of the problem

The number of splits in a bin containing all splits in the interactive interface does not equal the number of items in the misc-data.

anvi'o version

$ anvi-self-test --version
Anvi'o .......................................: marie (v8-dev)
Python .......................................: 3.10.13

Profile database .............................: 40
Contigs database .............................: 24
Pan database .................................: 21
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 4
tRNA-seq database ............................: 2

System info

macOS Sonoma 14.6.1

Detailed description of the issue

Hi anvi'o community! In my analyses, I use bins to group items that have NA misc-data with their surrounding splits to pass along information. Unfortunately, I began noticing a discrepancy between the number of items in the misc-data and the total number of splits in a collection of all splits; i.e., when I export a collection containing all splits from a profile-db, its number of splits does not equal the number of items in the misc-data.

Here is an example of a bin containing everything:
[screenshot]

It has 8,641 splits:
[screenshot]

You can reproduce the above with the files linked at the end of this report:

cd TEST/

anvi-interactive

and load the collection IQtree_test_all_bin (the interface is large and may take a moment to load).

However, the number of splits in this bin and the number of items in the misc-data do not match:

anvi-export-misc-data -p PROFILE.db --target-data-table items -o items.txt

$ wc -l items.txt
8683 items.txt

That is 8682 items without the header line.
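Because `wc -l` also counts the header row of the exported table, the raw line count is one higher than the number of items. A minimal sketch for counting only the data rows, assuming the first line of the exported file is a single header line; `count_items` is a hypothetical helper name, not part of anvi'o:

```shell
# Count data rows in a TSV exported by anvi-export-misc-data, skipping
# the first (header) line. awk also counts a final line that lacks a
# trailing newline, which `wc -l` would miss.
# count_items is a hypothetical helper, not an anvi'o command.
count_items() {
    awk 'END { n = NR - 1; if (n < 0) n = 0; print n }' "$1"
}

# Example usage, only if the exported file is present:
if [ -f items.txt ]; then
    count_items items.txt
fi
```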

Furthermore, the tree in the interface has the same number of leaves as there are items in the misc-data:

> library("ape")

> read.tree("Ribosomal_L14-AA_subset_remove_long_seqs_aligned_maxiters_2_trimmed_filtered_IQTREE_ultrafast_bootstrap.contree")

Phylogenetic tree with 8682 tips and 8678 internal nodes.

Tip labels:
  TARA_SAMEA4397472_METAG_Ribosomal_L14_000000000033, TARA_SAMEA4397472_METAG_Ribosomal_L14_000000000095, TARA_SAMEA2623059_METAG_Ribosomal_L14_000000000001, TARA_SAMEA4397930_METAG_Ribosomal_L14_000000000016, TARA_SAMEA2620970_METAG_Ribosomal_L14_000000000095, BGEO_SAMN07136678_METAG_Ribosomal_L14_000000000018, ...
Node labels:
  , 100, 89, 56, 14, 30, ...

Unrooted; includes branch lengths.

8682 tree tips
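If R is not at hand, the tip count of a single Newick tree can also be checked from the shell. In a tree, each internal node with k children is written with k-1 commas, so the total number of commas is the number of leaves minus one. This is a sketch assuming the file holds exactly one tree and that no label contains a comma (true for the names shown above); `count_newick_tips` is a hypothetical helper:

```shell
# Count the tips (leaves) of a single Newick tree.
# Each internal node with k children contributes k-1 commas; summed over
# all internal nodes this gives (number of leaves - 1) commas in total.
# Assumes one tree per file and no commas inside labels.
# count_newick_tips is a hypothetical helper name.
count_newick_tips() {
    _commas=$(tr -cd ',' < "$1" | wc -c | tr -d ' ')
    echo $((_commas + 1))
}

tree=Ribosomal_L14-AA_subset_remove_long_seqs_aligned_maxiters_2_trimmed_filtered_IQTREE_ultrafast_bootstrap.contree
if [ -f "$tree" ]; then
    count_newick_tips "$tree"
fi
```

Node support values and branch lengths never contain commas, so they do not affect the count.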

I started chatting with @metehaansever about this issue last week, but here is the formal documentation of the bug. Thanks in advance for the help and support.

Files / commands to reproduce the issue

https://uchicago.box.com/s/ggg4xso3qxrvdjphsyx006ay1uuzvjcd

@mschecht mschecht added the bug label Nov 11, 2024
@FlorianTrigodet
Contributor

Not a bug in the interface, but there is something wrong with the tree called Rooted_final_A: it contains fewer items than are present in the contigs.db.

# export the bad tree and a good one
$ anvi-export-items-order -p PROFILE.db -o Rooted_final_A.txt --name Rooted_final_A
$ anvi-export-items-order -p PROFILE.db -o IQTree.txt --name IQTree

# count the number of items (every item name contains 'split')
$ grep -o "split" Rooted_final_A.txt | wc -l
8641
$ grep -o "split" IQTree.txt | wc -l
8682
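To see which items went missing rather than just how many, the two exported orders can be compared as sets. A minimal sketch, assuming every item name consists only of letters, digits, and underscores and ends in `_split_` followed by digits (as in the names shown in this thread); `extract_items` is a hypothetical helper:

```shell
# Print the item (split) names found in an exported items-order file,
# one per line, sorted and de-duplicated.
# Assumes names match <letters/digits/underscores>_split_<digits>.
# extract_items is a hypothetical helper name.
extract_items() {
    grep -oE '[A-Za-z0-9_]+_split_[0-9]+' "$1" | LC_ALL=C sort -u
}

if [ -f Rooted_final_A.txt ] && [ -f IQTree.txt ]; then
    extract_items Rooted_final_A.txt > rooted_items.txt
    extract_items IQTree.txt > iqtree_items.txt
    # Keep only lines unique to the second file:
    # items in IQTree but not in Rooted_final_A.
    LC_ALL=C comm -13 rooted_items.txt iqtree_items.txt > missing_items.txt
    wc -l < missing_items.txt
fi
```

`comm -13` suppresses lines unique to the first file and lines common to both, so `missing_items.txt` lists exactly the items dropped from Rooted_final_A. Both inputs are sorted under the same `LC_ALL=C` collation because `comm` requires sorted input.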

I have no idea how it is possible to have a tree with fewer leaves than there are items in a profile.db. If you try to re-import the bad tree, anvi'o complains:

$ anvi-import-items-order -p PROFILE.db -i Rooted_final_A.txt --name toto
Target database ..............................: PROFILE.db
Database type ................................: profile

Order file path ..............................: Rooted_final_A.txt
Order data type ..............................: newick
Order name ...................................: toto



Config Error: Ehem. There is something wrong with the incoming items order data here :/
              Basically, the names found in your input data do not match to the item names
              found in the database. For example, this item
              "BATS_SAMN08390924_METAG_Ribosomal_L14_000000000108_split_00001" is in your
              database, but not in your input data

@mschecht mschecht self-assigned this Nov 13, 2024
@mschecht
Contributor Author

Thanks for diving in, @FlorianTrigodet! I made a reproducible example with the files linked above. Step 3 is exactly where the number of items in the EVERYTHING bin changes from 8,682 → 8,641. It has to do with rotating the tree.

Step 1:

Change items order to IQTree
[screenshots]

correct number of leaves

Step 2:

Root here:

[screenshots]

correct number of leaves

Step 3:

Rotate here:

[screenshots]

Mysteriously, 41 leaves disappear :(
