Dear authors,
I have a question about recall_avg_seq. Assume I have 10 genomes and contigs from only 4 of them are binned. The contigs of the other 6 genomes are not present in any bin. When "recall_avg_seq" is calculated, the sum of recalls is divided by 4, not 10. I am wondering why it is designed this way. Suppose I have two binning tools. Tool 1 bins 2 genomes and the recall for each genome is 0.9, so its recall_avg_seq is 0.9. Tool 2 bins 3 genomes with recalls of 0.9, 0.9, and 0.8, so its recall_avg_seq is 0.87. Tool 2 scores lower, even though it bins more genomes and still has good completeness.
In addition, AMBER does not seem to consider the overlap between contigs. For example, take a genome of length 1000. One bin has two contigs covering positions 1-20 and 5-40. Another bin has two contigs covering positions 21-40 and 40-70. For the first bin the genome recall is (20+36)/1000, and for the second bin it is (20+31)/1000, so AMBER will consider the first bin better. But the second bin covers more distinct positions of the genome (50 versus 40), which better represents the real completeness. What do you think?
I am looking forward to your guidance.
Best,
Jiaojiao
I believe the answer to your first question can be found in my previous reply. To consider all genomes in the average recall, please use recall_avg_seq_cami1 instead of recall_avg_seq. In recall_avg_seq_cami1, the recall for unbinned genomes is treated as 0.
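To make the difference between the two averages concrete, here is a minimal sketch using the numbers from the question. It is not AMBER's code: the function `average_recall` and its parameters are hypothetical names for illustration, and the total of 10 genomes for both tools is an assumption carried over from the first part of the question.

```python
def average_recall(recalls, total_genomes=None):
    """Average per-genome recall.

    recalls: recall values of the genomes that ended up in at least one bin.
    total_genomes: if given, divide by this count instead, which treats
    every unbinned genome as having recall 0.
    """
    denominator = len(recalls) if total_genomes is None else total_genomes
    return sum(recalls) / denominator


# Numbers from the question; the total of 10 genomes is an assumption.
tool1 = [0.9, 0.9]
tool2 = [0.9, 0.9, 0.8]
total = 10

# Averaging over binned genomes only (the behaviour described for recall_avg_seq).
binned_only = (average_recall(tool1), average_recall(tool2))

# Averaging over all genomes (the behaviour described for recall_avg_seq_cami1).
all_genomes = (average_recall(tool1, total), average_recall(tool2, total))

print(binned_only)
print(all_genomes)
```

Averaged over binned genomes only, tool 1 comes out ahead (about 0.90 versus 0.87). Averaged over all 10 genomes, tool 2 comes out ahead (about 0.26 versus 0.18), which matches the intuition in the question.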
Currently, AMBER cannot use sequence positions as input, so it cannot account for overlaps. But that's a great observation, and we plan to incorporate a feature like that in the next version of AMBER.
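For reference, the overlap example from the question can be sketched as follows. This is only an illustration of the idea, not an existing or planned AMBER feature: the functions `summed_length` and `covered_positions` are hypothetical, and the sketch assumes contig positions on the genome are available as 1-based, inclusive (start, end) pairs.

```python
def summed_length(intervals):
    """Sum of contig lengths, counting overlapping positions more than once."""
    return sum(end - start + 1 for start, end in intervals)


def covered_positions(intervals):
    """Number of distinct genome positions covered by at least one contig."""
    covered = 0
    current_end = 0  # last position already counted (positions are 1-based)
    for start, end in sorted(intervals):
        if end > current_end:
            # Count only the part of this contig beyond what is already covered.
            covered += end - max(start, current_end + 1) + 1
            current_end = end
    return covered


genome_length = 1000
bin1 = [(1, 20), (5, 40)]
bin2 = [(21, 40), (40, 70)]

for name, contigs in (("bin1", bin1), ("bin2", bin2)):
    print(name,
          summed_length(contigs) / genome_length,
          covered_positions(contigs) / genome_length)
```

With summed lengths, bin 1 scores 56 positions against 51 for bin 2. Counting each position once gives 40 for bin 1 and 50 for bin 2, so the ranking flips.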