Why "recall_avg_seq" not average over the total number of genomes? #61

jiaojiaoguan · 2025-01-07T02:23:14Z

Dear authors,

I have a question about recall_avg_seq. Suppose I have 10 genomes, and contigs from only 4 of them are binned; the contigs of the other 6 genomes are not present in any bin. When "recall_avg_seq" is calculated, the sum is divided by 4, not 10. I am wondering why it was designed this way. Say I have two binning tools. Tool 1 bins 2 genomes with a recall of 0.9 each, so its recall_avg_seq is 0.9. Tool 2 bins 3 genomes with recalls of 0.9, 0.9, and 0.8, so its recall_avg_seq is 0.87. Tool 2 scores lower even though it bins more genomes with good completeness.

In addition, AMBER does not seem to consider overlap between contigs. For example, take a genome of length 1000. One bin has two contigs covering positions 1-20 and 5-40. Another bin has two contigs covering positions 21-40 and 40-70. For the first bin the genome recall is (20+36)/1000, and for the second bin it is (20+31)/1000, so AMBER will rate the first bin as better. But the second bin covers more distinct positions of the genome (50 versus 40), which reflects the real completeness. What do you think?

I am looking forward to your guidance.

Best,
Jiaojiao

fernandomeyer · 2025-01-07T09:07:42Z

Dear Jiaojiao,

I believe the answer to your first question can be found in my previous reply. To consider all genomes in the average recall, please use recall_avg_seq_cami1 instead of recall_avg_seq. In recall_avg_seq_cami1, the recall for unbinned genomes is treated as 0.
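As a minimal sketch of the difference, the two averages can be compared on the numbers from the question. This is not AMBER's code: the function names are hypothetical, and the total of 10 genomes is an assumption borrowed from the first example in the question.

```python
def recall_avg_binned_only(recalls):
    """Average recall over binned genomes only (recall_avg_seq as described above)."""
    return sum(recalls) / len(recalls)


def recall_avg_all_genomes(recalls, total_genomes):
    """Average recall over all genomes, counting unbinned genomes as 0
    (recall_avg_seq_cami1 as described above)."""
    return sum(recalls) / total_genomes


# Per-genome recalls for the genomes each tool managed to bin (from the question).
tool1 = [0.9, 0.9]
tool2 = [0.9, 0.9, 0.8]
TOTAL_GENOMES = 10  # assumption for illustration

for name, recalls in (("tool 1", tool1), ("tool 2", tool2)):
    binned_only = recall_avg_binned_only(recalls)
    all_genomes = recall_avg_all_genomes(recalls, TOTAL_GENOMES)
    print(f"{name}: binned only = {binned_only:.2f}, all genomes = {all_genomes:.2f}")
```

Under these assumptions, tool 1 is ahead on the binned-only average (0.90 versus 0.87), while tool 2 is ahead once unbinned genomes count as 0 (0.26 versus 0.18).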

Currently, AMBER cannot use sequence positions as input, so it cannot account for overlaps. But that's a great observation, and we plan to incorporate a feature like that in the next version of AMBER.
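The overlap example from the question can be worked through with a short sketch. It assumes 1-based, inclusive contig coordinates on the genome, which AMBER does not currently take as input; the helper names are hypothetical, and this is not the planned AMBER feature, only an illustration of summed contig length versus distinct positions covered.

```python
def summed_length(intervals):
    """Sum of contig lengths, ignoring overlaps (the length-based view in the question)."""
    return sum(end - start + 1 for start, end in intervals)


def covered_positions(intervals):
    """Number of distinct genome positions covered by the union of the contigs.

    Assumes 1-based, inclusive (start, end) coordinates.
    """
    covered = 0
    current_end = 0  # last position already counted
    for start, end in sorted(intervals):
        if end <= current_end:
            continue  # contig lies entirely inside the region already counted
        covered += end - max(start, current_end + 1) + 1
        current_end = end
    return covered


GENOME_LENGTH = 1000
bin1 = [(1, 20), (5, 40)]
bin2 = [(21, 40), (40, 70)]

for name, contigs in (("bin 1", bin1), ("bin 2", bin2)):
    print(name,
          "summed:", summed_length(contigs) / GENOME_LENGTH,
          "covered:", covered_positions(contigs) / GENOME_LENGTH)
```

With summed lengths, bin 1 (56 bases) ranks above bin 2 (51 bases); counting distinct positions reverses the order, with 40 positions for bin 1 and 50 for bin 2.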
