What is your suggestion to extract the representative genome for a meta_ and ext_ mOTUs?
E.g., if we download meta_mOTU_v3_12240, it downloads 4361 genomes (even if we only choose one of the genomes it downloads all of them, but I saw that this is already on your todo list), and these genomes range from ~800KB to ~4.8MB.
Our plan is to annotate the genomes we found in our metagenomics samples and use the list of genes for further analysis. We have a list of about 2000 mOTUs (1/3 are ref, 2/3 are meta and ext), and ideally we would like to end up with the same number of genomes to annotate (with Prokka).
Should we use the genome that is the closest to the median / mean of the genome sizes in the mOTU?
Thanks in advance for your help
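For reference, the median-size heuristic from the question could be sketched roughly like this. It assumes the genomes of one mOTU have already been downloaded as uncompressed FASTA files. `genome_size` and `closest_to_median` are hypothetical helper names, not part of the mOTUs tooling.

```python
import statistics


def genome_size(fasta_path: str) -> int:
    """Total number of bases in an (assumed uncompressed) FASTA file."""
    total = 0
    with open(fasta_path) as fh:
        for line in fh:
            # Header lines start with '>'; every other line is sequence.
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def closest_to_median(sizes: dict[str, int]) -> str:
    """Return the genome ID whose size is closest to the cluster's median size."""
    median = statistics.median(sizes.values())
    # Ties are broken by genome ID so the choice is deterministic.
    return min(sizes, key=lambda g: (abs(sizes[g] - median), g))


if __name__ == "__main__":
    # Made-up sizes, for illustration only.
    sizes = {"genome_A": 800_000, "genome_B": 2_900_000,
             "genome_C": 3_100_000, "genome_D": 4_800_000}
    print(closest_to_median(sizes))
```

Genome size alone says nothing about quality or how typical a genome is of the cluster, so this choice is only as good as that proxy.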
Then you could either choose the genome with the best quality parameters (highest completeness and lowest contamination), or choose the genome in a centroid position, i.e. the genome with the lowest total distance to all other genomes in the cluster. You could calculate the pairwise distances with fastANI or MASH.
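Both options could be sketched roughly as follows. This assumes all-vs-all `mash dist` output (tab-separated: reference ID, query ID, distance, p-value, shared hashes) and CheckM-style completeness/contamination values. The helper names, the 1.0 fallback for missing pairs, and the `completeness - 5 * contamination` score (a common heuristic, similar to dRep's default weighting) are all assumptions, not something the mOTUs authors prescribe.

```python
import csv
from collections import defaultdict


def load_mash_distances(path: str) -> dict[str, dict[str, float]]:
    """Parse `mash dist` tab-separated output into a symmetric distance map."""
    dist: dict[str, dict[str, float]] = defaultdict(dict)
    with open(path, newline="") as fh:
        for ref, query, d, _pvalue, _shared in csv.reader(fh, delimiter="\t"):
            dist[ref][query] = float(d)
            dist[query][ref] = float(d)
    return dist


def medoid(dist: dict[str, dict[str, float]]) -> str:
    """Genome with the smallest summed distance to all other genomes."""
    genomes = sorted(dist)

    def total(g: str) -> float:
        # Missing pairs are treated as maximally distant (1.0), an assumption.
        return sum(dist[g].get(h, 1.0) for h in genomes if h != g)

    return min(genomes, key=total)


def best_quality(checkm: dict[str, tuple[float, float]],
                 contamination_weight: float = 5.0) -> str:
    """Pick the genome maximising completeness - weight * contamination."""
    return max(sorted(checkm),
               key=lambda g: checkm[g][0] - contamination_weight * checkm[g][1])


if __name__ == "__main__":
    # Made-up values, for illustration only.
    dist = {"A": {"B": 0.02, "C": 0.03},
            "B": {"A": 0.02, "C": 0.01},
            "C": {"A": 0.03, "B": 0.01}}
    print(medoid(dist))
    checkm = {"A": (98.0, 1.0), "B": (95.0, 0.2), "C": (99.5, 3.0)}
    print(best_quality(checkm))
```

For clusters with thousands of genomes (e.g. the 4361 in meta_mOTU_v3_12240), MASH is much cheaper than all-vs-all fastANI. fastANI reports similarity (ANI), so its values would need converting to a distance (e.g. `100 - ANI`) before using `medoid`.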
Thanks, I was hoping for a solution that doesn't require as much re-processing. One would think that something like this was already done when these clusters / mOTUs were originally formed. It would be nice to have access to that data.