Extract representative genome from a motu #4

Open
lkalmar opened this issue Mar 14, 2023 · 2 comments
Labels
question Further information is requested

Comments

lkalmar commented Mar 14, 2023

Hi,

How would you suggest extracting a representative genome for meta_ and ext_ mOTUs?

E.g., if we download meta_mOTU_v3_12240, we get 4361 genomes (even if we select only one of the genomes, all of them are downloaded, but I saw this is already on your to-do list), and these genomes range from ~800 KB to ~4.8 MB.

Our plan is to annotate the genomes we found in our metagenomic samples and use the resulting gene lists for further analysis. We have a list of about 2000 mOTUs (1/3 are ref, 2/3 are meta and ext); ideally we would like to end up with the same number of genomes to annotate (with Prokka).

Should we use the genome that is the closest to the median / mean of the genome sizes in the mOTU?
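To make the size-based idea concrete, here is a minimal sketch of picking the genome closest to the median size within one mOTU. The genome names and sizes are made up for illustration, and `closest_to_median_size` is a hypothetical helper, not part of any mOTUs tool. Note that size alone says nothing about completeness or contamination.

```python
from statistics import median
from typing import Dict


def closest_to_median_size(sizes: Dict[str, int]) -> str:
    """Return the genome whose size is closest to the median size.

    `sizes` maps a genome name to its size in bases (or bytes).
    Ties are broken by genome name so the result is deterministic.
    """
    target = median(sizes.values())
    return min(sizes, key=lambda g: (abs(sizes[g] - target), g))


# Hypothetical sizes for five genomes of one mOTU.
sizes = {
    "gA": 800_000,
    "gB": 2_900_000,
    "gC": 3_200_000,
    "gD": 3_500_000,
    "gE": 4_800_000,
}

representative = closest_to_median_size(sizes)
print(representative)
```

Swapping `median` for `statistics.mean` gives the mean-based variant; the median is less sensitive to a few very small, fragmentary genomes.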

Thanks in advance for your help

@AlessioMilanese AlessioMilanese added the question Further information is requested label Mar 19, 2023
@AlessioMilanese
Member

Hi,

I would filter genomes based on completeness and contamination (as estimated by CheckM). You can find this information here:
https://zenodo.org/record/7146984#.ZBa4qbTMIbk

Then you could either choose the genome with the best parameters (highest completeness and lowest contamination), or you could choose the genome in the centroid position, in other words the genome with the lowest total distance to all other genomes in the cluster. You could calculate the distances with fastANI or Mash.
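The two options above could be sketched roughly as follows. This is only an illustration under stated assumptions: the quality values and distances are toy data, the 90% / 5% thresholds are assumed cut-offs rather than anything prescribed by mOTUs, and the helper names (`filter_by_quality`, `best_quality`, `medoid`) are hypothetical. In practice the quality table would come from the CheckM results and the pairwise distances from Mash, or from fastANI as `1 - ANI/100`.

```python
from typing import Dict, FrozenSet, List, Tuple

Quality = Dict[str, Tuple[float, float]]  # genome -> (completeness %, contamination %)
Distances = Dict[FrozenSet[str], float]   # unordered genome pair -> distance


def filter_by_quality(quality: Quality,
                      min_completeness: float = 90.0,  # assumed threshold
                      max_contamination: float = 5.0   # assumed threshold
                      ) -> List[str]:
    """Keep genomes that pass both quality thresholds."""
    return [g for g, (comp, cont) in quality.items()
            if comp >= min_completeness and cont <= max_contamination]


def best_quality(quality: Quality, genomes: List[str]) -> str:
    """Option 1: highest completeness, then lowest contamination."""
    return max(genomes, key=lambda g: (quality[g][0], -quality[g][1]))


def medoid(genomes: List[str], distances: Distances,
           missing: float = 1.0) -> str:
    """Option 2: genome with the lowest summed distance to all others.

    Pairs absent from `distances` are assumed to be maximally distant.
    """
    def total(g: str) -> float:
        return sum(distances.get(frozenset((g, h)), missing)
                   for h in genomes if h != g)
    return min(genomes, key=total)


# Toy data for four genomes in one cluster.
quality = {"gA": (98.0, 1.0), "gB": (95.0, 0.5),
           "gC": (99.0, 4.0), "gD": (70.0, 2.0)}
distances = {frozenset(("gA", "gB")): 0.02,
             frozenset(("gA", "gC")): 0.03,
             frozenset(("gB", "gC")): 0.06}

kept = filter_by_quality(quality)          # gD fails the completeness cut-off
by_quality = best_quality(quality, kept)   # most complete genome
by_centroid = medoid(kept, distances)      # most central genome
print(kept, by_quality, by_centroid)
```

The two criteria can disagree, as they do on this toy data, so it is worth deciding up front whether quality or centrality matters more for the downstream annotation. For a cluster with thousands of genomes, the all-vs-all distance step is the expensive part; filtering by quality first keeps it smaller.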

Author

lkalmar commented Mar 20, 2023

Thanks. I was hoping for a solution that doesn't require that much reprocessing. One would think that something like this was already done when these clusters / mOTUs were originally formed. It would be nice to have access to that data.


2 participants