Profiling eukaryote contigs? #54

Open
timghaly opened this issue Feb 2, 2024 · 3 comments

Comments

@timghaly

timghaly commented Feb 2, 2024

This looks like a great tool.

I'm wondering, though, how well Metabuli would perform at classifying environmental microeukaryotes, particularly because it looks like Prodigal is used to generate the databases, which is not ideal for euk gene prediction. Would Metabuli outperform MMseqs2 Taxonomy with the nr database for assigning euk taxonomy to contigs? If so, which Metabuli database would be best suited?

Thanks!

@jaebeom-kim
Member

Thank you for reaching out!
Great question!

You are right. Prodigal was developed for prokaryotic genomes, so the ORFs it predicts for eukaryotes are not meaningful.
However, even with the wrong ORFs, exact DNA 24-mer matches can still be found because each query read is translated in all possible frames.
So I'd say Metabuli can be as good as other DNA k-mer-based tools for eukaryotes.
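
To illustrate why the predicted frame matters less than it might seem, here is a minimal Python sketch. It is not Metabuli's actual code; the function names, the example sequence, and the choice of indexed frame are all made up for illustration. It assumes uppercase A/C/G/T input and the standard genetic code. The reference is indexed in a single arbitrary frame (standing in for a "wrong" ORF), the read is translated in all six frames, and both are cut into windows of 8 codons (24 nt).

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frames(seq):
    """Yield (strand, offset, frame) with each frame trimmed to whole codons."""
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frame = s[offset:]
            yield strand, offset, frame[: len(frame) - len(frame) % 3]


def codon_kmers(frame, k=8):
    """Set of (peptide, dna) pairs for every window of k codons in a frame."""
    kmers = set()
    for i in range(0, len(frame) - 3 * k + 1, 3):
        dna = frame[i : i + 3 * k]
        pep = "".join(CODON_TABLE[dna[j : j + 3]] for j in range(0, 3 * k, 3))
        kmers.add((pep, dna))
    return kmers


# Hypothetical 60-nt reference, indexed in ONE arbitrary frame (offset 1),
# standing in for an ORF call that may not reflect any real gene.
reference = "ATGGCTAGCTTGACCGATCGTACGGATTCAGGCTTAACGGTCCATGCAAGTCGGATACCT"
trimmed = reference[1:]
trimmed = trimmed[: len(trimmed) - len(trimmed) % 3]
db_kmers = codon_kmers(trimmed)

# A read taken from the opposite strand, starting at an unrelated position.
read = reverse_complement(reference[5:50])

# Translate the read in all six frames and collect its k-mers.
read_kmers = set()
for _strand, _offset, frame in six_frames(read):
    read_kmers |= codon_kmers(frame)

shared = db_kmers & read_kmers
print(f"{len(shared)} shared 24-nt windows between read and indexed frame")
```

One of the read's six frames is always in phase with whatever frame was indexed, so exact 24-nt matches are recovered whether or not that frame corresponds to a real gene.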

If you use a protein-based search like MMseqs2, reads from intergenic regions cannot be mapped to any sequence in the database. If your contigs are long enough to contain at least one protein-coding gene, that would be fine.
For eukaryotes, Metabuli's advantage over MMseqs2 will be its ability to use non-coding / intergenic regions.
However, we don't have a pre-built eukaryote database yet.
We are planning to provide an index built from NCBI's nt database.
I hope this helps.

Thank you again:)

@timghaly
Author

timghaly commented Feb 5, 2024

Okay, great. Thanks for your answer. I will give it a go after you release the indexed nt database.

Thanks for your help!

@JonathonMifsud

+1 on an indexed nt database, this would be very useful!
