How to speed up MMseqs2 taxonomy assignment with GTDB database on large contigs? #424

Open
lucianhu opened this issue Oct 18, 2024 · 1 comment
Labels
question Further information is requested

Comments


lucianhu commented Oct 18, 2024

Hi nf-core/funcscan,

I am running the nf-core/funcscan pipeline to assign taxonomy to contigs using MMseqs2 with the GTDB database. My contigs are around 100-200 MB in size, and I am running the pipeline on a machine with the following specs:

  • 36 cores
  • 256 GB RAM

Despite using all available resources, MMseqs2 runs for more than 4 hours per sample and still does not finish. Is this runtime normal, or are there ways to make it faster?

Questions:

  1. What are the common bottlenecks when running MMseqs2 with the GTDB database, and how can I address them?
  2. What is the expected runtime for MMseqs2 on contigs of this size?
  3. Are there specific MMseqs2 settings (e.g., sensitivity, database partitioning) that could help speed up the analysis without compromising too much accuracy?

Any advice or insights from your experience with MMseqs2 and GTDB would be appreciated!

Thanks

lucianhu added the bug (Something isn't working) label Oct 18, 2024
Member

jfy133 commented Oct 18, 2024

jfy133 added the question (Further information is requested) label and removed the bug (Something isn't working) label Oct 18, 2024