Anvi-Profile on .bam files taking long time #1343

Closed
ReneKat opened this issue Feb 3, 2020 · 7 comments

Comments

@ReneKat

ReneKat commented Feb 3, 2020

Anvi'o version ...............................: esther (v6.1)
Profile DB version ...........................: 31
Contigs DB version ...........................: 14
Pan DB version ...............................: 13
Genome data storage version ..................: 6
Auxiliary data storage version ...............: 2
Structure DB version .........................: 1

MacOSX 10.14.6 (Mojave)
MacbookPro 16GB

Hello anvio community,
I am analyzing metagenomic samples on my MacBookPro which only has 16GB of RAM.
When I run the anvi-profile command on one of my .bam files, the ETA is normally below 5 minutes.

[03 Feb 20 12:31:27 Profiling w/7 threads] 617 of 51410 contigs ⚙ / MEM ☠️ 4.15 GB ETA: 1m11s

However, after 24 hours of runtime, it had only processed ~4,000 of the ~51,000 contigs in that particular .bam file.
I have added the following parameters to no avail:

--skip-SNV-profiling
--write-buffer-size 300
-T 7

None of these recommended modifications lowered the processing time.
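
A minimal sketch of what a full invocation with these flags might look like, written as a Python argument list so it can be printed or passed to `subprocess.run`. The file names are placeholders and not the files from this issue; `-i`, `-c`, and `-o` are the usual anvi-profile options for the input BAM file, the contigs database, and the output directory.

```python
import shlex

# Placeholder paths (assumptions), not the files from this issue.
bam_file = "sample_01.bam"
contigs_db = "contigs.db"
output_dir = "sample_01_profile"

command = [
    "anvi-profile",
    "-i", bam_file,                 # input BAM file
    "-c", contigs_db,               # contigs database
    "-o", output_dir,               # output directory for the profile
    "--skip-SNV-profiling",         # flags quoted in the post above
    "--write-buffer-size", "300",
    "-T", "7",
]

# Join into a single shell-safe string for display or logging.
command_line = shlex.join(command)
print(command_line)
```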

Files located here: https://drive.google.com/drive/folders/15WgP20wYTZT5hAkEcqyDK4rN2U3zjBPF?usp=sharing

Thank you for any tips!
René

@ReneKat
Author

ReneKat commented Feb 3, 2020

I installed anvio using conda and these instructions:
http://merenlab.org/2016/06/26/installation-v2/

@meren
Member

meren commented Feb 3, 2020

Dear René,

There are multiple recent improvements by @ekiefl (such as #1339) that address profiling bottlenecks in master with noticeable gains in speed.

If you switch to the master branch of the repository, you can test those changes.

Best,

@ekiefl
Contributor

ekiefl commented Feb 3, 2020

Hi ReneKat,

Please ignore that ETA. It has already been fixed in the master branch.

Yes, in the latest version of anvi'o, anvi-profile is currently much slower than it has to be. If you use the master branch, you will notice big speed increases with --skip-SNV-profiling. I am currently working on speeding up every aspect of anvi-profile, and so far it is going very well. Once this is finished, I think we are planning a release. I won't provide an ETA, but it will certainly be within 2 months. Bad timing for your analysis.

@ReneKat
Author

ReneKat commented Feb 5, 2020

Hello @meren and @ekiefl,

Thank you very much for your timely replies. I have followed the "Following the active codebase (you’re a wizard, arry)" instructions at
http://merenlab.org/2016/06/26/installation-v2/#following-the-active-codebase-youre-a-wizard-arry

I am rerunning the anvi-profile command on my smallest 21MB .bam file.
The ETA is updated to be more realistic: ~10 hours.
As you can imagine, I have 40 .bam files of up to 52MB each, so is this going to take 400 hours? :/
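
The 400-hour figure can be reproduced with a short calculation. This is only a rough sketch: the lower bound assumes no file is faster than the smallest one, and the upper bound assumes runtime grows linearly with .bam size, which nothing in this thread confirms. The function names are made up for illustration.

```python
def lower_bound_hours(n_files, hours_smallest):
    # ASSUMPTION: no file finishes faster than the smallest one.
    return n_files * hours_smallest


def upper_bound_hours(n_files, hours_smallest, smallest_mb, largest_mb):
    # ASSUMPTION: runtime scales linearly with BAM size, and every
    # file is treated as if it were as large as the largest one.
    return n_files * hours_smallest * largest_mb / smallest_mb


low = lower_bound_hours(40, 10)            # 40 files, ~10 h for the 21MB file
high = upper_bound_hours(40, 10, 21, 52)   # largest file is 52MB
print(f"{low} to {high:.0f} hours")
```

Under those assumptions the estimate lands between roughly 400 and 990 hours.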

In the meantime, I am trying to get anvi'o installed on our AWS server to have more computing power than my MacBook Pro. However, with my limited permissions, I will only be able to install the official released version, not the git master.

Thanks again for your time and efforts!

Best,
René

@ReneKat
Author

ReneKat commented Feb 5, 2020

Okay, I forgot one very important argument when re-running my .bam file on the master branch:
The --skip-SNV-profiling argument. Once I included that, anvi-profile finished its job in less than a minute!! Wow!
Thanks so much!
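
With 40 .bam files to process, the same flag can be applied to every file in a loop. The sketch below is one possible way to do that, not an official anvi'o workflow: the directory layout, the `_profile` output naming, and the helper names `profile_commands` and `run_all` are assumptions made for illustration. It defaults to a dry run that only prints the commands.

```python
from pathlib import Path
import subprocess


def profile_commands(bam_dir, contigs_db, threads=7):
    """Build one anvi-profile command per .bam file found in bam_dir."""
    commands = []
    for bam in sorted(Path(bam_dir).glob("*.bam")):
        # Hypothetical naming scheme: reads/s1.bam -> reads/s1_profile
        out_dir = bam.parent / f"{bam.stem}_profile"
        commands.append([
            "anvi-profile",
            "-i", str(bam),
            "-c", str(contigs_db),
            "-o", str(out_dir),
            "--skip-SNV-profiling",
            "-T", str(threads),
        ])
    return commands


def run_all(bam_dir, contigs_db, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in profile_commands(bam_dir, contigs_db):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop at the first failing sample.
            subprocess.run(cmd, check=True)


# Dry run over the current directory; nothing is executed.
run_all(".", "contigs.db", dry_run=True)
```

Profiles produced with --skip-SNV-profiling contain no single-nucleotide variant data, so this shortcut only suits analyses that do not need it.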

@ReneKat ReneKat closed this as completed Feb 5, 2020
@ekiefl
Contributor

ekiefl commented Feb 5, 2020

Glad it was that fast. I'm actually going to reopen this issue so that it's on our radar. Once profiling is improved substantially I will comment here with the pull request.

@ekiefl ekiefl reopened this Feb 5, 2020
@ekiefl
Contributor

ekiefl commented Mar 6, 2020

The changes are now in master: #1362

It's way faster.

@ekiefl ekiefl closed this as completed Mar 6, 2020