Feat: Reduction of memory consumption #628

Open · diego-rt opened this issue Feb 25, 2024 · 3 comments

diego-rt commented Feb 25, 2024

Hello,

I'm working with a single-cell ATAC-seq dataset from an organism with a very large genome (32 Gbp). As a result, the number of reads is quite high (~300M) and I have a lot of candidate peaks:

INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #  The Hidden Markov Model for signals of binsize of 10 basepairs: 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #   open state index: state0 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #   nucleosomal state index: state2 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #   background state index: state1 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #   Starting probabilities of states: 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #                          open         bg        nuc 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #                         0.182     0.4838     0.3342 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #   HMM Transition probabilities: 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #                          open         bg        nuc 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #             open->      0.984   0.008203   0.007821 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #               bg->   0.004914     0.9406    0.05444 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #              nuc->   0.005975    0.06062     0.9334 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #   HMM Emissions (mean):  
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #                         short       mono         di        tri 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #             open:      0.4205      1.458      1.184     0.5775 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #               bg:    0.006671     0.8911   0.003568    0.00196 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #              nuc:      0.8497      1.735     0.2407   0.009357 
INFO  @ 25 Feb 2024 02:43:51: [35845 MB] #5 Decode with Viterbi to predict states 
INFO  @ 25 Feb 2024 02:45:29: [35845 MB] #5  Total candidate peaks : 1486723 
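
For context on why the decoding step dominates: the model above is decoded with Viterbi over every candidate region. Below is a minimal NumPy sketch of Viterbi decoding for one region; it is a generic illustration I wrote, not MACS3's actual implementation, but it shows where memory goes: the backpointer matrix alone is T × S entries per region, so 1.48M regions plus any per-region results retained in memory add up fast.

```python
import numpy as np

def viterbi(log_start, log_trans, log_emit):
    """Most likely state path for one candidate region (generic sketch).

    log_start : (S,)   log start probabilities
    log_trans : (S, S) log transition probabilities, rows = from-state
    log_emit  : (T, S) per-bin log emission likelihoods

    The backpointer matrix `psi` is T x S, so memory per region grows
    linearly with region length T.
    """
    T, S = log_emit.shape
    delta = log_start + log_emit[0]            # best log-prob per state at bin 0
    psi = np.zeros((T, S), dtype=np.int32)     # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans    # (S, S): from-state x to-state
        psi[t] = scores.argmax(axis=0)         # best predecessor per state
        delta = scores.max(axis=0) + log_emit[t]
    path = np.empty(T, dtype=np.int32)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):             # backtrace
        path[t] = psi[t + 1, path[t + 1]]
    return path
```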

Unfortunately, this means the memory consumption is quite high; based on the current trend, I guess I would need more than 1 TB of memory to process this dataset with HMMRATAC:

INFO  @ 25 Feb 2024 19:33:03: [346296 MB] #    decoding 343000... 
INFO  @ 25 Feb 2024 19:39:18: [346623 MB] #    decoding 344000... 
INFO  @ 25 Feb 2024 19:45:37: [347801 MB] #    decoding 345000... 
INFO  @ 25 Feb 2024 19:53:59: [350527 MB] #    decoding 346000... 
INFO  @ 25 Feb 2024 20:00:37: [352610 MB] #    decoding 347000... 
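
A quick back-of-envelope check (my own rough arithmetic from the numbers above, not a measurement) supports that estimate, since memory seems to grow roughly linearly with the number of decoded regions:

```python
# Rough linear extrapolation from the log above: ~352,610 MB used after
# ~347,000 of 1,486,723 candidate regions decoded.
mb_per_region = 352610 / 347000                       # ~1.02 MB per region
projected_tb = mb_per_region * 1486723 / 1024**2      # MB -> TB
print(f"projected peak memory: ~{projected_tb:.1f} TB")   # ~1.4 TB
```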

It would be great if there were a way to decrease the memory consumption of this step.
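
One generic pattern that could cap this (just a sketch of an idea on my side, not MACS3's code) would be to decode candidate regions lazily and write each state path out immediately, so peak memory is bounded by the largest single region rather than by all regions together:

```python
# Hypothetical streaming decoder built on the `viterbi` sketch above.
# `regions` is assumed to be a lazy iterator of (name, log_emit) pairs.
def decode_streaming(regions, log_start, log_trans, out_path):
    with open(out_path, "w") as out:
        for name, log_emit in regions:
            path = viterbi(log_start, log_trans, log_emit)
            out.write(f"{name}\t{','.join(map(str, path))}\n")
            # `path` and `log_emit` are dropped each iteration,
            # so nothing accumulates across regions
```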

Thanks a lot!

taoliu (Contributor) commented Feb 25, 2024

@diego-rt Thanks for the request! Yes, optimizing memory usage for the decoding process is in our plans. Could you share the entire log with us, even if the run can't finish?

diego-rt (Author) commented Feb 25, 2024

Great to hear, and thanks a lot for the quick reply! Here is the full log:

Thank you!

HMMRATAC.log

taoliu (Contributor) commented Feb 25, 2024

Thank you!
