
Is there a user group? #19

Open
tdlong opened this issue Jun 28, 2018 · 6 comments

Comments

tdlong commented Jun 28, 2018

  1. I was annotating a mammalian genome and the program crashed. There do not appear to be any intermediate files (beyond logs). Before I profile it, is it likely that I just went above 500 GB of RAM and should be running this on a high-memory node?

  2. I am getting lots and lots of warnings about reads having multiple hits and/or mapping to multiple chromosomes. This does not surprise me, as I am giving the program the BAM file from HISAT2 (and mammalian genomes have lots of pseudogenes, etc.). Should that BAM file be pre-processed to only consider "-q 30" read pairs? What is the "best practice"?

ruolin (Owner) commented Jun 29, 2018

@tdlong Hi, I would love to create a user group. Thank you for the suggestion!

  1. How large is your BAM file? If memory use went beyond the RAM capacity, it can fail, but it might also be a bad memory allocation. It would be great if you could provide such a BAM file so I can do something about it.

  2. Yep. "-q 10" might be enough, and I would recommend it. I am currently working on a user guide but haven't quite finished it. I will include the best practice in the user guide.
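The "-q" flag in this exchange presumably refers to `samtools view -q`, which skips alignments whose mapping quality (MAPQ, the 5th field of a SAM record) is below the given threshold; in practice the filtered file would come from something like `samtools view -b -q 10 in.bam > filtered.bam`. The Python sketch below only illustrates that per-record rule on SAM text lines, assuming a hypothetical helper named `filter_by_mapq` (not part of Strawberry or samtools). Note that the rule is applied per alignment record, not per read pair, so one mate can survive while the other is dropped.

```python
from typing import Iterable, Iterator


def filter_by_mapq(sam_lines: Iterable[str], min_mapq: int = 10) -> Iterator[str]:
    """Yield header lines, plus alignment lines whose MAPQ >= min_mapq.

    Hypothetical helper that mirrors the per-record rule of `samtools view -q`.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines (@HD, @SQ, @PG, ...) are always kept.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        mapq = int(fields[4])  # MAPQ is the 5th mandatory SAM field
        if mapq >= min_mapq:
            yield line


# Toy records: r1 is confidently placed (MAPQ 60); r2 is a low-MAPQ secondary hit.
records = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r2\t256\tchr2\t500\t1\t50M\t*\t0\t0\t*\t*",
]
kept = list(filter_by_mapq(records, min_mapq=10))
for line in kept:
    print(line.split("\t")[0])
```

Raising `min_mapq` to 30 (the "-q 30" in the question) would be stricter; which threshold is appropriate depends on how the aligner assigns MAPQ values.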

tdlong (Author) commented Jun 29, 2018 via email

ruolin (Owner) commented Jul 2, 2018

@tdlong First of all, thanks for using Strawberry and giving me some feedback. I really appreciate it. Please let me know after you profile it; I am willing to work with you to fix any issues or bugs you find.

For your purpose, I agree: if you have a very large BAM, you can select a subset by using a known set of loci. But the highly expressed genes might still be of interest to you.

I am considering a feature to process the BAM file on the fly to avoid such memory problems. I am also interested in knowing whether you have problems running other de novo assemblers, like Cufflinks or StringTie.
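Selecting a subset by known loci amounts to keeping only the alignments that overlap a set of regions. With a coordinate-sorted, indexed BAM this is typically done with `samtools view` and either a region string or a BED file passed via `-L`; the sketch below only shows the overlap test itself on SAM text lines. The names `ref_end` and `filter_by_loci` are hypothetical, and loci are assumed to be 1-based, inclusive `(chrom, start, end)` tuples. The reference span is computed from the CIGAR string so that spliced RNA-seq alignments (with `N` gaps, as HISAT2 emits) are placed correctly.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# CIGAR operations that consume reference bases (per the SAM spec): M, D, N, =, X.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")

Locus = Tuple[str, int, int]  # (chrom, start, end), 1-based inclusive (assumption)


def ref_end(pos: int, cigar: str) -> int:
    """Return the 1-based inclusive reference end of an alignment."""
    span = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def filter_by_loci(sam_lines: Iterable[str], loci: List[Locus]) -> Iterator[str]:
    """Yield header lines plus alignments overlapping any of the loci."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, cigar = fields[2], int(fields[3]), fields[5]
        if chrom == "*" or cigar == "*":
            continue  # unmapped or no CIGAR: cannot place on the reference
        end = ref_end(pos, cigar)
        if any(c == chrom and pos <= e and end >= s for c, s, e in loci):
            yield line


records = [
    "@HD\tVN:1.6\tSO:coordinate",
    "a\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*",          # chr1:100-149
    "b\t0\tchr1\t100\t60\t10M1000N40M\t*\t0\t0\t*\t*",  # spliced, chr1:100-1149
    "c\t0\tchr2\t1150\t60\t50M\t*\t0\t0\t*\t*",         # different chromosome
]
loci = [("chr1", 1100, 1200)]
kept = [line.split("\t")[0] for line in filter_by_loci(records, loci)]
print(kept)
```

Only the spliced read `b` reaches into the locus; comparing start positions alone would have missed it, which is why the end is derived from the CIGAR.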

tdlong (Author) commented Jul 2, 2018 via email

ruolin (Owner) commented Jul 3, 2018

@tdlong The parallelization can currently be improved a lot: the multithreading has a large dispatching overhead, so I am not very surprised to see the low CPU load. For now I recommend using at most 10 cores so you won't waste resources; using -p 10 is enough. Better multithreading is a feature I am working on right now.

tdlong (Author) commented Jul 5, 2018 via email
