More memory problems with SARS-CoV-2 #684

Closed
donkirkby opened this issue Feb 26, 2021 · 6 comments

@donkirkby
Member

donkirkby commented Feb 26, 2021

Although #643 fixed all the known memory problems with SARS-CoV-2 samples, we just ran into a new one. Those errors occurred during aln2counts.py, but this error seems to happen during the remap step. Maybe bowtie2 is overwhelmed by so many reads on such a long reference? The sample with the error is COVIDVOC1WG-Unknown_S1 from the 29 Jan 2021 run. The two FASTQ files are about 900MB each. The remapped version took about 12 hours, and the assembled version took about 28 hours.

The error occurred again on the remapped version with samples COVID242IPNT, COVID241IPNT, COVID236IPNT, COVID234IPNT, COVID223IPNT, and COVID230IPNT in the 24 Jul 2020.M01841 run. The remapped version took just over two hours for sample COVID242IPNT, the fastest so far.

donkirkby added the bug label Feb 26, 2021
donkirkby added this to the 7.15 milestone Feb 26, 2021
@donkirkby
Member Author

To test how much memory is used and where it exceeds the limit, you can restrict memory the same way Slurm does, using cgroups. I mostly followed this post, but I had to use the `-a` and `-t` options on `cgcreate`.
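A minimal sketch of that cgroup setup, written in Python only to keep the three commands together. It assumes cgroup v1 with the libcgroup tools (`cgcreate`, `cgset`, `cgexec`) installed; the group name, limit, user, and command are hypothetical, and this is not necessarily the exact recipe from the post. The `-a` and `-t` options hand the new group's parameter files and tasks file to an ordinary user, which should let `cgset` and `cgexec` run without `sudo`.

```python
import shlex


def cgroup_v1_commands(group, limit_bytes, command, user, owner_group=None):
    """Build libcgroup commands that run `command` under a memory cap.

    Assumes cgroup v1 and the libcgroup tools. `cgcreate` needs root; the
    -a and -t options give `user:owner_group` ownership of the group's
    parameter files and tasks file, so the later steps can run unprivileged.
    """
    owner = f"{user}:{owner_group or user}"
    return [
        ["sudo", "cgcreate", "-a", owner, "-t", owner, "-g", f"memory:{group}"],
        # Hard limit on the group's memory use, in bytes.
        ["cgset", "-r", f"memory.limit_in_bytes={limit_bytes}", group],
        # Run the command inside the group; going over the limit
        # invokes the OOM killer for processes in this group.
        ["cgexec", "-g", f"memory:{group}", *command],
    ]


if __name__ == "__main__":
    # Hypothetical user and command: cap a run at 4 GiB.
    demo = ["python", "hypothetical_run_sample.py"]
    for cmd in cgroup_v1_commands("memtest", 4 * 1024**3, demo, user="alice"):
        print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the one privileged step (`sudo cgcreate`) visible before anything executes.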

@donkirkby
Member Author

I reproduced the problem on my workstation with sample COVID242IPNT-Unknown_S69 from the 24 Jul 2020.M01841 run. It exceeded the memory limit after 54 minutes. I think it was in the remap stage.

@donkirkby
Member Author

I pinned it down to the Gotoh call, so this bug is more motivation to replace Gotoh. (See #556.)

@ArtPoon
Contributor

ArtPoon commented Apr 29, 2021

Please note this is not gotoh2! "Gotoh" is a wrapper of alignment code that was in use by the lab well before I joined.

@donkirkby
Member Author

That's true, and it's also not really a bug in the Gotoh code. It's just that Gotoh is too memory intensive for aligning sequences as long as SARS-CoV-2. I'm experimenting with minimap2.
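A back-of-the-envelope sketch of why a full-table aligner struggles at this length. A textbook Gotoh (affine-gap) alignment keeps three (m+1) x (n+1) score tables for traceback, so memory grows with the product of the two sequence lengths. The three tables and the 4-byte cell size are assumptions about a generic implementation, not measurements of the lab's Gotoh wrapper.

```python
def gotoh_table_bytes(len_a: int, len_b: int, n_tables: int = 3,
                      bytes_per_cell: int = 4) -> int:
    """Estimate memory for a full-table Gotoh (affine-gap) alignment.

    Assumes the textbook layout: three (len_a + 1) x (len_b + 1) tables
    (match/mismatch, gap in a, gap in b), all kept for traceback, with
    `bytes_per_cell` bytes per entry. Real implementations differ.
    """
    return n_tables * (len_a + 1) * (len_b + 1) * bytes_per_cell


if __name__ == "__main__":
    # Two sequences of roughly SARS-CoV-2 genome length (~30 kb).
    full = gotoh_table_bytes(30_000, 30_000)
    print(f"{full / 1024**3:.1f} GiB")  # prints 10.1 GiB
```

Linear-space variants (Hirschberg's divide-and-conquer, or Myers and Miller's version for affine gaps) avoid storing the full tables, at the cost of extra computation.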

@ArtPoon
Contributor

ArtPoon commented Apr 29, 2021

I would be happy to see the end of the Gotoh code; it was completely unmaintainable and horrible before I put in the time trying to clean it up and document it, and it's still lousy!

We've found minimap2 to be quite memory efficient indeed:
https://github.com/PoonLab/covizu/blob/master/covizu/minimap2.py
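If minimap2 replaces Gotoh, its SAM output (for example from `minimap2 -a ref.fasta query.fasta`) gives a reference position and a CIGAR string rather than two gapped sequences. A hypothetical helper for laying the query out against the reference might look like the sketch below. It drops insertions and soft clips, which may not be what MiCall needs, and it is not the covizu implementation linked above.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def apply_cigar(query: str, cigar: str) -> str:
    """Return the query laid out over the reference span covered by `cigar`.

    The result has one character per reference position: query bases for
    M/=/X, '-' for deletions (D) and skipped regions (N). Insertions (I)
    and soft clips (S) consume query bases but are dropped; hard clips (H)
    and padding (P) consume neither sequence.
    """
    if CIGAR_OP.sub("", cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    pieces = []
    pos = 0  # current index into the query
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in "M=X":
            pieces.append(query[pos:pos + length])
            pos += length
        elif op in "DN":
            pieces.append("-" * length)
        elif op in "IS":
            pos += length
        # "H" and "P" need no action.
    if pos != len(query):
        raise ValueError("CIGAR does not consume the whole query")
    return "".join(pieces)
```

For example, `apply_cigar("ACGTTGCA", "2S3M2D1I2M")` returns `"GTT--CA"`. SAM's POS column is 1-based, so subtract 1 before using it as a Python index into the reference.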
