
[Issue] Very high memory usage #4

Open
njspix opened this issue Jun 26, 2020 · 4 comments

njspix commented Jun 26, 2020

Describe the bug
Biscuit uses more than 200 GB of memory when mapping relatively small FASTQ files against a joint reference.

BISCUIT version
Version: 0.3.16.20200420

Minimally Reproducible Example

  1. Reference genome: joint hg38d1 and mm10 reference
  2. Small dataset to run on: OneDrive link
    These are paired-end FASTQ files, ~60 MB / 1.9 million reads in size. They are negative controls from a single-cell experiment, so they will probably not map well.
  3. Command(s) run:
qsub -I -l nodes=1:ppn=20,mem=200G
biscuit align -M -t 20 $ref 72_trimmed_R1.fq.gz 72_trimmed_R2.fq.gz > 72.sam
  4. Error given:
    The process is killed (either by the cluster scheduler or by the Linux kernel) when RAM usage exceeds the limit (200 GB in this instance).
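One way to get a cleaner failure than a scheduler or kernel kill is to cap the process's address space before launching, so a runaway allocation fails with ENOMEM inside `biscuit` instead. This is only a sketch: the use of `ulimit` and the 180 GiB cap value are my assumptions, not part of the report.

```shell
#!/bin/sh
# Sketch: cap the virtual address space before launching the aligner, so a
# runaway allocation fails instead of the job being OOM-killed at 200 GB.
# The 180 GiB cap is an arbitrary value chosen to sit below the job limit.
cap_kib=$((180 * 1024 * 1024))          # ulimit -v takes KiB
ulimit -v "$cap_kib" || true            # ignore if the hard limit is already lower
# Command from the report above; $ref is the joint hg38d1/mm10 index prefix.
ref=/path/to/joint_hg38d1_mm10.fa       # placeholder path
if command -v biscuit >/dev/null 2>&1; then
    biscuit align -M -t 20 "$ref" 72_trimmed_R1.fq.gz 72_trimmed_R2.fq.gz > 72.sam
fi
```

With the cap in place, a genuine leak should hit the limit and abort quickly, which at least produces a reproducible failure point rather than a scheduler kill.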

Expected behavior
I'm not sure whether this represents a memory leak or an edge case in which Biscuit legitimately consumes a great deal of memory due to quirks in the input files and the large reference.

Computer Resources

  • OS: CentOS 7, kernel 3.10.0-514.10.2.el7.x86_64
  • RAM: 376 GB
  • CPUs: 40 hyperthreaded (virtual) cores

Additional context
I'm aware that memory usage scales with the number of cores used, but I've successfully mapped much larger FASTQ files against the same reference using the same number of cores, so this issue was a surprise to me.
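Since memory is expected to scale with thread count, one quick diagnostic is to rerun the same input at decreasing thread counts and compare peak RSS (e.g. under GNU time's `-v`, which reports "Maximum resident set size"). A dry-run sketch that only prints the commands; the loop, the reference path, and the per-thread-count output names are my assumptions, while the command itself and file names come from the report:

```shell
#!/bin/sh
# Dry-run sketch: print the same align command at decreasing thread counts,
# so each can be run (e.g. under /usr/bin/time -v) and its peak RSS compared.
ref=/path/to/joint_hg38d1_mm10.fa   # placeholder for the joint hg38d1/mm10 index
cmds=""
for t in 20 10 5 1; do
    cmd="biscuit align -M -t $t $ref 72_trimmed_R1.fq.gz 72_trimmed_R2.fq.gz"
    cmds="$cmds$cmd
"
    printf '%s\n' "$cmd"
done
```

If peak memory grows roughly linearly with `t`, the usage is per-thread buffering; if even a single thread blows past the limit on this input, a leak or a pathological-input code path is more likely.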



bounlu commented Nov 7, 2023

I had the same issue with certain BAM files during the pileup process. Memory usage was sky-high for some samples, and I could not understand why. The server ran out of memory because of these samples, even though other runs with many more samples used very little memory. There is certainly a bug somewhere.

@jamorrison

@bounlu To make sure I'm understanding correctly, your issue was seeing high memory usage when running biscuit pileup? Or was it when running biscuit align?


bounlu commented Nov 7, 2023

I never run biscuit align; I use Bismark-aligned BAM files as input, so it was biscuit pileup.

@jamorrison

Ah okay. That sounds like a slightly different issue from the one raised here. Do you mind opening a new issue and providing a small example that highlights it? It doesn't have to be something that causes the server to run out of memory, just something that demonstrates the high memory usage you're seeing.
