Excessive RAM usage #89
This happens when you have either (a) a large number of read pairs that are extremely long distances apart, or (b) EXTREMELY high read density. Basically, as it parses the BAM file, it keeps a running buffer of the read pairs that have not yet been matched. This works fine until you have a huge number of reads in between any one read and its mate.
It looks like it does OK right up to the very end, where it suddenly can't find any matches and just keeps loading more reads. What does the end of your BAM file look like? Do you have unmapped reads there, or a huge number of reads that map to loose contigs or something? Do you have enormous numbers of reads on the MT chromosome? Are you studying a tissue with a ton of mitochondrial expression, maybe?
I'm not terribly surprised that downsampling causes the same issue. If you randomly downsample without making sure to keep paired reads matched up, then basically all your reads become pairless, and QoRTs will try to read the entire uncompressed file into memory trying to find the missing reads.
What happens if you feed it one complete chromosome? So like:
samtools view -b sample.bam 1 > sample.chr1.bam
Or maybe even handing it everything except MT?
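The running buffer described in the comment above can be modeled with a toy sketch. This is not QoRTs' actual code; it only assumes what the comment says: reads arrive in file order, and each read is held in memory until its mate turns up. The function name `peak_unmatched_buffer` and the example read names are made up for illustration.

```python
from typing import Iterable, Tuple


def peak_unmatched_buffer(read_names: Iterable[str]) -> Tuple[int, int]:
    """Simulate holding each read until its mate is seen.

    Returns (peak_buffer_size, reads_still_unmatched_at_end).
    """
    waiting = set()  # names seen once whose mate has not appeared yet
    peak = 0
    for name in read_names:
        if name in waiting:
            waiting.remove(name)  # mate found: the pair can be processed and freed
        else:
            waiting.add(name)     # first mate: hold it until its partner shows up
            peak = max(peak, len(waiting))
    return peak, len(waiting)


# Hypothetical streams of read names in file order.
close = ["a", "a", "b", "b", "c", "c"]   # mates adjacent
far = ["a", "b", "c", "c", "b", "a"]     # many reads between a read and its mate
orphans = ["a", "b", "c", "d"]           # mates missing entirely

print(peak_unmatched_buffer(close))    # (1, 0)
print(peak_unmatched_buffer(far))      # (3, 0)
print(peak_unmatched_buffer(orphans))  # (4, 4)
```

In this model the peak buffer size grows with the number of reads sitting between a read and its mate, and with orphaned reads it grows to the size of the whole input, which matches the behaviour described above.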
Thank you for your reply and insights. I don't really know much about this BAM (where reads are mapping to, etc.), which is why I am running QC on it :D
The randomly downsampled BAM with 125M read pairs finally did manage to complete when given 512GB of RAM o_O
Resource usage during the run.
Complete run output
QoRTs multiplot on this sample. Many of the plots are empty.
Hmm. Can you give me an ls of the output dir? And then check inside one of the files, say "QC.insert.size.txt.gz"? How did you do the downsampling? It may have had problems if the majority of the reads did not have matched pairs. Also: can you post the log?
Output file list
WARN file says this:
Contents of QC.insert.size.txt
Subsampling was done as such:
I have two other samples (BAM files) which ran fine without downsampling or memory issues. They also had about 20% fewer reads. But, they also produced the warning about strand and many blank plots in the multi plots. So I am not sure if downsampling is the reason for this. It could be one of the many other issues that you mentioned. Here is the plotting script and log. Plot log
|
For the insert size, could you show the first 500 lines?
And could you maybe post the full QC.quals.r1.txt file? That one perplexes me the most: none of this other stuff should affect it, since it's dead simple.
What version of R are you running? I haven't tested QoRTs on the newer versions. It shouldn't make a difference, but it's possible.
…On Fri, Feb 17, 2023, 10:41 AM Roy Francis ***@***.***> wrote:
Output file list
QC.biotypeCounts.txt
QC.chromCount.txt
QC.cigarLoci.deletionCounts.all.txt
QC.cigarLoci.deletionCounts.highCoverage.txt
QC.cigarLoci.insertionCounts.all.txt
QC.cigarLoci.insertionCounts.highCoverage.txt
QC.cigarOpDistribution.byReadCycle.R1.txt
QC.cigarOpDistribution.byReadCycle.R2.txt
QC.cigarOpLengths.byOp.R1.txt
QC.cigarOpLengths.byOp.R2.txt
QC.exonCounts.formatted.for.DEXSeq.txt
QC.FTnRrt5rbVMr.log
QC.gc.byPair.txt
QC.gc.byRead.txt
QC.gc.byRead.vsBaseCt.txt
QC.gc.R1.txt
QC.gc.R2.txt
QC.geneBodyCoverage.byExpr.avgPct.txt
QC.geneBodyCoverage.by.expression.level.txt
QC.geneBodyCoverage.genewise.txt
QC.geneCounts.formatted.for.DESeq.txt
QC.geneCounts.txt
QC.insert.size.byReadLen.txt
QC.insert.size.debug.dropped.txt
QC.insert.size.debug.txt
QC.insert.size.txt
QC.mismatchSizeRates.txt
QC.mismatchSummary.txt
QC.NVC.lead.clip.R1.txt
QC.NVC.lead.clip.R2.txt
QC.NVC.minus.clipping.R1.txt
QC.NVC.minus.clipping.R2.txt
QC.NVC.raw.R1.txt
QC.NVC.raw.R2.txt
QC.NVC.tail.clip.R1.txt
QC.NVC.tail.clip.R2.txt
QC.orderedChromList.txt
QC.overlapCoverage.txt
QC.overlapMismatch.byBase.txt
QC.overlapMismatch.byRead.txt
QC.overlapMismatch.byScoreAndBP.txt
QC.overlapMismatch.byScore.txt
QC.overlapMismatch.txt
QC.QORTS_COMPLETED_OK
QC.QORTS_COMPLETED_WARN
QC.QORTS_RUNNING
QC.quals.r1.txt
QC.quals.r2.txt
QC.readLenDist.txt
QC.referenceMismatch.byScoreAndBP.txt
QC.referenceMismatch.byScore.txt
QC.referenceMismatchCounts.txt
QC.referenceMismatchRaw.byReadStrand.txt
QC.spliceJunctionAndExonCounts.forJunctionSeq.txt
QC.spliceJunctionCounts.knownSplices.txt
QC.spliceJunctionCounts.novelSplices.txt
QC.summary.txt
QC.yX9gr2Yu8Jsk.log
Contents of *QC.insert.size.txt*
$ head sample-sub-qorts/QC.insert.size.txt
InsertSize Ct
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
$ tail sample-sub-qorts/QC.insert.size.txt
980064 1
1035994 1
1155833 1
1155846 3
1321162 1
1321165 2
1321172 2
1321176 1
1321183 1
1822321 1
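A quick way to see how much of the distribution sits at very large insert sizes is to total the counts in the two-column table. The sketch below is only an illustration: it assumes the whitespace-separated `InsertSize Ct` layout shown in the head/tail output above, and the function name `summarize_insert_sizes` and the 1000 bp cutoff are arbitrary choices, not anything QoRTs defines.

```python
from typing import Iterable, Tuple


def summarize_insert_sizes(lines: Iterable[str], threshold: int = 1000) -> Tuple[int, int]:
    """Return (total_pairs, pairs_with_insert_size_above_threshold)."""
    total = 0
    above = 0
    for line in lines:
        fields = line.split()
        # Skip the header line, blank lines, and anything not shaped like "size count".
        if len(fields) != 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        size, count = int(fields[0]), int(fields[1])
        total += count
        if size > threshold:
            above += count
    return total, above


# A few rows copied from the head/tail output above.
sample_rows = [
    "InsertSize\tCt",
    "0\t0",
    "1\t0",
    "980064\t1",
    "1155846\t3",
    "1822321\t1",
]
print(summarize_insert_sizes(sample_rows))  # (5, 5)
```

For the real file, the lines could be read with `open("sample-sub-qorts/QC.insert.size.txt")`, or with `gzip.open(path, "rt")` if the output is compressed.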
Subsampling was done as such:
module load samtools/1.3
samtools view -b -s 0.6 sample.bam > sample-sub.bam
I have two other samples (BAM files) which ran fine without downsampling
or memory issues. They also had about 20% fewer reads. They also produced
several blank plots in the multi plots. So I am not sure if downsampling is
the reason for this.
Here is the plotting script and log.
Plot log
library(QoRTs)
res <- read.qc.results.data(infile.dir="data/raw/zumis/qorts/", decoder.files = "data/raw/zumis/qorts/decoder.txt",autodetectMissingSamples=TRUE)
column 'qc.data.prefix' not found in the decoder, assuming qc.data.prefix = ""
Note: no input.read.pair.count column found. This column is optional, but without it mapping rates cannot be calculated.
Note: no multi.mapped.read.pair.count column found. This column is optional, but without it (depending on how your aligner implements multi-mapping) multi-mapping rates might not be plotted.
infile.dir = data/raw/zumis/qorts/
scalaqc_file = QC.summary.txt.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Autodetected Paired-End mode.
(File 1 of 43): QC.gc.byPair.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 2 of 43): QC.gc.byRead.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 3 of 43): QC.gc.byRead.vsBaseCt.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 4 of 43): QC.quals.r1.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 5 of 43): QC.quals.r2.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 6 of 43): QC.cigarOpDistribution.byReadCycle.R1.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 7 of 43): QC.cigarOpDistribution.byReadCycle.R2.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 8 of 43): QC.cigarOpLengths.byOp.R1.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0.02 secs]
(File 9 of 43): QC.cigarOpLengths.byOp.R2.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0.02 secs]
(File 10 of 43): QC.geneBodyCoverage.by.expression.level.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 11 of 43): QC.geneCounts.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0.04 secs]
(File 12 of 43): QC.insert.size.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0.05 secs]
(File 13 of 43): QC.NVC.raw.R1.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 14 of 43): QC.NVC.raw.R2.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 15 of 43): QC.NVC.lead.clip.R1.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0.02 secs]
(File 16 of 43): QC.NVC.lead.clip.R2.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0.04 secs]
(File 17 of 43): QC.NVC.tail.clip.R1.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0.03 secs]
(File 18 of 43): QC.NVC.tail.clip.R2.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0.03 secs]
(File 19 of 43): QC.NVC.minus.clipping.R1.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 20 of 43): QC.NVC.minus.clipping.R2.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 21 of 43): QC.chromCount.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 22 of 43): QC.biotypeCounts.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 23 of 43): QC.geneBodyCoverage.byExpr.avgPct.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 24 of 43): QC.overlapCoverage.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 25 of 43): QC.overlapMismatch.byRead.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 26 of 43): QC.overlapMismatch.byScore.txt.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 27 of 43): QC.overlapMismatch.byBase.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 28 of 43): QC.overlapMismatch.byScoreAndBP.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0.03 secs]
(File 29 of 43): QC.readLenDist.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 30 of 43): QC.referenceMismatchCounts.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 31 of 43): QC.referenceMismatchRaw.byReadStrand.txt.done.
[time: 2023-02-17 11:27:09],[elapsed: 0.01 secs]
(File 32 of 43): QC.referenceMismatch.byScore.txt.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 33 of 43): QC.referenceMismatch.byScoreAndBP.txt.done.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 34 of 43): QC.mismatchSizeRates.txt.gz.done.
[time: 2023-02-17 11:27:09],[elapsed: 0.01 secs]
(File 35 of 43): QC.FQ.gc.byRead.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.gc.byRead.txt.gz. Skipping tests that use this data.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 36 of 43): QC.FQ.gc.byPair.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.gc.byPair.txt.gz. Skipping tests that use this data.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 37 of 43): QC.FQ.gc.R1.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.gc.R1.txt.gz. Skipping tests that use this data.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 38 of 43): QC.FQ.gc.R2.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.gc.R2.txt.gz. Skipping tests that use this data.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 39 of 43): QC.FQ.NVC.R1.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.NVC.R1.txt.gz. Skipping tests that use this data.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 40 of 43): QC.FQ.NVC.R2.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.NVC.R2.txt.gz. Skipping tests that use this data.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 41 of 43): QC.FQ.quals.r1.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.quals.r1.txt.gz. Skipping tests that use this data.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 42 of 43): QC.FQ.quals.r2.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.quals.r2.txt.gz. Skipping tests that use this data.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
(File 43 of 43): QC.FQ.readLenDist.txt.gzFailed: Cannot find file: data/raw/zumis/qorts/30dpf-sub-qorts/QC.FQ.readLenDist.txt.gz. Skipping tests that use this data.
[time: 2023-02-17 11:27:09],[elapsed: 0 secs]
calculating secondary data:
Calculating Quality Score Rates...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating cumulative gene coverage, by replicate...done. [time: 2023-02-17 11:27:09],[elapsed: 0.01 secs]
Calculating cumulative gene coverage, by sample...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating Mapping Rates...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
calculating normalization factors, by sample...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
calculating normalization factors, by replicate...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
calculating normalization factors, by sample/replicate...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating summary stats...done. [time: 2023-02-17 11:27:09],[elapsed: 0.01 secs]
Calculating overlap mismatch-size rates...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating cumulative overlap mismatch-size rates...done. [time: 2023-02-17 11:27:09],[elapsed: 0.03 secs]
Calculating overlap coverage Rates...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating overlap coverage Rates By Read...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating read length distribution...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating overlap by AVG score...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating overlap by MIN score...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Adding Min score error to summary tables...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating overlap by R1 score...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating overlap by R2 score...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating referenceMismatchCounts stats...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating referenceMismatch.byScore stats...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating referenceMismatchRaw.byReadStrand stats...done. [time: 2023-02-17 11:27:09],[elapsed: 0.01 secs]
Calculating referenceMismatch.byScoreAndBP stats...done. [time: 2023-02-17 11:27:09],[elapsed: 0.01 secs]
Calculating summary table...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating overlap mismatch combos...Calculating mismatch combo rates:...done. [time: 2023-02-17 11:27:09],[elapsed: 0 secs]
Calculating overlapMismatch.byScoreAndBP stats...done. [time: 2023-02-17 11:27:10],[elapsed: 0.55 secs]
done. [time: 2023-02-17 11:27:10],[elapsed: 0.56 secs]
Calculating NVC rates...done. [time: 2023-02-17 11:27:10],[elapsed: 0.05 secs]
done.
[time: 2023-02-17 11:27:10],[elapsed: 0.69 secs]
Skipping: "onTarget.rates","onTarget.counts","overlap.mismatch.byAvgQual"
Rasterize large plots: FALSE
Rasterize medium plots: FALSE
Skipping due to missing data: "mapping.rates","norm.factors","norm.vs.TC"
Plotting to the currently-open device...
Plotting extended...
Starting compiled plot...
null device
1
|
500 lines of insert size Insert Size
QC.quals.r1
|
I have a 25 GB BAM file with about 400 million PE reads coming from the zUMIs pipeline (single-cell SMART-Seq3 RNA-Seq reads with UMIs). I am running QoRTs QC on it and keep running out of memory. I tried providing 128GB RAM and then raised it to 256GB, and I still get the same error. Is it reasonable that more than 256GB RAM might be needed for a BAM file of this size?
This is my script.
In the output folder I get these two files:
QC.QORTS_RUNNING QC.yX9gr2Yu8Jsk.log
I randomly downsampled this BAM to a 15GB BAM to test and I still get the same error. I am starting to suspect it's not just the number of reads.
Complete run output
BAM preview