Error parsing SAM header #131

Closed
tgong1 opened this issue Apr 17, 2018 · 10 comments

@tgong1

tgong1 commented Apr 17, 2018

Hi,

Could you give me some ideas about this error message?
It looks like it is generating this file:
gridss.tmp.extracted.Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.sv.bam

INFO 2018-04-17 13:01:56 ByReadNameSinglePassSamProgram Processed 464,000,000 records. Elapsed time: 00:16:46s. Time for last 1,000,000: 4s. Last read position: chr4:34,601,504
[Tue Apr 17 13:01:56 AEST 2018] gridss.SoftClipsToSplitReads done. Elapsed time: 14.75 minutes.
Runtime.totalMemory()=14218166272
ERROR 2018-04-17 13:01:56 CallVariants Fatal exception thrown by worker thread.
htsjdk.samtools.SAMFormatException: Error parsing SAM header. @SQ line missing LN tag. Line:
@SQ SN:HLA-C*16:01:; ; Line number 3286
at htsjdk.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:265)
at htsjdk.samtools.SAMTextHeaderCodec.access$200(SAMTextHeaderCodec.java:43)
at htsjdk.samtools.SAMTextHeaderCodec$ParsedHeaderLine.requireTag(SAMTextHeaderCodec.java:346)
at htsjdk.samtools.SAMTextHeaderCodec.parseSQLine(SAMTextHeaderCodec.java:215)
at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:111)
at htsjdk.samtools.SAMTextReader.readHeader(SAMTextReader.java:216)
at htsjdk.samtools.SAMTextReader.<init>(SAMTextReader.java:63)
at htsjdk.samtools.SAMTextReader.<init>(SAMTextReader.java:73)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:413)
at au.edu.wehi.idsv.alignment.ExternalProcessFastqAligner.align(ExternalProcessFastqAligner.java:46)
at au.edu.wehi.idsv.SplitReadRealigner.createSupplementaryAlignments(SplitReadRealigner.java:218)
at gridss.SoftClipsToSplitReads.doWork(SoftClipsToSplitReads.java:79)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
at au.edu.wehi.idsv.SAMEvidenceSource.execute(SAMEvidenceSource.java:156)
at au.edu.wehi.idsv.SAMEvidenceSource.ensureExtracted(SAMEvidenceSource.java:222)
at gridss.CallVariants$1$1.call(CallVariants.java:62)
at gridss.CallVariants$1$1.call(CallVariants.java:51)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Thank you,
Tingting

@d-cameron
Member

htsjdk.samtools.SAMFormatException: Error parsing SAM header. @SQ line missing LN tag. Line:
@SQ SN:HLA-C*16:01:; ; Line number 3286

htsjdk (the library I am using to parse the SAM) considers your input file to have a malformed header. Running Picard tools ValidateSamFile should help you identify the error.

Note that your input BAM contains contig names with characters that will produce an invalid VCF file. See samtools/hts-specs#124 for discussion of the issues with hg38 HLA contig names in the SAM and BAM specifications.
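As a quick complement to ValidateSamFile, the specific condition htsjdk complains about (an @SQ line without an LN field) can be checked directly on the header text. The following is a minimal sketch, not part of GRIDSS, Picard or samtools: `check_sq_ln` is a hypothetical helper name, and it assumes the header is supplied on stdin as tab-separated text, as printed by `samtools view -H`.

```shell
# Hypothetical helper: read SAM header text on stdin and print every
# @SQ line that has no LN:<integer> field, prefixed by its line number.
# Exits non-zero if at least one such line is found.
check_sq_ln() {
  awk -F '\t' '
    $1 == "@SQ" {
      has_ln = 0
      # Scan the tab-separated fields after the record type for an LN tag.
      for (i = 2; i <= NF; i++) if ($i ~ /^LN:[0-9]+$/) has_ln = 1
      if (!has_ln) { print NR ": " $0; bad = 1 }
    }
    END { exit (bad + 0) }
  '
}

# Usage (assumes samtools is on PATH):
#   samtools view -H input.bam | check_sq_ln && echo "every @SQ line has LN"
```

Because the check only looks at header text, it can be pointed at any SAM or BAM file involved in the run, including intermediate files.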

@tgong1
Author

tgong1 commented Apr 19, 2018

Hi Daniel,

Thank you for your reply. I'll use Picard ValidateSamFile to check the error and figure out how to correct it, or in my case, maybe simply use only reads with reference sequence names SN:chr1-22,X,Y.

Before trying that, I got some error messages from another GRIDSS run on the same input.bam, this time using version 1.5.1 instead of 1.5.0 and a blacklist.

Could you give me some ideas of the following errors? Are those due to the same reason - malformed header in input BAM?

[main] CMD: bwa mem -t 64 /share/ClusterShare/biodata/contrib/weejar/gatk_hg38full_alt/Homo_sapiens_assembly38.fasta /share/ScratchGeneral/tingon/MEGAN/Patient_19651/GRIDSS_1.5.1_QRP19651/./Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.gridss.working/Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.realign.0.fq
[main] Real time: 956.889 sec; CPU: 22522.990 sec
Exception in thread "pool-61-thread-1" Exception in thread "Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.realign.0.bam" java.lang.RuntimeException: htsjdk.samtools.util.RuntimeIOException: Read error; BinaryCodec in readmode; file: /share/ScratchGeneral/tingon/MEGAN/Patient_19651/GRIDSS_1.5.1_QRP19651/./Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.gridss.working/Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.realign.0.bam
at au.edu.wehi.idsv.util.AsyncBufferedIterator$ReaderRunnable.run(AsyncBufferedIterator.java:147)
at java.lang.Thread.run(Thread.java:745)
Caused by: htsjdk.samtools.util.RuntimeIOException: Read error; BinaryCodec in readmode; file: /share/ScratchGeneral/tingon/MEGAN/Patient_19651/GRIDSS_1.5.1_QRP19651/./Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.gridss.working/Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.realign.0.bam
at htsjdk.samtools.util.BinaryCodec.readBytesOrFewer(BinaryCodec.java:420)
at htsjdk.samtools.util.BinaryCodec.readBytes(BinaryCodec.java:394)
at htsjdk.samtools.util.BinaryCodec.readByteBuffer(BinaryCodec.java:504)
at htsjdk.samtools.util.BinaryCodec.readInt(BinaryCodec.java:515)
at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:198)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:829)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:803)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:797)
at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:765)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:576)
at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:548)
at au.edu.wehi.idsv.util.AsyncBufferedIterator$ReaderRunnable.run(AsyncBufferedIterator.java:124)
... 1 more
Caused by: java.io.IOException: Unexpected compressed block length: 1 for /share/ScratchGeneral/tingon/MEGAN/Patient_19651/GRIDSS_1.5.1_QRP19651/./Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.gridss.working/Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.realign.0.bam
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:522)
at htsjdk.samtools.util.AsyncBlockCompressedInputStream$AsyncBlockCompressedInputStreamRunnable.run(AsyncBlockCompressedInputStream.java:225)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
htsjdk.samtools.util.RuntimeIOException: /share/ScratchGeneral/tingon/MEGAN/Patient_19651/GRIDSS_1.5.1_QRP19651/./Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.gridss.working/Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.realign.0.bam has invalid uncompressedLength: -100885512
at htsjdk.samtools.util.BlockCompressedInputStream.inflateBlock(BlockCompressedInputStream.java:543)
at htsjdk.samtools.util.BlockCompressedInputStream.processNextBlock(BlockCompressedInputStream.java:532)
at htsjdk.samtools.util.AsyncBlockCompressedInputStream$AsyncBlockCompressedInputStreamRunnable.run(AsyncBlockCompressedInputStream.java:225)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Thank you,
Tingting

@d-cameron
Member

d-cameron commented Apr 19, 2018

htsjdk.samtools.util.RuntimeIOException: .......bam has invalid uncompressedLength: -100885512

This error message definitely looks like a file corruption/truncation issue.
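One inexpensive way to test for truncation: a complete BAM file ends with a fixed 28-byte empty BGZF block, the EOF marker defined in the SAM/BAM specification. The sketch below checks for that marker using only `tail`, `od` and `tr`; `has_bgzf_eof` is a hypothetical helper name, not a samtools or GRIDSS command. It only detects a missing end-of-file marker, so corruption in the middle of a file (which the "Unexpected compressed block length" message may indicate) would not be caught by it.

```shell
# Hypothetical helper: succeed only if the file given as $1 ends with
# the 28-byte BGZF end-of-file marker. Runs in a subshell so that the
# variables do not leak into the caller's environment.
has_bgzf_eof() (
  # EOF marker bytes from the SAM/BAM specification, spaces stripped.
  expected=$(printf '%s' "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00" | tr -d ' ')
  # Hex-dump the last 28 bytes of the file as one continuous string.
  actual=$(tail -c 28 "$1" | od -An -v -tx1 | tr -d ' \n')
  [ "$actual" = "$expected" ]
)

# Usage:
#   has_bgzf_eof some.bam || echo "some.bam has no BGZF EOF marker (truncated?)"
# samtools also ships a built-in check for this: samtools quickcheck some.bam
```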

@tgong1
Author

tgong1 commented Apr 19, 2018

Hi Daniel,

Thank you for the reply.
The following is the script I use. Could you suggest some ways to resolve the file corruption/truncation issue?
I downloaded the latest release 1.5.1 from https://github.com/PapenfussLab/gridss/releases instead of using mvn clean package, which will give me gridss/target/gridss-1.5.0-jar-with-dependencies.jar.

Thank you,
Tingting

@d-cameron
Member

Sorry for the delay in responding.

Could you suggest some ways to resolve the file corruption/truncation issue?

Does this issue occur consistently or is it a transient error?

Another possibility is that there's a fatal exception on one of the worker threads that isn't being handled correctly. Are there any ERRORs buried further up in the log file? In some cases GRIDSS waits for the other worker threads to complete gracefully instead of killing the entire program immediately. An unfortunate side effect of this is that the root cause exception sometimes isn't anywhere near the end of the log file.
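Since the root cause may sit well above the end of the log, it can help to list the earliest error-like lines rather than reading the tail. A minimal sketch, assuming the log is a plain-text file whose error lines look like the ones quoted in this thread (lines starting with `ERROR` or `Exception in thread`, or containing `Caused by:`); `first_errors` is a hypothetical helper name.

```shell
# Hypothetical helper: print the first N error-like lines of a log file
# (default N=5), each prefixed with its line number.
#   $1 = log file, $2 = optional maximum number of lines to print
first_errors() {
  grep -n -E '^(ERROR|Exception in thread)|Caused by:' "$1" | head -n "${2:-5}"
}

# Usage:
#   first_errors gridss.log        # earliest five matches
#   first_errors gridss.log 1      # only the very first match
```

The line numbers in the output make it easy to jump to the surrounding context in the full log.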

@tgong1
Author

tgong1 commented Apr 27, 2018

Thank you for the reply.

This issue also occurred when I ran GRIDSS on another tumour sample (call it sample2), the same sample that gave the PositionalAssembler error (issue #129) in another GRIDSS run. I didn't see any earlier errors in the log file.

For the SAM header parsing error:
htsjdk.samtools.SAMFormatException: Error parsing SAM header. @SQ line missing LN tag. Line:
@SQ SN:HLA-C*16:01:; ; Line number 3286

I checked line 3286 of the header:
@SQ SN:HLA-C*16:01:01 LN:3349 AH:*
It has an LN tag. Why does it say the LN tag is missing?

I ran Picard ValidateSamFile. The errors it reported were not about a malformed header or contig names:
ERROR:MATE_CIGAR_STRING_INVALID_PRESENCE

Thanks,
Tingting

@d-cameron
Member

It has an LN tag. Why does it say the LN tag is missing?

It may not be the input file that is missing the LN tag. There could be a bug in htsjdk that causes it to mishandle ':' in the header, so that the malformed header occurs in the GRIDSS intermediate files. Can you diff the SAM headers in question to see if they are the same?

diff <(samtools view -H input.bam) <(samtools view -H WORKING_DIR/input.bam.gridss.working/input.bam.sv.bam)

@d-cameron
Member

d-cameron commented Apr 30, 2018

htsjdk.samtools.SAMFormatException: Error parsing SAM header. @SQ line missing LN tag. Line:
@SQ SN:HLA-C*16:01:; ; Line number 3286
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:413)
at au.edu.wehi.idsv.alignment.ExternalProcessFastqAligner.align(ExternalProcessFastqAligner.java:46)

Actually, this SAM parsing error is coming from the bwa output, not the input file. GRIDSS reads the results of bwa directly, so this output never hits the file system until after successful parsing and conversion to BAM. If you look at the GRIDSS log file, you'll see the command-line arguments passed to bwa. If you rerun bwa manually, does the output lack the LN tag? What version of bwa do you have on PATH? It could be an issue with htsjdk not liking the headers written by bwa in ALT mapping mode.
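To rerun bwa by hand, the exact command can be pulled out of the log. The sketch below assumes the log contains lines of the form `[main] CMD: bwa mem ...`, as in the excerpt quoted earlier in this thread; `bwa_cmds` is a hypothetical helper name, not a GRIDSS feature.

```shell
# Hypothetical helper: print every bwa command line recorded in a log
# file, i.e. the text following "] CMD: " on lines such as
#   [main] CMD: bwa mem -t 64 ref.fasta reads.fq
bwa_cmds() {
  sed -n 's/^.*] CMD: \(bwa .*\)$/\1/p' "$1"
}

# Usage:
#   bwa_cmds gridss.log
# The printed command can then be rerun manually and the @SQ lines at
# the top of its SAM output inspected for missing LN fields.
```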

@d-cameron
Member

I'm struggling to reproduce the errors you are encountering. My test case with hg38 bwa ALT contigs seems to work fine on my system. The fact that it only occurs with some samples makes me suspect that something in your environment is causing these issues.

Does this issue occur consistently or is it a transient error? Does the same sample always die at the same place? If you delete all GRIDSS intermediate files does GRIDSS die in the same location?

Is there any resource utilisation/quota enforced on your system? Could it be that processes spawned by GRIDSS are getting killed thus giving these strange intermittent errors?
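A first look at per-process limits can be taken from the shell that launches GRIDSS. This is only a sketch: `show_limits` is a hypothetical helper name, it reports just the limits visible through the shell's `ulimit` builtin, and quotas enforced by a cluster scheduler or container may not show up here at all.

```shell
# Hypothetical helper: print a few resource limits of the current shell.
# Child processes started from this shell inherit these limits.
show_limits() {
  echo "virtual memory (KB): $(ulimit -v)"
  echo "CPU time (seconds):  $(ulimit -t)"
  echo "open files:          $(ulimit -n)"
}

# Usage: run in the same shell or job script that starts GRIDSS:
#   show_limits
```

A value other than `unlimited` for virtual memory is worth comparing against the Java heap size plus the memory needed by the external aligner.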

@tgong1
Author

tgong1 commented May 7, 2018

Thank you very much for the help.
It was a memory limit issue. The SAM parsing error no longer occurs. However, I still have the truncation error.
I will close this issue and may raise another one about the truncation error.
Thank you again for the help.

@tgong1 tgong1 closed this as completed May 7, 2018