Error parsing SAM header #131
htsjdk (the library I am using to parse the SAM) considers your input file to have a malformed header. Running Picard tools ValidateSamFile should help you identify the error. Note also that your input BAM contains contigs whose names include characters that will produce an invalid VCF file. See samtools/hts-specs#124 for discussion of the issues with hg38 HLA contig names in the SAM and BAM specifications. |
Hi Daniel, Thank you for your reply. I'll use Picard ValidateSamFile to check the error and figure out how to correct it or, in my case, maybe simply keep only the reads whose reference sequence name is one of SN:chr1-22, X, Y. Before trying that, I got some error messages from another GRIDSS run on the same input.bam, this time using version 1.5.1 instead of 1.5.0 and with a blacklist. Could you give me some ideas about the following errors? Are they due to the same cause, a malformed header in the input BAM? main] CMD: bwa mem -t 64 /share/ClusterShare/biodata/contrib/weejar/gatk_hg38full_alt/Homo_sapiens_assembly38.fasta /share/ScratchGeneral/tingon/MEGAN/Patient_19651/GRIDSS_1.5.1_QRP19651/./Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.gridss.working/Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.realign.0.fq Thank you, |
This error message definitely looks like a file corruption/truncation issue. |
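One quick way to test for truncation, assuming the suspect file is a BAM (or another BGZF-compressed file), is to check whether it ends with the 28-byte BGZF end-of-file marker defined in the SAM specification. The sketch below is illustrative only: the function name `has_bgzf_eof` is made up for this example, and it does not apply to plain-text files such as the `.fq` passed to bwa.

```python
import os

# The empty BGZF block that a complete BAM file ends with (28 bytes),
# as given in the SAM/BAM specification.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200" "1b00" "0300"
    "0000000000000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Read only the last 28 bytes of the file.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

A missing marker strongly suggests the file was cut short while being written or copied. A present marker does not prove the rest of the file is intact.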
Hi Daniel, Thank you for the reply. Thank you, |
Sorry for the delay in responding.
Does this issue occur consistently or is it a transient error? Another option is that there's a fatal exception on one of the worker threads that isn't being handled correctly. Are there any |
Thank you for the reply. This issue also occurred when I ran GRIDSS on another tumour sample (call it sample2), the one for which I got the PositionalAssembler error (issue #129) in another GRIDSS run. I didn't see any error before this in the log file. For the error parsing the SAM header: I checked the header on line 3286: I ran Picard tools ValidateSamFile and it reported errors, but none about a malformed header or contig names: Thanks, |
It may not be the input file that is missing the LN tag. There could be a bug in htsjdk that causes it to not handle
|
Actually, this SAM parsing error is coming from |
I'm struggling to reproduce the errors you are encountering. My test case with hg38 bwa ALT contigs seems to work fine on my system. The fact that it only occurs with some samples makes me suspect that something in your environment is causing these issues. Does this issue occur consistently or is it a transient error? Does the same sample always die at the same place? If you delete all GRIDSS intermediate files, does GRIDSS die in the same location? Is there any resource utilisation limit or quota enforced on your system? Could it be that processes spawned by GRIDSS are getting killed, thus giving these strange intermittent errors? |
Thank you very much for the help. |
Hi,
Could you give me some ideas about this error message?
It looks like it is generating this file:
gridss.tmp.extracted.Clean3_merged34_171108_FD00826235_gatkfull.hg38alt_postalt_fix_kmer_q15_TrimN_N0_L70_dedup.realigned.recalibrated_sort2_dedup2.realigned2.knownonly.realign.bam.sv.bam
INFO 2018-04-17 13:01:56 ByReadNameSinglePassSamProgram Processed 464,000,000 records. Elapsed time: 00:16:46s. Time for last 1,000,000: 4s. Last read position: chr4:34,601,504
[Tue Apr 17 13:01:56 AEST 2018] gridss.SoftClipsToSplitReads done. Elapsed time: 14.75 minutes.
Runtime.totalMemory()=14218166272
ERROR 2018-04-17 13:01:56 CallVariants Fatal exception thrown by worker thread.
htsjdk.samtools.SAMFormatException: Error parsing SAM header. @SQ line missing LN tag. Line:
@SQ SN:HLA-C*16:01:; ; Line number 3286
at htsjdk.samtools.SAMTextHeaderCodec.reportErrorParsingLine(SAMTextHeaderCodec.java:265)
at htsjdk.samtools.SAMTextHeaderCodec.access$200(SAMTextHeaderCodec.java:43)
at htsjdk.samtools.SAMTextHeaderCodec$ParsedHeaderLine.requireTag(SAMTextHeaderCodec.java:346)
at htsjdk.samtools.SAMTextHeaderCodec.parseSQLine(SAMTextHeaderCodec.java:215)
at htsjdk.samtools.SAMTextHeaderCodec.decode(SAMTextHeaderCodec.java:111)
at htsjdk.samtools.SAMTextReader.readHeader(SAMTextReader.java:216)
at htsjdk.samtools.SAMTextReader.<init>(SAMTextReader.java:63)
at htsjdk.samtools.SAMTextReader.<init>(SAMTextReader.java:73)
at htsjdk.samtools.SamReaderFactory$SamReaderFactoryImpl.open(SamReaderFactory.java:413)
at au.edu.wehi.idsv.alignment.ExternalProcessFastqAligner.align(ExternalProcessFastqAligner.java:46)
at au.edu.wehi.idsv.SplitReadRealigner.createSupplementaryAlignments(SplitReadRealigner.java:218)
at gridss.SoftClipsToSplitReads.doWork(SoftClipsToSplitReads.java:79)
at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:268)
at au.edu.wehi.idsv.SAMEvidenceSource.execute(SAMEvidenceSource.java:156)
at au.edu.wehi.idsv.SAMEvidenceSource.ensureExtracted(SAMEvidenceSource.java:222)
at gridss.CallVariants$1$1.call(CallVariants.java:62)
at gridss.CallVariants$1$1.call(CallVariants.java:51)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Thank you,
Tingting
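The exception above is raised because an @SQ header line has no LN tag. As a rough way to find such lines outside of htsjdk, the sketch below scans SAM header text and reports every @SQ line that lacks an SN or LN tag. It is an illustration only, not how htsjdk or GRIDSS performs the check; the function name `find_bad_sq_lines` and the example header values are made up.

```python
def find_bad_sq_lines(header_lines):
    """Return (line_number, line, missing_tags) for each @SQ line
    that lacks an SN or LN tag. Line numbers are 1-based."""
    problems = []
    for number, raw in enumerate(header_lines, start=1):
        line = raw.rstrip("\n")
        fields = line.split("\t")
        if fields[0] != "@SQ":
            continue
        # Header fields have the form TAG:VALUE with a two-character tag.
        tags = {f[:2] for f in fields[1:] if len(f) >= 3 and f[2] == ":"}
        missing = [tag for tag in ("SN", "LN") if tag not in tags]
        if missing:
            problems.append((number, line, missing))
    return problems


if __name__ == "__main__":
    # Made-up header; the last line mimics the one in the error message.
    example = [
        "@HD\tVN:1.5\n",
        "@SQ\tSN:chr1\tLN:1000\n",
        "@SQ\tSN:HLA-C*16:01:\n",
    ]
    for number, line, missing in find_bad_sq_lines(example):
        print(f"line {number}: missing {', '.join(missing)}: {line!r}")
```

The function accepts any iterable of header lines, so it could be run over the output of `samtools view -H` on the input BAM, or over the header of the SAM stream produced by the aligner, which is what the stack trace shows being parsed at the time of the failure.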