Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"IMMPOSSIBLE STATE! FATAL ERROR!" with Hisat2 data - Bioconda installation #79

Open
J-Calvelo opened this issue Apr 23, 2020 · 5 comments

Comments

@J-Calvelo
Copy link

Hello, I was trying to get splice junction counts from a Hisat2 alignment with the following line:
qorts QC --maxReadLength 101 --nameSorted --minMAPQ 50 test.bam wombase.gtf Count_Outputs

And got the following error:

Finished` reading SAM. Read: 11186385 reads/read-pairs.
Finished reading SAM. Used: 11039996 reads/read-pairs.
[Time: 2020-04-23 19:48:26] [Mem usage: [723MB / 1062MB]] [Elapsed Time: 00:22:50.0545]
> Read Stats:
>   READ_PAIR_OK                   11039996
>   TOTAL_READ_PAIRS               11186385
>   DROPPED_NOT_PROPER_PAIR        146389
>   DROPPED_READ_FAILS_VENDOR_QC   0
>   DROPPED_MARKED_NOT_VALID       0
>   DROPPED_CHROMS_MISMATCH        0
>   DROPPED_PAIR_STRANDS_MISMATCH  0
>   DROPPED_IGNORED_CHROMOSOME     0
>   DROPPED_NOT_UNIQUE_ALIGNMENT   0
>   DROPPED_NO_ALN_BLOCKS   0
>   DROPPED_NOT_MARKED_RG   -1
Pre-alignment read count unknown (Set --seqReadCt or --rawfastq)
Writing Output...
<====== FATAL ERROR! ======>
----------------------------
     Error message: "IMMPOSSIBLE STATE! FATAL ERROR! qcJunctionCounts.writeOutput, writing forSpliceSeq"
     Stack Trace:
        java.base/java.lang.Thread.getStackTrace(Thread.java:1606)
        internalUtils.Reporter$.error(Reporter.scala:294)
        qcUtils.qcJunctionCounts.$anonfun$writeOutput$3(qcJunctionCounts.scala:203)
        qcUtils.qcJunctionCounts.$anonfun$writeOutput$3$adapted(qcJunctionCounts.scala:194)
        scala.collection.Iterator.foreach(Iterator.scala:929)
        scala.collection.Iterator.foreach$(Iterator.scala:929)
        scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
        scala.collection.IterableLike.foreach(IterableLike.scala:71)
        scala.collection.IterableLike.foreach$(IterableLike.scala:70)
        scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        qcUtils.qcJunctionCounts.writeOutput(qcJunctionCounts.scala:194)
        qcUtils.runAllQC$.$anonfun$runOnSeqFile$8(runAllQC.scala:1410)
        qcUtils.runAllQC$.$anonfun$runOnSeqFile$8$adapted(runAllQC.scala:1410)
        scala.collection.Iterator.foreach(Iterator.scala:929)
        scala.collection.Iterator.foreach$(Iterator.scala:929)
        scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
        scala.collection.IterableLike.foreach(IterableLike.scala:71)
        scala.collection.IterableLike.foreach$(IterableLike.scala:70)
        scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        qcUtils.runAllQC$.runOnSeqFile(runAllQC.scala:1410)
        qcUtils.runAllQC$.run(runAllQC.scala:960)
        qcUtils.runAllQC$allQC_runner.run(runAllQC.scala:672)
        runner.runner$.main(runner.scala:97)
        runner.runner.main(runner.scala)
<==========================>
Exception in thread "main" java.lang.Exception: IMMPOSSIBLE STATE! FATAL ERROR! qcJunctionCounts.writeOutput, writing forSpliceSeq
	at internalUtils.Reporter$.error(Reporter.scala:299)
	at qcUtils.qcJunctionCounts.$anonfun$writeOutput$3(qcJunctionCounts.scala:203)
	at qcUtils.qcJunctionCounts.$anonfun$writeOutput$3$adapted(qcJunctionCounts.scala:194)
	at scala.collection.Iterator.foreach(Iterator.scala:929)
	at scala.collection.Iterator.foreach$(Iterator.scala:929)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
	at scala.collection.IterableLike.foreach(IterableLike.scala:71)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at qcUtils.qcJunctionCounts.writeOutput(qcJunctionCounts.scala:194)
	at qcUtils.runAllQC$.$anonfun$runOnSeqFile$8(runAllQC.scala:1410)
	at qcUtils.runAllQC$.$anonfun$runOnSeqFile$8$adapted(runAllQC.scala:1410)
	at scala.collection.Iterator.foreach(Iterator.scala:929)
	at scala.collection.Iterator.foreach$(Iterator.scala:929)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1417)
	at scala.collection.IterableLike.foreach(IterableLike.scala:71)
	at scala.collection.IterableLike.foreach$(IterableLike.scala:70)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at qcUtils.runAllQC$.runOnSeqFile(runAllQC.scala:1410)
	at qcUtils.runAllQC$.run(runAllQC.scala:960)
	at qcUtils.runAllQC$allQC_runner.run(runAllQC.scala:672)
	at runner.runner$.main(runner.scala:97)
	at runner.runner.main(runner.scala)

Not sure of what could have caused it. If its informative, I'm studying SL trans-splicing and for this alignments I removed the splicer leader sequences prior to the alignment so reads were hard trimmed.

Thanks

@hartleys
Copy link
Owner

hartleys commented Apr 23, 2020 via email

@J-Calvelo
Copy link
Author

Seems to be consistent and happens with other samples. Maybe there is something wrong with the conda version?
thanks
QC.5nZzr79gsrA4.log

@hartleys
Copy link
Owner

It looks like your flattened GFF has some sort of malformation.

This can happen if your input GTF has special characters in one or more of the gene IDs. In particular the colon character ":" will break geneIDs.

Could you send the GTF file? Or maybe just check the geneIDs?

@J-Calvelo
Copy link
Author

Sorry for the delay, that seems to be the issue. I got the original GFF3 from wormbase and then processed it with gffread.


HMN_01_pilon	WormBase_imported	exon	66428	68427	.	-	.	transcript_id "transcript:HmN_000001200"; gene_id "gene:HmN_000001200"; gene_name "HmN_000001200"; Name "HmN_000001200"; info "method:InterPro accession:IPR004709 description:Na+/H+ exchanger %0Amethod:InterPro accession:IPR006153 description:Cation/H+ exchanger %0Amethod:InterPro accession:IPR018422 description:Cation/H+ exchanger%2C CPA1 family"; biotype "protein_coding";
HMN_01_pilon	WormBase_imported	exon	68655	68661	.	-	.	transcript_id "transcript:HmN_000001200"; gene_id "gene:HmN_000001200"; gene_name "HmN_000001200"; Name "HmN_000001200"; info "method:InterPro accession:IPR004709 description:Na+/H+ exchanger %0Amethod:InterPro accession:IPR006153 description:Cation/H+ exchanger %0Amethod:InterPro accession:IPR018422 description:Cation/H+ exchanger%2C CPA1 family"; biotype "protein_coding";
HMN_01_pilon	WormBase_imported	CDS	66428	68427	.	-	2	transcript_id "transcript:HmN_000001200"; gene_id "gene:HmN_000001200"; gene_name "HmN_000001200"; Name "HmN_000001200"; info "method:InterPro accession:IPR004709 description:Na+/H+ exchanger %0Amethod:InterPro accession:IPR006153 description:Cation/H+ exchanger %0Amethod:InterPro accession:IPR018422 description:Cation/H+ exchanger%2C CPA1 family"; biotype "protein_coding";
HMN_01_pilon	WormBase_imported	CDS	68655	68661	.	-	0	transcript_id "transcript:HmN_000001200"; gene_id "gene:HmN_000001200"; gene_name "HmN_000001200"; Name "HmN_000001200"; info "method:InterPro accession:IPR004709 description:Na+/H+ exchanger %0Amethod:InterPro accession:IPR006153 description:Cation/H+ exchanger %0Amethod:InterPro accession:IPR018422 description:Cation/H+ exchanger%2C CPA1 family"; biotype "protein_coding";
HMN_01_pilon	WormBase_imported	exon	71217	71579	.	-	.	transcript_id "transcript:HmN_000001100"; gene_id "gene:HmN_000001100"; gene_name "HmN_000001100"; Name "HmN_000001100"; info "method:InterPro accession:IPR008949 description:Isoprenoid synthase domain superfamily"; biotype "protein_coding";
HMN_01_pilon	WormBase_imported	exon	71618	71754	.	-	.	transcript_id "transcript:HmN_000001100"; gene_id "gene:HmN_000001100"; gene_name "HmN_000001100"; Name "HmN_000001100"; info "method:InterPro accession:IPR008949 description:Isoprenoid synthase domain superfamily"; biotype "protein_coding";
HMN_01_pilon	WormBase_imported	exon	71848	71950	.	-	.	transcript_id "transcript:HmN_000001100"; gene_id "gene:HmN_000001100"; gene_name "HmN_000001100"; Name "HmN_000001100"; info "method:InterPro accession:IPR008949 description:Isoprenoid synthase domain superfamily"; biotype "protein_coding";
HMN_01_pilon	WormBase_imported	exon	72024	72233	.	-	.	transcript_id "transcript:HmN_000001100"; gene_id "gene:HmN_000001100"; gene_name "HmN_000001100"; Name "HmN_000001100"; info "method:InterPro accession:IPR008949 description:Isoprenoid synthase domain superfamily"; biotype "protein_coding";
HMN_01_pilon	WormBase_imported	exon	72640	72739	.	-	.	transcript_id "transcript:HmN_000001100"; gene_id "gene:HmN_000001100"; gene_name "HmN_000001100"; Name "HmN_000001100"; info "method:InterPro accession:IPR008949 description:Isoprenoid synthase domain superfamily"; biotype "protein_coding";
HMN_01_pilon	WormBase_imported	exon	72878	72883	.	-	.	transcript_id "transcript:HmN_000001100"; gene_id "gene:HmN_000001100"; gene_name "HmN_000001100"; Name "HmN_000001100"; info "method:InterPro accession:IPR008949 description:Isoprenoid synthase domain superfamily"; biotype "protein_coding";

Thanks

@hartleys
Copy link
Owner

Ah. Ok. The problem is that colons are being used in the transcriptIDs and geneIDs. This problem is a carryover from the way DEXSeq generates UIDs for each entry. I thought I built in a check for this but apparently not.

There are a couple different options. Easiest option would be to do a text-replace for the colons in the GFF3:

zcat mygff.gff.gz | sed 's/[:]/-/g' > myFixedGff.gff.gz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants