Skip to content
This repository has been archived by the owner on Jul 17, 2023. It is now read-only.

Manta breaks on getAlignmentStats_generateStats_000 with no reads passing filters #118

Open
mmterpstra opened this issue Jan 25, 2018 · 3 comments

Comments

@mmterpstra
Copy link

For a project I work with the single primer enrichment protocol for RNA-seq and this crashed manta. See below for 1. The error 2. Reads in the bam file as examples. The alignment is done with hisat2 and the TLEN was recalculated [picard] because this should be in reference genome space not transcript space according to the SAM spec. Other operations are calculations of MC tags [picard] and the Hard clipping of primers used in the enrichment [custom script].

Please see if you can find the problem / solution with the dataset / manta.

(PS I removed some things by filling in 'some' so the read group is actually empty)

[2018-01-25T11:19:26.115307] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [getAlignmentStats_generateStats_000] Error Message:
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [getAlignmentStats_generateStats_000] Last 16 stderr lines from task (of 16 total lines):
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.127955] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] FATAL_ERROR: GetAlignmentStats EXCEPTION: 2018-Jan-25 12:19:24: Operation not permitted: /data/umcg-mterpstra/apps/.tmp/easybuild/builds/manta/1.2.2/foss-2016a/manta-1.2.2.release_src/src/c++/lib/manta/ReadGroupStatsUtil.cpp(247): Throw in function void ReadGroupOrientTracker::finalize(const ReadCounter&)
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.136997] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] Dynamic exception type: boost::exception_detail::clone_impl<illumina::common::LogicException>
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.144773] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] std::exception::what: ERROR: Too few high-confidence read pairs (0) to determine pair orientation for read group '' in bam file 'some.bam'
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.152797] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] 	At least 100 high-confidence read pairs are required to determine pair orientation.
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.161261] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] 	Total sampled reads: 1860092
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.169247] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] 	Total sampled paired reads: 1860092
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.176785] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] 	Total sampled paired reads passing MAPQ filter: 1807934
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.183843] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] 	Total sampled high-confidence read pairs passing all filters: 0
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.190831] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] 
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.198795] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] 
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.206779] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] 
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.214872] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] ...caught in program.run()
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.221775] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] cmdline: /data/umcg-mterpstra/apps/software/manta/1.2.2-foss-2016a/libexec/GetAlignmentStats --ref /data/umcg-mterpstra/apps/data/ftp.broadinstitute.org/bundle/2.8/b37/human_g1k_v37_decoy.fasta --output-file /scratch/umcg-mterpstra/projects/some/workspace/alignmentStats.xml.tmpdir/alignmentStats.xml.000.xml --align-file some.bam
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.228779] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] version:	1.2.2
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.235830] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] buildTime:	2018-01-24T14:53:02.855215Z
[2018-01-25T11:19:26.131152] [peregrine.hpc.rug.nl] [106385_1] [WorkflowRunner] [ERROR] [2018-01-25T11:19:24.242850] [peregrine.hpc.rug.nl] [106385_1] [getAlignmentStats_generateStats_000] compiler:	g++-4.9.3
#header:
@RG	ID:1dcde81ed37ff75c4132265ab5c56e9a	LB:some_Exceptional_Prep_kit	PL:illumina	SM:some	PU:some_0426_1_CGGAGTAT	DT:2018-01-12T01:00:00+0100
@PG	ID:hisat2	PN:hisat2	VN:2.0.4	CL:"/software/software/hisat2/2.0.4-foss-2016a-Python-2.7.11/bin/hisat2-align-s --wrapper basic-0 -x /data/umcg-mterpstra/apps/data//ftp.broadinstitute.org/bundle/2.8/b37//hisat2/2.0.3-beta-goolf-1.7.20/human_g1k_v37_decoy --known-splicesite-infile /data/umcg-mterpstra/apps/data//ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/hisat2-2.0.3-beta-foss-2016a/Homo_sapiens.GRCh37.75.splicesites.txt --score-min L,0,-0.6 --sp 1,1.5 -D 20 -R 3 -S /dev/stdout --threads 1 -1 /tmp/71138.inpipe1 -2 /tmp/71138.inpipe2"
#non target
some:1:1110:11334:22774	1123	1	1480325	60	58M19770N93M	=	1500235	29587	CCGGGGCTTGATTCTCTTATTTCTGTCCAGCATATGTAAAATCCCATTCTGTGTATAGAGTTCTTTGTCTTTCCTAAGAAGATCACTGTACATCTGGTCATATGTGGTTTTGAAATCATAAACATTGGGCTTGTCGGGAGCTGGTCCTGGA	CCCCCBCCABFFGGGGGGGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHGHHHHHHHHIFHHHHHHHHHHHHHHHHHGHHHHGHHHHHHHHHFGHHHGGHHHHHHHHHHGHGHGHHHHGHHFHHFHHHHHHGHGHHGGCCCGHHEFHBHHGF	MC:Z:62M9561N54M19S	RG:Z:1dcde81ed37ff75c4132265ab5c56e9a	MI:Z:GGGAAT	MQ:i:60	YT:Z:CP	NU:Z:.	RX:Z:GGGAAT
some:1:1110:11334:22774	147	1	1500235	60	62M9561N54M19S	=	1480325	-29587	CTGGTCCTGGAAGCTTCACGTGAGTCCCTGTTCCAAAGGATCGGACGCTGAATCCCCGTTTGCTGAGGATGTTGTGCGCCTCCATGCTCCGGTTCTGGTTGCTCGAGCACACCACCAGCAAATTGTAAGGAACTC	<EHHEHHGFHFHGD.G.GFHHGHGGGHHHHF1HHGHGGGGGGGGDGHHHHHGFGGGGHHFHHHGGHHHHHGHGGGGGGGHHHHGHGGGGGHHHHGGHHHGGGGGGHHGHGGGHGHHHHGHHHFHHHHHHGGGGGG	MC:Z:58M19770N93M	RG:Z:1dcde81ed37ff75c4132265ab5c56e9a	MI:Z:GGGAAT	MQ:i:60	YT:Z:CP	NU:Z:.	RX:Z:GGGAAT
some:1:1112:8831:21715	161	1	1500289	60	27S8M9561N71M26S	=	1500289	9639	AAGAGATTTAACAAAGCACAAAATATTCCCGTTTGCTGAGGATGTTGTGCGCCTCCATGCTCCGGTTCTGGTTGCTCGAGCACACCACCGCCACCCGCAGCGGGGAGATCGGAAGAGCGTCGTGTAGGGAAA	GGGGGHHHHHGHHHHHHHHHHHHHHHHHHHGHHHGHHGGHGHHHHHHHGGGGEGGHHHHHFHHGGFGGHHHHGGHHHGGGGGHHHGHHGEGGGGHDECDGGGCGG?-CHHFD.FDFAGGGGGGGEGHHAHFG	MC:Z:27S8M9561N70M	RG:Z:1dcde81ed37ff75c4132265ab5c56e9a	MI:Z:ATGTTC	MQ:i:60	YT:Z:UP	NU:Z:.	RX:Z:ATGTTC
some:1:1112:8831:21715	81	1	1500289	60	27S8M9561N70M	=	1500289	-9639	AAGAGATTTAACAAAGCACAAAATATTCCCGTTTGCTGAGGATGTTGTGCGCCTCCATGCTCCGGTTCTGGTTGCTCGAGCACACCACCGCCACCCGCAGCGGGG	HHHHHHHHHHHHHHHHHHHHHHHHHGGGGGHHHHHHHHHHHHHGHGGGGGGGHHHHHHGGGGGHHHHHGHHHGGGGGHHHGHFEGGGGGGGFECCCCCCCCCCCC	MC:Z:27S8M9561N71M26S	RG:Z:1dcde81ed37ff75c4132265ab5c56e9a	MI:Z:ATGTTC	MQ:i:60	YT:Z:UP	NU:Z:.	RX:Z:ATGTTC
some:1:2111:14330:27469	1187	1	1509973	60	27S63M2D24M	=	1509973	133	TGGGGAAAACAAACCAGGGCGCTCCCTGGTTCCCACCCTACCGCGGCGCTTCCGCGCGAACAAAATGGCGGCCGCGGTGGCCGGAAGCGGGACGCGAAACGACGGCGCCGGCGG	GGGGGGGHHHHHHHGHFAEEGGGGDFGHHHHH3GHHGGHHHGGGCGGGFGFFHGGGG?@FCCHHCFFBBDCGGCGG-C@AGGCC?DAECFGAGGGGGFFFFFA=FDAFFAFFFF	MC:Z:20S63M2D68M	RG:Z:1dcde81ed37ff75c4132265ab5c56e9a	MI:Z:GGGGCA	MQ:i:60	YT:Z:CP	NU:Z:.	RX:Z:GGGGCA
some:1:2111:14330:27469	1107	1	1509973	60	20S63M2D68M	=	1509973	-133	AACAAACCAGGGCGCTCCCTGGTTCCCACCCTACCGCGGCGCTTCCGCGCGAACAAAATGGCGGCCGCGGTGGCCGGAAGCGGGACGCGAAACGACGGCGCCGGCGGTGTAGCGTGCGGCGACTGCGGGGCGGCCTCCCCGCCCACCCTGG	FFDFBFFFFD;D@.DFFFFFEE9.>99/E?A;.A@-;ACFA.-FCAFA;FB9.B/FFEBBBFFAFA?C.GGGGGGHFFCGGGGGGGGD/EGCGGGGGGGFFGGGFG2FGEGCFGGGGGGGHGGEFEEGEECGGGGGGGGBADBAC@BBBBB	MC:Z:27S63M2D24M	RG:Z:1dcde81ed37ff75c4132265ab5c56e9a	MI:Z:GGGGCA	MQ:i:60	YT:Z:CP	NU:Z:.	RX:Z:GGGGCA
#Target region
some:1:2109:18624:27633	1123	2	42472698	60	130M10813N15M6H	=	42472751	10958	CTTGAGTCACGAGTTCAGCAACAAGAAGATGAAATCACTGTGCTAAAGGCGGCTTTGGCTGATGTTTTGAGGCGTCTTGCAATCTCTGAAGATCATGTGGCCTCAGTGAAAAAATCAGTCTCAAGTAAAGGCCAACCAAGCCCTC	BCCCCFFFFFAAEFEFGGFGGGHGHHGFFHHHGHHHGHHHHHHHHHFFFHGGGGGHHHGFFGHHHHHHHGFHGGFGGGHHHHHHFHHHFHGHHGHHHHHHHHHHHHHHHGHHHGFHHHHBGGHHHHFHHFFGEHHG/FCGGGHHH	MC:Z:77M10813N15M40H	RG:Z:1dcde81ed37ff75c4132265ab5c56e9a	MI:Z:GCAACG	MQ:i:60	YT:Z:CP	NU:Z:533_EML4_NM_001145076_exon_2_0_chr2_42483641_f_0_3570427	RX:Z:GCAACG
some:1:2109:7770:23406	99	2	42472698	60	130M10813N15M6H	=	42472746	10958	CTTGAGTCACGAGTTCAGCAACAAGAAGATGAAATCACTGTGCTAAAGGCGGCTTTGGCTGATGTTTTGAGGCGTCTTGCAATCTCTGAAGATCATGTGGCCTCAGTGAAAAAATCAGTCTCAAGTAAAGGCCAACCAAGCCCTC	CCCCBFFFFFBCGGGGGFGGGGHHGHHGHHHHHHHHHHHHFHHHHHHHHHGGGGGHHHHGGHHHHHHHHGGHGGGGGGHHHHHHHHHHHHHHHHHHHGFHHHHHHHHHHHDGHGHHHHFHHHHHHHHEHHHHGHHGGGHGGGGHH	MC:Z:82M10813N15M39H	RG:Z:1dcde81ed37ff75c4132265ab5c56e9a	MI:Z:CGGCGG	MQ:i:60	YT:Z:CP	NU:Z:533_EML4_NM_001145076_exon_2_0_chr2_42483641_f_0_3570427	RX:Z:CGGCGG
some:1:2113:8484:13190	1123	2	42472698	60	130M10813N21M	=	42488286	15678	CTTGAGTCACGAGTTCAGCAACAAGAAGATGAAATCACTGTGCTAAAGGTGGCTTTGGCTGATGTTTTGAGGCGTCTTGCAATCTCTGAAGATCATGTGGCCTCAGTGAAAAAATCAGTCTCAAGTAAAGGCCAACCAAGCCCTCGAGCAG	BBBBBFFFFFBBGEGGGGGGGFGGFHHFHHHGFHHGHHHHFHHGHFHHHGFGHHHHHFHHGHHHHHHHGGGEEEGGFFHHHHHHHHHHHHHFGFHHHFHEGHHHGHHH3FFGAEGCHHFGBHFDHHHHGGHGAHHGGEHHGGHFECGGGGF	MC:Z:90M40H	RG:Z:1dcde81ed37ff75c4132265ab5c56e9a	MI:Z:TAATTG	MQ:i:60	YT:Z:CP	NU:Z:904_EML4_NM_019063_exon_3_0_chr2_42488261_f_1_3570429	RX:Z:TAATTG
some:1:2109:7770:23406	147	2	42472746	60	82M10813N15M39H	=	42472698	-10958	GCGGCTTTGGCTGATGTTTTGAGGCGTCTTGCAATCTCTGAAGATCATGTGGCCTCAGTGAAAAAATCAGTCTCAAGTAAAGGCCAACCAAGCCCTC	CFGBCFHAHHEHGHGHGFHFGF?GGHHFGHFGFHFFBEHGHHEFHHHHGHGFFGHHFGHGGHHG4B3EHHHGHHHHHHHFHHHGGFH?GHHGGGFFG	MC:Z:130M10813N15M6H	RG:Z:1dcde81ed37ff75c4132265ab5c56e9a	MI:Z:CGGCGG	MQ:i:60	YT:Z:CP	NU:Z:533_EML4_NM_001145076_exon_2_0_chr2_42483641_f_0_3570427	RX:Z:CGGCGG
some:1:2109:18624:27633	1171	2	42472751	60	77M10813N15M40H	=	42472698	-10958	TTTGGCTGATGTTTTGAGGCGTCTTGCAATCTCTGAAGATCATGTGGCCTCAGTGAAAAAATCAGTCTCAAGTAAAGGCCAACCAAGCCCTC	FGHGFFGHHHGFHCH?HGGGFGD4?333HFGFHGHHBHFFHFFFAEGGHHHHFHHGHFGGFFGHHEGBHHHHGG@@3HGDG3GCEHF?E?GG	MC:Z:130M10813N15M6H	RG:Z:1dcde81ed37ff75c4132265ab5c56e9a	MI:Z:GCAACGMQ:i:60	YT:Z:CP	NU:Z:533_EML4_NM_001145076_exon_2_0_chr2_42483641_f_0_3570427	RX:Z:GCAACG
some:1:2113:8484:13190	1171	2	42488286	60	90M40H	=	42472698	-15678	CCACAAGGACAGAGAGAAAAAAAAGAGGAATCTCATTCTAATGATCAAAGTCCACAAATTCGAGCATCACCTTCTCCCCAGCCCTCTTCA	?EHFGBHFGFHHHGHFGGGHFHHHFF4HFGDB33FF4BFFHHGDGGFFGFFGHGHHEGG@GHFHGFF33GBHEEC?GGGHGFEFEGDGDE	MC:Z:130M10813N21M	RG:Z:1dcde81ed37ff75c4132265ab5c56e9a	MI:Z:TAATTG	MQ:i:60	YT:Z:CP	NU:Z:904_EML4_NM_019063_exon_3_0_chr2_42488261_f_1_3570429	RX:Z:TAATTG
@ctsa
Copy link
Contributor

ctsa commented Jan 25, 2018

Thanks for providing some example reads. The failure you've encountered is in the first step of the SV caller, where it attempts to estimate the sequenced fragment length distribution. This is a particularly tricky operation for RNA.

While it is not possible to definitely diagnose the problem without the full data, but I can spot two issues which could be aggravating the operation:

  1. More than half of the reads shown above are marked as PCR-duplicates. These are all filtered out from fragment length estimation and variant calling early on. Please check if the fraction of PCR-duplicate labeled reads in this file reflects your intention.

  2. I think the hard-clipped reads are all being removed from consideration by manta upfront, and this might be overly conservative. Currently, in addition to other filtration criteria, manta will throw out any read when the CIGAR string contains an indel, more than one REFSKIP (N) segment, any hard clipping, and any soft-clip segment on the outside of the read pair. It seems reasonable that manta could accept hard-clip segment(s) to account for primer removal scenarios such as the above case if we carefully review this case -- I will raise this conversation with other RNA users.

@mmterpstra
Copy link
Author

mmterpstra commented Jan 26, 2018

Thanks this explains a lot:

  • The hardclipping is done on 100% of second of pair reads due to the custom protocol and a hand written solution to that: link to explanation of protocol . To make it more iffy the reads were trimmed based on location on the genome.
  • PCR duplicates will be horrible because some of the data is FFPE.
  • more than one REFSKIP (N) This would happen more often as reads get longer i think that original tophat0 and older alignment software had issues with this also reducing alignment rates ~10% when reads are 100 vs 75 bp. For this purpose this wouldn't be a problem though. Although some of the primers were designed to capture fusions.

Your answer solves my problem. Maybe consider skipping this step when a PE dist is set on the cmdline of manta. Then you can use another tool to calculate this rseqc (best) / picard(more consistent with how manta it will see).

@mmterpstra
Copy link
Author

mmterpstra commented Jan 26, 2018

temp solution:

samtools view -F 1024 -h some.bam | perl -wlane 'if(not(m/^[@]/)&& defined($F[5])){$F[5] =~ s/\d*H//g;print join("\t",@F);}else{print $_;}' | samtools view -Sb > some.clean.bam
samtools index some.clean.bam

n=1 test and seems to work...(This is tested on anther version 1.1.0 though)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants