ERROR: CIGAR and query sequence are of different length #1

Open
michieitel opened this issue Aug 5, 2019 · 15 comments

Comments

@michieitel

michieitel commented Aug 5, 2019

Hi!

I get an error when converting the SAM file produced by graphmap2 (v0.6.0), containing nanopore cDNA reads mapped to a reference assembly, to BAM.

These are the logs of the run:

[22:32:53 BuildIndexes] Loading reference sequences.
[22:32:54 SetupIndex_] Loading index from file: '/home/cgarcia/cbas_cDNA/data/CBAS_MASURCA-2_final.genome.scf.fasta.gmidx'.
[22:33:00 Index] Memory consumption: [currentRSS = 4364 MB, peakRSS = 4804 MB]
[22:33:00 Run] Hits will be thresholded at the percentil value (percentil: 99.000000%, frequency: 139).
[22:33:00 Run] Minimizers will be used. Minimizer window length: 5
[22:33:00 Run] Reference genome is assumed to be linear.
[22:33:00 Run] One or more similarly good alignments will be output per mapped read. Will be marked secondary.
[22:33:00 ProcessReads] All reads will be loaded in memory.
[22:33:53 ProcessReads] All reads loaded in 48.67 sec (size around 11231 MB). (9526475634 bases)
[22:33:53 ProcessReads] Memory consumption: [currentRSS = 19168 MB, peakRSS = 19168 MB]
[01:39:48 ProcessReads] [CPU time: 890914.56 sec, RSS: 501212 MB] Read: 11313268/11313268 (100.00%) [m: 9022586, u: 2290682]
[01:40:08 ProcessReads] Memory consumption: [currentRSS = 497484 MB, peakRSS = 502752 MB]

[01:40:08 ProcessReads] All reads processed in 890935.00 sec (or 14848.92 CPU min).
[W::sam_parse1] mapped query cannot have zero coordinate; treated as unmapped
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 14814
[main_samview] truncated file.
[bam_sort_core] merging from 0 files and 80 in-memory blocks...
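The first warning in this log is a separate symptom from the CIGAR error: as far as I understand htslib, it is printed when a record is flagged as mapped (FLAG bit 0x4 unset) but has POS 0, and the record is then treated as unmapped. A minimal Python sketch for listing such records, assuming a plain tab-separated text SAM file; the function and script names are hypothetical and not part of graphmap2 or samtools:

```python
import sys
from typing import Iterable, Iterator, Tuple

FLAG_UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def mapped_with_zero_pos(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, QNAME) for alignment records that are
    flagged as mapped but carry POS 0."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete record; skip rather than guess
        flag, pos = int(fields[1]), int(fields[3])
        if not (flag & FLAG_UNMAPPED) and pos == 0:
            yield lineno, fields[0]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python zero_pos.py aln.sam
    with open(sys.argv[1]) as sam:
        for lineno, qname in mapped_with_zero_pos(sam):
            print(f"line {lineno}\t{qname}")
```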

My commands were as follows:

Read alignment

graphmap-not_release align -t 80 -x rnaseq -K fasta -L sam --extcigar \
-r /home/cgarcia/cbas_cDNA/data/CBAS_MASURCA-2_final.genome.scf.fasta \
-d /home/cgarcia/cbas_cDNA/data/cbas_CANU_cDNA_correction-2_combined.correctedReads_100bp.fasta \
-o /home/cgarcia/analysis/mapping/graphmap2/corrected2/CBAS_MASURCA-2_final.genome.scf._ONT_cdna_graphmap2_combined_corrected-2_100bp.sam

Convert sam to bam and sort

samtools view -@ 80 -b -S CBAS_MASURCA-2_final.genome.scf._ONT_cdna_graphmap2_combined_corrected-2_100bp.sam | samtools sort -@ 80 \
CBAS_MASURCA-2_final.genome.scf._ONT_cdna_graphmap2_combined_corrected-2_100bp.sorted.bam

This is the line it complains about (line 14814 of the SAM file):

33ef8047-1b3c-4389-aad6-9110103981f5 0 scf7180000042097 35520 0 4=3D3=1X1=44N2D1=1X6=1X3=1I1=1I4=23N1X1=1X1=1X5=1D1=1X4=7D4=1D3I33N1I2=2D1=3D5=2D3=2X2=2I3=1D4=1X2=3I1=1I1=1X2=1D5=1D1=1D1=1D1=1X1=1X1=1X81N1=1I1=1X2=1X3=1X2=2D1X1=1X2=1X1=1X2=1X2=1I1=2D3=1X1=1X1=5D1X1=1X4=1D4=1X4=1I2=1I1=1X1=1X2=2D1X1=1X1=1D1X2=1X6=2D3=48N3=1X1=1X7=1I3=2X1=1X1=1D2=1X1=3X3=1X2=1I1=1X2=148N5=1X6=1D1=2X2=3D3=2D4=133N2=1X2=1D3=3D1=2D2=1X3=2X4=1D1=1X5=214N3=1X6=1I6=1D1=1D2=1X1=3X5=1D2=1X4=1X2=3I1=1X4=2D1=2X5=172N2I1=1X1=1I2=5I1=1X6=1X1=2D1=1D5=1D2=1X3=1X1=17N1=1X4=6D5=3D1X3=4D6=443N * 0 0 TTTTTTTTTTTTTTTTTTTTTTTCTTTTTTTTTTTTCACCGTGAAACAAAAACAGATTCAGCTATTCAGAAAATAACATGTATACCTAAATTTTATATTTCTATGCAGTTGCTATGCTAATACTACAGCTACTAAACAATATATGTTTTTACACAAAATTCTGTGATCATTCCAGCTTGCTTAGAATAACCTTCTCCAGTTTGACTGTGTCAGCTGCAAATTTGCAGTTCCTTCAGCTAGTTTCTCAGTGGCCATTTGGTCCTCATTTAGCCCACCTGGTGCTTCTCGTTGAGAGATATTTTCTCAATATCAAGTGCCTTTGCTTTATCAGCATCAAATGCTTAGTAACTGATTCAGTTGATTCTTTAGCTTATCCAGAAGGGCAGGGGGAGATAGTAAGGTTATCACAACCCAGCCAATGCAGATTTGGATGACACTAAAGATCCCAGCGAATCGTACTAACATTTGGATAACGATTGAAAGCAAACGTGTCAGGAAATATCTCTCGGCGGAGAAGCAATTCAGGTGGCTGCTTAAATGAAAGGACAAATGGCCGCGCTGAGAAACAACTGAAGAAATCCACCAAGATTTGCTGACAGTGAAGCTGAGAAAATTATTCTAAGCAATAGAATAACATACTCCAGTTTGACCTTGTCTGGCTACTGGTACGGATTCCTTCATTAGTTTCTCGGTGGCCATTTGTGGTCACTCATTTAGCAGCCACTGAATTTGCTTCTCGTTGAGAGATATTTTCTCAATATCAAGTGCCTTTGCTTTATCAGCATCCAAATGTATTAACTGGTTCAGTTGATTCTTTTGAAACTTATCCAGAAAGGGGCAGGAGATAGTGAGTTATCACAACCAGCCAGTACCACCCAAAATTGCCCTGTATTACGGATGCTCCACATGACAATGGTTTTATAGTCAAGCTTCTTGTAATAATTATATGTTTTGGTGACTGATTGAACACCGGGGTCTTCCGGTTGGCTCAAAAGTTTTCTTATCAGTGTTCTTGACATACCAGATCATAGATACGACCAACAAATGGGAAATCAAGTGACTTGTGGCACATGAAACTGCCCCGCAGGCAAACAAAAGCCAAGAAGTCAGATTATGATGGATACCATATTCGCTCTCCAGAATCTGGCTGCTTCATGCCCTCCCATGTTGAACTAAGTTGATTAGTACTCTCTCCTTGCTGATACCTGCCTTTTCATATAGTTCAATAAACCTCTTTGCTTCAGCAATAGGTTCTTCTTTATCAAGCCAGTAACCTTGCATCCGCTAGAGCCCGACCGAGTGCTATCTTCCAAGTCTACAACCCCAAAGTTTACATGGCTTGTCAATAGCAGCGTTACTTGATCGTCAAACTGCCACCCAGCGTCTTTGGCAAACTTAATAGCATCGCAGCAAATGCTGATATTCCGGC
ATCTGAGCGGCCTGGCCTGGCAAGGATGGGTTAGTAGTTGCATCCGTTGGCTTGTATTGATCAATGGATTTGATATCTCCAGTGTCAGCGACTACCGTAGTAAATTTATTCAATTGGTCAAGCGCTTTCCATAATCTTGCAAGGTGTACGCTTGAAAAGATGTATTGACCTTAATAGGCACTGGAGCGGTCAACACAAAGTTAGATAACGAGAATCAAGGCAGGAATGCAATTTTACCAAAATCATACATCCCCGTAATATATTTCAACAAGAAACACAAAAGAAGACACAAACAACTTTATTGTT * MD:Z:4^TTG3T1 NM:i:347 AS:i:-1134 H0:i:0 ZE:f:0 ZF:f:0 ZQ:i:1735 ZR:i:93581

Can you please help me figure out what went wrong? I had requested extended CIGAR strings with --extcigar.

Thanks,
Michael
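For reference, the check behind "CIGAR and query sequence are of different length" compares the number of query bases the CIGAR string accounts for with the length of SEQ. Per the SAM specification, only M, I, S, = and X consume query bases; D, N, H and P do not, and a `*` in either field means there is nothing to compare. The Python sketch below reproduces that comparison under those rules (helper names are hypothetical); a single record such as the one above can be checked by splitting it on tabs and passing column 6 (CIGAR) and column 10 (SEQ).

```python
import re

# CIGAR operations that consume query bases, per the SAM specification:
# M (match/mismatch), I (insertion), S (soft clip), = (match), X (mismatch).
# D, N, H and P do not consume query bases.
QUERY_OPS = set("MIS=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar: str) -> int:
    """Return the number of SEQ bases a CIGAR string accounts for."""
    if cigar == "*":
        return 0
    ops = CIGAR_RE.findall(cigar)
    # Reject strings containing characters the regex silently skipped.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in ops if op in QUERY_OPS)


def cigar_matches_seq(cigar: str, seq: str) -> bool:
    """True if the CIGAR accounts for exactly len(seq) query bases.
    A '*' in either field means the value is absent, so nothing is compared."""
    if cigar == "*" or seq == "*":
        return True
    return cigar_query_length(cigar) == len(seq)
```

For a whole record: `fields = line.rstrip("\n").split("\t")`, then `cigar_matches_seq(fields[5], fields[9])`.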

@jmaricb
Member

jmaricb commented Oct 5, 2019

Hi @michieitel ,

Is there a way to send us the data you used so we can try to reproduce this issue on our side?

Thanks

@ghelman91

Hi @jmaricb -
I am currently having the same issue when trying to convert the SAM output to BAM. Have you had any luck finding the cause?

Cheers,
Guy

@jmaricb
Member

jmaricb commented Oct 8, 2019

Hi @ghelman91 ,

I am currently looking into it. What dataset and reference were you using?

Thanks

@ghelman91

ghelman91 commented Oct 8, 2019

Hi @jmaricb -
I was trying to convert the SAM of nanopore cDNA reads aligned to the human reference genome to a BAM file, similar to @michieitel above. The alignment runs fine, but converting the output to BAM is the problem:

This is the command and output:
samtools view -b 191007_testNGMLR/191008_graphmap.sam > 191008_graphmap.bam
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 6305
[main_samview] truncated file.

I had also used the extended CIGAR option (--extcigar) to see if I could figure out what was going on, but no luck. Here is my problem line:
5f4df548-7bb6-427c-80ab-2e05238df0fc 16 chr11 6615448 0 13S9=2D14=1I3=1X21=2X1=2X10=1X15=1D24=1I9=3I442N12=1I42=1I16=586N11=1X1I68=1D60=1I55=1I3=115N1=1I1=1I19=1X1I36=6D16=1I2=1X7=1I31=1X22=1X13=2D19=147N17=1D12=1D13=8D4=1I53=1D18=197N2X5=1X55=1D28=1I4=1D6=3D10=1X1D20=1D12=999N14=1X1=2X1I23=1D1=1X11=1D18=1D5=2X1I11=2I2=1X8=1I15=1D4=1I5=1X10=2I280N1I1D5=1X15=2D1=1X12=1I9=1D17=1X2=1D3=116N36= * 0 0 GGTTACGTATTGCTGAAGGCCGGGAACACATTGCTGAAGCCACCACCACTGATATAGTCAACAATTTCATTTGTGATGAGGAAAGGTTCCTGGAAGGATGCCTCCCACTGTGGTGACATAGGGCTGAGGCAGTGTGGGAAGGTAGGGATGGAACTGGTGTCTTCCCCCAGAGACAGACCAACACCTGGCCCCAGCCTCGTCACCAGGGCTACTAGACCCAGGTGAGATGTTGGCACCAGCACTCATCAGGTACTGCACACCCAGACTGGCCTCAATCCCGGCCCGGCCCCGGCCCTGTTGTCCAACCACACGGGCTACTGATGCCCGATGTGCAAAGTTGCCACTGAAGAAAGTGCATGAACTGAGCCAGGTCTGAGTCATGGAAATACTGCTCCAGGAACTGGGCACAGGCTTGGCTGTTATTGCTGGTGCCAGAGCCCACGTCTTGTGAGGTCGTTAGCCGCTTACGGATCACAGAGGGGGCTGGCTTCCCAGATGCAGGCCTACAGTCCCTGTCACCTGCGGCTCAGACGTTGCCTCAGGGATGATGTTGGGGAAATCGGTGCAGTCCCCCCACAAAGTCCACATGGGGGCCAAGGCCTGTGGAAGCTGGTAGAGGGGGACCTTACAACATGGGTTTCCTCTGTAGGTCCTCCCACACATAGTGATGAAATCAGCCCCAGGAGCAGCAGCTCTGCTTGTCGATGCTCAGCCAGCAAGTCAGAAAGTCCTGTGTGATCACAGAACACTTCTGGGCTCCTCGGGCTGCCAAGAGCCATTTTTGCACCGTGTGGAGGGTTTGCAGGGGATGGCCTCACCAGATCAGCCACATTCTCTAGGGTCAGGTATTTCCTCCTGCATTGAAAGAGAGCTGGGATCCGACACAGCCTGCACCAGCTCCGAGAGTCTTTCCACATTCTGCTGTCCCAGGGCAAAAGGTGCAGAGACTCAGTCCTTTCTCAGGGTCCTAATGGCCCAGGGATCTGTAGCCTGGGGCAGCGTCCTCCGCTGGTCGGCTCCGGGCTGTAACTGCATTTGCCAGAGAGATGAGGGCAAAAGAGCCCCTAGGAGGCAGGCTTGGAGTCCCATTCTGCCCTTCCGCGGGATCTGTGAAGGCCGGGAACACATTGCTGGAAGTCACCACCACTGATATAGTCAAATACCTCATTTGTGACGAGGAAAGGTTCCTGAAGGATGTGCCTCCCACTGTGGTGGACATAGGGGCTGCTGGAGGCAGGGGAAGGTAGGGCGGAACTGGTGTCTTCCAGAGACAGACCAACACTCCGGCCCCACTGTCACCAGGGCTACTGAGAGACCCAGGTGGAGATGTTGGCACCAGCACTCATCAGGTACTGCACATCTAGACTGGCCTCAATCCCGCCCGGCCCCGGCCCTGTTGTCCAACCACACGGGCTACTGATGCCTGATGTGCAAAGTTGCTCACCGAAGAGGCGCATGAACTGAGCCAGGTCTGAGTCATGGAAATACTGCTCCAGTGAACTTCGGGCACAGGCTTGGCTGTTGGTTGCTGGTGCCAGAGCCCACGTCTTGTGAGGTCAAGCGCTTACGGATCACAGCAGAGGGTTACTCCCCAGATGCA
GGCCTACAGTCCCTGTCACCCGCGGCTCAGGACGTTGCCTCAGTGATGATGTTGGGGAAACGGTGCAGTCCCCCCACAAAGTCCACATGGGGGCCAAGGCCTGTGAAGCTGGTAGGGACCTTTACAACATGGGTTTCCGTAGGTCCTCCCACATAGTGATGAAACTCAGCCCCAGGAGCAGCAGCTCTGCTTGTTTGATGCCCAGCCAGCAAGTCAGAAAGTCCTGTGTGATCACAGAATGGCACTTCTGGGCTCCGCTGCCAAGAGCCATTTTTGCACCGTGTGTGAGGTCAGTGATGGCCTCACTGATCAGCCACATTCTCTAGGTCAGGTATTTTCCGTATTGAGGAGAGTTTTCGATCCGACACAGCCTGCACCAGCCTGAGAGTCTTTCACATTCTGCTGTCTCAGGCAAAGCCAGAGACTCAGCTTTCTCCCTCAGGGGTCCGCACGGCCCAGGACACTCCAGCTTGGGGGCAGCGTCTCCTCTGCTGGTCGGGCTCCGCCGTAACTGCATTTTGCCAGAGAGATGAGGGCAAAGAGCCCCAGAGGCAGGCTTGGAGTCCCATTCTGCCCTTCCGCGGATCT %$(()&#((./48,:>=:899,55<C<<57767%6/-,,459000599<2@<;=A@@581.'''').:;DEED>CABC@;3IDB?>:48:8564:33=.%1E211,)((%,<+(5120,+-(2,30(,$%$&<>;>B88,-++%$$0';;603(-.8<4100/25<>@><<=;:2/;;AAB2%<<EF322)-$&':A:2769<AC992(,D:?/0;>,444?;>8809B00.-4>CGA>>?(C@?A1362272>7;A4-,A;:9060:<:,,.47117(94/*)''$$#%%*4<=<0@/9869<8;;;:=B6985=3CFF>>80=,,--7=:DECDG@A=B;514+0-530<3::6CD@5B:?BA;9,)+(-&&&33;:971//=98.-&&&%%%$$%0.-0>52795;=>88:98;%1*..)(9-,,D06<.,++4:82/5,*(,<217@82200500-$*'2&*--+-+($$$8;77?A8:6,+878576($#('%&)0++42-.*,1)=CE6;0/.**/-,55663>?>=A./.1:4>;-&7250(-(,<>>;:=.)&(++-453=A@.(#''(,08@;@;>93632/',&&&(6=C>@@BC8&7><86/-5:@4,>HF>A353@:9.@@7402%/?62;AA?<E<9622685660.-3,2**'(<))'%$&%#'/-)'&535.+#&&#'/+-.03545788-006B311+:)&'.+.;(9A@)&$',8:))0/.//$,:;>A?>>;8>1'')+&$($+,0:<6779966;-+%%+','((),,%$''94:/009&%9427%328?=>877:@667;B(:<>1=?>79/22752))66:651.,-(,6;:<=;)98754%%/%/076;:674&'76+'')41%346-+&$##,(''''%1+'%%%%)/54258782=..485?==742,-)'.2,+&//673+/.).+6&&''12/33289+('''(,0201'689:6722'&''4%589@>@?81.<$6;:E=<;<-%&%.04193217>G;::.--11%//40+AB;;42:;06%%&%$24)194@=6/21939302,)(&&()++587))0&%,17/B)%,.-7;/.,:@<:>>$:;43=4268227896:008874(125&9$$-9$;==;8:A>;895558@@@8<8??648588>??:EC;@<<>><>;98=>;89%,-)47BDBC=;=>:.,<7892+))&$)0)+;=?>?:06C>???97?>9?:=A744(+./13/34&&,-.(,%&%&%'/&&#%''+(,0/.24&1-.5;<:033C@>34))&$68=@4/@=6>>@,@?6434;7>BDFC9@197'%(&%%$**1''&&5373ABB6:'035?8E:==BB3/.32$-):
9A=<>B@?>=;;3852&.)+&(65:2$$$22387<A@A91/9:794,+'%$#&%&+./0//,79<=,++87<CC?9;?,-+2:332%%&-//-5++21-2-'%--,871//0&233568888846951/$$'#,249++(,2('(+,-)%(')$$53%$%'-%%$&&;'-==00*))+,,2:;61/,,,00+6787989681/%,-.$-10420/0+589;;4001?=CBA<=9+**5)),3---'.6&&)032)'$'(&'$%&4470280-1//097==47<<;27:=388'&-6%:&+A/<>;C;<=;=07;>@:9:?9-%%6:<>6+/',-&&/>:86895865::4,',01/.0''(%9(3728))):(087;4/59.&+,68129979:@?9B99/7>16162,,6<64,''-/22>782*%($'***)-(.768'/19<;?=431746A?5&&834/,(((102$/054370,62>>80/-+')&$#%#$('+-++*+&%%'%#.&&%32/001/2*$&')'(&1--5;4443,&--31+$$%%01339(&*-4'',+-((+..,37;@B/(,%*/'%)+*3))3>C>99.0/)7)8/=+2>>?>=/GBA?>@?@'%1B6AA=:;=223@0A@C@/..84:41<9=2,,++8?110/1 MD:Z:9^TT17T21T0T1T0T10T15^T33 NM:i:92 AS:i:4535 H0:i:0 ZE:f:0 ZF:f:0 ZQ:i:2188 ZR:i:135086622

I might be able to directly send files if that would be helpful.

Cheers,
Guy
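samtools stops at the first bad record (here line 6305), so further problem records stay hidden until that one is dealt with. A minimal Python sketch that instead lists every record whose CIGAR and SEQ lengths disagree. Line numbers are 1-based and counted from the top of the file with header lines included, which should make them comparable with the line samtools reports. It assumes a tab-separated text SAM file; the function and script names are hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def query_len(cigar: str) -> int:
    # Only M, I, S, = and X consume bases of SEQ.
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MIS=X")


def inconsistent_records(lines: Iterable[str]) -> Iterator[Tuple[int, str, int, int]]:
    """Yield (line number, QNAME, CIGAR query length, SEQ length) for every
    record whose CIGAR and SEQ disagree. Header lines count toward numbering."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or "*" in (fields[5], fields[9]):
            continue  # incomplete record, or CIGAR/SEQ absent
        qlen, slen = query_len(fields[5]), len(fields[9])
        if qlen != slen:
            yield lineno, fields[0], qlen, slen


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_bad_cigars.py aln.sam
    with open(sys.argv[1]) as sam:
        for lineno, qname, qlen, slen in inconsistent_records(sam):
            print(f"line {lineno}\t{qname}\tCIGAR={qlen}\tSEQ={slen}")
```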

@HegedusB

Same problem here: "CIGAR and query sequence are of different length" (graphmap2 v0.6.01).

[bhegedus@node2 map_all_03]$ samtools view -Sb -@60 full_length_barcode01_05_scaffold_1.sam > full_length_barcode01_05_scaffold_1.bam
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 13867
[main_samview] truncated file.

@jmaricb
Member

jmaricb commented Nov 21, 2019

@ghelman91 @michieitel @HegedusB Could you try the newest commit? I have fixed the CIGAR length bug.

@menickname

Dear @jmaricb, I am encountering the same issue with ONT reads using the Graphmap2 0.6.4 release. I cannot convert my Graphmap2 .sam output into sorted .bam files with samtools sort, nor with the commands described above.

Thank you in advance.

@jmaricb
Member

jmaricb commented Apr 4, 2020

@menickname

Can you share the dataset and reference for which you still get different CIGAR and query lengths? I would like to run it and try to find the error.

Thank you.

@menickname

menickname commented Apr 5, 2020 via email

@jmaricb
Copy link
Member

jmaricb commented Apr 5, 2020

@menickname If possible, could you just send me one or several reads that have the wrong CIGAR length? You don't need to send me the whole dataset. Just tell me which reference you are using.
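To send only the affected reads rather than the whole dataset, one option is to take the QNAMEs (first SAM column) of the offending records and pull just those reads out of the input FASTQ. A minimal Python sketch, assuming uncompressed FASTQ with exactly four lines per record and that the read ID is the first whitespace-separated token of the header line (as in the ONT headers quoted below). The function and script names are hypothetical.

```python
import sys
from typing import Iterable, Iterator, List, Set


def subset_fastq(lines: Iterable[str], wanted: Set[str]) -> Iterator[str]:
    """Yield the 4-line FASTQ records whose read ID is in `wanted`.

    Records are read in fixed chunks of four lines, so quality strings that
    happen to start with '@' are not mistaken for headers. Wrapped (multi-line)
    FASTQ is not supported.
    """
    record: List[str] = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            if not record[0].startswith("@"):
                raise ValueError(f"unexpected FASTQ header: {record[0]!r}")
            read_id = record[0][1:].split()[0]
            if read_id in wanted:
                yield from record
            record = []
    if record:
        raise ValueError("truncated FASTQ: last record has fewer than 4 lines")


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical script name):
    #   python subset_fastq.py reads.fastq READ_ID [READ_ID ...] > subset.fastq
    with open(sys.argv[1]) as fq:
        sys.stdout.writelines(subset_fastq(fq, set(sys.argv[2:])))
```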

@menickname

Dear Josip,

Thank you for your quick response. Below an example of the filtered/trimmed ONT reads. The used reference genome can be found at https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2 .

@fd5bbd49-dcc4-492b-b19c-e11e853e33a5 runid=674eb540e8d54b342a1e172ff5484ba559454959 sampleid=200327_Covid-19_Artic_Protocol_restart read=1319 ch=195 start_time=2020-03-27T16:38:58Z AATTACCTGAAACTTACTTTACTCAGGTAAAAATTTACAAGAATTTCGCCCAGGAGTCAAATGGAAATTGATTTCTTAGAATTAGCTATGGATGAATTCATTGAACGGTATAGAAGGCTATGCCCAACTGAACATATCTTTATGGAGGTGCGTCATAGTCAGTTAGGTGGTTTTACATCTACTGATTGGACTAGCTAAACGTTTAAGGAATCACCTTTTGAATTAGAAGATTTTATTCCTATGGACAGTACAGTTAAAAACTATTTCATAACAGATGCACCAAACAGGTTCATCTAAGTGTGTGTTCTGTTATTGATTTATTACTTGATGATTTTGTTGTTGGAAATAATAAAATCCCAAGATTATCTGTAGTTTCTAAGGTTGAG + 4648*;;309<;8:7)@GGE?@7693.712/21@DEB;./:1334/#$.0143*-51ABCA<<1.4&&-9:9KJ>?7A000;BF?@EB?A>@)****,5=?;,,/4+01/**)5,@=?7.-)%%%%%)3(0362,+&$)38?@A;(&&&#$&4:A>@92?<;<1$1&)).37;53>C=A=;@DJDG>6A??573<>BFH:E6@F:7)FHA9?>5558?:4783;=:9766FHJFFKC>?DF@ABB43>790/5::?@<8455>?DC)>CAF>E?6.)$$')357/.0<GCD746867=?.64FHDEAC>6;335><;=6443'035;69A=>HJM4++&<<()A@C?@:979D><?DB23<0.0650<@@AGHCG>:9:@FIEA?= @2efc38c4-3d9c-491f-8772-db3fdeb8749a runid=674eb540e8d54b342a1e172ff5484ba559454959 sampleid=200327_Covid-19_Artic_Protocol_restart read=1324 ch=150 start_time=2020-03-27T16:39:01Z GGTCAATTTCTGTACAAACAACACCATCCAATTTATAGAAGTAACTGGTTTATGGTTGTTGTGTAACTGTTTTCTTTGTAGAAAACATCAATAGGACCTTTGTATTCTGAGGGACTTTGTAAGTAAAGCACCGTCTATGCAATACAAAGTTTCTTTAGAAGTTATATGTTTATAGTGACCACACTGGTAATTACCAGTGTACTCCTGCTTACAAATGTACCATGTAGAAAGTTCATACTGAGCAGGTGGTGCTGACATCATAACAAAGGTGACTCCTGTTGTACTAGATATTTTTGTAGCTTGTTTACCACACGTACAAGGTATCTGAACACCTTTCTTAAATTGTTCATAAGAGAAGTGTGCCCATGTATAACAGCAG + *(&(13.5DB@>=7:03*/224768>89?CCCG;6,9%''1:4<>@?@ID/+++.7.2<+,62))%$&%/0013;;?6444EEHD>:3?./245;?;=8>F;>BA:9?:?A<;:389?8822+:@B3:75)*)11*566@@@CA=898=>DE9A@2,+(%3)2=9997CAD?>>CEFB<>=(9559@<@C=?035:.753666&-*(%%'/3=@B@B<;45>;/)-%&&''8CE>==AH<,655=5D8966)38>==77)9;;'670746<;87-23244=><94.1710-9=A?9/5,7766680,12/.32-+,//.4,.+(3+2.3050023655/,---6:AECB?++6:*++3>?G<;;96986>>>=AFD@C@ @1d9bbe11-c3e5-4d6e-ae33-2883200d8816 runid=674eb540e8d54b342a1e172ff5484ba559454959 
sampleid=200327_Covid-19_Artic_Protocol_restart read=1103 ch=454 start_time=2020-03-27T16:38:59Z TAGCGGCGTGCCTTTATGGCACGAAACACAGTGGTACGAACTTATGTACTCATTCGTTTCCGGAAAGACAGGCACGTTATTAGTTAATAGCGTACTTCTTTTTCTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTGCGCTTCGATTGTGTGCGTACTGCTGCAATATTGTTAACGTGAGTCTTGTAAAACCTTCTTTTTACGTTTACTCTCGTGTTAAAATCTAATTCTTCTAGAGTTCCTGATCTTCTGGTCTAAACGAACTAAATATTATAGTTTTCTGTTTGGAACTTTAATTTTATATATGGTTTAACAGTACTATTACCGTTGAAGAGCTAAGCTCCTTGAACAATGGAACCTAGTAG + *+*%&'&.1)/--.)%#%(%$$$$$#%%%%%42.<<=...++,?<<75/67&&?@=IED$?03/.$&*%'('%,34540'*-=8AA800G9C;<AEE@JGFGH;><,''):52''-77679>=8;=<B;.5&4589<99>22637/+1.3323832/993:57=69?<=GGFLEC;93323985@B90:19;5A@>:6;=H=162;;=>BG.+58DEGCCG.?7<;>7:>@7:58.'*')/79:<@>5HFI26C?;79?>@D<82A<ADD;''*996774439;9<:.1?@3:6264&8DAAKK9CDC>8)&#$#$%%((0%'(*+2--4(0***+,0;973)%0$#&..&976:DBCD@?EC>;;9=:;&,*080 @c46650e5-daa0-4235-8b9b-ce49c9966837 runid=674eb540e8d54b342a1e172ff5484ba559454959 sampleid=200327_Covid-19_Artic_Protocol_restart read=973 ch=467 start_time=2020-03-27T16:39:02Z AAGGTAAGAACAAGTCCTGAGTTGAATGTAAAACTGAGGATCTGAAAACTGTCAGAATTAATAAACACCACGTGTGAAAGAATTAGTGTATGCAGGGGTAATTGAGTTCTGGTTGTAAGATTAACACACTGACTAGAGACTAGTGGCAATAAAACAAGAAAAACAAACATTGTTCGTTTAGAGAACAGATCTACAAGAGATCGAAAGTTGGTTGGTA + ('33):)AABA?@@++--,.2JJCHIABB@=;<4114?;>8??))BM-(=3*.:())*-327;?;8ADAD=?B?**))'+-,-;35;<;A:978ADA;E=BCDC??G<@DCC:<>@BC=:8D@@8B/@?GID><@7DMA>;::>;:<EECFGC?<A@0LHO2+.4)&(=<GDGD>>HC-()(3=D?B;;5=5'.:;'1/.,686(((3E7:;;7=?D @e4af2786-708a-46ac-9687-b4484e6c68f7 runid=674eb540e8d54b342a1e172ff5484ba559454959 sampleid=200327_Covid-19_Artic_Protocol_restart read=1003 ch=476 start_time=2020-03-27T16:39:01Z 
ATGTCTTGTGCTGCCGGTACTACACACAAACTGCTTGCACTGATGACAATGCGTTAGCTTACTACAACACAACAAAGGGAGGTAGGTTTGTACTTGCACTGTTATCGATTTACAGGATTTGAAATGAAACAGATTCCCTAAGAGTGATGGAACTGGTACTATCTATGCAGAACTGGAACCACCTTGTAGGTTTGTTACGAACAATGCTTTTTCACTATGTAGAAAGTTGGATAATGATGCACTCAACAACATTATCAACAACAATGCAGAGATGGTTTATTCGCCCAGACTTTCAACCATCTCGCTGCATTATTGATAACTCATTGAGATGCATCGTGCTTCCAGCAGCATAGTGAAAAGCATTGTCTGTAACAAAAACCTACGAGGTTCCCGGTGCTGTATGGATGAGTACCATTCCATCGCTCTTGGGAATCTGGCCATTTCAAATCTGTAAATCGGATAACGTGCGTGACCTACTCCTTTTTGTTGTGTTGGCAATGGCTGGCGCATTGTCTCATCGGTACTGGCGCGATTTGTGACGCACCGGCACAGACAG + ::<??>3-80CG;@?;?>:=B<&3,.15;CFGB==<((47?;67D?>660/46<+*(*+-*,+(/77444((-::,B>>C==2>4@GD@355///%"%+,&92-)-906961008.,5?4:264+*%$#$'0''(/4$%')*$309:=<CF=8:3/))--*32/.&&5847157<?@8132558:.((:9EEB=?:..,./42300:<=;=98+)08,'$%%(((719483-)//;;>0('&27---..)()*11:()+,,*..62+++9<,%$)22*$$&,$$$$#$'$$$$)()+,*)('&$$1%&&%&$$)&%''((*(%$)3+,.%%126.%(#"#$)+%''+%&%$$33200.())())-)0-&&&&+-'%(%()$$$%#.+%0-34''($$('*,,,'*&&#**(+*))+54*+/&'+353/35=833++-+/++-23770))%%%*+)*((*&$&)/,.-(+'54//)(&+-0*0111%*+.072/*''**))'$)#%*****+&&&'*1/0$&(&$%&%##&'%'))&*$'&''--,+/..).'-$#&

I used the following command:
for READS in nanofilt*.fastq; do graphmap2 align -r ../part3_Reference_mapping/Covid-19_Ref_genome.fasta -d "$READS" -o "../part3_Reference_mapping/graphmap2_$READS.sam" -t 36; done
The .sam file is created without any issue and looks like a normal .sam file to me. However, when I proceed with sorting and indexing, I do not get correct .bam files (only one line is present in each file), and the .bai files are empty.
for samfiles in graphmap2_*.sam; do samtools sort "$samfiles" > "$samfiles.bam"; samtools index "$samfiles.bam"; done

I have used the same dataset with the Graphmap pipeline, and it gave no problems with sorting and indexing. The Graphmap2 issue appeared both on our local computer and on our HPC infrastructure.

Thank you in advance.
Best regards,
Nick Vereecke

@jmaricb
Member

jmaricb commented Apr 5, 2020

Are you aligning RNA reads or DNA? If you are aligning RNA reads, you should use the '-x rnaseq' option. If you are aligning DNA reads, there is actually no difference between Graphmap and Graphmap2, so you could keep using Graphmap.

I will take a look anyway.

@menickname

@jmaricb We are aligning DNA sequences. We prefer to work with the most up-to-date versions of the available software. Please keep me posted when the issue is solved; then I could try running it with Graphmap2 as well.

@jmaricb
Copy link
Member

jmaricb commented Apr 6, 2020

I will let you know. It should work with Graphmap2 too; Graphmap2 only adds updates for RNA reads, so it should behave the same as Graphmap on DNA reads.

@1053286838

@HegedusB Did you solve the issue? I have the same problem:
[E::sam_parse1] CIGAR and query sequence are of different length
[W::sam_read1] Parse error at line 16323139
[main_samview] truncated file.
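Until a fixed build is used, one possible stopgap is to drop the records that fail the CIGAR/SEQ length check and sort the rest. This is only a sketch: it discards those alignments, so it hides the symptom rather than fixing the aligner, and rerunning with the newest commit mentioned above is preferable. It assumes a tab-separated text SAM file; the function and script names are hypothetical, and `samtools sort -o out.bam -` is used to read the filtered SAM from standard input.

```python
import re
import sys
from typing import Iterable, Iterator, List

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def is_consistent(line: str) -> bool:
    """True for header lines and for records whose CIGAR accounts for exactly
    len(SEQ) query bases (only M, I, S, = and X consume query)."""
    if line.startswith("@"):
        return True
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return False  # incomplete record; samtools would reject it too
    cigar, seq = fields[5], fields[9]
    if cigar == "*" or seq == "*":
        return True
    qlen = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "MIS=X")
    return qlen == len(seq)


def filter_sam(lines: Iterable[str], dropped: List[str]) -> Iterator[str]:
    """Pass consistent lines through; collect QNAMEs of dropped records."""
    for line in lines:
        if is_consistent(line):
            yield line
        else:
            dropped.append(line.split("\t", 1)[0])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name):
    #   python filter_sam.py in.sam | samtools sort -o out.sorted.bam -
    dropped: List[str] = []
    with open(sys.argv[1]) as sam:
        sys.stdout.writelines(filter_sam(sam, dropped))
    print(f"dropped {len(dropped)} inconsistent records", file=sys.stderr)
```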


6 participants