manage 'NAME' column in BED, BEDPE and PAF, 'M' CIGAR operation, GitHub workflow, strandness #4

Merged
merged 9 commits into from
Apr 15, 2024

AndreaGuarracino
Member

column -t x.bed
  S288C#1#chrXII  1000     100000   annotation1
  S288C#1#chrXII  500000   900000   annotation2
  S288C#1#chrXII  1000000  2000000  

impg -p scerevisiae7.aln.paf -b x.bed | column -t
  S288C#1#chrXII        1000     100000   S288C#1#chrXII  1000     100000   annotation1
  SK1#1#chrXII          15000    49816    S288C#1#chrXII  1000     100000   annotation1
  UWOPS034614#1#chrXII  70000    92908    S288C#1#chrXII  1000     100000   annotation1
  Y12#1#chrXII          90000    104447   S288C#1#chrXII  1000     100000   annotation1
  S288C#1#chrXII        500000   900000   S288C#1#chrXII  500000   900000   annotation2
  YPS128#1#chrXII       479121   500000   S288C#1#chrXII  500000   900000   annotation2
  Y12#1#chrXII          483399   564848   S288C#1#chrXII  500000   900000   annotation2
  UWOPS034614#1#chrXII  471803   854819   S288C#1#chrXII  500000   900000   annotation2
  Y12#1#chrXII          585000   780000   S288C#1#chrXII  500000   900000   annotation2
  YPS128#1#chrXII       590000   790000   S288C#1#chrXII  500000   900000   annotation2
  SK1#1#chrXII          730000   865468   S288C#1#chrXII  500000   900000   annotation2
  Y12#1#chrXII          860000   872245   S288C#1#chrXII  500000   900000   annotation2
  S288C#1#chrXII        1000000  2000000  S288C#1#chrXII  1000000  2000000  
  UWOPS034614#1#chrXII  944197   1000000  S288C#1#chrXII  1000000  2000000  
  Y12#1#chrXII          980000   1029536  S288C#1#chrXII  1000000  2000000  
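
To make the behaviour above concrete, here is a minimal Python sketch of reading a BED line whose fourth (NAME) column is optional and carrying the name through to each projected hit. It is not impg's code (impg is not written in Python); `BedRecord`, `parse_bed_line` and `format_hit` are hypothetical names used only for illustration, and tab-separated input is assumed.

```python
from typing import NamedTuple, Optional


class BedRecord(NamedTuple):
    chrom: str
    start: int
    end: int
    name: Optional[str]  # None when the BED line has no NAME column


def parse_bed_line(line: str) -> BedRecord:
    """Parse one tab-separated BED line; the 4th column (NAME) is optional."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"BED line needs at least 3 columns: {line!r}")
    # Treat a missing or empty 4th column as "no name".
    name = fields[3] if len(fields) > 3 and fields[3] else None
    return BedRecord(fields[0], int(fields[1]), int(fields[2]), name)


def format_hit(chrom: str, start: int, end: int, query: BedRecord) -> str:
    """Emit a hit followed by the query interval and, if present, its name."""
    cols = [chrom, str(start), str(end),
            query.chrom, str(query.start), str(query.end)]
    if query.name is not None:
        cols.append(query.name)
    return "\t".join(cols)


named = parse_bed_line("S288C#1#chrXII\t1000\t100000\tannotation1")
unnamed = parse_bed_line("S288C#1#chrXII\t1000000\t2000000")
print(format_hit("SK1#1#chrXII", 15000, 49816, named))
print(format_hit("Y12#1#chrXII", 980000, 1029536, unnamed))
```

Keeping the name as `None` rather than an empty string lets the writer decide how an unnamed interval should be printed.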

@AndreaGuarracino AndreaGuarracino changed the title manage 'NAME' column in BED, BEDPE and PAF manage 'NAME' column in BED, BEDPE and PAF, 'M' CIGAR operation, and GitHub workflow Apr 13, 2024
@AndreaGuarracino
Member Author

Now we can also properly handle minimap2's CIGAR strings:

minimap2 scerevisiae7.fa.gz scerevisiae7.fa.gz -X -c -t 48 > mm2.paf

impg -p mm2.paf -r UWOPS034614#1#chrI:1000-2000 | head -n 5 | column -t  
  UWOPS034614#1#chrI  1000    2000
  S288C#1#chrVIII     570385  569601
  DBVPG6765#1#chrVI   5195    5981
  DBVPG6765#1#chrI    210581  209798
  UWOPS034614#1#chrI  213064  212060

impg -p mm2.paf -r UWOPS034614#1#chrI:1000-2000 -P | head -n 5 | column -t 
  UWOPS034614#1#chrI  214332  1000    2000    +  UWOPS034614#1#chrI  214332  1000  2000  1000  1000  255  cg:Z:1000=
  S288C#1#chrVIII     581049  570385  569601  -  UWOPS034614#1#chrI  214332  1000  2000  775   1009  255  cg:Z:10M1I29M2I8M1D11M1D78M18D6M1D5M2D1M3D23M1D6M1D19M1D26M1D27M1D39M5D1M2D19M1D94M1I72M1I21M1I10M1D21M3I22M151D205M1D22M33D
  DBVPG6765#1#chrVI   257436  5195    5981    +  UWOPS034614#1#chrI  214332  1000  2000  777   1009  255  cg:Z:10M1I29M2I8M1D11M1D78M18D6M1D5M2D1M1D25M1D6M1D19M1D26M1D27M1D39M5D1M2D19M1D94M1I72M1I21M1I10M1D21M3I22M151D205M1D22M33D
  DBVPG6765#1#chrI    215496  210581  209798  -  UWOPS034614#1#chrI  214332  1000  2000  774   1009  255  cg:Z:10M1I29M2I8M1D11M1D78M18D6M1D5M6D23M1D6M1D19M1D26M1D27M1D39M5D1M2D19M1D94M1I72M1I21M1I10M1D21M3I22M151D205M1D22M33D
  UWOPS034614#1#chrI  214332  213064  212060  -  UWOPS034614#1#chrI  214332  1000  2000  1000  1004  255  cg:Z:171M4I829M


# use --eqx to write =/X CIGAR operators instead of M
minimap2 scerevisiae7.fa.gz scerevisiae7.fa.gz -X -c -t 48 --eqx > mm2.eqx.paf

impg -p mm2.eqx.paf -r UWOPS034614#1#chrI:1000-2000 | head -n 5 | column -t 
  UWOPS034614#1#chrI  1000    2000
  S288C#1#chrVIII     570385  569601
  DBVPG6765#1#chrVI   5195    5981
  DBVPG6765#1#chrI    210581  209798
  UWOPS034614#1#chrI  213064  212060

impg -p mm2.eqx.paf -r UWOPS034614#1#chrI:1000-2000 -P | head -n 5 | column -t 
  UWOPS034614#1#chrI  214332  1000    2000    +  UWOPS034614#1#chrI  214332  1000  2000  1000  1000  255  cg:Z:1000=
  S288C#1#chrVIII     581049  570385  569601  -  UWOPS034614#1#chrI  214332  1000  2000  665   1009  255  cg:Z:10=1I24=1X4=2I2=1X5=1D3=1X1=1X5=1D3=2X10=1X29=1X4=1X1=2X9=2X13=18D6=1D5=2D1=3D19=2X2=1D6=1D1X16=1X1=1D10=1X12=2X1=1D1=1X2=2X3=1X17=1D15=1X18=1X1=1X2=5D1=2D4=1X1=1X3=1X1=1X1=1X4=1D1X5=1X1=1X7=1X5=1X2=1X6=1X4=1X15=1X4=3X7=1X1=1X1=2X10=1X1=1X1=1X1=1X3=1I13=1X5=1X2=3X3=1X1=1X1=1X2=1X4=1X1=1X2=1X12=1X7=1X5=1I1X1=1X3=1X4=2X2=1X3=1X1=1I10=1D4=1X4=1X2=1X8=3I16=1X5=151D17=2X7=1X23=1X5=1X14=1X20=1X2=1X2=2X4=1X5=1X12=1X4=1X5=2X4=1X3=1X1=1X1=2X2=1X1=1X3=1X2=1X11=1X2=1X8=1X3=1X4=1X4=2X4=1D17=1X4=33D
  DBVPG6765#1#chrVI   257436  5195    5981    +  UWOPS034614#1#chrI  214332  1000  2000  667   1009  255  cg:Z:10=1I24=1X4=2I2=1X5=1D3=1X1=1X5=1D3=2X10=1X29=1X4=1X1=2X9=2X13=18D6=1D5=2D1=1D21=2X2=1D6=1D1X16=1X1=1D10=1X12=2X1=1D1=1X2=2X3=1X17=1D15=1X18=1X1=1X2=5D1=2D4=1X1=1X3=1X1=1X1=1X4=1D1X5=1X1=1X7=1X5=1X2=1X6=1X4=1X15=1X4=3X7=1X1=1X1=2X10=1X1=1X1=1X1=1X3=1I13=1X5=1X2=3X3=1X1=1X1=1X2=1X4=1X1=1X2=1X12=1X7=1X5=1I1X1=1X3=1X4=2X2=1X3=1X1=1I10=1D4=1X4=1X2=1X8=3I16=1X5=151D17=2X7=1X23=1X5=1X14=1X20=1X2=1X2=2X4=1X5=1X12=1X4=1X5=2X4=1X3=1X1=1X1=2X2=1X1=1X3=1X2=1X11=1X2=1X8=1X3=1X4=1X4=2X4=1D17=1X4=33D
  DBVPG6765#1#chrI    215496  210581  209798  -  UWOPS034614#1#chrI  214332  1000  2000  664   1009  255  cg:Z:10=1I24=1X4=2I2=1X5=1D3=1X1=1X5=1D3=2X10=1X29=1X4=1X1=2X9=2X13=18D6=1D5=6D19=2X2=1D6=1D1X16=1X1=1D10=1X12=2X1=1D1=1X2=2X3=1X17=1D15=1X18=1X1=1X2=5D1=2D4=1X1=1X3=1X1=1X1=1X4=1D1X5=1X1=1X7=1X5=1X2=1X6=1X4=1X15=1X4=3X7=1X1=1X1=2X10=1X1=1X1=1X1=1X3=1I13=1X5=1X2=3X3=1X1=1X1=1X2=1X4=1X1=1X2=1X12=1X7=1X5=1I1X1=1X3=1X4=2X2=1X3=1X1=1I10=1D4=1X4=1X2=1X8=3I16=1X5=151D17=2X7=1X23=1X5=1X14=1X20=1X2=1X2=2X4=1X5=1X12=1X4=1X5=2X4=1X3=1X1=1X1=2X2=1X1=1X3=1X2=1X11=1X2=1X8=1X3=1X4=1X4=2X4=1D17=1X4=33D
  UWOPS034614#1#chrI  214332  213064  212060  -  UWOPS034614#1#chrI  214332  1000  2000  1000  1004  255  cg:Z:171=4I829=
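
The two numeric columns before the mapping quality in the `-P` output above appear consistent with a simple count over the `cg:Z:` string: the first sums the `=` and `M` operations, the second sums every operation. That would also explain why the same alignment reports 775 with `M` and 665 with `--eqx`, since `M` lumps matches and mismatches together. The Python sketch below reproduces that arithmetic; it is an illustration inferred from the output, not impg's implementation, and `cigar_stats` is a hypothetical helper.

```python
import re
from typing import Tuple

_OP = re.compile(r"(\d+)([MID=X])")
_WHOLE = re.compile(r"(?:\d+[MID=X])+")


def cigar_stats(cigar: str) -> Tuple[int, int]:
    """Return (matches, block_length) for a CIGAR using M, =, X, I, D.

    'M' is counted as a match because it does not distinguish
    matches from mismatches; 'X' is never counted as a match.
    """
    if not _WHOLE.fullmatch(cigar):
        raise ValueError(f"unsupported CIGAR string: {cigar!r}")
    matches = 0
    block_len = 0
    for length, op in _OP.findall(cigar):
        n = int(length)
        block_len += n          # every operation adds to the block length
        if op in "M=":
            matches += n        # only M and = add to the match count
    return matches, block_len


print(cigar_stats("171M4I829M"))  # (1000, 1004)
print(cigar_stats("171=4I829="))  # (1000, 1004)
print(cigar_stats("1000="))       # (1000, 1000)
```

The first two calls correspond to the last record of each `-P` listing, where both the `M` and the `=/X` encodings give 1000 and 1004.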

@AndreaGuarracino
Member Author

AndreaGuarracino commented Apr 15, 2024

Strandness is now handled properly as well, in all output formats:

impg -p input.paf -r UWOPS034614#1#chrI:1000-2000 | head -n 5 | column -t
UWOPS034614#1#chrI   1000    2000    .  +
S288C#1#chrVIII      569601  570385  .  -
DBVPG6765#1#chrVI    5195    5981    .  +
DBVPG6765#1#chrI     209798  210581  .  -
UWOPS034614#1#chrI   212060  213064  .  -
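
Compared with the earlier BED output, where reverse-strand hits were printed with start greater than end (for example `570385  569601`), the coordinates are now ordered and the orientation moves to a strand column. The relationship between the two representations can be sketched in Python as follows. This only illustrates the output convention and assumes that a start greater than the end encodes the reverse strand; `normalize_interval` is a hypothetical name, not part of impg.

```python
from typing import Tuple


def normalize_interval(start: int, end: int) -> Tuple[int, int, str]:
    """Order the coordinates and report orientation as '+' or '-'.

    Assumption: start > end means the hit lies on the reverse strand.
    """
    if start <= end:
        return start, end, "+"
    return end, start, "-"


for start, end in [(1000, 2000), (570385, 569601), (5195, 5981)]:
    lo, hi, strand = normalize_interval(start, end)
    print(lo, hi, ".", strand, sep="\t")
```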

impg -p input.paf -b x.bed | head -n 5 | column -t
S288C#1#chrXII      1000     100000   S288C#1#chrXII  1000  100000  annotation1  0  +  +
S288C#1#chrIII      3        878      S288C#1#chrXII  1000  100000  annotation1  0  +  +
S288C#1#chrV        19       6566     S288C#1#chrXII  1000  100000  annotation1  0  +  +
DBVPG6044#1#chrX    725146   726056   S288C#1#chrXII  1000  100000  annotation1  0  -  +
DBVPG6765#1#chrXVI  948      5546     S288C#1#chrXII  1000  100000  annotation1  0  +  +

impg -p input.paf -b x.bed -P | head -n 5 | column -t 
S288C#1#chrXII      1075542  1000    100000  +  S288C#1#chrXII  1075542  1000  100000  99000  99000  255  cg:Z:99000=                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               an:Z:annotation1
S288C#1#chrIII      341580   3       878     +  S288C#1#chrXII  1075542  1000  100000  809    882    255  cg:Z:7=2D11=1X2=1I27=4I7=2X3=1X4=2X5=1X1=1D5=1X2=1X8=1I1=2X5=1X4=1X3=1X3=1X78=3I6=1X2=1I30=1X5=1X35=1X130=1X5=2X1=1X4=1X26=1D18=1X3=1X78=1X19=1X5=1X1=1X43=1X46=1X36=1X8=1X15=1X21=1D4=1I2=1I2=1D6=2X2=2X1=1X7=1X1=1X1=1X1=1X3=1D1X4=4X2=1X5=1X30=1X1=1X1=1X23=                                                                                                                                                                                                                                                                                                                                                                                                                                                                           an:Z:annotation1
S288C#1#chrV        583092   19      6566    +  S288C#1#chrXII  1075542  1000  100000  5158   6656   255  cg:Z:10=2D15=1I20=1D9=1I3=1X3=1X2=1X2=1I7=1X18=1X26=3I19=1X3=1X15=1X5=1X12=3I2=7I390=1X6=1X42=1X45=1X50=1X49=1X24=1X168=1X10=1X217=1X16=1X10=1X106=1X37=1X46=1X272=1X35=1I1=35I3=1X5=1X34=1X7=1X28=36I7=1X9=1X12=1X26=1X4=1X3=1X4=1X3=2X6=1X2=3X28=1X16=1X35=1X102=96I66=1X29=1X199=1X201=1X7=1X39=1X110=1X60=1X126=1X59=1X249=1X231=1X63=1X56=1X8=1X1=1X109=484I46=31I12=1X16=24I4=1X19=1X4=32I17=14I8=1X47=522I1=1X8=1X138=15D1=1D18=16D2=1X19=1X43=1X47=1X145=1X2=1X1=2D3=11D1=4D61=1X26=1X5=1X349=5D12=1X22=11D6=2D4=1X3=2D2=9D31=10I33=1I140=1X14=1X11=27D28=1X53=1X39=1X11=1X3=1D14=1X13=1X14=1X20=                                                                                                                                 an:Z:annotation1
DBVPG6044#1#chrX    728645   725146  726056  -  S288C#1#chrXII  1075542  1000  100000  624    977    255  cg:Z:36=1X3=1X11=1X15=1X48=1X9=1X1=1X5=9D1=1D1=1X2=3D1=2X1=2X3=2X2=2I1=3X4=6I1=1X3=1X1=4I3=1X5=2D1=1D1=2X1=2D1=1I3=1I1=1X2=3I2=2X2=1X2=1X1=1X3=1I1=1X2=1X1=2X1=3I3=1X1=2I1=2I1=1X1=2I2=1D2=2X4=1X2=1D3=2I2=1X1=3X3=1D1=2I1X3=2D3=2X1=1X1=1X2=1I1=1I4=1I3=3X3=1X1=1X3=2D2=1X1=5D2=1X1=1X1=1X3=1I1X2=1X3=3D2=1D1X3=4I4=1X2=6I2=7I4=1I2=3I1=1I3=4I3=7I3=1X1=1I2X2=1X1=3I1X7=2X2=3D2X2=1I2=1X1=1X1=2I1X2=1X4=7I4=1X3=2X4=1D1=2I6=6I1X3=1X2=1I1=1X1=2I1=2X1=1D4=3I2=1I2=2X3=1I3=2X2=2D1=2X3=1I2=2X2=6X2=1X1=1I3=1X2=3D1=1X3=1I1=3I1=2I2=5I3=7I1=1I2=1X5=1X1=5I4=1X1=1I1=1X7=1X1=1D1=5D5=2X1=1X4=1X2=1X2=5D2=1D3=1X2=5D2=2X3=2I1=1X3=2X1=4I2=1X1=1X2=1I3=4D3=1I2=1X1=1I2=1X2=4I3=2X1=1X1=1X1=4I2=1X104=15I8=1D1=1D1X2=1X1=1I1X2=1X5=1X26=1X21=  an:Z:annotation1
DBVPG6765#1#chrXVI  920208   948     5546    +  S288C#1#chrXII  1075542  1000  100000  4550   4711   255  cg:Z:529=1X109=1X102=1X35=1X3=1X5=1X9=108D4=1X15=1X5=1X6=1X9=1X35=1X12=1X5=1X8=1X6=1X254=1X261=1X629=1X283=1X232=1X75=1X324=1X36=1X146=1X128=1I182=1X40=1X355=5D12=1X31=1X13=1X11=1X35=10I11=1X21=1I202=1X127=1X16=1X28=1X201=                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            an:Z:annotation1
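
In the PAF output above the BED name travels as an `an:Z:` optional field after the CIGAR. A downstream script could recover it with something like the following Python sketch, which assumes tab-separated PAF with 12 mandatory columns followed by `TAG:TYPE:VALUE` fields; `paf_optional_tags` is a hypothetical helper name.

```python
from typing import Dict


def paf_optional_tags(line: str) -> Dict[str, str]:
    """Collect TAG -> VALUE for the optional fields of one PAF line."""
    fields = line.rstrip("\n").split("\t")
    tags: Dict[str, str] = {}
    for field in fields[12:]:           # columns 1-12 are mandatory in PAF
        parts = field.split(":", 2)     # split at most twice: values may contain ':'
        if len(parts) == 3:
            tag, _type, value = parts
            tags[tag] = value
    return tags


line = "\t".join([
    "S288C#1#chrXII", "1075542", "1000", "100000", "+",
    "S288C#1#chrXII", "1075542", "1000", "100000",
    "99000", "99000", "255", "cg:Z:99000=", "an:Z:annotation1",
])
tags = paf_optional_tags(line)
print(tags["an"])  # annotation1
print(tags["cg"])  # 99000=
```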

@AndreaGuarracino AndreaGuarracino changed the title manage 'NAME' column in BED, BEDPE and PAF, 'M' CIGAR operation, and GitHub workflow manage 'NAME' column in BED, BEDPE and PAF, 'M' CIGAR operation, GitHub workflow, stradness Apr 15, 2024
@AndreaGuarracino AndreaGuarracino changed the title manage 'NAME' column in BED, BEDPE and PAF, 'M' CIGAR operation, GitHub workflow, stradness manage 'NAME' column in BED, BEDPE and PAF, 'M' CIGAR operation, GitHub workflow, strandness Apr 15, 2024
@ekg ekg merged commit ac5088a into pangenome:main Apr 15, 2024