Prevent pilon from changing contig names after polishing #7

hkaspersen · 2022-08-11T12:40:33Z

When working on issue #5 I noticed that Pilon changes the contig headers in the resulting fasta file. This is unfortunate due to the "circular=true" and depth information that Unicycler provides in the headers.

Asked for help on the Pilon github:
broadinstitute/pilon#151

hkaspersen · 2024-04-24T08:29:27Z

Pilon will not be implemented in this pipeline.

Isoris · 2024-10-15T20:00:37Z

I am sorry to ask, but would it be possible to (1) add the full contig name to the polished FASTA headers, and (2) get a single polished consensus sequence instead of many contigs?

Thank you in advance. It is quite basic and would be valuable to Pilon.
Quentin.


Writing HiC_scaffold_27_hap2:20462918-20463189 changes to polished_assembly_hap2_scf27.changes
Writing updated HiC_scaffold_27_hap2_pilon to polished_assembly_hap2_scf27.fasta
Writing HiC_scaffold_27_hap2:20463007-20463247 VCF to polished_assembly_hap2_scf27.vcf
Writing HiC_scaffold_27_hap2:20463007-20463247 changes to polished_assembly_hap2_scf27.changes
Writing updated HiC_scaffold_27_hap2_pilon to polished_assembly_hap2_scf27.fasta
Writing HiC_scaffold_27_hap2:20463060-20463301 VCF to polished_assembly_hap2_scf27.vcf
Writing HiC_scaffold_27_hap2:20463060-20463301 changes to polished_assembly_hap2_scf27.changes
Writing updated HiC_scaffold_27_hap2_pilon to polished_assembly_hap2_scf27.fasta
Writing HiC_scaffold_27_hap2:20463108-20463348 VCF to polished_assembly_hap2_scf27.vcf
Writing HiC_scaffold_27_hap2:20463108-20463348 changes to polished_assembly_hap2_scf27.changes
Writing updated HiC_scaffold_27_hap2_pilon to polished_assembly_hap2_scf27.fasta
Writing HiC_scaffold_27_hap2:20464881-20465112 VCF to polished_assembly_hap2_scf27.vcf
Writing HiC_scaffold_27_hap2:20464881-20465112 changes to polished_assembly_hap2_scf27.changes
Writing updated HiC_scaffold_27_hap2_pilon to polished_assembly_hap2_scf27.fasta

>HiC_scaffold_27_hap2_pilon
ACATGCATGCAATGGGCCTAATGTGCATGTTGTGAGAAAACCAGTCATATGTACGTAATGTAGCTCTGTGTATGTGGCTG
TGAAATGTACGCGATGCCAGCTAGGCAGCAAAAATTGTTAAATTCTGTGGCAGGAGGCTGTTTATCTGTTTCAGGTGGGA
AATAATAATAATCATAACCAGATGTGAGAATATTTTAAACATTTGATACACCAATGAGCTGGACGGCTGTAATTATAACA
>HiC_scaffold_27_hap2_pilon
TGTGTGACTGTGACCAGACCTGTTATCTCTCCTCTTTTGCTCTCCCCTCTCCTGTTCTCTCTCCCACTCTCCCCCTTTCT
CTGTCTCTGTCGAGCTACACATGTCGTTCCTGAGCTGCCATTGATTCAGACCCCCTCTGCCCTCTGGACCTGCCTGACTC
ATCCTGGTGCCCCGCTTCTGGTTGGAGATCTCGTCACATGGATGTCCCGTGTGTCTCTTTGGGATATGTGGGTCC
>HiC_scaffold_27_hap2_pilon
CTATAAAGTTTGGGGATTGGGAAATTATTCATGTTTAATTTATCTGTTAATTTTAACCTTTGAACAGACGTTTTATAATT

...


##fileformat=VCFv4.1
##fileDate=20241016
##source="Pilon version 1.24 Thu Jan 28 13:00:45 2021 -0500"
##PILON="--genome HiC_scaffold_27_hap2.fa --frags short_reads_scaffold27.bam --bam /project/lt200308-agbsci/01-catfish_assembly/04-polish/01-PilonCMA/03-QV_mercury/asm.Hap2.polish3.renamed.fa.hifireads.sorted.bam --bam HiC_scaffold_27_hap2.ont_aligned.sorted.bam --targets /project/lt200308-agbsci/01-catfish_assembly/04-polish/01-PilonCMA/asm.Hap2.polish3.renamed_only.bed.chr27.targets.bed --output polished_assembly_hap2_scf27 --fix all --vcf --diploid --minmq 30 --minqual 30 --changes --tracks --verbose"
##reference=file:/lustrefs/disk/project/lt200308-agbsci/01-catfish_assembly/04-polish/01-PilonCMA/HiC_scaffold_27_hap2.fa
##contig=<ID=HiC_scaffold_27_hap2,length=239>
##contig=<ID=HiC_scaffold_27_hap2,length=235>
##contig=<ID=HiC_scaffold_27_hap2,length=240>
##contig=<ID=HiC_scaffold_27_hap2,length=236>
##contig=<ID=HiC_scaffold_27_hap2,length=235>

...
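Several of the target windows in the log above overlap (for example 20463007-20463247 and 20463060-20463301), and each window seems to be written as its own FASTA record. A minimal sketch of merging overlapping windows before handing them to `--targets` is given below. It assumes the targets are written as `name:start-end` with inclusive coordinates, as they appear in the log. The helper `merge_targets` is hypothetical, and whether merged targets actually lead to fewer output records has not been verified here.

```python
import re

# A target spec as it appears in the log: name:start-end
TARGET_RE = re.compile(r"^(?P<name>.+):(?P<start>\d+)-(?P<end>\d+)$")


def merge_targets(specs):
    """Merge overlapping or directly adjacent 'name:start-end' target specs.

    Assumption: coordinates are inclusive integers on both ends.
    """
    parsed = []
    for spec in specs:
        match = TARGET_RE.match(spec.strip())
        if match is None:
            raise ValueError(f"not a name:start-end target: {spec!r}")
        parsed.append((match["name"], int(match["start"]), int(match["end"])))

    merged = []
    # Sorting groups windows by sequence name, then by start position.
    for name, start, end in sorted(parsed):
        if merged and merged[-1][0] == name and start <= merged[-1][2] + 1:
            prev_name, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_name, prev_start, max(prev_end, end))
        else:
            merged.append((name, start, end))
    return [f"{name}:{start}-{end}" for name, start, end in merged]
```

Applied to the five windows in the log, this yields two targets: one spanning 20462918-20463348 and the separate window 20464881-20465112.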

hkaspersen · 2024-10-16T07:58:35Z

Dear @Isoris,
I think you posted this comment on the wrong page?
I assume this was supposed to be on the pilon github page?

Isoris · 2024-10-16T08:18:23Z

Hello, yes! I just wonder whether it would be possible to add a new option to Pilon that gives better headers and also calls a consensus, as bcftools does. For example, if we choose --diploid, we could call the consensus with heterozygosity, like -H 1 in bcftools.

But it seems that Pilon is not maintained anymore, right?

In my case I could find a way to polish my genome, so it is still a very useful tool.

Thank you and sorry for posting in the wrong repo.

hkaspersen · 2024-10-17T06:59:57Z

No worries!
I actually stopped using Pilon for my polishing because recently it has been found that it can introduce errors into the assembly.
Have a look here: https://rrwick.github.io/2023/05/15/short-read-polishing-short-read-assemblies.html
Depending on your assembly method, polishing a short-read assembly may not be optimal.

Isoris · 2024-10-17T13:43:16Z

In my case, I first created a meryl database of k-mers from the HiFi reads and from the Illumina reads, then filtered it for k-mers that appear at least twice. After that I ran Merqury and got the BED files of the assembly errors, based on the k-mers missing in the assembly but present in the meryl database. Then I passed this BED of assembly-error positions to Pilon via --targets. My QV increased from 41 to 52 in both haplotypes of my eukaryote genome (1 Gb).

I am now comparing the results contig by contig. It seems that targeted Pilon is much better than racon and other tools, perhaps because it can be more haplotype aware? But I am not sure; it seems that it only corrects small-scale SVs. Maybe another tool would be needed to increase the QV further, to 60?

My data was 12X HiFi, 33X Nanopore, 36X Hi-C and 36X Illumina. At first the genome was at QV 31. Then I used NextPolish2 (3 times) and the QV increased to 40. Then Pilon, run on each scaffold separately with --targets, increased it from 40 to 52.

Do you have any recommendations on what to do next? Or should I simply stop here? Thank you.
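To put these QV values in perspective, here is a small worked calculation. It assumes the QV is Phred-scaled, i.e. QV = -10 * log10(per-base error probability), and it uses the 1 Gb genome size from the comment. The function names are hypothetical.

```python
def qv_to_error_rate(qv):
    """Convert a Phred-scaled QV into a per-base error probability."""
    return 10 ** (-qv / 10)


def expected_errors(qv, genome_size_bp):
    """Expected number of erroneous bases in a genome of the given size."""
    return qv_to_error_rate(qv) * genome_size_bp


GENOME_SIZE_BP = 1_000_000_000  # 1 Gb, as stated in the comment

for qv in (31, 40, 41, 52, 60):
    print(f"QV {qv}: about {expected_errors(qv, GENOME_SIZE_BP):,.0f} expected errors")
```

Under that assumption, going from QV 41 to QV 52 corresponds to a drop from roughly 79,000 to roughly 6,300 expected erroneous bases per haplotype, and QV 60 would correspond to about 1,000.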


hkaspersen · 2024-10-17T13:53:36Z

That seems like a reasonable solution, I was not aware of the possibility of targeted correction like that!
I don't think I am the right person to ask, it all depends on your plans and your goals!

Isoris · 2024-10-17T14:01:45Z

But now I am left with 1000 contigs in addition to the 27 chromosomes. Even after scaffolding with Hi-C there are still 500+ gaps in the 27 chromosomes, in addition to the gaps in the 1000 unplaced contigs.

Yes, the --targets option of Pilon is amazing, but I ran it on separate scaffolds, and maybe it would be much better to run it on the full genome in one go. However, each of my jobs needed 25 GB of memory for the R1 short reads (--frags), 25 GB for the R2 short reads, 20 GB for the HiFi alignment BAM (--bam) and 20 GB for the ONT reads (--bam), plus 1 GB per Mb of genome. By that calculation a single run would require at least 1 TB of RAM, so it is impossible to run it in one go.

Maybe the authors of Pilon could make this easier by splitting the FASTA based on the targets and then polishing each scaffold separately, to minimize the memory footprint? It would be a great improvement if that were possible.
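The memory estimate in this comment can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted above (25 + 25 + 20 + 20 GB for the read sets, plus 1 GB per Mb of sequence). The function name is hypothetical, and the numbers are the commenter's own rule of thumb rather than measured values.

```python
def pilon_memory_estimate_gb(sequence_mb, read_set_gb, gb_per_mb=1.0):
    """Rough memory estimate: fixed per-read-set cost plus a per-megabase cost."""
    return sum(read_set_gb) + gb_per_mb * sequence_mb


# Per-read-set figures from the comment: R1, R2, HiFi BAM, ONT BAM.
READ_SET_GB = [25, 25, 20, 20]

# Whole genome in one run: 1 Gb = 1000 Mb.
whole_genome_gb = pilon_memory_estimate_gb(1000, READ_SET_GB)
print(f"whole genome: about {whole_genome_gb:.0f} GB")
```

With these inputs the whole-genome estimate is 90 GB + 1000 GB = 1090 GB, which matches the "at least 1 TB" figure. The per-megabase term dominates, which is why running each scaffold separately keeps the footprint manageable.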


Isoris · 2024-10-17T14:13:53Z

I agree with you that Pilon is not good for polishing the whole thing. But when used in a targeted way, as a second polishing step after a first polishing tool, it seems to work to some extent.


hkaspersen closed this as completed Dec 19, 2023

hkaspersen reopened this Dec 19, 2023

hkaspersen linked a pull request Jan 3, 2024 that will close this issue

Finish draft assembly track #12

Merged

hkaspersen closed this as completed Apr 24, 2024
