SunflowerMutagenesisAssembly

High-level summary: Genome Assembly - HiFi + OmniC
*Software used in this pipeline was usually installed via conda into an environment, then loaded with "source /mmfs1/home/USER.NAME/miniconda3/bin/activate envNameHere"

Received raw data - 2 PacBio Revio HiFi SMRT cells + R1/R2 OmniC
Merge/concatenate the data from the 2 SMRT cells into one fq.gz file.
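Gzip files can be concatenated directly, because a multi-member gzip stream is still valid gzip, so no decompress/recompress round trip should be needed. A minimal sketch with two tiny made-up FASTQ files (cell1.fq.gz and cell2.fq.gz are placeholder names, not the real SMRT cell files):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch directory so nothing real is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Two placeholder FASTQ files standing in for the two SMRT cells.
printf '@read1\nACGT\n+\nIIII\n' | gzip > cell1.fq.gz
printf '@read2\nTTGA\n+\nIIII\n' | gzip > cell2.fq.gz

# Concatenate the compressed files as-is.
cat cell1.fq.gz cell2.fq.gz > mergedHiFiReads.fq.gz

# Sanity check: FASTQ has 4 lines per read, so line count / 4 = read count.
merged_reads=$(( $(gzip -dc mergedHiFiReads.fq.gz | wc -l) / 4 ))
echo "reads in merged file: $merged_reads"
```

Comparing the read count of the merged file against the sum of the two inputs is a cheap way to confirm nothing was truncated.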
Use HiFiAdapterFilt to ensure that no adapters/contaminants remain (none did):
    bash hifiadapterfilt.sh -p mergedHiFiReads.fq.gz -t #threads
Use meryl, then GenomeScope 2.0, to visualize k-mer counts and get homozygosity/heterozygosity, haplotype, repeat, and error stats:
    meryl-1.4.1/bin/meryl count k=21_OR_31 threads=#threads memory=#gbRAM output hifi_kmers_k21_OR_31.meryl mergedHiFiReads.fq.gz
    Rscript genomescope2.0/genomescope.R -i input_merylHistogram_file -o output_dir -p ploidy -k 21_OR_31
Then look at the resulting plots and the summary.txt file.
*-p is the ploidy (2 here) and -k must match the k used for meryl. The histogram input is the output of "meryl histogram" run on the .meryl database.
*My understanding is that k=21 is standard/common, especially for Illumina, but the VGP pipeline suggested 31 for HiFi. We tried both.
Run hifiasm using the merged HiFi reads (fq.gz) and both R1/R2 OmniC files (fq.gz):
    hifiasm -o outputLabel.asm -t #threads --h1 hic/omnicR1.fq.gz --h2 hic/omnicR2.fq.gz mergedHiFiReads.fq.gz
*p_ctg stands for primary contigs; in HiC-phased mode a p_ctg file is output for primary, hap1, and hap2.
Run gfastats to get contig statistics and convert .gfa to .fasta format:
    gfastats/build/bin/gfastats primary_OR_hap1_OR_hap2.p_ctg.gfa --threads #threads -o primary_OR_hap1_OR_hap2.p_ctg.fasta
Then look at the statistics in the accompanying output: number of contigs/scaffolds, length, N50, etc.
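For reference, N50 is the length of the contig at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly length. The sketch below recomputes it from a plain list of lengths; the n50 function and the four example lengths are made up for illustration and are not part of gfastats:

```shell
#!/usr/bin/env bash
set -euo pipefail

# n50 FILE: print the N50 of the sequence lengths listed one per line in FILE.
n50() {
    sort -rn "$1" | awk '
        { len[NR] = $1; total += $1 }
        END {
            running = 0
            for (i = 1; i <= NR; i++) {
                running += len[i]
                # First length at which the running sum reaches half the total.
                if (running * 2 >= total) { print len[i]; exit }
            }
        }'
}

workdir=$(mktemp -d)
# Hypothetical contig lengths, e.g. column 2 of a .fai index.
printf '%s\n' 100 400 200 300 > "$workdir/lengths.txt"
contig_n50=$(n50 "$workdir/lengths.txt")
echo "N50: $contig_n50"
```

This is only a cross-check for understanding the number; the gfastats report remains the source for the actual statistics.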
Run BUSCO on contigs:
    busco -i primary_OR_hap1_OR_hap2.p_ctg.fasta -m genome -c #threads -o outputDirectory -l eudicots_odb10
*eudicots is the closest lineage to sunflower for BUSCO gene completeness analysis.
Run Merqury on contigs:
    merqury/merqury.sh hifi_kmers_k21_OR_31.meryl primary_OR_hap1_OR_hap2.p_ctg.fasta outputDirectory
Generate contact map for contigs

Index the fasta contigs file

samtools faidx primary_OR_hap1_OR_hap2.p_ctg.fasta

Print first two columns of index into genome file

cut -f1,2 primary_OR_hap1_OR_hap2.p_ctg.fasta.fai > primary_OR_hap1_OR_hap2.p_ctg.genome
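To make the genome file format concrete: a .fai index has five tab-separated columns, and only the first two (sequence name and length) are kept. A small sketch on a mock index, where the contig names and numbers are invented for illustration:

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Mock .fai with made-up values; columns are
# name, length, byte offset, bases per line, bytes per line.
printf 'ptg000001l\t1500000\t12\t60\t61\n'     >  demo.p_ctg.fasta.fai
printf 'ptg000002l\t820000\t1525025\t60\t61\n' >> demo.p_ctg.fasta.fai

# Same command as above: keep only name and length.
cut -f1,2 demo.p_ctg.fasta.fai > demo.p_ctg.genome
cat demo.p_ctg.genome
```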

Build the bwa-mem2 index for the fasta

bwa-mem2/bwa-mem2 index primary_OR_hap1_OR_hap2.p_ctg.fasta

Align the OmniC reads to the fasta

bwa-mem2/bwa-mem2 mem -5SP -T0 -t #threads primary_OR_hap1_OR_hap2.p_ctg.fasta OmniCR1.fq.gz OmniCR2.fq.gz -o primary_OR_hap1_OR_hap2_alignedToOmniC.sam

Find ligation junctions/parse

pairtools parse --min-mapq 40 --walks-policy 5unique --max-inter-align-gap 30 --nproc-in #threads --nproc-out #threads --chroms-path primary_OR_hap1_OR_hap2.p_ctg.genome primary_OR_hap1_OR_hap2_alignedToOmniC.sam > primary_OR_hap1_OR_hap2_alignedToOmniC_parsed.pairsam

Sort the parsed pairs

pairtools sort --tmpdir=/path/to/tempdir --nproc #threads primary_OR_hap1_OR_hap2_alignedToOmniC_parsed.pairsam > primary_OR_hap1_OR_hap2_alignedToOmniC_sorted.pairsam

Mark duplicates

pairtools dedup --nproc-in #threads --nproc-out #threads --mark-dups --output-stats primary_OR_hap1_OR_hap2_alignedToOmniC_pairtools_stats.txt --output primary_OR_hap1_OR_hap2_alignedToOmniC_dedup.pairsam primary_OR_hap1_OR_hap2_alignedToOmniC_sorted.pairsam

Split pairsam into two files

pairtools split --nproc-in #threads --nproc-out #threads --output-pairs primary_OR_hap1_OR_hap2_alignedToOmniC_mapped.pairs --output-sam primary_OR_hap1_OR_hap2_alignedToOmniC_unsorted.bam primary_OR_hap1_OR_hap2_alignedToOmniC_dedup.pairsam

Generate the final bam file

samtools sort -@ #threads -T /path/to/tempdir/temp -o primary_OR_hap1_OR_hap2_mappedToOmniC.PT.bam primary_OR_hap1_OR_hap2_alignedToOmniC_unsorted.bam

Index the final bam file

samtools index primary_OR_hap1_OR_hap2_mappedToOmniC.PT.bam

Generate contact map data

samtools view -h primary_OR_hap1_OR_hap2_mappedToOmniC.PT.bam | PretextMap -o primary_OR_hap1_OR_hap2_omnic_pretext --sortby nosort --mapq 10

Visualize contact map

PretextSnapshot -m primary_OR_hap1_OR_hap2_omnic_pretext -f "png"
Scaffold with YaHS:
    yahs/yahs primary_OR_hap1_OR_hap2.p_ctg.fasta primary_OR_hap1_OR_hap2_mappedToOmniC.PT.bam
*Purging duplicates isn't necessary before this because of the phasing with OmniC.
Run BUSCO on scaffolds: just like above, but with the scaffold fasta.
Run Merqury on scaffolds: just like above, but with the scaffold fasta.
Generate contact map for scaffolds: just like above, but with the scaffold fasta.
Use JupiterPlot to compare scaffolds to a published reference genome. The first 17 scaffolds should be MUCH larger than the rest, corresponding to chromosomes. Use them as input to JupiterPlot:
    jupiter name=OmniCprimary_OR_hap1_OR_hap2 ref=reference_17scaffs.fasta fa=primary_OR_hap1_OR_hap2.fasta t=#threads m=0 ng=0 maxScaff=17 labels=both
*m is the minimum reference chromosome size. It could be used instead of trimming the ref fasta, e.g. 10-100Mb. ng limits mapping to a percentage of the ref genome size; disable this. maxScaff limits the number of scaffolds; use 17 to exclude the non-chromosome ones. labels=both applies labels to both the ref and query fasta.
*Do NOT use the resulting jpg. It was much lower quality than the svg.
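One way to check that the largest sequences really stand apart is to rank a .fai index by length. The sketch below is an illustration only: largest_seqs is a hypothetical helper, and the scaffold names and lengths are made up (with real data the count would be 17).

```shell
#!/usr/bin/env bash
set -euo pipefail

# largest_seqs FAI N: print the names of the N longest sequences in a .fai index.
largest_seqs() {
    sort -t $'\t' -k2,2nr "$1" | awk -F'\t' -v n="$2" 'NR <= n { print $1 }'
}

workdir=$(mktemp -d)
# Mock index: name, length, offset, bases per line, bytes per line (made-up values).
printf 'scaffold_1\t300\t12\t60\t61\n'  >  "$workdir/demo.fasta.fai"
printf 'scaffold_2\t50\t330\t60\t61\n'  >> "$workdir/demo.fasta.fai"
printf 'scaffold_3\t200\t395\t60\t61\n' >> "$workdir/demo.fasta.fai"
printf 'scaffold_4\t10\t612\t60\t61\n'  >> "$workdir/demo.fasta.fai"

# Keep the two longest sequences of the mock index.
largest_seqs "$workdir/demo.fasta.fai" 2 > "$workdir/top_names.txt"
cat "$workdir/top_names.txt"
```

The resulting name list could then be used to pull those sequences into their own fasta, which is one possible way to produce a trimmed reference like reference_17scaffs.fasta.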
Relabel chromosomes according to their links with the reference: Chr01-17 for the chromosomes and Chr00c000-style labels for the remaining scaffolds, moving Scaff18 to Chr00c001.
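The relabeling can be scripted once the old-to-new name pairs have been read off the JupiterPlot links. The sketch below assumes a hand-written, tab-separated map file; rename_fasta, the scaffold names, and the Chr05 assignment are all hypothetical examples, not the names actually used for this assembly:

```shell
#!/usr/bin/env bash
set -euo pipefail

# rename_fasta MAP FASTA: rewrite FASTA headers using a two-column,
# tab-separated MAP of old_name<TAB>new_name. Unmapped headers are kept.
rename_fasta() {
    awk -F'\t' '
        NR == FNR { new[$1] = $2; next }          # first file: load the map
        /^>/ {
            split(substr($0, 2), parts, " ")      # name = text up to first space
            if (parts[1] in new) { print ">" new[parts[1]]; next }
        }
        { print }
    ' "$1" "$2"
}

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical mapping decided by eye from the reference links.
printf 'scaffold_1\tChr05\nscaffold_18\tChr00c001\n' > name_map.tsv
printf '>scaffold_1\nACGT\n>scaffold_18\nGGCC\n>scaffold_19\nTTAA\n' > demo.fasta

rename_fasta name_map.tsv demo.fasta > relabeled.fasta
grep '^>' relabeled.fasta
```

Headers missing from the map pass through unchanged, so a partial map is safe to run and the leftovers are easy to spot with grep.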

