Sequence-scripts

Extract metagenomic reads from a particular species or strain using Bowtie2, samtools, and Picard.

Usage

perl run_bowtie2_subtract_mapped_reads_with_picard.pl directory/containing/metagenomic/samples/only/*
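
This step comes down to a read-selection rule, sketched below in Python. This is not the Perl script itself. The sketch assumes the reads have already been aligned with Bowtie2 and are available as SAM text, and it relies only on SAM FLAG bit 0x4, which marks a read as unmapped. It collects the names of reads with at least one mapped alignment and prints them one per line, which is the kind of list Picard FilterSamReads accepts with FILTER=includeReadList and READ_LIST_FILE. It is an assumption that run_bowtie2_subtract_mapped_reads_with_picard.pl selects reads this way. The function name mapped_read_names is hypothetical.

```python
def mapped_read_names(sam_lines):
    """Return the set of read names with at least one mapped alignment.

    SAM FLAG bit 0x4 marks a segment as unmapped, so a record with that bit
    cleared is a mapped alignment. Header lines (starting with '@') are skipped.
    """
    names = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x4:
            names.add(qname)
    return names


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.6\tSO:unsorted",
        "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
        "r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tTTGA\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tIIII",
    ]
    # One name per line, as in a Picard READ_LIST_FILE.
    for name in sorted(mapped_read_names(example)):
        print(name)
```

Selection is by name, so both mates of a pair are kept when either mate maps.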

Remove human DNA from metagenomic samples with KneadData to comply with HIPAA regulations (useful when uploading raw data to SRA/ENA).

Usage

perl run_kneaddata_only_human_removal.pl directory/containing/metagenomic/samples/only/*

Cut a DNA sequence at a user-defined position (e.g. position 63789) and move the segment from the start of the sequence up to that position to the end of the same sequence. This is useful when generating plasmid sequence comparison plots with Easyfig or Geneious. A common choice is to cut right before the start of the plasmid replicon gene.

Usage

perl cut_and_paste_seq.pl -cut 63789 -strand <forward/reverse> -seq Plasmid_DNA.fasta > Plasmid_DNA_new_sequence_order.fasta
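
The rotation itself is simple string slicing, as the Python sketch below shows. It makes two assumptions that are not confirmed for cut_and_paste_seq.pl. First, the cut position is 1-based, so bases 1 to cut are moved to the end. Second, `-strand reverse` means the rotated sequence is reverse-complemented. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def cut_and_paste(seq, cut, strand="forward"):
    """Rotate a sequence so that bases 1..cut (1-based) move to the end.

    For strand='reverse' the rotated sequence is reverse-complemented
    (an assumed interpretation of the -strand option).
    """
    if not 0 < cut < len(seq):
        raise ValueError("cut position must lie inside the sequence")
    if strand not in ("forward", "reverse"):
        raise ValueError("strand must be 'forward' or 'reverse'")
    rotated = seq[cut:] + seq[:cut]
    return reverse_complement(rotated) if strand == "reverse" else rotated
```

For example, `cut_and_paste("AACCGGTT", 2)` returns `"CCGGTTAA"`, and with `strand="reverse"` it returns `"TTAACCGG"`.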

Determine the core genome size of a given dataset. The script takes sorted BAM files and the mapping reference and estimates the core genome size at a given minimum depth of coverage (-depth). It requires samtools, bedtools, and awk.

Usage

perl estimate_core_genome_from_bam.pl -bam /path/to/bam/files -genome mapping/reference/fasta/file -depth 10
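
The Python sketch below illustrates one way to make the estimate. It counts reference positions that are covered at or above the depth threshold in every sample. It assumes per-position depths in the layout that `samtools depth -a` prints for several BAM files: chromosome, position, then one depth column per BAM. The script itself uses samtools, bedtools, and awk and may compute this differently. The function name core_genome_size is hypothetical.

```python
def core_genome_size(depth_lines, min_depth):
    """Count reference positions covered at >= min_depth in every sample.

    Each line is assumed to look like `samtools depth -a` output for several
    BAM files: chrom, 1-based position, then one depth column per BAM.
    """
    core = 0
    for line in depth_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depths = [int(d) for d in fields[2:]]
        if min(depths) >= min_depth:
            core += 1
    return core
```

For example, the lines printed by `samtools depth -a sample1.bam sample2.bam` could be passed to `core_genome_size(lines, 10)`. Dividing the result by the reference length gives the core fraction.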

Calculate simple genome assembly stats including N50, number of contigs, total bases, and G+C content

Usage

perl calc_N50_GC_genomesize.pl -i genomeAssembly.fasta -o output.stats
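
The statistics can be sketched in Python as follows. N50 is the length of the contig at which the cumulative length first reaches half of the total assembly size, summing from the longest contig down. As an assumption, G+C content here is computed over all bases, including N. calc_N50_GC_genomesize.pl may use a different denominator. The function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def assembly_stats(lines):
    """Return contig count, total bases, N50 and G+C percentage."""
    seqs = [seq.upper() for _, seq in read_fasta(lines)]
    lengths = sorted((len(s) for s in seqs), reverse=True)
    total = sum(lengths)
    # N50: walk contigs from longest to shortest until the running sum
    # reaches half of the total assembly size.
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    gc_percent = 100.0 * gc / total if total else 0.0
    return {"contigs": len(lengths), "total_bases": total,
            "N50": n50, "GC_percent": gc_percent}
```

For contigs of 10, 8, and 2 bp, the N50 is 10 because the longest contig alone already covers half of the 20 bp total.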

Screen raw reads for contamination and get an overview of the bacterial composition of your sample(s). The script uses Kraken to determine species composition, KronaTools to generate multi-layered pie charts, and the conversion script metaphlan2krona.py.

Usage

bash Kraken_krona_fastq.bash

Trim raw reads and remove sequencing adapters using fastq-mcf.

Usage

perl run_fastqMcf.pl directory/containing/raw/reads/only/*

Map trimmed reads to a contaminant database (e.g. PhiX) and keep only the unmapped, contaminant-free reads for downstream analysis, using Bowtie2, SAMtools, and bam2fastq.

Usage

perl run_bowtie2_subtract_unmapped_reads.pl directory/containing/trimmed/reads/only/*
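
This step keeps reads whose SAM FLAG has the unmapped bit (0x4) set and writes them back out as FASTQ. In the original workflow, bam2fastq handles the FASTQ output. The Python sketch below illustrates that filter on SAM text. It assumes every unmapped record is kept. A stricter variant would also require the mate-unmapped bit (0x8), so that only pairs with both mates unmapped survive. The sketch writes SEQ and QUAL as stored and does not undo reverse-complementing (flag 0x10). The function name is hypothetical.

```python
def unmapped_to_fastq(sam_lines):
    """Yield FASTQ records for SAM records whose FLAG has the 0x4 bit set.

    Secondary (0x100) and supplementary (0x800) records are skipped so each
    read is written once. For paired reads, /1 or /2 is appended based on the
    0x40 (first in pair) and 0x80 (second in pair) bits.
    """
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if not flag & 0x4 or flag & 0x900:
            continue
        if flag & 0x40:
            qname += "/1"
        elif flag & 0x80:
            qname += "/2"
        yield f"@{qname}\n{seq}\n+\n{qual}\n"
```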

Assemble the trimmed, contaminant-free reads using SPAdes.

Usage

perl run_SPAdes.pl directory/containing/trimmed/and/virus/free/reads/only/*

Perform the previous three steps with a single shell script. It runs fastq-mcf, Bowtie2, SAMtools, bam2fastq, and the SPAdes assembler in batch.

Usage

bash fastqMcf-bowtie2-SPAdes.bash

Calculate the average k-mer coverage of a SPAdes assembly for the highest k value used (usually k=127).

Usage

perl Calc_coverage_from_spades_assembly.pl <scaffolds.fasta>
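
SPAdes writes each contig's length and k-mer coverage into the FASTA header (for example `>NODE_1_length_12345_cov_67.89`). That coverage refers to the largest k used in the run. A length-weighted average over all headers gives an assembly-wide k-mer coverage Ck. For read length L, this can be converted to base coverage C with C = Ck * L / (L - k + 1). The Python sketch below shows both steps. It is an assumption that Calc_coverage_from_spades_assembly.pl applies the length weighting or the conversion, and the function names are hypothetical.

```python
import re

# SPAdes contig/scaffold headers look like ">NODE_1_length_12345_cov_67.89".
HEADER = re.compile(r"^>NODE_\d+_length_(\d+)_cov_([\d.]+)")


def mean_kmer_coverage(lines):
    """Length-weighted mean of the k-mer coverage values in SPAdes headers."""
    total_len, weighted = 0, 0.0
    for line in lines:
        m = HEADER.match(line)
        if m:
            length, cov = int(m.group(1)), float(m.group(2))
            total_len += length
            weighted += length * cov
    return weighted / total_len if total_len else 0.0


def kmer_to_base_coverage(kmer_cov, read_len, k):
    """Convert k-mer coverage to base coverage: C = Ck * L / (L - k + 1)."""
    if read_len < k:
        raise ValueError("read length must be at least k")
    return kmer_cov * read_len / (read_len - k + 1)
```

For 150 bp reads and k = 127, a k-mer coverage of 10 corresponds to a base coverage of 62.5x.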

Correct PacBio data with Illumina reads using Bowtie2 and Pilon.

Usage

perl run_bowtie2_and_pilon.pl <PacBio-unitigs.fasta> path/to/trimmed/Illumina/reads/*
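
A single polishing round has four parts: index the PacBio assembly, align the Illumina pairs with Bowtie2, sort and index the alignments with samtools, and run Pilon on the result. The Python sketch below only builds those commands for a dry run. It assumes one pair of read files, a local pilon.jar, and the file prefix shown. These names are hypothetical, and run_bowtie2_and_pilon.pl may use different options.

```python
import shlex


def pilon_commands(assembly, read1, read2, prefix="pilon", threads=4):
    """Build the commands for one Bowtie2 + Pilon polishing round (dry run)."""
    index = f"{prefix}_index"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        ["bowtie2-build", assembly, index],  # index the PacBio assembly
        ["bowtie2", "-p", str(threads), "-x", index,
         "-1", read1, "-2", read2, "-S", sam],  # align the Illumina pairs
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["samtools", "index", bam],
        # Pilon corrects the assembly using the paired-end ("frags") alignments
        ["java", "-jar", "pilon.jar", "--genome", assembly,
         "--frags", bam, "--output", prefix],
    ]


if __name__ == "__main__":
    # Hypothetical file names; print the commands instead of running them.
    for cmd in pilon_commands("PacBio-unitigs.fasta", "reads_R1.fastq", "reads_R2.fastq"):
        print(shlex.join(cmd))
```

To run the round instead of printing it, each list can be passed to `subprocess.run(cmd, check=True)`.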

Run kSNP version 2 on assembled microbial genomes (in FASTA format).

Usage

perl run_kSNP.pl full/path/containing/the/input/files projectname

IonTorrent scripts

Assemble Single-End (SE) IonTorrent reads with SPAdes

Usage

perl run_IonT_SPAdes.pl directory/containing/trimmed/SE-reads/only/*

Bash workflow script for trimming SE IonTorrent reads, assembling the trimmed reads, and quality-checking the contigs using BUSCO 2.0.

Usage

bash IonTorrent_SE_run.bash

454 scripts

Quickly assess binary 454 Standard Flowgram Format (SFF) files from a 454 sequencing run. This simple script counts the number of reads and bases. It requires SFFinfo.

Usage

perl BaseCount_sequenceCount_from_sff_file.pl /directory/to/sff/files
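
Once the SFF file has been converted to text, counting reads and bases is straightforward. The Python sketch below assumes FASTA input, such as the output of `sffinfo -s`. BaseCount_sequenceCount_from_sff_file.pl may parse SFFinfo output differently, and the function name is hypothetical.

```python
def count_reads_and_bases(fasta_lines):
    """Count FASTA records and their total sequence length."""
    reads = bases = 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            reads += 1  # each header starts a new read
        elif line:
            bases += len(line)  # sequence lines may be wrapped
    return reads, bases
```

For example, the text from `subprocess.run(["sffinfo", "-s", "run.sff"], capture_output=True, text=True).stdout` could be split into lines and passed to this function. Here run.sff is a placeholder name.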

Eukaryotic scripts

Generate EVM-compatible GFF3 files from MAKER de novo gene prediction GFF3 output.

Usage

perl gff3_2_gff3EVM.pl <maker_protein_genes.gff3>
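
This section does not document what the conversion involves. The Python sketch below shows one hypothetical reading. It keeps only gene, mRNA, exon, and CDS features, stops at the ##FASTA section that MAKER may append, and rewrites the source column, because EVM matches predictions to weights by that column. The source label, the set of kept feature types, and the function name are all assumptions. gff3_2_gff3EVM.pl may work differently, for example by converting MAKER's ab initio match/match_part features into gene structures.

```python
# Feature types kept for EVM; an assumption about what EVM needs here.
GENE_MODEL_TYPES = {"gene", "mRNA", "exon", "CDS"}


def maker_to_evm(gff_lines, source="maker"):
    """Filter MAKER GFF3 down to gene-model features for EVM (sketch)."""
    out = ["##gff-version 3"]
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # MAKER can append sequences after this directive
        if not line or line.startswith("#"):
            continue  # skip comments and the original version pragma
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] not in GENE_MODEL_TYPES:
            continue  # e.g. match/match_part evidence alignments
        cols[1] = source  # must match a source name in the EVM weights file
        out.append("\t".join(cols))
    return out
```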

Make EVM output compatible with GBrowse.

Usage

perl fix_evm_for_gbrowse.pl < inputfile.gff3

alarawms/Sequence-scripts