perl run_bowtie2_subtract_mapped_reads_with_picard directory/containing/metagenomic/samples/only/*
Remove human DNA from metagenomic samples to comply with HIPAA regulations (useful when uploading raw data to SRA/ENA)
perl run_kneaddata_only_human_removal.pl directory/containing/metagenomic/samples/only/*
Cut a DNA sequence at a user-defined position (e.g. position 63789) and move the segment from the start up to that position to the end of the same sequence. This is useful when generating plasmid sequence comparison plots with Easyfig or Geneious; the cut is often placed right before the start of the plasmid replicon gene.
perl cut_and_paste_seq.pl -cut 63789 -strand <forward/reverse> -seq Plasmid_DNA.fasta > Plasmid_DNA_new_sequence_order.fasta
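The cut-and-paste operation amounts to rotating a circular sequence. The Python sketch below illustrates the idea only and is not the Perl script's code. The function name `rotate_sequence` is hypothetical, the cut position is assumed to count the bases that move to the end, and the `-strand` option is not modeled.

```python
def rotate_sequence(seq: str, cut: int) -> str:
    """Move the first `cut` bases of `seq` to its end (assumed semantics)."""
    if not 0 <= cut <= len(seq):
        raise ValueError("cut position must lie within the sequence")
    # Everything after the cut comes first, followed by the leading segment.
    return seq[cut:] + seq[:cut]


plasmid = "AAACCCGGGTTT"  # toy sequence, not real data
print(rotate_sequence(plasmid, 3))  # CCCGGGTTTAAA
```

The rotated sequence has the same length and base content as the input; only the start coordinate changes, which is what alignment plots of circular plasmids need.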
Determine the core genome size of a given dataset. The script assesses sorted BAM files and the mapping reference to estimate the core genome size at a given depth of coverage. It requires samtools, bedtools, and awk.
perl estimate_core_genome_from_bam.pl -bam /path/to/bam/files -genome mapping/reference/fasta/file -depth 10
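One plausible way to read "core genome size at a given depth" is the number of reference positions that every sample covers at or above the threshold. The Python sketch below works under that assumption and may not match the script's exact logic. It takes per-base depths that have already been extracted from the BAM files, and the names `core_genome_size` and `depths` are hypothetical.

```python
from typing import Dict, List


def core_genome_size(depths: Dict[str, List[int]], min_depth: int) -> int:
    """Count positions where all samples reach at least `min_depth`."""
    samples = list(depths.values())
    if not samples:
        return 0
    genome_length = len(samples[0])
    if any(len(d) != genome_length for d in samples):
        raise ValueError("all samples must cover the same reference length")
    # zip(*samples) yields one tuple of depths per reference position.
    return sum(1 for column in zip(*samples) if min(column) >= min_depth)


# Toy per-base depths for a 5 bp reference (illustrative values only).
depths = {
    "sampleA": [12, 15, 9, 30, 11],
    "sampleB": [10, 3, 20, 25, 10],
}
print(core_genome_size(depths, 10))  # 3
```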
Calculate simple genome assembly stats including N50, number of contigs, total bases, and G+C content
perl calc_N50_GC_genomesize.pl -i genomeAssembly.fasta -o output.stats
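For reference, the statistics named above can be computed as in the following Python sketch. It uses the common definition of N50 (the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total) and is not taken from the Perl script. The function name `assembly_stats` is hypothetical.

```python
from typing import List


def assembly_stats(contigs: List[str]) -> dict:
    """Return contig count, total bases, N50, and G+C percentage."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    total = sum(lengths)
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # reached half of the assembly
            n50 = length
            break
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    gc_percent = 100.0 * gc / total if total else 0.0
    return {
        "contigs": len(contigs),
        "total_bases": total,
        "N50": n50,
        "GC_percent": gc_percent,
    }


# Toy contigs with lengths 8, 6, 4, and 2 (illustrative only).
print(assembly_stats(["GGCCGGCC", "ATATAT", "GCAT", "AT"]))
# {'contigs': 4, 'total_bases': 20, 'N50': 6, 'GC_percent': 50.0}
```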
Screen raw reads for contamination and get an impression of the bacterial composition of your sample(s). The script uses Kraken to determine species composition, KronaTools to generate multi-layered pie charts, and the conversion script metaphlan2krona.py.
bash Kraken_krona_fastq.bash
Trim raw reads and remove sequencing adapters using fastq-mcf
perl run_fastqMcf.pl directory/containing/raw/reads/only/*
Map trimmed reads to a contaminant (e.g. PhiX) database and extract the unmapped reads for downstream analysis using Bowtie2, SAMtools and bam2fastq
perl run_bowtie2_subtract_unmapped_reads.pl directory/containing/trimmed/reads/only/*
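Reads that do not map to the contaminant reference are the ones kept. In SAM output they carry flag bit 0x4 ("segment unmapped"). The Python sketch below shows that selection on plain SAM text. It illustrates the principle only, not the script's SAMtools/bam2fastq commands; the function name `unmapped_read_names` is hypothetical, and paired-end handling (for example, requiring both mates to be unmapped) is left out.

```python
from typing import Iterable, List

UNMAPPED = 0x4  # SAM flag bit: segment unmapped


def unmapped_read_names(sam_lines: Iterable[str]) -> List[str]:
    """Return names of reads whose FLAG has the unmapped bit set."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # FLAG is the second SAM column
        if flag & UNMAPPED:
            names.append(fields[0])
    return names


# Toy SAM records (illustrative only): read1 maps to phiX, read2 does not.
sam = [
    "@HD\tVN:1.6",
    "read1\t0\tphiX\t1\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTT\tIIII",
]
print(unmapped_read_names(sam))  # ['read2']
```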
Assemble the trimmed, contaminant-free reads using SPAdes
perl run_SPAdes.pl directory/containing/trimmed/and/virus/free/reads/only/*
Perform the previous three steps with a single shell script. It runs fastq-mcf, Bowtie2, SAMtools, bam2fastq, and the SPAdes assembler in batch
bash fastqMcf-bowtie2-SPAdes.bash
perl Calc_coverage_from_spades_assembly.pl <scaffolds.fasta>
perl run_bowtie2_and_pilon.pl <PacBio-unitigs.fasta> path/to/trimmed/Illumina/reads/*
Run kSNP version 2 on assembled microbial genomes (in FASTA format)
perl run_kSNP.pl full/path/containing/the/input/files projectname
Assemble Single-End (SE) IonTorrent reads with SPAdes
perl run_IonT_SPAdes.pl directory/containing/trimmed/SE-reads/only/*
Bash workflow script for trimming SE IonTorrent reads, assembling the trimmed reads, and quality-checking the contigs using BUSCO 2.0
bash IonTorrent_SE_run.bash
Quickly assess binary 454 Standard Flowgram Format (SFF) files from a 454 sequencing run. This simple script counts the number of reads and bases. It requires SFFinfo
perl BaseCount_sequenceCount_from_sff_file.pl /directory/to/sff/files
perl gff3_2_gff3EVM.pl <maker_protein_genes.gff3>
Make EVM data compatible with Gbrowse
perl fix_evm_for_gbrowse.pl < inputfile.gff3