alarawms/Sequence-scripts

Sequence-scripts

Extract metagenomic reads from a particular species or strain using Bowtie2, samtools, and Picard.

Usage

perl run_bowtie2_subtract_mapped_reads_with_picard directory/containing/metagenomic/samples/only/*

Remove human DNA from metagenomic samples to comply with HIPAA regulations (useful when uploading raw data to SRA/ENA).

Usage

perl run_kneaddata_only_human_removal.pl directory/containing/metagenomic/samples/only/*

Cut a DNA sequence at a user-defined position (e.g. position 63789) and move the segment that runs from the start to that position to the end of the same sequence. This is useful when generating plasmid sequence comparison plots with Easyfig or Geneious. People often cut right before the start of the plasmid replicon gene.

Usage

perl cut_and_paste_seq.pl -cut 63789 -strand <forward/reverse> -seq Plasmid_DNA.fasta > Plasmid_DNA_new_sequence_order.fasta
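
The re-ordering described above amounts to rotating a circular sequence. Below is a minimal Python sketch of that idea, not the Perl script itself. The function name `rotate_sequence` is hypothetical, and the sketch assumes the cut position counts the bases that are moved to the end. The `-strand` option is not covered because its exact behaviour is not documented here.

```python
def rotate_sequence(seq: str, cut: int) -> str:
    """Move the first `cut` bases of a circular sequence to its end."""
    if not 0 <= cut <= len(seq):
        raise ValueError("cut position must lie within the sequence")
    # Everything after the cut comes first, followed by the leading segment.
    return seq[cut:] + seq[:cut]


plasmid = "AAACCCGGGTTT"  # toy sequence, not real data
print(rotate_sequence(plasmid, 3))  # CCCGGGTTTAAA
```

The rotated sequence has the same length and base composition as the input; only the start coordinate changes.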

Determine the core genome size of a given dataset. The script uses sorted BAM files and the mapping reference to estimate the core genome size at a given depth of coverage. It requires samtools, bedtools, and awk.

Usage

perl estimate_core_genome_from_bam.pl -bam /path/to/bam/files -genome mapping/reference/fasta/file -depth 10
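
One way to think about the estimate is to count the reference positions that reach the requested depth in every sample. The Python sketch below illustrates that definition only. It is an assumption about what "core genome" means here, not a description of how the Perl script combines samtools, bedtools, and awk. The function name `core_genome_size` and the input layout (one list of per-position depths per sample, e.g. parsed from `samtools depth -a` output) are hypothetical.

```python
from typing import Dict, List


def core_genome_size(depths: Dict[str, List[int]], min_depth: int) -> int:
    """Count reference positions covered at >= min_depth in every sample."""
    if not depths:
        return 0
    if len({len(d) for d in depths.values()}) != 1:
        raise ValueError("all samples must report depth for the same positions")
    # zip(*...) walks the reference position by position across all samples.
    return sum(
        all(d >= min_depth for d in column)
        for column in zip(*depths.values())
    )


toy_depths = {
    "sampleA": [12, 15, 9, 30, 0],
    "sampleB": [10, 11, 25, 8, 14],
}
print(core_genome_size(toy_depths, 10))  # 2
```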

Calculate simple genome assembly statistics, including N50, number of contigs, total bases, and G+C content.

Usage

perl calc_N50_GC_genomesize.pl -i genomeAssembly.fasta -o output.stats
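
For readers unfamiliar with these statistics, here is a minimal Python sketch of how they can be computed from a list of contig sequences. It is an illustration, not the Perl script. The function name `assembly_stats` is hypothetical, and the sketch assumes G+C content is expressed as a percentage of all bases (ambiguous bases such as N count towards the total).

```python
from typing import Dict, List


def assembly_stats(contigs: List[str]) -> Dict[str, float]:
    """Return contig count, total bases, N50, and G+C percentage."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    total = sum(lengths)
    n50 = 0
    running = 0
    # N50: length of the contig at which the cumulative sum of
    # length-sorted contigs first reaches half of the total bases.
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    return {
        "contigs": len(lengths),
        "total_bases": total,
        "n50": n50,
        "gc_percent": 100.0 * gc / total if total else 0.0,
    }


toy_contigs = ["GGGGCCCC", "ATATAT", "GCAT", "AT"]  # lengths 8, 6, 4, 2
print(assembly_stats(toy_contigs))
```

For the toy input, the total is 20 bases; the two longest contigs (8 + 6) are the first to cover at least half of that, so the N50 is 6.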

Screen raw reads for contamination and get an impression of the bacterial composition of your sample(s). The script uses Kraken to determine species composition, KronaTools to generate multi-layered pie charts, and the conversion script metaphlan2krona.py.

Usage

bash Kraken_krona_fastq.bash

Trim raw reads and remove sequencing adapters using fastq-mcf.

Usage

perl run_fastqMcf.pl directory/containing/raw/reads/only/*

Map trimmed reads to a contaminant (e.g. PhiX) database and extract the unmapped reads for downstream analysis using Bowtie2, SAMtools, and bam2fastq.

Usage

perl run_bowtie2_subtract_unmapped_reads.pl directory/containing/trimmed/reads/only/*
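
Whether a read mapped to the contaminant is recorded in the FLAG column of each SAM record: bit 0x4 marks an unmapped read and bit 0x8 an unmapped mate. The Python sketch below shows how read pairs with no hit can be identified from those bits. It illustrates the principle only and is not how the Perl script is implemented. The function name `unmapped_pairs` is hypothetical, and requiring both mates to be unmapped is an assumption.

```python
from typing import Iterable, Iterator

UNMAPPED = 0x4       # SAM FLAG bit: this read is unmapped
MATE_UNMAPPED = 0x8  # SAM FLAG bit: the mate of this read is unmapped


def unmapped_pairs(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield read names of records where the read and its mate are both unmapped."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & UNMAPPED and flag & MATE_UNMAPPED:
            yield fields[0]


toy_sam = [
    "@HD\tVN:1.6",
    "r1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",        # pair unmapped, first mate
    "r1\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",       # pair unmapped, second mate
    "r2\t99\tphiX\t10\t42\t4M\t=\t50\t44\tACGT\tIIII",  # properly mapped
    "r3\t73\tphiX\t10\t42\t4M\t=\t10\t0\tACGT\tIIII",   # read mapped, mate unmapped
]
print(list(unmapped_pairs(toy_sam)))  # ['r1', 'r1']
```

samtools exposes the same filter on the command line through `samtools view -f 12`, which keeps only records with both bits set.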

Assemble the trimmed, contaminant-free reads using SPAdes.

Usage

perl run_SPAdes.pl directory/containing/trimmed/and/virus/free/reads/only/*

Perform the previous three steps with a single shell script. It runs fastq-mcf, Bowtie2, SAMtools, bam2fastq, and the SPAdes assembler in batch.

Usage

bash fastqMcf-bowtie2-SPAdes.bash

Calculate the average k-mer coverage of a SPAdes assembly, based on the highest k value used (usually k=127).

Usage

perl Calc_coverage_from_spades_assembly.pl <scaffolds.fasta>
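
SPAdes writes the length and k-mer coverage of each sequence into its FASTA header (for example `NODE_1_length_1000_cov_20.0`). The Python sketch below averages those values across an assembly. It is an illustration, not the Perl script. The function name `mean_kmer_coverage` is hypothetical, and weighting each sequence by its length is an assumption about how the average is defined.

```python
import re
from typing import Iterable

# Matches the length and coverage fields of a SPAdes FASTA header.
HEADER = re.compile(r"length_(\d+)_cov_(\d+(?:\.\d+)?)")


def mean_kmer_coverage(headers: Iterable[str]) -> float:
    """Length-weighted mean k-mer coverage from SPAdes FASTA headers."""
    total_length = 0
    weighted_sum = 0.0
    for header in headers:
        match = HEADER.search(header)
        if match is None:  # ignore lines that are not SPAdes headers
            continue
        length = int(match.group(1))
        total_length += length
        weighted_sum += length * float(match.group(2))
    return weighted_sum / total_length if total_length else 0.0


toy_headers = [
    ">NODE_1_length_1000_cov_20.0",
    ">NODE_2_length_3000_cov_40.0",
]
print(mean_kmer_coverage(toy_headers))  # 35.0
```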

Correct PacBio data with Illumina reads using Bowtie2 and Pilon.

Usage

perl run_bowtie2_and_pilon.pl <PacBio-unitigs.fasta> path/to/trimmed/Illumina/reads/*

Run kSNP version 2 on assembled microbial genomes (in FASTA format).

Usage

perl run_kSNP.pl full/path/containing/the/input/files projectname

IonTorrent scripts

Assemble single-end (SE) IonTorrent reads with SPAdes.

Usage

perl run_IonT_SPAdes.pl directory/containing/trimmed/SE-reads/only/*

Bash workflow script that trims SE IonTorrent reads, assembles the trimmed reads, and quality-checks the contigs using BUSCO 2.0.

Usage

bash IonTorrent_SE_run.bash

454 scripts

Quickly assess binary 454 Standard Flowgram Format (SFF) files from a 454 sequencing run. This simple script counts the number of reads and bases. It requires SFFinfo.

Usage

perl BaseCount_sequenceCount_from_sff_file.pl /directory/to/sff/files

Eukaryotic part

Generate EVM-compatible GFF3 files from a MAKER de novo gene prediction GFF.

Usage

perl gff3_2_gff3EVM.pl <maker_protein_genes.gff3>

Make EVM data compatible with GBrowse.

Usage

perl fix_evm_for_gbrowse.pl < inputfile.gff3

About

Random utility scripts for genomics data
