fastqp

Simple FASTQ, SAM and BAM read quality assessment and plotting using Python.

Features

Requires only Python with the NumPy, SciPy, and Matplotlib libraries
Works with (gzipped) FASTQ, SAM, and BAM formatted reads
Tidy, tabular output statistics so you can create your own graphs
A useful set of default graphics rivaling comparable QC packages
Counts all IUPAC ambiguous nucleotide codes (NMWSKRYVHDB) if present in sequences
Downsamples input files to around 2,000,000 reads (user adjustable)
Allows a 5′ and 3′ (left and right) cycle limit for graphics generation
Tracks kmers and sequence duplication for the entire input file
Plots base call reference mismatches for aligned reads
Optional sequence duplication calculation using Bloom filters (beta)
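
The base and kmer counting described in the list above can be illustrated with a short sketch. This is not fastqp's own code: the function names `count_bases` and `count_kmers` and the three toy reads are hypothetical, and the sketch only assumes that counting means tallying every recognised nucleotide code and every overlapping kmer in each read.

```python
from collections import Counter

# The four standard bases plus the IUPAC ambiguity codes named above.
STANDARD_BASES = "ACGT"
AMBIGUOUS_CODES = "NMWSKRYVHDB"


def count_bases(reads):
    """Count standard and IUPAC ambiguous nucleotide codes across all reads."""
    allowed = set(STANDARD_BASES + AMBIGUOUS_CODES)
    counts = Counter()
    for seq in reads:
        counts.update(seq.upper())
    # Drop any character that is not a recognised nucleotide code.
    return {base: n for base, n in counts.items() if base in allowed}


def count_kmers(reads, k=5):
    """Count overlapping kmers of length k in every read."""
    counts = Counter()
    for seq in reads:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


# Hypothetical toy reads, chosen only for illustration.
reads = ["ACGTN", "ACGTA", "RYACG"]
base_counts = count_bases(reads)
kmer_counts = count_kmers(reads, k=3)
```

Sorting `kmer_counts` by value (for example with `kmer_counts.most_common(10)`) gives a rough view of which kmers dominate a sample.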

Requirements

Tested on Python 2.7 and 3.4

Tested on Mac OS 10.10 and Linux 2.6.18

Installation

pip install [--user] fastqp

Note: BAM file support requires samtools

Usage

usage: fastqp [-h] [-q] [-s BINSIZE] [-a NAME] [-n NREADS] [-p BASE_PROBS] [-k {2,3,4,5,6,7}] [-o OUTPUT]
              [-ll LEFTLIMIT] [-rl RIGHTLIMIT] [-mq MEDIAN_QUAL] [--aligned-only | --unaligned-only] [-d]
              input

simple NGS read quality assessment using Python

positional arguments:
  input                 input file (one of .sam, .bam, .fq, or .fastq(.gz) or stdin (-))

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           do not print any messages (default: False)
  -s BINSIZE, --binsize BINSIZE
                        number of reads to bin for sampling (default: auto)
  -a NAME, --name NAME  sample name identifier for text and graphics output (default: input file name)
  -n NREADS, --nreads NREADS
                        number of reads to sample from input (default: 2000000)
  -p BASE_PROBS, --base-probs BASE_PROBS
                        probabilities for observing A,T,C,G,N in reads (default: 0.25,0.25,0.25,0.25,0.1)
  -k {2,3,4,5,6,7}, --kmer {2,3,4,5,6,7}
                        length of kmer for over-represented kmer counts (default: 5)
  -o OUTPUT, --output OUTPUT
                        base name for output files (default: fastqp_figures)
  -ll LEFTLIMIT, --leftlimit LEFTLIMIT
                        leftmost cycle limit (default: 1)
  -rl RIGHTLIMIT, --rightlimit RIGHTLIMIT
                        rightmost cycle limit (-1 for none) (default: -1)
  -mq MEDIAN_QUAL, --median-qual MEDIAN_QUAL
                        median quality threshold for failing QC (default: 30)
  --aligned-only        only aligned reads (default: False)
  --unaligned-only      only unaligned reads (default: False)
  -d, --count-duplicates
                        calculate sequence duplication rate (default: False)
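
The -n and -s options control how many reads are sampled and how reads are binned for sampling. The help text does not say how the automatic bin size is chosen, so the following is only a sketch of one plausible scheme: keep one read per bin, with the bin size set to the estimated read count divided by the target. The names `choose_binsize` and `downsample` are hypothetical and are not part of fastqp.

```python
def choose_binsize(estimated_reads, nreads=2000000):
    """Pick a bin size so that one read per bin yields about nreads reads.

    Assumption: the total read count is known or estimated in advance.
    """
    return max(1, estimated_reads // nreads)


def downsample(reads, binsize):
    """Yield the first read from every bin of `binsize` consecutive reads."""
    for index, read in enumerate(reads):
        if index % binsize == 0:
            yield read


# Toy input: ten "reads" reduced to a target of five.
binsize = choose_binsize(estimated_reads=10, nreads=5)
sampled = list(downsample(range(10), binsize))
```

Under this scheme an input smaller than the target gets a bin size of 1, so every read is kept.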

Changes

See releases page for details.

Examples
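
A minimal sketch of driving fastqp from a Python script, using only flags that appear in the usage text. The helper names `build_fastqp_command` and `run_fastqp`, the input file `reads.fastq.gz`, and the output base name `sample1_qc` are hypothetical; which files fastqp writes under that base name is not covered here.

```python
import shlex
import shutil
import subprocess


def build_fastqp_command(input_path, output="fastqp_figures", nreads=2000000,
                         kmer=5, count_duplicates=False):
    """Assemble an argument list from flags shown in the usage text."""
    if kmer not in range(2, 8):
        raise ValueError("kmer must be between 2 and 7")
    cmd = ["fastqp", "-n", str(nreads), "-k", str(kmer), "-o", output]
    if count_duplicates:
        cmd.append("-d")
    # The input file is the single positional argument and goes last.
    cmd.append(input_path)
    return cmd


def run_fastqp(cmd):
    """Run the command, failing early if fastqp is not installed."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("fastqp is not on PATH; install it with pip first")
    return subprocess.run(cmd, check=True)


# Hypothetical file names, for illustration only.
cmd = build_fastqp_command("reads.fastq.gz", output="sample1_qc", kmer=4)
print(shlex.join(cmd))
```

Calling `run_fastqp(cmd)` executes the assembled command; it is left out of the sketch so the example has no side effects.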

Acknowledgements

This project is freely licensed by the author, Matthew Shirley, and was completed under the mentorship and financial support of Drs. Sarah Wheelan and Vasan Yegnasubramanian at the Sidney Kimmel Comprehensive Cancer Center in the Department of Oncology.
