blobtk filter

Richard Challis edited this page Dec 22, 2023 · 4 revisions

Filter files based on a list of sequence names.

Command line

blobtk filter --help

Filter files based on list of sequence names

Usage: blobtk filter [OPTIONS] <--bam <BAM>|--cram <CRAM>>

Options:
  -i, --list <TXT>       Path to input file containing a list of sequence IDs
  -b, --bam <BAM>        Path to BAM file
  -c, --cram <CRAM>      Path to CRAM file
  -a, --fasta <FASTA>    Path to assembly FASTA input file (required for CRAM)
  -f, --fastq <FASTQ>    Path to FASTQ file to filter (forward or single reads)
  -r, --fastq2 <FASTQ>   Path to paired FASTQ file to filter (reverse reads)
  -S, --suffix <SUFFIX>  Suffix to use for output filtered files [default: filtered]
  -A, --fasta-out        Flag to output a filtered FASTA file
  -F, --fastq-out        Flag to output filtered FASTQ files
  -O, --read-list <TXT>  Path to output list of read IDs
  -h, --help             Print help information

Examples

blobtk filter -i test/test.list -b test/test.bam -f test/reads_1.fq.gz -r test/reads_2.fq.gz -F
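When the command is launched from a script, it can help to assemble the arguments programmatically. The following is a minimal sketch that uses only the long options shown in the help text above. The helper name `build_filter_command` is hypothetical and not part of blobtk. Treating `--list` as mandatory, and `--bam` and `--cram` as mutually exclusive, are assumptions based on a reading of the usage line.

```python
from typing import List, Optional


def build_filter_command(
    list_file: str,
    bam: Optional[str] = None,
    cram: Optional[str] = None,
    fasta: Optional[str] = None,
    fastq1: Optional[str] = None,
    fastq2: Optional[str] = None,
    fastq_out: bool = False,
) -> List[str]:
    """Assemble a `blobtk filter` argument list (hypothetical helper)."""
    # Assumption: the usage line calls for exactly one of --bam or --cram.
    if (bam is None) == (cram is None):
        raise ValueError("provide exactly one of bam or cram")
    # The help text marks --fasta as required for CRAM input.
    if cram is not None and fasta is None:
        raise ValueError("cram input requires an assembly fasta")

    cmd = ["blobtk", "filter", "--list", list_file]
    cmd += ["--bam", bam] if bam is not None else ["--cram", cram]
    if fasta is not None:
        cmd += ["--fasta", fasta]
    if fastq1 is not None:
        cmd += ["--fastq", fastq1]
    if fastq2 is not None:
        cmd += ["--fastq2", fastq2]
    if fastq_out:
        cmd.append("--fastq-out")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, provided the `blobtk` executable is on the `PATH`.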

Python module

from blobtk import filter

# Filter FASTQ files based on a list of sequence names
read_count = filter.fastx(
    list_file="test/test.list",
    bam="test/test.bam",
    fastq1="test/reads_1.fq.gz",
    fastq2="test/reads_2.fq.gz",
    fastq_out=True,
)

print(read_count)
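If the sequence names come from an earlier analysis step rather than an existing file, the list file can be written from Python first. This sketch assumes the list format is plain text with one sequence ID per line, which the help text does not spell out. The helper `write_id_list` is hypothetical and not part of blobtk.

```python
from pathlib import Path
from typing import Iterable


def write_id_list(ids: Iterable[str], path: str) -> int:
    """Write unique, non-empty sequence IDs to `path`, one per line.

    Assumption: blobtk reads the list as plain text, one ID per line.
    Returns the number of IDs written.
    """
    seen = set()
    unique = []
    for raw in ids:
        name = raw.strip()
        # Skip blank entries and repeats while preserving input order.
        if name and name not in seen:
            seen.add(name)
            unique.append(name)
    Path(path).write_text("".join(f"{name}\n" for name in unique))
    return len(unique)
```

The resulting path can then be supplied as the `list_file` argument of `filter.fastx`.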