blobtk filter
Richard Challis edited this page Dec 22, 2023
Filter files based on a list of sequence names.
blobtk filter --help
Filter files based on list of sequence names

Usage: blobtk filter [OPTIONS] <--bam <BAM>|--cram <CRAM>>

Options:
  -i, --list <TXT>        Path to input file containing a list of sequence IDs
  -b, --bam <BAM>         Path to BAM file
  -c, --cram <CRAM>       Path to CRAM file
  -a, --fasta <FASTA>     Path to assembly FASTA input file (required for CRAM)
  -f, --fastq <FASTQ>     Path to FASTQ file to filter (forward or single reads)
  -r, --fastq2 <FASTQ>    Path to paired FASTQ file to filter (reverse reads)
  -S, --suffix <SUFFIX>   Suffix to use for output filtered files [default: filtered]
  -A, --fasta-out         Flag to output a filtered FASTA file
  -F, --fastq-out         Flag to output filtered FASTQ files
  -O, --read-list <TXT>   Path to output list of read IDs
  -h, --help              Print help information
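The usage line requires exactly one of --bam or --cram, and the help text notes that --fasta is required for CRAM input. As a minimal sketch of how those two rules could be checked before calling the tool from a script, the helper below assembles an argument list. The function name `build_filter_command` and its parameter names are hypothetical and not part of blobtk; only the command-line flags are taken from the help text above.

```python
def build_filter_command(list_file=None, bam=None, cram=None, fasta=None,
                         fastq=None, fastq2=None, suffix=None,
                         fasta_out=False, fastq_out=False, read_list=None):
    """Assemble a `blobtk filter` argument list (hypothetical helper)."""
    # The usage line requires exactly one of --bam or --cram.
    if (bam is None) == (cram is None):
        raise ValueError("provide exactly one of bam or cram")
    # The help text states that --fasta is required for CRAM input.
    if cram is not None and fasta is None:
        raise ValueError("fasta is required when cram is used")

    cmd = ["blobtk", "filter"]
    # Options that take a value, in the order they are added to the command.
    valued_options = [
        ("--list", list_file),
        ("--bam", bam),
        ("--cram", cram),
        ("--fasta", fasta),
        ("--fastq", fastq),
        ("--fastq2", fastq2),
        ("--suffix", suffix),
        ("--read-list", read_list),
    ]
    for flag, value in valued_options:
        if value is not None:
            cmd += [flag, str(value)]
    # Boolean flags are appended only when switched on.
    if fasta_out:
        cmd.append("--fasta-out")
    if fastq_out:
        cmd.append("--fastq-out")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the standard library; building a list rather than a shell string avoids quoting problems with file paths.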
blobtk filter -i test/test.list -b test/test.bam -f test/reads_1.fq.gz -r test/reads_2.fq.gz -F
from blobtk import filter

# filter fastq files based on a list of sequence names
read_count = filter.fastx(
    list_file="test/test.list",
    bam="test/test.bam",
    fastq1="test/reads_1.fq.gz",
    fastq2="test/reads_2.fq.gz",
    fastq_out=True,
)
print(read_count)
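Both the command-line and Python examples read sequence IDs from a list file. The help text does not specify the layout of that file, so the sketch below assumes plain text with one sequence ID per line; check this against your blobtk version before relying on it. The helper name `write_id_list` is hypothetical and not part of blobtk.

```python
from pathlib import Path


def write_id_list(ids, path):
    """Write unique, non-empty sequence IDs to `path`, one per line.

    Returns the number of IDs written. The one-ID-per-line layout is an
    assumption, not something stated in the blobtk help text.
    """
    seen = set()
    unique_ids = []
    for raw in ids:
        seq_id = raw.strip()
        # Skip blank entries and repeats, keeping first-seen order.
        if seq_id and seq_id not in seen:
            seen.add(seq_id)
            unique_ids.append(seq_id)
    Path(path).write_text("".join(f"{seq_id}\n" for seq_id in unique_ids))
    return len(unique_ids)
```

The resulting path could then be supplied as `list_file` in the Python call or as the `--list` argument on the command line.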