
blobtools filter

Richard Challis edited this page Dec 22, 2023 · 1 revision

Datasets can be filtered on the values of any variable or category field, or by a list of identifiers. A filter can be applied to a complete dataset to produce a reduced dataset without repeating analyses, or to assembly FASTA and read FASTQ files to support reassembly and reanalysis. Filter parameters are shared between blobtools and the interactive Viewer, so filters chosen in an interactive session can be reproduced on the command line.
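
As a minimal sketch of the first use case, the snippet below writes a reduced BlobDir that keeps only records of at least 5000 bp. The dataset path is hypothetical, and the parameter name `length--Min` is an assumption based on the Viewer's URL parameter convention; check the parameters shown in your own Viewer session before relying on it. The guard simply skips the command when blobtools or the dataset is not available.

```shell
# Hypothetical dataset location; replace with the path to your own BlobDir.
BLOBDIR="datasets/ASSEMBLY_NAME"
FILTERED="${BLOBDIR}_filtered"

# Assumed parameter name and suffix: keep records with length >= 5000.
PARAM="length--Min=5000"

if command -v blobtools >/dev/null 2>&1 && [ -d "$BLOBDIR" ]; then
    # --output writes a new, filtered BlobDir to the given directory.
    blobtools filter --param "$PARAM" --output "$FILTERED" "$BLOBDIR"
else
    echo "Skipping: blobtools or $BLOBDIR not found" >&2
fi
```

Because `--param` can be given more than once (see the usage below), further `param=value` filters can be added to the same command.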

Command line

Filter a BlobDir.

Usage:
    blobtools filter [--param STRING...] [--query-string STRING] [--json JSON]
                     [--list TXT] [--invert] [--output DIRECTORY]
                     [--fasta FASTA] [--fastq FASTQ...] [--suffix STRING]
                     [--cov BAM] [--summary FILENAME] [--summary-rank RANK]
                     [--table FILENAME] [--table-fields STRING]
                     [--taxdump DIRECTORY] [--taxrule STRING] [--text TXT] [--text-header]
                     [--text-delimiter STRING] [--text-id-column INT] DIRECTORY

Arguments:
    DIRECTORY                   Existing BlobDir dataset directory.

Options:
    --param STRING            String of type param=value.
    --query-string STRING     List of param=value pairs from url query string.
    --json JSON               JSON format list file as generated by BlobToolKit Viewer.
    --list TXT                Space or newline separated list of identifiers.
    --invert                  Invert filter (exclude matching records).
    --output DIRECTORY        Path to directory to generate a new, filtered BlobDir.
    --fasta FASTA             FASTA format assembly file to be filtered.
    --fastq FASTQ             FASTQ format read file to be filtered (requires --cov).
    --cov BAM                 BAM/SAM/CRAM read alignment file.
    --text TXT                Generic text file to be filtered.
    --text-delimiter STRING   Text file delimiter. [Default: whitespace]
    --text-id-column INT      Index of column containing identifiers (1-based). [Default: 1]
    --text-header             Flag to indicate first row of text file contains field names. [Default: False]
    --suffix STRING           String to be added to filtered filename. [Default: filtered]
    --summary FILENAME        Generate a JSON-format summary of the filtered dataset.
    --summary-rank RANK       Taxonomic level for summary. [Default: phylum]
    --table FILENAME          Tabular output of filtered dataset.
    --table-fields STRING     Comma separated list of field IDs to include in the
                              table output. Use 'plot' to include all plot axes.
                              [Default: plot]
    --taxdump DIRECTORY       Location of NCBI new_taxdump directory.
    --taxrule STRING          Taxrule used when processing hits.
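
The options above can be combined to filter sequence files for reassembly. The sketch below assumes a list of contaminant identifiers and uses `--invert` to exclude them from an assembly FASTA file and a pair of FASTQ read files. All file names and identifiers are hypothetical, and repeating `--fastq` once per read file is an assumption based on the `[--fastq FASTQ...]` usage pattern.

```shell
# Hypothetical inputs; replace with your own files.
BLOBDIR="datasets/ASSEMBLY_NAME"
ASSEMBLY="ASSEMBLY_NAME.fasta"
READS_1="reads_1.fastq"
READS_2="reads_2.fastq"
BAM="ASSEMBLY_NAME.reads.bam"

# Write a newline separated list of identifiers to exclude (made-up IDs).
IDS="$(mktemp)"
printf '%s\n' contig_1 contig_7 > "$IDS"

if command -v blobtools >/dev/null 2>&1 && [ -d "$BLOBDIR" ]; then
    # --invert excludes the listed records; --fastq requires --cov.
    blobtools filter \
        --list "$IDS" --invert \
        --fasta "$ASSEMBLY" \
        --fastq "$READS_1" --fastq "$READS_2" \
        --cov "$BAM" \
        --suffix clean \
        "$BLOBDIR"
else
    echo "Skipping: blobtools or $BLOBDIR not found" >&2
fi
```

Here `--suffix clean` replaces the default `filtered` string that is added to the names of the filtered files.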