TBtypeR enables accurate and sensitive quantification of
Mycobacterium tuberculosis (MTB) strain mixtures from whole genome
sequencing (WGS) data. It is designed to detect low-frequency mixed
infections that other tools struggle to identify, down to minor strain
frequencies of 1%.
TBtypeR is available as a standalone R package and as part of an
end-to-end Nextflow pipeline, TBtypeNF, which automates data
processing from raw sequencing reads to final results.
Extensive benchmarking, detailed in our publication in Communications
Biology, shows that
TBtypeR has the highest accuracy in predicting minor strain
fractions. Competing tools fail to accurately detect or quantify
mixtures below 5%.
TBtypeR models reference and alternative allele counts at phylogenetic SNP sites using a binomial distribution. These sites, known as the SNP barcode, are compiled from multiple studies:
Publication | Lineages | Unique SNPs | # Sublineages |
---|---|---|---|
Napier et al. | L1-L9,La1-La3 | 7572 | 78 |
Zwyer et al. | La1-La3 | 1323 | 19 |
Thawornwattana et al. | L2 | 728 | 42 |
Coscolla et al. | L5,L6 | 643 | 16 |
Shuaib et al. | L3 | 637 | 12 |
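To make the binomial idea concrete, here is a minimal sketch. It is not TBtypeR's implementation. It assumes a set of hypothetical barcode sites where only the minor strain carries the alternative allele, ignores sequencing error, and grid-searches the minor strain fraction p that maximises the binomial log-likelihood of the observed counts. The function name `estimate_minor_fraction` and the input format are illustrative assumptions.

```shell
# Sketch only: grid-search the minor strain fraction p that maximises a
# binomial log-likelihood over sites where (by assumption) only the minor
# strain carries the alternative allele. Sequencing error is ignored.
# Input (stdin): one site per line, "<ref_count><TAB><alt_count>".
estimate_minor_fraction() {
  awk -F '\t' '
    { ref += $1; alt += $2 }          # counts pool because all sites share one p
    END {
      best_p = 0; best_ll = -1e300
      for (i = 1; i < 1000; i++) {    # grid: p = 0.001 ... 0.999
        p = i / 1000
        # binomial log-likelihood, dropping the constant that does not depend on p
        ll = alt * log(p) + ref * log(1 - p)
        if (ll > best_ll) { best_ll = ll; best_p = p }
      }
      printf "%.3f\n", best_p
    }'
}

# Hypothetical counts at three sites, 60 reads each, 9 alternative reads in total
printf '57\t3\n56\t4\n58\t2\n' | estimate_minor_fraction   # prints 0.050
```

With these toy counts the estimate is simply the pooled alternative allele fraction (9 of 180 reads). A real mixture model also has to choose which strains are present and account for sequencing error, which this sketch leaves out.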
TBtypeNF
is a Nextflow pipeline
that automates MTB mixture detection from raw sequencing reads. It
integrates:
- FASTQ preprocessing with fastp
- Read alignment with BWA-MEM
- Variant calling with BCFtools
- Quality control with SAMtools, mosdepth and multiQC
- MTB lineage & mixture detection with TBtypeR
The pipeline generates an HTML report with detected MTB strains and mixture proportions.
We strongly recommend using the pipeline where possible, as it ensures data is extracted from all phylogenetic SNP sites, maximising accuracy and sensitivity.
TBtypeNF
requires a sample manifest in TSV format with column names
“sample”, “fastq1” and “fastq2” - see example
manifest.
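As a rough illustration of the manifest layout, the sketch below writes a two-sample TSV and checks that the three required column names are present and filled in. The sample names, FASTQ paths and both function names are hypothetical, and the check is an assumption about what a sensible manifest looks like, not the validation TBtypeNF itself performs.

```shell
# Sketch: write a two-sample manifest. Sample names and FASTQ paths are
# hypothetical placeholders.
write_example_manifest() {
  printf 'sample\tfastq1\tfastq2\n'
  printf 'sampleA\t/data/sampleA_R1.fastq.gz\t/data/sampleA_R2.fastq.gz\n'
  printf 'sampleB\t/data/sampleB_R1.fastq.gz\t/data/sampleB_R2.fastq.gz\n'
}

# Succeeds only if the header names all three columns and every data row
# has a non-empty value in each of them.
check_manifest() {
  awk -F '\t' '
    BEGIN { bad = 0 }
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("sample" in col) || !("fastq1" in col) || !("fastq2" in col)) bad = 1
      next
    }
    $col["sample"] == "" || $col["fastq1"] == "" || $col["fastq2"] == "" { bad = 1 }
    END { exit bad }
  ' "$1"
}

write_example_manifest > example_manifest.tsv
check_manifest example_manifest.tsv && echo "manifest OK"
```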
Running TBtypeNF requires:
- Nextflow (≥ 22.03.0)
- Singularity/Apptainer or Docker
# download example manifest
wget https://raw.githubusercontent.com/bahlolab/TBtypeR/main/TBtypeNF/resources/lung_example_manifest.tsv -O my_manifest.tsv
# run the nextflow pipeline
nextflow run bahlolab/TBtypeR/TBtypeNF/main.nf -r main -profile singularity --manifest my_manifest.tsv
To use Docker instead of Singularity, replace -profile singularity
with -profile docker.
Parameter | Description | Default Value |
---|---|---|
manifest | Input sample manifest | null |
id | Run identifier, for naming output files | ‘TBtypeNF-run’ |
outdir | Output files directory | ‘output’ |
publish_bams | Save BAM files to output directory | false |
fast | Run FastTBtypeNF workflow | false |
max_mix | Maximum number of strains in a mixture to be detected | 3 |
min_mix_prop | Minimum mixture proportion to be detected | 0.005 |
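Pipeline parameters are passed on the Nextflow command line with a double-dash prefix, in the same way as --manifest in the command above. The sketch below prints a command that overrides some of the defaults in the table. The run identifier, output directory and threshold values are illustrative only, and `tbtypenf_command` is a hypothetical helper.

```shell
# Sketch: print a TBtypeNF command line that overrides some defaults from the
# parameter table. The run id, output directory and thresholds are illustrative.
tbtypenf_command() {
  manifest=$1
  printf '%s ' nextflow run bahlolab/TBtypeR/TBtypeNF/main.nf -r main \
    -profile singularity \
    --manifest "$manifest" \
    --id my-run \
    --outdir results \
    --max_mix 2 \
    --min_mix_prop 0.01
  printf '\n'
}

# Print the command for review, then paste it into the terminal to run it
tbtypenf_command my_manifest.tsv
```

Printing the command first makes it easy to record exactly which settings were used for a run.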
The easiest way to use TBtypeR is through the TBtypeNF pipeline.
However, additional parameters and customisation are available when
using the R package directly. TBtypeR can be installed with devtools
as follows:
devtools::install_github("bahlolab/TBtypeR")
Mixture Frequency | Recommended Coverage |
---|---|
≥ 5% | ≥20× |
≥ 2.5% | ≥40× |
≥ 1% | ≥60× |
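The coverage table can be turned into a small lookup for planning sequencing depth. This is a sketch only: `recommended_coverage` is a hypothetical helper, it takes the lowest mixture frequency of interest as a proportion, and it reports "NA" for frequencies below 1% because the table gives no recommendation there.

```shell
# Sketch: look up the recommended minimum coverage for the lowest mixture
# frequency you need to detect (as a proportion, e.g. 0.025 for 2.5%).
# Thresholds are copied from the coverage table; frequencies below 1% are
# outside the table, so the function reports "NA".
recommended_coverage() {
  awk -v f="$1" 'BEGIN {
    if      (f >= 0.05)  print 20
    else if (f >= 0.025) print 40
    else if (f >= 0.01)  print 60
    else                 print "NA"
  }'
}

recommended_coverage 0.025   # prints 40
recommended_coverage 0.01    # prints 60
```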
It is recommended to use either TBtypeNF or BCFtools call to generate VCF files for TBtypeR. Input VCF files must:
- Use H37Rv Genome: Download from NCBI here. The chromosome must be named either “AL123456.3” or “NC_000962.3”.
- Contain AD Field: TBtypeR requires the VCF AD (allelic depth) format field, as generated by BCFtools call.
- Include SNP Barcode Sites: Coverage of the majority of TBtypeR SNP barcode sites.
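Two of the requirements above, the chromosome name and the AD field, can be checked before running TBtypeR. The function below is a minimal sketch with a hypothetical name. It reads an uncompressed VCF on standard input and does not check coverage of the SNP barcode sites.

```shell
# Sketch: check the chromosome name and the presence of AD in the FORMAT
# column of every record. Reads an uncompressed VCF from stdin, e.g.
#   gzip -dc sample.vcf.gz | check_vcf_for_tbtyper
check_vcf_for_tbtyper() {
  awk -F '\t' '
    /^#/ { next }                                   # skip header lines
    { n++ }
    $1 != "AL123456.3" && $1 != "NC_000962.3" { bad_chrom++ }
    (":" $9 ":") !~ /:AD:/ { no_ad++ }              # FORMAT is column 9
    END {
      if (n == 0)    { print "no variant records"; exit 1 }
      if (bad_chrom) { print bad_chrom " records on an unexpected chromosome"; exit 1 }
      if (no_ad)     { print no_ad " records without AD"; exit 1 }
      print "OK"
    }'
}

# Toy record with hypothetical values
printf 'NC_000962.3\t100\t.\tA\tG\t.\t.\t.\tGT:AD\t0/1:50,3\n' | check_vcf_for_tbtyper   # prints OK
```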
To generate a compatible VCF using BCFtools:
# Download TBtypeR SNP barcode
wget https://github.com/bahlolab/TBtypeR/raw/refs/heads/main/TBtypeNF/resources/tbt_panel.tsv.bz2
# Reformat for BCFtools call
bzcat tbt_panel.tsv.bz2 | tail -n+2 | awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, $3 "," $4 }' > tbtyper_targets.tsv
# Call variants with BCFtools
bcftools mpileup <SAMPLE_1>.bam -q 1 -Q 10 -d 200 -f <REFERENCE_FASTA> -a FMT/AD -Ou \
| bcftools call -A -m --prior 1e-2 -C alleles -T tbtyper_targets.tsv -Ou \
| bcftools annotate -x INFO,^FORMAT/GT,^FORMAT/AD -Oz -o <SAMPLE_1>.vcf.gz
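To see what the reformatting step produces, the sketch below applies the same awk command to a toy panel. The header names and the two rows are hypothetical. Only the column order (chromosome, position, reference allele, alternative allele) is inferred from the awk command above, and `panel_to_targets` is an illustrative wrapper.

```shell
# Sketch: drop the header row and join the reference and alternative alleles
# with a comma, giving the three-column layout used in tbtyper_targets.tsv.
panel_to_targets() {
  tail -n +2 | awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, $3 "," $4 }'
}

# Toy panel with hypothetical header names and positions
printf 'chrom\tpos\tref\talt\nAL123456.3\t1000\tA\tG\nAL123456.3\t2500\tC\tT\n' | panel_to_targets
```

Each output line has three tab-separated fields: chromosome, position, and the allele pair with the reference allele first (for example A,G).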
Example usage of TBtypeR:
library(tidyverse)
library(TBtypeR)
# replace with path to your VCF file
vcf_filename <- system.file('vcf/example.vcf.gz', package = 'TBtypeR')
tbtype_result <-
  # generate TBtypeR results
  tbtype(vcf = vcf_filename) %>%
  # filter TBtypeR results
  filter_tbtype(max_phylotypes = 3) %>%
  # unnest data so there is 1 row per identified Mtb strain in each sample
  unnest_mixtures()
tbtype_result %>%
  select(sample_id, n_phy, mix_phylotype, mix_prop) %>%
  knitr::kable()
sample_id | n_phy | mix_phylotype | mix_prop |
---|---|---|---|
SRR13312530 | 2 | 4.2.1 | 0.8579 |
SRR13312530 | 2 | 4.3.3 | 0.1421 |
SRR13312531 | 1 | 4.3.3 | 1.0000 |
SRR13312533 | 2 | 4.3.3 | 0.9192 |
SRR13312533 | 2 | 4.2.1 | 0.0808 |
Visualise mixtures:
tbtype_result %>%
  ggplot(aes(x = sample_id,
             y = mix_prop,
             fill = mix_phylotype)) +
  geom_col() +
  coord_flip() +
  labs(x = 'Sample ID',
       y = 'Mixture Proportion',
       fill = 'Sublineage') +
  theme(text = element_text(size = 6))
Detailed usage guides for the tbtype and filter_tbtype functions are
available in the package documentation by running help(tbtype) or
help(filter_tbtype).
If you use TBtypeR, please cite:
Munro, J. E., Coussens, A. K., & Bahlo, M. (2025). TBtypeR: Sensitive detection and sublineage classification of Mycobacterium tuberculosis complex mixed-strain infections. Communications Biology, 8(1), 260.
DOI: 10.1038/s42003-025-07705-9