Skip to content

MiXCR v4.4.0

Compare
Choose a tag to compare
@github-actions github-actions released this 05 Jul 18:51

🚀 New features

Built-in alleles database

MiXCR features robust support for inferring donor-specific allelic variants of V and J genes from NGS data, using the findAlleles command. With this new release, we introduce a comprehensive built-in database of human alleles. Now, the findAlleles command will utilize known allele names from this integrated library. Feel free to explore our database at https://vdj.online/library.

image

New rigorous quality checks

MiXCR now offers detailed insights into the quality of input data with its new quality control (QC) checks. A comprehensive list of checks provides complete information about the data and facilitates immediate feedback to the wet lab if any issues are detected

image

Convenient way to build custom libraries

Now one can build gene segment reference library for de-novo libraries or for chimeric model animals with just a single buildLibrary command. Check out our updated guide.

mixcr buildLibrary \
    --v-genes-from-fasta v-genes.IGH.fasta \
    --v-gene-feature VRegion \
    --j-genes-from-fasta j-genes.IGH.fasta \
    --d-genes-from-fasta d-genes.IGH.fasta \ # optional
    --c-genes-from-fasta c-genes.IGH.fasta \ # optional
    --chain IGH \
    --species phocoena \
    phocoena-IGH.json.gz

Comprehensive support of sample sheets

Now one can pass sample sheet directly to MiXCR analyze command as input. This way one can easily run MiXCR for arbitrary structure of input files, demultiplexed or not, with any type of multiplexing used:

mixcr analyze generic-sc-ht-vdj-amplicon --species hsa \
    sample-sheet.csv \
    output_prefix

🤩 New presets

  • Support of MiLaboratories Human 7 Genes DNA Multiplex: milab-human-dna-xcr-7genes-multiplex

  • Support of Parse Bio Evercode Whole Transcriptome presets: parsebio-sc-3gex-evercode-wt-mini, parsebio-sc-3gex-evercode-wt and parsebio-sc-3gex-evercode-wt-mega

  • Support of FLAIRR-Seq protocol via flairr-seq preset

  • New generic single cell presets:

    • Low throughput (e.g. micro-wells) amplicon-based single cell:

      • No UMIs: generic-sc-lt-vdj-amplicon
      • With UMIs: generic-sc-lt-vdj-amplicon-umi
    • Low throughput (e.g. micro-wells) single cell with fragmentation (RNA-Seq):

      • No UMIs: generic-sc-lt-vdj-fragmented
      • With UMIs: generic-sc-lt-vdj-fragmented-umi
    • High throughput (e.g. droplets) amplicon-based single cell:

      • No UMIs: generic-sc-ht-vdj-amplicon
      • With UMIs: generic-sc-ht-vdj-amplicon-umi
    • High throughput (e.g. droplets) single cell with fragmentation (RNA-Seq):

      • No UMIs: generic-sc-ht-vdj-fragmented
      • With UMIs: generic-sc-ht-vdj-fragmented-umi
    • Reconstructing VDJ from generic gene expression data:

      • No UMIs: generic-sc-gex
      • With UMIs: generic-sc-gex-umi
  • New Biomed2 primer sets: biomed2-human-rna-igkl, biomed2-human-rna-trbdg.

💪 Major changes

  • Improved aligner parameters for all protocols. We spent in total more than 100,000 CPU/hours running optimization. As a result alignment rate is better for most of the protocols, especially in the case of average data quality.

  • Adds new minSequenceCount parameter for k-mer filter, allowing construction of more flexible filtering pipelines with better fallback behaviour for under-sequenced libraries.

  • Now full sample sheet with input file names can be provided as an input to the pipeline.

  • Sample sheets provided both with --sample-sheet mixin and as a pipeline input, will be fuzzy matched against the data, allowing for one substitutions in unambiguous cases. This behaviour can be turned off by using --sample-sheet-strict mixin instead, or by adding a --strict-sample-sheet-matching option if full sample sheet input is used as pipeline input.

  • New commands: mixcr qc, mixcr buildLibrary) , mixcr mergeLibrary, mixcr debugLibrary)

  • Various major improvements to sequencing and PCR error correction algorithms for tags and clonotypes:

    • tag refinement now uses average quality in statistical inference; this is the correct approach from the mathematical point of view, and it slightly increases performance judging by better consensus assembly downstream
    • statistical inference in PCR error correction redone from scratch, now it takes into account aggregated quality scores of clonotypes, which makes the procedure automatically adapt to low quality samples and better perform in many marginal cases in both UMI and non-UMI protocols
    • better algorithm for quality score aggregation in clonotype assembly
    • better algorithm for quality score aggregation in consensus assembly
  • Mechanism to apply different tag transformations on the align step. Transformations include mappings, string and sequence manipulations and various arithmetic operations. This feature allows to fit single-cell scenarios where multiple well-known barcodes marks the same cell, allows to convert sequence barcodes to textual representation to adopt different barcode naming schemas used in some protocols, convert multiple barcodes to single cell id. Feature is currently used in presets for analysis of data from Parse Bioscience and BD Rhapsody single-cell platforms.

  • Special mechanism to allow for NaN values in metrics in group filters (used in minSequenceCount parameter in k-mer filter, see below).

  • Added fallback behaviour for under-sequenced libraries

🐞 Bug fixes

  • Fix for naming of intermediate files and reports produced by analyze if target folder is specified
  • Tag pattern now is also searched in reverse strand for single-ended input with --tag-parse-unstranded
  • fix for value in report line Reads dropped due to low quality, percent of total report string
  • Fixed bug not allowing to parse more than two reads with tag pattern
  • Fixed bug when --chains is used with exportClonesOverlap
  • Fixed for export... - tag quality field added back to export columns

👷 Minor fixes and improvements

  • Added gene feature coverage in alignment report
  • On Linux platforms default calculation of -Xmx now based on "available" memory (previously "free" was used)
  • New gene aligner parameter edgeRealignmentMinScoreOverride for more sensitive alignments for short paired-end reads
  • Report values downstream align now calculate percents relative to the number of reads in the sample rather than the
    total number of reads in multi-sample analysis
  • Options helping with advanced analysis of data quality and consensus assembly process added
    to assemble (--consensus-alignments, --consensus-state-stat, --downsample-consensus-state-stat)
    and analyze (--output-consensus-alignments, --output-consensus-state-stat, --downsample-consensus-state-stat)
  • Better tag pattern search projection representation in reports
  • findAlleles now recalculate functionality of de novo found alleles
  • Better algorithm to calculating checksum of VDJC library
  • Additional report string "Aligned reads processed" in assemble report
  • Added options --by-feature and --by-gene to sortClones
  • Added options -rankByReads and -rankByTag <(Molecule|Cell|Sample)> to exportClones and exportShmTreesWithNodes.txt
  • Export readIds in exportAlignments by default
  • Added recalculation functionality for de-novo found alleles in findAlleles
  • Add info about CDR3 in generated hash for de-novo alleles
  • Remove de-novo alleles that are actually the same
  • findAlleles will remove not used genes from the library (genes that not represented in given donor)
  • Make --chains optional in downsampling command and allow multiple input
  • Write empty file on exportClones if file doesn't contain any clones
  • Better exception messages on incorrect inputs for export commands
  • In exportClones write no_d_gene if requested VDJunction, DCDR3Part or DJJunction in absence of D hit
  • Columns in exportReportsTable now covers most of significant statistics from reports

🐬 Docker image changes

  • Custom entry-point of the image removed, and now is set to /bin/bash. Now one needs to specify mixcr command at the beginning of argument list:

    Old: docker run ghcr.io/milaboratory/mixcr/mixcr analyze ...

    New: docker run ghcr.io/milaboratory/mixcr/mixcr mixcr analyze ...

  • New image is based on Amazon Corretto which in turn is based on Amazon Linux 2. If customization is required for the image, one now need to use yum package manager instead of apt/apt-get.

    With old image:

    FROM ghcr.io/milaboratory/mixcr/mixcr:4.3.2
    # ...
    RUN apt-get install -y wget
    # ...

    With new image:

    FROM ghcr.io/milaboratory/mixcr/mixcr:4.4.0
    # ...
    RUN yum install -y wget
    # ...

    see official docs for more detais.

  • Better compatibility of official docker image with Nextflow

‼️ Breaking changes

  • Parameter clusteringFilter.specificMutationProbability removed from assemble action. Three new parameters are introduced instead:

    • clusteringFilter.correctionPower - this parameter determines how thorough the procedure should eliminate erroneous variants. Smaller value leaves less erroneous variants at the cost of accidentally correcting true variants. This value approximates the fraction of erroneous variants the algorithm will miss (type II errors).

    • clusteringFilter.backgroundSubstitutionRate - expected rate of substitutions happening before the sequencing.

    • clusteringFilter.backgroundIndelRate - expected rate of indels happening before the sequencing.

  • Majority of presets underwent name revisions (legacy names remain functional, though accompanied by deprecation warnings). See the full list of renames here.