MMseqs2 Release 12-113e3

@martin-steinegger martin-steinegger released this 01 Sep 11:22

Breaking changes

  • Remove --add-internal-id parameter from result2msa
  • filterdb --shuffle now shuffles randomly instead of deterministically
  • Taxonomy expressions in filtertax(seq)db interpret , as || now #320
  • convertalis pident output field now correctly reports sequence identity as a percentage (0-100) instead of a fraction (0.00-1.00); use fident to print the fraction instead
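
Downstream scripts that read pident as a fraction will need adjusting after this change. The following is a minimal sketch in Python, assuming tab-separated convertalis output in which the caller knows which column holds pident. The function names and the default column index are hypothetical and not part of MMseqs2; the actual column position depends on the output fields you selected.

```python
def pident_to_fraction(pident_percent: float) -> float:
    """Convert a percentage identity (0-100) to a fraction (0.00-1.00)."""
    if not 0.0 <= pident_percent <= 100.0:
        raise ValueError(f"pident out of range: {pident_percent}")
    return pident_percent / 100.0


def rescale_line(line: str, pident_col: int = 2) -> str:
    """Rewrite one tab-separated result line so the pident column holds a fraction.

    pident_col is a 0-based index chosen by the caller (an assumption here).
    """
    fields = line.rstrip("\n").split("\t")
    fields[pident_col] = f"{pident_to_fraction(float(fields[pident_col])):.3f}"
    return "\t".join(fields)
```

Where possible, requesting fident directly from convertalis avoids this post-processing step altogether.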

Features

  • Support nucleotide clustering in cluster and easy-cluster
  • Support other architectures (SSE2/ARM64/POWER8/POWER9/etc) through SIMDe
  • Linclust is much faster on systems with a lot of CPU cores
  • Clustering update is faster, more stable and correctly deals with deleted sequences #272
  • Add easy workflow for reciprocal best hit searches easy-rbh
  • Add SILVA, Pfam-B, dbCAN2 to databases
  • databases produces taxonomy information for NR
  • Replace old greedy incremental clustering with new memory efficient version
  • Add result2dnamsa module to create MSAs of nucleotide sequences
  • Continued progress on profile-profile searching (result2pp, expandaln, expand2profile), stay tuned!
  • Add multi-parameter support to overwrite sequence-type-specific parameters: e.g. --gap-open "nucl:5,aa:11"
  • Add ORF information as output options to convertalis (qOrfStart, qOrfEnd, dbOrfStart, dbOrfEnd)
  • Speed up sorting using ips4o
  • Speed up masking through new version of tantan
  • Speed up multi-threaded writing of clustering results
  • Speed up reading of database indices and merging target split databases
  • Add memory tracking to account for index size when computing available memory (--split-memory-limit should be more reliable when searching/clustering billions of sequences).
  • Add --search-type 4 (translated/translated search) to createindex
  • Add convertalis --format-mode 3 HTML output based on MMseqs2 app (app.mmseqs.com)
  • Improve memory management in result2msa and result2profile modules
  • Add msa2result module to create an alignment result db from MSAs
  • Add filterresult to slim down result dbs with pairwise HHblits filtering #316
  • Add --kmers-per-sequence-scale to linsearch to extract a k-mer fraction instead of a fixed count
  • Add a random integer to --local-tmp path to avoid race conditions if multiple MMseqs2 instances run on the same machine
  • Add --max-seqs to ungappedprefilter
  • Add --tax-lineage-mode 2 parameter to print numeric taxids
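
The multi-parameter syntax listed above (e.g. --gap-open "nucl:5,aa:11") pairs a sequence type with a value. The sketch below, in Python, only illustrates how such a string breaks down into per-type values; it is not MMseqs2's own parser, and the function name is hypothetical. It assumes integer values, as in the gap-open example.

```python
from typing import Dict


def parse_multi_param(value: str) -> Dict[str, int]:
    """Split a string such as "nucl:5,aa:11" into {"nucl": 5, "aa": 11}."""
    result: Dict[str, int] = {}
    for part in value.split(","):
        key, sep, number = part.partition(":")
        if not sep:
            # No "type:" prefix found; this sketch treats that as an error.
            raise ValueError(f"expected 'type:value', got {part!r}")
        result[key.strip()] = int(number)
    return result
```

A wrapper script could use such a helper to check that the values it passes for nucleotide and amino acid searches are the ones it intended.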

Bugs fixed

  • Fix rbh workflow, which was broken due to issues with filterdb
  • Fix -a in RBH search to show alignments
  • Fix PDB70 database creation in databases
  • Fix aria2c download support
  • Fix memory issues and MPI in kmermatcher
  • Fix memory issues in extractorfs when using AVX2
  • Fix --cluster-reassign to respect --cov-mode
  • Set-cover supports up to 2^32 sequences (previously crashed with more than 2^31)
  • Exit correctly if there is not enough disk space instead of crashing in the next module
  • Fix prefilter order instability when searching very redundant databases
  • Correctly parse keys from data files in filterdb --filter-file; this was causing instability in linsearch
  • Allow overwriting string parameters with empty strings
  • Fix ASAN issue in extractorfs when using AVX2
  • Fix Microtar constantly trying to seek backwards, which resulted in very poor gzip read performance
  • Avoid lookup writing to corrupt memory if an accession is too long
  • Fix various inconsistencies and usability issues in alignall:
    • --alignment-mode inconsistent with align module
    • --add-backtrace did not do anything
  • Fix restart of clusterings that use reassignment (cluster --cluster-reassign)
  • Fix createdb not correctly reading gz/bzip files with --createdb-mode 1 #323