Release MMseqs2 Release 12-113e3 · soedinglab/MMseqs2

Breaking changes

Remove --add-internal-id parameter from result2msa
filterdb --shuffle now shuffles randomly instead of deterministically
Taxonomy expressions in filtertax(seq)db now interpret "," as "||" #320
convertalis pident output field now correctly reports sequence identity as a percentage (0-100) instead of a fraction (0.00-1.00); use fident to print the fraction
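
Downstream scripts that treated pident as a fraction will need adjusting after this change. The following is a minimal sketch of one way to handle it, assuming a tab-separated convertalis output whose columns are query, target and pident in that order; the column layout and the function names are illustrative assumptions, not part of MMseqs2.

```python
def pident_to_fraction(pident: float) -> float:
    """Convert a percentage identity (0-100) to a fraction (0.0-1.0)."""
    if not 0.0 <= pident <= 100.0:
        raise ValueError(f"pident out of range: {pident}")
    return pident / 100.0


def parse_alignment_line(line: str, columns=("query", "target", "pident")) -> dict:
    """Parse one tab-separated line; the column order is an assumption."""
    fields = line.rstrip("\n").split("\t")
    record = dict(zip(columns, fields))
    record["pident"] = float(record["pident"])
    # Recreate the old fractional value for code that still expects it.
    record["fident"] = pident_to_fraction(record["pident"])
    return record
```

Requesting the fident field directly from convertalis avoids the conversion altogether.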

Features

Support nucleotide clustering in cluster and easy-cluster
Support other architectures (SSE2/ARM64/POWER8/POWER9/etc) through SIMDe
Linclust is much faster on systems with a lot of CPU cores
Clustering update is faster, more stable and correctly deals with deleted sequences #272
Add easy workflow for reciprocal best hit searches easy-rbh
Add SILVA, Pfam-B, dbCAN2 to databases
databases produces taxonomy information for NR
Replace old greedy incremental clustering with new memory efficient version
Add result2dnamsa module to create MSAs of nucleotide sequences
Continued progress on profile-profile searching (result2pp, expandaln, expand2profile), stay tuned!
Add multi-parameter support to override sequence-type-specific parameters, e.g. --gap-open "nucl:5,aa:11"
Add ORF information as output options to convertalis (qOrfStart, qOrfEnd, dbOrfStart, dbOrfEnd)
Speed up sorting using ips4o
Speed up masking through new version of tantan
Speed up multi-threaded writing of clustering results
Speed up reading of database indices and merging target split databases
Add memory tracking to account for index size when computing available memory (--split-memory-limit should be more reliable when searching/clustering billions of sequences).
Add --search-type 4 (translated/translated search) to createindex
Add convertalis --format-mode 3 HTML output based on MMseqs2 app (app.mmseqs.com)
Improve memory management in result2msa and result2profile modules
Add msa2result module to create an alignment result db from MSAs
Add filterresult to slim down result dbs with pairwise HHblits filtering #316
Add --kmers-per-sequence-scale to linsearch to extract a k-mer fraction instead of a fixed count
Add a random integer to --local-tmp path to avoid race conditions if multiple MMseqs2 instances run on the same machine
Add --max-seqs to ungappedprefilter
Add --tax-lineage-mode 2 parameter to print numeric taxids
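
The multi-parameter syntax shown above, "nucl:5,aa:11", pairs a sequence type with a value. As a rough illustration of how such a string breaks down, here is a small sketch; parse_multi_param is a hypothetical helper written for this example and does not reflect how MMseqs2 parses the option internally.

```python
def parse_multi_param(value: str) -> dict:
    """Split a string such as "nucl:5,aa:11" into {"nucl": 5, "aa": 11}.

    Illustrative only: assumes integer values and "type:value" entries
    separated by commas.
    """
    result = {}
    for part in value.split(","):
        key, sep, raw = part.partition(":")
        if not sep or not key.strip() or not raw.strip():
            raise ValueError(f"malformed entry: {part!r}")
        result[key.strip()] = int(raw)
    return result


gap_open = parse_multi_param("nucl:5,aa:11")
print(gap_open["nucl"], gap_open["aa"])
```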

Bugs fixed

rbh workflow was broken due to issues with filterdb
Fix -a in RBH search to show alignments
Fix PDB70 database creation in databases
Fix aria2c download support
Fix memory issues and MPI in kmermatcher
Fix memory issues in extractorfs when using AVX2
Fix --cluster-reassign to respect --cov-mode
Set-cover supports up to 2^32 sequences (previously crashed with more than 2^31)
Exit correctly if there is not enough disk space instead of crashing in the next module
Fix prefilter order instability when searching very redundant databases
Correctly parse keys from data files in filterdb --filter-file; this was causing instability in linsearch
Allow overwriting string parameters with empty strings
Fix ASAN issue in extractorfs when using AVX2
Microtar would try to seek backwards constantly resulting in horrible gzip read performance
Avoid lookup writing to corrupt memory if an accession is too long
Fix various inconsistencies and usability issues in alignall:
- --alignment-mode inconsistent with align module
- --add-backtrace did not do anything
Fix restart of clusterings using reassignment cluster --cluster-reassign
Fix createdb not correctly reading gz/bzip files with --createdb-mode 1 #323
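
The 2^31 and 2^32 limits mentioned for set-cover match the ranges of signed and unsigned 32-bit integers. The notes do not state the cause of the earlier crash, so the sketch below only illustrates the general effect under that assumption: a count of 2^31 wraps to a negative number in a signed 32-bit integer but fits in an unsigned one.

```python
import ctypes

# Assumption for illustration: a sequence count stored in a 32-bit integer.
SEQUENCE_COUNT = 2**31  # one more than the largest signed 32-bit value

# ctypes integer types do not check for overflow; the value wraps around.
as_signed = ctypes.c_int32(SEQUENCE_COUNT).value
as_unsigned = ctypes.c_uint32(SEQUENCE_COUNT).value

print(as_signed)    # negative: the count no longer fits
print(as_unsigned)  # unchanged: the count still fits
```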
