Release MMseqs2 Release 15-6f452 · soedinglab/MMseqs2

MMseqs2 Release 15 brings efficient single-query searches with low memory overhead through the new ungapped-prefiltering mode (--prefilter-mode 1). We also improved our greedy clustering algorithm and added many smaller fixes and features. Thanks to everyone who contributed features and fixes.
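As a rough illustration, the ungapped-prefiltering mode can be selected on a search command line. The sketch below only assembles the argument list in Python. It assumes the `mmseqs` binary is on `PATH` and that `easy-search` accepts `--prefilter-mode`; the file names and the helper name `ungapped_search_cmd` are placeholders, not part of MMseqs2.

```python
import shlex


def ungapped_search_cmd(query, target, result, tmp_dir):
    """Build an `mmseqs easy-search` call that selects the ungapped prefilter."""
    return [
        "mmseqs", "easy-search",
        query, target, result, tmp_dir,
        "--prefilter-mode", "1",  # ungapped prefilter, as named in these notes
    ]


# Placeholder file names, chosen only for this example.
cmd = ungapped_search_cmd("query.fasta", "targetDB", "result.m8", "tmp")
print(shlex.join(cmd))
# To actually run the search: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if a path contains spaces.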

Breaking

Updated the greedy clustering algorithm. It now picks representatives that better respect the sequence identity and coverage criteria. (2568829) Thanks @bbuchfink

New Features and Enhancements

Implement additional prefilter modes (standard double k-mer prefilter, ungapped prefilter, exhaustive searching) (5e119e9)
Added createclusearchdb and mkrepseqdb modules to build cluster-search databases. This was implemented for Foldseek; cluster-search in MMseqs2 will be implemented at a later point (9ae4458, 80f8b0b, 542f362, ad6dfc6, 91f2a6a, 8310cd6, 0019026, 76b7df1)
Implement target-side similar k-mer search mode for sequence-sequence prefiltering (71dd32e)
Rework ungappedprefilter to improve performance and expose additional parameters such as taxon filtering and db-load-mode (8a89305, 800eb09, eb01b5b, 20d3afc)
Added gappedprefilter module for Smith-Waterman prefiltering, similar to ungappedprefilter (df77d9e)
Reworked pairaln for the ColabFold greedy taxonomy pairing mode (1514015)
Implemented experimental module for A3M filtering (167bbd1, 499bb73)
Implemented weighted clustering (bd080e6, b36070a, fd1837b) Thanks @AnnSeidel
Precomputed indices without k-mers can be created with --index-subset (314c1f0, 8fe3bf9)
Add result2neff module to extract Neff scores (4148e09) Thanks @neftlon
Add ppos format-output to convertalis for count of positive substitution scores (5edc79b) Thanks @Dohyun-s
Speed-up FASTA parsing in kseq.h with memchr (98406dd) Thanks @valentynbez @kloetzl
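To make the new ppos column concrete, here is a minimal sketch of what "count of positive substitution scores" can mean for a pairwise alignment. The scoring table is a toy example invented for illustration, not a real matrix such as BLOSUM62, and the function names are hypothetical. How MMseqs2 computes the value internally is not described in these notes.

```python
# Toy substitution scores for illustration only; NOT a real substitution matrix.
TOY_SCORES = {
    ("A", "A"): 4, ("A", "S"): 1, ("A", "W"): -3,
    ("S", "S"): 4, ("S", "W"): -3, ("W", "W"): 11,
}


def score(a, b):
    """Look up a symmetric score; unknown pairs score 0 in this toy table."""
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), 0))


def count_positives(query_aln, target_aln):
    """Count aligned (non-gap) columns whose substitution score is > 0."""
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned strings must have equal length")
    return sum(
        1
        for q, t in zip(query_aln, target_aln)
        if q != "-" and t != "-" and score(q, t) > 0
    )


print(count_positives("AAS-W", "ASSWW"))
```

Gap columns are skipped here because they have no substitution score. That choice is an assumption of this sketch.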

Bugfixes

Add min and max modes for result2stats (19dce03, 61e7734) Thanks @ClovisG
Fixed a segmentation fault in ca3m with the same database (f5f780a) Thanks @ClovisG
Fix crash when some input file sizes are an exact multiple of 4096 in convertalis and gff2db (712f288) Thanks @RuoshiZhang
Fixed issues for GTDB r214 database creation (4b52296) Thanks @apcamargo
Fix source number being limited to 16-bit (65k) (1d62fa0)
kseq now correctly handles input sequences larger than 2^31 bytes (07ca4a7)
Fixed unpackdb to work without a .lookup file and added support for writing compressed files (92d8cc3, 570e3ed)
createindex --check-compatible now checks the k-mer threshold correctly (bb0a1b3)
Fixed excessively long prefilter result lists leading to result truncation. This was primarily a Foldseek issue and shouldn't affect MMseqs2 (ed4c55f)
Corrected handling of multiline checks in createdb (6b93884)
Fix crash by disabling wrapped scoring when the target sequence is shorter than the query (8459b6b) Thanks @AnnSeidel
Fixed logic in reciprocal-best-hit by removing resAB_sort (3bcbdba) Thanks @StephanieSKim
Corrected handling of differently ordered parts of sequence databases in concatdbs (ea17d30)
Fix --single-step-clustering misspelled in cluster warning (fa6c093) Thanks @valentynbez
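Two of the fixes above concern fixed-width integer limits: the 16-bit source number and kseq inputs larger than 2^31 bytes. The arithmetic below shows where those limits sit and how a value wraps when stored in too few bits. It is a generic illustration, not MMseqs2 code, and the helper name is hypothetical.

```python
MAX_UINT16 = 2**16 - 1   # largest value an unsigned 16-bit field can hold
MAX_INT32 = 2**31 - 1    # largest value a signed 32-bit integer can hold


def store_as_uint16(value):
    """Mimic storing a number in an unsigned 16-bit field (hypothetical helper)."""
    return value & 0xFFFF


print(MAX_UINT16)               # 65535, the "65k" limit
print(store_as_uint16(65535))   # still fits
print(store_as_uint16(65536))   # wraps around to 0
print(MAX_INT32)                # 2147483647 bytes, just under 2 GiB
```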

Build and Compatibility Updates

Addressed build and compatibility issues, including updates for newer compilers and architectures (e.g., Mac ARM64) (e26b9ad, 3e43617, b341b66, 932d32b) Thanks @A-N-Other
Added Mac ARM64 support in GitHub actions and updated from Ubuntu 18.04 to a newer image (1fea43d, 05132de)
Updated regression testing to fix errors in MPI test (2113766)

Developer

Introduced base: prefix to enable inheriting subprojects to find shadowed modules (e.g., Foldseek shadows createdb but can call base:createdb to use the MMseqs2 version) (90aa913)
Exported build architecture in CMake so subprojects can use it (fce06b1)
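The idea behind the base: prefix can be sketched as a two-level lookup: a subproject's module wins by default, and the prefix forces the inherited one. The tables and the `resolve` function below are hypothetical and only model the behaviour described above; they do not reflect the actual MMseqs2 implementation.

```python
# Hypothetical module tables, for illustration only.
BASE_MODULES = {"createdb": "MMseqs2 createdb", "search": "MMseqs2 search"}
SUBPROJECT_MODULES = {"createdb": "Foldseek createdb"}

BASE_PREFIX = "base:"


def resolve(name):
    """Return the module to run for a command name, honouring the base: prefix."""
    if name.startswith(BASE_PREFIX):
        # Explicit request for the inherited (shadowed) module.
        return BASE_MODULES[name[len(BASE_PREFIX):]]
    if name in SUBPROJECT_MODULES:
        # The subproject's own module shadows the inherited one.
        return SUBPROJECT_MODULES[name]
    return BASE_MODULES[name]


print(resolve("createdb"))
print(resolve("base:createdb"))
```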
