convertalis pident: fraction, not a percentage #337

Closed
nick-youngblut opened this issue Jul 27, 2020 · 2 comments

nick-youngblut commented Jul 27, 2020

Expected Behavior

mmseqs convertalis --format-mode 0 --format-output query,target,evalue,pident writes out a table of MMseqs2 search hits in BLAST M8 format, but pident seems to be written as a fraction instead of a percentage. This affects downstream processing of the table, especially when applying the same processing to this table and to one generated by BLAST or DIAMOND (where pident is written as a percentage).
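
A possible downstream workaround, as a minimal sketch only: it assumes the table is tab-separated with the four columns from the command above in that order, and that a table whose identity values are all at most 1.0 holds fractions. The names `normalize_pident` and `read_table`, the heuristic, and the sample rows are illustrative and not part of MMseqs2, BLAST, or DIAMOND.

```python
import csv
import io


def normalize_pident(rows, ident_col=3):
    """Return rows with the identity column expressed as a percentage.

    Heuristic (an assumption, not MMseqs2 behavior): if every value in the
    identity column is <= 1.0, the column is treated as a fraction and
    multiplied by 100; otherwise the rows are returned unchanged.
    """
    rows = [list(r) for r in rows]
    values = [float(r[ident_col]) for r in rows]
    is_fraction = bool(values) and max(values) <= 1.0
    if is_fraction:
        for row, value in zip(rows, values):
            row[ident_col] = f"{value * 100:.1f}"
    return rows


def read_table(text):
    """Parse tab-separated text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


# Made-up example rows: query, target, evalue, pident
mmseqs_like = "q1\tt1\t1e-30\t0.953\nq2\tt7\t2e-10\t0.412\n"
blast_like = "q1\tt1\t1e-30\t95.3\nq2\tt7\t2e-10\t41.2\n"

print(normalize_pident(read_table(mmseqs_like)))
print(normalize_pident(read_table(blast_like)))
```

The heuristic misclassifies a percentage table in which every hit has an identity of 1% or less, so it is only a stopgap for tables whose origin is not known.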

Steps to Reproduce (for bugs)

mmseqs search
mmseqs convertalis --format-mode 0 --format-output query,target,evalue,pident

Your Environment

Ubuntu 18.04.4

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       0_gnu    conda-forge
biolib                    0.1.6                      py_0    bioconda
boost-cpp                 1.70.0               h7b93d67_3    conda-forge
bzip2                     1.0.8                h516909a_2    conda-forge
ca-certificates           2020.6.20            hecda079_0    conda-forge
certifi                   2020.6.20        py37hc8dfbb8_0    conda-forge
comparem                  0.1.1                      py_0    bioconda
curl                      7.69.1               h33f0ec9_0    conda-forge
cycler                    0.10.0                     py_2    conda-forge
diamond                   0.9.36               h56fc30b_0    bioconda
fqtools                   2.0                  hc0aa232_5    bioconda
freetype                  2.10.2               he06d7ca_0    conda-forge
future                    0.18.2           py37hc8dfbb8_1    conda-forge
gawk                      5.1.0                h516909a_0    conda-forge
gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
hhsuite                   3.2.0           py37pl526h3340039_1    bioconda
htslib                    1.9                  h4da6232_3    bioconda
icu                       67.1                 he1b5a44_0    conda-forge
kiwisolver                1.2.0            py37h99015e2_0    conda-forge
krb5                      1.17.1               h2fd8d38_0    conda-forge
ld_impl_linux-64          2.34                 h53a641e_5    conda-forge
libblas                   3.8.0               14_openblas    conda-forge
libcblas                  3.8.0               14_openblas    conda-forge
libcurl                   7.69.1               hf7181ac_0    conda-forge
libdeflate                1.6                  h516909a_0    conda-forge
libedit                   3.1.20191231         h46ee950_0    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc-ng                 9.2.0                h24d8f2e_2    conda-forge
libgfortran-ng            7.5.0                hdf63c60_6    conda-forge
libgomp                   9.2.0                h24d8f2e_2    conda-forge
libidn2                   2.3.0                h516909a_0    conda-forge
liblapack                 3.8.0               14_openblas    conda-forge
libopenblas               0.3.7                h5ec1e0e_6    conda-forge
libpng                    1.6.37               hed695b0_1    conda-forge
libssh2                   1.9.0                hab1572f_2    conda-forge
libstdcxx-ng              9.2.0                hdf63c60_2    conda-forge
libunistring              0.9.10               h14c3975_0    conda-forge
llvm-openmp               8.0.1                hc9558a2_0    conda-forge
lz4-c                     1.9.2                he1b5a44_1    conda-forge
matplotlib-base           3.2.2            py37h1d35a4c_1    conda-forge
mmseqs2                   11.e1a1c             h2d02072_0    bioconda
ncurses                   6.1               hf484d3e_1002    conda-forge
numpy                     1.18.5           py37h8960a57_0    conda-forge
openmp                    8.0.1                         0    conda-forge
openssl                   1.1.1g               h516909a_0    conda-forge
perl                      5.26.2            h516909a_1006    conda-forge
pigz                      2.3.4                hed695b0_1    conda-forge
pip                       20.1.1                     py_1    conda-forge
prodigal                  2.6.3                h516909a_2    bioconda
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
python                    3.7.6           cpython_h8356626_6    conda-forge
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.7                     1_cp37m    conda-forge
readline                  8.0                  hf8c457e_0    conda-forge
scipy                     1.5.0            py37ha3d9a3c_0    conda-forge
seqkit                    0.12.1                        0    bioconda
setuptools                47.3.1           py37hc8dfbb8_0    conda-forge
six                       1.15.0             pyh9f0ad1d_0    conda-forge
spades                    3.14.0               h2d02072_0    bioconda
sqlite                    3.30.1               hcee41ef_0    conda-forge
taxonkit                  0.5.0                         0    bioconda
tk                        8.6.10               hed695b0_0    conda-forge
tornado                   6.0.4            py37h8f50634_1    conda-forge
wget                      1.20.1               h22169c7_0    conda-forge
wheel                     0.34.2                     py_1    conda-forge
xz                        5.2.5                h516909a_0    conda-forge
zlib                      1.2.11            h516909a_1006    conda-forge
zstd                      1.4.4                h6597ccf_3    conda-forge
martin-steinegger added a commit that referenced this issue Jul 28, 2020
@martin-steinegger
Member

Yes, this issue has existed since MMseqs1. Thank you for pointing it out. I have added fident in addition to pident. fident reports the fraction while pident reports the percentage. By default, fident is reported.
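
With both fields available, a parser can key on the column name instead of guessing the scale. The following is a rough sketch, assuming the output is tab-separated and that the column names are the same comma-separated list passed to --format-output; `parse_hit`, `FORMAT_OUTPUT`, and the sample line are hypothetical.

```python
# Column list as it would be passed to --format-output (assumed for this sketch).
FORMAT_OUTPUT = "query,target,evalue,fident"


def parse_hit(line, columns=FORMAT_OUTPUT):
    """Parse one tab-separated hit line into a dict keyed by column name.

    If the line carries fident (a fraction), a pident entry (a percentage)
    is derived from it; if it carries pident, the value is kept as given.
    """
    names = columns.split(",")
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(names):
        raise ValueError(f"expected {len(names)} fields, got {len(fields)}")
    hit = dict(zip(names, fields))
    if "fident" in hit:
        hit["fident"] = float(hit["fident"])
        # Derive the percentage only when the table does not already have it.
        hit.setdefault("pident", hit["fident"] * 100.0)
    if "pident" in hit:
        hit["pident"] = float(hit["pident"])
    return hit


# Made-up example line, not real MMseqs2 output.
hit = parse_hit("q1\tt1\t1e-30\t0.953\n")
print(hit["query"], hit["target"], round(hit["pident"], 1))
```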

@nick-youngblut
Author

Awesome! Thanks for the quick fix!

elileka added a commit to soedinglab/metaeuk that referenced this issue Jul 28, 2020
a9c56e57f Merge branch 'master' of github.com:soedinglab/MMseqs2
ec3b8254a Try out new aggregate tax algoritm
cfba9f021 Fox .index.0 files not being removed after sorting
dde4b2e38 Next try to downgrade ICC
618331da9 Downgrade ICC since latest version seems to be broken
ed45a9f20 Remove unused variables in rewritten microtar
328732a1d Update regression
ae7398d65 Added fident to convertalis. fident prints the fraction of the sequence identity. pident reports the percentage. soedinglab/MMseqs2#337

git-subtree-dir: lib/mmseqs
git-subtree-split: a9c56e57f3007ba88c3645821a7bbbd2b1db2f3c
RuoshiZhang added a commit to soedinglab/spacepharer that referenced this issue Nov 12, 2020
9b74117ee proteinaln2nucl can now compute scores and evalues
8ea08f0c7 Add curl flag to follow redirects to database downloader
1cf3002a6 Fix compiler warning
5dc4bcd4d Update eggnog urls (fix curl bug)
20a03128f Fix id issue in tar2db
be4d2e074 Add multi-threading support to tar2db
f68316088 Merge pull request #359 from mr-c/spelling
b244246b8 Spelling typos fixes
96d452cb4 Inline single use of DBWriter::mergeFiles to mergedbs
24ecc26c1 Fix some compilation flags would not be correctly set during cross-compilation
beabb353c Make sure to flush stdout/err before calling any workflows
a16220688 Add missing dbtypes to allDbAndFlat
49240a30c Setting APT::Immediate-Configure=false fixes cross-compiler installation
d4fd0729b Next try to fix cross-compilation
bd3e49fe4 Remove ubuntu-toolchain ppa breaking cross-compiler installation on azure
4b9b3b56c Remove all other apt sources from azure before installing cross-compilers
57f429a0b remove unused remnants of the past in alignment class
de06950ff Reduce calls to posix_memalign, fixes lock contention of some platforms
d3b0cf9a0 Fix result2profile could allocate not enough memory if target database contained much longer sequences than query database
1a490efe7 Support ungapped alignments in sliced search
333cc350a Fix addtaxonomy always crashing due to invalid check
29e327f9b change orf filter params to match test runs
cc7d7da30 result2repseq should preload the sequence database into memory
637942259 Improve createsubdb help text
951d5a72b Add nrtotaxmapping to create taxonomy mapping from NR
df69c26e1 Merge commit '90e71f9968d3925e545c45d7c68325dd3cd0c588' into master
90e71f996 Squashed 'lib/simde/simde/' changes from 938d82c8..f2257f11
48950b95b Correctly pass threads/verbosity in taxonomy workflow
9d3ab794f Merge commit 'b6a4528e818ca644f8200fc84b2d1856ecd8f5c7' into master
b6a4528e8 Squashed 'lib/simde/simde/' changes from 2119ac73..938d82c8
113e3212c Fix ASAN issue in extractorf when using AVX2
b15e95a16 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
b7ec0e93a Fix setcover issues with dbs > 2^31 sequences
f8b3f8b1d Add biocontainers badge
b7ac683cb Update cluster update regression
4d665ce9e Automatically set cluster parameters also in cluster update
b5a088338 Fix #272 remove deleted sequences from old clustering in cluster update
66f77ce8b Cleanup subtractdbs
b2ac9e0bb remove confusing comments
3d2e394a6 Limit number of jobs used for compiling on travis
d58cc78cf Fix invalid symlinks in result2repseq
21f714661 Cluster update refactoring
fbe754e06 Fix missing newline in first sequence in entries of result2msa
a29379e26 Do not map scores if not needed in result2pp
e80ec9a30 Updated ROC for result2pp
769aa78a7 Add seqdb preloading in result2pp
cf8b14292 Remove more unused parameter from result2pp
f2a293393 Update regression to include result2pp test
3fac8dde8 Copy profile information of unaligned regions from query profile
967a4555d Cleanup and fix result2pp
b2f49a253 Add NR taxonomy information
efdbe9415 Change serial sort to std::sort
97a8f1dce Update regression
0c123fe73 Fix comp. bias correction in expandaln
401d8e6fb Add --max-seqs to ungappedprefilter
f57d1a712 Update expandaln, expand2profile and regression
a62ea9a99 Update reassign cov. mode in prefilter and fix regression
64f9294ba Update regression to include expansion test
61d8b64d2 Fix coverage read in for nucl-nucl alignment results (#339)
45ae92765 Compute evalues and sort correctly in expandaln
38fab36e5 Fix wrong sequences being loading in expandaln due to wrong sorting
3aa032be6 Cleanup in MultipleAlignment
ea3212f05 Fix realloc size in profile set size increase
7cca0508d Fix restart cluster-reassign
0945e5a58 Add prefilter parameter to reassign
4e436c79b Fix compile error in tests
47e622999 Avoid constant allocations in PSSMCalculator
657a97c01 Don't clone the whole result_t vector uselessly in profile related modules
b87cae011 MultipleAlignment does not require constantly allocating and deallocating Sequence objects anymore
486e13ac0 Remove add internal ID parameter in result2msa
0a8a7a3a4 expand2profile module should be able to directly build a new profile
a84e6f48c Make max set size in profile classes dynamically growable
5baf62ab5 Cleanup Sequence class
e4b2ffb09 Move PSSM masking and writing to its own file
d10a6104c Fix clang warning
76d7d83b8 Fix progressbar in first clust readin step
01937be26 Taxonomy expressions in filtertax(seq)db interpret , as || now #320
fddf635d0 Add SILVA to databases module
9ec7c5e6f Fix MPI warning
ce65cb86e disable ICC in travis, beta08 breaks their setvars.sh script and SIMDe has many issues
871831351 Fix warning in clang
97653a929 Check the return code of fclose to handle full disk errors better
06bd0cfd6 Add filterresult for pairwise HHblits filtering to reduce redundancy in a result db #316
3bdaf488c Fix various result2msa modes (compress works cleanly now, --filter-msa mode could return invalid MSAs)
c1f783387 Fix invalid projected backtraces in expandaln
d741a251b Remove circular include
595625a17 Cleanup result2msa/profile
8ad363748 Unify to computation of alignments in msa2result and transitivealign
55534d71a Fix wrong lengths used in msa2profile
5d10ce00d Rewrite expandaln module
4be0d6e10 Add msa2result module for generating result dbs from MSAs
a179ab277 Cleanup DBConcat
a9c56e57f Merge branch 'master' of github.com:soedinglab/MMseqs2
ec3b8254a Try out new aggregate tax algoritm
cfba9f021 Fox .index.0 files not being removed after sorting
dde4b2e38 Next try to downgrade ICC
618331da9 Downgrade ICC since latest version seems to be broken
ed45a9f20 Remove unused variables in rewritten microtar
328732a1d Update regression
ae7398d65 Added fident to convertalis. fident prints the fraction of the sequence identity. pident reports the percentage. soedinglab/MMseqs2#337
a61b9eb9b handle the unranked root and cell orgs
d2141f324 ORF filter with high-eval thr ungapped alignment
ea01a174c Remove useless cast in QueryMatcher
1e95b6bdb Update tantan
207d0d210 Allow overwriting string parameters with empty strings
755a7b030 Add new binaries to README and fix whitespace
be05b8d06 Add orf-filter to taxpercontig and cleanup
22e17aa44 orf-filter should also work in easy-search and easy-taxonomy
7fefa8af2 added mode to ByteParser
4393c5aae typo
b05d7d753 Speed up read index and kmermatcher
3f9a60317 Fix --search-type 4 in createindex
18e901198 Rework read index in DBReader
1eb72611c Do not sort indexes when already ordered while DB close
65f246b10 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
053fd61c5 Improve multi-threaded speed of writing clustering results
e1a710661 Fix typo in arch name
3b1e528c3 Add SSE2 binary to Docker
d66ee4163 Use Ubuntu 20.04 for cross-compilation
8ba605e87 Add SSE2 and cross-compiled ARM64/POWER8/POWER9 builds to azure
a5e485ba0 Fix broken checks for libraries when cross-compiling
7fe0cb909 Fix progress bar in DBConcat
cef0731b3 Create translated index if --search-type 4 is used in createindex
47afc572d Fix --search-mode 4 issues in offsetalignment
80fdcbed0 Change cluster reassing to bool soedinglab/MMseqs2#329
57e8a9df2 Allow ORF filter only in combination with query nucleotides
d55f06ce1 Fix Pfam.full database creation
659cc1f84 Add additional experimental ORF prefiltering step before translated search
e934f1c48 made ByteParser more informative
4d14c9fed tax-lineage modes: 0 nothing, 1 names, 2 taxids
b777cd099 Disable ips4o on ppc for now
95a885243 find_package is case sensitive
8e797b1d1 Allow disabling use of IPS4O, cleanup
850a196b4 added seqs assignment agreement to the output
e21dc40f7 Fix wrong existence checks for databases in workflows
5901a0a98 Set minimum clang to 5.0 for now
d7b46e609 Disable ips4o on cygwin
033fda237 Change travis gcc check to 4.9
908675d26 Add includes
ee7b5c11d Change random_shuffle to shuffle
d1a1af5e2 Rewrite atomic check in cmake
d6590f394 Add missing FastSort.h
704d0fb4d Merge branch 'master' of https://github.com/soedinglab/mmseqs2
109be7bba Change sort to ips4o if possible
020593660 Fix warning
d092a4698 Fix kmermatcher MPI support
2f1db01c5 Rename martin.steinegger@mpibpc.mpg.de to martin.steinegger@snu.ac.kr
0f7b6856c Fix #326 wrong citation link
62a387ed5 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
c125a2170 Fix issues in expandaln
648bc1f65 Add Pfam-B download script
16e79a2a8 Add dbCAN2 download script
7c0ed7f84 Microtar would try to seek backwards resulting in horrible gzip read performance
cab0e8384 Fix #323 createdb not correctly reading gz/bzip with --createdb-mode 1
1d6500345 mmseqs --help should not give a useless correction suggestion
35c58af90 Improve download of taxdmp file in createtaxdb
68feeb202 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
6546822c7 Add missing .dbtype to newSeqDb header in cluster update
2a7874828 Merge pull request #321 from milot-mirdita/simde
72d19b966 Seems like travis reduced the RAM available on ARM
565ad3f93 Add script to update SIMDe
b9783a7fc Squashed 'lib/simde/simde/' content from commit 2119ac73
9828f0d69 Merge commit 'b9783a7fca1677486f2f830a9c59fda11330980c' as 'lib/simde/simde'
641ef68ba Remove submodule in preparation for subtree
b6dd64470 Work around clang issue
a877dc00c Rebuild SIMD autodetection
5ba9e7ae9 Cleanup warnings
3980d2a7d Add one Newton-Raphson it to make division with _mm_rcp_ps always consistent
27b82963c Try limiting threads in ppc to not crash on 4gig ram
c95bdcc1b Silence strict aliasing warning in Itoa for NEON
590cfb962 Rebuild 128/256 bit SIMD split in simd.h
f5750feee Enable building on non-x86 and less than SSE4.1
21d798f09 Remove not finished createtaxdb changes
b59c33816 Make orf information available through convertalis
284bb7578 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
f4bbce845 Add MemoryTracker, Account for index size when computing available memory
e2510e8f6 fixed comment because it wasmisleading
def7ace2e Add convertalis HTML output based on MMseqs2 app (app.mmseqs.com)
dd3ff63a5 Fix convertkb to work without a mapping file
dc054792d Previous lookup writer would always report failing
52ac0f368 Refactor lookup writing to not corrupt memory if an accession is too long
9f2be0e03 Disable ICC travis for now
d319bb920 Merge branch 'master' of github.com:soedinglab/MMseqs2
d15223652 remove appendtaxaln
648cf8368 refined code as per Milot's feedback
94db03160 One more INT -> UINT warning that ICC complains about
271b7c135 Next try for travis
2c1dfdd4d Fix terminate value in SSW again
6016e1b1a Try to fix travis
ddaaaf7fd Fix various warnings reported by ICC, add ICC to travis
f3adc10fd added aggregatetaxweights to get rid of appendtaxaln
211dd7a79 change to SSTR
517b01ba8 true in createParameterString saves defining taxonomy defaults
aa068c861 added taxonomy default parameter values because life
f9d1face8 in the process of adding taxPerContig workflow
94f895a34 fixed english
19f1dfbd1 moved weight const back
8db3e714a moved definition of constants
05cfc8bde added mode of tax output: both lca and aln
2cd590463 added voteMode parameter
128f57b5d extended aggregatetax to handle eval-based weights
e4a10bd7b added appendtaxaln for extending aggregatetax
0c29da4ab Actually fix the filterdb --join-db issue
7ff6ae7c3 Restore fix lost char in joindb mode change
f5c8b28cb Update README.md
e4f7e7454 Add qOrfStart/qOrfEnd, dbOrfStart,dbOrfEnd to offsetalignment
cf40916cd Merge branch 'master' of https://github.com/soedinglab/mmseqs2
c0dac797b Do not write null byte in splitdb
cbb542af9 added rand id to tmp files created at localTmp
214e87e91 Remove goto in lca.cpp
c8309fced Merge branch 'master' of https://github.com/soedinglab/mmseqs2
b761ddf4c Fix issue with qset format output
80bff8323 Do not write .lookup in easy workflows if not needed

git-subtree-dir: lib/mmseqs
git-subtree-split: 9b74117eecf3b346a2fd0bfe62d8cee244299039
martin-steinegger added a commit to steineggerlab/conterminator that referenced this issue Oct 25, 2022
c48da9d7 Update Prefiltering.cpp
45891515 Reset errno before various strto* calls
7e284099 Update docker install instruction to GHCR
28b00883 Fix FASTA input not ending with a newline resulting in invalid sequence db with --createdb-mode 1 (#617)
a81d9e72 Fix issue with gcc 4.9
8799829d Fix compile error
1761bd60 Add module db2tar: Create a tar file from a database
dcd180be (Re)add support for tar-writing to microtar
fea8d203 Add support for external k-mer thresholds for the prefilter
ede0be15 Rework rescore diagonal
8f78b0ab Rework ungapped alignment
aabc78c2 Fix indexdb
ce8cd536 Fix masking issue
304a99bb Delete unmasked index to fix asan issue
67949d70 Fix #586 summarizeresult should not reject hits that match the coverage threshold
3d4840b3 Use macos-11 in azure
8ff26f23 Support finding taxonomy db paths from other prefilter databases
8ff72796 Add speedup shortcut to TaxonomyExpression for a single tax identifier
1d631726 Add taxonomic filtering during prefilter with --taxon-list
3b9cf881 Add URIs as allowed parameter inputs
1c739ae7 Add easy parsable tsv output to databases
ba4e11f1 workflow_dispatch can tag container as latest
7ebd2e04 Revert alignment profile in sequence.cpp
5185d3cb Allow tagging of docker containers through workflow dispatch
eb203d35 Build docker image in GH action and publish to ghcr
678c82ac GTDB ar122_taxonomy does not exist anymore, replace with different file #561
7be78c81 Fix tar2db breaking with --tar-include/exclude #561
d1555862 Encode more
16b57741 Encode " \n\t[]{}^$?|.~!*" as b64
b0b8e85f Fix truncated profile sequences in convertalis #567
96b20099 Fix broken badges in README (and remove travis)
407b315e Fix multi-threading issues in pairaln
92deb92f Fix unpackdb parameter
be8c278c Progress update fix
58593ec0 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
3f8695ea Add multi-thread support to pairaln
e9e829c7 Fix seg. fault in realign
ce7bf53b Point Kalamari3.7v to a fixed commit soedinglab/MMseqs2#531
fcf52600 Remove a level of indirection to access compatible index version
922e2691 Fix failing utility tests
74c3aa65 Fix typo (violoations -> violations) (#526)
7281baf9 Add --comp-bias-corr-scale
d89fcecf Write serialized index in appenddbtoindex
79ea1ee3 Fix new IndexReader USER_SELECT trying to read header databases as fallback
a506d677 Allow subprojects to build their own precomputed indices
75af0c82 Add appenddbtoindex to argument a precomputed index in sub-projects
4f046dd1 Add mask prob to mask sequence
38cf3f10 Fix TestIndexTable
b768f48f Add --mask-prob parameter
bfc6f85b removed error message for wrapped scoring, should work with all rescore modes
edb8223d Fix pairaln
6e7ed700 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
e19df7ce Rework pairing to support more than two sequences
9fded60a Add environment variable MMSEQS_IGNORE_INDEX to ignore an existing precomputed target index
efacc690 Cushioning the overestimated number of diagonals in case of many successive hits on one diagonal
5fc318b6 Add convertalis --format-mode 4 to print blast-tab headers
80fcadde Disable profile gap scores in msa2profile temporarily
9cc89aa5 Fix huge memory allocations introduced in 49c2b70
a8c30da5 result2msa correctly prints X residues
482dedc6 Explicitly set threads in Cirrus
75e9bfaa Update tectonic in azure to fix error in userguide building
16830a52 Fix number of CPUs used in cirrus
aab640d2 Fix gap pseudocount mode again
716fb621 Turn --k-score into MuliParam so it works correctly in iterative-profile search
56816b39 Resfinder download should not use tar wildcards, broken in busybox #494
e85ceb9d Change the url for UniRef* from ftp to https in databases downloader (#496)
49c2b70b Fix mem. issue
09e261bf Avoid substracting from getMaxSeqLen
4b77690e Move maxSeqLen logig to getMaxSeqLen() to avoid index issues
d8736973 Fix max length in DBReader Allocate CSProfile only when needed
42bf6438 Rework download database
5afd33c3 Make "databases" usable in sub-projects
f6518799 Update regression
f3f5b133 Update k-score sensitivity fitting for no-cntxt profile searches
3e92abf7 Add db-load-mode support to pairaln
5e245d17 copy dbtype and clear map
4a3bb340 Merge branch 'master' of https://github.com/milot-mirdita/mmseqs2
9a0df0d2 Add pairaln
fa44760e Fix recent forgotten else in getKmerThreshold
45b2b521 Revert "Try increasing the k-mer thresholds again for 5/6-mers"
be119433 Fix prefilter not correctly masking extended dbtype for comparision
e3ce4605 Fix memory leak in MappingReader uncovered by ASan
06bdc5e7 Fix missing cassert header in tsv2exprofiledb
8521fb45 Remove useless calls to opendir/closedir in FileUtil
885b4699 Add workflow to create expandable profile (profile-profile) db from a bunch of TSV files
ad05844f Add missing pseudocount check in indexdb
e33c32aa Fit new values for prefilter
7950368f Fix another broken test
b456cf51 Fix unused variables in lca
003cd244 Merge remote-tracking branch 'main/master'
6a8f586b Add extended dbtype to check for context specific pseudocounts, so that the correctly fitted kmer thresholds can be used
92a19497 Fix uninitialized warning in addtaxonomy
2e75435e Fix createbintaxonomy mapping dump size written
178eacff impl. contextPseudoCnts getKmerThreshold, values not fitted yet
35c67c87 Change pos. spec. gap costs to templates
9defdf89 fixed bug for uneven number of repeated kmers
0c26a107 replaced global with end_to_end in rescore mode variable
9064061d fixed size_t parameter handling
3fa46fe3 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
763fa9ff Change compress loop to omp static to keep order
49710b7f Fix sub. mat asan issue
d0a00d6a Update Sub. Mat. logic for aa2num mapping
ccf55559 Fix test
e4aae927 Make taxonomy mapping mmap'able for instant read-in
c66fd1b1 Fix syntax error in filterresult
87623596 Fix issues with include identities in filterresult
91617c4b Add includeIdentity to filterresult
fe16da39 Stay compatible with previous short A3M header output format
ce5b2418 Fix wrong assumption about header databases IDs with new index database scheme in result2msa
a54df874 Remove E-value threshold in filterresults
5647a56a Allow --diff 0
d5656191 Add MSA output mode for A3M+aln info
85ce8472 Expand can filter in each target cluster before expanding
ae4c7ab1 Merge branch 'master' of https://github.com/soedinglab/MMseqs2
38ab523a Merge branch 'master' of https://github.com/soedinglab/mmseqs2
5e0d11f2 Extend MSA filtering for bucketed filtering within qid buckets
c6d8ae0c Add filter min enable
25cb16ff Enable result2profile/filterresult to read new expand alignment index
37225004 Don't mask consensus sequences in profiles
b2a34020 Ignore cacode warnings
c3e90f41 Allow indexing of profile-profile db
f3491183 Make sure very large database don't overflow localThreads
66fa3c76 Update regression to remove result2pp from expand check
87fed2e6 Merge remote-tracking branch 'main/master'
5b75b842 Try increasing the k-mer thresholds again for 5/6-mers
ad5837b3 Revert "result2msa now supports reading from index"
7ee3e794 Fix wrong database name printed for variadic input when creating a tmp directory
15fdf48e result2msa now supports reading from index
7aade9df Change deep copies to const references in result2msa
ce7cf754 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
31eb67ae Add A3M support to result2msa
56f7685b Add symlinks/copies for _taxonomy file #474
904d0c6d Transition old compiler tests from travis to CirrusCI
442d8983 Fix memory issues in QueryMatcher
17c8028e Move fixRlimitNoFile to Application
c6634976 Fix the forbidden symbols when using unpackdb (#467)
488df863 Refactoring of gff2db
d822533f Build update function for DbType validators
a09a704e Remove bash dependency in regression to fix FreeBSD in CirrusCI
4f1996a4 Fix FreeBSD on CirrusCI samtools issue
a2e2129c Add CirrusCI to test FreeBSD
01492c95 Revert "Make sure QueryMatcher::radixSortByScoreSize cant corrupt memory"
15ace29a Fix posix_madvise on FreeBSD returning error if size=0 (See #460)
86152a2f Remove useless calls to std::map::operator[]
d4dd06d2 Fix iterative profile search restartable again
91b61706 Make sure QueryMatcher::radixSortByScoreSize cant corrupt memory
af317095 Save a buch of work when sequences are not needed in expand*
be5a1da4 Replace many aligned allocation in MultipleAlignment with single allocation
7469d599 Fix unused warning
942a012a Move MultiParam::format out of header to avoid compilation warning
d2148058 Fix unused parameter warning
40ba03f4 Disable warnings from nedmalloc (external dependency)
c811a511 Fix tests after profile-profile refactoring
7a8ee485 Try to fix profile-profile alignment for SSE
68862ed2 Add missing simd.h functions for SSE
a09de7eb Fix compile errors
807d97a9 Merge remote-tracking branch 'main/master' into ppmerge
4578f8ba Temporary change to slicesearch to speed things up
3a51b445 Add support to support position-specific gap penalties in profile-profile alignment in iterative search.
139e4502 Get rid of MathUtil::popCount in favor of __builtin_popcount
bbfd6e26 Add preload mode to expand(aln/2profile)
b14d0136 Fix a few more tests
635911ec Increase sortresult buffer for matcher result
d6c19db9 Fix exhaustive search parameter in examples
e86afeab Move substitution matrix init code out of Parameters::parseParameters to fix tests
62f7aba1 Replace biorxiv citation for taxonomy paper
24f6b52a Cleanup magic value with constant in kseq
c7f6a37e Allocate at least a 20 * 20 matrix in StripedSmithWaterman
57de8c8d Fix profile2repseq input database type
96a069e5 Shellcheck fix
52c6ae87 "Can not" to "Cannot" in DBReader and cleanup
e39d02af MemoryMapped cannot accidentally segfault on 0-byte sized files anymore
2d7411a1 Revert "Bug fix with empty temporary files"
7be4fca9 Add VOGDB to database downloader
dd5db429 Update dbCAN2 to V9 and make remove .aln suffix from profile names
d4a33542 Always set a value for FILTER_RESULT in exhaustive search
ec1f599e Update regression for recent change to nucl-nucl search
c967985e changed rescoring for nucleotide sequences only in prefilter
19064f27 Revert "fixed rescoring for nucleotide sequences with multiple diagonals for one target exceeding UCHAR_MAX count"
c54c5382 fixed signed error
f751bcc9 fixed rescoring for nucleotide sequences with multiple diagonals for one target exceeding UCHAR_MAX count
1d770285 Fix endless loop in rescorediagonal
4462533c Don't allow iterative profile search in taxonomy #432
64a2265f Make sure no backtraces are computed in lcaalign
b8501a1b Fix previous broken commit
971b442e Fix additional two more memory leaks before exit
7fbc0b65 Fix memory leak in DBWriter::createRenumberedDB
a6cab565 Fix prefilter/alignment with 0-size query input #433
14a3dce2 createsubdb and view can now return results from identifiers in .lookup with --id-mode 1
6622c9f0 Fix DBReader::USE_LOOKUP_REV
d77de8da Fix extractorfs sometimes loading invalid start/stop codons on non-avx2 platforms
5daca424 Fix typos in extractorfs warnings for short input sequences
fe61aeee Replace strcpy in microtar
0523594f Add support for GNU tar specific filenames and some lesser used entry types to tar2db
5ed18ff0 Merge commit '15242315f80fbda1bffc05cd41fa47c192373902'
15242315 Squashed 'lib/simde/simde/' changes from 79bf0b7c..1f4a28c4
bb02734e Get rid of more scanf calls
fa4cd2a7 Fix arch selection on ARM (use -mcpu instead of -march) and s390x (enable -mzvector)
a202b3c2 Squashed 'lib/simde/simde/' changes from b6c9c964..79bf0b7c
fb39ca1e Merge commit 'a202b3c2d58cc2f80ecfb2123158377f08bc6510'
3d40f105 Fixes for gap panalties merge
2718ca75 First attempt to merge prof-prof and gap-penalties
93f90b04 Fixes to last merge
b7811188 Merge branch 'master' into main-master
22a7bfa2 Add iterativepp workflow
1a87a226 Cleanup Matcher::compressAlignment
6885bad8 Get rid of sscanf in Matcher::uncompressAlignment
50ce7a5c Fix previous commit writing dbtypes for big endian
852f04de Fix compile error
afa6d02d Read/write dbtypes always as little-endian
6269994f Explicitly support size_t in Parameters
d9744e3c Fix some 32-bit issues #418
c25aec57 Cleanup kmergenerator header
be343e98 Additional s390x fixes (linclust might work now)
45111b64 Add initial fixes to get MMseqs2 working on s390x
b1704ccc Merge branch 'master' of https://github.com/soedinglab/mmseqs2
f388ead8 Add parameter --alignment-output-mode, remove alignment mode 5
2a4a2dc5 Add correlation score parameter to align
f9d2ae30 Add support for new Multiparameter type
cbc1b489 Refactor pseudocounts
1e58454a Restore K4000.crf from history
f6eadeaa --majority parameter was missing from taxonomy workflow
24217dc9 Reduce number of threads on travis ARM
ff4c9029 Remove SORTRESULT_PAR from search.cpp
178d3b5f Fix exhaustive search
247de411 Move warning from inner loop to outer in extractorfs
6a0dcee4 Update Regression test
f92447d0 Rename slice to exhaustive search, add filterresult
6c2fefce Set pca to 0.0 in expand2profile
0cc7e674 Add unpackdb to split a database into separate files #406
877344c3 Add USE_SYSTEM_ZSTD cmake flag to use system provided zstd #411
bbd56417 Replace throw with abort in ALP again
46c26ce9 Add missing licenses and readmes for code in lib #403
20543e0a Update ALP to 1.98 and add readme/license
d5717e82 Add CDD to databases downloader #410
04b27f98 msa2profile always copies lookup/source files instead of linking them to be independent from the MSA db
2d83f517 msa2profile/result can skip the first sequence
242a8faf Pass threads to tar2db in databases workflow
a19f5a52 Allow clustering of clustering input with set-cover or connected-component by ignoring scores/weight
39a41403 Don't set INT_MAX as --max-seqs in slice search to avoid huge allocations in prefilter
9290a2b5 Allow sequence database input in taxonomyreport #408
aaba0c7f Short circuit cluster-reassign if nothing can be reassigned
3822a8f5 Fix tmp files not getting removed in linclust/cluster with --remove-tmp--files
2a35e025 Fix kmermatcher setting user k-mer pattern in auto k-mer selection and breaking
a1050359 Rename accelerated 2bLCA to approximate 2bLCA to be consistent with manuscript
11698a5b Rename LICENCE to LICENSE soedinglab/MMseqs2#402
0828d865 Allow result database input in taxonomyreport #401
b31ebb64 Krona taxonomy report was not working if no sequence was unclassified
9f0fb3ed Cleanup taxonomyreport
a2d9568d Fix wrong azure dependency
b1367fc2 Make resultToBuffer buffer sizes consistent (needs further refactoring)
98f9939d Get rid of results temporary array in msa2result
d495e0e9 Replace texlive with tectonic for userguide building
e03b5257 Fix MMseqs2 Taxonomy citation
602689c1 Update examples in mmseqs (easy-)taxonomy invocation
ecf152cf Improve (easy-)taxonomy description text by reordering parameters by importance
e0b04434 Improve description of --orf-filter
a7f91d46 Add warning if cluster or prefilter input is used in majoritylca with invalid --vote-mode
a3399397 Update regression to include recent speedup
d5da12d7 Add GTDB to databases downloader
83780f4c Respect verbosity for rmdb calls in databases
9011c15d Improve output of databases list
86c03fd4 Increase buffer sizes in tar2db
2bd03c68 Fix tar directory (symlink, etc) entries causing tar2db to stop early
7bdb222d Use DBWriter to write .lookup multi-threaded in tar2db
23c9e1e7 Don't use multiple threads in tar2db when reading .tar.gz/.tgz as nearly all the time is spent inside zlib
2e128d4f Increase zlib buffer in tar2db to speedup reading
c1911893 Fix multiple locations where Util::checkAllocation would never be called as the preceding allocation would already terminate on failure
1f302134 Fix two compilation failures revealed by Debian
5b03cdff Another instance of the same warning
3fda449b Fix compile warning
3b0197af Encode species names in taxonomy blocklist to make sure we don't block random nodes in non-NCBI taxonomies (e.g. GTDB)
ab2426f8 Fix String MultiParameter (e.g. sub matrices) breaking if filenames contain whitespaces
e8de3507 Encode whitespace containing parameters as base64 to better deal with shell word splitting in workflows
c7a7c366 Add instructions to simd.h
6672bbc9 Fix missing newline in log message
84034a52 Remove useless taxonomy ancestor warning
6609c6cd Fix invalid taxonomy output mode being set
441c52cf Fix taxpercontig not working with easy-taxonomy
4ce38109 lca is not computed by easy-taxonomy anymore
9d631c16 Fix cleanup of taxonomy intermediate files
d0f596f5 taxonomyreport and addtaxonomy output is now adjustable in easy-taxonomy
6bfd08d5 Cleanup default set parameters in easy-taxonomy
afcade16 Improve default taxonomy parameter lists shown (without -h)
fc126b3e Improve error messages when something is wrong with the input/output paths
3b49310f Improve unrecognized parameter message
83b9e9a1 Remove useless missing tmp dir warning
d0a9b79f Fix typo
48f9737a Add ORF filter parameters only to taxonomy for now
a6068975 Disable unfinished ORF filter in search
336d9d04 Add taxonomy citation
f7fde6fe Reduce binary taxonomy dump memory requirements slightly
eff61cfe Add \0 byte after serialization
7e63e1ea Fix typo in Parameters.h
019de271 Add vector of predefined substitution matrices
34b3a539 Merge pull request #389 from mr-c/simde_v0.7.0
74724b3a Cleanup headers in kmermatcher
73fd5cfa Update xxhash to v0.8.0
8dd192c0 Don't create false _has_{builtin,attribute}
c2d60348 Squashed 'lib/simde/simde/' changes from f2257f11..b6c9c964
062ef995 Merge commit 'c2d60348af5c036eb2cbc7974d84065e16ab4096' into simde_v0.7.0
bad16c76 Check correctly for existing of binary tax dump in createtaxdb
457cacab Replace string concatenation in aggregatetax with append
a5169557 Fix strcmp comperator in nrtotaxmapping too
0da81a03 Fix ASAN free-delete mismatch
4fa7cb27 Replace std::sort in StringBlock with fast sort
dc4f9ed4 Wrong comparision used in sort comperator was crashing clang
e09b3db3 Move taxonomy version to cpp file
1645696b Use less threads on PPC64LE regression
9c0a99ca Fix compile error in taxonomy test
f1ab0b3c Fix missing newline in lca
7ff6dc5e Add version check for binary taxonomy
df301e3b Create serialized/mmapable taxonomy in createtaxdb, taxonomy loads instantly compared to before
3addec8e Remove debug output mode from createtaxdb again
5407ca4c Don't create taxonomy files in createtaxdb again if they already exist
95968440 read correct number of CPUs in macos build script if nproc is not available
0defb362 Split aggregatetax and aggregatetaxweight parameter lists
86e6b0b7 Cleanup weightedMajorityLCA
d03a8d03 Add score vote mode to taxonomy weighted voting
553a670d Split non-index parts over more files if a split index is requested
f5a762ff Do not read e-values for tax-id 0 again in aggregatetax
4c1137c2 Add majoritylca module for majority voting based taxonomy from alignment results
4224c6a6 Move majority lca voting to NcbiTaxonomy class
6ab700bf Fix parameter order in lca and aggregatetax
26a8e478 Skip secondary structure in msa2* with (c)a3m input
6f56a262 Fix: Extract the correct source name when tar2db and createdb are used together
ea83a916 Fix cmake deprecation warning
ca6aea96 Fix #379: E-value parameters are now correctly parsed as doubles instead of floats
1cec7419 Fix atomic check when cross-compiling
aed7d976 Fix now correctly switching to xcode 12.2 in azure
184d834a Try building macOS ARM binaries on Azure's Catalina VMs
9b819686 Fix not returning error in mergeresultsbyset after error case
9f718741 Add MMSEQS_FORCE_MERGE env var for forcing generating fully merged dbs
3df79c30 Build arm64 macos binary only on big sur (not in CI yet)
acfa3ef1 Build universal mac binary for sse/avx and arm neon
f4f38685 Add symlinks to splitdb #376
41adb5d4 Add cpdb and lndb, place them and rmdb, mvdb into same file
99410a2e Revert "Remove handling of pre-split sequences in splitsequence"
3c0000ba Remove handling of pre-split sequences in splitsequence
6bb22ecc Add splitsequence parameters to all relevant workflows
d204e91f createtaxdb can create a taxdb by mapping through .source
1c52b75a Fix tar2db would create entries for non regular tar files
2719ba2f Allow createdb to read generic dbtype (to use in combination with tar2db)
9e990b30 Add missing stdin dbtype to getDbTypeName
c8e082e3 Increase number of opened files limit when DBReader is used
2a972e91 Fix gapped score calculation in proteinaln2nucl
750e8844 Update regression for taxonomy
35ad87ed Remove debug message
6a882624 Unify TaxPerContig and Taxonomy
7da33b05 Acc 2bLCA is now default for protein and translated taxonomy, tophit is always used for nucl-nucl
9d0169cc Taxonomy search mode fully integrated into alignment module
f8d2878e Refactor alignment to allow computing a limited number of realignments
1cc54190 New 2blca could compute LCA from res not finding anything in first aln
5067c1d4 Taxonomy refactoring
18da8d6e Set approx 2blca as default taxpercontig mode
6da35599 Make taxpercontig orf-prefilter parameters adjustable
45c4de7f Include file size and modified date of inputs in tmp file hash calculation #372
cc472544 Fix #371: --cov-mode 5 was not working
8e8e9a0b Fix MPI compile issue
f537370a BC breaking: Unify in result2msa --compress --summarize --omit-consensus to --msa-format-mode, support stockholm output
951d51b4 Don't link header db etc in filterresult to output db
349c2765 Move currentKey out of ifdef in tar2db
d95e41e7 Always compute result files in easy-taxonomy
31a90e13 Actually fix the uninitialized warning
20eeaabc Fix uninitialized warning
3c94c0a2 splitsequence can create a sequence database with original headers
aca7380b Return bit-score in proteinaln2nucl instead of raw-score
18588bb3 Fix filterresult off by one issue
9b74117e proteinaln2nucl can now compute scores and evalues
8ea08f0c Add curl flag to follow redirects to database downloader
1cf3002a Fix compiler warning
5dc4bcd4 Update eggnog urls (fix curl bug)
20a03128 Fix id issue in tar2db
be4d2e07 Add multi-threading support to tar2db
f6831608 Merge pull request #359 from mr-c/spelling
b244246b Spelling typos fixes
d9f2041e Merge branch 'master' of https://github.com/soedinglab/mmseqs2
971f9d90 Turn profiles from lin-space to scores, add average profile-profile code
96d452cb Inline single use of DBWriter::mergeFiles to mergedbs
24ecc26c Fix some compilation flags would not be correctly set during cross-compilation
beabb353 Make sure to flush stdout/err before calling any workflows
a1622068 Add missing dbtypes to allDbAndFlat
49240a30 Setting APT::Immediate-Configure=false fixes cross-compiler installation
d4fd0729 Next try to fix cross-compilation
bd3e49fe Remove ubuntu-toolchain ppa breaking cross-compiler installation on azure
4b9b3b56 Remove all other apt sources from azure before installing cross-compilers
57f429a0 remove unused remnants of the past in alignment class
de06950f Reduce calls to posix_memalign, fixes lock contention of some platforms
d3b0cf9a Fix result2profile could allocate not enough memory if target database contained much longer sequences than query database
1a490efe Support ungapped alignments in sliced search
3af62f06 Fix banded_sw
333cc350 Fix addtaxonomy always crashing due to invalid check
29e327f9 change orf filter params to match test runs
cc7d7da3 result2repseq should preload the sequence database into memory
63794225 Improve createsubdb help text
951d5a72 Add nrtotaxmapping to create taxonomy mapping from NR
90e71f99 Squashed 'lib/simde/simde/' changes from 938d82c8..f2257f11
df69c26e Merge commit '90e71f9968d3925e545c45d7c68325dd3cd0c588' into master
48950b95 Correctly pass threads/verbosity in taxonomy workflow
9d3ab794 Merge commit 'b6a4528e818ca644f8200fc84b2d1856ecd8f5c7' into master
b6a4528e Squashed 'lib/simde/simde/' changes from 2119ac73..938d82c8
725d9f63 Modified Profile-Profile alignment implementation with templates.
113e3212 Fix ASAN issue in extractorf when using AVX2
b15e95a1 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
b7ec0e93 Fix setcover issues with dbs > 2^31 sequences
f8b3f8b1 Add biocontainers badge
b7ac683c Update cluster update regression
4d665ce9 Automatically set cluster parameters also in cluster update
b5a08833 Fix #272 remove deleted sequences from old clustering in cluster update
66f77ce8 Cleanup subtractdbs
b2ac9e0b remove confusing comments
3d2e394a Limit number of jobs used for compiling on travis
d58cc78c Fix invalid symlinks in result2repseq
21f71466 Cluster update refactoring
60d5be17 Add missing var to profile
12b78e3f Merge branch 'master' of https://github.com/haydenji0731/MMseqs
2aaac47a First running version of double max profile/profile
fbe754e0 Fix missing newline in first sequence in entries of result2msa
db1c38b1 Made changes to SSW class for Profile2Profile Alignment
a29379e2 Do not map scores if not needed in result2pp
e80ec9a3 Updated ROC for result2pp
769aa78a Add seqdb preloading in result2pp
cf8b1429 Remove more unused parameter from result2pp
f2a29339 Update regression to include result2pp test
3fac8dde Copy profile information of unaligned regions from query profile
967a4555 Cleanup and fix result2pp
b2f49a25 Add NR taxonomy information
efdbe941 Change serial sort to std::sort
97a8f1dc Update regression
0c123fe7 Fix comp. bias correction in expandaln
401d8e6f Add --max-seqs to ungappedprefilter
f57d1a71 Update expandaln, expand2profile and regression
a62ea9a9 Update reassign cov. mode in prefilter and fix regression
64f9294b Update regression to include expansion test
61d8b64d Fix coverage read in for nucl-nucl alignment results (#339)
45ae9276 Compute evalues and sort correctly in expandaln
38fab36e Fix wrong sequences being loading in expandaln due to wrong sorting
3aa032be Cleanup in MultipleAlignment
ea3212f0 Fix realloc size in profile set size increase
7cca0508 Fix restart cluster-reassign
0945e5a5 Add prefilter parameter to reassign
4e436c79 Fix compile error in tests
47e62299 Avoid constant allocations in PSSMCalculator
657a97c0 Don't clone the whole result_t vector uselessly in profile related modules
b87cae01 MultipleAlignment does not require constantly allocating and deallocating Sequence objects anymore
486e13ac Remove add internal ID parameter in result2msa
0a8a7a3a expand2profile module should be able to directly build a new profile
a84e6f48 Make max set size in profile classes dynamically growable
5baf62ab Cleanup Sequence class
e4b2ffb0 Move PSSM masking and writing to its own file
d10a6104 Fix clang warning
76d7d83b Fix progressbar in first clust readin step
01937be2 Taxonomy expressions in filtertax(seq)db interpret , as || now #320
fddf635d Add SILVA to databases module
9ec7c5e6 Fix MPI warning
ce65cb86 disable ICC in travis, beta08 breaks their setvars.sh script and SIMDe has many issues
87183135 Fix warning in clang
97653a92 Check the return code of fclose to handle full disk errors better
06bd0cfd Add filterresult for pairwise HHblits filtering to reduce redundancy in a result db #316
3bdaf488 Fix various result2msa modes (compress works cleanly now, --filter-msa mode could return invalid MSAs)
c1f78338 Fix invalid projected backtraces in expandaln
d741a251 Remove circular include
595625a1 Cleanup result2msa/profile
8ad36374 Unify to computation of alignments in msa2result and transitivealign
55534d71 Fix wrong lengths used in msa2profile
5d10ce00 Rewrite expandaln module
4be0d6e1 Add msa2result module for generating result dbs from MSAs
a179ab27 Cleanup DBConcat
a9c56e57 Merge branch 'master' of github.com:soedinglab/MMseqs2
ec3b8254 Try out new aggregate tax algorithm
cfba9f02 Fix .index.0 files not being removed after sorting
dde4b2e3 Next try to downgrade ICC
618331da Downgrade ICC since latest version seems to be broken
ed45a9f2 Remove unused variables in rewritten microtar
328732a1 Update regression
ae7398d6 Added fident to convertalis. fident prints the fraction of the sequence identity. pident reports the percentage. soedinglab/MMseqs2#337
a61b9eb9 handle the unranked root and cell orgs
d2141f32 ORF filter with high-eval thr ungapped alignment
ea01a174 Remove useless cast in QueryMatcher
1e95b6bd Update tantan
207d0d21 Allow overwriting string parameters with empty strings
755a7b03 Add new binaries to README and fix whitespace
be05b8d0 Add orf-filter to taxpercontig and cleanup
22e17aa4 orf-filter should also work in easy-search and easy-taxonomy
7fefa8af added mode to ByteParser
4393c5aa typo
b05d7d75 Speed up read index and kmermatcher
3f9a6031 Fix --search-type 4 in createindex
18e90119 Rework read index in DBReader
1eb72611 Do not sort indexes when already ordered while DB close
65f246b1 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
053fd61c Improve multi-threaded speed of writing clustering results
e1a71066 Fix typo in arch name
3b1e528c Add SSE2 binary to Docker
d66ee416 Use Ubuntu 20.04 for cross-compilation
8ba605e8 Add SSE2 and cross-compiled ARM64/POWER8/POWER9 builds to azure
a5e485ba Fix broken checks for libraries when cross-compiling
7fe0cb90 Fix progress bar in DBConcat
cef0731b Create translated index if --search-type 4 is used in createindex
47afc572 Fix --search-mode 4 issues in offsetalignment
80fdcbed Change cluster reassign to bool soedinglab/MMseqs2#329
57e8a9df Allow ORF filter only in combination with query nucleotides
d55f06ce Fix Pfam.full database creation
659cc1f8 Add additional experimental ORF prefiltering step before translated search
e934f1c4 made ByteParser more informative
4d14c9fe tax-lineage modes: 0 nothing, 1 names, 2 taxids
b777cd09 Disable ips4o on ppc for now
95a88524 find_package is case sensitive
8e797b1d Allow disabling use of IPS4O, cleanup
850a196b added seqs assignment agreement to the output
e21dc40f Fix wrong existence checks for databases in workflows
5901a0a9 Set minimum clang to 5.0 for now
d7b46e60 Disable ips4o on cygwin
033fda23 Change travis gcc check to 4.9
908675d2 Add includes
ee7b5c11 Change random_shuffle to shuffle
d1a1af5e Rewrite atomic check in cmake
d6590f39 Add missing FastSort.h
704d0fb4 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
109be7bb Change sort to ips4o if possible
02059366 Fix warning
d092a469 Fix kmermatcher MPI support
b001dfb2 Made modifications for Profile-Profile alignment. Changes belong to SSW, Alignment, Matcher. Right before integrating lin space vector cost calculation for H value.
521c0d25 Made modifications to ssw algorithm implementation.
2f1db01c Rename martin.steinegger@mpibpc.mpg.de to martin.steinegger@snu.ac.kr
0f7b6856 Fix #326 wrong citation link
62a387ed Merge branch 'master' of https://github.com/soedinglab/mmseqs2
c125a217 Fix issues in expandaln
648bc1f6 Add Pfam-B download script
16e79a2a Add dbCAN2 download script
7c0ed7f8 Microtar would try to seek backwards resulting in horrible gzip read performance
cab0e838 Fix #323 createdb not correctly reading gz/bzip with --createdb-mode 1
1d650034 mmseqs --help should not give a useless correction suggestion
35c58af9 Improve download of taxdmp file in createtaxdb
68feeb20 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
6546822c Add missing .dbtype to newSeqDb header in cluster update
2a787482 Merge pull request #321 from milot-mirdita/simde
72d19b96 Seems like travis reduced the RAM available on ARM
565ad3f9 Add script to update SIMDe
b9783a7f Squashed 'lib/simde/simde/' content from commit 2119ac73
9828f0d6 Merge commit 'b9783a7fca1677486f2f830a9c59fda11330980c' as 'lib/simde/simde'
641ef68b Remove submodule in preparation for subtree
b6dd6447 Work around clang issue
a877dc00 Rebuild SIMD autodetection
5ba9e7ae Cleanup warnings
3980d2a7 Add one Newton-Raphson it to make division with _mm_rcp_ps always consistent
27b82963 Try limiting threads in ppc to not crash on 4gig ram
c95bdcc1 Silence strict aliasing warning in Itoa for NEON
590cfb96 Rebuild 128/256 bit SIMD split in simd.h
f5750fee Enable building on non-x86 and less than SSE4.1
21d798f0 Remove not finished createtaxdb changes
b59c3381 Make orf information available through convertalis
284bb757 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
f4bbce84 Add MemoryTracker, Account for index size when computing available memory
e2510e8f fixed comment because it was misleading
def7ace2 Add convertalis HTML output based on MMseqs2 app (app.mmseqs.com)
dd3ff63a Fix convertkb to work without a mapping file
dc054792 Previous lookup writer would always report failing
52ac0f36 Refactor lookup writing to not corrupt memory if an accession is too long
9f2be0e0 Disable ICC travis for now
d319bb92 Merge branch 'master' of github.com:soedinglab/MMseqs2
d1522365 remove appendtaxaln
648cf836 refined code as per Milot's feedback
94db0316 One more INT -> UINT warning that ICC complains about
271b7c13 Next try for travis
2c1dfdd4 Fix terminate value in SSW again
6016e1b1 Try to fix travis
ddaaaf7f Fix various warnings reported by ICC, add ICC to travis
f3adc10f added aggregatetaxweights to get rid of appendtaxaln
211dd7a7 change to SSTR
517b01ba true in createParameterString saves defining taxonomy defaults
aa068c86 added taxonomy default parameter values because life
f9d1face in the process of adding taxPerContig workflow
94f895a3 fixed english
19f1dfbd moved weight const back
8db3e714 moved definition of constants
05cfc8bd added mode of tax output: both lca and aln
2cd59046 added voteMode parameter
128f57b5 extended aggregatetax to handle eval-based weights
e4a10bd7 added appendtaxaln for extending aggregatetax
0c29da4a Actually fix the filterdb --join-db issue
7ff6ae7c Restore fix lost char in joindb mode change
f5c8b28c Update README.md
e4f7e745 Add qOrfStart/qOrfEnd, dbOrfStart,dbOrfEnd to offsetalignment
cf40916c Merge branch 'master' of https://github.com/soedinglab/mmseqs2
c0dac797 Do not write null byte in splitdb
cbb542af added rand id to tmp files created at localTmp
214e87e9 Remove goto in lca.cpp
c8309fce Merge branch 'master' of https://github.com/soedinglab/mmseqs2
b761ddf4 Fix issue with qset format output
80bff832 Do not write .lookup in easy workflows if not needed
21f7a05f createdb can now read a database containing FASTA/Q entries
d5a05376 Fix whitespace and cleanup output strings in createdb
d14b622e Fix cygwin compile issue
9e5fb33b Introduce KSeqWrapper to read from memory location
dc7b9626 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
de6f7524 Fix soft link createdb bug if multiple input file are provided
b06bee91 Fix alpha regex
46c84389 Update combine pval agg-mode 3
67d61013 Disable fancy progress bars on travis to reduce output
203a2173 Updated two more tests to use tighter ROC thresholds
a9052f44 Update regression with tighter bounds for ROC tests
c62736a6 Correctly parse keys from data files in filterdb --filter-file This was causing a linsearch instability
fe007cb4 Use MultiParam for gapOpen, gapExtend costs
3513001d Add easy-rbh workflow
d0d3032e Fix RBH search if using -a to show alignments
ce1a43bf Merge branch 'master' of https://github.com/soedinglab/mmseqs2
ea24e493 Fix issues with abs. path if using aria2c
5228745f Improve --alignment-mode parameter description and make it a non expert parameter
fffa9b10 Fix various inconsistencies and usability issues with alignall: * alignall alignment-mode did not correspond to align alignment-mode * add-backtrace did not do anything, has to be specified now if backtrace is needed * Did return a alignment db type even though it is incompatible with that type, uses generic for now * various parameters were passed but unused   - zdrop and scorebias are used now (however see below)   - realign, alt ali, max accept/reject, wrapped are now gone
29066847 Fix wrong warning
813d81f2 Update regression
264d7811 Switch greedy clustering algorithm back to old idea
c09f6574 Improve nucleotide clustering workflow
38a73770 Set k-mers in linclust to 0 for the nucleotide clustering
7df6e3f7 Replace characters that can not be reversed by N in extract frames
e9678f62 Update regression
f886e868 Add nucleotide support to cluster (workflow nucleotide_clustering), clust module will infer identity automatically if missing, Improve low. mem. greedy incremental algorithm, Update regression
5f873587 Add kmers-per-sequence-scale to linsearch
0310eb60 Change --kmer-per-seq-scale to a multi parameter, add error if cluster is called with a nucleotide sequence
e258bc8d Fix #299 PDB70 database creation was not working
7095f37e Add support reverse complemente in rescorediagonal --rescore-mode 0 and 1
61ca4888 Fix result2dnamsa
70d014e4 Add search-type 4 to Search
462f24cb Add module result2dnamsa
5670d990 Fix regression error
e4451d59 Add result direction parameter to kmersearch
12c499dc Fix reverse sequences issues in linclust and linsearch
44499c3c Update filterdb regression test
807b4a56 Fix issue soedinglab/MMseqs2#290. Filterdb checked for mode == true but mode was 2.
24479bc2 Fix Docker
a578f52a Fix char signedness on PPC
a0d64a98 Update regression
a07a266f Working on PPC64LE support
09734177 Remove remaining _mm_shuffle_epi32
cdef78a6 Merge pull request #285 from hgsommer/misc_small
283c8d03 Replace goto end in ssw
6bfc5028 Fix c/p mistake in convertalignments
e61da344 Fix spelling of 'length'
9a63760f Replace nested ternary operator
4349b5c6 Avoid repeatedly checking for profile db types
c170a11f Call MsaFilter::shuffleSequences() from MsaFilter::filter()
ef49ba22 Return value from MsaFilter::filter()
d155dc36 Replace int by bool literals for bool variable
ec6722ad Align headings with column in PSSMCalculator::printProfile()
548a9bd6 Avoid forward declaration of ScoreMatrix
d0fbe471 Do some cleanup in StripedSmithWaterman.cpp
91d1aedd Replace check for zero-sized containers by empty()
e47b8eed Remove superfluous parameter from ssw_init()
250b1221 Simplify return statements
4fe1116a Remove counting zero scores in Sequence::mapProfile()
4303728b Replace multiplication by zero
1bd60242 Remove increment by zero
e4d4389f Move check for exit condition in front of allocations
556d26d1 Clean up function signatures in MultipleAlignment
3863af9a Move include back to header to restore build
e1208493 Remove unused TmpResult score field
1fd4db8f Die if DBReader cannot reopen files (e.g. no more file handles left)
1e21b87b Purge sequenceLookup early since its recreate in split databases
40854ddc Prefiltering and CacheFriendlyOperations refactoring
2433e086 WASM work in progress
14014cd0 Fix prefilter overflow instability
e0f97184 Add conda forge to conda install instructions
aa175d63 Fix off by one in kmermatcher soedinglab/MMseqs2#274 (comment)
d1607bc8 Remove LINE_MAX
eca2155d Clear string buffer instead of reassigning in swapresults
0f4645ed Fix wrong reverse marking in linsearch reported by UBSAN
5b612a32 Missing mpi binaries for travis regression
83d22417 Next try for ARM compiler flags
7ad122f0 Missed a few variables
ac7914be Do not require a cmake variable to build ARM
0dcfaadb Update regression to fix broken samtools call on ARM
29927b4c More NEON fixes, we assume signed chars, ARM uses unsigned by default
7760220f Next try to get the ARM regression to work
cc6d0d52 Add hack to not break travis log size limit
5408c3d1 Try to get NEON to compile
83192cab Fix search workflow parameters printed twice
f6f001c8 Fix new clang-10 warnings and further travis fixes
259e6434 llvm-10 alias is not whitelisted in travis yet
b1249fd5 Fix errors in Travis YAML from previous commit
18486d4c Update travis - use native aarch64 for neon - use xenial - shorten script
98c37f3c shortend MultiParam usage, improved line breaks in usage
c9be07f1 Add gcc-9 to travis
2e5fb309 Fix travis clang build
d5865c89 Remove MultiParam g++-9 warning
73679835 Rework target split merging
ca586939 Fix RESSIZE issue in slice search if sequences are used
491900b9 Improve usage text of cluster/linclust
0166850a Remove old greedy incremental clustering code and just run the memory efficient version instead.
15163e64 Fix Verbosity in workflows
aa78af46 Fix issue soedinglab/MMseqs2#274
7846dfce fixed clang template error
e1206371 extended MultiParam class, replaced ScoreMatrixFile type by MultiParam<char*>
b88b5475 rewrite alphabetSize as multi parameter
ecb4e35d started template class MultiParam to store sequence type specific values
e1a1c122 changed dbtype comparision in AlignmentSymmetry
2a829aef Replace symlinkat call with getcwd/chdir/symlink/chdir to fix Conda build using macOS 10.9 SDK
28e83e8d Add OpenMP include to DBReader
fb00aa0c Fix realloc issue while IndexTable creation of profiles
504e5021 Take max. seq. len of query and target db in prefilter and alignment
16e23521 Fix bug if seq. len > max seq. length in Alignment
80d0187d Fix asan issue
751f5c19 Make ZDROP an expert parameter, change description text
1b6edd0d Rework x detection (SIMD)
9677254a Merge branch 'master' of https://github.com/soedinglab/mmseqs2
1ac1e686 Fix max seq issues in prefilter
cb737033 Reset download strategy to not use aria2c for the NCBI download
c95f3ee0 fixed ksw2 test
72b95c0c Error if we cannot download from NCBI
1d0aad50 Fix databases not piecing together all kalamari accessions
516723d5 Merge branch 'master' of https://github.com/soedinglab/MMseqs2
d81b6cca added zdrop parameter to control banded nucleotide alignment
e2e39a97 Add Kalamari Contaminants database
c0c538ea Various fixes in databases script
08cc95b3 Fix createtaxdb redownloading when taxdump already exists
018eb349 Remove a bit whitespace in front of each parameter in usage message
8aa7513d add aggregatetax example, fix typos
8bcd7c74 Fix typo
8e581b76 Rework usage texts
7dc25764 Hide most parameters from createindex
2baa609e Add examples to many modules
00a7d769 fixed bugs for long or wrapped nucleotide sequences
a4bdcb47 eggNOG profiles should not depend on the deleted MSAs
4c783095 Fix eggNOG database construction
f7a5599c Cleanup not needed files immediately in databases workflow
3ed3690d Fix downloads always restarting in databases workflow
4cfac9a8 Fix aria warning with more than 16 connections
e0a00e10 Revert "Use SW instead of BandedNucAln if we don't have diagonals"
7ac966b2 Fix result2msa could fail if it was writing compressed output
95729ac7 Fix wrong output DB type written in alignall
f899e7c7 Use SW instead of BandedNucAln if we don't have diagonals
c08d9fa8 Allow parameter descriptions to span multiple lines
57868498 MMseqs2 is not limited to proteins, update README to reflect that
11818b0a Cleanup hiding parameters in workflows
c481cea6 Remove some useless includes
2f64aeeb Fix databases timestamp appending instead of overwriting
ae9e9e32 Add eggNOG setup procedure to databases
31c8e5d5 Shorten two short parameter descriptions
2f49d3e3 Read header from lookup in msa2profile if available
1356869b add option to reverse profile dbs
ac3482e8 More issues with zlib and tar2db
aaafafe4 Fix tar2db keys
c751d9e2 More tar2db fixes
a9c93014 Fix variadic input to tar2db
51a76130 Add tar2db module to convert content of any tar to a DB
96f9a91e Use nedmalloc on Windows/Cygwin
73f5c2a2 Add databases workflow to README
5a7ac9e5 make align output consistent
c5ebe529 fixed setcover cluster mode (by fixing bug in similarity reading for short aln results e.g. hamming distance aln)
481696b5 Fix databases output
c6b4a57a Beginning cleaning up parameter descriptions
a9552a17 Show default value of bool parameters
af89c467 Add a proposed example text structure

git-subtree-dir: lib/mmseqs
git-subtree-split: c48da9d781b81804727b5cccfed7f97cfcc20c9d