Squashed 'lib/mmseqs/' changes from 3097aa5a3..0a1348be7

0a1348be7 offsetalignments now correctly returns a nucleotide backtrace if needed
34da34640 include VTML40 in binary for easier access
d82fdd233 Add missed target .source file for reading in convertalis
b87ac140d Overload patterncompiler isMatch for pos of match
013ee7cf8 avoid appending extra tabs besthitperset
77567ee86 Update regression to fix samtools validation never getting called on linux
e10238de0 Make citations extendable in inheriting tools
3a53b0a7e Reset progress if createdb has to redo
cee42cf19 Remove useless buffer
27995609f Warning for compressed databases in soft split mode
4224ae469 fix warning in convertalis
9af458912 Add tax. support to convertalis
102e4299b fixed bug in offsetalignment
f36513b54 corrected header len
913e40f2f Next try for strict ordering problem
aa9251fd4 Update README
c570e476c Fix not existing file deleted in createtaxdb.sh
173a54e16 Fix wrong weak ordering for std::sort in DBReader::sortIndex
3ac5fc142 remove travis macos build
11e273602 Add a mode to createdb to support soft linking data instead of copy
5ae5503a9 changed skipping logic for sequences with repeated kmers: skip repeated kmer when looking for matches instead of skipping whole sequence, replaced skipNRepeatKmer parameter by ignore-multi-kmer parameter with new logic
84e53091f Fix issue soedinglab/MMseqs2#239
c0a8ed056 Merge branch 'master' of https://github.com/soedinglab/MMseqs2
99d763e13 fixed really rare edgecase of wrong kmermatch positions
88cb82700 Try fixing macOS build when coreutils is not installed
49a7568e6 compile macOS build with AppleClang
34eefdec0 Silence omptl warnings
c94d00e50 Third time is hopefully the charm in deleting regression tmp results
b8f968b4b Now the intermediate regression tests should actually be deleted
489708b68 New regression cleans up after itself to not run out of space
d6bf14f9b Allow libomp compilation on macOS
9c224bbe4 Add threshold for k-mer only prefilter
064334269 Fix memory leak in db-load-mode 1
4ddf8bbb1 Merge branch 'master' of https://github.com/soedinglab/MMseqs2
3e3ec7771 fixed edge case in alignment calculation for length=1
d8d607b0c Include db-load-mode regression test
ebb16f363 Fix #234: Prefilter with precomputed index would load invalid memory in PRELOAD_MODE_FREAD
c88dd3818 fixed comparison between signed and unsigned integer error
b7ffc1066 added wrapped scoring for ungapped and gapped nucleotide alignments in align module, shortened result if wrapped scoring is activated
0c4b74980 added wrapped scoring for ungapped alignments
1dd955a2c Remove mmap merge branch in mergeTargetSplits
8c79865b6 Breaking fix #179: --lca-ranks now expects ranks semicolon separated and will return species names also semicolon separated
790d70566 Mark krona as vendored
274752cb0 fixed comparison bug in kmermatcher
2f4b77c18 Merge branch 'master' of https://github.com/soedinglab/MMseqs2
99462142c introduced template for seqLen and seqPos in KmerPosition to handle kmermatches of sequences larger than MAX_SHRT corrrectly
eb090882b Invalid parameter passed to taxonomyreport in easy-taxonomy workflow
076daa363 escape species name in taxonomy report so it cant break krona
66385552c Fix broken taxonomyreport CLI again
deb0e82a7 Taxonomy report can also build a Krona HTML report
eba0ccf03 Fix 230: apply mode was using the wrong database entry length
1b9a22574 Fix #229: Invalid query length used in result2msa
f05f8c51d reduce efficiency impact of edgecase fix in gapped alignment computation
5d78b6c41 updated regressiontest for easynuclnucltax
c2e7c273a fixed edgecase for gapped alignment computation
4404fe0a7 Fix pairwise LCA function
5c69b0dee Fix issues with splitsequence if combined with compressed
fff584e77 Try getc based merging instead of mmap
736e0bfe8 Fix error messages
926920014 Shorten merging debug output
461301834 Allow target split merging to happen multi-threaded again
d1b5f63fa Cleanup more dead code
205740420 Some cleanup in Util
202efad1b Fix warning
62aae0335 Fix compiler issues
0f963300b Quick fix prefix id issue with lookup
0a88a9ee2 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
488b14f4f Open reader

git-subtree-dir: lib/mmseqs
git-subtree-split: 0a1348be78bd84137bdb373ba32e0e8c054b3e1c
soedinglab · Nov 28, 2019 · e594d87
1 parent 3550b0c
Showing 93 changed files with 7,916 additions and 751 deletions.
diff --git a/.gitattributes b/.gitattributes
@@ -1,2 +1,3 @@
+data/krona_prelude.html linguist-vendored
 lib/* linguist-vendored
 lib/simd linguist-vendored=false
diff --git a/.travis.yml b/.travis.yml
@@ -91,18 +91,6 @@ matrix:
         - libopenmpi-dev
         - shellcheck
     env: MPI=1 CC=gcc-8 CXX=g++-8
-  - os: osx
-    osx_image: xcode10.1
-    addons:
-      homebrew:
-        packages:
-        - cmake
-        - ninja
-        - gcc@8
-        - zlib
-        - bzip2
-        - shellcheck
-    env: CC=gcc-8 CXX=g++-8
   allow_failures:
   - env: QEMU_ARM=1
   fast_finish: true
@@ -111,7 +99,6 @@ services:
   - docker
 
 before_install:
-  - if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew link gcc@8 --force; fi
   - export CC
   - export CXX
 
@@ -124,8 +111,6 @@ script:
       mkdir build; cd build; \
       cmake -G Ninja -DENABLE_WERROR=1 -DHAVE_MPI="$MPI" -DHAVE_SSE4_1=1 -DHAVE_TESTS=1 -DREQUIRE_OPENMP=0 .. \
         || exit 1; ninja || exit 1; \
-    elif [[ "$TRAVIS_OS_NAME" == "osx" ]]; then \
-      ./util/build_osx.sh . build || exit 1; \
     else \
       exit 1; \
     fi

diff --git a/README.md b/README.md
@@ -7,7 +7,7 @@ MMseqs2 (Many-against-Many sequence searching) is a software suite to search and
 
 [Steinegger M and Soeding J. Clustering huge protein sequence sets in linear time. Nature Communications, doi: 10.1038/s41467-018-04964-5 (2018)](https://www.nature.com/articles/s41467-018-04964-5).
 
-[Mirdita M, Steinegger M and Soeding J. MMseqs2 desktop and local web server app for fast, interactive sequence searches. Bioinformatics, doi: 10.1093/bioinformatics/bty1057 (2019)](https://academic.oup.com/bioinformatics/article/35/16/2856/5280135)
+[Mirdita M, Steinegger M and Soeding J. MMseqs2 desktop and local web server app for fast, interactive sequence searches. Bioinformatics, doi: 10.1093/bioinformatics/bty1057 (2019)](https://academic.oup.com/bioinformatics/article/35/16/2856/5280135).
 
 [![BioConda Install](https://img.shields.io/conda/dn/bioconda/mmseqs2.svg?style=flag&label=BioConda%20install)](https://anaconda.org/bioconda/mmseqs2)
 [![Github All Releases](https://img.shields.io/github/downloads/soedinglab/mmseqs2/total.svg)](https://github.com/soedinglab/mmseqs2/releases/latest)
@@ -20,7 +20,7 @@ MMseqs2 (Many-against-Many sequence searching) is a software suite to search and
 
 
 ## Documentation
-The MMseqs2 user guide is available in our [GitHub Wiki](https://github.com/soedinglab/mmseqs2/wiki) or as a [PDF file](https://mmseqs.com/latest/userguide.pdf) (Thanks to [pandoc](https://github.com/jgm/pandoc)!). We provide a tutorial of MMseqs2 [here](https://github.com/soedinglab/metaG-ECCB18-partII).
+The MMseqs2 user guide is available in our [GitHub Wiki](https://github.com/soedinglab/mmseqs2/wiki) or as a [PDF file](https://mmseqs.com/latest/userguide.pdf) (Thanks to [pandoc](https://github.com/jgm/pandoc)!). The wiki also contains [tutorials](https://github.com/soedinglab/MMseqs2/wiki/Tutorials) to learn how to use MMseqs2 with real data.
 
 Keep posted about MMseqs2/Linclust updates by following Martin on [Twitter](https://twitter.com/thesteinegger).
 
@@ -61,10 +61,7 @@ Compiling MMseqs2 from source has the advantage that it will be optimized to the
         make install
         export PATH=$(pwd)/bin/:$PATH
 
-:exclamation: To compile MMseqs2 on MacOS, first install the `gcc` compiler from Homebrew. The default MacOS `clang` compiler does not support OpenMP and MMseqs2 will only be able to use a single thread. Then use the following `cmake` call:
-
-        CC="$(brew --prefix)/bin/gcc-9" CXX="$(brew --prefix)/bin/g++-9" cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. ..
-                
+:exclamation: Compiling MMseqs2 correctly on macOS requires [more effort](https://github.com/soedinglab/MMseqs2/wiki#compile-from-source-under-macos).
 
 ## Getting started
 We provide `easy` workflows to cluster, search and assign taxonomy. These `easy` workflows are a shorthand to deal directly with FASTA/FASTQ files as input and output. MMseqs2 provides many modules to transform, filter, execute external programs and search. However, these modules use the MMseqs2 database formats, instead of the FASTA/FASTQ format. For maximum flexibility, we recommend using MMseqs2 workflows and modules directly. Please read more about this in the [documentation](https://github.com/soedinglab/mmseqs2/wiki).
@@ -81,7 +78,7 @@ For clustering, MMseqs2 `easy-cluster` and `easy-linclust` are available.
 
         mmseqs easy-linclust examples/DB.fasta clusterRes tmp     
 
-Sequence identity is in default [estimated](https://github.com/soedinglab/MMseqs2/wiki#how-does-mmseqs2-compute-the-sequence-identity) to output real sequence identity use `--alignment-mode 3`.
+Sequence identity is by default [estimated](https://github.com/soedinglab/MMseqs2/wiki#how-does-mmseqs2-compute-the-sequence-identity) to output real sequence identity use `--alignment-mode 3`.
 Read more about the [clustering format](https://github.com/soedinglab/mmseqs2/wiki#clustering-format) in our user guide.
 
 Please adjust the [clustering criteria](https://github.com/soedinglab/MMseqs2/wiki#clustering-criteria) and check if temporary directory provides enough free space. For disk space requirements, see the user guide.

diff --git a/azure-pipelines.yml b/azure-pipelines.yml
@@ -35,11 +35,11 @@ jobs:
       - checkout: self
         submodules: true
       - script: |
-          brew install cmake gcc@9 zlib bzip2 coreutils
+          brew install cmake zlib bzip2 libomp
         displayName: Install Dependencies
       - script: |
           cd ${BUILD_SOURCESDIRECTORY}
-          CC=gcc-9 CXX=g++-9 ./util/build_osx.sh . build
+          ./util/build_osx.sh . build
         displayName: Build MMseqs2
       - script: |
           ${BUILD_SOURCESDIRECTORY}/util/regression/run_regression.sh ${BUILD_SOURCESDIRECTORY}/build/build_sse41/src/mmseqs ${BUILD_SOURCESDIRECTORY}/regression

diff --git a/data/CMakeLists.txt b/data/CMakeLists.txt
@@ -22,6 +22,7 @@ set(COMPILED_RESOURCES
         enrich.sh
         blastn.sh
         VTML80.out
+        VTML40.out
         nucleotide.out
         blosum62.out
         PAM30.out
@@ -35,6 +36,7 @@ set(COMPILED_RESOURCES
         searchslicedtargetprofile.sh
         cs219.lib
         linsearch.sh
+        krona_prelude.html
         )
 
 set(GENERATED_OUTPUT_HEADERS "")

diff --git a/data/createtaxdb.sh b/data/createtaxdb.sh
@@ -59,7 +59,7 @@ if [ -n "$REMOVE_TMP" ]; then
    rm -f "${TMP_PATH}/names.dmp" "${TMP_PATH}/nodes.dmp" "${TMP_PATH}/merged.dmp" "${TMP_PATH}/delnodes.dmp"
    rm -f "${TMP_PATH}/taxidmapping"
    if [ "$DOWNLOAD_DATA" -eq "1" ]; then
-      rm -f "${TMP_PATH}/ncbi_download.complete" "${TMP_PATH}/targetDB_mapping.complete"
+      rm -f "${TMP_PATH}/ncbi_download.complete" "${TMP_PATH}/mapping_download.complete"
    fi
    rm -f "${TMP_PATH}/targetDB_mapping.complete"
    rm -f "${TMP_PATH}/targetDB_mapping"

diff --git a/data/easysearch.sh b/data/easysearch.sh
@@ -14,7 +14,7 @@ INPUT="$INPUT"
 
 if notExists "${TMP_PATH}/query.dbtype"; then
     # shellcheck disable=SC2086
-    "$MMSEQS" createdb "$@" "${TMP_PATH}/query" ${CREATEDB_PAR} \
+    "$MMSEQS" createdb "$@" "${TMP_PATH}/query" ${CREATEDB_QUERY_PAR} \
         || fail "query createdb died"
 fi
 

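Note on the workflow scripts touched here: each step is wrapped in a `notExists` guard keyed on a `.dbtype` file, so a rerun skips work that already finished. The following is a minimal sketch of that pattern, not the project's code. The bodies of `fail` and `notExists` are assumptions (the diff only shows that they are called), and `run_step` and `RUNS` are hypothetical stand-ins for a real `mmseqs` module call.

```shell
#!/bin/sh
# Assumed helper bodies; the real scripts define their own versions.
fail() {
    echo "Error: $1" >&2
    exit 1
}

notExists() {
    [ ! -f "$1" ]
}

# Hypothetical stand-in for an expensive module call.
# It counts invocations and writes the marker file on success.
RUNS=0
run_step() {
    RUNS=$((RUNS + 1))
    : > "$1.dbtype"
}

TMP_PATH="$(mktemp -d)"

# First pass: the marker is missing, so the step runs.
if notExists "${TMP_PATH}/query.dbtype"; then
    run_step "${TMP_PATH}/query" || fail "query step died"
fi

# Second pass: the marker exists, so the step is skipped.
if notExists "${TMP_PATH}/query.dbtype"; then
    run_step "${TMP_PATH}/query" || fail "query step died"
fi

echo "step executed ${RUNS} time(s)"
rm -rf "${TMP_PATH}"
```

Because the guard checks only for the marker file, a stale marker left behind by an interrupted run would also cause the step to be skipped, which is why cleanup of the right marker files matters.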
diff --git a/data/easytaxonomy.sh b/data/easytaxonomy.sh
@@ -10,7 +10,7 @@ notExists() {
 
 if notExists "${TMP_PATH}/query.dbtype"; then
     # shellcheck disable=SC2086
-    "$MMSEQS" createdb "$@" "${TMP_PATH}/query" ${CREATEDB_PAR} \
+    "$MMSEQS" createdb "$@" "${TMP_PATH}/query" ${CREATEDB_QUERY_PAR} \
         || fail "query createdb died"
 fi
 
@@ -40,7 +40,7 @@ fi
 
 if notExists "${TMP_PATH}/result_tophit1.dbtype"; then
     # shellcheck disable=SC2086
-     "$MMSEQS" filterdb "${TMP_PATH}/result" "${TMP_PATH}/result_top1" --extract-lines 1 ${THREADS_PAR} \
+     "$MMSEQS" filterdb "${TMP_PATH}/result" "${TMP_PATH}/result_top1" --extract-lines 1 ${THREADS_COMP_PAR} \
         || fail "filterdb died"
 fi
 
@@ -52,13 +52,13 @@ fi
 
 if notExists "${TMP_PATH}/result_top1_swapped_sum.dbtype"; then
     # shellcheck disable=SC2086
-     "$MMSEQS" summarizealis "${TMP_PATH}/result_top1_swapped" "${TMP_PATH}/result_top1_swapped_sum" ${THREADS_PAR}  \
+     "$MMSEQS" summarizealis "${TMP_PATH}/result_top1_swapped" "${TMP_PATH}/result_top1_swapped_sum" ${THREADS_COMP_PAR}  \
         || fail "filterdb died"
 fi
 
 if notExists "${TMP_PATH}/result_top1_swapped_sum_tax.dbtype"; then
     # shellcheck disable=SC2086
-     "$MMSEQS" addtaxonomy "${TARGET}" "${TMP_PATH}/result_top1_swapped_sum" "${TMP_PATH}/result_top1_swapped_sum_tax"  ${THREADS_PAR} --pick-id-from 1 --tax-lineage  \
+     "$MMSEQS" addtaxonomy "${TARGET}" "${TMP_PATH}/result_top1_swapped_sum" "${TMP_PATH}/result_top1_swapped_sum_tax"  ${THREADS_COMP_PAR} --pick-id-from 1 --tax-lineage  \
         || fail "filterdb died"
 fi