Squashed 'lib/mmseqs/' changes from 3097aa5a3..0a1348be7
0a1348be7 offsetalignments now correctly returns a nucleotide backtrace if needed
34da34640 include VTML40 in binary for easier access
d82fdd233 Add missed target .source file for reading in convertalis
b87ac140d Overload patterncompiler isMatch for pos of match
013ee7cf8 avoid appending extra tabs besthitperset
77567ee86 Update regression to fix samtools validation never getting called on linux
e10238de0 Make citations extendable in inheriting tools
3a53b0a7e Reset progress if createdb has to redo
cee42cf19 Remove useless buffer
27995609f Warning for compressed databases in soft split mode
4224ae469 fix warning in convertalis
9af458912 Add tax. support to convertalis
102e4299b fixed bug in offsetalignment
f36513b54 corrected header len
913e40f2f Next try for strict ordering problem
aa9251fd4 Update README
c570e476c Fix not existing file deleted in createtaxdb.sh
173a54e16 Fix wrong weak ordering for std::sort in DBReader::sortIndex
3ac5fc142 remove travis macos build
11e273602 Add a mode to createdb to support soft linking data instead of copy
5ae5503a9 changed skipping logic for sequences with repeated kmers: skip repeated kmer when looking for matches instead of skipping whole sequence, replaced skipNRepeatKmer parameter by ignore-multi-kmer parameter with new logic
84e53091f Fix issue soedinglab/MMseqs2#239
c0a8ed056 Merge branch 'master' of https://github.com/soedinglab/MMseqs2
99d763e13 fixed really rare edgecase of wrong kmermatch positions
88cb82700 Try fixing macOS build when coreutils is not installed
49a7568e6 compile macOS build with AppleClang
34eefdec0 Silence omptl warnings
c94d00e50 Third time is hopefully the charm in deleting regression tmp results
b8f968b4b Now the intermediate regression tests should actually be deleted
489708b68 New regression cleans up after itself to not run out of space
d6bf14f9b Allow libomp compilation on macOS
9c224bbe4 Add threshold for k-mer only prefilter
064334269 Fix memory leak in db-load-mode 1
4ddf8bbb1 Merge branch 'master' of https://github.com/soedinglab/MMseqs2
3e3ec7771 fixed edge case in alignment calculation for length=1
d8d607b0c Include db-load-mode regression test
ebb16f363 Fix #234: Prefilter with precomputed index would load invalid memory in PRELOAD_MODE_FREAD
c88dd3818 fixed comparison between signed and unsigned integer error
b7ffc1066 added wrapped scoring for ungapped and gapped nucleotide alignments in align module, shortened result if wrapped scoring is activated
0c4b74980 added wrapped scoring for ungapped alignments
1dd955a2c Remove mmap merge branch in mergeTargetSplits
8c79865b6 Breaking fix #179: --lca-ranks now expects ranks semicolon separated and will return species names also semicolon separated
790d70566 Mark krona as vendored
274752cb0 fixed comparison bug in kmermatcher
2f4b77c18 Merge branch 'master' of https://github.com/soedinglab/MMseqs2
99462142c introduced template for seqLen and seqPos in KmerPosition to handle kmermatches of sequences larger than MAX_SHRT correctly
eb090882b Invalid parameter passed to taxonomyreport in easy-taxonomy workflow
076daa363 escape species name in taxonomy report so it cant break krona
66385552c Fix broken taxonomyreport CLI again
deb0e82a7 Taxonomy report can also build a Krona HTML report
eba0ccf03 Fix 230: apply mode was using the wrong database entry length
1b9a22574 Fix #229: Invalid query length used in result2msa
f05f8c51d reduce efficiency impact of edgecase fix in gapped alignment computation
5d78b6c41 updated regressiontest for easynuclnucltax
c2e7c273a fixed edgecase for gapped alignment computation
4404fe0a7 Fix pairwise LCA function
5c69b0dee Fix issues with splitsequence if combined with compressed
fff584e77 Try getc based merging instead of mmap
736e0bfe8 Fix error messages
926920014 Shorten merging debug output
461301834 Allow target split merging to happen multi-threaded again
d1b5f63fa Cleanup more dead code
205740420 Some cleanup in Util
202efad1b Fix warning
62aae0335 Fix compiler issues
0f963300b Quick fix prefix id issue with lookup
0a88a9ee2 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
488b14f4f Open reader

git-subtree-dir: lib/mmseqs
git-subtree-split: 0a1348be78bd84137bdb373ba32e0e8c054b3e1c
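
Note on the breaking `--lca-ranks` change listed above: a semicolon-separated value has to be quoted in a POSIX shell, because an unquoted `;` ends the command. The following is a minimal sketch of handling such a list. `split_ranks` and the example rank names are hypothetical and not part of MMseqs2.

```shell
#!/bin/sh
# Hypothetical helper (not an MMseqs2 command): print each rank of a
# semicolon-separated list on its own line.
split_ranks() {
    printf '%s\n' "$1" | tr ';' '\n'
}

# The value must be quoted; unquoted, the shell would treat ';' as a
# command separator and pass only the first rank.
RANKS="species;genus;family"   # example ranks, chosen for illustration
split_ranks "$RANKS"
```

The same quoting applies when the list is passed on a command line, e.g. `--lca-ranks "species;genus;family"`.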
RuoshiZhang committed Nov 28, 2019
1 parent 3550b0c commit e594d87
Showing 93 changed files with 7,916 additions and 751 deletions.
1 change: 1 addition & 0 deletions .gitattributes
@@ -1,2 +1,3 @@
+data/krona_prelude.html linguist-vendored
lib/* linguist-vendored
lib/simd linguist-vendored=false
15 changes: 0 additions & 15 deletions .travis.yml
@@ -91,18 +91,6 @@ matrix:
 - libopenmpi-dev
 - shellcheck
 env: MPI=1 CC=gcc-8 CXX=g++-8
-- os: osx
-osx_image: xcode10.1
-addons:
-homebrew:
-packages:
-- cmake
-- ninja
-- gcc@8
-- zlib
-- bzip2
-- shellcheck
-env: CC=gcc-8 CXX=g++-8
 allow_failures:
 - env: QEMU_ARM=1
 fast_finish: true
@@ -111,7 +99,6 @@ services:
 - docker

 before_install:
-- if [[ "$TRAVIS_OS_NAME" == "osx" ]]; then brew link gcc@8 --force; fi
 - export CC
 - export CXX

@@ -124,8 +111,6 @@ script:
mkdir build; cd build; \
cmake -G Ninja -DENABLE_WERROR=1 -DHAVE_MPI="$MPI" -DHAVE_SSE4_1=1 -DHAVE_TESTS=1 -DREQUIRE_OPENMP=0 .. \
|| exit 1; ninja || exit 1; \
-elif [[ "$TRAVIS_OS_NAME" == "osx" ]]; then \
-./util/build_osx.sh . build || exit 1; \
else \
exit 1; \
fi
11 changes: 4 additions & 7 deletions README.md
@@ -7,7 +7,7 @@ MMseqs2 (Many-against-Many sequence searching) is a software suite to search and

[Steinegger M and Soeding J. Clustering huge protein sequence sets in linear time. Nature Communications, doi: 10.1038/s41467-018-04964-5 (2018)](https://www.nature.com/articles/s41467-018-04964-5).

-[Mirdita M, Steinegger M and Soeding J. MMseqs2 desktop and local web server app for fast, interactive sequence searches. Bioinformatics, doi: 10.1093/bioinformatics/bty1057 (2019)](https://academic.oup.com/bioinformatics/article/35/16/2856/5280135)
+[Mirdita M, Steinegger M and Soeding J. MMseqs2 desktop and local web server app for fast, interactive sequence searches. Bioinformatics, doi: 10.1093/bioinformatics/bty1057 (2019)](https://academic.oup.com/bioinformatics/article/35/16/2856/5280135).

[![BioConda Install](https://img.shields.io/conda/dn/bioconda/mmseqs2.svg?style=flag&label=BioConda%20install)](https://anaconda.org/bioconda/mmseqs2)
[![Github All Releases](https://img.shields.io/github/downloads/soedinglab/mmseqs2/total.svg)](https://github.com/soedinglab/mmseqs2/releases/latest)
@@ -20,7 +20,7 @@ MMseqs2 (Many-against-Many sequence searching) is a software suite to search and


## Documentation
-The MMseqs2 user guide is available in our [GitHub Wiki](https://github.com/soedinglab/mmseqs2/wiki) or as a [PDF file](https://mmseqs.com/latest/userguide.pdf) (Thanks to [pandoc](https://github.com/jgm/pandoc)!). We provide a tutorial of MMseqs2 [here](https://github.com/soedinglab/metaG-ECCB18-partII).
+The MMseqs2 user guide is available in our [GitHub Wiki](https://github.com/soedinglab/mmseqs2/wiki) or as a [PDF file](https://mmseqs.com/latest/userguide.pdf) (Thanks to [pandoc](https://github.com/jgm/pandoc)!). The wiki also contains [tutorials](https://github.com/soedinglab/MMseqs2/wiki/Tutorials) to learn how to use MMseqs2 with real data.

Keep posted about MMseqs2/Linclust updates by following Martin on [Twitter](https://twitter.com/thesteinegger).

@@ -61,10 +61,7 @@ Compiling MMseqs2 from source has the advantage that it will be optimized to the
make install
export PATH=$(pwd)/bin/:$PATH

-:exclamation: To compile MMseqs2 on MacOS, first install the `gcc` compiler from Homebrew. The default MacOS `clang` compiler does not support OpenMP and MMseqs2 will only be able to use a single thread. Then use the following `cmake` call:
-
-CC="$(brew --prefix)/bin/gcc-9" CXX="$(brew --prefix)/bin/g++-9" cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. ..
+:exclamation: Compiling MMseqs2 correctly on macOS requires [more effort](https://github.com/soedinglab/MMseqs2/wiki#compile-from-source-under-macos).

## Getting started
We provide `easy` workflows to cluster, search and assign taxonomy. These `easy` workflows are a shorthand to deal directly with FASTA/FASTQ files as input and output. MMseqs2 provides many modules to transform, filter, execute external programs and search. However, these modules use the MMseqs2 database formats, instead of the FASTA/FASTQ format. For maximum flexibility, we recommend using MMseqs2 workflows and modules directly. Please read more about this in the [documentation](https://github.com/soedinglab/mmseqs2/wiki).
@@ -81,7 +78,7 @@ For clustering, MMseqs2 `easy-cluster` and `easy-linclust` are available.

mmseqs easy-linclust examples/DB.fasta clusterRes tmp

-Sequence identity is in default [estimated](https://github.com/soedinglab/MMseqs2/wiki#how-does-mmseqs2-compute-the-sequence-identity) to output real sequence identity use `--alignment-mode 3`.
+Sequence identity is by default [estimated](https://github.com/soedinglab/MMseqs2/wiki#how-does-mmseqs2-compute-the-sequence-identity) to output real sequence identity use `--alignment-mode 3`.
Read more about the [clustering format](https://github.com/soedinglab/mmseqs2/wiki#clustering-format) in our user guide.
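
The README note above can be tried as follows. This is a minimal sketch that reuses the `easy-linclust` call from the README and appends `--alignment-mode 3`. `linclust_exact` and the `DRY_RUN` switch are hypothetical names used only in this sketch and are not MMseqs2 options. Running for real assumes `mmseqs` is on `PATH`.

```shell
#!/bin/sh
# Sketch: wrap the README's easy-linclust call so that real (not estimated)
# sequence identity is reported. DRY_RUN is a switch of this sketch only.
linclust_exact() {
    input="$1"; result="$2"; tmp="$3"
    # Build the command line once so it can be printed or executed.
    set -- mmseqs easy-linclust "$input" "$result" "$tmp" --alignment-mode 3
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"    # only show the command
    else
        "$@"         # execute it (requires mmseqs on PATH)
    fi
}

# Prints the command by default; set DRY_RUN=0 to actually run it.
linclust_exact examples/DB.fasta clusterRes tmp
```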

Please adjust the [clustering criteria](https://github.com/soedinglab/MMseqs2/wiki#clustering-criteria) and check if temporary directory provides enough free space. For disk space requirements, see the user guide.
4 changes: 2 additions & 2 deletions azure-pipelines.yml
@@ -35,11 +35,11 @@ jobs:
 - checkout: self
 submodules: true
 - script: |
-brew install cmake gcc@9 zlib bzip2 coreutils
+brew install cmake zlib bzip2 libomp
 displayName: Install Dependencies
 - script: |
 cd ${BUILD_SOURCESDIRECTORY}
-CC=gcc-9 CXX=g++-9 ./util/build_osx.sh . build
+./util/build_osx.sh . build
 displayName: Build MMseqs2
 - script: |
 ${BUILD_SOURCESDIRECTORY}/util/regression/run_regression.sh ${BUILD_SOURCESDIRECTORY}/build/build_sse41/src/mmseqs ${BUILD_SOURCESDIRECTORY}/regression
2 changes: 2 additions & 0 deletions data/CMakeLists.txt
@@ -22,6 +22,7 @@ set(COMPILED_RESOURCES
enrich.sh
blastn.sh
VTML80.out
+VTML40.out
nucleotide.out
blosum62.out
PAM30.out
@@ -35,6 +36,7 @@ set(COMPILED_RESOURCES
searchslicedtargetprofile.sh
cs219.lib
linsearch.sh
+krona_prelude.html
)

set(GENERATED_OUTPUT_HEADERS "")
2 changes: 1 addition & 1 deletion data/createtaxdb.sh
@@ -59,7 +59,7 @@ if [ -n "$REMOVE_TMP" ]; then
rm -f "${TMP_PATH}/names.dmp" "${TMP_PATH}/nodes.dmp" "${TMP_PATH}/merged.dmp" "${TMP_PATH}/delnodes.dmp"
rm -f "${TMP_PATH}/taxidmapping"
if [ "$DOWNLOAD_DATA" -eq "1" ]; then
-rm -f "${TMP_PATH}/ncbi_download.complete" "${TMP_PATH}/targetDB_mapping.complete"
+rm -f "${TMP_PATH}/ncbi_download.complete" "${TMP_PATH}/mapping_download.complete"
fi
rm -f "${TMP_PATH}/targetDB_mapping.complete"
rm -f "${TMP_PATH}/targetDB_mapping"
2 changes: 1 addition & 1 deletion data/easysearch.sh
@@ -14,7 +14,7 @@ INPUT="$INPUT"

if notExists "${TMP_PATH}/query.dbtype"; then
# shellcheck disable=SC2086
-"$MMSEQS" createdb "$@" "${TMP_PATH}/query" ${CREATEDB_PAR} \
+"$MMSEQS" createdb "$@" "${TMP_PATH}/query" ${CREATEDB_QUERY_PAR} \
|| fail "query createdb died"
fi

8 changes: 4 additions & 4 deletions data/easytaxonomy.sh
@@ -10,7 +10,7 @@ notExists() {

if notExists "${TMP_PATH}/query.dbtype"; then
# shellcheck disable=SC2086
-"$MMSEQS" createdb "$@" "${TMP_PATH}/query" ${CREATEDB_PAR} \
+"$MMSEQS" createdb "$@" "${TMP_PATH}/query" ${CREATEDB_QUERY_PAR} \
|| fail "query createdb died"
fi

@@ -40,7 +40,7 @@ fi

if notExists "${TMP_PATH}/result_tophit1.dbtype"; then
# shellcheck disable=SC2086
-"$MMSEQS" filterdb "${TMP_PATH}/result" "${TMP_PATH}/result_top1" --extract-lines 1 ${THREADS_PAR} \
+"$MMSEQS" filterdb "${TMP_PATH}/result" "${TMP_PATH}/result_top1" --extract-lines 1 ${THREADS_COMP_PAR} \
|| fail "filterdb died"
fi

@@ -52,13 +52,13 @@ fi

if notExists "${TMP_PATH}/result_top1_swapped_sum.dbtype"; then
# shellcheck disable=SC2086
-"$MMSEQS" summarizealis "${TMP_PATH}/result_top1_swapped" "${TMP_PATH}/result_top1_swapped_sum" ${THREADS_PAR} \
+"$MMSEQS" summarizealis "${TMP_PATH}/result_top1_swapped" "${TMP_PATH}/result_top1_swapped_sum" ${THREADS_COMP_PAR} \
|| fail "filterdb died"
fi

if notExists "${TMP_PATH}/result_top1_swapped_sum_tax.dbtype"; then
# shellcheck disable=SC2086
-"$MMSEQS" addtaxonomy "${TARGET}" "${TMP_PATH}/result_top1_swapped_sum" "${TMP_PATH}/result_top1_swapped_sum_tax" ${THREADS_PAR} --pick-id-from 1 --tax-lineage \
+"$MMSEQS" addtaxonomy "${TARGET}" "${TMP_PATH}/result_top1_swapped_sum" "${TMP_PATH}/result_top1_swapped_sum_tax" ${THREADS_COMP_PAR} --pick-id-from 1 --tax-lineage \
|| fail "filterdb died"
fi
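
The hunks above all follow the same pattern: a step runs only if its `.dbtype` output is missing, so an interrupted workflow can be resumed. The following is a minimal, self-contained sketch of that pattern. The bodies of `notExists` and `fail` are assumptions (the diff shows only their names), and `make_query` is a hypothetical stand-in for an `"$MMSEQS"` module call.

```shell
#!/bin/sh
# Assumed helper bodies; the diff only shows that these functions exist.
fail() {
    echo "Error: $1" >&2
    exit 1
}

notExists() {
    [ ! -f "$1" ]
}

TMP_PATH="$(mktemp -d)"

# Hypothetical step standing in for an "$MMSEQS" module that writes <name>.dbtype.
make_query() {
    echo "created" > "$1.dbtype"
}

RUNS=0
# Simulate running the workflow twice; the second pass must skip the step.
for attempt in 1 2; do
    if notExists "${TMP_PATH}/query.dbtype"; then
        make_query "${TMP_PATH}/query" || fail "query createdb died"
        RUNS=$((RUNS + 1))
        echo "pass ${attempt}: step executed"
    else
        echo "pass ${attempt}: step skipped, output already present"
    fi
done

rm -rf "${TMP_PATH}"
```

Keying the check on the output file rather than on a separate marker means a step is repeated only when its result is actually absent.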
