clusthash doesn't seem to use all given threads, and result2flat breaks with "segmentation fault" #261

Closed
UriNeri opened this issue Jan 2, 2020 · 13 comments

Comments

@UriNeri

UriNeri commented Jan 2, 2020

Expected Behavior

clusthash uses all threads and result2flat produces a complete fasta file (not ending in %)

Current Behavior

clusthash seems to use only 1 of the given threads, and result2flat eventually crashes with a "segmentation fault".

Steps to Reproduce (for bugs)

Ran from the terminal in the same directory as a contigs fasta file (DNA) (named "cated_sk100.fna"):
THREADS=10
mkdir resultsDB scafDB
mmseqs createdb cated_sk100.fna scafDB/cated_sk100
mmseqs clusthash scafDB/cated_sk100 resultsDB/resultDB --min-seq-id 0.99 --threads $THREADS
mmseqs clust scafDB/cated_sk100 resultsDB/resultDB clusterDB --threads $THREADS
mmseqs result2repseq scafDB/cated_sk100 clusterDB DB_clu_rep
mmseqs result2flat scafDB/cated_sk100 scafDB/cated_sk100 DB_clu_rep scafs_reps.fasta --use-fasta-header

When "Compute 1 unique hashes." is printed, there are 10 resultDB files and 10 resultDB.index files; however, only one of them (resultDB.index.7) has a non-zero size and keeps growing over time. Meanwhile, only one thread seems to be utilized (around 8% of the capacity of the 10 threads given).
When clusthash finishes, there is one resultDB.index file and 10 resultDB files, 8 of them with zero size (resultDB.index.7 and resultDB.index both have the same size). After this, the process breaks in the last command:
mmseqs result2flat scafDB/cated_sk100 scafDB/cated_sk100 DB_clu_rep scafs_reps.fasta --use-fasta-header
With the message:
`result2flat scafDB/cated_sk100 scafDB/cated_sk100 DB_clu_rep scafs_reps.fasta --use-fasta-header

MMseqs Version: 48a037a
Use fasta header true
Verbosity 3

[1] 18252 segmentation fault (core dumped) mmseqs result2flat scafDB/cated_sk100 scafDB/cated_sk100 DB_clu_rep`
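
One way to check the split-file observation above is to list the output pieces that are still empty after clusthash finishes. This is a minimal sketch that uses only `find`; the helper name `list_empty_splits` is hypothetical (it is not part of MMseqs2), and the `resultsDB` directory and `resultDB` prefix are taken from the commands in this report.

```shell
# List zero-size regular files in a directory whose names start with a prefix.
# list_empty_splits is a hypothetical helper, not an MMseqs2 command.
list_empty_splits() {
    dir=$1
    prefix=$2
    find "$dir" -maxdepth 1 -type f -name "${prefix}*" -size 0 | sort
}

# Example: list_empty_splits resultsDB resultDB
```

If most pieces are listed as empty, the work was likely not distributed evenly across threads.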

MMseqs Output (for bugs)

Which output should I upload?

Context

I'm trying to remove redundancy by collapsing sequences that are either highly similar (99%) or contained within longer sequences from other FASTA entries in the file. This FASTA file is <1 GB, but I first tried to run this process on a >80 GB file on a remote compute node and was concerned when I saw the job was using only a small part of the resources.
Not part of this issue but related: I also tried to do the same thing with a large protein file, but I get invalid FASTA entry errors (maybe because of the "*" marking STOPs left by the ORF predictor, but that wouldn't happen in the nucleic acid file example above).
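
If the "*" stop markers are indeed what triggers the invalid-entry errors (an assumption; nothing in this thread confirms it), they could be stripped from the sequence lines before running createdb. A minimal sketch using `sed`; the helper name `strip_stop_markers` and the file names in the usage comment are hypothetical.

```shell
# Remove every '*' from sequence lines; header lines (starting with '>')
# are passed through unchanged.
strip_stop_markers() {
    sed '/^>/!s/\*//g'
}

# Example: strip_stop_markers < proteins.faa > proteins.nostop.faa
```

Note that this also removes any internal "*" characters, not only trailing ones, which may or may not be desirable depending on the ORF predictor's output.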

Your Environment

  • Git commit used:
    I tried on my personal machine and a compute node (PBS), similar behaviour in both.
    Personal machine MMseqs2 Version: 48a037a.
    Server MMseqs2 Version: 2a8c5f0.
  • Which MMseqs version was used: Statically-compiled
  • Server specifications:
    Server: (2a8c5f0)
    CPU: Intel(R) Xeon(R) Platinum 8168
    Memory: 366 GB
    Personal machine: (48a037a)
    CPU: Intel Core i7-8700 6-Core model: bits: 64 type: L2 cache: 12.0 MiB
    Memory: 15.33 GB
  • Operating system and version:
    Personal machine: Linux Mint 19.2 Tina Kernel: 4.15.0-72-generic x86_64;
    Server: Linux 3.10.0-693.el7.x86_64

Thanks for developing and maintaining this totally amazing tool !

@milot-mirdita
Member

I reviewed the code and found multiple possible issues with the module for smaller datasets. I'll try to finish up the refactoring either later today or tomorrow. Thanks for the bug report.

We are still doing the free sticker thing (see https://twitter.com/thesteinegger/status/1201076220957315074). If you want a set send me your address at milot at mirdita dot de.

@UriNeri
Author

UriNeri commented Jan 2, 2020

Thanks for the super quick reply!
Not sure I understand what you mean by "smaller datasets".
I tried to run this on the server with the 90 GB file and locally with the small file (a 0.6 GB subset of the sequences in the larger one). Both had the issue (but only the local run finished the clusthash step).

@martin-steinegger
Member

@UriNeri clusthash cannot find contained sequences. The best option would be to use linclust:

 mmseqs easy-linclust cated_sk100.fna cated_sk100.clu tmp --min-seq-id 0.99 -c 1.0 --cov-mode 1 --kmer-per-seq-scale 0.4

@UriNeri
Author

UriNeri commented Jan 2, 2020

@martin-steinegger @milot-mirdita
Makes much more sense indeed! Thanks ! will try soon

@milot-mirdita
Member

Disregarding how much biological sense it makes, would you mind rerunning the clusthash workflow above with the latest commit? I refactored some code and want to know if it improved the performance.

Also the crash in result2flat was probably because of the wrong input database (clusterDB instead of DB_clu_rep). It should be:

mmseqs result2flat scafDB/cated_sk100 scafDB/cated_sk100 clusterDB scafs_reps.fasta --use-fasta-header

@UriNeri
Author

UriNeri commented Jan 5, 2020

@milot-mirdita No problem!
MMseqs2 version: 55bcdd3
Adjusted workflow, same input as before; clusthash finishes in seconds, the files previously of zero size are around the same size now. Now result2repseq is the "slowest" step.
The last step:
mmseqs result2flat scafDB/cated_sk100 scafDB/cated_sk100 clusterDB scafs_reps.fasta --use-fasta-header
creates the scafs_reps.fasta file, but it's in this format:
>header
[number]
Any way to transform this into regular fasta? ( >header \n [sequence]...)
(Although I'll use linclust anyway, as Martin suggested.)
Thanks again !

@UriNeri
Author

UriNeri commented Jan 5, 2020

@martin-steinegger
Tried the easy-linclust as suggested on the server described before, with the latest version ( 55bcdd3 )
I receive the following error:
Time for merging to clu_rep: 0h 0m 21s 931ms
Time for processing: 0h 0m 39s 80ms
result2flat tmp/6767229871110119818/input tmp/6767229871110119818/input tmp/6767229871110119818/clu_rep tmp/6767229871110119818/rep_seq.fasta --use-fasta-header -v 3

tmp/6767229871110119818/easycluster.sh: line 38: 4147 Segmentation fault "$MMSEQS" result2flat "${TMP_PATH}/input" "${TMP_PATH}/input" "${TMP_PATH}/clu_rep" "${TMP_PATH}/rep_seq.fasta" --use-fasta-header ${VERBOSITY_PAR}
Error: result2flat died

@martin-steinegger
Member

martin-steinegger commented Jan 5, 2020

Could you please provide the full log? Is it possible to provide your input?

@UriNeri
Author

UriNeri commented Jan 5, 2020

@martin-steinegger
I didn't delete the ./tmp/ folder, is the full log in there? or do you mean the stdout ?
About the input, please email me : uri dot neri at gmail dot com
Thanks !

@martin-steinegger
Member

@UriNeri the log is only written to stdout. We do not store a copy of the log in the temp directory. So you probably need to rerun the whole job. If you use the same tmp folder and command then it will just perform the last step.
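
Since the log goes only to the terminal, it has to be captured when the job is launched. A minimal sketch with plain shell redirection; `run_logged` is a hypothetical helper (not an MMseqs2 feature), and the output file names `stdout.txt` and `stderr.txt` are arbitrary choices.

```shell
# Run a command, saving stdout and stderr to separate files in the
# current directory. The function returns the command's own exit status.
run_logged() {
    "$@" > stdout.txt 2> stderr.txt
}

# Example (command taken from earlier in this thread):
# run_logged mmseqs easy-linclust cated_sk100.fna cated_sk100.clu tmp --min-seq-id 0.99 -c 1.0 --cov-mode 1 --kmer-per-seq-scale 0.4
```

Keeping the two streams in separate files makes it easier to attach them to a bug report.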

@UriNeri
Author

UriNeri commented Jan 6, 2020

@martin-steinegger I re-ran the easy-linclust command on a different input (sent by email).
This time I kept the stdout and stderr:

stderr.txt
stdout.txt

martin-steinegger added a commit that referenced this issue Jan 6, 2020
@martin-steinegger
Member

@UriNeri thank you for sending me the input. I could reproduce the issue and fix it.

@UriNeri
Author

UriNeri commented Jan 7, 2020

@martin-steinegger thank you !
Can confirm that using the latest commit the issue is fixed.
Thanks again for this truly amazing tool - absolutely fantastic !

RuoshiZhang added a commit to soedinglab/spacepharer that referenced this issue May 12, 2020
46c843895 Update combine pval agg-mode 3
67d610136 Disable fancy progress bars on travis to reduce output
203a21736 Updated two more tests to use tighter ROC thresholds
a9052f449 Update regression with tighter bounds for ROC tests
c62736a6d Correctly parse keys from data files in filterdb --filter-file This was causing a linsearch instability
fe007cb4e Use MultiParam for gapOpen, gapExtend costs
3513001d3 Add easy-rbh workflow
d0d3032e9 Fix RBH search if using -a to show alignments
ce1a43bf1 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
ea24e4934 Fix issues with abs. path if using aria2c
5228745f5 Improve --alignment-mode parameter description and make it a non expert parameter
fffa9b10e Fix various inconsistencies and usability issues with alignall: * alignall alignment-mode did not correspond to align alignment-mode * add-backtrace did not do anything, has to be specified now if backtrace is needed * Did return a alignment db type even though it is incompatible with that type, uses generic for now * various parameters were passed but unused   - zdrop and scorebias are used now (however see below)   - realign, alt ali, max accept/reject, wrapped are now gone
290668474 Fix wrong warning
813d81f29 Update regression
264d78117 Switch greedy clustering algorithm back to old idea
c09f6574e Improve nucleotide clustering workflow
38a737708 Set k-mers in linclust to 0 for the nucleotide clustering
7df6e3f75 Replace characters that can not be reversed by N in extract frames
e9678f625 Update regression
f886e868f Add nucleotide support to cluster (workflow nucleotide_clustering), clust module will infer identity automatically if missing, Improve low. mem. greedy incremental algorithm, Update regression
5f8735872 Add kmers-per-sequence-scale to linsearch
0310eb607 Change --kmer-per-seq-scale to a multi parameter, add error if cluster is called with a nucleotide sequence
e258bc8d8 Fix #299 PDB70 database creation was not working
7095f37e4 Add support reverse complemente in rescorediagonal --rescore-mode 0 and 1
61ca48883 Fix result2dnamsa
70d014e41 Add search-type 4 to Search
462f24cbb Add module result2dnamsa
5670d990e Fix regression error
e4451d591 Add result direction parameter to kmersearch
12c499dcd Fix reverse sequences issues in linclust and linsearch
44499c3ce Update filterdb regression test
807b4a56a Fix issue soedinglab/MMseqs2#290. Filterdb checked for mode == true but mode was 2.
24479bc27 Fix Docker
a578f52a7 Fix char signedness on PPC
a0d64a989 Update regression
a07a266f9 Working on PPC64LE support
09734177c Remove remaining _mm_shuffle_epi32
cdef78a69 Merge pull request #285 from hgsommer/misc_small
283c8d03f Replace goto end in ssw
6bfc50281 Fix c/p mistake in convertalignments
e61da3447 Fix spelling of 'length'
9a63760fa Replace nested ternary operator
4349b5c6e Avoid repeatedly checking for profile db types
c170a11f5 Call MsaFilter::shuffleSequences() from MsaFilter::filter()
ef49ba220 Return value from MsaFilter::filter()
d155dc36c Replace int by bool literals for bool variable
ec6722adc Align headings with column in PSSMCalculator::printProfile()
548a9bd68 Avoid forward declaration of ScoreMatrix
d0fbe471f Do some cleanup in StripedSmithWaterman.cpp
91d1aeddc Replace check for zero-sized containers by empty()
e47b8eed9 Remove superfluous parameter from ssw_init()
250b1221d Simplify return statements
4fe1116ae Remove counting zero scores in Sequence::mapProfile()
4303728b5 Replace multiplication by zero
1bd602420 Remove increment by zero
e4d4389f2 Move check for exit condition in front of allocations
556d26d1a Clean up function signatures in MultipleAlignment
3863af9ac Move include back to header to restore build
e1208493a Remove unused TmpResult score field
1fd4db8f2 Die if DBReader cannot reopen files (e.g. no more file handles left)
1e21b87ba Purge sequenceLookup early since its recreate in split databases
40854ddcd Prefiltering and CacheFriendlyOperations refactoring
2433e086b WASM work in progress
14014cd0e Fix prefilter overflow instability
e0f971848 Add conda forge to conda install instructions
aa175d636 Fix off by one in kmermatcher soedinglab/MMseqs2#274 (comment)
d1607bc8a Remove LINE_MAX
eca2155d7 Clear string buffer instead of reassigning in swapresults
0f4645edd Fix wrong reverse marking in linsearch reported by UBSAN
5b612a327 Missing mpi binaries for travis regression
83d22417a Next try for ARM compiler flags
7ad122f0a Missed a few variables
ac7914bea Do not require a cmake variable to build ARM
0dcfaadbb Update regression to fix broken samtools call on ARM
29927b4c4 More NEON fixes, we assume signed chars, ARM uses unsigned by default
7760220ff Next try to get the ARM regression to work
cc6d0d52b Add hack to not break travis log size limit
5408c3d10 Try to get NEON to compile
83192cabd Fix search workflow parameters printed twice
f6f001c8c Fix new clang-10 warnings and further travis fixes
259e64341 llvm-10 alias is not whitelisted in travis yet
b1249fd54 Fix errors in Travis YAML from previous commit
18486d4c5 Update travis - use native aarch64 for neon - use xenial - shorten script
98c37f3c3 shortend MultiParam usage, improved line breaks in usage
c9be07f1a Add gcc-9 to travis
2e5fb309a Fix travis clang build
d5865c894 Remove MultiParam g++-9 warning
73679835b Rework target split merging
ca5869397 Fix RESSIZE issue in slice search if sequences are used
491900b99 Improve usage text of cluster/linclust
0166850a2 Remove old greedy incremental clustering code and just run the memory efficient version instead.
15163e64c Fix Verbosity in workflows
aa78af463 Fix issue soedinglab/MMseqs2#274
7846dfce3 fixed clang template error
e1206371c extended MultiParam class, replaced ScoreMatrixFile type by MultiParam<char*>
b88b54756 rewrite alphabetSize as multi parameter
ecb4e35d4 started template class MultiParam to store sequence type specific values
e1a1c1226 changed dbtype comparision in AlignmentSymmetry
2a829aef7 Replace symlinkat call with getcwd/chdir/symlink/chdir to fix Conda build using macOS 10.9 SDK
28e83e8d5 Add OpenMP include to DBReader
fb00aa0c3 Fix realloc issue while IndexTable creation of profiles
504e5021f Take max. seq. len of query and target db in prefilter and alignment
16e235214 Fix bug if seq. len > max seq. length in Alignment
80d0187de Fix asan issue
751f5c19f Make ZDROP an expert parameter, change description text
1b6edd0d4 Rework x detection (SIMD)
9677254ab Merge branch 'master' of https://github.com/soedinglab/mmseqs2
1ac1e6866 Fix max seq issues in prefilter
cb737033c Reset download strategy to not use aria2c for the NCBI download
c95f3ee0e fixed ksw2 test
72b95c0ce Error if we cannot download from NCBI
1d0aad50b Fix databases not piecing togehter all kalamari accessions
516723d53 Merge branch 'master' of https://github.com/soedinglab/MMseqs2
d81b6cca5 added zdrop parameter to control banded nucleotide alignment
e2e39a971 Add Kalamari Contaminants database
c0c538ea3 Various fixes in databases script
08cc95b3a Fix createtaxdb redownloading when taxdump already exists
018eb3498 Remove a bit whitespace in front of each parameter in usage message
8aa7513de add aggregatetax example, fix typos
8bcd7c740 Fix typo
8e581b762 Rework usage texts
7dc25764a Hide most parameters from createindex
2baa609e8 Add examples to many modules
00a7d7696 fixed bugs for long or wrapped nucleotide sequences
a4bdcb478 eggNOG profiles should not depend on the deleted MSAs
4c7830954 Fix eggNOG database construction
f7a5599c8 Cleanup not needed files immediately in databases workflow
3ed3690d4 Fix downloads always restarting in databases workflow
4cfac9a8a Fix aria warning with more than 16 connections
e0a00e10d Revert "Use SW instead of BandedNucAln if we don't have diagonals"
7ac966b2e Fix result2msa could fail if it was writing compressed output
95729ac7c Fix wrong output DB type written in alignall
f899e7c7a Use SW instead of BandedNucAln if we don't have diagonals
c08d9fa8e Allow parameter descriptions to span multiple lines
57868498e MMseqs2 is not limited to proteins, update README to reflect that
11818b0a2 Cleanup hiding parameters in workflows
c481cea60 Remove some useless includes
2f64aeeb8 Fix databases timestamp appending instead of overwriting
ae9e9e329 Add eggNOG setup procedure to databases
31c8e5d50 Shorten two short parameter descriptions
2f49d3e3e Read header from lookup in msa2profile if available
1356869b0 add option to reverese profile dbs
ac3482e80 More issues with zlib and tar2db
aaafafe43 Fix tar2db keys
c751d9e2f More tar2db fixes
a9c93014c Fix variadic input to tar2db
51a761305 Add tar2db module to convert content of any tar to a DB
96f9a91e5 Use nedmalloc on Windows/Cygwin
73f5c2a2d Add databases workflow to README
5a7ac9e54 make align output consistent
c5ebe5297 fixed setcover cluster mode (by fixing bug in similarity reading for short aln results e.g. hamming distance aln)
481696b5f Fix databases output
c6b4a57a8 Beginning cleaning up parameter descriptions
a9552a177 Show default value of bool parameters
af89c4677 Add a proposed example text structure
9c17f4eba Rework module description texts, better categories, shorten all descriptions, prepare to replace long descriptions with examples
00ff199e8 Add Resfinder DB
f1011ecb4 Fix krona again marked as vendored
02001ab03 missing mode resulted in different top1
4375463bc Header db should not have to be a unsplit db
edccbf33f Actually fix extractorfs lookup creation
041e8e558 Improve README
a8f2c7bad Remove correct workflow script in createtaxdb.sh
26c8202a9 print createdb cmd line again
df02bae34 Refactor createseqfiledb, remove stringstream
2523ebe1a do not write null byte
af847a724 Fix clang warning from DBConcat
ef1ec596f extend dbconcat to handle auxillary files
528bd2134 not needed
dec1b9215 Silence warning in GCC 4.8 casting function to void*
2d44c886d Fix extractorfs not being able to create lookup
ffe66afac Replace isnumber with isdigit. Add more tests to TestTaxExpr
fbe09867e Rework Taxon Expr parsing
f58329ef5 Add constructor to define custom functions to ExpressionParser
b6ef07281 Initialize expressionparser per thread, was not thread safe
f966bfa62 Fix reallocation issue in BandedAlignment
bbd3c2bb7 Add +1 to realloc in BandedNucleotideAligner but not to length
6b6e82ae6 Add +1 to realloc in mapSequence
75e2c8ec4 Fix off by one issues in realloc in rescorediagonal and BandedNucleotideAligner
afd14c8c2 First step to get rid of maxSeqLen
13ca612db Fix allocation issue in kermatcher if sequences are longer than > 2^16
62de5ba93 Fix off by one in computation for splits in kmermatcher
35e95d180 Change int_sequence to char (big change)
ecf82f2f4 Revert "Temporarily disable soft split mode for createdb in easy workflows"
d19219dd4 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
1a0d898ec Fix softlink issue in createdb soedinglab/MMseqs2#265
13e0fe466 Temporarily disable soft split mode for createdb in easy workflows
4487b6e14 Fix view module to work with softlinked createdb dbs
c1e9eb0e3 Fix MPI issue if only one server is used
e781c3fe5 fix MPI compile error
9bcff2844 Fix Filter2 bug of HH-suite in MMseqs2 soedinglab/hh-suite#182
01db79d33 Fix some bugs in splitting handling
d9a887453 Fix memory splitting issues in kmermatcher, kmerindexdb
37880f083 Fix MPI in kmermatcher and indexdb
bee93123f Update regression
03a89ff1c Merge branch 'master' of https://github.com/soedinglab/mmseqs2
6ca967362 Update the way how k-mers are extracted in kmermatcher. Extraction should be now ~3 times faster.
f1388309d Introducing databases workflow to automatically setup and download common databases
d78fdbb06 Add progress to convertmsa
18acba224 Do not recreate _mapping file if it already exists in createtaxdb
63a373f5a Skip validations steps correctly if a input db is neither INPUT nor OUTPUT
d95caa1a7 Allow modules with zero parameters
9f8aff948 Allow modules to handle -h or --help themselves
cf5691f92 Typo
8ebc9d16b fixed access mode
31895414d Clarify parameter help in createdb
f644744a8 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
c287719d9 Remove check for profiles for splice serach. It should also work with sequence databases.
c75fe9acf regression submodule w filtertaxseqdb
7587a872f Add one more missing check in kmermatcher
8d4e9f4fc Remove +1 from size in initKmerPositionMemory
aca141e95 Fix shellcheck error in splicesearch
8bdff50e1 Move +1 from initKmerPositionMemory outside
f12821e35 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
d74b76ca5 Avoid overflow in kmermatcher if split is needed
fd90ff2c3 Move compiled data resources into subfolders
2fd9f25d2 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
b439ce831 Make the slice search applicable to other databases types, not just profiles
589a2e276 Fix apply crashing on empty entries
82542a6ac Merge branch 'master' of https://github.com/soedinglab/mmseqs2
c0acdd8f3 Fix memory leak in createsubdb.
5129a956d Validate taxonomic ranks and make input/output formats consistent
53bb55b38 Fix issues in hash function soedinglab/MMseqs2#252
764c4a3e7 Fix lca message
c013a6929 Fix LCA output message
a1206690d Change db validator from result2stats
714f5b4fb Replace mmaped input file with std c io in createsubdb
6e43e9413 Add remove .source file to rmdb
3e58bb85b Fix result2flat soedinglab/MMseqs2#261
3e27833db Revert easycluster.sh back to result2flat. Reason is that createsubdb can not handle soft linked sequence databases (input.0 -> input.fas)
33354680f Merge branch 'master' of https://github.com/soedinglab/mmseqs2
1e92fb504 Replace result2repseq and result2flat with createsubdb and convert2fasta
55bcdd303 single step clustering could potential cluster unrelated sequences due to hash collisions
fdd0646b1 Fix clusthash issues with parallelization and nucl input
e62a1c717 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
1336b7ad2 Add MSA to allDb and allDbAndFlat
48a037a2e Update Prefiltering.cpp
a1adbf52d Fix warning: Remove useless copy constructor from Matcher::result_t
d3ca42657 Remove truncatedCounter variable in QueryMatcher
4647525ec Show full help text if "Error in argument " occurs
4149ae457 Remove annoying message in prefilter (truncated result). Move it to the statistics section.
d5aab5b86 Update regression
1f1e049e6 Fix output of unclassified hits in convertalis
83ff5c601 Fix permission issues for tmp directory
cce6e6714 add support to output taxon in easy-search when using an indexed database
f200bdd62 Merge branch 'master' of https://github.com/soedinglab/mmseqs2
6f28a29ae Fix seg. fault if all sequences could be classified
473d60580 Update batches
b52668f6e Add chat icon
af54c8e8e Update README.md
7eb6a0b70 Makde addtaxonomy more resilient against invalid taxonomy mappings
3482b0e91 Merge pull request #260 from RuoshiZhang/master
36f49f5b5 Fix issue in memory computation for split
bcb97d63f Update README.md
abcd97de7 write same number of fields even if no hit
38e102181 Update regression to hopefully fix windows failure
f41511465 Fix spelling error
1fd24924e Add a search-type 4 for trans-trans search returning a nucl backtrace in offsetalignment
31f6d7ac3 add aggragatetax to assign set tax by majority vote
b6e8ee239 allow more dbtypes in swapdb
c9d02ef21 add option to view rank index
49db7258e typo fix
9c32930f3 Merge branch 'master' of github.com:soedinglab/MMseqs2
17b5494fe Fix auto detection of dbtype in createdb
8831df81d Merge branch 'master' of github.com:soedinglab/MMseqs2
be1a9822c Fix createseqfiledb soedinglab/MMseqs2#258
02be0c4ea Fix summarizeresult to support reverse position in alignment
7ef586276 added filtertaxseqdb
00f2fd2b8 added mode for all but index
127db8c6d minor tidying for filtertaxdb
8144e7653 Merge branch 'master' of github.com:soedinglab/MMseqs2
48f77fa7d Fix ASan issue in filterdb
d722d5724 Fix warning in filterdb
4a4e6ea15 Update regression test for filterdb
31a7dc124 filterdb --join-db ignores lines it cannot join instead of crash
6c6faa96d filterdb's --extract-lines works together with --trim-to-one-column
12bee8142 filterdb can filter by rows with value within percentage #249
5c919ab95 Allow double parameters separately from floats in parsing
f9be8a88d Remove broken filterdb paths
1dc04f5e1 Refactoring of filterdb
90e3a9aaf Fix bug for enforced dbtypes in createdb
a4cee78db New regression to check stdin support
17ec97c78 Add stdin support to easy workflows
76c9e7c36 Fix compiler warnings in KSeqWrapper
0cc45536b Overwrite dbtype correctly in createdb
c0045182b Add stdin to createdb
02a88e438 use https instead of ftp for downloading taxdb data
a33bd27f4 offsetalignments now correctly returns a nucleotide backtrace if needed
456e1b5ab include VTML40 in binary for easier access
775de3850 Add missed target .source file for reading in convertalis
c08c071b2 Overload patterncompiler isMatch for pos of match
ba6aa8d12 avoid appending extra tabs besthitperset

git-subtree-dir: lib/mmseqs
git-subtree-split: 46c8438958edccd8fd09640eb174e2449529e4df