convertalis: corrupted unsorted chunks #372

Closed
nick-youngblut opened this issue Nov 11, 2020 · 11 comments

@nick-youngblut

It appears that convertalis fails if the mmseqs search hits database is empty or very small. The hits database is only ~7 KB, so it might be completely empty. The output I'm getting:

convertalis --threads 4 --format-mode 0 --format-output query,target,evalue,pident,alnlen,tlen /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/seqs17_db /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/mmseqs_search_db/db /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/hits_seqs17_db /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/hits_seqs17.tsv

MMseqs Version:        	12.113e3
Substitution matrix    	nucl:nucleotide.out,aa:blosum62.out
Alignment format       	0
Format alignment output	query,target,evalue,pident,alnlen,tlen
Translation table      	1
Gap open cost          	nucl:5,aa:11
Gap extension cost     	nucl:2,aa:1
Database output        	false
Preload mode           	0
Search type            	0
Threads                	4
Compressed             	0
Verbosity              	3

[============================================================Invalid database read for database data file=/ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/seqs17_db_h, database index=/ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/seqs17_db_h.index
Invalid database read for database data file=/ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/seqs17_db_h, database index=/ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/seqs17_db_h.index
Invalid database read for database data file=/ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/seqs17_db_h, database index=/ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/seqs17_db_h.index
Invalid database read for database data file=/ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/seqs17_db_h, database index=/ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/seqs17_db_h.index
getData: local id (4294967295) >= db size (6526)
getData: local id (4294967295) >= db size (6526)
getData: local id (4294967295) >= db size (6526)
getData: local id (4294967295) >= db size (6526)
free(): corrupted unsorted chunks

It would be nice if convertalis exited gracefully if the database is empty. Is there even a way to check whether any mmseqs database is empty?

mmseqs version: 12.113e3 (h2d02072_0 bioconda)
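
As an aside, one rough way to check whether an mmseqs database is empty is to look at its .index file, which (assuming the standard MMseqs2 layout) has one tab-separated line per entry: key, data offset, data size. A minimal sketch for the result database above:

# number of entries in the result database (one index line per entry)
wc -l < hits_seqs17_db.index

# total size of the stored result data; only a byte or two per entry
# means essentially every query produced zero hits
awk -F'\t' '{sum += $3} END {print NR, "entries,", sum, "bytes of result data"}' hits_seqs17_db.index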

@milot-mirdita
Member

I tried to reproduce the issue with a completely empty database, but it didn't crash. Could you upload the hits_seqs17_db database?

@nick-youngblut
Author

Thanks for checking so quickly! Attached is the hits_seqs17_db file (and its associated files).

files.zip

@milot-mirdita
Member

Something might be wrong with the header database. hits_seqs17_db has 7002 entries, and I assume the query sequence database also has this number of sequences, but the header database seems to have only 6526 entries (db size (6526)).
How was the query database created?
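
If it helps to verify on your end, those entry counts can be read straight off the .index files (one line per entry); for a consistent run all of these should match:

wc -l seqs17_db.index seqs17_db_h.index hits_seqs17_db.index
# the sequence DB, its _h header DB, and the result DB should all report
# the same count; a mismatch here would explain the "Invalid database read"
# and getData errors above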

@nick-youngblut
Author

It was created with createdb. The query db is attached.

files.zip

@milot-mirdita
Member

Okay, some part of the puzzle is still missing. What was the search command? 7002 entries don't make much sense.

@nick-youngblut
Author

The command was:

mmseqs search --threads 8 -e 1e-3 \
  --max-accept 1 --max-seqs 100 -s 6 \
  --num-iterations 2   --split 0 --split-memory-limit 44G  \
  seqs17_db target_db hits_seqs17_db      mmseqs_search_TMP17

@milot-mirdita
Member

I think I also need the full output of the search. The issue does not seem to be in convertalis but somewhere in the search.

@nick-youngblut
Author

Here's the output from that search job:

align /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/seqs17_db /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/mmseqs_search_db/db /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search_TMP17/874358861699530798/pref_0 /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search_TMP17/874358861699530798/aln_0 --sub-mat nucl:nucleotide.out,aa:blosum62.out -a 1 --alignment-mode 2 --wrapped-scoring 0 -e 0.001 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --realign 1 --max-rejected 2147483647 --max-accept 1 --add-self-matches 0 --db-load-mode 0 --pca 1 --pcb 1.5 --score-bias 0 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --zdrop 40 --threads 8 --compressed 0 -v 3

Compute score only
Query database size: 6526 type: Aminoacid
Target database size: 41195879 type: Aminoacid
Calculation of alignments
[=================================================================] 7.00K 0s 8ms
Time for merging to aln_0: 0h 0m 0s 9ms

0 alignments calculated.
0 sequence pairs passed the thresholds (-nan of overall calculated).
0.000000 hits per query sequence.
Time for processing: 0h 0m 3s 593ms
align /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search_TMP17/874358861699530798/profile_0 /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/mmseqs_search_db/db /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search_TMP17/874358861699530798/pref_1 /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search_TMP17/874358861699530798/aln_tmp_1 --sub-mat nucl:nucleotide.out,aa:blosum62.out -a 1 --alignment-mode 2 --wrapped-scoring 0 -e 0.001 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --realign 0 --max-rejected 2147483647 --max-accept 1 --add-self-matches 0 --db-load-mode 0 --pca 1 --pcb 1.5 --score-bias 0 --gap-open nucl:5,aa:11 --gap-extend nucl:2,aa:1 --zdrop 40 --threads 8 --compressed 0 -v 3

Compute score, coverage and sequence identity
Query database size: 7002 type: Profile
Target database size: 41195879 type: Aminoacid
Calculation of alignments
[=================================================================] 7.00K 0s 20ms
Time for merging to aln_tmp_1: 0h 0m 0s 10ms

0 alignments calculated.
0 sequence pairs passed the thresholds (-nan of overall calculated).
0.000000 hits per query sequence.
Time for processing: 0h 0m 4s 529ms
mergedbs /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search_TMP17/874358861699530798/profile_0 /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/hits_seqs17_db /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search_TMP17/874358861699530798/aln_0 /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search_TMP17/874358861699530798/aln_tmp_1

Merging the results to /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search/hits_seqs17_db
Time for merging to hits_seqs17_db: 0h 0m 0s 2ms
Time for processing: 0h 0m 0s 19ms
rmdb /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search_TMP17/874358861699530798/aln_0

Time for processing: 0h 0m 0s 1ms
rmdb /ebio/abt3_scratch/nyoungblut/Struo2_255873462447/UniRef50_clst0.9/mmseqs_search_TMP17/874358861699530798/aln_tmp_1

Time for processing: 0h 0m 0s 1ms

@milot-mirdita
Member

Do you also have the output from the previous steps? Alternatively, clear the temp directory and rerun the command.

At this point the profiles already contain over 7k entries for some reason (the 7.00K in the progress lines above).
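
Concretely, something along these lines (reusing your search command from above):

rm -rf mmseqs_search_TMP17   # drop the stale temporary directory
mmseqs search --threads 8 -e 1e-3 \
  --max-accept 1 --max-seqs 100 -s 6 \
  --num-iterations 2 --split 0 --split-memory-limit 44G \
  seqs17_db target_db hits_seqs17_db mmseqs_search_TMP17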

@nick-youngblut
Author

Yeah, maybe it's due to a stale temp directory. I'm going to use --remove-tmp-files 1 from now on.
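
For reference, that would just be the same search call with the flag added, e.g.:

mmseqs search --threads 8 -e 1e-3 --max-accept 1 --max-seqs 100 -s 6 --num-iterations 2 --split 0 --split-memory-limit 44G --remove-tmp-files 1 seqs17_db target_db hits_seqs17_db mmseqs_search_TMP17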

@milot-mirdita
Member

I added something that should hopefully prevent this from occurring in the future. It should create a new subdirectory in the tmp folder if any input has changed in the meantime.
