evalue filtering via -e flag. Evalue becomes zero when specified e-value threshold is < 1E-45 for easy-search #379

Closed
chrismacdermaid opened this issue Nov 30, 2020 · 3 comments

@chrismacdermaid

Maybe I'm going crazy, but this really does seem like a bug. Perhaps there's something obvious here, but I'm not seeing it.

Expected Behavior

easy-search should return only hits with e-values below the threshold specified on the command line.

Current Behavior

The e-value threshold becomes zero when the value passed to -e is < 1E-45:

mmseqs easy-search da7915829ba14fe0a86c3cc539a89f43.constructs.fa /db/mmseqs/uniprotkb_swiss-prot result.m8 /dev/shm/tmp3343 -e 1E-46

MMseqs Version:                         12.113e3
Substitution matrix                     nucl:nucleotide.out,aa:blosum62.out
Add backtrace                           false
Alignment mode                          3
Allow wrapped scoring                   false
E-value threshold                       0
Seq. id. threshold                      0
Min alignment length                    0

Steps to Reproduce (for bugs)

Run an easy-search and specify an e-value threshold < 1E-45 (for example, -e 1E-46).

MMseqs Output (for bugs)

MMseqs Version:                         12.113e3
Substitution matrix                     nucl:nucleotide.out,aa:blosum62.out
Add backtrace                           false
Alignment mode                          3
Allow wrapped scoring                   false
E-value threshold                       0
Seq. id. threshold                      0
Min alignment length                    0
Seq. id. mode                           0
Alternative alignments                  0
Coverage threshold                      0
Coverage mode                           0
Max sequence length                     65535
Compositional bias                      1
Realign hits                            false
Max reject                              2147483647
Max accept                              2147483647
Include identical seq. id.              false
Preload mode                            0
Pseudo count a                          1
Pseudo count b                          1.5
Score bias                              0
Gap open cost                           nucl:5,aa:11
Gap extension cost                      nucl:2,aa:1
Zdrop                                   40
Threads                                 96
Compressed                              0
Verbosity                               3
Seed substitution matrix                nucl:nucleotide.out,aa:VTML80.out
Sensitivity                             5.7
k-mer length                            0
k-score                                 2147483647
Alphabet size                           nucl:5,aa:21
Max results per query                   300
Split database                          0
Split mode                              2
Split memory limit                      0
Diagonal scoring                        true
Exact k-mer matching                    0
Mask residues                           1
Mask lower case residues                0
Minimum diagonal score                  15
Spaced k-mers                           1
Spaced k-mer pattern
Local temporary path
Rescore mode                            0
Remove hits by seq. id. and coverage    false
Sort results                            0
Mask profile                            1
Profile E-value threshold               0.001
Global sequence weighting               false
Allow deletions                         false
Filter MSA                              1
Maximum seq. id. threshold              0.9
Minimum seq. id.                        0
Minimum score per column                -20
Minimum coverage                        0
Select N most diverse seqs              1000
Omit consensus                          false
Min codons in orf                       30
Max codons in length                    32734
Max orf gaps                            2147483647
Contig start mode                       2
Contig end mode                         2
Orf start mode                          1
Forward frames                          1,2,3
Reverse frames                          1,2,3
Translation table                       1
Translate orf                           0
Use all table starts                    false
Offset of numeric ids                   0
Create lookup                           0
Add orf stop                            false
Chain overlapping alignments            0
Merge query                             1
Search type                             0
Search iterations                       1
Start sensitivity                       4
Search steps                            1
Slice search mode                       false
Strand selection                        1
Disk space limit                        0
MPI runner
Force restart with latest tmp           false
Remove temporary files                  true
Alignment format                        0
Format alignment output                 query,target,fident,alnlen,mismatch,gapopen,qstart,qend,tstart,tend,evalue,bits
Database output                         false
Overlap threshold                       0
Database type                           0
Shuffle input database                  true
Createdb mode                           0
Write lookup file                       0
Greedy best hits                        false

Context

I want to identify close homologues in uniprotkb from RCSB. Filtering by e-value seems like a reasonable choice.

Your Environment

Installed via a fresh miniconda env last week.

@chrismacdermaid changed the title from "evalue filtering via -e flag. Evalue becomes zero when specified e-value is < 1E-45 for easy-search" to "evalue filtering via -e flag. Evalue becomes zero when specified e-value threshold is < 1E-45 for easy-search" on Nov 30, 2020

@milot-mirdita
Member

I think I know what's going on. We read the value as float (instead of double) during command line parameter parsing. I can look into fixing it later. The lowest value you should currently be able to specify is about 1e-38.
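
To illustrate the float-versus-double explanation, here is a minimal Python sketch. It is not MMseqs2 code: `to_float32` and `parse_threshold` are hypothetical helpers, and single precision is emulated by round-tripping a Python float (an IEEE 754 double) through the `struct` module's 4-byte `f` format. In IEEE 754 single precision the smallest normal value is about 1.18e-38 and the smallest subnormal is about 1.4e-45, so anything below roughly 7e-46 rounds to zero. That would be consistent with a threshold of 1E-46 being reported as 0 while 1E-45 still survives.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to single precision and back.

    Emulates storing the value in a 32-bit float; values too small
    for a float32 subnormal round to 0.0.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def parse_threshold(text: str, single_precision: bool) -> float:
    """Hypothetical stand-in for a command-line parser (an assumption,
    not the actual MMseqs2 parsing routine)."""
    value = float(text)  # Python parses the text as a double
    return to_float32(value) if single_precision else value


if __name__ == "__main__":
    for arg in ("1E-38", "1E-45", "1E-46"):
        as_float = parse_threshold(arg, single_precision=True)
        as_double = parse_threshold(arg, single_precision=False)
        print(f"-e {arg}: float32 -> {as_float!r}, double -> {as_double!r}")
```

Under these assumptions, parsing into a double keeps 1E-46 intact, which matches the direction of the fix described in this thread.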

@milot-mirdita
Member

Should be fixed now. Statically built binaries with the fix will be available at http://mmseqs.com/latest/ in about an hour. We will also make a new release in the next few weeks.

@chrismacdermaid
Author

Terrific. Thank you. I'll let you know if things behave accordingly. I suspected it might be single precision float, but the numbers weren't quite lining up. Glad you all were able to pinpoint it and fix it so quickly. Much appreciated.
