VSEARCH 1.1.0: Support for options --quiet and --log

torognes · Feb 20, 2015 · 5bdba2f
1 parent 0b6ccd3
Showing 18 changed files with 558 additions and 296 deletions.
diff --git a/README.md b/README.md
@@ -32,22 +32,22 @@ If you can't find an answer in the VSEARCH documentation, please visit the [VSEA
 
 In the example below, VSEARCH will identify sequences in the file database.fsa that are at least 90% identical on the plus strand to the query sequences in the file queries.fsa and write the results to the file alnout.txt.
 
-`./vsearch-1.0.16-linux-x86_64 --usearch_global queries.fsa --db database.fsa --id 0.9 --alnout alnout.txt`
+`./vsearch-1.1.0-linux-x86_64 --usearch_global queries.fsa --db database.fsa --id 0.9 --alnout alnout.txt`
 
 ## Download and install
 
 The latest releases of VSEARCH are available [here](https://github.com/torognes/vsearch/releases).
 
-Binary executables of VSEARCH are available in the `bin` folder for [GNU/Linux on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.0.16-linux-x86_64) and [Apple Mac OS X on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.0.16-osx-x86_64). These executables include support for  input files compressed by zlib and bzip2 (with files usually ending in .gz or .bz2).
+Binary executables of VSEARCH are available in the `bin` folder for [GNU/Linux on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.1.0-linux-x86_64) and [Apple Mac OS X on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.1.0-osx-x86_64). These executables include support for  input files compressed by zlib and bzip2 (with files usually ending in .gz or .bz2).
 
 Download the appropriate executable and make a symbolic link in a folder included in your `$PATH` from `vsearch` to the appropriate binary. You may use the following commands (assuming `~/bin` is in your `$PATH`):
 
 ```sh
 cd ~
 mkdir -p bin
 cd bin
-wget https://github.com/torognes/vsearch/releases/download/v1.0.16/vsearch-1.0.16-linux-x86_64
-ln -s vsearch-1.0.16-linux-x86_64 vsearch
+wget https://github.com/torognes/vsearch/releases/download/v1.1.0/vsearch-1.1.0-linux-x86_64
+ln -s vsearch-1.1.0-linux-x86_64 vsearch
 ```
 
 Substitute `linux` with `osx` in those lines if you're on a Mac.
@@ -128,6 +128,8 @@ General options:
 * `--minseqlength <int>` (Default 1 for sort/shuffle or 32 for search/dereplicate)
 * `--notrunclabels`
 * `--threads <int>` (Default 0 means all available cores)
+* `--quiet`
+* `--log <filename>`
 
 Chimera detection options:
 
@@ -344,7 +346,7 @@ The main contributors to VSEARCH:
 * Tom&aacute;&scaron; Flouri <tomas.flouri@h-its.org> (Coding, testing)
 * Umer Zeeshan Ijaz <umer.ijaz@glasgow.ac.uk> (Feature suggestions)
 * Fr&eacute;d&eacute;ric Mah&eacute; <mahe@rhrk.uni-kl.de> (Documentation, testing, feature suggestions)
-* Ben Nichols <b.nichols.1@research.gla.ac.uk> (evaluation)
+* Ben Nichols <b.nichols.1@research.gla.ac.uk> (Evaluation)
 * Christopher Quince <c.quince@warwick.ac.uk> (Initiator, feature suggestions, evaluation)
 * Torbj&oslash;rn Rognes <torognes@ifi.uio.no> (Coding, testing, documentation, evaluation)
 
@@ -361,7 +363,7 @@ Thanks to the following for patches and other suggestions for improvements:
 
 No papers about VSEARCH have been published yet, but a manuscript is in preparation.
 For now, please cite the [VSEARCH GitHub repository](https://github.com/torognes/vsearch).
-Release 1.0.14 has doi [10.5281/zenodo.14860](http://dx.doi.org/10.5281/zenodo.14860).
+Release 1.0.16 has doi [10.5281/zenodo.15524](http://dx.doi.org/10.5281/zenodo.15524).
 
 
 ## Test datasets

diff --git a/doc/vsearch.1 b/doc/vsearch.1
@@ -1,5 +1,5 @@
 .\" ============================================================================
-.TH vsearch 1 "February 19, 2015" "version 1.0.16" "USER COMMANDS"
+.TH vsearch 1 "February 20, 2015" "version 1.1.0" "USER COMMANDS"
 .\" ============================================================================
 .SH NAME
 vsearch \(em chimera detection, clustering, dereplication, masking, pairwise alignment, searching, shuffling and sorting of amplicons from metagenomic projects.
@@ -142,17 +142,23 @@ searching). We start with general options that apply to all themes.
 General options:
 .RS
 .TP 9
-.B \-\-help
-Display a short help and exit.
-.TP
-.B \-\-version
-Output version information and exit.
-.TP
 .BI \-\-fasta_width\~ "positive integer"
 Fasta files produced by \fBvsearch\fR are wrapped (sequences are
 written on lines of \fIinteger\fR nucleotides, 80 by default). Set
 that value to 0 to eliminate the wrapping.
 .TP
+.B \-\-help
+Display a short help and exit.
+.TP
+.BI \-\-log \0filename
+Write messages to the specified log file. Information written includes
+program version, amount of memory available, number of cores and command
+line options. The start and finish times are recorded, as well as the
+elapsed time and the maximum amount of memory consumed.
+The different commands will usually also write some information about
+their results. Fatal, warning and informational messages are all
+written.
+.TP
 .BI \-\-maxseqlength\~ "positive integer"
 All \fBvsearch\fR operations will discard sequences of length equal or
 greater than \fIinteger\fR (50,000 nucleotides by default).
@@ -165,6 +171,13 @@ than \fIinteger\fR (1 nucleotide by default for sorting or shuffling,
 .B \-\-notrunclabels
 Do not truncate sequence labels at first space, use the full header in
 output files.
+.TP
+.B \-\-quiet
+Suppress all output to stdout and stderr except for warnings and fatal
+error messages.
+.TP
+.B \-\-version
+Output version information and exit.
 .RE
 .PP
 .\" ----------------------------------------------------------------------------
@@ -240,7 +253,7 @@ nucleotide sequence is strictly identical with the query sequence.
 .BI \-\-threads\~ "positive integer"
 Number of computation threads to use (1 to 256) with \-\-uchime_ref.
 The number of threads should be lesser or equal to the number of
-available CPU cores. The default is to use all available ressources
+available CPU cores. The default is to use all available resources
 and to launch one thread per logical core.
 .TP
 .BI \-\-uchime_denovo \0filename
@@ -430,7 +443,7 @@ strand only (default) or check \fIboth\fR strands.
 .BI \-\-threads\~ "positive integer"
 Number of computation threads to use (1 to 256). The number of threads
 should be lesser or equal to the number of available CPU cores. The
-default is to use all available ressources and to launch one thread
+default is to use all available resources and to launch one thread
 per logical core.
 .TP
 .BI \-\-uc \0filename
@@ -446,7 +459,7 @@ Most searching options also apply to clustering:
 .br
 \-\-alnout, \-\-blast6out, \-\-fastapairs, \-\-matched,
 \-\-notmatched, \-\-maxaccept, \-\-maxreject, \-\-samout, \-\-userout,
-\-\-userfields, score filtering, \-\-gap penalties, masking. (see the
+\-\-userfields, score filtering, gap penalties, masking. (see the
 Searching section).
 .RE
 .PP
@@ -549,7 +562,7 @@ Mask simple repeats and low-complexity regions in sequences using the
 .BI \-\-threads\~ "positive integer"
 Number of computation threads to use (1 to 256). The number of threads
 should be lesser or equal to the number of available CPU cores. The
-default is to use all available ressources and to launch one thread
+default is to use all available resources and to launch one thread
 per logical core.
 .RE
 .PP
@@ -566,22 +579,22 @@ identity level with \-\-id to discard weak alignments. Most other
 accept/reject options (see Searching options below) may also be
 used. Sequences are aligned on their \fIplus\fR strand only.
 .TP 9
-.BI \-\-allpairs_global \0filename
-Perform optimal global pairwise alignments of all vs. all fasta
-sequences contained in \fIfilename\fR. This command is multi-threaded.
-.TP
 .B \-\-acceptall
 Write the results of all alignments to output files. This option
 overrides all other accept/reject options (including \-\-id).
 .TP
+.BI \-\-allpairs_global \0filename
+Perform optimal global pairwise alignments of all vs. all fasta
+sequences contained in \fIfilename\fR. This command is multi-threaded.
+.TP
 .BI \-\-id \0real
 Reject the sequence match if the pairwise identity is lower than
 \fIreal\fR (value ranging from 0.0 to 1.0 included).
 .TP
 .BI \-\-threads\~ "positive integer"
 Number of computation threads to use (1 to 256). The number of threads
 should be lesser or equal to the number of available CPU cores. The
-default is to use all available ressources and to launch one thread
+default is to use all available resources and to launch one thread
 per logical core.
 .RE
 .PP
@@ -706,7 +719,7 @@ terminal gaps (left or right), in both query and target sequences
 (i.e. 20I/2E). If only a numerical value is given, without any
 sequence or location symbol, then the penalty applies to all gap
 openings. To forbid gap-opening, an infinite penalty value can be
-declared with the symbol "*". Tu use \fBvsearch\fR as a semi-global
+declared with the symbol "*". To use \fBvsearch\fR as a semi-global
 aligner, a null-penalty can be applied to the left (L) or right (R)
 gaps.
 .br
@@ -937,7 +950,7 @@ length.  Internal or terminal gaps are not taken into account.
 .BI \-\-threads\~ "positive integer"
 Number of computation threads to use (1 to 256). The number of threads
 should be lesser or equal to the number of available CPU cores. The
-default is to use all available ressources and to launch one thread
+default is to use all available resources and to launch one thread
 per logical core.
 .TP
 .B \-\-top_hits_only
@@ -1560,6 +1573,15 @@ excessive stack memory usage.
 .BR v1.0.15\~ "released February 18th, 2015"
 Fix bug in calculation of identity metric between sequences when using
 the MBL definition (\-\-iddef 3).
+.TP
+.BR v1.0.16\~ "released February 19th, 2015"
+Integrated patches from Debian for increased compatibility with
+various architectures.
+.TP
+.BR v1.1.0\~ "released February 20th, 2015"
+Added the \-\-quiet option to suppress all output to stdout and stderr
+except for warnings and fatal errors.
+Added the \-\-log option to write messages to a log file.
 .LP
 .\" ============================================================================
 .\" TODO:
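
Review note: the man page text above describes two independent switches. `--quiet` hides informational output on the terminal but keeps warnings and fatal errors, while `--log` records all three kinds of message. A minimal sketch of that routing rule follows. The `Severity` enum and the functions `shown_on_terminal` and `emit` are hypothetical names introduced here for illustration; they are not taken from the VSEARCH sources.

```cpp
#include <cstdio>

// Hypothetical severity levels (illustrative only, not VSEARCH identifiers).
enum class Severity { Info, Warning, Fatal };

// With --quiet, informational messages are hidden from the terminal;
// warnings and fatal errors are still shown.
bool shown_on_terminal(Severity s, bool quiet)
{
  return !quiet || s != Severity::Info;
}

// Send one message to the terminal stream and, if a log file is open,
// to the log as well. The log receives every message regardless of
// severity or of --quiet. Returns the number of sinks written to.
int emit(Severity s, const char * msg, bool quiet,
         std::FILE * term, std::FILE * log)
{
  int sinks = 0;
  if (shown_on_terminal(s, quiet))
    {
      std::fprintf(term, "%s\n", msg);
      sinks++;
    }
  if (log != nullptr)
    {
      std::fprintf(log, "%s\n", msg);
      sinks++;
    }
  return sinks;
}
```

Under this sketch, a call such as `emit(Severity::Info, "Reading input", true, stderr, fp)` would write only to the log, which matches the behaviour the two option descriptions imply when both options are given.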

diff --git a/doc/vsearch_manual.pdf b/doc/vsearch_manual.pdf
diff --git a/src/align.cc b/src/align.cc
@@ -412,6 +412,14 @@ void nw_align(char * dseq,
     fprintf(stderr, "WARNING: Error with query no %lu and db sequence no %lu:\n", queryno, dbseqno);
     fprintf(stderr, "Initial and recomputed alignment score disagreement: %lu %lu\n", dist, score);
     fprintf(stderr, "Alignment: %s\n", cigar);
+
+    if (opt_log)
+      {
+        fprintf(fp_log, "WARNING: Error with query no %lu and db sequence no %lu:\n", queryno, dbseqno);
+        fprintf(fp_log, "Initial and recomputed alignment score disagreement: %lu %lu\n", dist, score);
+        fprintf(fp_log, "Alignment: %s\n", cigar);
+        fprintf(fp_log, "\n");
+      }
   }
 #endif
 }
diff --git a/src/align_simd.cc b/src/align_simd.cc
@@ -1020,10 +1020,7 @@ void search16(s16info_s * s,
               signed short h_min_c = h_min_array[c];
               signed short h_max_c = h_max_array[c];
               if ((h_min_c <= score_min) || (h_max_c >= score_max))
-                {
-                  overflow[c] = true;
-                  //                  fprintf(stderr, "h_min: %d, h_max: %d\n", h_min_c, h_max_c);
-                }
+                overflow[c] = true;
             }
         }
     }
@@ -1071,11 +1068,6 @@ void search16(s16info_s * s,
 
             if (overflow[c])
               {
-#if 0
-                fprintf(stderr, "WARNING! Alignment overflow!\n");
-                fprintf(stderr, "Seqid: %ld length: %lu\n",
-                        seq_id[c], d_length[c]);
-#endif
                 pscores[cand_id] = SHRT_MAX;
                 paligned[cand_id] = 0;
                 pmatches[cand_id] = 0;
@@ -1260,10 +1252,7 @@ void search16(s16info_s * s,
               signed short h_max_c = h_max_array[c];
               if ((h_min_c <= score_min) || 
                   (h_max_c >= score_max))
-                {
-                  overflow[c] = true;
-                  //                  fprintf(stderr, "h_min: %d, h_max: %d\n", h_min_c, h_max_c);
-                }
+                overflow[c] = true;
             }
         }
     }

diff --git a/src/allpairs.cc b/src/allpairs.cc
@@ -579,8 +579,15 @@ void allpairs_global(char * cmdline, char * progheader)
   allpairs_thread_worker_run();
   progress_done();
 
-  fprintf(stderr, "Matching query sequences: %d of %d (%.2f%%)\n", 
-          qmatches, queries, 100.0 * qmatches / queries);
+  if (!opt_quiet)
+    fprintf(stderr, "Matching query sequences: %d of %d (%.2f%%)\n", 
+            qmatches, queries, 100.0 * qmatches / queries);
+
+  if (opt_log)
+    {
+      fprintf(fp_log, "Matching query sequences: %d of %d (%.2f%%)\n\n", 
+              qmatches, queries, 100.0 * qmatches / queries);
+    }
 
   pthread_mutex_destroy(&mutex_output);
   pthread_mutex_destroy(&mutex_input);

diff --git a/src/chimera.cc b/src/chimera.cc
@@ -1361,7 +1361,7 @@ void chimera()
   /* override any options the user might have set */
   opt_maxaccepts = few;
   opt_maxrejects = rejects;
-  opt_id = 0.0;
+  opt_id = 0.75;
   opt_strand = 1;
 
   if (opt_uchime_denovo)
@@ -1404,22 +1404,49 @@ void chimera()
       progress_total = db_getnucleotidecount();
     }
 
+  if (opt_log)
+    {
+      fprintf(fp_log, "%8.2f  minh\n", opt_minh);
+      fprintf(fp_log, "%8.2f  xn\n", opt_xn);
+      fprintf(fp_log, "%8.2f  dn\n", opt_dn);
+      fprintf(fp_log, "%8.2f  xa\n", 1.0);
+      fprintf(fp_log, "%8.2f  mindiv\n", opt_mindiv);
+      fprintf(fp_log, "%8.2f  id\n", opt_id);
+      fprintf(fp_log, "%8d  maxp\n", 2);
+      fprintf(fp_log, "\n");
+    }
+
+
   progress_init("Detecting chimeras", progress_total);
 
   chimera_threads_run();
 
   progress_done();
 
-  fprintf(stderr,
-          "Found %d (%.1f%%) chimeras, %d (%.1f%%) non-chimeras,\n"
-          "and %d (%.1f%%) suspicious candidates in %d sequences.\n",
-          chimera_count,
-          100.0 * chimera_count / seqno,
-          nonchimera_count,
-          100.0 * nonchimera_count / seqno,
-          (seqno - chimera_count - nonchimera_count),
-          100.0 * (seqno - chimera_count - nonchimera_count) / seqno,
-          seqno);
+  if (!opt_quiet)
+    fprintf(stderr,
+            "Found %d (%.1f%%) chimeras, %d (%.1f%%) non-chimeras,\n"
+            "and %d (%.1f%%) suspicious candidates in %d sequences.\n",
+            chimera_count,
+            100.0 * chimera_count / seqno,
+            nonchimera_count,
+            100.0 * nonchimera_count / seqno,
+            (seqno - chimera_count - nonchimera_count),
+            100.0 * (seqno - chimera_count - nonchimera_count) / seqno,
+            seqno);
+
+  if (opt_log)
+    {
+      if (opt_uchime_ref)
+        fprintf(fp_log, "%s", opt_uchime_ref);
+      else
+        fprintf(fp_log, "%s", opt_uchime_denovo);
+      fprintf(fp_log, ": %d/%d chimeras (%.1f%%)\n",
+              chimera_count,
+              seqno, 
+              100.0 * chimera_count / seqno);
+    }
+
 
   if (opt_uchime_ref)
     query_close();
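
Review note: the log entry written above has the form `<input file>: <chimeras>/<sequences> chimeras (<percent>%)`. The sketch below reproduces that format as a standalone function so the expected text is easy to check. The name `chimera_log_line` is hypothetical, and the handling of an empty input (`seqno == 0`) is an assumption of the sketch; the commit computes the percentage directly.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper that builds the one-line chimera summary for the log.
std::string chimera_log_line(const std::string & input_name,
                             int chimera_count, int seqno)
{
  // Report 0.0% for an empty input instead of dividing by zero (sketch-only guard).
  double pct = seqno > 0 ? 100.0 * chimera_count / seqno : 0.0;

  char buf[64];
  std::snprintf(buf, sizeof buf, ": %d/%d chimeras (%.1f%%)\n",
                chimera_count, seqno, pct);

  return input_name + buf;
}
```

For example, `chimera_log_line("reads.fasta", 5, 200)` yields `reads.fasta: 5/200 chimeras (2.5%)` followed by a newline.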