VSEARCH 1.1.0: Support for options --quiet and --log
torognes committed Feb 20, 2015
1 parent 0b6ccd3 commit 5bdba2f
Showing 18 changed files with 558 additions and 296 deletions.
14 changes: 8 additions & 6 deletions README.md
@@ -32,22 +32,22 @@ If you can't find an answer in the VSEARCH documentation, please visit the [VSEA

In the example below, VSEARCH will identify sequences in the file database.fsa that are at least 90% identical on the plus strand to the query sequences in the file queries.fsa and write the results to the file alnout.txt.

`./vsearch-1.0.16-linux-x86_64 --usearch_global queries.fsa --db database.fsa --id 0.9 --alnout alnout.txt`
`./vsearch-1.1.0-linux-x86_64 --usearch_global queries.fsa --db database.fsa --id 0.9 --alnout alnout.txt`

## Download and install

The latest releases of VSEARCH are available [here](https://github.com/torognes/vsearch/releases).

Binary executables of VSEARCH are available in the `bin` folder for [GNU/Linux on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.0.16-linux-x86_64) and [Apple Mac OS X on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.0.16-osx-x86_64). These executables include support for input files compressed by zlib and bzip2 (with files usually ending in .gz or .bz2).
Binary executables of VSEARCH are available in the `bin` folder for [GNU/Linux on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.1.0-linux-x86_64) and [Apple Mac OS X on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.1.0-osx-x86_64). These executables include support for input files compressed by zlib and bzip2 (with files usually ending in .gz or .bz2).

Download the appropriate executable and make a symbolic link in a folder included in your `$PATH` from `vsearch` to the appropriate binary. You may use the following commands (assuming `~/bin` is in your `$PATH`):

```sh
cd ~
mkdir -p bin
cd bin
wget https://github.com/torognes/vsearch/releases/download/v1.0.16/vsearch-1.0.16-linux-x86_64
ln -s vsearch-1.0.16-linux-x86_64 vsearch
wget https://github.com/torognes/vsearch/releases/download/v1.1.0/vsearch-1.1.0-linux-x86_64
ln -s vsearch-1.1.0-linux-x86_64 vsearch
```

Substitute `linux` with `osx` in those lines if you're on a Mac.
@@ -128,6 +128,8 @@ General options:
* `--minseqlength <int>` (Default 1 for sort/shuffle or 32 for search/dereplicate)
* `--notrunclabels`
* `--threads <int>` (Default 0 means all available cores)
* `--quiet`
* `--log <filename>`

Chimera detection options:

@@ -344,7 +346,7 @@ The main contributors to VSEARCH:
* Tom&aacute;&scaron; Flouri <tomas.flouri@h-its.org> (Coding, testing)
* Umer Zeeshan Ijaz <umer.ijaz@glasgow.ac.uk> (Feature suggestions)
* Fr&eacute;d&eacute;ric Mah&eacute; <mahe@rhrk.uni-kl.de> (Documentation, testing, feature suggestions)
* Ben Nichols <b.nichols.1@research.gla.ac.uk> (evaluation)
* Ben Nichols <b.nichols.1@research.gla.ac.uk> (Evaluation)
* Christopher Quince <c.quince@warwick.ac.uk> (Initiator, feature suggestions, evaluation)
* Torbj&oslash;rn Rognes <torognes@ifi.uio.no> (Coding, testing, documentation, evaluation)

@@ -361,7 +363,7 @@ Thanks to the following for patches and other suggestions for improvements:

No papers about VSEARCH have been published yet, but a manuscript is in preparation.
For now, please cite the [VSEARCH GitHub repository](https://github.com/torognes/vsearch).
Release 1.0.14 has doi [10.5281/zenodo.14860](http://dx.doi.org/10.5281/zenodo.14860).
Release 1.0.16 has doi [10.5281/zenodo.15524](http://dx.doi.org/10.5281/zenodo.15524).


## Test datasets
58 changes: 40 additions & 18 deletions doc/vsearch.1
@@ -1,5 +1,5 @@
.\" ============================================================================
.TH vsearch 1 "February 19, 2015" "version 1.0.16" "USER COMMANDS"
.TH vsearch 1 "February 20, 2015" "version 1.1.0" "USER COMMANDS"
.\" ============================================================================
.SH NAME
vsearch \(em chimera detection, clustering, dereplication, masking, pairwise alignment, searching, shuffling and sorting of amplicons from metagenomic projects.
@@ -142,17 +142,23 @@ searching). We start with general options that apply to all themes.
General options:
.RS
.TP 9
.B \-\-help
Display a short help and exit.
.TP
.B \-\-version
Output version information and exit.
.TP
.BI \-\-fasta_width\~ "positive integer"
Fasta files produced by \fBvsearch\fR are wrapped (sequences are
written on lines of \fIinteger\fR nucleotides, 80 by default). Set
that value to 0 to eliminate the wrapping.
.TP
.B \-\-help
Display a short help and exit.
.TP
.BI \-\-log \0filename
Write messages to the specified log file. The information written
includes the program version, the amount of memory available, the number
of cores and the command line options. The start and finish times are
recorded, as well as the elapsed time and the maximum amount of memory
consumed. Most commands also write some information about their results.
Fatal, warning and informational messages are all written.
.TP
.BI \-\-maxseqlength\~ "positive integer"
All \fBvsearch\fR operations will discard sequences of length equal to or
greater than \fIinteger\fR (50,000 nucleotides by default).
@@ -165,6 +171,13 @@ than \fIinteger\fR (1 nucleotide by default for sorting or shuffling,
.B \-\-notrunclabels
Do not truncate sequence labels at first space, use the full header in
output files.
.TP
.B \-\-quiet
Suppress all output to stdout and stderr except for warnings and fatal
error messages.
.TP
.B \-\-version
Output version information and exit.
.RE
.PP
.\" ----------------------------------------------------------------------------
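The \-\-fasta_width description above implies a simple wrapping rule: break the sequence after every block of `width` residues, and never wrap when the width is 0. A minimal sketch in C, assuming the sequence is a NUL-terminated string and the caller supplies a large enough buffer; `fasta_wrap` is a hypothetical name, not a function from the vsearch source:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper illustrating --fasta_width semantics: copy `seq`
   into `out`, inserting a newline after every `width` residues.
   width == 0 disables wrapping (the whole sequence on one line).
   Returns the number of characters written, excluding the final NUL. */
size_t fasta_wrap(char *out, size_t outsize, const char *seq, size_t width)
{
  size_t len = strlen(seq);
  size_t n = 0;
  for (size_t i = 0; i < len; i++)
    {
      assert(n + 1 < outsize);
      out[n++] = seq[i];
      /* break after each full block, but not after the last residue */
      if (width > 0 && (i + 1) % width == 0 && i + 1 < len)
        {
          assert(n + 1 < outsize);
          out[n++] = '\n';
        }
    }
  assert(n + 1 < outsize);
  out[n++] = '\n'; /* every sequence record ends with a newline */
  out[n] = '\0';
  return n;
}
```

For example, wrapping `ACGTACGTAC` with a width of 4 gives the lines `ACGT`, `ACGT` and `AC`; with a width of 0 the sequence stays on a single line.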
@@ -240,7 +253,7 @@ nucleotide sequence is strictly identical with the query sequence.
.BI \-\-threads\~ "positive integer"
Number of computation threads to use (1 to 256) with \-\-uchime_ref.
The number of threads should be less than or equal to the number of
available CPU cores. The default is to use all available ressources
available CPU cores. The default is to use all available resources
and to launch one thread per logical core.
.TP
.BI \-\-uchime_denovo \0filename
@@ -430,7 +443,7 @@ strand only (default) or check \fIboth\fR strands.
.BI \-\-threads\~ "positive integer"
Number of computation threads to use (1 to 256). The number of threads
should be less than or equal to the number of available CPU cores. The
default is to use all available ressources and to launch one thread
default is to use all available resources and to launch one thread
per logical core.
.TP
.BI \-\-uc \0filename
@@ -446,7 +459,7 @@ Most searching options also apply to clustering:
.br
\-\-alnout, \-\-blast6out, \-\-fastapairs, \-\-matched,
\-\-notmatched, \-\-maxaccept, \-\-maxreject, \-\-samout, \-\-userout,
\-\-userfields, score filtering, \-\-gap penalties, masking. (see the
\-\-userfields, score filtering, gap penalties, masking. (see the
Searching section).
.RE
.PP
@@ -549,7 +562,7 @@ Mask simple repeats and low-complexity regions in sequences using the
.BI \-\-threads\~ "positive integer"
Number of computation threads to use (1 to 256). The number of threads
should be less than or equal to the number of available CPU cores. The
default is to use all available ressources and to launch one thread
default is to use all available resources and to launch one thread
per logical core.
.RE
.PP
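The \-\-threads entries describe a valid range of 1 to 256 and a default of one thread per logical core, and the README notes that a value of 0 selects that default. One way to resolve the value is sketched below, assuming a system where `sysconf(_SC_NPROCESSORS_ONLN)` is available (it is on GNU/Linux and Mac OS X, although POSIX does not mandate it); `resolve_threads` is hypothetical and not taken from the vsearch source:

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: resolve a --threads value as documented.
   0 means "use all available resources" (one thread per logical core);
   otherwise the value must lie in 1..256. Returns -1 for an invalid value.
   Using more threads than cores is discouraged, but not rejected here. */
long resolve_threads(long requested)
{
  if (requested == 0)
    {
      long cores = sysconf(_SC_NPROCESSORS_ONLN); /* online logical cores */
      if (cores < 1)
        cores = 1;   /* fall back to a single thread if the query fails */
      if (cores > 256)
        cores = 256; /* respect the documented upper bound */
      assert(cores >= 1 && cores <= 256);
      return cores;
    }
  if (requested < 1 || requested > 256)
    return -1;
  return requested;
}
```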
@@ -566,22 +579,22 @@ identity level with \-\-id to discard weak alignments. Most other
accept/reject options (see Searching options below) may also be
used. Sequences are aligned on their \fIplus\fR strand only.
.TP 9
.BI \-\-allpairs_global \0filename
Perform optimal global pairwise alignments of all vs. all fasta
sequences contained in \fIfilename\fR. This command is multi-threaded.
.TP
.B \-\-acceptall
Write the results of all alignments to output files. This option
overrides all other accept/reject options (including \-\-id).
.TP
.BI \-\-allpairs_global \0filename
Perform optimal global pairwise alignments of all vs. all fasta
sequences contained in \fIfilename\fR. This command is multi-threaded.
.TP
.BI \-\-id \0real
Reject the sequence match if the pairwise identity is lower than
\fIreal\fR (value ranging from 0.0 to 1.0 included).
.TP
.BI \-\-threads\~ "positive integer"
Number of computation threads to use (1 to 256). The number of threads
should be less than or equal to the number of available CPU cores. The
default is to use all available ressources and to launch one thread
default is to use all available resources and to launch one thread
per logical core.
.RE
.PP
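The \-\-acceptall and \-\-id options interact as described above: \-\-acceptall short-circuits every other accept/reject option, and otherwise a pair is rejected only when its identity is strictly lower than the threshold. A hedged sketch, assuming identity is the fraction of matching alignment columns (only one of the definitions selectable with \-\-iddef); `accept_pair` is an illustrative name:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical accept/reject test for --id and --acceptall.
   Identity is assumed to be matches / alignment columns; vsearch offers
   several identity definitions (--iddef), so this is only one choice. */
bool accept_pair(int matches, int alignment_length, double min_id,
                 bool acceptall)
{
  assert(min_id >= 0.0 && min_id <= 1.0);
  if (acceptall)
    return true;  /* --acceptall overrides all other accept/reject options */
  if (alignment_length <= 0)
    return false; /* no aligned columns: nothing to accept */
  double identity = (double) matches / alignment_length;
  return identity >= min_id; /* reject only if strictly lower than --id */
}
```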
@@ -706,7 +719,7 @@ terminal gaps (left or right), in both query and target sequences
(i.e. 20I/2E). If only a numerical value is given, without any
sequence or location symbol, then the penalty applies to all gap
openings. To forbid gap-opening, an infinite penalty value can be
declared with the symbol "*". Tu use \fBvsearch\fR as a semi-global
declared with the symbol "*". To use \fBvsearch\fR as a semi-global
aligner, a null-penalty can be applied to the left (L) or right (R)
gaps.
.br
@@ -937,7 +950,7 @@ length. Internal or terminal gaps are not taken into account.
.BI \-\-threads\~ "positive integer"
Number of computation threads to use (1 to 256). The number of threads
should be less than or equal to the number of available CPU cores. The
default is to use all available ressources and to launch one thread
default is to use all available resources and to launch one thread
per logical core.
.TP
.B \-\-top_hits_only
@@ -1560,6 +1573,15 @@ excessive stack memory usage.
.BR v1.0.15\~ "released February 18th, 2015"
Fix bug in calculation of identity metric between sequences when using
the MBL definition (\-\-iddef 3).
.TP
.BR v1.0.16\~ "released February 19th, 2015"
Integrated patches from Debian for increased compatibility with
various architectures.
.TP
.BR v1.1.0\~ "released February 20th, 2015"
Added the \-\-quiet option to suppress all output to stdout and stderr
except for warnings and fatal errors.
Added the \-\-log option to write messages to a log file.
.LP
.\" ============================================================================
.\" TODO:
Binary file modified doc/vsearch_manual.pdf
8 changes: 8 additions & 0 deletions src/align.cc
@@ -412,6 +412,14 @@ void nw_align(char * dseq,
fprintf(stderr, "WARNING: Error with query no %lu and db sequence no %lu:\n", queryno, dbseqno);
fprintf(stderr, "Initial and recomputed alignment score disagreement: %lu %lu\n", dist, score);
fprintf(stderr, "Alignment: %s\n", cigar);

if (opt_log)
{
fprintf(fp_log, "WARNING: Error with query no %lu and db sequence no %lu:\n", queryno, dbseqno);
fprintf(fp_log, "Initial and recomputed alignment score disagreement: %lu %lu\n", dist, score);
fprintf(fp_log, "Alignment: %s\n", cigar);
fprintf(fp_log, "\n");
}
}
#endif
}
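The align.cc change prints the same warning twice, once to stderr and once to the log file when one is open. The same behaviour can be expressed with a small variadic helper; this is a sketch, and `warn_and_log` is a hypothetical name rather than part of vsearch. A `va_list` consumed by `vfprintf` must not be reused, so a second copy is made with `va_copy` (C99):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: write one formatted warning to stderr and, when a
   log file is open (logfile != NULL), repeat it there. Warnings are shown
   even under --quiet, which only suppresses informational output. */
void warn_and_log(FILE *logfile, const char *fmt, ...)
{
  assert(fmt != NULL);
  va_list ap, ap_log;
  va_start(ap, fmt);
  va_copy(ap_log, ap);  /* independent copy for the second vfprintf */
  vfprintf(stderr, fmt, ap);
  if (logfile)
    vfprintf(logfile, fmt, ap_log);
  va_end(ap_log);
  va_end(ap);
}
```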
15 changes: 2 additions & 13 deletions src/align_simd.cc
@@ -1020,10 +1020,7 @@ void search16(s16info_s * s,
signed short h_min_c = h_min_array[c];
signed short h_max_c = h_max_array[c];
if ((h_min_c <= score_min) || (h_max_c >= score_max))
{
overflow[c] = true;
// fprintf(stderr, "h_min: %d, h_max: %d\n", h_min_c, h_max_c);
}
overflow[c] = true;
}
}
}
@@ -1071,11 +1068,6 @@ void search16(s16info_s * s,

if (overflow[c])
{
#if 0
fprintf(stderr, "WARNING! Alignment overflow!\n");
fprintf(stderr, "Seqid: %ld length: %lu\n",
seq_id[c], d_length[c]);
#endif
pscores[cand_id] = SHRT_MAX;
paligned[cand_id] = 0;
pmatches[cand_id] = 0;
@@ -1260,10 +1252,7 @@ void search16(s16info_s * s,
signed short h_max_c = h_max_array[c];
if ((h_min_c <= score_min) ||
(h_max_c >= score_max))
{
overflow[c] = true;
// fprintf(stderr, "h_min: %d, h_max: %d\n", h_min_c, h_max_c);
}
overflow[c] = true;
}
}
}
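The search16() hunks drop commented-out debug output and reduce the overflow check to a single statement. The check can be read as the scalar loop below, where each 16-bit channel is flagged once its running minimum or maximum reaches the limits of the representable range. This is a simplified sketch: the real code works on SIMD vectors, the function names are hypothetical, and the values of `score_min` and `score_max` are assumed to be supplied by the caller.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Scalar sketch of the per-channel overflow test: a channel is flagged
   when its minimum or maximum score touches the representable limits.
   The flag is sticky; it is never cleared here. */
void flag_overflow(const short *h_min, const short *h_max, int channels,
                   short score_min, short score_max, bool *overflow)
{
  assert(channels >= 0);
  for (int c = 0; c < channels; c++)
    if ((h_min[c] <= score_min) || (h_max[c] >= score_max))
      overflow[c] = true;
}

/* Overflowed candidates receive SHRT_MAX as their score. */
void finalize_score(bool overflowed, short computed, short *pscore)
{
  *pscore = overflowed ? SHRT_MAX : computed;
}
```

Setting the score to `SHRT_MAX` mirrors the assignment to `pscores[cand_id]` for overflowed candidates in the second hunk.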
11 changes: 9 additions & 2 deletions src/allpairs.cc
@@ -579,8 +579,15 @@ void allpairs_global(char * cmdline, char * progheader)
allpairs_thread_worker_run();
progress_done();

fprintf(stderr, "Matching query sequences: %d of %d (%.2f%%)\n",
qmatches, queries, 100.0 * qmatches / queries);
if (!opt_quiet)
fprintf(stderr, "Matching query sequences: %d of %d (%.2f%%)\n",
qmatches, queries, 100.0 * qmatches / queries);

if (opt_log)
{
fprintf(fp_log, "Matching query sequences: %d of %d (%.2f%%)\n\n",
qmatches, queries, 100.0 * qmatches / queries);
}

pthread_mutex_destroy(&mutex_output);
pthread_mutex_destroy(&mutex_input);
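The allpairs.cc change follows the pattern used throughout this commit: informational output to stderr is skipped under \-\-quiet, while the log file always receives a copy. A sketch of that pattern as a standalone function (hypothetical name `report_matches`); unlike the diff, it reports 0% when there are no queries, because `100.0 * 0 / 0` evaluates to NaN and would print as `nan`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch: print the summary to stderr unless quiet is set,
   and copy it to the log file whenever one is open (logfile != NULL). */
void report_matches(FILE *logfile, bool quiet, int qmatches, int queries)
{
  assert(qmatches >= 0 && qmatches <= queries);
  double pct = queries > 0 ? 100.0 * qmatches / queries : 0.0;
  if (!quiet)
    fprintf(stderr, "Matching query sequences: %d of %d (%.2f%%)\n",
            qmatches, queries, pct);
  if (logfile)
    fprintf(logfile, "Matching query sequences: %d of %d (%.2f%%)\n\n",
            qmatches, queries, pct);
}
```

Passing the log stream as a parameter avoids the global `opt_log`/`fp_log` pair used in the diff; either style gives the same output.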
49 changes: 38 additions & 11 deletions src/chimera.cc
@@ -1361,7 +1361,7 @@ void chimera()
/* override any options the user might have set */
opt_maxaccepts = few;
opt_maxrejects = rejects;
opt_id = 0.0;
opt_id = 0.75;
opt_strand = 1;

if (opt_uchime_denovo)
@@ -1404,22 +1404,49 @@ void chimera()
progress_total = db_getnucleotidecount();
}

if (opt_log)
{
fprintf(fp_log, "%8.2f minh\n", opt_minh);
fprintf(fp_log, "%8.2f xn\n", opt_xn);
fprintf(fp_log, "%8.2f dn\n", opt_dn);
fprintf(fp_log, "%8.2f xa\n", 1.0);
fprintf(fp_log, "%8.2f mindiv\n", opt_mindiv);
fprintf(fp_log, "%8.2f id\n", opt_id);
fprintf(fp_log, "%8d maxp\n", 2);
fprintf(fp_log, "\n");
}


progress_init("Detecting chimeras", progress_total);

chimera_threads_run();

progress_done();

fprintf(stderr,
"Found %d (%.1f%%) chimeras, %d (%.1f%%) non-chimeras,\n"
"and %d (%.1f%%) suspicious candidates in %d sequences.\n",
chimera_count,
100.0 * chimera_count / seqno,
nonchimera_count,
100.0 * nonchimera_count / seqno,
(seqno - chimera_count - nonchimera_count),
100.0 * (seqno - chimera_count - nonchimera_count) / seqno,
seqno);
if (!opt_quiet)
fprintf(stderr,
"Found %d (%.1f%%) chimeras, %d (%.1f%%) non-chimeras,\n"
"and %d (%.1f%%) suspicious candidates in %d sequences.\n",
chimera_count,
100.0 * chimera_count / seqno,
nonchimera_count,
100.0 * nonchimera_count / seqno,
(seqno - chimera_count - nonchimera_count),
100.0 * (seqno - chimera_count - nonchimera_count) / seqno,
seqno);

if (opt_log)
{
if (opt_uchime_ref)
fprintf(fp_log, "%s", opt_uchime_ref);
else
fprintf(fp_log, "%s", opt_uchime_denovo);
fprintf(fp_log, ": %d/%d chimeras (%.1f%%)\n",
chimera_count,
seqno,
100.0 * chimera_count / seqno);
}


if (opt_uchime_ref)
query_close();
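The chimera summary splits the input into three groups, the third ("suspicious candidates") being obtained by subtraction. The arithmetic and the new log line can be sketched as follows; the struct and function names are illustrative only, and both functions require at least one sequence, since the original code divides by `seqno` without a check.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical summary of a chimera detection run. */
struct chimera_summary
{
  double chimera_pct;
  double nonchimera_pct;
  int suspicious;        /* neither chimera nor non-chimera */
  double suspicious_pct;
};

struct chimera_summary summarize_chimeras(int chimeras, int nonchimeras,
                                          int seqno)
{
  assert(seqno > 0 && chimeras + nonchimeras <= seqno);
  struct chimera_summary s;
  s.suspicious = seqno - chimeras - nonchimeras;
  s.chimera_pct = 100.0 * chimeras / seqno;
  s.nonchimera_pct = 100.0 * nonchimeras / seqno;
  s.suspicious_pct = 100.0 * s.suspicious / seqno;
  return s;
}

/* Log line in the format added by the diff:
   "<input file>: <chimeras>/<total> chimeras (<percent>%)" */
void log_chimeras(FILE *logfile, const char *input, int chimeras, int seqno)
{
  assert(seqno > 0);
  if (logfile)
    fprintf(logfile, "%s: %d/%d chimeras (%.1f%%)\n",
            input, chimeras, seqno, 100.0 * chimeras / seqno);
}
```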