Martin Asser Hansen edited this page Oct 2, 2015 · 5 revisions

Biopiece: soap_seq

Description

soap_seq uses soap to match short sequences from the stream to a specified genome or sequence file in FASTA format. soap_seq allows for up to three mismatches in the mapping, and reports a maximum of 1000 hits. Matching is done progressively: if a tag matches perfectly to a unique sequence, matching is terminated with one hit. If no perfect match is found, matching with one mismatch is tried, and only the first 1000 hits are reported. Only if there are still zero matches does soap try matching with two mismatches.

This behaviour of soap is not verified!
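
The progressive strategy described above can be illustrated with a small sketch. This is not how soap is implemented (soap uses seeds and an index); it is a brute-force illustration under the assumption that the description holds. The function names `hamming` and `progressive_match` and their parameters are hypothetical and are not part of soap or Biopieces.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def progressive_match(tag, reference, max_mismatches=2, max_hits=1000):
    """Return (mismatches, positions) for the lowest mismatch level with hits.

    Assumed behaviour: try 0 mismatches first, then 1, and so on up to
    max_mismatches, stopping at the first level that yields any hit.
    At most max_hits positions are reported.
    """
    n = len(tag)
    windows = range(len(reference) - n + 1)
    for allowed in range(max_mismatches + 1):
        hits = [i for i in windows
                if hamming(tag, reference[i:i + n]) <= allowed]
        if hits:
            return allowed, hits[:max_hits]
    return None, []


# No perfect match exists, so the one-mismatch level is used.
print(progressive_match("ACGA", "ACGTACGTTTGA"))  # prints (1, [0, 4])
```

Because a higher mismatch level is only tried when every lower level returned nothing, a tag with a perfect match is never reported with mismatched hits in this sketch.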

Soap must be installed on your system in order for soap_seq to work. Read more here:

http://soap.genomics.org.cn/

Usage

... | soap_seq [options] -i <FASTA file>

or

... | soap_seq [options] -g <genome>

Options

[-?          | --help]               #  Print full usage description.
[-i <file>   | --in_file=<file>]     #  Path to FASTA file.
[-g <genome> | --genome=<genome>]    #  Choose genome instead of database.
[-s <uint>   | --seed_size=<uint>]   #  Seed size                      -  Default=10
[-m <uint>   | --mismatches=<uint>]  #  Number of mismatches allowed   -  Default=2
[-G <uint>   | --gap_size=<uint>]    #  Maximum gap size allowed       -  Default=0
[-c <uint>   | --cpus=<uint>]        #  Number of CPUs to use          -  Default=1
[-I <file!>  | --stream_in=<file!>]  #  Read input from stream file    -  Default=STDIN
[-O <file>   | --stream_out=<file>]  #  Write output to stream file    -  Default=STDOUT
[-v          | --verbose]            #  Verbose output.

Examples

To match short sequences in a FASTA file against a reference sequence in another FASTA file, do:

read_fasta -i <query FASTA file(s)> | soap_seq -i <reference FASTA file>

To match short sequences against a genome previously formatted with format_genome, do:

read_fasta -i <query FASTA file(s)> | soap_seq -g <genome>

To list available genomes, use list_genomes.

See also

read_fasta

blast_seq

blat_seq

vmatch_seq

patscan_seq

Author

Martin Asser Hansen - Copyright (C) - All rights reserved.

mail@maasha.dk

July 2008

License

GNU General Public License version 2

http://www.gnu.org/copyleft/gpl.html

Help

soap_seq is part of the Biopieces framework.

http://www.biopieces.org
