-
Notifications
You must be signed in to change notification settings - Fork 23
get_seq
get_seq. The subsequences can be obtained explicitly or from BED/PSL/BLAST entries in the stream.
get_seq [options] <-i index> <-s seq_name>
or
... | get_seq [options] <-i index>
[-? | --help] # Print full usage description.
[-s <string> | --seq_name=<string>] # Explicit sequence name.
[-b <uint> | --beg=<uint>] # Begin position of subsequence (first residue=1).
[-e <uint> | --end=<uint>] # End position of subsequence.
[-l <uint> | --len=<uint>] # Length of subsequence.
[-I <file!> | --stream_in=<file!>] # Read input from stream file - Default=STDIN
[-O <file> | --stream_out=<file>] # Write output to stream file - Default=STDOUT
[-v | --verbose] # Verbose output.
Consider the following FASTA entries in the file test.fna
:
>test1
CTTAGTACGTTAGCCATGAA
>test2
GAGCTTAGTACGTTAGCCAT
>test3
TGAGGGTTTAGTTCGTTAAC
>test4
ACATGAGAGCTTAGTACGTG
>test5
TAAACATGAGAGCTTAGTAA
First, we need a sequence index which we make with create_seq_index:
read_fasta -i test.fna | create_seq_index -d my_dir -i my_index -x
This creates the index files:
my_dir/my_index.index
my_dir/my_index.seq
Now, we can make a lookup of test2
like this:
get_seq -i my_dir/my_index -s test2
SEQ: GAGCTTAGTACGTTAGCCAT
SEQ_LEN: 20
SEQ_NAME: test2
---
Now, we can specify a begin position (which is 1-based) like this:
get_seq -i my_dir/my_index -s test2 -b 5
SEQ: TTAGTACGTTAGCCAT
SEQ_LEN: 16
SEQ_NAME: test2
---
If the begin position is longer than the sequnce, no record is emitted.
We can also a end position:
get_seq -i my_dir/my_index -s test2 -b 5 -e 10
SEQ: TTAGTA
SEQ_LEN: 6
SEQ_NAME: test2
---
Or a length:
get_seq -i my_dir/my_index -s test2 -b 5 -l 10
SEQ: TTAGTACGTT
SEQ_LEN: 10
SEQ_NAME: test2
---
Finally, it is possible to get multiple sequences from the stream where get_seq uses the
value of SEQ_NAME
, BEG
, END
, and LEN
to extract subsequences.
Consider the following table in test.tab
:
#SEQ_NAME BEG LEN
test1 5 5
test2 7 5
test3 9 5
test4 11 5
test5 13 5
read_tab -i test.tab | get_seq -i my_dir/my_index
BEG: 5
SEQ: TACGT
SEQ_LEN: 5
SEQ_NAME: test1
LEN: 5
---
BEG: 7
SEQ: GTACG
SEQ_LEN: 5
SEQ_NAME: test2
LEN: 5
---
BEG: 9
SEQ: AGTTC
SEQ_LEN: 5
SEQ_NAME: test3
LEN: 5
---
BEG: 11
SEQ: TAGTA
SEQ_LEN: 5
SEQ_NAME: test4
LEN: 5
---
BEG: 13
SEQ: TTAGT
SEQ_LEN: 5
SEQ_NAME: test5
LEN: 5
---
Martin Asser Hansen - Copyright (C) - All rights reserved.
September 2009
GNU General Public License version 2
http://www.gnu.org/copyleft/gpl.html
get_seq is part of the Biopieces framework.