SB Extract regions
Pull out sub-sequences from each record. When the input is in a richly annotated format such as GenBank, features are deleted or have their coordinates adjusted to match the extracted sequence.
SeqBuddy uses a custom syntax to specify which regions to extract from each sequence. Multiple regions can be passed as separate arguments or combined into a single comma-separated string.
Single positions: This is the simplest syntax, consisting of a comma-separated list of each position you want extracted.
e.g., "1,2,4,45,79,305"
Ranges: Use two numbers separated by a colon to designate a range of residues, similar to Python slice notation, except that positions are 1-based and both ends of the range are included. If the left side of the range is left blank, the range starts at the first residue, and if the right side is left blank, the range extends to the final residue. Negative numbers count back from the end of the sequence.
e.g., "5:200"
"400:-1"
":245"
Every Nth residue: Use a forward slash to extract regularly spaced, non-contiguous residues, such as every 10th residue. The left side of the slash also accepts the colon notation to specify a sub-range within each repeat (for example, the first three residues of every ten).
e.g., "1/10"
"1:10/100"
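The position and range syntax described above can be sketched in a few lines of Python. This is an illustration only, not SeqBuddy's own parser: the function names `resolve_token` and `resolve_regions` are hypothetical, the slash operator is left out, and the handling of negative numbers (so that -1 is the final residue) is inferred from the "100:-100" example on this page, which ends at residue 304 of 403.

```python
def resolve_token(token, length):
    """Return the 1-based positions named by one token, e.g. "45" or "5:200".

    Hypothetical helper for illustration; not part of SeqBuddy.
    """
    def to_position(text, default):
        if text.strip() == "":
            return default  # blank side of a range
        value = int(text)
        # Assumption: negative values count back from the end, so -1 is the
        # final residue and -100 on a 403-residue sequence is residue 304.
        return length + value + 1 if value < 0 else value

    if ":" in token:
        left, right = token.split(":")
        start = to_position(left, 1)       # blank left side -> first residue
        end = to_position(right, length)   # blank right side -> final residue
        return list(range(start, end + 1))  # both ends are included
    return [to_position(token, None)]


def resolve_regions(args, length):
    """Expand separate arguments and comma-separated strings into positions."""
    positions = []
    for arg in args:
        for token in arg.split(","):
            positions.extend(resolve_token(token.strip(), length))
    return positions


print(len(resolve_regions(["11:100"], 403)))    # 90
print(len(resolve_regions(["100:-100"], 403)))  # 205
```

Passing regions as separate arguments or as one comma-separated string yields the same positions in this sketch. How SeqBuddy treats overlapping or out-of-order regions is not modelled here.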
The examples below use the following GenBank record, saved as Mle-Panxα12.gb:
LOCUS Mle-Panxα12 403 aa UNA 02-JAN-2015
DEFINITION cDNA - ML25997a.
ACCESSION Mle-Panxα12
VERSION Mle-Panxα12
KEYWORDS .
SOURCE
ORGANISM . . .
.
FEATURES Location/Qualifiers
CDS 1..403
/label="ML25997a"
/created_by="User"
TMD1 28..48
TMD2 131..151
TMD3 215..235
TMD4 299..329
ORIGIN
1 mvidilsgfk gitpfkgitl ddgwdqinrs fmfvlcvlmg tvvtvrqyag giiscdgftk
61 ysgsfsedyc wtqglytike aydlltmnvp ypgvipedmp tciereling grvscpdpet
121 vkpptrvyhl wyqwvpfyfw laaaafffpy liykhfgvgd lkpliqmlhn pivdegdqnc
181 maekasmwlf yklnvfmnen tifailtekh rlffivmlvk vlyliisila lyltdemfhi
241 gsfvsygsew atslpegdne ttlvkdklfp kmvaceikrw gptgleeeqg mcvlapnvin
301 qylflilwfa iifciacncl svlfaltklv fvlgsykrll asaflkdelh ykhmffnigt
361 sgrvllqiva tnvsprvfes imanlatkli aerlkgngkg sv*
//
Extract a range of residues, using the colon (:) operator.
$: sb Mle-Panxα12.gb -er "11:100"
LOCUS Mle-Panxα12 90 aa UNK 01-JAN-1980
DEFINITION cDNA - ML25997a.
ACCESSION Mle-Panxα12
VERSION Mle-Panxα12
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
CDS 1..90
/created_by="User"
/label="ML25997a"
TMD1 18..38
ORIGIN
1 gitpfkgitl ddgwdqinrs fmfvlcvlmg tvvtvrqyag giiscdgftk ysgsfsedyc
61 wtqglytike aydlltmnvp ypgvipedmp
//
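The feature table in the output above shows how annotations follow the extracted residues: CDS is clipped to 1..90, TMD1 shifts from 28..48 to 18..38, and the features that lie entirely outside the region are dropped. A minimal Python sketch of that bookkeeping follows. It is inferred from the outputs shown on this page rather than taken from SeqBuddy's source, and `remap_feature` is a hypothetical name.

```python
def remap_feature(feature, extracted):
    """Map a feature onto the extracted sequence.

    feature   -- (start, end), 1-based and inclusive, on the original sequence
    extracted -- original 1-based positions that were kept, in output order
    Returns the new (start, end), or None if no kept residue lies in the feature.
    Hypothetical helper for illustration; not part of SeqBuddy.
    """
    start, end = feature
    # New coordinates are the output indices of kept residues inside the feature.
    hits = [new for new, old in enumerate(extracted, start=1)
            if start <= old <= end]
    if not hits:
        return None  # the feature is deleted
    return (min(hits), max(hits))


kept = list(range(11, 101))  # residues selected by "11:100"
print(remap_feature((1, 403), kept))    # (1, 90)   CDS
print(remap_feature((28, 48), kept))    # (18, 38)  TMD1
print(remap_feature((131, 151), kept))  # None      TMD2 is dropped
```

Because the mapping works on a list of kept positions rather than on a single offset, the same sketch also covers non-contiguous selections.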
Leave the left side of the range empty to begin extracting from the start of the sequence.
$: sb Mle-Panxα12.gb -er ":250"
LOCUS Mle-Panxα12 250 aa UNK 01-JAN-1980
DEFINITION cDNA - ML25997a.
ACCESSION Mle-Panxα12
VERSION Mle-Panxα12
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
CDS 1..250
/created_by="User"
/label="ML25997a"
TMD1 28..48
TMD2 131..151
TMD3 215..235
ORIGIN
1 mvidilsgfk gitpfkgitl ddgwdqinrs fmfvlcvlmg tvvtvrqyag giiscdgftk
61 ysgsfsedyc wtqglytike aydlltmnvp ypgvipedmp tciereling grvscpdpet
121 vkpptrvyhl wyqwvpfyfw laaaafffpy liykhfgvgd lkpliqmlhn pivdegdqnc
181 maekasmwlf yklnvfmnen tifailtekh rlffivmlvk vlyliisila lyltdemfhi
241 gsfvsygsew
//
Leave the right side of the range empty to extract until the end of the sequence.
$: sb Mle-Panxα12.gb -er "250:"
LOCUS Mle-Panxα12 154 aa UNK 01-JAN-1980
DEFINITION cDNA - ML25997a.
ACCESSION Mle-Panxα12
VERSION Mle-Panxα12
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
CDS 1..154
/created_by="User"
/label="ML25997a"
TMD4 50..80
ORIGIN
1 watslpegdn ettlvkdklf pkmvaceikr wgptgleeeq gmcvlapnvi nqylflilwf
61 aiifciacnc lsvlfaltkl vfvlgsykrl lasaflkdel hykhmffnig tsgrvllqiv
121 atnvsprvfe simanlatkl iaerlkgngk gsv*
//
Use negative numbers to count back from the end of the sequence.
$: sb Mle-Panxα12.gb -er "100:-100"
LOCUS Mle-Panxα12 205 aa UNK 01-JAN-1980
DEFINITION cDNA - ML25997a.
ACCESSION Mle-Panxα12
VERSION Mle-Panxα12
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
CDS 1..205
/created_by="User"
/label="ML25997a"
TMD2 32..52
TMD3 116..136
TMD4 200..205
ORIGIN
1 ptcierelin ggrvscpdpe tvkpptrvyh lwyqwvpfyf wlaaaafffp yliykhfgvg
61 dlkpliqmlh npivdegdqn cmaekasmwl fyklnvfmne ntifailtek hrlffivmlv
121 kvlyliisil alyltdemfh igsfvsygse watslpegdn ettlvkdklf pkmvaceikr
181 wgptgleeeq gmcvlapnvi nqylf
//
Pull out all hydrophobic residues from the transmembrane domains by specifying individual residues and ranges, passed here as four separate arguments.
$: sb Mle-Panxα12.gb -er "32,34,35,37,38,42,43" "135,141,151" "215:219,221,222,224:226,228,229,231,233" "305:307,311,312,315,320,322,323,326"
LOCUS Mle-Panxα12 34 aa UNK 01-JAN-1980
DEFINITION cDNA - ML25997a.
ACCESSION Mle-Panxα12
VERSION Mle-Panxα12
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
CDS 1..34
/created_by="User"
/label="ML25997a"
TMD1 1..7
TMD2 8..10
TMD3 11..24
TMD4 25..34
ORIGIN
1 mvlvlvvvll ivmlvvllii illlliliii lvll
//
Extract every tenth residue using the forward-slash (/) operator. Counting starts at residue #1, so residues 10, 20, 30, and so on are returned.
$: sb Mle-Panxα12.gb -er "1/10"
LOCUS Mle-Panxα12 40 aa UNK 01-JAN-1980
DEFINITION cDNA - ML25997a.
ACCESSION Mle-Panxα12
VERSION Mle-Panxα12
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
CDS 1..40
/created_by="User"
/label="ML25997a"
TMD1 3..4
TMD2 14..15
TMD3 22..23
TMD4 30..32
ORIGIN
1 klsggkcepp gtlwydncfn hkaiwepwgn alvlhtasig
//
Extract the first three residues of every ten by mixing the colon (:) and forward-slash (/) operators.
$: sb Mle-Panxα12.gb -er "1:3/10"
LOCUS Mle-Panxα12 123 aa UNK 01-JAN-1980
DEFINITION cDNA - ML25997a.
ACCESSION Mle-Panxα12
VERSION Mle-Panxα12
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
CDS 1..123
/created_by="User"
/label="ML25997a"
TMD1 10..15
TMD2 40..46
TMD3 67..72
TMD4 91..99
ORIGIN
1 mvigitddgf mftvvgiiys gwtqaydypg tcigrvvkpw yqlaaliylk ppivmaeykl
61 tifrlfvlyl ylgsfatstt lkmvgptmcv qyliifsvlf vlasaykhsg rtnvimaaer
121 sv*
//
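The two slash examples above can be reproduced with a short Python sketch. It covers only the two documented forms, "1/N" and "start:end/N", and is modelled on the outputs shown here rather than on SeqBuddy's source. The function names are hypothetical, and other left-hand values are deliberately not handled because this page does not show how they behave.

```python
def every_nth(length, n):
    """Positions selected by "1/n" in the example above: n, 2n, 3n, ...

    Hypothetical helper for illustration; not part of SeqBuddy.
    """
    return list(range(n, length + 1, n))


def window_subrange(length, start, end, n):
    """Positions selected by "start:end/n": residues start..end of every
    successive window of n residues (1-based, inclusive).

    Hypothetical helper for illustration; not part of SeqBuddy.
    """
    positions = []
    for offset in range(0, length, n):
        for pos in range(offset + start, offset + end + 1):
            if pos <= length:  # the final window may be incomplete
                positions.append(pos)
    return positions


print(len(every_nth(403, 10)))              # 40
print(len(window_subrange(403, 1, 3, 10)))  # 123
print(window_subrange(403, 1, 3, 10)[:6])   # [1, 2, 3, 11, 12, 13]
```

The counts match the 40 aa and 123 aa records above. The final window of the second example contains only residues 401 to 403, which is why that output ends in "sv*".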
A deliberately convoluted example to illustrate how flexible the syntax is. NOTE! If an argument begins with a minus sign (-), make sure there is a space between the opening quotation mark and the minus. Otherwise Python interprets the argument as a new flag.
$: sb Mle-Panxα12.gb -er " -5:8/10,45,124" "60:-100,5:42,78,-5" "1/50"
LOCUS Mle-Panxα12 325 aa UNK 01-JAN-1980
DEFINITION cDNA - ML25997a.
ACCESSION Mle-Panxα12
VERSION Mle-Panxα12
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
CDS 1..325
/created_by="User"
/label="ML25997a"
TMD1 25..43
TMD2 119..139
TMD3 203..223
TMD4 287..301
ORIGIN
1 milsgfkgit pfkgitlddg wdqinrsfmf vlcvlmgtvv rqygdgfkys gsfsedycwt
61 qglytikeay dlltmnvpyp gvipedmptc ierelinggr vscpdpetvk pptrvyhlwy
121 qwvpfyfwla aaafffpyli ykhfgvgdlk pliqmlhnpi vdegdqncma ekasmwlfyk
181 lnvfmnenti failtekhrl ffivmlvkvl yliisilaly ltdemfhigs fvsygsewat
241 slpegdnett lvkdklfpkm vaceikrwgp tgleeeqgmc vlapnvinqy lfilwacnlt
301 kykrkdeyfn ilqirvfatk gngks
//