SB Find pattern
Search for all occurrences of one or more sub-sequences within the input sequences. The positions of all matches are written to stderr and, depending on the output format selected, the matches will either be represented directly in the sequences using UPPERCASE (non-matching sequence will be in lowercase), or as annotated 'match' features (GenBank and EMBL format).
Simple strings (case insensitive) are acceptable input, but regular expressions are also understood for more advanced searches.
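As a rough illustration of the behaviour described above (not SeqBuddy's actual implementation), a case-insensitive search that collects match positions and marks the matches in uppercase could be sketched in Python as follows. The function name `find_pattern` is hypothetical, and the positions are Python's zero-based, end-exclusive spans.

```python
import re


def find_pattern(sequence, pattern):
    """Return (spans, marked) for a case-insensitive regex search.

    spans  -- list of (start, end) tuples, zero-based and end-exclusive
    marked -- the sequence in lowercase, with matched regions in UPPERCASE
    """
    regex = re.compile(pattern, re.IGNORECASE)
    spans = [match.span() for match in regex.finditer(sequence)]

    marked = list(sequence.lower())
    for start, end in spans:
        # Overwrite the matched slice with its uppercase form
        marked[start:end] = sequence[start:end].upper()
    return spans, "".join(marked)


spans, marked = find_pattern("MAAVKPLLLACTFLL", "lll")
print(spans)   # [(6, 9)]
print(marked)  # maavkpLLLactfll
```

Note that `re.finditer` only returns non-overlapping matches, which is an assumption of this sketch rather than a statement about how SeqBuddy counts matches.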
Optional: Both nucleotide and protein sequences have ambiguity codes (see below), which can be used in place of (or in combination with) regular expressions if desired. SeqBuddy treats all characters as literal by default, which means the regular expression ATGN{194,1994}(TGA|TAG|TAA) will look for a start codon, followed by 194 to 1994 literal 'N' characters, followed by a stop codon. In this case it would probably make more sense for the 'N' character to be written [ATCG], which would match any open reading frame between 200 and 2000 residues long. Simply pass in the argument 'ambig' to allow ambiguous characters to represent any of their subset of residues (see example 3).
Nucleotide Code:  Bases:
----------------  ------
R.................A or G
Y.................C or T/U
S.................G or C
W.................A or T/U
K.................G or T/U
M.................A or C
B.................C or G or T/U
D.................A or G or T/U
H.................A or C or T/U
V.................A or C or G
N/X...............any base
Amino Acid Code:  Three letter Code:  Amino Acids:
----------------  ------------------  ------------
B.................Asx.................Aspartic acid or Asparagine
Z.................Glx.................Glutamine or Glutamic acid
X.....................................Any amino acid
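To make the effect of ambiguity codes concrete, the sketch below expands the nucleotide codes from the table above into regular expression character classes. It is only an illustration of the idea, assuming a simple character-by-character substitution. `DNA_AMBIG` and `expand_ambiguity` are hypothetical names and are not part of SeqBuddy; the protein codes could be handled the same way with a second dictionary.

```python
import re

# Nucleotide ambiguity codes from the table above (T and U both included)
DNA_AMBIG = {
    "R": "AG", "Y": "CTU", "S": "GC", "W": "ATU", "K": "GTU", "M": "AC",
    "B": "CGTU", "D": "AGTU", "H": "ACTU", "V": "ACG",
    "N": "ACGTU", "X": "ACGTU",
}


def expand_ambiguity(pattern, codes=DNA_AMBIG):
    """Replace each ambiguity letter in a regex with a character class."""
    out = []
    in_class = False  # True while inside an existing [...] class
    escaped = False   # True for the character right after a backslash
    for char in pattern:
        if escaped:
            out.append(char)
            escaped = False
            continue
        if char == "\\":
            out.append(char)
            escaped = True
            continue
        if char == "[":
            in_class = True
        elif char == "]":
            in_class = False
        residues = codes.get(char.upper())
        if residues is None:
            out.append(char)              # ordinary character, keep as-is
        elif in_class:
            out.append(residues)          # already inside [...], add residues only
        else:
            out.append("[" + residues + "]")
    return "".join(out)


print(expand_ambiguity("ATGN{194,1994}(TGA|TAG|TAA)"))
# ATG[ACGTU]{194,1994}(TGA|TAG|TAA)
```

The expanded pattern can then be compiled with `re.compile(..., re.IGNORECASE)` and used like any other regular expression.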
>Dme-Panxδ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFCWITYTYTVAG
PGLEKHSYYQWVPFVLFFQGLMFYVPHWVWKMDGKIRMITGVDDRDRILKYFVNNTHNGY
SFYFFCELLNFINVIVNIFMVDKFLGGAFMSYGTDVLKFSNMDQDRFDPMIEIFPRLTKC
TFHKFGPSGSVQKHDTLCVLALNILNEKIYIFLWFWFIILATISGVAVLYSVVITRTIRK
EGDFLILHFLSQNLSTRSYSDMLQ
>Dme-Panxδ2
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPIDCIVEIPLGVM
DTYCWIYSTFTVPEGRDVQPGSEKYHKYYQWVCFVLFFQAILFYVPRYLWKSWEGGRLKM
LVDLSVNDKDRKIVDYFGNLNRHNFYAFFFVCEALNFVNVIGQIYFVDFFLDGEFSTYGS
DVLKFTELEPDERIDPMARVFPKVTKCTFHKYGPSGSVQTHDGLCVLPLNIVNEKIYVFL
WFWFIILSIMSISLIYRIAVAPKLRHLLLRARSRAESEVEVAIGDWFLLYQLGKNIDPLI
YKEVISDLEMG
>Dme-Panxδ4
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPIQCFGDKDMDA
FCWIYGAYLQCAVSKVVENYITYYQWVVLVLLLESFVFYMPAFLWKIWEGGRLKHLCDFK
RTHRVLVNYFETHFRYFVYVFCEILNLSISILNFLLLDVFFGGFWGRYRNALYNQWIAVF
PKCAKCEYKGGPSGSSNIYDYLCLLPLNILNEKIFAFLWIWFILAMLISLKFLYRLAVLY
PMRLQLLRPKKHLQVALNCSFGDWFVLMRVGNNISPELFRKLLEEL
$: sb Drosophila.fa -fp "LLL" "LLY"
#### 4 matches found across 3 sequences for pattern 'LLL' ####
Dme-Panxδ3: None
Dme-Panxδ2: 266-269
Dme-Panxδ4: 31-34, 90-93, 154-157
#### 1 matches found across 3 sequences for pattern 'LLY' ####
Dme-Panxδ3: None
Dme-Panxδ2: 287-290
Dme-Panxδ4: None
>Dme-Panxδ3
gfikidnmvfrchyritailftcciivtannligdpisciipmhvintfcwitytytvag
pglekhsyyqwvpfvlffqglmfyvphwvwkmdgkirmitgvddrdrilkyfvnnthngy
sfyffcellnfinvivnifmvdkflggafmsygtdvlkfsnmdqdrfdpmieifprltkc
tfhkfgpsgsvqkhdtlcvlalnilnekiyiflwfwfiilatisgvavlysvvitrtirk
egdflilhflsqnlstrsysdmlq
>Dme-Panxδ2
mdvfgsvkgllkidqvdnnvfrmhykatviiliafsllvtsrqyigdpidciveiplgvm
dtycwiystftvpegrdvqpgsekyhkyyqwvcfvlffqailfyvprylwksweggrlkm
lvdlsvndkdrkivdyfgnlnrhnfyafffvcealnfvnvigqiyfvdffldgefstygs
dvlkftelepderidpmarvfpkvtkctfhkygpsgsvqthdglcvlplnivnekiyvfl
wfwfiilsimsisliyriavapklrhLLLrarsraesevevaigdwfLLYqlgknidpli
ykevisdlemg
>Dme-Panxδ4
maavkplskylqfkvhiydaiftlhskvtvaLLLactfllsskqyfgdpiqcfgdkdmda
fcwiygaylqcavskvvenyityyqwvvlvLLLesfvfympaflwkiweggrlkhlcdfk
rthrvlvnyfethfryfvyvfceilnlsisilnfLLLdvffggfwgryrnalynqwiavf
pkcakceykggpsgssniydylcllplnilnekifaflwiwfilamlislkflyrlavly
pmrlqllrpkkhlqvalncsfgdwfvlmrvgnnispelfrklleel
$: sb Drosophila.fa -fp "[LIY]{3}" -o genbank
#### 12 matches found across 3 sequences for pattern '[LIY]{3}' ####
Dme-Panxδ3: 208-211, 217-220, 244-247
Dme-Panxδ2: 29-32, 244-247, 253-256, 266-269, 287-290, 298-301
Dme-Panxδ4: 31-34, 90-93, 154-157
LOCUS Dme-Panxδ3 264 aa UNK 01-JAN-1980
DEFINITION Dme-Panxδ3.
ACCESSION Dme-Panxδ3
VERSION Dme-Panxδ3
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
match 209..211
/regex="[LIY]{3}"
/added_by="SeqBuddy"
match 218..220
/regex="[LIY]{3}"
/added_by="SeqBuddy"
match 245..247
/regex="[LIY]{3}"
/added_by="SeqBuddy"
ORIGIN
1 gfikidnmvf rchyritail ftcciivtan nligdpisci ipmhvintfc witytytvag
61 pglekhsyyq wvpfvlffqg lmfyvphwvw kmdgkirmit gvddrdrilk yfvnnthngy
121 sfyffcelln finvivnifm vdkflggafm sygtdvlkfs nmdqdrfdpm ieifprltkc
181 tfhkfgpsgs vqkhdtlcvl alnilnekiy iflwfwfiil atisgvavly svvitrtirk
241 egdflilhfl sqnlstrsys dmlq
//
LOCUS Dme-Panxδ2 311 aa UNK 01-JAN-1980
DEFINITION Dme-Panxδ2.
ACCESSION Dme-Panxδ2
VERSION Dme-Panxδ2
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
match 30..32
/regex="[LIY]{3}"
/added_by="SeqBuddy"
match 245..247
/regex="[LIY]{3}"
/added_by="SeqBuddy"
match 254..256
/regex="[LIY]{3}"
/added_by="SeqBuddy"
match 267..269
/regex="[LIY]{3}"
/added_by="SeqBuddy"
match 288..290
/regex="[LIY]{3}"
/added_by="SeqBuddy"
match 299..301
/regex="[LIY]{3}"
/added_by="SeqBuddy"
ORIGIN
1 mdvfgsvkgl lkidqvdnnv frmhykatvi iliafsllvt srqyigdpid civeiplgvm
61 dtycwiystf tvpegrdvqp gsekyhkyyq wvcfvlffqa ilfyvprylw ksweggrlkm
121 lvdlsvndkd rkivdyfgnl nrhnfyafff vcealnfvnv igqiyfvdff ldgefstygs
181 dvlkftelep deridpmarv fpkvtkctfh kygpsgsvqt hdglcvlpln ivnekiyvfl
241 wfwfiilsim sisliyriav apklrhlllr arsraeseve vaigdwflly qlgknidpli
301 ykevisdlem g
//
LOCUS Dme-Panxδ4 286 aa UNK 01-JAN-1980
DEFINITION Dme-Panxδ4.
ACCESSION Dme-Panxδ4
VERSION Dme-Panxδ4
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
match 32..34
/regex="[LIY]{3}"
/added_by="SeqBuddy"
match 91..93
/regex="[LIY]{3}"
/added_by="SeqBuddy"
match 155..157
/regex="[LIY]{3}"
/added_by="SeqBuddy"
ORIGIN
1 maavkplsky lqfkvhiyda iftlhskvtv alllactfll sskqyfgdpi qcfgdkdmda
61 fcwiygaylq cavskvveny ityyqwvvlv lllesfvfym paflwkiweg grlkhlcdfk
121 rthrvlvnyf ethfryfvyv fceilnlsis ilnfllldvf fggfwgryrn alynqwiavf
181 pkcakceykg gpsgssniyd ylcllplnil nekifaflwi wfilamlisl kflyrlavly
241 pmrlqllrpk khlqvalncs fgdwfvlmrv gnnispelfr klleel
//
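Comparing the stderr report with the 'match' features in this example, the reported ranges look like zero-based, end-exclusive spans (for example 208-211), while the GenBank locations are 1-based and inclusive (209..211). This is an inference from the output shown here, not a documented guarantee. Under that assumption, the conversion can be sketched as follows; `span_to_genbank` is a hypothetical helper name.

```python
def span_to_genbank(start, end):
    """Convert a zero-based, end-exclusive span to a 1-based inclusive location."""
    return "%d..%d" % (start + 1, end)


# Spans reported for Dme-Panxδ3 in the example above
for span in [(208, 211), (217, 220), (244, 247)]:
    print(span_to_genbank(*span))
# 209..211
# 218..220
# 245..247
```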
Include the argument 'ambig' to search with IUPAC ambiguity codes instead of literal letters.
$: sb Drosophila.fa -fp "[bz]x{50,100}[bz]" "ambig"
#### 7 matches found across 3 sequences for pattern '[bz]x{50,100}[bz]' ####
Dme-Panxδ3: 5-106, 113-207
Dme-Panxδ2: 1-99, 113-195, 218-309
Dme-Panxδ4: 11-109, 117-212
>Dme-Panxδ3
gfikiDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFCWITYTYTVAG
PGLEKHSYYQWVPFVLFFQGLMFYVPHWVWKMDGKIRMITGVDDRDrilkyfvNNTHNGY
SFYFFCELLNFINVIVNIFMVDKFLGGAFMSYGTDVLKFSNMDQDRFDPMIEIFPRLTKC
TFHKFGPSGSVQKHDTLCVLALNILNEkiyiflwfwfiilatisgvavlysvvitrtirk
egdflilhflsqnlstrsysdmlq
>Dme-Panxδ2
mDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPIDCIVEIPLGVM
DTYCWIYSTFTVPEGRDVQPGSEKYHKYYQWVCFVLFFQailfyvprylwkswEGGRLKM
LVDLSVNDKDRKIVDYFGNLNRHNFYAFFFVCEALNFVNVIGQIYFVDFFLDGEFSTYGS
DVLKFTELEPDERIDpmarvfpkvtkctfhkygpsgsvQTHDGLCVLPLNIVNEKIYVFL
WFWFIILSIMSISLIYRIAVAPKLRHLLLRARSRAESEVEVAIGDWFLLYQLGKNIDPLI
YKEVISDLEmg
>Dme-Panxδ4
maavkplskylQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPIQCFGDKDMDA
FCWIYGAYLQCAVSKVVENYITYYQWVVLVLLLESFVFYMPAFLWKIWEggrlkhlcDFK
RTHRVLVNYFETHFRYFVYVFCEILNLSISILNFLLLDVFFGGFWGRYRNALYNQWIAVF
PKCAKCEYKGGPSGSSNIYDYLCLLPLNILNEkifaflwiwfilamlislkflyrlavly
pmrlqllrpkkhlqvalncsfgdwfvlmrvgnnispelfrklleel