Steve Bond edited this page Oct 30, 2017 · 9 revisions

--find_pattern, -fp

Description

Search for all occurrences of one or more sub-sequences within the input sequences. The positions of all matches are written to stderr. Depending on the output format selected, the matches are either shown directly in the sequences in UPPERCASE (with non-matching sequence in lowercase) or recorded as annotated 'match' features (GenBank and EMBL formats).
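
The UPPERCASE/lowercase marking described above can be sketched in a few lines of Python. This is only an illustration built on the standard `re` module, not SeqBuddy's own code; the function name `highlight_matches` is hypothetical. The input is the first line of Dme-Panxδ4 from the example file used on this page.

```python
import re


def highlight_matches(sequence, pattern):
    """Return the sequence with matches in UPPERCASE and everything else in
    lowercase, plus the (start, end) span of each match."""
    marked = list(sequence.lower())
    spans = []
    # IGNORECASE mirrors the case-insensitive behaviour described for simple strings
    for match in re.finditer(pattern, sequence, flags=re.IGNORECASE):
        spans.append((match.start(), match.end()))
        for i in range(match.start(), match.end()):
            marked[i] = marked[i].upper()
    return "".join(marked), spans


first_line = "MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPIQCFGDKDMDA"
marked, spans = highlight_matches(first_line, "LLL")
print(marked)  # maavkplskylqfkvhiydaiftlhskvtvaLLLactfllsskqyfgdpiqcfgdkdmda
print(spans)   # [(31, 34)]
```

Note that `re.finditer` only reports non-overlapping matches; whether SeqBuddy handles overlapping matches differently is not stated here.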

Arguments

One or more sequence patterns ( str )

Simple strings (case insensitive) are acceptable input, but regular expressions are also understood for more advanced searches.

ambig ( exact string )

Optional: Both nucleotide and protein sequences have ambiguity codes (see below), which can be used in place of (or in combination with) regular expressions if desired. By default, SeqBuddy treats all characters as literal, so the regular expression ATGN{194,1994}(TGA|TAG|TAA) looks for a start codon, followed by 194 to 1994 literal 'N' characters, followed by a stop codon. In this case it would probably make more sense to write the 'N' character as [ATCG], which would match any open reading frame between 200 and 2000 residues long. Alternatively, pass in the argument 'ambig' to allow each ambiguous character to represent any of its subset of residues (see example 3).

Nucleotide Code:  Bases:
----------------  -----
R.................A or G
Y.................C or T/U
S.................G or C
W.................A or T/U
K.................G or T/U
M.................A or C
B.................C or G or T/U
D.................A or G or T/U
H.................A or C or T/U
V.................A or C or G
N/X...............any base

Amino Acid Code:  Three letter Code:  Amino Acids:
----------------  ------------------  -----------
B.................Asx.................Aspartic acid or Asparagine
Z.................Glx.................Glutamine or Glutamic acid
X.....................................Any amino acid
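
One way to picture what 'ambig' does is as a rewrite of the pattern in which each ambiguity code becomes a character class built from the tables above. The sketch below is an assumption about the general idea, not SeqBuddy's actual implementation; `expand_ambiguity` and both dictionaries are hypothetical names, U is left out of the nucleotide map for brevity, and escapes such as `\w` are not handled.

```python
import re

# Ambiguity codes taken from the tables above (T only; U omitted for brevity)
NUCLEOTIDE_AMBIGUITY = {
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT", "X": "ACGT",
}
PROTEIN_AMBIGUITY = {"B": "DN", "Z": "EQ", "X": "ACDEFGHIKLMNPQRSTVWY"}


def expand_ambiguity(pattern, codes=NUCLEOTIDE_AMBIGUITY):
    """Replace each ambiguity code in a regex with the residues it stands for."""
    out = []
    in_class = False  # True while we are between '[' and ']'
    for char in pattern:
        if char == "[":
            in_class = True
        elif char == "]":
            in_class = False
        residues = codes.get(char.upper())
        if residues is None:
            out.append(char)                  # ordinary character, keep as-is
        elif in_class:
            out.append(residues)              # already inside [...], add residues only
        else:
            out.append("[" + residues + "]")  # wrap in a new character class
    return "".join(out)


print(expand_ambiguity("ATGN{194,1994}(TGA|TAG|TAA)"))
# ATG[ACGT]{194,1994}(TGA|TAG|TAA)
print(expand_ambiguity("[bz]x{50,100}[bz]", PROTEIN_AMBIGUITY))
# [DNEQ][ACDEFGHIKLMNPQRSTVWY]{50,100}[DNEQ]

# The expanded pattern can then be used as a normal regular expression
orf = re.search(expand_ambiguity("ATGN{2,4}(TGA|TAG|TAA)"), "ccATGACGTAAcc")
print(orf.span())  # (2, 11)
```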

Examples

Input file: Drosophila.fa

>Dme-Panxδ3
GFIKIDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFCWITYTYTVAG
PGLEKHSYYQWVPFVLFFQGLMFYVPHWVWKMDGKIRMITGVDDRDRILKYFVNNTHNGY
SFYFFCELLNFINVIVNIFMVDKFLGGAFMSYGTDVLKFSNMDQDRFDPMIEIFPRLTKC
TFHKFGPSGSVQKHDTLCVLALNILNEKIYIFLWFWFIILATISGVAVLYSVVITRTIRK
EGDFLILHFLSQNLSTRSYSDMLQ
>Dme-Panxδ2
MDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPIDCIVEIPLGVM
DTYCWIYSTFTVPEGRDVQPGSEKYHKYYQWVCFVLFFQAILFYVPRYLWKSWEGGRLKM
LVDLSVNDKDRKIVDYFGNLNRHNFYAFFFVCEALNFVNVIGQIYFVDFFLDGEFSTYGS
DVLKFTELEPDERIDPMARVFPKVTKCTFHKYGPSGSVQTHDGLCVLPLNIVNEKIYVFL
WFWFIILSIMSISLIYRIAVAPKLRHLLLRARSRAESEVEVAIGDWFLLYQLGKNIDPLI
YKEVISDLEMG
>Dme-Panxδ4
MAAVKPLSKYLQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPIQCFGDKDMDA
FCWIYGAYLQCAVSKVVENYITYYQWVVLVLLLESFVFYMPAFLWKIWEGGRLKHLCDFK
RTHRVLVNYFETHFRYFVYVFCEILNLSISILNFLLLDVFFGGFWGRYRNALYNQWIAVF
PKCAKCEYKGGPSGSSNIYDYLCLLPLNILNEKIFAFLWIWFILAMLISLKFLYRLAVLY
PMRLQLLRPKKHLQVALNCSFGDWFVLMRVGNNISPELFRKLLEEL

Usage example 1

$: sb Drosophila.fa -fp "LLL" "LLY"

Output

#### 4 matches found across 3 sequences for pattern 'LLL' ####
Dme-Panxδ3: None
Dme-Panxδ2: 266-269
Dme-Panxδ4: 31-34, 90-93, 154-157

#### 1 matches found across 3 sequences for pattern 'LLY' ####
Dme-Panxδ3: None
Dme-Panxδ2: 287-290
Dme-Panxδ4: None

>Dme-Panxδ3
gfikidnmvfrchyritailftcciivtannligdpisciipmhvintfcwitytytvag
pglekhsyyqwvpfvlffqglmfyvphwvwkmdgkirmitgvddrdrilkyfvnnthngy
sfyffcellnfinvivnifmvdkflggafmsygtdvlkfsnmdqdrfdpmieifprltkc
tfhkfgpsgsvqkhdtlcvlalnilnekiyiflwfwfiilatisgvavlysvvitrtirk
egdflilhflsqnlstrsysdmlq
>Dme-Panxδ2
mdvfgsvkgllkidqvdnnvfrmhykatviiliafsllvtsrqyigdpidciveiplgvm
dtycwiystftvpegrdvqpgsekyhkyyqwvcfvlffqailfyvprylwksweggrlkm
lvdlsvndkdrkivdyfgnlnrhnfyafffvcealnfvnvigqiyfvdffldgefstygs
dvlkftelepderidpmarvfpkvtkctfhkygpsgsvqthdglcvlplnivnekiyvfl
wfwfiilsimsisliyriavapklrhLLLrarsraesevevaigdwfLLYqlgknidpli
ykevisdlemg
>Dme-Panxδ4
maavkplskylqfkvhiydaiftlhskvtvaLLLactfllsskqyfgdpiqcfgdkdmda
fcwiygaylqcavskvvenyityyqwvvlvLLLesfvfympaflwkiweggrlkhlcdfk
rthrvlvnyfethfryfvyvfceilnlsisilnfLLLdvffggfwgryrnalynqwiavf
pkcakceykggpsgssniydylcllplnilnekifaflwiwfilamlislkflyrlavly
pmrlqllrpkkhlqvalncsfgdwfvlmrvgnnispelfrklleel

Usage example 2

$: sb Drosophila.fa -fp "[LIY]{3}" -o genbank

Output

#### 12 matches found across 3 sequences for pattern '[LIY]{3}' ####
Dme-Panxδ3: 208-211, 217-220, 244-247
Dme-Panxδ2: 29-32, 244-247, 253-256, 266-269, 287-290, 298-301
Dme-Panxδ4: 31-34, 90-93, 154-157

LOCUS       Dme-Panxδ3               264 aa                     UNK 01-JAN-1980
DEFINITION  Dme-Panxδ3.
ACCESSION   Dme-Panxδ3
VERSION     Dme-Panxδ3
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     match           209..211
                     /regex="[LIY]{3}"
                     /added_by="SeqBuddy"
     match           218..220
                     /regex="[LIY]{3}"
                     /added_by="SeqBuddy"
     match           245..247
                     /regex="[LIY]{3}"
                     /added_by="SeqBuddy"
ORIGIN
        1 gfikidnmvf rchyritail ftcciivtan nligdpisci ipmhvintfc witytytvag
       61 pglekhsyyq wvpfvlffqg lmfyvphwvw kmdgkirmit gvddrdrilk yfvnnthngy
      121 sfyffcelln finvivnifm vdkflggafm sygtdvlkfs nmdqdrfdpm ieifprltkc
      181 tfhkfgpsgs vqkhdtlcvl alnilnekiy iflwfwfiil atisgvavly svvitrtirk
      241 egdflilhfl sqnlstrsys dmlq
//
LOCUS       Dme-Panxδ2               311 aa                     UNK 01-JAN-1980
DEFINITION  Dme-Panxδ2.
ACCESSION   Dme-Panxδ2
VERSION     Dme-Panxδ2
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     match           30..32
                     /regex="[LIY]{3}"
                     /added_by="SeqBuddy"
     match           245..247
                     /regex="[LIY]{3}"
                     /added_by="SeqBuddy"
     match           254..256
                     /regex="[LIY]{3}"
                     /added_by="SeqBuddy"
     match           267..269
                     /regex="[LIY]{3}"
                     /added_by="SeqBuddy"
     match           288..290
                     /regex="[LIY]{3}"
                     /added_by="SeqBuddy"
     match           299..301
                     /regex="[LIY]{3}"
                     /added_by="SeqBuddy"
ORIGIN
        1 mdvfgsvkgl lkidqvdnnv frmhykatvi iliafsllvt srqyigdpid civeiplgvm
       61 dtycwiystf tvpegrdvqp gsekyhkyyq wvcfvlffqa ilfyvprylw ksweggrlkm
      121 lvdlsvndkd rkivdyfgnl nrhnfyafff vcealnfvnv igqiyfvdff ldgefstygs
      181 dvlkftelep deridpmarv fpkvtkctfh kygpsgsvqt hdglcvlpln ivnekiyvfl
      241 wfwfiilsim sisliyriav apklrhlllr arsraeseve vaigdwflly qlgknidpli
      301 ykevisdlem g
//
LOCUS       Dme-Panxδ4               286 aa                     UNK 01-JAN-1980
DEFINITION  Dme-Panxδ4.
ACCESSION   Dme-Panxδ4
VERSION     Dme-Panxδ4
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     match           32..34
                     /regex="[LIY]{3}"
                     /added_by="SeqBuddy"
     match           91..93
                     /regex="[LIY]{3}"
                     /added_by="SeqBuddy"
     match           155..157
                     /regex="[LIY]{3}"
                     /added_by="SeqBuddy"
ORIGIN
        1 maavkplsky lqfkvhiyda iftlhskvtv alllactfll sskqyfgdpi qcfgdkdmda
       61 fcwiygaylq cavskvveny ityyqwvvlv lllesfvfym paflwkiweg grlkhlcdfk
      121 rthrvlvnyf ethfryfvyv fceilnlsis ilnfllldvf fggfwgryrn alynqwiavf
      181 pkcakceykg gpsgssniyd ylcllplnil nekifaflwi wfilamlisl kflyrlavly
      241 pmrlqllrpk khlqvalncs fgdwfvlmrv gnnispelfr klleel
//
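
Comparing the two outputs above, the positions in the stderr report (for example 31-34) appear to be 0-based with an exclusive end, while the GenBank 'match' features (32..34) are 1-based and inclusive. Assuming that reading is correct, a report line can be converted to GenBank-style locations as sketched below; the helper names are hypothetical and not part of SeqBuddy.

```python
def to_genbank_location(span):
    """Convert a 'start-end' string from the report (assumed 0-based,
    end-exclusive) into a 1-based inclusive 'start..end' location."""
    start, end = (int(value) for value in span.split("-"))
    return "%d..%d" % (start + 1, end)


def parse_report_line(line):
    """Split a report line such as 'name: 31-34, 90-93' into the sequence
    name and a list of GenBank-style locations ('None' gives an empty list)."""
    name, _, positions = line.partition(": ")
    if positions.strip() == "None":
        return name, []
    return name, [to_genbank_location(p.strip()) for p in positions.split(",")]


print(parse_report_line("Dme-Panxδ4: 31-34, 90-93, 154-157"))
# ('Dme-Panxδ4', ['32..34', '91..93', '155..157'])
print(parse_report_line("Dme-Panxδ3: None"))
# ('Dme-Panxδ3', [])
```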

Usage example 3

Include the argument ambig to search with IUPAC ambiguity codes instead of literal letters.

$: sb Drosophila.fa -fp "[bz]x{50,100}[bz]" "ambig"

Output

#### 7 matches found across 3 sequences for pattern '[bz]x{50,100}[bz]' ####
Dme-Panxδ3: 5-106, 113-207
Dme-Panxδ2: 1-99, 113-195, 218-309
Dme-Panxδ4: 11-109, 117-212

>Dme-Panxδ3
gfikiDNMVFRCHYRITAILFTCCIIVTANNLIGDPISCIIPMHVINTFCWITYTYTVAG
PGLEKHSYYQWVPFVLFFQGLMFYVPHWVWKMDGKIRMITGVDDRDrilkyfvNNTHNGY
SFYFFCELLNFINVIVNIFMVDKFLGGAFMSYGTDVLKFSNMDQDRFDPMIEIFPRLTKC
TFHKFGPSGSVQKHDTLCVLALNILNEkiyiflwfwfiilatisgvavlysvvitrtirk
egdflilhflsqnlstrsysdmlq
>Dme-Panxδ2
mDVFGSVKGLLKIDQVDNNVFRMHYKATVIILIAFSLLVTSRQYIGDPIDCIVEIPLGVM
DTYCWIYSTFTVPEGRDVQPGSEKYHKYYQWVCFVLFFQailfyvprylwkswEGGRLKM
LVDLSVNDKDRKIVDYFGNLNRHNFYAFFFVCEALNFVNVIGQIYFVDFFLDGEFSTYGS
DVLKFTELEPDERIDpmarvfpkvtkctfhkygpsgsvQTHDGLCVLPLNIVNEKIYVFL
WFWFIILSIMSISLIYRIAVAPKLRHLLLRARSRAESEVEVAIGDWFLLYQLGKNIDPLI
YKEVISDLEmg
>Dme-Panxδ4
maavkplskylQFKVHIYDAIFTLHSKVTVALLLACTFLLSSKQYFGDPIQCFGDKDMDA
FCWIYGAYLQCAVSKVVENYITYYQWVVLVLLLESFVFYMPAFLWKIWEggrlkhlcDFK
RTHRVLVNYFETHFRYFVYVFCEILNLSISILNFLLLDVFFGGFWGRYRNALYNQWIAVF
PKCAKCEYKGGPSGSSNIYDYLCLLPLNILNEkifaflwiwfilamlislkflyrlavly
pmrlqllrpkkhlqvalncsfgdwfvlmrvgnnispelfrklleel
