Steve Bond edited this page Jan 31, 2017 · 4 revisions

--find_orfs, -orf

Description

Identify open reading frames (ORFs) in each sequence by searching for a start codon followed by the first in-frame stop codon. The search is greedy in the sense that smaller ORFs contained within a larger one are not reported. The exact regular expression for DNA sequences is:

atg(?:...)*?(?:taa|tag|tga)
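As a rough illustration of what this pattern matches, the sketch below applies it with Python's `re` module. It is not SeqBuddy's own code; the helper name `find_forward_orfs` and the toy sequence are made up for this example, and the use of non-overlapping `finditer` matches is an assumption about how the pattern is applied.

```python
import re

# Pattern quoted above: a start codon, a lazy run of whole codons,
# then the first in-frame stop codon.
ORF_PATTERN = re.compile(r"atg(?:...)*?(?:taa|tag|tga)")


def find_forward_orfs(seq):
    """Return (start, end) spans of ORFs, 0-based and end-exclusive.

    Hypothetical helper: finditer() yields non-overlapping matches, so a
    start codon nested inside an earlier match is never reported.
    """
    return [match.span() for match in ORF_PATTERN.finditer(seq.lower())]


# Toy sequence: 'atg atg aaa taa' begins at index 2. The second 'atg' is
# in frame and nested inside the first ORF, so only one span comes back.
print(find_forward_orfs("ccatgatgaaataagg"))  # [(2, 14)]
```

The nested ORF that would start at index 5 is skipped because the scan resumes after the end of the previous match.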

By default, both the forward strand and the reverse complement of each sequence are searched for ORFs, although the reverse complement search can be suppressed (see example 4).

The ORFs are listed on standard error, and are added as sequence features if the output format supports rich annotation (e.g., GenBank or EMBL).
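The reverse-strand search can be sketched in the same way. This is an assumption-laden illustration rather than SeqBuddy's implementation: `reverse_complement` and `find_minus_orfs` are hypothetical helpers, and the `end:start` reporting convention is inferred from the (-) entries in the example outputs on this page.

```python
import re

ORF_PATTERN = re.compile(r"atg(?:...)*?(?:taa|tag|tga)")
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def find_minus_orfs(seq):
    """Search the reverse complement and report hits in forward-strand coordinates.

    Assumed convention (inferred from the examples): 'end:start', where the
    ORF occupies the 0-based slice seq[start:end] on the forward strand.
    """
    length = len(seq)
    hits = []
    for match in ORF_PATTERN.finditer(reverse_complement(seq)):
        rc_start, rc_end = match.span()
        hits.append(f"{length - rc_start}:{length - rc_end}")
    return hits


# First 26 bases of the Mle-Panxα10A example sequence.
print(find_minus_orfs("atgcgtttatcagaaaagtctacatc"))  # ['25:19']
```

On this 26-base prefix the sketch reports `25:19`, which corresponds to the six bases `ctacat` on the forward strand (`atgtag` when reverse complemented).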

Arguments

The following arguments are implemented in version 1.3.

Min size ( int )

Optional. Limit the number of results by specifying a minimum ORF length. Note that ORF lengths must be divisible by three, so your input here will be rounded up to the nearest multiple of three.
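The rounding described above amounts to simple integer arithmetic. The function below is a hypothetical illustration, not taken from SeqBuddy.

```python
def round_up_to_codon(min_size):
    """Round a requested minimum length up to the nearest multiple of three."""
    return (min_size + 2) // 3 * 3


for requested in (498, 499, 500, 501):
    print(requested, "->", round_up_to_codon(requested))
# 498 -> 498
# 499 -> 501
# 500 -> 501
# 501 -> 501
```

A request of 500, as in example 3, would therefore behave like a request of 501.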

Reverse complement --> 'false' ( exact string )

Optional. Passing in the word 'false' will disable searching the reverse complement of sequences for ORFs.

Examples

Input file: Mle-Panxα10.fa

>Mle-Panxα10A cDNA - ML32831a-1.
atgcgtttatcagaaaagtctacatcacacgattgcaaagcttgcatcacacgatcgcac
aacgaagattgtgctaggagatggggtataacgatagatgacgggtgggatcaactcaat
cggagttttatgttcggcctgctcgttgtcatgggaacgactgtcactgtccggcaatac
accggcagtgtcatcagttgtgacggcttcaaaaagtttggatctacttttgcggaggat
tactgttggacccagggacagtacacagttttagaaggatatgaccaacccaaccaaaat
atcccctgcccattgcccgctgcgtttgctccgtaccccgggattttcccggaggagcta
tcgcactgtctggttggcgctcgtaaagccggccagtctgaggacctgattaacggtacg
aggttaaagtgcccagaccctgatcagttgttgtcaccgacgcggatatcccacctatgg
taccagtgggtccctttttacttctggctggcggctgctgccttcttcatgccctaccta
ttgtacaagaattttggcataggagatatcaagcctctcgtgagatttctacacaatcca
gtagaatcagaccaggaattgaagaagatgacagacaaggccgcaacctggctgttctac
aagtttgacctgtacatgagcgaacagtcgctcctagcaagtctgaccaataaacacggt
cttggtctatctgtggtctttgtaaagatcctatatgccgcagtttcgttcgggtgtttt
ctcctgaccgctgatatgttctcaattggagatttcaaaacctatggatcagaatggatt
aataagttgaagttggaagataatctagctacggaggaaaaggataagctttttcctaaa
atggtggcatgtgaagtgaaacgctggggtgcatcaggtattgaagaggaacaagggatg
tgtgtcctggcccccaacgtaatcaaccaatacctattccttattctctggttctgtctg
gtattcgtgatgttctgcaacattgtctccatattcgcctccctcatcaagctcctcttc
acctacggctcctaccgccgtctcctttccaccgccttcctgagggacgactccgccatc
aaacacatgtacttcaacgtggggtcgtcagggagattgatattgcacgtgctggcgaac
aacaccgccccgcgcgtcttcgaggacatcctgctgaccctggcccccaagctgatccaa
cggaaactcagagctaaggactatgactaa

Usage example 1

All default settings.

$: sb Mle-Panxα10.fa -orf

Output

# Mle-Panxα10A
(+) ORFs: 0:1290
(-) ORFs: 1229:1091, 1067:821, 677:656, 560:533, 530:497, 313:223, 194:110, 25:19

>Mle-Panxα10A cDNA - ML32831a-1.
atgcgtttatcagaaaagtctacatcacacgattgcaaagcttgcatcacacgatcgcac
aacgaagattgtgctaggagatggggtataacgatagatgacgggtgggatcaactcaat
cggagttttatgttcggcctgctcgttgtcatgggaacgactgtcactgtccggcaatac
accggcagtgtcatcagttgtgacggcttcaaaaagtttggatctacttttgcggaggat
tactgttggacccagggacagtacacagttttagaaggatatgaccaacccaaccaaaat
atcccctgcccattgcccgctgcgtttgctccgtaccccgggattttcccggaggagcta
tcgcactgtctggttggcgctcgtaaagccggccagtctgaggacctgattaacggtacg
aggttaaagtgcccagaccctgatcagttgttgtcaccgacgcggatatcccacctatgg
taccagtgggtccctttttacttctggctggcggctgctgccttcttcatgccctaccta
ttgtacaagaattttggcataggagatatcaagcctctcgtgagatttctacacaatcca
gtagaatcagaccaggaattgaagaagatgacagacaaggccgcaacctggctgttctac
aagtttgacctgtacatgagcgaacagtcgctcctagcaagtctgaccaataaacacggt
cttggtctatctgtggtctttgtaaagatcctatatgccgcagtttcgttcgggtgtttt
ctcctgaccgctgatatgttctcaattggagatttcaaaacctatggatcagaatggatt
aataagttgaagttggaagataatctagctacggaggaaaaggataagctttttcctaaa
atggtggcatgtgaagtgaaacgctggggtgcatcaggtattgaagaggaacaagggatg
tgtgtcctggcccccaacgtaatcaaccaatacctattccttattctctggttctgtctg
gtattcgtgatgttctgcaacattgtctccatattcgcctccctcatcaagctcctcttc
acctacggctcctaccgccgtctcctttccaccgccttcctgagggacgactccgccatc
aaacacatgtacttcaacgtggggtcgtcagggagattgatattgcacgtgctggcgaac
aacaccgccccgcgcgtcttcgaggacatcctgctgaccctggcccccaagctgatccaa
cggaaactcagagctaaggactatgactaa

Usage example 2

When the output format is GenBank, each ORF is also annotated as a feature.

$: sb Mle-Panxα10.fa -orf -o genbank

Output

# Mle-Panxα10A
(+) ORFs: 0:1290
(-) ORFs: 1229:1091, 1067:821, 677:656, 560:533, 530:497, 313:223, 194:110, 25:19

LOCUS       Mle-Panxα10A            1290 bp    DNA              UNK 01-JAN-1980
DEFINITION  Mle-Panxα10A cDNA - ML32831a-1.
ACCESSION   Mle-Panxα10A
VERSION     Mle-Panxα10A
KEYWORDS    .
SOURCE      .
  ORGANISM  .
            .
FEATURES             Location/Qualifiers
     orf             1..1290
                     /added_by="SeqBuddy"
     orf             complement(1092..1229)
                     /added_by="SeqBuddy"
     orf             complement(822..1067)
                     /added_by="SeqBuddy"
     orf             complement(657..677)
                     /added_by="SeqBuddy"
     orf             complement(534..560)
                     /added_by="SeqBuddy"
     orf             complement(498..530)
                     /added_by="SeqBuddy"
     orf             complement(224..313)
                     /added_by="SeqBuddy"
     orf             complement(111..194)
                     /added_by="SeqBuddy"
     orf             complement(20..25)
                     /added_by="SeqBuddy"
ORIGIN
        1 atgcgtttat cagaaaagtc tacatcacac gattgcaaag cttgcatcac acgatcgcac
       61 aacgaagatt gtgctaggag atggggtata acgatagatg acgggtggga tcaactcaat
      121 cggagtttta tgttcggcct gctcgttgtc atgggaacga ctgtcactgt ccggcaatac
      181 accggcagtg tcatcagttg tgacggcttc aaaaagtttg gatctacttt tgcggaggat
      241 tactgttgga cccagggaca gtacacagtt ttagaaggat atgaccaacc caaccaaaat
      301 atcccctgcc cattgcccgc tgcgtttgct ccgtaccccg ggattttccc ggaggagcta
      361 tcgcactgtc tggttggcgc tcgtaaagcc ggccagtctg aggacctgat taacggtacg
      421 aggttaaagt gcccagaccc tgatcagttg ttgtcaccga cgcggatatc ccacctatgg
      481 taccagtggg tcccttttta cttctggctg gcggctgctg ccttcttcat gccctaccta
      541 ttgtacaaga attttggcat aggagatatc aagcctctcg tgagatttct acacaatcca
      601 gtagaatcag accaggaatt gaagaagatg acagacaagg ccgcaacctg gctgttctac
      661 aagtttgacc tgtacatgag cgaacagtcg ctcctagcaa gtctgaccaa taaacacggt
      721 cttggtctat ctgtggtctt tgtaaagatc ctatatgccg cagtttcgtt cgggtgtttt
      781 ctcctgaccg ctgatatgtt ctcaattgga gatttcaaaa cctatggatc agaatggatt
      841 aataagttga agttggaaga taatctagct acggaggaaa aggataagct ttttcctaaa
      901 atggtggcat gtgaagtgaa acgctggggt gcatcaggta ttgaagagga acaagggatg
      961 tgtgtcctgg cccccaacgt aatcaaccaa tacctattcc ttattctctg gttctgtctg
     1021 gtattcgtga tgttctgcaa cattgtctcc atattcgcct ccctcatcaa gctcctcttc
     1081 acctacggct cctaccgccg tctcctttcc accgccttcc tgagggacga ctccgccatc
     1141 aaacacatgt acttcaacgt ggggtcgtca gggagattga tattgcacgt gctggcgaac
     1201 aacaccgccc cgcgcgtctt cgaggacatc ctgctgaccc tggcccccaa gctgatccaa
     1261 cggaaactca gagctaagga ctatgactaa
//
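Comparing the standard error listing with the feature table suggests how the two coordinate styles relate: the listing appears to use 0-based, end-exclusive positions (written `end:start` for the (-) strand), while GenBank locations are 1-based and inclusive. The converter below is a sketch under that inferred convention; `to_genbank_location` is a hypothetical helper and not part of SeqBuddy.

```python
def to_genbank_location(coords, strand):
    """Convert a 'first:second' pair from the ORF listing to a GenBank location.

    Assumes (+) entries are 'start:end' and (-) entries are 'end:start',
    both 0-based and end-exclusive on the forward strand.
    """
    first, second = (int(value) for value in coords.split(":"))
    if strand == "+":
        return f"{first + 1}..{second}"
    return f"complement({second + 1}..{first})"


print(to_genbank_location("0:1290", "+"))     # 1..1290
print(to_genbank_location("1229:1091", "-"))  # complement(1092..1229)
print(to_genbank_location("25:19", "-"))      # complement(20..25)
```

These three results match the corresponding `orf` features in the GenBank output above.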

Usage example 3

If you want to set a lower limit on ORF size, simply pass in an integer.

$: sb Mle-Panxα10.fa -orf 500

Output

# Mle-Panxα10A
(+) ORFs: 0:1290
(-) ORFs: 1229:656

>Mle-Panxα10A cDNA - ML32831a-1.
atgcgtttatcagaaaagtctacatcacacgattgcaaagcttgcatcacacgatcgcac
aacgaagattgtgctaggagatggggtataacgatagatgacgggtgggatcaactcaat
cggagttttatgttcggcctgctcgttgtcatgggaacgactgtcactgtccggcaatac
accggcagtgtcatcagttgtgacggcttcaaaaagtttggatctacttttgcggaggat
tactgttggacccagggacagtacacagttttagaaggatatgaccaacccaaccaaaat
atcccctgcccattgcccgctgcgtttgctccgtaccccgggattttcccggaggagcta
tcgcactgtctggttggcgctcgtaaagccggccagtctgaggacctgattaacggtacg
aggttaaagtgcccagaccctgatcagttgttgtcaccgacgcggatatcccacctatgg
taccagtgggtccctttttacttctggctggcggctgctgccttcttcatgccctaccta
ttgtacaagaattttggcataggagatatcaagcctctcgtgagatttctacacaatcca
gtagaatcagaccaggaattgaagaagatgacagacaaggccgcaacctggctgttctac
aagtttgacctgtacatgagcgaacagtcgctcctagcaagtctgaccaataaacacggt
cttggtctatctgtggtctttgtaaagatcctatatgccgcagtttcgttcgggtgtttt
ctcctgaccgctgatatgttctcaattggagatttcaaaacctatggatcagaatggatt
aataagttgaagttggaagataatctagctacggaggaaaaggataagctttttcctaaa
atggtggcatgtgaagtgaaacgctggggtgcatcaggtattgaagaggaacaagggatg
tgtgtcctggcccccaacgtaatcaaccaatacctattccttattctctggttctgtctg
gtattcgtgatgttctgcaacattgtctccatattcgcctccctcatcaagctcctcttc
acctacggctcctaccgccgtctcctttccaccgccttcctgagggacgactccgccatc
aaacacatgtacttcaacgtggggtcgtcagggagattgatattgcacgtgctggcgaac
aacaccgccccgcgcgtcttcgaggacatcctgctgaccctggcccccaagctgatccaa
cggaaactcagagctaaggactatgactaa

Usage example 4

If you are not interested in the reverse complement ORFs, pass in the word 'false'.

$: sb Mle-Panxα10.fa -orf false

Output

# Mle-Panxα10A
(+) ORFs: 0:1290

>Mle-Panxα10A cDNA - ML32831a-1.
atgcgtttatcagaaaagtctacatcacacgattgcaaagcttgcatcacacgatcgcac
aacgaagattgtgctaggagatggggtataacgatagatgacgggtgggatcaactcaat
cggagttttatgttcggcctgctcgttgtcatgggaacgactgtcactgtccggcaatac
accggcagtgtcatcagttgtgacggcttcaaaaagtttggatctacttttgcggaggat
tactgttggacccagggacagtacacagttttagaaggatatgaccaacccaaccaaaat
atcccctgcccattgcccgctgcgtttgctccgtaccccgggattttcccggaggagcta
tcgcactgtctggttggcgctcgtaaagccggccagtctgaggacctgattaacggtacg
aggttaaagtgcccagaccctgatcagttgttgtcaccgacgcggatatcccacctatgg
taccagtgggtccctttttacttctggctggcggctgctgccttcttcatgccctaccta
ttgtacaagaattttggcataggagatatcaagcctctcgtgagatttctacacaatcca
gtagaatcagaccaggaattgaagaagatgacagacaaggccgcaacctggctgttctac
aagtttgacctgtacatgagcgaacagtcgctcctagcaagtctgaccaataaacacggt
cttggtctatctgtggtctttgtaaagatcctatatgccgcagtttcgttcgggtgtttt
ctcctgaccgctgatatgttctcaattggagatttcaaaacctatggatcagaatggatt
aataagttgaagttggaagataatctagctacggaggaaaaggataagctttttcctaaa
atggtggcatgtgaagtgaaacgctggggtgcatcaggtattgaagaggaacaagggatg
tgtgtcctggcccccaacgtaatcaaccaatacctattccttattctctggttctgtctg
gtattcgtgatgttctgcaacattgtctccatattcgcctccctcatcaagctcctcttc
acctacggctcctaccgccgtctcctttccaccgccttcctgagggacgactccgccatc
aaacacatgtacttcaacgtggggtcgtcagggagattgatattgcacgtgctggcgaac
aacaccgccccgcgcgtcttcgaggacatcctgctgaccctggcccccaagctgatccaa
cggaaactcagagctaaggactatgactaa
