SB Find orfs
Identify open reading frames (ORFs) in each sequence by looking for a start codon followed by an in-frame stop codon. This search is somewhat greedy, in that smaller ORFs contained within a larger one will not be reported. The exact regular expression for DNA sequences is:
atg(?:...)*?(?:taa|tag|tga)
By default, both the forward strand and the reverse complement of each sequence are searched for ORFs, although the reverse-complement search can be suppressed (see the last example below).
The ORFs are listed on standard error, and are added as sequence features if the output format supports rich annotation (e.g., GenBank or EMBL).
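As a rough illustration of how the expression above behaves, the following Python sketch applies it to a sequence and to its reverse complement. The helper names `find_orfs` and `reverse_complement` are hypothetical, and this is not SeqBuddy's own code: the real tool may scan, filter, and report coordinates differently.

```python
import re

# The pattern quoted above: a start codon, a lazy run of whole codons,
# then the first in-frame stop codon.
ORF_PATTERN = re.compile(r"atg(?:...)*?(?:taa|tag|tga)")


def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA string (hypothetical helper)."""
    return seq.translate(str.maketrans("acgt", "tgca"))[::-1]


def find_orfs(seq):
    """Return 0-based, end-exclusive spans of non-overlapping matches (hypothetical helper)."""
    return [match.span() for match in ORF_PATTERN.finditer(seq.lower())]


seq = "ccatgaaataggg"
forward_orfs = find_orfs(seq)                      # search the sequence as given
reverse_orfs = find_orfs(reverse_complement(seq))  # search the opposite strand
```

Because `finditer` resumes scanning at the end of each match, a start codon that lies inside an already matched ORF is never reported on its own, which is one way to get the behavior described above.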
The following arguments are implemented in version 1.3:
Optional. Limit the number of results by specifying a minimum ORF length. Note that ORF lengths must be divisible by three, so your input here will be rounded up to the nearest multiple of three.
Optional. Passing in the word 'false' disables searching the reverse complement of sequences for ORFs.
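The rounding applied to the minimum length can be sketched in a few lines. The function name `round_up_to_codon` is made up for this example, and SeqBuddy's internal implementation may differ.

```python
def round_up_to_codon(min_size):
    """Round a requested minimum ORF length up to the next multiple of three."""
    # Negating before and after floor division turns it into ceiling division.
    return -(-min_size // 3) * 3
```

Under this scheme a request of 500 would be treated as 501, while 501 would be left unchanged.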
>Mle-Panxα10A cDNA - ML32831a-1.
atgcgtttatcagaaaagtctacatcacacgattgcaaagcttgcatcacacgatcgcac
aacgaagattgtgctaggagatggggtataacgatagatgacgggtgggatcaactcaat
cggagttttatgttcggcctgctcgttgtcatgggaacgactgtcactgtccggcaatac
accggcagtgtcatcagttgtgacggcttcaaaaagtttggatctacttttgcggaggat
tactgttggacccagggacagtacacagttttagaaggatatgaccaacccaaccaaaat
atcccctgcccattgcccgctgcgtttgctccgtaccccgggattttcccggaggagcta
tcgcactgtctggttggcgctcgtaaagccggccagtctgaggacctgattaacggtacg
aggttaaagtgcccagaccctgatcagttgttgtcaccgacgcggatatcccacctatgg
taccagtgggtccctttttacttctggctggcggctgctgccttcttcatgccctaccta
ttgtacaagaattttggcataggagatatcaagcctctcgtgagatttctacacaatcca
gtagaatcagaccaggaattgaagaagatgacagacaaggccgcaacctggctgttctac
aagtttgacctgtacatgagcgaacagtcgctcctagcaagtctgaccaataaacacggt
cttggtctatctgtggtctttgtaaagatcctatatgccgcagtttcgttcgggtgtttt
ctcctgaccgctgatatgttctcaattggagatttcaaaacctatggatcagaatggatt
aataagttgaagttggaagataatctagctacggaggaaaaggataagctttttcctaaa
atggtggcatgtgaagtgaaacgctggggtgcatcaggtattgaagaggaacaagggatg
tgtgtcctggcccccaacgtaatcaaccaatacctattccttattctctggttctgtctg
gtattcgtgatgttctgcaacattgtctccatattcgcctccctcatcaagctcctcttc
acctacggctcctaccgccgtctcctttccaccgccttcctgagggacgactccgccatc
aaacacatgtacttcaacgtggggtcgtcagggagattgatattgcacgtgctggcgaac
aacaccgccccgcgcgtcttcgaggacatcctgctgaccctggcccccaagctgatccaa
cggaaactcagagctaaggactatgactaa
All default settings.
$: sb Mle-Panxα10.fa -orf
# Mle-Panxα10A
(+) ORFs: 0:1290
(-) ORFs: 1229:1091, 1067:821, 677:656, 560:533, 530:497, 313:223, 194:110, 25:19
>Mle-Panxα10A cDNA - ML32831a-1.
atgcgtttatcagaaaagtctacatcacacgattgcaaagcttgcatcacacgatcgcac
aacgaagattgtgctaggagatggggtataacgatagatgacgggtgggatcaactcaat
cggagttttatgttcggcctgctcgttgtcatgggaacgactgtcactgtccggcaatac
accggcagtgtcatcagttgtgacggcttcaaaaagtttggatctacttttgcggaggat
tactgttggacccagggacagtacacagttttagaaggatatgaccaacccaaccaaaat
atcccctgcccattgcccgctgcgtttgctccgtaccccgggattttcccggaggagcta
tcgcactgtctggttggcgctcgtaaagccggccagtctgaggacctgattaacggtacg
aggttaaagtgcccagaccctgatcagttgttgtcaccgacgcggatatcccacctatgg
taccagtgggtccctttttacttctggctggcggctgctgccttcttcatgccctaccta
ttgtacaagaattttggcataggagatatcaagcctctcgtgagatttctacacaatcca
gtagaatcagaccaggaattgaagaagatgacagacaaggccgcaacctggctgttctac
aagtttgacctgtacatgagcgaacagtcgctcctagcaagtctgaccaataaacacggt
cttggtctatctgtggtctttgtaaagatcctatatgccgcagtttcgttcgggtgtttt
ctcctgaccgctgatatgttctcaattggagatttcaaaacctatggatcagaatggatt
aataagttgaagttggaagataatctagctacggaggaaaaggataagctttttcctaaa
atggtggcatgtgaagtgaaacgctggggtgcatcaggtattgaagaggaacaagggatg
tgtgtcctggcccccaacgtaatcaaccaatacctattccttattctctggttctgtctg
gtattcgtgatgttctgcaacattgtctccatattcgcctccctcatcaagctcctcttc
acctacggctcctaccgccgtctcctttccaccgccttcctgagggacgactccgccatc
aaacacatgtacttcaacgtggggtcgtcagggagattgatattgcacgtgctggcgaac
aacaccgccccgcgcgtcttcgaggacatcctgctgaccctggcccccaagctgatccaa
cggaaactcagagctaaggactatgactaa
When the output format is GenBank, each ORF is also annotated as a feature.
$: sb Mle-Panxα10.fa -orf -o genbank
# Mle-Panxα10A
(+) ORFs: 0:1290
(-) ORFs: 1229:1091, 1067:821, 677:656, 560:533, 530:497, 313:223, 194:110, 25:19
LOCUS Mle-Panxα10A 1290 bp DNA UNK 01-JAN-1980
DEFINITION Mle-Panxα10A cDNA - ML32831a-1.
ACCESSION Mle-Panxα10A
VERSION Mle-Panxα10A
KEYWORDS .
SOURCE .
ORGANISM .
.
FEATURES Location/Qualifiers
orf 1..1290
/added_by="SeqBuddy"
orf complement(1092..1229)
/added_by="SeqBuddy"
orf complement(822..1067)
/added_by="SeqBuddy"
orf complement(657..677)
/added_by="SeqBuddy"
orf complement(534..560)
/added_by="SeqBuddy"
orf complement(498..530)
/added_by="SeqBuddy"
orf complement(224..313)
/added_by="SeqBuddy"
orf complement(111..194)
/added_by="SeqBuddy"
orf complement(20..25)
/added_by="SeqBuddy"
ORIGIN
1 atgcgtttat cagaaaagtc tacatcacac gattgcaaag cttgcatcac acgatcgcac
61 aacgaagatt gtgctaggag atggggtata acgatagatg acgggtggga tcaactcaat
121 cggagtttta tgttcggcct gctcgttgtc atgggaacga ctgtcactgt ccggcaatac
181 accggcagtg tcatcagttg tgacggcttc aaaaagtttg gatctacttt tgcggaggat
241 tactgttgga cccagggaca gtacacagtt ttagaaggat atgaccaacc caaccaaaat
301 atcccctgcc cattgcccgc tgcgtttgct ccgtaccccg ggattttccc ggaggagcta
361 tcgcactgtc tggttggcgc tcgtaaagcc ggccagtctg aggacctgat taacggtacg
421 aggttaaagt gcccagaccc tgatcagttg ttgtcaccga cgcggatatc ccacctatgg
481 taccagtggg tcccttttta cttctggctg gcggctgctg ccttcttcat gccctaccta
541 ttgtacaaga attttggcat aggagatatc aagcctctcg tgagatttct acacaatcca
601 gtagaatcag accaggaatt gaagaagatg acagacaagg ccgcaacctg gctgttctac
661 aagtttgacc tgtacatgag cgaacagtcg ctcctagcaa gtctgaccaa taaacacggt
721 cttggtctat ctgtggtctt tgtaaagatc ctatatgccg cagtttcgtt cgggtgtttt
781 ctcctgaccg ctgatatgtt ctcaattgga gatttcaaaa cctatggatc agaatggatt
841 aataagttga agttggaaga taatctagct acggaggaaa aggataagct ttttcctaaa
901 atggtggcat gtgaagtgaa acgctggggt gcatcaggta ttgaagagga acaagggatg
961 tgtgtcctgg cccccaacgt aatcaaccaa tacctattcc ttattctctg gttctgtctg
1021 gtattcgtga tgttctgcaa cattgtctcc atattcgcct ccctcatcaa gctcctcttc
1081 acctacggct cctaccgccg tctcctttcc accgccttcc tgagggacga ctccgccatc
1141 aaacacatgt acttcaacgt ggggtcgtca gggagattga tattgcacgt gctggcgaac
1201 aacaccgccc cgcgcgtctt cgaggacatc ctgctgaccc tggcccccaa gctgatccaa
1261 cggaaactca gagctaagga ctatgactaa
//
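Comparing the coordinate list with the feature table above suggests that the listed positions are 0-based and end-exclusive, while GenBank locations are 1-based and inclusive. The sketch below reproduces that mapping. It is inferred from this example output only, and `to_genbank_location` is a hypothetical helper, not part of SeqBuddy.

```python
def to_genbank_location(first, second, strand):
    """Convert a listed coordinate pair into a GenBank-style location string.

    Assumes the pair is 0-based and end-exclusive, and that minus-strand
    pairs are written high:low, as in the example output.
    """
    low, high = sorted((first, second))
    location = f"{low + 1}..{high}"
    return location if strand == "+" else f"complement({location})"
```

For instance, the pair 1229:1091 on the minus strand corresponds to `complement(1092..1229)` in the feature table.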
If you want to set a lower limit on ORF size, simply pass in an integer.
$: sb Mle-Panxα10.fa -orf 500
# Mle-Panxα10A
(+) ORFs: 0:1290
(-) ORFs: 1229:656
>Mle-Panxα10A cDNA - ML32831a-1.
atgcgtttatcagaaaagtctacatcacacgattgcaaagcttgcatcacacgatcgcac
aacgaagattgtgctaggagatggggtataacgatagatgacgggtgggatcaactcaat
cggagttttatgttcggcctgctcgttgtcatgggaacgactgtcactgtccggcaatac
accggcagtgtcatcagttgtgacggcttcaaaaagtttggatctacttttgcggaggat
tactgttggacccagggacagtacacagttttagaaggatatgaccaacccaaccaaaat
atcccctgcccattgcccgctgcgtttgctccgtaccccgggattttcccggaggagcta
tcgcactgtctggttggcgctcgtaaagccggccagtctgaggacctgattaacggtacg
aggttaaagtgcccagaccctgatcagttgttgtcaccgacgcggatatcccacctatgg
taccagtgggtccctttttacttctggctggcggctgctgccttcttcatgccctaccta
ttgtacaagaattttggcataggagatatcaagcctctcgtgagatttctacacaatcca
gtagaatcagaccaggaattgaagaagatgacagacaaggccgcaacctggctgttctac
aagtttgacctgtacatgagcgaacagtcgctcctagcaagtctgaccaataaacacggt
cttggtctatctgtggtctttgtaaagatcctatatgccgcagtttcgttcgggtgtttt
ctcctgaccgctgatatgttctcaattggagatttcaaaacctatggatcagaatggatt
aataagttgaagttggaagataatctagctacggaggaaaaggataagctttttcctaaa
atggtggcatgtgaagtgaaacgctggggtgcatcaggtattgaagaggaacaagggatg
tgtgtcctggcccccaacgtaatcaaccaatacctattccttattctctggttctgtctg
gtattcgtgatgttctgcaacattgtctccatattcgcctccctcatcaagctcctcttc
acctacggctcctaccgccgtctcctttccaccgccttcctgagggacgactccgccatc
aaacacatgtacttcaacgtggggtcgtcagggagattgatattgcacgtgctggcgaac
aacaccgccccgcgcgtcttcgaggacatcctgctgaccctggcccccaagctgatccaa
cggaaactcagagctaaggactatgactaa
If you are not interested in the reverse complement ORFs, pass in the word 'false'.
$: sb Mle-Panxα10.fa -orf false
# Mle-Panxα10A
(+) ORFs: 0:1290
>Mle-Panxα10A cDNA - ML32831a-1.
atgcgtttatcagaaaagtctacatcacacgattgcaaagcttgcatcacacgatcgcac
aacgaagattgtgctaggagatggggtataacgatagatgacgggtgggatcaactcaat
cggagttttatgttcggcctgctcgttgtcatgggaacgactgtcactgtccggcaatac
accggcagtgtcatcagttgtgacggcttcaaaaagtttggatctacttttgcggaggat
tactgttggacccagggacagtacacagttttagaaggatatgaccaacccaaccaaaat
atcccctgcccattgcccgctgcgtttgctccgtaccccgggattttcccggaggagcta
tcgcactgtctggttggcgctcgtaaagccggccagtctgaggacctgattaacggtacg
aggttaaagtgcccagaccctgatcagttgttgtcaccgacgcggatatcccacctatgg
taccagtgggtccctttttacttctggctggcggctgctgccttcttcatgccctaccta
ttgtacaagaattttggcataggagatatcaagcctctcgtgagatttctacacaatcca
gtagaatcagaccaggaattgaagaagatgacagacaaggccgcaacctggctgttctac
aagtttgacctgtacatgagcgaacagtcgctcctagcaagtctgaccaataaacacggt
cttggtctatctgtggtctttgtaaagatcctatatgccgcagtttcgttcgggtgtttt
ctcctgaccgctgatatgttctcaattggagatttcaaaacctatggatcagaatggatt
aataagttgaagttggaagataatctagctacggaggaaaaggataagctttttcctaaa
atggtggcatgtgaagtgaaacgctggggtgcatcaggtattgaagaggaacaagggatg
tgtgtcctggcccccaacgtaatcaaccaatacctattccttattctctggttctgtctg
gtattcgtgatgttctgcaacattgtctccatattcgcctccctcatcaagctcctcttc
acctacggctcctaccgccgtctcctttccaccgccttcctgagggacgactccgccatc
aaacacatgtacttcaacgtggggtcgtcagggagattgatattgcacgtgctggcgaac
aacaccgccccgcgcgtcttcgaggacatcctgctgaccctggcccccaagctgatccaa
cggaaactcagagctaaggactatgactaa