<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta content="en-us" http-equiv="Content-Language"/>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<meta content="no-cache, no-store, must-revalidate" http-equiv="Cache-Control"/>
<meta content="no-cache" http-equiv="Pragma"/>
<meta content="0" http-equiv="Expires"/>
<title>
SEARCH_16S algorithm
</title>
<link href="stylesx.css" rel="stylesheet" type="text/css"/>
<style type="text/css">
body.c4 {background-color:#c0c0c0;}
div.c3 {position:absolute; top:45px; left:20px; width:830px; background-color:#ffffff; border-width:10px; border-style:solid;border-color:white;}
span.c2 {font-weight: bold}
div.c1 {position:absolute; top:10px; left:20px; width:850px; height:60px;}
.TopButtonPara { color:white; background-color:rgb(50,100,150); border-color:rgb(50,100,150); font-family:Arial, Helvetica, sans-serif; font-weight:normal; font-size:9pt; text-align:center; border-width:4px; border-style:solid; }
.TopButton { color:white; }
a.TopButton:link { text-decoration:none; }
a.TopButton:visited { text-decoration:none; }
a.TopButton:hover { color:orange; }
.NewButtonPara { color:white; background-color:rgb(50,100,150); border-color:rgb(50,100,150); font-family:Arial, Helvetica, sans-serif; font-weight:normal; font-size:9pt; text-align:center; border-width:4px; border-style:solid; }
.NewButton { color:white; }
a.NewButton:link { text-decoration:none; }
a.NewButton:visited { text-decoration:none; }
a.NewButton:hover { color:orange; }
.SideButtonPara { color:white; font-family:Arial, Helvetica, sans-serif; font-size:9pt; font-weight:normal; text-align:center; line-height:18px; }
.SideButton { color:white; }
a.SideButton:link { text-decoration:none; }
a.SideButton:visited { text-decoration:none; }
a.SideButton:hover { color:orange; }
</style>
</head>
<body style="background-color:#c0c0c0;">
<div>
<a href="https://drive5.com/usearch">
<img alt="USEARCH v12" src="usearch12_banner.jpg" style="position:absolute; top:40px; left:10px; padding:0px; border:0px;"/>
</a>
</div>
<div style="position:absolute; top:115px; left:10px; width:850px; background-color:#ffffff; min-height:500px">
<div style="position:relative; float:left; background-color:#696969; width:125px; left: 0px; min-height:500px; padding:5px; height: 125px;">
<div class="SideButtonPara" style="text-align:center; padding-top:5px;">
<a class="SideButton" href="index.html">
Docs home
</a>
<br/>
<hr style="border:0; border-bottom: 1px solid white;"/>
<a class="SideButton" href="cmds.html">
Commands
</a>
<br/>
<a class="SideButton" href="topics.html">
Topics
</a>
<br/>
<a class="SideButton" href="citation.html">
Publications
</a>
<br/>
</div>
</div>
<div class="ManText" style="left:20px; position: absolute; left:135px; width:695px; background-color:white; padding:10px">
<h1>
SEARCH_16S algorithm
</h1>
<p class="auto-style1">
See also
<br/>
<a href="citation.html">
SEARCH_16S paper
</a>
<br/>
<a href="cmd_search_16s.html">
search_16s command
</a>
</p>
<p class="auto-style1">
The SEARCH_16S algorithm searches for 16S genes in long sequences such as chromosomes and contigs. It identifies segments with a high frequency of 13-mers that occur in known 16S genes (<em>signature words</em>), then searches within each such segment for conserved motifs close to the beginning and end of the gene. Finding a pair of motifs within the expected length range confirms the presence of the gene and provides consistent, homologous endpoints. It would be preferable to identify the true endpoints of the functional sequence, but the 16S gene is spliced out of the ribosomal operon by mechanisms that are not fully understood, and it lacks known sequence signals analogous to the start and stop codons of protein-coding genes. I validated SEARCH_16S on finished prokaryotic genomes and curated SSU databases, and found &gt;99% sensitivity to known genes and no unambiguous false positives in control datasets of metazoan and random sequences. Details are in
<a href="citation.html">
the paper
</a>
.
</p>
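<p class="auto-style1">
As a rough illustration of the motif-pairing step described above, the Python sketch below reports pairs of start and end motifs whose span falls within an expected length range. It is not taken from the USEARCH source: the function name find_motif_pairs, the toy motifs AAGC and CCTA, the length limits and the use of exact string matching are all assumptions made for the example. See the paper for the real motifs and matching rules.
</p>
<pre>
```python
def find_all(text, motif):
    """Return every start position of motif in text (exact matches only)."""
    positions = []
    pos = text.find(motif)
    while pos != -1:
        positions.append(pos)
        pos = text.find(motif, pos + 1)
    return positions


def find_motif_pairs(segment, start_motif, end_motif, min_len, max_len):
    """Return (start, end) pairs with min_len to max_len bases between them.

    start is where start_motif begins and end is one past the last base
    of end_motif, so end - start is the length of the candidate gene.
    Hypothetical helper: exact matching is an assumption of this sketch.
    """
    starts = find_all(segment, start_motif)
    ends = [p + len(end_motif) for p in find_all(segment, end_motif)]
    pairs = []
    for s in starts:
        for e in ends:
            # Keep the pair only if its length is in the expected range.
            if (e - s) in range(min_len, max_len + 1):
                pairs.append((s, e))
    return pairs


# Toy segment: 4 flanking bases, start motif, 10 filler bases, end motif.
segment = "TTTT" + "AAGC" + "G" * 10 + "CCTA" + "TT"
pairs = find_motif_pairs(segment, "AAGC", "CCTA", 15, 25)
print(pairs)
```
</pre>
<p class="auto-style1">
In this toy segment the two motifs span 18 bases, so the pair is reported when the allowed range is 15 to 25 and rejected when the range is 30 to 40.
</p>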
<p class="auto-style1">
<img alt="Image" src="search_16s_coli.gif"/>
</p>
<p class="auto-style1">
<strong>
SEARCH_16S identifies two genes in a region of the <em>E. coli</em> chromosome reverse strand.
</strong>
(Figure from
<a href="citation.html">
SEARCH_16S paper
</a>
.)
<br/>
The top panel shows the density of signature 13-mers in windows of length 1,000 bp for positions 1,108,000 to 1,284,000 of GenBank sequence AP009048.1. Most positions have a density close to the expected background of ~120 words per window. The two 16S genes in this region (green bars) are visible as spikes where the density approaches 1,000. The lower panel shows positions 1,216,000 to 1,220,000, where the second gene is located. The density profile is trapezoidal because windows that overlap the beginning or end of the gene contain a mixture of 16S and non-16S words, so the density rises and falls gradually at the edges. The flat peak of approximately 500 bp comes from windows that contain only 16S words, which is consistent with a gene of about 1,500 bp scanned with a 1,000 bp window. The boundary motifs are found at positions 1,217,327 (C11F) and 1,218,860 (C1512R).
<br/>
</p>
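<p class="auto-style1">
The trapezoidal profile described in the caption can be reproduced with a small Python sketch of a sliding-window word count. This is an illustration under simplifying assumptions, not the USEARCH implementation: the function name signature_density, the toy sequence, the word length of 2 and the window of 4 word positions are all hypothetical, chosen so that the result can be checked by hand.
</p>
<pre>
```python
def signature_density(seq, signature_words, k, window):
    """Count signature k-mers in each run of `window` k-mer start positions."""
    # hits[i] is 1 if the k-mer starting at position i is a signature word.
    hits = [1 if seq[i:i + k] in signature_words else 0
            for i in range(len(seq) - k + 1)]
    # Sliding sum: add the position entering the window, drop the one leaving.
    running = sum(hits[:window])
    densities = [running]
    for i in range(window, len(hits)):
        running += hits[i] - hits[i - window]
        densities.append(running)
    return densities


# Toy "chromosome": background A's around an 8-base "gene" made of C's.
seq = "A" * 10 + "C" * 8 + "A" * 10
density = signature_density(seq, {"CC"}, k=2, window=4)
print(density[7:17])
```
</pre>
<p class="auto-style1">
The printed slice rises one step at a time, stays flat at the window size while the window lies entirely inside the toy gene, then falls again. The flat part covers 4 windows, which equals the 7 signature word positions minus the window of 4, plus 1. The same relationship between gene length and window length gives the flat peak in the figure.
</p>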
</div>
</div>
</body>
</html>