<!DOCTYPE html>
<html lang="en">
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js"></script>
</head>
<body ng-app="app">
<div class="container">
<h2>Metascape Gene List Analysis Report</h2><p/> <a href="http://metascape.org">metascape.org</a><sup>1</sup><p/>
<h3>Bar Graph Summary</h3>
<div class="panel panel-info">
<div class="panel-heading">Figure 1. Bar graph of enriched terms across input gene lists, colored by p-values.</div>
<div class="panel-body">
<table>
<tr>
<td>
<img src="./Enrichment_heatmap/HeatmapSelectedGO.png" style="width:1000px;">
</td>
</tr>
<tr>
<td align='center'>
<a href='./Enrichment_heatmap/HeatmapSelectedGO.pdf' title='download PDF file'>
<img class='link' src='icon/PDF48.png' >
</a>
</td>
</tr>
</table>
</div>
</div><p/>
<h3>Gene Lists</h3>
User-provided gene identifiers are first converted into their corresponding M. musculus Entrez gene IDs using the latest version of the database (last updated on 2024-09-01). If multiple identifiers correspond to the same Entrez gene ID, they will be considered as a single Entrez gene ID in downstream analyses. The gene lists are summarized in Table 1.<p/>
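The collapsing step described above can be sketched in Python. This is a minimal illustration, not Metascape's implementation: the mapping table and identifiers are hypothetical, and reading "Total" as the number of mapped identifiers and "Unique" as the number of distinct Entrez gene IDs is an assumption.<p/>
<pre>
```python
def summarize_gene_list(identifiers, id_to_entrez):
    """Return (total, unique): mapped identifiers and distinct Entrez gene IDs."""
    # Keep only identifiers that the (hypothetical) mapping table can resolve.
    mapped = [id_to_entrez[i] for i in identifiers if i in id_to_entrez]
    # Identifiers that share an Entrez gene ID collapse into a single entry.
    return len(mapped), len(set(mapped))


# Hypothetical mapping: two transcript-level IDs resolve to the same gene.
example_map = {"tx_a1": 101, "tx_a2": 101, "tx_b": 202}
total, unique = summarize_gene_list(
    ["tx_a1", "tx_a2", "tx_b", "tx_unknown"], example_map
)
print(total, unique)  # prints: 3 2
```
</pre>
In Table 1 the two counts are equal (30 and 30), which is consistent with no two input identifiers sharing an Entrez gene ID.<p/>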
<div class="panel panel-info">
<div class="panel-heading">Table 1. Statistics of input gene lists.</div>
<div class="panel-body"><TABLE class="table">
<THEAD>
<TR>
<TH class="info">Name</TH>
<TH class="info">Total</TH>
<TH class="info">Unique</TH>
</TR>
</THEAD>
<TBODY>
<TR>
<TD>TranscriptID</TD>
<TD>30</TD>
<TD>30</TD>
</TR>
</TBODY>
</TABLE></div>
</div>
<h3>Pathway and Process Enrichment Analysis</h3>
For each given gene list, pathway and process enrichment analysis has been carried out with the following ontology source: KEGG Pathway. All genes in the genome have been used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed count and the count expected by chance) are collected and grouped into clusters based on their membership similarities. More specifically, p-values are calculated based on the cumulative hypergeometric distribution<sup>2</sup>, and q-values are calculated using the Benjamini-Hochberg procedure to account for multiple testing<sup>3</sup>. Kappa scores<sup>4</sup> are used as the similarity metric when performing hierarchical clustering on the enriched terms, and sub-trees with a similarity of > 0.3 are considered a cluster. The most statistically significant term within a cluster is chosen to represent the cluster.<p/>
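The p-value and enrichment factor described above can be sketched in Python using only the standard library. This is an illustrative sketch, not Metascape's code. The function names are hypothetical, and the background size N and term size K in the example are placeholders, because this report does not state them. The value n = 27 is the number of annotated input genes implied by the Count and % columns of Table 2.<p/>
<pre>
```python
from math import comb, log10


def hypergeom_tail(k, N, K, n):
    """P(X >= k): chance of k or more term genes among n drawn from N (K in term)."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def enrichment_factor(k, N, K, n):
    """Observed count divided by the count expected by chance, n * K / N."""
    return k / (n * K / N)


def passes_filter(k, N, K, n):
    """Apply the three thresholds quoted in the text above."""
    p = hypergeom_tail(k, N, K, n)
    return 0.01 > p and k >= 3 and enrichment_factor(k, N, K, n) > 1.5


# k = 9 hits among n = 27 annotated input genes (implied by Table 2);
# N and K are hypothetical placeholders, not values from this analysis.
N, K, n, k = 20000, 60, 27, 9
p = hypergeom_tail(k, N, K, n)
print(round(log10(p), 2), round(enrichment_factor(k, N, K, n), 1), passes_filter(k, N, K, n))
```
</pre>
Exact integer binomials keep the tail sum accurate for very small p-values. A statistics library's hypergeometric survival function would be the usual choice for larger inputs.<p/>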
<div class="panel panel-info">
<div class="panel-heading">Table 2. Top 2 clusters with their representative enriched terms (one per cluster). "GO" is the term identifier (here, a KEGG pathway ID). "Count" is the number of genes in the user-provided lists with membership in the given ontology term. "%" is the percentage of all of the user-provided genes that are found in the given ontology term (only input genes with at least one ontology term annotation are included in the calculation). "Log10(P)" is the base-10 logarithm of the p-value. "Log10(q)" is the base-10 logarithm of the multi-test adjusted p-value.</div>
<div class="panel-body"><TABLE class="table">
<THEAD>
<TR>
<TH class="info">GO</TH>
<TH class="info">Category</TH>
<TH class="info">Description</TH>
<TH class="info">Count</TH>
<TH class="info">%</TH>
<TH class="info">Log10(P)</TH>
<TH class="info">Log10(q)</TH>
</TR>
</THEAD>
<TBODY>
<TR>
<TD>mmu04940</TD>
<TD>KEGG Pathway</TD>
<TD>Type I diabetes mellitus - Mus musculus (house mouse)</TD>
<TD>9</TD>
<TD>33.33</TD>
<TD>-16.13</TD>
<TD>-13.58</TD>
</TR>
<TR>
<TD>mmu05150</TD>
<TD>KEGG Pathway</TD>
<TD>Staphylococcus aureus infection - Mus musculus (house mouse)</TD>
<TD>4</TD>
<TD>14.81</TD>
<TD>-4.71</TD>
<TD>-3.44</TD>
</TR>
</TBODY>
</TABLE></div>
</div><p/>
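The "%" and "Log10(q)" columns of Table 2 can be illustrated with a short Python sketch. The adjustment below is the standard Benjamini-Hochberg step-up procedure. Metascape's own implementation and the number of terms it tested are not given in this report, so the p-values in the example are hypothetical and the function names are illustrative only.<p/>
<pre>
```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def percent_in_term(count, annotated_genes):
    """The "%" column: share of annotated input genes that fall in the term."""
    return round(100 * count / annotated_genes, 2)


# Hypothetical p-values for four tested terms (not taken from this report).
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
# Table 2 counts with the 27 annotated genes implied by its "%" column.
print(percent_in_term(9, 27), percent_in_term(4, 27))  # prints: 33.33 14.81
```
</pre>
The second line shows that both percentages in Table 2 are consistent with 27 of the 30 input genes carrying at least one ontology annotation.<p/>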
To further capture the relationships between the terms, a subset of enriched terms has been selected and rendered as a network plot, where terms with a similarity > 0.3 are connected by edges. We select the terms with the best p-values from each of the 20 clusters, with the constraint that there are no more than 15 terms per cluster and no more than 250 terms in total. The network is visualized using <a href="http://www.cytoscape.org">Cytoscape</a><sup>5</sup>, where each node represents an enriched term and is colored first by its cluster ID (Figure 2.a) and then by its p-value (Figure 2.b). These networks can be interactively viewed in Cytoscape through the .cys files (contained in the Zip package, which also contains a publication-quality version as a PDF) or within a browser by clicking on the web icon. For clarity, term labels are only shown for one term per cluster, so it is recommended to use Cytoscape or a browser to visualize the network in order to inspect all node labels. We can also export the network into a PDF file within Cytoscape, and then edit the labels using Adobe Illustrator for publication purposes. To switch off all labels, delete the "Label" mapping under the "Style" tab within Cytoscape, and then export the network view.<p/>
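The edge rule above (connect terms whose similarity exceeds 0.3) can be sketched in Python with Cohen's kappa computed on binary gene membership. This is a hedged illustration: the gene universe over which Metascape evaluates kappa is not stated in this report, so the universe, term names, and gene names below are all hypothetical.<p/>
<pre>
```python
from itertools import combinations


def kappa_score(genes_a, genes_b, universe):
    """Cohen's kappa between two terms, treating gene membership as binary labels."""
    n = len(universe)
    a = genes_a.intersection(universe)
    b = genes_b.intersection(universe)
    both = len(a.intersection(b))
    neither = n - len(a.union(b))
    observed = (both + neither) / n
    # Chance agreement from each term's marginal membership frequency.
    p_a, p_b = len(a) / n, len(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # degenerate case: both terms empty or both cover everything
    return (observed - expected) / (1 - expected)


def similarity_edges(term_genes, universe, threshold=0.3):
    """List (term, term, kappa) for every pair whose kappa exceeds the threshold."""
    edges = []
    for t1, t2 in combinations(sorted(term_genes), 2):
        score = kappa_score(term_genes[t1], term_genes[t2], universe)
        if score > threshold:
            edges.append((t1, t2, score))
    return edges


# Hypothetical universe of ten genes and three hypothetical terms.
universe = {"g%d" % i for i in range(10)}
terms = {
    "term_a": {"g0", "g1", "g2", "g3"},
    "term_b": {"g0", "g1", "g2", "g4"},
    "term_c": {"g7", "g8"},
}
for t1, t2, score in similarity_edges(terms, universe):
    print(t1, t2, round(score, 3))  # prints: term_a term_b 0.583
```
</pre>
Only the pair sharing three of four genes is connected. The term with no shared genes gets a negative kappa against both others and stays isolated.<p/>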
<div class="panel panel-info">
<div class="panel-heading">Figure 2. Network of enriched terms: (a) colored by cluster ID, where nodes that share the same cluster ID are typically close to each other; (b) colored by p-value, where terms containing more genes tend to have a more significant p-value.</div>
<div class="panel-body">
<table>
<tr>
<td>
<img src="./Enrichment_GO/ColorByCluster.png" style="width:500px;">
</td>
<td>
<img src="./Enrichment_GO/ColorByPValue.png" style="width:500px;">
</td>
</tr>
<tr>
<td align='center'>
<a href='./Enrichment_GO/ColorByCluster.pdf' title='download PDF file'>
<img class='link' src='icon/PDF48.png' >
</a>
<a href='./Enrichment_GO/GONetwork.cys' title='download CYS file'>
<img class='link' src='icon/CYS48.png' >
</a>
<a href='Enrichment_GO/GONetwork.html?Network=GONetwork&Style=ColorByCluster' title='interactive cytoscape' target='_blank'>
<img class='link' src='icon/WEB_CYS48.png' >
</a>
</td>
<td align='center'>
<a href='./Enrichment_GO/ColorByPValue.pdf' title='download PDF file'>
<img class='link' src='icon/PDF48.png' >
</a>
<a href='./Enrichment_GO/GONetwork.cys' title='download CYS file'>
<img class='link' src='icon/CYS48.png' >
</a>
<a href='Enrichment_GO/GONetwork.html?Network=GONetwork&Style=ColorByPValue' title='interactive cytoscape' target='_blank'>
<img class='link' src='icon/WEB_CYS48.png' >
</a>
</td>
</tr>
</table>
</div>
</div><p/>
<h3>Protein-protein Interaction Enrichment Analysis</h3>
For each given gene list, protein-protein interaction enrichment analysis has been carried out with the following databases: STRING<sup>6</sup>, BioGRID<sup>7</sup>, OmniPath<sup>8</sup>, InWeb_IM<sup>9</sup>. Only physical interactions in STRING (physical score > 0.132) and BioGRID are used (<a href="http://metascape.org/blog/?p=219">details</a>). The resultant network contains the subset of proteins that form physical interactions with at least one other member in the list. If the network contains between 3 and 500 proteins, the Molecular Complex Detection (MCODE) algorithm<sup>10</sup> has been applied to identify densely connected network components. The MCODE networks identified for individual gene lists have been gathered and are shown in Figure 3.<p/>
Pathway and process enrichment analysis has been applied to each MCODE component independently, and the three best-scoring terms by p-value have been retained as the functional description of the corresponding components, shown in the tables underneath corresponding network plots within Figure 3.<p/>
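The network construction described above (keep physical interactions among list members, then check the size range before running MCODE) can be sketched in Python. This sketch does not implement MCODE itself. The protein names and scores are hypothetical, a single score threshold is applied to every edge for simplicity, and treating the range of 3 to 500 proteins as inclusive is an assumption.<p/>
<pre>
```python
def physical_subnetwork(interactions, gene_list, min_score=0.132):
    """Keep edges between list members whose physical score exceeds min_score."""
    members = set(gene_list)
    edges = [
        (a, b)
        for a, b, score in interactions
        if a in members and b in members and a != b and score > min_score
    ]
    # Only proteins with at least one retained interaction stay in the network.
    nodes = {protein for edge in edges for protein in edge}
    return nodes, edges


def eligible_for_mcode(nodes, low=3, high=500):
    """True when the network size falls in the range where MCODE is applied."""
    return high >= len(nodes) >= low


# Hypothetical interactions as (protein, protein, physical score) tuples.
interactions = [
    ("P1", "P2", 0.9),
    ("P2", "P3", 0.2),
    ("P3", "P4", 0.1),   # dropped: score not above the threshold
    ("P1", "P9", 0.8),   # dropped: P9 is not in the gene list
]
nodes, edges = physical_subnetwork(interactions, ["P1", "P2", "P3", "P4", "P5"])
print(sorted(nodes), eligible_for_mcode(nodes))  # prints: ['P1', 'P2', 'P3'] True
```
</pre>
Proteins without any retained interaction (P4 and P5 in this example) drop out of the network, which matches the rule that each node must interact with at least one other list member.<p/>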
<div class="panel panel-info">
<div class="panel-heading">Figure 3. Protein-protein interaction network and MCODE components identified in the gene lists.</div>
<div class="panel-body">
<table style="border-collapse:collapse;">
<tr><td><img src="./Enrichment_PPI/TranscriptID_PPIColorByCluster.png" style="width:500px;"></td><td width="10px"></td><td><img src="./Enrichment_PPI/TranscriptID_MCODE_ALL_PPIColorByCluster.png" style="width:500px;"></td></tr>
<tr><td align="center"><a href='./Enrichment_PPI/TranscriptID_PPIColorByCluster.pdf' title='download PDF file'>
<img class='link' src='icon/PDF48.png' >
</a>
<a href='./Enrichment_PPI/MCODE_PPI.cys' title='download CYS file'>
<img class='link' src='icon/CYS48.png' >
</a>
<a href='Enrichment_PPI/PPINetwork.html?Network=TranscriptID_PPIColorByCluster&Style=PPIColorByClusterNoLabel&isPPI=True' title='interactive cytoscape' target='_blank'>
<img class='link' src='icon/WEB_CYS48.png' >
</a>
</td><td width="10px"></td><td align="center"><a href='./Enrichment_PPI/TranscriptID_MCODE_ALL_PPIColorByCluster.pdf' title='download PDF file'>
<img class='link' src='icon/PDF48.png' >
</a>
<a href='./Enrichment_PPI/MCODE_PPI.cys' title='download CYS file'>
<img class='link' src='icon/CYS48.png' >
</a>
<a href='Enrichment_PPI/PPINetwork.html?Network=TranscriptID_MCODE_ALL_PPIColorByCluster&Style=PPIColorByCluster&isPPI=True' title='interactive cytoscape' target='_blank'>
<img class='link' src='icon/WEB_CYS48.png' >
</a>
</td></tr>
<tr><td align="center" valign="top"><TABLE class="table">
<THEAD>
<TR>
<TH class="info">GO</TH>
<TH class="info">Description</TH>
<TH class="info">Log10(P)</TH>
</TR>
</THEAD>
<TBODY>
<TR>
<TD>mmu05330</TD>
<TD>Allograft rejection - Mus musculus (house mouse)</TD>
<TD>-12.8</TD>
</TR>
<TR>
<TD>mmu05332</TD>
<TD>Graft-versus-host disease - Mus musculus (house mouse)</TD>
<TD>-12.8</TD>
</TR>
<TR>
<TD>mmu04940</TD>
<TD>Type I diabetes mellitus - Mus musculus (house mouse)</TD>
<TD>-12.6</TD>
</TR>
</TBODY>
</TABLE></td><td width="10px"></td><td align="center" valign="top"><TABLE class="table">
<THEAD>
<TR>
<TH class="info">Color</TH>
<TH class="info">MCODE</TH>
<TH class="info">GO</TH>
<TH class="info">Description</TH>
<TH class="info">Log10(P)</TH>
</TR>
</THEAD>
<TBODY>
<TR>
<TD><div style="background-color:#E41A1C !important;width:40px;height:20px;"></div></TD>
<TD>MCODE_1</TD>
<TD>mmu05330</TD>
<TD>Allograft rejection - Mus musculus (house mouse)</TD>
<TD>-10.3</TD>
</TR>
<TR>
<TD><div style="background-color:#E41A1C !important;width:40px;height:20px;"></div></TD>
<TD>MCODE_1</TD>
<TD>mmu05332</TD>
<TD>Graft-versus-host disease - Mus musculus (house mouse)</TD>
<TD>-10.3</TD>
</TR>
<TR>
<TD><div style="background-color:#E41A1C !important;width:40px;height:20px;"></div></TD>
<TD>MCODE_1</TD>
<TD>mmu04940</TD>
<TD>Type I diabetes mellitus - Mus musculus (house mouse)</TD>
<TD>-10.1</TD>
</TR>
</TBODY>
</TABLE></td></tr>
</table>
</div>
</div><p/>
<h3>References</h3></p>
<ol style="list-style: decimal inside;">
<li>Zhou et al., Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nature Communications (2019) 10(1):1523.</li>
<li>Zar, J.H. Biostatistical Analysis, 4th edn. Prentice Hall, NJ (1999), p. 523.</li>
<li>Hochberg Y., Benjamini Y. More powerful procedures for multiple significance testing. Statistics in Medicine (1990) 9:811-818.</li>
<li>Cohen, J. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. (1960) 20:37-46.</li>
<li>Shannon P. et al., Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res (2003) 13(11):2498-2504.</li>
<li>Szklarczyk D. et al. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res. (2019) 47:D607-613.</li>
<li>Stark C. et al. BioGRID: a general repository for interaction datasets. Nucleic Acids Res. (2006) 34:D535-539.</li>
<li>Turei D. et al. OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nat. Methods. (2016) 13:966-967.</li>
<li>Li T. et al. A scored human protein-protein interaction network to catalyze genomic interpretation. Nat. Methods. (2017) 14:61-64.</li>
<li>Bader, G.D. et al. An automated method for finding molecular complexes in large protein interaction networks. BMC bioinformatics (2003) 4:2.</li>
</ol>
</div>
</body></html>