---
title: 'Supplementary Material'
date:
output:
  bookdown::pdf_document2:
    toc: true
    toc_depth: 4
fontsize: 11pt
bibliography: ["references.bib"]
biblio-style: "nature"
link-citations: true
header-includes: |
  \usepackage[section]{placeins}
  \setcounter{table}{0}
  \renewcommand{\thetable}{S\arabic{table}}
  \setcounter{figure}{0}
  \renewcommand{\thefigure}{S\arabic{figure}}
  \usepackage{hyperref}
  \hypersetup{
    colorlinks=true,
    linktoc=all,
    linkcolor=blue
  }
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE, error = FALSE)
```
```{r dependencies, include=FALSE}
library(randomForest)
library(caret)
library(fpc) # pamk
library(cluster) # pam
library(ape)
library(readr)
library(dplyr)
library(ggplot2)
library(ggforce)
library(stringr)
library(cowplot)
```
```{r load, cache=TRUE, include=FALSE}
source("scripts/helpers.R")
```
# Supplementary Note 1: Guide on how to understand the reports
The following text is reproduced verbatim from memote’s documentation at the time of publication, modified only to work in print. For an updated version, please see the latest [memote documentation](https://memote.readthedocs.io/en/latest/understanding_reports.html).
## Understanding the reports
\begin{figure}
\includegraphics[width=\linewidth]{figures/guide/image2.png}
\caption{Snapshot Report}
\label{fig:snapshot}
\end{figure}
\begin{figure}
\includegraphics[width=\linewidth]{figures/guide/image5.png}
\caption{Diff Report}
\label{fig:diff}
\end{figure}
\begin{figure}
\includegraphics[width=\linewidth]{figures/guide/image7.png}
\caption{History Report}
\label{fig:history}
\end{figure}
Memote returns one of four possible outputs. If your preferred workflow is to benchmark one or several genome-scale metabolic models (GEMs), memote generates either a snapshot (Figure \@ref(fig:snapshot)) or a diff report (Figure \@ref(fig:diff)), respectively. For the reconstruction workflow, the primary output is a history report (Figure \@ref(fig:history)). These reports are only generated if the provided input models are correctly formatted in the [systems biology markup language (SBML)](http://sbml.org/Main_Page). If a provided model is not a valid SBML file, memote instead composes a report that enumerates the errors and warnings from the SBML validator in order of appearance. To better understand the output of this error report, we refer the reader to the relevant section of the [SBML documentation](http://sbml.org/Facilities/Documentation/Error_Categories). In this section, we focus on how to understand the snapshot, diff and history reports.
### Orientation
#### Toolbar
In all three reports, the blue toolbar at the top shows (from left to right) the memote logo, a button that expands and collapses all test results, a button that displays the readme, and the GitHub icon, which links to memote's GitHub page. On the snapshot report, the toolbar also displays the identifier of the tested GEM and a timestamp showing when the test run was initiated.
#### Main Body
The main body of the reports is divided into an independent section to the left and a specific section to the right.
The tests in the independent section are agnostic of the type of modeled organism, preferred modeling paradigms, the complexity of a genome-scale metabolic model (GEM) or the types of identifiers that are used to describe its components. The tests in this section focus on testing adherence to fundamental principles of constraint-based modeling: mass, charge and stoichiometric balance as well as the presence of annotations. The results in this section can be normalized, and thus enable a comparison of GEMs. The **Score** at the bottom of the page summarises the results to further simplify comparison. While calculating an overall score for this section allows for the quick comparison of any two given models at a glance, we recommend a thorough analysis of all results with respect to the desired use case.
The specific section on the right provides model-specific statistics and covers aspects of a metabolic network that cannot be normalized without introducing bias. For instance, dedicated quality control of the biomass equation only applies to GEMs that are used to investigate cell growth, i.e., those for which a biomass equation has been generated. Some tests in this section are also influenced by whether the tested GEM represents a prokaryote or a eukaryote. Therefore, the results cannot be generalized, and direct comparisons ought to take bias into account.
#### Test Results
Test results are arranged in rows with the title visible on the left and the result on the right. The result is displayed as white text in a coloured rectangle, as detailed in the subsection **Color** below.
By default, only minimal information is visible, as indicated by a downward-pointing arrow to the right of the result. Clicking anywhere in the row expands the result, revealing a description of the concept behind the test, its implementation, and a brief summary of the result. In addition, a text field contains plain-text representations of Python objects that can be copied and pasted into Python code for follow-up procedures.
Some tests carry out one operation on several parameters and therefore deviate slightly from the description above: expanding the title row reveals only the description, while expanding the rows of the individual parameters reveals the text fields.
In the history report, scatter plots replace the text fields and show how the respective metrics developed over the commit history for each branch of a repository. Clicking an entry in the legend toggles its visibility in the plot.
### Interpretation
The variety of constraint-based modeling approaches and the differences between organisms complicate the assessment of GEMs. While memote facilitates model assessment, it can only do so within limitations. Please bear in mind the diversity of paradigms, described below, that challenge some of memote's results.
#### Color
**Snapshot Report**
Results without highlights are kept in the main <span style="color:#2a7bb8">blue</span> color of the memote color scheme. Scored results in the snapshot report (Figure \@ref(fig:snapshot)) are marked with a gradient ranging from <span style="color:#a11212">red</span> to <span style="color:#12a12e">green</span>, denoting a low or a high score, respectively:
![Snapshot Report Score Gradient](figures/guide/image6.png)
**Diff Report**
The colour in the Diff Report (Figure \@ref(fig:diff)) depends on the ratio of the sample minimum to the sample maximum. Result sets in which the sample minimum and the sample maximum are identical are coloured in the main <span style="color:#2a7bb8">blue</span> color of the memote color scheme. Result sets in which the sample minimum is very small relative to the sample maximum appear <span style="color:#a11212">red</span>. The value is calculated as $$ \left(1 - \frac{\mathrm{Min}}{\mathrm{Max}}\right) \times 100. $$
This is then mapped to the following gradient:
![Diff Report Ratio Gradient](figures/guide/image4.png)
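The calculation can be sketched in R as follows. This is a minimal illustration of the formula above, not memote's implementation; the function name `diff_ratio` and the example values are hypothetical, and the sketch does not handle result sets whose maximum is zero.

```r
# Hypothetical helper: percentage by which the sample minimum falls short of
# the sample maximum across the models compared in a diff report.
diff_ratio <- function(results) {
  (1 - min(results) / max(results)) * 100
}

diff_ratio(c(0.8, 0.8, 0.8))  # identical minimum and maximum: 0, shown in blue
diff_ratio(c(0.1, 0.5, 1.0))  # minimum much smaller than maximum: about 90, towards red
```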
#### Score
Each test in the independent section provides a relative measure of completeness with regard to the tested property. The final score is the weighted sum of all individual test results normalized by the maximally achievable score, i.e., the score obtained if all individual results were at 100%. Individual tests can be weighted, and weights can also be applied to entire subsections. The final score is therefore calculated as:
$$ \mathrm{TotalScore} = \frac{\sum_{\mathrm{subsections}} \mathrm{weight}_{\mathrm{subsection}} \times \left(\sum_{\mathrm{tests}} \mathrm{weight}_{\mathrm{test}} \times \mathrm{TestScore}\right)}{\mathrm{MaxScore}} $$
Weights for sections and individual tests are indicated by a white number inside a magenta badge. No badge means that the weight defaults to 1.
The subsections “Consistency” and “Annotation - SBO” have weights of 3 and 2, respectively. The test “Stoichiometric Consistency” itself is weighted three times more heavily than the remaining tests in the “Consistency” subsection. The remaining subsections and tests, which cover annotations of metabolites, reactions and genes, have weights of 1 (Figure \@ref(fig:snapshot)).
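The weighted score can be reproduced with a few lines of R. The sketch below is illustrative only: the test names, weights and scores are hypothetical (the first "consistency" test stands in for a test weighted 3, such as “Stoichiometric Consistency”), and it assumes that every test score lies between 0 and 1 and that an unlisted subsection weight defaults to 1, as described above.

```r
# Hypothetical test results: subsection, test weight (default 1), score in [0, 1].
tests <- data.frame(
  subsection  = c("consistency", "consistency", "annotation_sbo", "annotation_met"),
  test_weight = c(3, 1, 1, 1),
  score       = c(1.0, 0.5, 0.25, 1.0)
)
# Subsection weights; subsections without a badge are not listed and default to 1.
subsection_weights <- c(consistency = 3, annotation_sbo = 2)

total_score <- function(tests, subsection_weights) {
  sw <- unname(subsection_weights[tests$subsection])
  sw[is.na(sw)] <- 1
  achieved <- sum(sw * tests$test_weight * tests$score)
  # Maximally achievable score: every individual test result at 100 %.
  maximum <- sum(sw * tests$test_weight)
  achieved / maximum
}

total_score(tests, subsection_weights)  # 12 / 15 = 0.8
```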
### Paradigms
#### "Reconstructions" and "Models"
Some authors publish metabolic networks that are parameterized and ready to run flux balance analysis (FBA); these are referred to simply as 'models'. Others publish unconstrained metabolic knowledge bases (referred to as 'reconstructions'), from which several models can be derived by applying different constraints. Both can be encoded in SBML. By providing an independent test section, we attempt to make both 'models' and 'reconstructions' comparable, although users should be aware that this difference exists and [is subject to some discussion](https://github.com/opencobra/memote/issues/228). Please note that some tests in the specific section may fail with an error for a reconstruction because they require initialization.
#### "Lumped" and "Split" Biomass Reaction
There are two basic ways of specifying the biomass composition. The most common is a single lumped reaction containing all biomass precursors. Alternatively, the biomass equation can be split into several reactions, each focusing on a different macromolecular component, for instance:
$$
\begin{aligned}
& a\,(\text{1 gDW ash}) + b\,(\text{1 gDW phospholipids}) + c\,(\text{free fatty acids}) + d\,(\text{1 gDW carbohydrates}) \\
& + e\,(\text{1 gDW protein}) + f\,(\text{1 gDW RNA}) + g\,(\text{1 gDW DNA}) + h\,(\text{vitamins/cofactors}) \\
& + x\,\text{ATP} + x\,\text{H}_2\text{O} \rightarrow \text{1 gDCW biomass} + x\,\text{ADP} + x\,\text{H} + x\,\text{P}_\text{i}
\end{aligned}
$$
The benefit of either approach depends very much on the use case, as [discussed by the community](https://github.com/opencobra/memote/issues/243). Memote employs heuristics to identify the type of biomass reaction, which may fail to distinguish edge cases.
#### "Average" and "Unique" Metabolites
A metabolite consisting of a fixed core with variable branches, such as a membrane lipid, is sometimes implemented by averaging over the distribution of individual lipid species. The resulting pseudo-metabolite is assigned an average chemical formula, which requires scaling the stoichiometries of associated reactions to avoid floating-point numbers in the chemical formulae. An alternative approach is to implement each species as a distinct metabolite in the model, which increases the total count of reactions. Memote cannot yet distinguish between these paradigms, which means that results in the specific section that rely on the total number of reactions or on the scaling of stoichiometric parameters may be biased.
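To make the scaling step concrete, the sketch below averages two lipid species into one pseudo-metabolite and finds the smallest factor that yields integer element counts. The composition (60% palmitate, C16H31O2, and 40% oleate, C18H33O2) and the helper `scale_factor` are hypothetical illustrations, not memote functionality.

```r
# Hypothetical lipid pool: 60 % palmitate and 40 % oleate (molar fractions).
species <- list(
  palmitate = c(C = 16, H = 31, O = 2),
  oleate    = c(C = 18, H = 33, O = 2)
)
fractions <- c(palmitate = 0.6, oleate = 0.4)

# Average elemental composition weighted by the molar fractions.
avg_formula <- Reduce(`+`, Map(function(f, w) f * w, species, fractions))
avg_formula  # non-integer counts: C 16.8, H 31.8, O 2

# Hypothetical helper: smallest integer factor that turns every element
# count into a whole number (NA if none is found up to max_factor).
scale_factor <- function(counts, max_factor = 100) {
  for (k in seq_len(max_factor)) {
    if (all(abs(counts * k - round(counts * k)) < 1e-9)) return(k)
  }
  NA_integer_
}

k <- scale_factor(avg_formula)               # 5
scaled_formula <- round(avg_formula * k)     # C84 H159 O10
# The pseudo-metabolite now stands for k lipid molecules, so the stoichiometric
# coefficients of reactions that use it have to be divided by k.
```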
# Supplementary Note 2: Validation against experimental data
To compare model predictions to experimental measurements, a researcher would typically write a short script. The reproducibility of this script may be limited by the original author's style of writing code, by whether the code has been rigorously checked for errors, and by whether it depends on obsolete libraries. The latter, so-called software rot, arises from a lack of active maintenance [@Beaulieu-Jones2017-tg].
In contrast, memote allows researchers to optionally define a configuration file (in YAML format) in which they can set the medium and the FBA objective. This file can be used by researchers without prior programming experience. It configures memote to execute clearly defined, formulaic operations, which are unit tested. Lastly, it shifts the burden of maintenance to the memote community represented through this consortium. This not only distributes the need for funding across many shoulders, but also increases the likelihood of the codebase keeping up with advances in its core dependencies, i.e., keeping software rot at bay. The COBRAToolbox [@Heirendt2017-ra] and cobrapy [@Ebrahim2013-wf] are pertinent examples of community projects that follow a similar strategy. Moreover, frequent versioning ensures that users can return to previous versions to re-run analyses.
Setting up a version-controlled model repository allows researchers to publish not only a ‘default’, non-specific GEM of the investigated organism, but also reproducible instructions for deriving a model that is specific to a defined experimental context and validated against the data supporting that context. This formulaic approach of deriving a condition-specific form of a GEM supports Heavner and Price’s [@Heavner2015-gq] call for more transparency and reproducibility in metabolic network reconstruction (Figure \@ref(fig:validation)).
\begin{figure}
\includegraphics[width=\linewidth]{figures/guide/image3.png}
\caption{Experimental tests can be tailored to a specific condition through the use of one or several configuration files (configs). (a) To validate GEMs against experimental data measured in specific conditions, researchers usually write their own scripts to constrain the model. This is problematic, as such scripts vary widely and, unless actively maintained, are susceptible to software rot. (b) With memote, user-defined configuration files replace scripts, which allows the experimental validation of GEMs to be unified and formalized. Bundling the model, configuration files, and experimental data within a version-controlled repository (indicated by the blue asterisk*) facilitates reproducibility.}
\label{fig:validation}
\end{figure}
# Supplementary Note 3: Integration in third party tools and services
Memote's core functions are available through a [Python API](https://memote.readthedocs.io/en/latest/autoapi/), and the online service is available through either a [web interface](http://www.memote.io) or a programmatic [REST API](https://api.dd-decaf.eu/memote-webservice/#/). We have integrated memote into KBase [@Arkin2018] as an app and into OptFlux [@Rocha2010] (version 3.4) as a plug-in, and the BiGG Models Database [@King2015] links to it. We plan to integrate it with BioModels [@Li2010] and the RAVEN toolbox [@Agren2013].
# Supplementary Note 4: Discussion of alternatives to memote
Cloud-based, distributed version control of GEMs encoded in SBML Level 3 with the FBC package (SBML3FBC) is only one possible approach to version control and collaboration. Alternatives include Pathway Tools [@Karp2009], which internally stores organism data in a database, and AuReMe [@Aite2018], which allows users to interact with a database through wikis. Although databases offer greater capacity and speed than single, large data files, the programmatic or form-based interaction and the more complex setup procedure they require may not be easily accessible to a broad community. We see memote in combination with GitHub, GitLab, or BioModels as a means of version control that is simple to set up and easy to manage.
For quality control, alternatives include rBioNet [@Thorleifsson2011], an extension to the COBRAToolbox [@Heirendt2017-ra]. It primarily focuses on guiding reconstruction by flagging operations that violate standard operating procedures (SOPs), but it also provides functions that print basic information such as the number of model components and dead-end metabolites. Memote may be more widely adopted because it does not require a MATLAB license. gsmodutils [@Gilbert2019] is another option but is less accessible to a wider community because it requires proficiency in Python. We note that, owing to the common SBML exchange format, memote is fully compatible with rBioNet and gsmodutils.
# Supplementary Note 5: Outlook
In the future, memote could be extended to support tests based on multi-omics data [@Hackett2016]. Moreover, to distribute all files of a model repository together, the model, supporting data and scripts could be automatically bundled into one ZIP-based archive file (a so-called COMBINE archive) [@Bergmann2014]. These archives can include a formal description of simulation experiments to ensure exchangeability and reproducibility [@Waltemath2011].
The tests that memote offers apply only to stoichiometric models. However, the principles underlying memote could be applied to other modeling paradigms, e.g., to models of metabolism and expression (ME-models) [@OBrien2013], kinetic models [@Vasilakou2016], or even systems pharmacological models [@Thiel2017].
# Supplementary Methods
To simplify interpretation, the following figures are grouped by the sections of their corresponding test cases as they appear in a snapshot report. The code that was used to generate the data and figures has been deposited on GitHub (https://github.com/biosustain/memote-meta-study).
## Tested models
We tested models from seven GEM collections comprising manually and (semi)-automatically reconstructed GEMs (10,780 models in total):
(i) 801 semi-automatically built reconstructions of human gut bacteria from the AGORA [@Magnsdttir2016] collection (version 1.03; not condition-specific and including post-publication corrections [@Babaei2018], [@Magnsdttir2018]), (ii) 2,641 models from the Path2Models [@Bchel2013] branch of the BioModels [@Li2010], [@LeNovere2006], [@Chelliah2014] database, which hosts models automatically generated from pathway resources, and (iii) 5,511 and (iv) 1,632 models automatically reconstructed from bacterial genomes in NCBI RefSeq using CarveMe [@Machado2018] and the Department of Energy’s Systems Biology Knowledgebase (KBase) [@Arkin2018], respectively. Furthermore, we included 36 manually reconstructed models from (v) the BiGG database [@King2015] and two collections of published models as available from (vi) Ebrahim et al. [@Ebrahim2015] (80 models) and (vii) the OptFlux [@Rocha2010] software (79 models), of which 39 models are likely identical based on a filename comparison.
Models in non-standard formats were omitted entirely from two collections (15 from Ebrahim et al. and 49 from OptFlux).
To respect the limited resources of the [DTU high-performance computing infrastructure](https://www.hpc.dtu.dk/), we set a maximum time limit for running the memote test suite, which introduced a bias against large models. Additionally, some models failed the testing procedure. Table \@ref(tab:numbers) lists the total size of each collection alongside the final number of tested models.
```{r numbers}
kableExtra::usepackage_latex("threeparttable")
dplyr::tibble(
collection = factor(
c("agora", "carveme", "path", "kbase", "bigg", "ebrahim", "optflux"),
levels = c("agora", "carveme", "path", "kbase", "bigg", "ebrahim", "optflux")
),
name = c(
"AGORA",
"CarveMe",
"Path2Models",
"KBase",
paste0("BiGG", kableExtra::footnote_marker_symbol(1)),
paste0(
"Ebrahim \\textit{et al.}",
kableExtra::footnote_marker_symbol(2)
),
paste0("OptFlux Models", kableExtra::footnote_marker_symbol(2))
),
size = c(818, 5587, 2641, 1637, 36, 83, 100)
) %>%
dplyr::inner_join(
total_df %>% dplyr::group_by(collection) %>% dplyr::summarize(num_test = dplyr::n_distinct(model))
) %>%
dplyr::select(-collection) %>%
dplyr::mutate(percent = num_test * 100 / size) %>%
knitr::kable(
digits = 1,
booktabs = TRUE,
caption = "Number of tested models.",
col.names = c("Collection", "Number of Models", "Tested Models", "\\%"),
escape = FALSE
) %>%
kableExtra::kable_styling(full_width = FALSE, protect_latex = TRUE) %>%
kableExtra::footnote(
symbol = c(
"Please note that we removed the large number of \\\\textit{Escherichia coli} strain models from the BiGG collection and only included results from the models iJR904, iAF1260, iJO1366, and iML1515.",
"39 models from these two collections are likely identical based on a filename comparison."
),
escape = FALSE,
threeparttable = TRUE
)
```
```{r theme, include=FALSE}
# Ignore all textual result output in the following code chunks.
# Set the figure chunk options.
knitr::opts_chunk$set(results = 'hide', out.width = '100%', fig.asp = 0.618047, fig.align = 'center')
ggplot2::theme_set(cowplot::theme_cowplot(font_size = 11))
```
```{r cluster-layers, include=FALSE}
cluster_layers <- list(
ggplot2::geom_point(size = 1),
ggplot2::scale_color_manual("Collection", values = colors, labels = collection_labels),
ggplot2::scale_shape_manual("Collection", values = shapes, labels = collection_labels),
ggplot2::theme(
axis.title = ggplot2::element_blank(),
axis.text = ggplot2::element_blank(),
axis.ticks = ggplot2::element_blank()
)
)
```
## Clustering
```{r pca, fig.cap='First two components of a principal component analysis of the normalized test features (metrics).', cache=TRUE, dependson='load'}
ggplot2::ggplot(metric_pca_tbl,
ggplot2::aes(
x = x,
y = y,
color = collection,
shape = collection
)) +
cluster_layers
```
```{r tsne, fig.cap='Distances between models in the high-dimensional space of normalized test features, reduced to two dimensions using t-SNE.', cache=TRUE, dependson='load'}
ggplot2::ggplot(metric_tsne_tbl,
ggplot2::aes(
x = x,
y = y,
color = collection,
shape = collection
)) +
cluster_layers
```
```{r umap, fig.cap='Distances between models in the high-dimensional space of normalized test features, reduced to two dimensions using UMAP.', cache=TRUE, dependson='load'}
ggplot2::ggplot(metric_umap_tbl,
ggplot2::aes(
x = x,
y = y,
color = collection,
shape = collection
)) +
cluster_layers
```
For the clustering analyses, we used all normalized test metrics except for some particular cases. We excluded Sections \@ref(sec:basic-info) and \@ref(sec:biomass) because the basic information only contains unnormalized model dimensions and because a biomass formulation is not present in all models. We further removed individual biomass-related test cases, as well as the metabolic coverage test, since it is not properly normalized. Additionally, test cases that resulted in errors were penalized with the worst possible metric value of one.
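A minimal sketch of this metric preparation followed by a PCA, assuming a matrix of metrics normalized to [0, 1] with higher values being worse and `NA` marking test cases that errored. The random data are purely illustrative; the actual preprocessing is performed by `scripts/helpers.R`, which is not reproduced here.

```r
# Hypothetical metric matrix: rows are models, columns are normalized test
# metrics in [0, 1] (1 being the worst); NA marks a test case that errored.
set.seed(1)
metrics <- matrix(
  runif(40), nrow = 10,
  dimnames = list(paste0("model_", 1:10), paste0("test_", 1:4))
)
metrics[3, 2] <- NA

# Penalize errored test cases with the worst metric value of one.
metrics[is.na(metrics)] <- 1

# Principal component analysis; the metrics already share a common scale,
# so they are centred but not rescaled.
pca <- stats::prcomp(metrics, center = TRUE, scale. = FALSE)
coords <- data.frame(model = rownames(metrics), x = pca$x[, 1], y = pca$x[, 2])
head(coords)
```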
```{r importance-plot, include=FALSE}
####################################################
### plot.rf.var.importance.by.class.andMean.dotplot
####################################################
# Plot a dot plot of the per-collection and mean variable importance
# (mean decrease in accuracy) for the 15 most important tests.
# Args:
#   model: fitted randomForest classification model (trained with importance = TRUE).
#   test_labels: named character vector mapping test identifiers to readable titles.
plot.rf.var.importance.by.class.andMean.dotplot <- function(model, test_labels) {
imp_df <- randomForest::importance(model) %>%
tibble::as_tibble(rownames = "test") %>%
dplyr::select(-MeanDecreaseGini, mean = MeanDecreaseAccuracy) %>%
# Order by descending mean decrease in accuracy.
dplyr::arrange(dplyr::desc(mean)) %>%
head(n = 15) %>%
# For the plot we order the levels in reverse due to axis arrangement.
dplyr::mutate(
test = gsub("overview.", "overview-", test, fixed = TRUE),
test = gsub("wrong_ids.", "wrong_ids-", test, fixed = TRUE),
test = factor(
test,
levels = rev(test),
labels = stringr::str_wrap(test_labels[rev(test)], width = 32),
ordered = TRUE
)
) %>%
tidyr::gather(key = "collection", value = "mean_da", -test) %>%
dplyr::mutate(collection = factor(
collection,
levels = c(
"agora",
"carveme",
"path",
"kbase",
"bigg",
"ebrahim",
"optflux",
"mean"
))
)
rf_colors <- c(colors, mean = "#60d660")
rf_labels <- c(collection_labels, mean = "Mean")
ggplot2::ggplot(imp_df,
ggplot2::aes(
x = mean_da,
y = test,
group = test,
color = collection
)) +
ggplot2::geom_segment(mapping = ggplot2::aes(yend = test),
xend = 0,
color = "grey50") +
ggplot2::geom_point(size = 2) +
ggplot2::scale_color_manual(values = rf_colors, guide = "none") +
ggplot2::facet_grid(. ~ collection) +
# ggplot2::facet_grid(. ~ collection, labeller = as_labeller(rf_labels, default = label_parsed)) +
ggplot2::xlab("Test Importance (Mean Decrease in Accuracy)") +
ggplot2::theme(
axis.text.x = ggplot2::element_text(angle = 45, hjust = 1),
strip.text.x = ggplot2::element_text(angle = 45),
axis.title.y = ggplot2::element_blank()
)
}
```
To determine the tests most relevant for discriminating between model collections, we built a random forest classifier [@Breiman2001] that predicts the collection from the normalized test results (0.99 accuracy and 0.01% out-of-bag (OOB) error). We then ranked the importance of each variable, i.e., test case, by the Mean Decrease in Accuracy (MDA) [@Louppe2013]. This metric measures the decrease in accuracy, averaged over all trees of the forest, when the values of a given variable are permuted in the OOB samples. Figure \@ref(fig:random-forest) shows the 15 most discriminant features on average (last column) and their individual relevance per collection. The greater the decrease in accuracy, the larger the test's contribution to differentiating among collections.
The five most discriminant tests are purely metabolic reactions, transport reactions, dead-end metabolites, orphan metabolites, and the presence of a non-growth-associated maintenance reaction. Their importance varies by collection: for CarveMe, transport reactions and orphan metabolites are more relevant; for KBase, transport reactions; and for Ebrahim \textit{et al.}, purely metabolic reactions. For a detailed study of the clustering properties, please refer to the \textit{Supplementary Clustering Analysis} notebook.
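The MDA ranking can be reproduced in miniature with the `randomForest` package. This sketch is not the analysis behind the figure: it uses the built-in `iris` data as a stand-in for the model-by-test metric table (with `Species` in place of the collection), and it calls `randomForest` directly rather than through `caret`. Note that `importance()` by default divides the mean decrease in accuracy by its standard error.

```r
set.seed(42)
# Stand-in data: iris replaces the model-by-test metric table, and Species
# plays the role of the model collection.
rf <- randomForest::randomForest(Species ~ ., data = iris,
                                 ntree = 500, importance = TRUE)

# type = 1 returns the permutation-based mean decrease in accuracy.
mda <- randomForest::importance(rf, type = 1)
ranked <- mda[order(mda[, "MeanDecreaseAccuracy"], decreasing = TRUE), ,
              drop = FALSE]
print(ranked)

# Out-of-bag error rate after the final tree.
rf$err.rate[rf$ntree, "OOB"]
```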
```{r random-forest, fig.cap='15 most relevant tests to discriminate among GEM collections, for each collection and the mean. Ranked in decreasing importance according to the \\textit{mean decrease in accuracy} metric averaged over all collections (last column), computed over a random forest classification model.', fig.asp=1.3}
load("data/rf_model_classCollection.Rdata")
model <- model$finalModel
test_labels <- total_df$title
names(test_labels) <- total_df$test
ggplot2::theme_set(cowplot::theme_cowplot(font_size = 8))
plot.rf.var.importance.by.class.andMean.dotplot(model, test_labels)
ggplot2::theme_set(cowplot::theme_cowplot(font_size = 11))
```
\FloatBarrier
## Test Suite
The database identifiers referenced throughout the Annotation sections belong to common biochemical databases that are listed in Table \@ref(tab:databases).
(ref:databases-ref1) [@Glasner2003]
(ref:databases-ref2) [@King2015]
(ref:databases-ref3) [@Caspi2009]
(ref:databases-ref4) [@Jeske2018]
(ref:databases-ref5) [@Pujar2017]
(ref:databases-ref6) [@Hastings2015]
(ref:databases-ref7) [@McDonald2009]
(ref:databases-ref8) [@Zhou2012]
(ref:databases-ref9) [@Wishart2017]
(ref:databases-ref10) [@KeshavaPrasad2009]
(ref:databases-ref11) [@Stein2003]
(ref:databases-ref12) [@Kanehisa2019]
(ref:databases-ref13) [@Moretti2015]
(ref:databases-ref14) [@NCBIResourceCoordinators2017]
(ref:databases-ref15) [@Fabregat2017]
(ref:databases-ref16) [@Morgat2016]
(ref:databases-ref17) [@Henry2010]
(ref:databases-ref18) [@UniProtConsortium2018]
```{r databases, results = 'asis'}
kableExtra::usepackage_latex("threeparttable")
dplyr::tibble(
databases = c("ASAP", "BiGG", "BioCyc", "BRENDA", "CCDS", "ChEBI", "EC-Code", "EcoGene", "HMDB", "HPRD", "InChI", "InChIKey", "Kegg", "MetaNetX", "NCBI Gene", "NCBI GI", "NCBI Protein", "PubChem", "Reactome", "RefSeq", "RHEA", "SEED", "Uniprot"),
component = c("gene", "reaction, metabolite", "reaction, metabolite", "reaction", "gene", "metabolite", "reaction", "gene", "metabolite", "gene", "metabolite", "metabolite", "gene, reaction, metabolite", "reaction, metabolite", "gene", "gene", "gene", "metabolite", "reaction, metabolite", "gene", "reaction", "metabolite", "gene"),
links = c("http://asap.ahabs.wisc.edu/asap/home.php", "http://bigg.ucsd.edu/universal/", "http://biocyc.org", "http://www.brenda-enzymes.org/", "http://www.ncbi.nlm.nih.gov/CCDS/", "https://www.ebi.ac.uk/chebi/", "http://www.enzyme-database.org/", "http://ecogene.org/", "http://www.hmdb.ca/", "http://www.hprd.org/", "https://www.ebi.ac.uk/chebi/", "http://cactus.nci.nih.gov/chemical/structure", "http://www.kegg.jp/", "http://www.metanetx.org", "http://ncbigene.bio2rdf.org/fct", "http://www.ncbi.nlm.nih.gov/protein/", "http://www.ncbi.nlm.nih.gov/protein", "https://pubchem.ncbi.nlm.nih.gov/", "http://www.reactome.org/", "http://www.ncbi.nlm.nih.gov/projects/RefSeq/", "http://www.rhea-db.org/", "http://modelseed.org/", "http://www.uniprot.org/"),
citations = c("(ref:databases-ref1)", "(ref:databases-ref2)", "(ref:databases-ref3)", "(ref:databases-ref4)", "(ref:databases-ref5)", "(ref:databases-ref6)", "(ref:databases-ref7)", "(ref:databases-ref8)", "(ref:databases-ref9)", "(ref:databases-ref10)", "(ref:databases-ref11)", "N/A", "(ref:databases-ref12)", "(ref:databases-ref13)", "(ref:databases-ref14)", "(ref:databases-ref14)", "(ref:databases-ref14)", "(ref:databases-ref14)", "(ref:databases-ref15)", "(ref:databases-ref14)", "(ref:databases-ref16)", "(ref:databases-ref17)", "(ref:databases-ref18)")
) %>%
knitr::kable(
digits = 1,
booktabs = TRUE,
caption = "Biochemical Databases for Model Component Annotation.",
col.names = c("Databases", "Component Type", "URL", "Citation"),
escape = FALSE
) %>%
kableExtra::kable_styling(full_width = TRUE, protect_latex = TRUE) %>%
kableExtra::column_spec(1, width = "2cm") %>%
kableExtra::column_spec(3, width = "10cm") %>%
kableExtra::landscape()
```
```{r sina-layers, include=FALSE}
sina_layers <- list(
ggforce::geom_sina(size = 1, scale = FALSE),
ggplot2::geom_boxplot(color = "black", outlier.shape = NA, fill = NA),
ggplot2::scale_x_discrete(labels = collection_labels),
ggplot2::scale_color_manual(values = colors, guide = "none"),
ggplot2::scale_shape_manual(values = shapes, guide = "none"),
ggplot2::theme(
axis.title.x = ggplot2::element_blank(),
axis.text.x = ggplot2::element_text(
angle = 45,
hjust = 1,
vjust = 1
)
)
)
```
### Summary of Observations
* Only models from KBase and BiGG use SBO terms (Figure \@ref(fig:annotation-sbo-score)).
* Models from Path2Models and OptFlux Models are formatted in legacy SBML (earlier than Level 3 Version 1) without the FBC package (Figures \@ref(fig:sbml-level) & \@ref(fig:fbc-presence)).
* Models from the collection of Ebrahim \textit{et al.} and from OptFlux Models vary widely on many specific tests. Models from automatic reconstruction pipelines (AGORA, CarveMe, Path2Models, and KBase) and from the curated BiGG collection are much more similar within each collection, yet still differ between collections. This could be because each collection focuses on a distinct set of taxa, but it could also reflect the algorithms and databases behind each collection (Section \@ref(sec:network-topology); Figures \@ref(fig:metabolicreactions), \@ref(fig:transportreactions), and \@ref(fig:reactionsidenticalgenes)).
* On biomass:
    * For only a minority of models in BiGG, Ebrahim \textit{et al.}, and OptFlux Models, memote could not identify a biomass reaction (Figure \@ref(fig:biomass-presence)).
    * A portion of the models in the BiGG collection have inconsistent biomass equations, followed by OptFlux Models and the collection by Ebrahim \textit{et al.}; all models in the CarveMe and Path2Models collections have inconsistent biomass reactions (Figure \@ref(fig:biomass-consistency)).
    * Path2Models, BiGG, Ebrahim \textit{et al.}, and OptFlux Models all contain models that cannot produce biomass in the default or the complete medium (Figures \@ref(fig:biomass-default-production) & \@ref(fig:biomass-open-production)).
    * Models from AGORA and KBase show possible artifacts of automatic reconstruction: they grow in default and complete medium even though some biomass precursors are blocked when each precursor is optimized individually (compare Figures \@ref(fig:biomass-precursors-default) & \@ref(fig:biomass-precursors-open) with \@ref(fig:biomass-default-production) & \@ref(fig:biomass-open-production)).
* The average fraction of reactions that participate in stoichiometrically balanced cycles is larger for models from automatic reconstruction pipelines (AGORA, CarveMe, Path2Models, KBase) than for BiGG, Ebrahim \textit{et al.}, and OptFlux Models (Figure \@ref(fig:balanced-cycles)). This could be an artifact of automatic reconstruction.
* Reactions that involve oxygen are integral to the energy metabolism of many organisms. If these reactions are not constrained carefully, predictions can deviate from the expected phenotype, e.g., by allowing anaerobic growth that should not be possible. The fraction of oxygen-containing reactions that are reversible varies strongly across all seven collections: models in BiGG have the lowest variance, whereas models from Path2Models, Ebrahim \textit{et al.}, and OptFlux Models vary strongly (Figure \@ref(fig:reversible-oxygen-reactions)).
\FloatBarrier
### Scores
```{r total-plot, fig.cap='Total Score. Depicted are the weighted averages of all test scores across all independent sections, applying the weights for individual test cases and sections as detailed in the snapshot report.'}
section_weights <- tibble::tibble(
section = c(
"consistency",
"annotation_met",
"annotation_rxn",
"annotation_gene",
"annotation_sbo"
),
weight = c(3, 1, 1, 1, 2)
)
total_df %>%
dplyr::filter(is.finite(score)) %>%
# First, compute the weighted mean of the test scores per section.
dplyr::group_by(collection, model, section) %>%
dplyr::summarize(score = sum(score * weight) / sum(weight)) %>%
# Second, combine the section scores into a weighted mean total.
dplyr::left_join(., section_weights, by = "section") %>%
dplyr::summarize(total = sum(score * weight) / sum(weight)) %>%
dplyr::ungroup() %>%
ggplot2::ggplot(
.,
ggplot2::aes(
x = collection,
y = total,
color = collection,
shape = collection,
label = model
)
) + sina_layers + ggplot2::ylab("Total Score") + ggplot2::ylim(0, 1)
```
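The two-stage aggregation in the chunk above can be written as a pair of weighted means. In the notation below, which is introduced here for illustration, $s_t$ is the score of test $t$, $w_t$ its test weight (the `weight` column of `total_df`), $T_k$ the set of tests with a finite score in section $k$, and $W_k$ the section weight taken from `section_weights`:

$$
S_k = \frac{\sum_{t \in T_k} w_t \, s_t}{\sum_{t \in T_k} w_t},
\qquad
S_{\text{total}} = \frac{\sum_{k} W_k \, S_k}{\sum_{k} W_k}
$$

With $W_k = 3$ for consistency, $W_k = 2$ for SBO terms, and $W_k = 1$ for each of the metabolite, reaction, and gene annotation sections, a hypothetical model with section scores of 0.9 (consistency), 0.5 (each annotation section), and 0 (SBO terms) would obtain

$$
S_{\text{total}} = \frac{3 \cdot 0.9 + 1 \cdot 0.5 + 1 \cdot 0.5 + 1 \cdot 0.5 + 2 \cdot 0}{3 + 1 + 1 + 1 + 2} = \frac{4.2}{8} = 0.525
$$

Because both stages are normalized by the summed weights, section and total scores stay within $[0, 1]$ whenever the individual test scores do, which is why these plots can fix the y-axis to that range.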
```{r sum-plots, cache=TRUE, dependson='scores', include=FALSE}
sum_plots <- total_df %>%
dplyr::filter(is.finite(score)) %>%
# Compute the weighted mean of the test scores per section.
dplyr::group_by(collection, model, section) %>%
dplyr::summarize(total = sum(score * weight) / sum(weight)) %>%
dplyr::ungroup() %>%
dplyr::group_by(section) %>%
dplyr::do(
plot_sina = ggplot2::ggplot(
.,
ggplot2::aes(
x = collection,
y = total,
color = collection,
shape = collection,
label = model
)
) + sina_layers + ggplot2::ylab("Section Score Sub Total") + ggplot2::ylim(0, 1)
)
```
\FloatBarrier
### Independent Section
```{r plot-function, include=FALSE}
do_plot <- function(sub_tbl, var, label_width = 40) {
test_id <- dplyr::first(as.character(sub_tbl$test))
base_plot <- ggplot2::ggplot(
sub_tbl,
ggplot2::aes_string(
x = "collection",
y = var,
color = "collection",
shape = "collection",
label = "model"
)
) + sina_layers + ggplot2::ylab(stringr::str_wrap(y_axis_labels[test_id],
width = label_width))
# Fix the y-axis to [0, 1] only when all values fall inside that range.
if (min(sub_tbl[[var]], na.rm = TRUE) < 0 || max(sub_tbl[[var]], na.rm = TRUE) > 1) {
return(base_plot)
} else {
return(base_plot + ggplot2::ylim(0, 1))
}
}
```
```{r scored-plots, cache=TRUE, dependson='scores', include=FALSE}
score_plots <- total_df %>%
dplyr::filter(is.finite(score)) %>%
dplyr::group_by(test) %>%
dplyr::do(
plot_sina = do_plot(., "score") + ggplot2::ylim(0, 1)
)
```
```{r metric-plots, cache=TRUE, dependson='load', include=FALSE}
metric_plots <- total_df %>%
dplyr::mutate(metric = ifelse(is.na(numeric), NA, metric)) %>%
dplyr::group_by(test) %>%
dplyr::do(
plot_sina = do_plot(., "metric")
)
```
```{r numeric-plots, cache=TRUE, dependson='load', include=FALSE}
numeric_plots <- total_df %>%
dplyr::group_by(test) %>%
dplyr::do(
plot_sina = do_plot(., "numeric")
)
```
\FloatBarrier
#### Consistency
```{r consistency-score, fig.cap='Consistency. Depicted are the weighted averages of all test scores in this section, applying the weights of the individual test cases as detailed in the snapshot report.'}
sum_plots$plot_sina[sum_plots$section == "consistency"][[1]]
```
```{r stoichiometricconsistency, fig.cap='Stoichiometric Consistency'}
score_plots$plot_sina[score_plots$test == "test_stoichiometric_consistency"][[1]]
```
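As background for the figure above, a common formal definition of stoichiometric consistency is sketched below. The notation is introduced here for illustration, and memote's own implementation of the test may differ in detail. Let $S \in \mathbb{R}^{M \times N}$ be the stoichiometric matrix of the model's internal reactions (metabolites as rows, reactions as columns). The model is stoichiometrically consistent if every metabolite can be assigned a strictly positive mass that all internal reactions conserve:

$$
\exists\, \mathbf{m} \in \mathbb{R}^{M},\ m_i > 0 \text{ for all } i : \quad S^{\top} \mathbf{m} = \mathbf{0}
$$

Exchange, demand, and sink reactions (and, typically, the biomass pseudo-reaction) are left out because they are unbalanced by construction. By Stiemke's lemma, if no such $\mathbf{m}$ exists, there is a flux vector $\mathbf{v}$ with $S\mathbf{v} \geq \mathbf{0}$ and $S\mathbf{v} \neq \mathbf{0}$, that is, a combination of internal reactions (ignoring their directionality) that produces metabolites from nothing.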
```{r reactionmassbalance, fig.cap='Mass Balance. Please note that any reaction where at least one metabolite lacks a formula annotation is considered unbalanced for the purpose of this test.'}
score_plots$plot_sina[score_plots$test == "test_reaction_mass_balance"][[1]]
```
```{r reactionchargebalance, fig.cap='Charge Balance. Please note that any reaction where at least one metabolite lacks charge information is considered unbalanced for the purpose of this test.'}
score_plots$plot_sina[score_plots$test == "test_reaction_charge_balance"][[1]]
```
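The two balance tests above rest on the usual element and charge conservation conditions, sketched here with notation introduced for illustration. For reaction $j$ with stoichiometric coefficients $S_{ij}$, let $n_{E,i}$ be the number of atoms of element $E$ in the formula of metabolite $i$ and $z_i$ its charge. The reaction is mass balanced if

$$
\sum_{i} S_{ij}\, n_{E,i} = 0 \quad \text{for every element } E,
$$

and charge balanced if

$$
\sum_{i} S_{ij}\, z_i = 0.
$$

As the figure captions state, if $n_{E,i}$ or $z_i$ is undefined for any participating metabolite, the sum cannot be evaluated and the reaction is counted as unbalanced. As a worked example, for the hexokinase reaction glucose + ATP$^{4-}$ $\rightarrow$ glucose 6-phosphate$^{2-}$ + ADP$^{3-}$ + H$^{+}$, the charge sum is $-(0) - (-4) + (-2) + (-3) + (+1) = 0$, so the reaction is charge balanced.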
```{r finddisconnected, fig.cap='Metabolite Connectivity'}
score_plots$plot_sina[score_plots$test == "test_find_disconnected"][[1]]
```
```{r unboundedflux, fig.cap='Unbounded Flux in Default Medium'}
score_plots$plot_sina[score_plots$test == "test_find_reactions_unbounded_flux_default_condition"][[1]]
```
\FloatBarrier
#### Annotation - Metabolites
```{r annotation-met-score, fig.cap='Annotation - Metabolites. Depicted are the weighted averages of all test scores in this section, applying the weights of the individual test cases as detailed in the snapshot report.'}
sum_plots$plot_sina[sum_plots$section == "annotation_met"][[1]]
```
```{r metaboliteannotationpresence, fig.cap='Presence of Metabolite Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_presence"][[1]]
```
\FloatBarrier
##### Metabolite Annotations Per Database
```{r metaboliteannotationpubmed, fig.cap='Metabolite PubChem.compound Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_overview-pubchem.compound"][[1]]
```
```{r metaboliteannotationkegg, fig.cap='Metabolite KEGG.compound Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_overview-kegg.compound"][[1]]
```
```{r metaboliteannotationseed, fig.cap='Metabolite SEED.compound Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_overview-seed.compound"][[1]]
```
```{r metaboliteannotationinchikey, fig.cap='Metabolite InChIKey Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_overview-inchikey"][[1]]
```
```{r metaboliteannotationinchi, fig.cap='Metabolite InChI Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_overview-inchi"][[1]]
```
```{r metaboliteannotationchebi, fig.cap='Metabolite ChEBI Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_overview-chebi"][[1]]
```
```{r metaboliteannotationhmdb, fig.cap='Metabolite HMDB Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_overview-hmdb"][[1]]
```
```{r metaboliteannotationreactome, fig.cap='Metabolite Reactome Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_overview-reactome"][[1]]
```
```{r metaboliteannotationmetanetx, fig.cap='Metabolite MetaNetX.chemical Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_overview-metanetx.chemical"][[1]]
```
```{r metaboliteannotationbigg, fig.cap='Metabolite BiGG.metabolite Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_overview-bigg.metabolite"][[1]]
```
```{r metaboliteannotationbiocyc, fig.cap='Metabolite BioCyc Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_overview-biocyc"][[1]]
```
\FloatBarrier
##### Metabolite Annotation Conformity Per Database
```{r wrongmetaboliteannotationpubmed, fig.cap='Correct Metabolite PubChem.compound Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_wrong_ids-pubchem.compound"][[1]]
```
```{r wrongmetaboliteannotationkegg, fig.cap='Correct Metabolite KEGG.compound Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_wrong_ids-kegg.compound"][[1]]
```
```{r wrongmetaboliteannotationseed, fig.cap='Correct Metabolite SEED.compound Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_wrong_ids-seed.compound"][[1]]
```
```{r wrongmetaboliteannotationinchikey, fig.cap='Correct Metabolite InChIKey Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_wrong_ids-inchikey"][[1]]
```
```{r wrongmetaboliteannotationinchi, fig.cap='Correct Metabolite InChI Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_wrong_ids-inchi"][[1]]
```
```{r wrongmetaboliteannotationchebi, fig.cap='Correct Metabolite ChEBI Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_wrong_ids-chebi"][[1]]
```
```{r wrongmetaboliteannotationhmdb, fig.cap='Correct Metabolite HMDB Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_wrong_ids-hmdb"][[1]]
```
```{r wrongmetaboliteannotationreactome, fig.cap='Correct Metabolite Reactome Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_wrong_ids-reactome"][[1]]
```
```{r wrongmetaboliteannotationmetanetx, fig.cap='Correct Metabolite MetaNetX.chemical Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_wrong_ids-metanetx.chemical"][[1]]
```
```{r wrongmetaboliteannotationbigg, fig.cap='Correct Metabolite BiGG.metabolite Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_wrong_ids-bigg.metabolite"][[1]]
```
```{r wrongmetaboliteannotationbiocyc, fig.cap='Correct Metabolite BioCyc Annotation'}
score_plots$plot_sina[score_plots$test == "test_metabolite_annotation_wrong_ids-biocyc"][[1]]
```
```{r metabolitenamespaceconsistency, fig.cap='Uniform Metabolite Identifier Namespace'}
score_plots$plot_sina[score_plots$test == "test_metabolite_id_namespace_consistency"][[1]]
```
\FloatBarrier
#### Annotation - Reactions
```{r annotation-rxn-score, fig.cap='Annotation - Reactions. Depicted are the weighted averages of all test scores in this section, applying the weights of the individual test cases as detailed in the snapshot report.'}
sum_plots$plot_sina[sum_plots$section == "annotation_rxn"][[1]]
```
```{r reactionannotationpresence, fig.cap='Presence of Reaction Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_presence"][[1]]
```
\FloatBarrier
##### Reaction Annotations Per Database
```{r reactionannotationrhea, fig.cap='Reaction Rhea Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_overview-rhea"][[1]]
```
```{r reactionannotationkegg, fig.cap='Reaction KEGG.reaction Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_overview-kegg.reaction"][[1]]
```
```{r reactionannotationseed, fig.cap='Reaction SEED.reaction Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_overview-seed.reaction"][[1]]
```
```{r reactionannotationmetanetx, fig.cap='Reaction MetaNetX.reaction Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_overview-metanetx.reaction"][[1]]
```
```{r reactionannotationbigg, fig.cap='Reaction BiGG.reaction Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_overview-bigg.reaction"][[1]]
```
```{r reactionannotationreactome, fig.cap='Reaction Reactome Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_overview-reactome"][[1]]
```
```{r reactionannotationec, fig.cap='Reaction Enzyme Classification Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_overview-ec-code"][[1]]
```
```{r reactionannotationbrenda, fig.cap='Reaction BRENDA Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_overview-brenda"][[1]]
```
```{r reactionannotationbiocyc, fig.cap='Reaction BioCyc Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_overview-biocyc"][[1]]
```
\FloatBarrier
##### Reaction Annotation Conformity Per Database
```{r wrongreactionannotationrhea, fig.cap='Correct Reaction Rhea Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_wrong_ids-rhea"][[1]]
```
```{r wrongreactionannotationkegg, fig.cap='Correct Reaction KEGG.reaction Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_wrong_ids-kegg.reaction"][[1]]
```
```{r wrongreactionannotationseed, fig.cap='Correct Reaction SEED.reaction Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_wrong_ids-seed.reaction"][[1]]
```
```{r wrongreactionannotationmetanetx, fig.cap='Correct Reaction MetaNetX.reaction Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_wrong_ids-metanetx.reaction"][[1]]
```
```{r wrongreactionannotationbigg, fig.cap='Correct Reaction BiGG.reaction Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_wrong_ids-bigg.reaction"][[1]]
```
```{r wrongreactionannotationreactome, fig.cap='Correct Reaction Reactome Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_wrong_ids-reactome"][[1]]
```
```{r wrongreactionannotationec, fig.cap='Correct Reaction Enzyme Classification Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_wrong_ids-ec-code"][[1]]
```
```{r wrongreactionannotationbrenda, fig.cap='Correct Reaction BRENDA Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_wrong_ids-brenda"][[1]]
```
```{r wrongreactionannotationbiocyc, fig.cap='Correct Reaction BioCyc Annotation'}
score_plots$plot_sina[score_plots$test == "test_reaction_annotation_wrong_ids-biocyc"][[1]]
```
```{r reactionnamespaceconsistency, fig.cap='Uniform Reaction Identifier Namespace'}
score_plots$plot_sina[score_plots$test == "test_reaction_id_namespace_consistency"][[1]]
```
\FloatBarrier
#### Annotation - Genes
```{r annotation-gene-score, fig.cap='Annotation - Genes. Depicted are the weighted averages of all test scores in this section, applying the weights of the individual test cases as detailed in the snapshot report.'}
sum_plots$plot_sina[sum_plots$section == "annotation_gene"][[1]]
```
```{r gene-annotation-presence, fig.cap='Presence of Gene Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_presence"][[1]]
```
\FloatBarrier
##### Gene Annotations Per Database
```{r gene-annotation-refseq, fig.cap='Gene RefSeq Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_overview-refseq"][[1]]
```
```{r genenannotationuniprot, fig.cap='Gene UniProt Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_overview-uniprot"][[1]]
```
```{r genenannotationecogene, fig.cap='Gene EcoGene Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_overview-ecogene"][[1]]
```
```{r genenannotationkegg, fig.cap='Gene KEGG.genes Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_overview-kegg.genes"][[1]]
```
```{r genenannotationncbigi, fig.cap='Gene NCBIgi Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_overview-ncbigi"][[1]]
```
```{r genenannotationncbigene, fig.cap='Gene NCBIgene Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_overview-ncbigene"][[1]]
```
```{r genenannotationncbiprotein, fig.cap='Gene NCBIprotein Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_overview-ncbiprotein"][[1]]
```
```{r genenannotationccds, fig.cap='Gene CCDS Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_overview-ccds"][[1]]
```
```{r genenannotationhprd, fig.cap='Gene HPRD Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_overview-hprd"][[1]]
```
```{r genenannotationasap, fig.cap='Gene ASAP Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_overview-asap"][[1]]
```
\FloatBarrier
##### Gene Annotation Conformity Per Database
```{r wronggenenannotationrefseq, fig.cap='Correct Gene RefSeq Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_wrong_ids-refseq"][[1]]
```
```{r wronggenenannotationuniprot, fig.cap='Correct Gene UniProt Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_wrong_ids-uniprot"][[1]]
```
```{r wronggenenannotationecogene, fig.cap='Correct Gene EcoGene Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_wrong_ids-ecogene"][[1]]
```
```{r wronggenenannotationkegg, fig.cap='Correct Gene KEGG.genes Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_wrong_ids-kegg.genes"][[1]]
```
```{r wronggenenannotationncbigi, fig.cap='Correct Gene NCBIgi Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_wrong_ids-ncbigi"][[1]]
```
```{r wronggenenannotationncbigene, fig.cap='Correct Gene NCBIgene Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_wrong_ids-ncbigene"][[1]]
```
```{r wronggenenannotationncbiprotein, fig.cap='Correct Gene NCBIprotein Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_wrong_ids-ncbiprotein"][[1]]
```
```{r wronggenenannotationccds, fig.cap='Correct Gene CCDS Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_wrong_ids-ccds"][[1]]
```
```{r wronggenenannotationhprd, fig.cap='Correct Gene HPRD Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_wrong_ids-hprd"][[1]]
```
```{r wronggenenannotationasap, fig.cap='Correct Gene ASAP Annotation'}
score_plots$plot_sina[score_plots$test == "test_gene_product_annotation_wrong_ids-asap"][[1]]
```
\FloatBarrier
#### Annotation - SBO Terms
```{r annotation-sbo-score, fig.cap='Annotation - SBO Terms. Depicted are the weighted averages of all test scores in this section, applying the weights of the individual test cases as detailed in the snapshot report.'}
sum_plots$plot_sina[sum_plots$section == "annotation_sbo"][[1]]
```
```{r metabolite-sbo-presence, fig.cap='Metabolite General SBO Presence'}
score_plots$plot_sina[score_plots$test == "test_metabolite_sbo_presence"][[1]]
```
```{r metabolitesbospecificpresence, fig.cap='Metabolite SBO:0000247 Presence'}
score_plots$plot_sina[score_plots$test == "test_metabolite_specific_sbo_presence"][[1]]
```
```{r reaction-sbo-presence, fig.cap='Reaction General SBO Presence'}
score_plots$plot_sina[score_plots$test == "test_reaction_sbo_presence"][[1]]
```
```{r reactionsbospecificpresence, fig.cap='Metabolic Reaction SBO:0000176 Presence'}
score_plots$plot_sina[score_plots$test == "test_metabolic_reaction_specific_sbo_presence"][[1]]
```
```{r transportreactionsbospecificpresence, fig.cap='Transport Reaction SBO:0000185 Presence'}
score_plots$plot_sina[score_plots$test == "test_transport_reaction_specific_sbo_presence"][[1]]
```
```{r exchangereactionsbospecificpresence, fig.cap='Exchange Reaction SBO:0000627 Presence'}
score_plots$plot_sina[score_plots$test == "test_exchange_specific_sbo_presence"][[1]]
```
```{r demandreactionsbospecificpresence, fig.cap='Demand Reaction SBO:0000628 Presence'}
score_plots$plot_sina[score_plots$test == "test_demand_specific_sbo_presence"][[1]]
```
```{r sinkreactionsbospecificpresence, fig.cap='Sink Reaction SBO:0000632 Presence'}
score_plots$plot_sina[score_plots$test == "test_sink_specific_sbo_presence"][[1]]
```
```{r genesbopresence, fig.cap='Gene General SBO Presence'}
score_plots$plot_sina[score_plots$test == "test_gene_sbo_presence"][[1]]
```
```{r genesbospecificpresence, fig.cap='Gene SBO:0000243 Presence'}
score_plots$plot_sina[score_plots$test == "test_gene_specific_sbo_presence"][[1]]
```
```{r biomassreactionsbospecificpresence, fig.cap='Biomass Reaction SBO:0000629 Presence'}
score_plots$plot_sina[score_plots$test == "test_biomass_specific_sbo_presence"][[1]]
```
\FloatBarrier
### Specific Section
#### SBML
```{r sbml-level, fig.cap='SBML Level and Version'}
metric_plots$plot_sina[metric_plots$test == "test_sbml_level"][[1]]
```
```{r fbc-presence, fig.cap='FBC Not Enabled'}
metric_plots$plot_sina[metric_plots$test == "test_fbc_presence"][[1]]
```
\FloatBarrier
#### Basic Information {#sec:basic-info}
```{r modelid, fig.cap='Model Identifier Presence'}
metric_plots$plot_sina[metric_plots$test == "test_model_id_presence"][[1]]
```
```{r nummetabolites, fig.cap='Number of Metabolites'}
numeric_plots$plot_sina[numeric_plots$test == "test_metabolites_presence"][[1]] + ggplot2::scale_y_log10()
```
```{r numreactions, fig.cap='Number of Reactions'}
numeric_plots$plot_sina[numeric_plots$test == "test_reactions_presence"][[1]] + ggplot2::scale_y_log10()
```
```{r numgenes, fig.cap='Number of Genes'}
numeric_plots$plot_sina[numeric_plots$test == "test_genes_presence"][[1]] + ggplot2::scale_y_log10()
```
```{r compartmentspresence, fig.cap='Number of Compartments'}
numeric_plots$plot_sina[numeric_plots$test == "test_compartments_presence"][[1]]
```
```{r metaboliccoverage, fig.cap='Metabolic Coverage'}
metric_plots$plot_sina[metric_plots$test == "test_metabolic_coverage"][[1]]
```
\FloatBarrier
#### Metabolite Information
```{r uniquemetabolic, fig.cap='Unique Metabolites'}
metric_plots$plot_sina[metric_plots$test == "test_find_unique_metabolites"][[1]]
```
```{r duplicatemetabolites, fig.cap='Duplicate Metabolites in Identical Compartments'}
metric_plots$plot_sina[metric_plots$test == "test_find_duplicate_metabolites_in_compartments"][[1]]
```
```{r metaboliteswithoutcharge, fig.cap='Metabolites Without Charge'}
metric_plots$plot_sina[metric_plots$test == "test_metabolites_charge_presence"][[1]]
```