---
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE)
```
# mlcov
An R package for selecting covariate effects using machine learning (ML).
The methodology implemented in the `mlcov` R package consists of four key steps:
1. Data splitting: the dataset, comprising empirical Bayesian estimates of individual parameters (EBEs) and the candidate covariates, is randomly split into five folds.
2. Covariate selection: the Lasso algorithm is applied first to remove irrelevant covariates and covariates made redundant by correlation. The Boruta algorithm then iteratively identifies the relevant covariates based on their importance scores.
3. Voting: a voting mechanism across folds determines the final selected covariates based on their robustness. These first three steps are implemented by a single call to the function `ml_cov_search`.
4. Residual plots: residual plots are used to evaluate the covariate-parameter relationships. After covariate selection, an XGBoost model is trained on the selected covariates, and any remaining trends between the residuals (the difference between the actual target values and the model's predicted values) and the unselected covariates are examined. The goal is to ensure that the ML method did not overlook significant trends or relationships that additional covariates could capture. This step is implemented in a separate function, `generate_residuals_plot`.
Visit the [PAGE Abstract](https://www.page-meeting.org/?abstract=10996) to learn more.
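The data splitting and voting steps can be illustrated with a small base-R sketch on simulated data. This is not the package's implementation: `select_covariates()` is a hypothetical stand-in for the Lasso and Boruta selection (a plain correlation threshold), and both the hold-one-fold-out scheme and the majority threshold of three votes are assumptions made for illustration.

```r
# Sketch of steps 1 and 3 only: random five-fold split plus majority voting.
# `select_covariates()` is a hypothetical stand-in for the Lasso + Boruta
# selection in step 2; it is NOT how mlcov selects covariates.
set.seed(42)

n <- 200
toy <- data.frame(
  WT  = rnorm(n, mean = 70, sd = 12),
  AGE = rnorm(n, mean = 45, sd = 10),
  ALB = rnorm(n, mean = 4,  sd = 0.4)
)
# Simulated EBE of clearance that depends on WT only
toy$CL <- 5 * (toy$WT / 70)^0.75 * exp(rnorm(n, sd = 0.1))

covariates <- c("WT", "AGE", "ALB")

# Step 1: assign each subject to one of five folds at random
fold_id <- sample(rep(1:5, length.out = n))

# Stand-in selector: keep covariates whose absolute correlation with the
# parameter exceeds an arbitrary threshold
select_covariates <- function(df, param, covs, threshold = 0.3) {
  r <- vapply(covs, function(cv) abs(cor(df[[cv]], df[[param]])), numeric(1))
  covs[r > threshold]
}

# Run the selector with each fold held out in turn (an assumed scheme)
selected_per_fold <- lapply(1:5, function(k) {
  select_covariates(toy[fold_id != k, ], "CL", covariates)
})

# Step 3: count votes and keep covariates selected in a majority of folds
votes <- table(factor(unlist(selected_per_fold), levels = covariates))
final_selection <- names(votes)[votes >= 3]

print(votes)
print(final_selection)
```

Counting votes over a fixed set of factor levels keeps covariates with zero votes in the table, which makes it easier to see how close each one came to being selected.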
## Installation
```{r, eval = FALSE}
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("certara/mlcov")
```
## Usage
Import data file:
```{r, message=FALSE}
library(mlcov)
data_file <- system.file(package = "mlcov", "supplementary", "tab33")
data <- read.table(data_file, skip = 1, header = TRUE)
```
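The `skip = 1, header = TRUE` arguments imply that the first line of the file is not part of the table and that the second line holds the column names. The sketch below reproduces that layout with a temporary file. The title line (modelled on NONMEM-style table output), the column names, and the values are assumptions for illustration, not the contents of `tab33`.

```r
# Write a small table with a leading title line, then read it back.
# The layout is an assumption modelled on NONMEM-style table output;
# the column names and values are made up for illustration.
tmp <- tempfile()
writeLines(c(
  "TABLE NO.  1",
  " ID  CL   V1   WT",
  "  1  4.8  31.2 68",
  "  2  5.6  35.9 81"
), tmp)

# skip = 1 drops the title line; header = TRUE uses the next line as names
example <- read.table(tmp, skip = 1, header = TRUE)
str(example)

# Confirm the columns a later covariate search would need are present
stopifnot(all(c("CL", "V1", "WT") %in% names(example)))
unlink(tmp)
```

A similar `stopifnot()` check on your own data can catch misspelled parameter or covariate names before they are passed to `ml_cov_search`.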
Perform covariate search:
```{r}
result <- ml_cov_search(
  data = data,
  pop_param = c("V1", "CL"),
  cov_continuous = c("AGE", "WT", "HT", "BMI", "ALB", "CRT",
                     "FER", "CHOL", "WBC", "LYPCT", "RBC",
                     "HGB", "HCT", "PLT"),
  cov_factors = c("SEX", "RACE", "DIAB", "ALQ", "WACT", "SMQ")
)
print(result)
```
Generate the SHAP summary plot:
```{r}
generate_shap_summary_plot(
  result,
  x_bound = NULL,
  dilute = FALSE,
  scientific = FALSE,
  my_format = NULL,
  title = NULL,
  title.position = 0.5,
  ylab = NULL,
  xlab = NULL
)
```
Generate residual plots:
```{r}
generate_residuals_plot(data = data, result, pop_param = 'CL')
```
```{r}
generate_residuals_plot(data = data, result, pop_param = 'V1')
```
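The idea behind the residual check can be sketched in base R on simulated data. This is an illustration under stated assumptions, not what `generate_residuals_plot` does internally: `lm()` stands in for the XGBoost model, and the toy data and variable names are made up.

```r
# Sketch of the idea behind the residual check, with lm() as a stand-in
# for the XGBoost model (an assumption made to keep this base-R only).
set.seed(1)
n <- 300
toy <- data.frame(
  WT  = rnorm(n, mean = 70, sd = 12),
  AGE = rnorm(n, mean = 45, sd = 10)
)
# Simulated parameter that depends on both covariates
toy$CL <- 2 + 0.05 * toy$WT - 0.03 * toy$AGE + rnorm(n, sd = 0.2)

# Suppose only WT was selected; fit the model on the selected covariate
fit <- lm(CL ~ WT, data = toy)

# Residual = observed value minus model prediction
toy$residual <- toy$CL - predict(fit, newdata = toy)

# A clear trend against an unselected covariate suggests it was overlooked
trend_age <- cor(toy$residual, toy$AGE)
print(trend_age)
```

Here the residuals still trend with `AGE` because the simulated parameter depends on it, which is the kind of pattern a residual plot against an unselected covariate is meant to reveal.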