-
Notifications
You must be signed in to change notification settings - Fork 12
/
README.Rmd
222 lines (147 loc) · 10.8 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# dominanceanalysis <img src='logo/dominance_analysis_logo.png' align="right" height="139" />
[![codecov](https://codecov.io/gh/clbustos/dominanceAnalysis/branch/master/graph/badge.svg)](https://app.codecov.io/gh/clbustos/dominanceAnalysis)
[![Stable version](http://www.r-pkg.org/badges/version-last-release/dominanceanalysis)](https://cran.r-project.org/package=dominanceanalysis)
[![downloads](http://cranlogs.r-pkg.org/badges/grand-total/dominanceanalysis)](https://cran.r-project.org/package=dominanceanalysis)
Dominance Analysis (Azen and Budescu, 2003, 2006; Azen and Traxel, 2009; Budescu, 1993; Luo and Azen, 2013), for multiple regression models: Ordinary Least Squares, Generalized Linear Models, Dynamic Linear Models and Hierarchical Linear Models.
**Features**:
- Provides complete, conditional and general dominance analysis for *lm* (univariate and multivariate), *clm*, *dynlm*, *lmer*, *betareg* and *glm* (family=binomial) models.
- Covariance / correlation matrixes could be used as input for OLS dominance analysis, using `lmWithCov()` and `mlmWithCov()` methods, respectively.
- Multiple criteria can be used as fit indices, which is useful especially for HLM.
# Examples
## Linear regression
We could apply dominance analysis directly on the data, using *lm* (see Azen and Budescu, 2003).
The *attitude* data is composed of six predictors of the overall rating of 35 clerical employees of a large financial organization: complaints, privileges, learning, raises, critical and advancement. The method `dominanceAnalysis()` can retrieve all necessary information directly from a *lm* model.
```{r}
library(dominanceanalysis)
lm.attitude<-lm(rating~.,attitude)
da.attitude<-dominanceAnalysis(lm.attitude)
```
Using `print()` method on the *dominanceAnalysis* object, we can see that *complaints* completely dominates all other predictors, followed by *learning* (lrnn). The remaining 4 variables (prvl,rass,crtc,advn) don't show a consistent pattern for complete and conditional dominance. The average contribution of each predictor is also presented, that defines defines general dominance.
The `print()` method uses `abbreviate`, to allow complex models to be visualized at a glance.
```{r}
print(da.attitude)
```
The dominance brief and average contribution of each predictor could be retrieved separately using `dominanceBriefing()` and `averageContribution()` methods, respectively.
```{r}
dominanceBriefing(da.attitude, abbrev = TRUE)$r2
averageContribution(da.attitude)
```
The `summary()` method shows the complete dominance analysis matrix, that presents all fit differences between levels. Also, provides the average contribution of each variable.
```{r}
summary(da.attitude)
```
To evaluate the robustness of our results, we can use bootstrap analysis (Azen and Budescu, 2006).
We applied a bootstrap analysis using `bootDominanceAnalysis()` method with $R^2$ as a fit index and 100 permutations. For precise results, you need to run at least 1000 replications.
```{r, cache=T}
set.seed(1234)
bda.attitude<-bootDominanceAnalysis(lm.attitude, R=100)
```
The `summary()` method presents the results for the bootstrap analysis. *Dij* shows the original result, and *mDij*, the mean for Dij on bootstrap samples and *SE.Dij* its standard error. *Pij* is the proportion of bootstrap samples where *i* dominates *j*, *Pji* is the proportion of bootstrap samples where *j* dominates *i* and *Pnoij* is the proportion of samples where no dominance can be asserted. *Rep* is the proportion of samples where original dominance is replicated.
We can see that the value of complete dominance for *complaints* is fairly robust over all variables (Dij almost equal to mDij, and small SE), contrarily to *learning* (Dij differs from mDij, and bigger SE).
```{r, cache=T}
summary(bda.attitude)
```
Another way to perform the dominance analysis is by using a correlation or covariance matrix. As an example, we use the *ability.cov* matrix which is composed of five specific skills that might explain *general intelligence* (general). The biggest average contribution is for predictor *reading* (0.152). Nevertheless, in the output of `summary()` method on level 1, we can see that *picture* (0.125) dominates over *reading* (0.077) on *vocab* submodel.
```{r}
lmwithcov<-lmWithCov( f = general~picture+blocks+maze+reading+vocab,
x = cov2cor(ability.cov$cov))
da.cov<-dominanceAnalysis(lmwithcov)
print(da.cov)
summary(da.cov)
```
## Hierarchical Linear Models
For Hierarchical Linear Models using *lme4*, you should provide a null model (see Luo and Azen, 2013).
As an example, we use *npk* dataset, which contains information about a classical N, P, K (nitrogen, phosphate, potassium) factorial experiment on the growth of peas conducted on 6 blocks.
```{r}
library(lme4)
lmer.npk.1<-lmer(yield~N+P+K+(1|block),npk)
lmer.npk.0<-lmer(yield~1+(1|block),npk)
da.lmer<-dominanceAnalysis(lmer.npk.1,null.model=lmer.npk.0)
```
Using `print()` method, we can see that random effects are modeled as a constant (1 | block).
```{r}
print(da.lmer)
```
The fit indices used in the analysis were *n.marg* (Nakagawa's marginal R²), *n.cond* (Nakagawa's conditional R²), *rb.r2.1* (R&B $R^2_1$: Level-1 variance component explained by predictors), *rb.r2.2* (R&B $R^2_2$: Level-2 variance component explained by predictors), *sb.r2.1* (S&B $R^2_1$: Level-1 proportional reduction in error predicting scores at Level-1), and *sb.r2.2* (S&B $R^2_2$: Level-2 proportional reduction in error predicting scores at Level-1). We can see that using *rb.r2.1* and *sb.r2.1* index, that shows influence of predictors on Level-1 variance, clearly *nitrogen* dominates over *potassium* and *phosphate*, and *potassium* dominates over *phosphate*.
```{r}
s.da.lmer=summary(da.lmer)
s.da.lmer
sm.rb.r2.1=s.da.lmer$rb.r2.1$summary.matrix
# Nitrogen completely dominates potassium
as.logical(na.omit(sm.rb.r2.1$N > sm.rb.r2.1$K))
# Nitrogen completely dominates phosphate
as.logical(na.omit(sm.rb.r2.1$N > sm.rb.r2.1$P))
# Potassium completely dominates phosphate
as.logical(na.omit(sm.rb.r2.1$K > sm.rb.r2.1$P))
```
## Logistic regression
Dominance analysis can be used in logistic regression (see Azen and Traxel, 2009).
As an example, we used the *esoph* dataset, that contains information about a case-control study of (o)esophageal cancer in Ille-et-Vilaine, France.
Looking at the report for standard glm summary method, we can see that the linear effect of each variable was significant (*p* < 0.05 for *agegp.L*, *alcgp.L* and *tobgp.L*), such as the quadratic effect of predictor age (*p* < 0.05 for *agegp.Q*). Even so,it is hard to identify which variable is more important to predict esophageal cancer.
```{r}
glm.esoph<-glm(cbind(ncases,ncontrols)~agegp+alcgp+tobgp, esoph,family="binomial")
summary(glm.esoph)
```
We performed dominance analysis on this dataset and the results are shown below. The fit indices were *r2.m* ($R^2_M$: McFadden's measure), *r2.cs* ($R^2_{CS}$: Cox and Snell's measure), *r2.n* ($R^2_N$: Nagelkerke's measure) and *r2.e* ($R^2_E$: Estrella's measure). For all fit indices, we can conclude that *age* and *alcohol* completely dominate *tobacco*, while *age* shows general dominance over both *alcohol* and *tobacco.*
```{r}
da.esoph<-dominanceAnalysis(glm.esoph)
print(da.esoph)
summary(da.esoph)
```
Then, we performed a bootstrap analysis. Using McFadden's measure (*r2.m*), we can see that bootstrap dominance of *age* over *tobacco*, and of *alcohol* over *tobacco* have standard errors (*SE.Dij*) near 0 and reproducibility (*Rep*) close to 1, so are fairly robust on all levels.Dominance values of *age* over *alcohol* are not easily reproducible and require more research
```{r,cache=T}
set.seed(1234)
da.b.esoph<-bootDominanceAnalysis(glm.esoph,R = 200)
print(format(summary(da.b.esoph)$r2.m,digits=3),row.names=F)
```
## Set of predictors
Budescu (1993) shows that dominance analysis can be applied to groups or set of inseparable predictors. The Longley's economic regression data is know for have a highly collinear set on `Employed` variable. We can see that `GNP.deflator`, `GNP`, `Population` and `Year` are highly correlated.
```{r}
data(longley)
round(cor(longley),2)
```
We can group GNP and employment related variables, to determine the importance of both groups of variables. The GNP related variables dominates completely population, and we can see that all predictors dominates generally over employment.
```{r}
terms.r<-c(GNP.rel="GNP.deflator+GNP",
employment="Unemployed+Armed.Forces",
"Population",
"Year")
da.longley<-dominanceAnalysis(lm(Employed~.,longley),terms = terms.r)
print(da.longley)
```
## Installation
You can install the stable version from [CRAN](https://cran.r-project.org/package=dominanceanalysis)
``` r
install.packages('dominanceanalysis')
```
Also, you can install the latest version from [github](https://github.com/clbustos/dominanceanalysis) with:
``` r
library(devtools)
install_github("clbustos/dominanceanalysis")
```
## Authors
* Claudio Bustos Navarrete: Creator and maintainer
* Filipa Coutinho Soares: Documentation and testing
## Acknowledgments
* Daniel Schlaepfer: Reported an error in the logistic regression code.
* Xiong Zhang: Contributed to the incorporation of dynamic linear models.
* Maartje Hidalgo: Contributed to the incorporation of beta regression.
## References
* Budescu, D. V. (1993). Dominance analysis: A new approach to the problem of relative importance of predictors in multiple regression. Psychological Bulletin, 114(3), 542-551. https://doi.org/10.1037/0033-2909.114.3.542
* Azen, R., & Budescu, D. V. (2003). The dominance analysis approach for comparing predictors in multiple regression. Psychological Methods, 8(2), 129-148. https://doi.org/10.1037/1082-989X.8.2.129
* Azen, R., & Budescu, D. V. (2006). Comparing Predictors in Multivariate Regression Models: An Extension of Dominance Analysis. Journal of Educational and Behavioral Statistics, 31(2), 157-180. https://doi.org/10.3102/10769986031002157
* Azen, R., & Traxel, N. (2009). Using Dominance Analysis to Determine Predictor Importance in Logistic Regression. Journal of Educational and Behavioral Statistics, 34(3), 319-347. https://doi.org/10.3102/1076998609332754
* Luo, W., & Azen, R. (2013). Determining Predictor Importance in Hierarchical Linear Models Using Dominance Analysis. Journal of Educational and Behavioral Statistics, 38(1), 3-31. https://doi.org/10.3102/1076998612458319
* Shou, Y., & Smithson, M. (2015). Evaluating Predictors of Dispersion: A Comparison of Dominance Analysis and Bayesian Model Averaging. Psychometrika, 80(1), 236-256. https://doi.org/10.1007/s11336-013-9375-8