|
| 1 | +--- |
| 2 | +title: "Observation level random effects" |
| 3 | +description: "" |
| 4 | +author: "Thierry Onkelinx" |
| 5 | +date: "2020-07-01" |
| 6 | +categories: ["r", "statistics"] |
| 7 | +tags: ["r", "analysis", "mixed model"] |
| 8 | +output: |
| 9 | + md_document: |
| 10 | + preserve_yaml: true |
| 11 | + variant: gfm |
| 12 | +--- |
| 13 | + |
| 14 | +## Definition |
| 15 | + |
| 16 | +An observation level random effect (OLRE) is a random effect with a |
| 17 | +different level for each observation. Combined with a Poisson |
| 18 | +distribution it can capture moderate overdispersion. |
| 19 | + |
| 20 | +## Example |
| 21 | + |
| 22 | +The model below is a simple mixed model. \(\delta_{ijk}\) is the OLRE. |
| 23 | + |
| 24 | +\[Y_{ijk} \sim Poisson(\mu_{ijk})\] |
| 25 | + |
| 26 | +\[\log(\mu_{ijk}) = \eta_{ijk}\] |
| 27 | + |
| 28 | +\[\eta_{ijk} = \beta_0 + \beta_1 X_i + b_j + \delta_{ijk}\] |
| 29 | + |
| 30 | +\[b_j \sim N(0, \sigma_b)\] |
| 31 | + |
| 32 | +\[\delta_{ijk} \sim N(0, \sigma_{olre})\] |
| 33 | + |
| 34 | +Let’s generate some overdispersed data using a negative binomial |
| 35 | +distribution. |
| 36 | + |
| 37 | +``` r |
| 38 | +set.seed(324) |
| 39 | +n.i <- 10 |
| 40 | +n.j <- 10 |
| 41 | +n.k <- 10 |
| 42 | +beta.0 <- 1 |
| 43 | +beta.1 <- 0.3 |
| 44 | +sigma.b <- 0.5 |
| 45 | +theta <- 5 |
| 46 | +dataset <- expand.grid( |
| 47 | + X = seq_len(n.i), |
| 48 | + b = seq_len(n.j), |
| 49 | + Replicate = seq_len(n.k) |
| 50 | +) |
| 51 | +rf.b <- rnorm(n.j, mean = 0, sd = sigma.b) |
| 52 | +dataset$eta <- beta.0 + beta.1 * dataset$X + rf.b[dataset$b] |
| 53 | +dataset$mu <- exp(dataset$eta) |
| 54 | +dataset$Y <- rnbinom(nrow(dataset), mu = dataset$mu, size = theta) |
| 55 | +dataset$OLRE <- seq_len(nrow(dataset)) |
| 56 | +``` |
| 57 | + |
| 58 | +``` r |
| 59 | +library(ggplot2) |
| 60 | +``` |
| 61 | + |
| 62 | + ## Warning: package 'ggplot2' was built under R version 4.0.2 |
| 63 | + |
| 64 | +``` r |
| 65 | +ggplot(dataset, aes(x = X, y = Y, group = Replicate)) + |
| 66 | + geom_line() + |
| 67 | + geom_point() + |
| 68 | + facet_wrap(~b) |
| 69 | +``` |
| 70 | + |
| 71 | +<!-- --> |
| 72 | + |
| 73 | +Next we fit the model with an OLRE. |
| 74 | + |
| 75 | +``` r |
| 76 | +library(lme4) |
| 77 | +``` |
| 78 | + |
| 79 | + ## Loading required package: Matrix |
| 80 | + |
| 81 | +``` r |
| 82 | +m1 <- glmer(Y ~ X + (1 | b) + (1 | OLRE), |
| 83 | + data = dataset, |
| 84 | + family = poisson |
| 85 | +) |
| 86 | +summary(m1) |
| 87 | +``` |
| 88 | + |
| 89 | + ## Generalized linear mixed model fit by maximum likelihood (Laplace |
| 90 | + ## Approximation) [glmerMod] |
| 91 | + ## Family: poisson ( log ) |
| 92 | + ## Formula: Y ~ X + (1 | b) + (1 | OLRE) |
| 93 | + ## Data: dataset |
| 94 | + ## |
| 95 | + ## AIC BIC logLik deviance df.resid |
| 96 | + ## 6871.8 6891.4 -3431.9 6863.8 996 |
| 97 | + ## |
| 98 | + ## Scaled residuals: |
| 99 | + ## Min 1Q Median 3Q Max |
| 100 | + ## -2.06918 -0.41636 -0.00649 0.27116 1.55094 |
| 101 | + ## |
| 102 | + ## Random effects: |
| 103 | + ## Groups Name Variance Std.Dev. |
| 104 | + ## OLRE (Intercept) 0.1732 0.4161 |
| 105 | + ## b (Intercept) 0.1475 0.3840 |
| 106 | + ## Number of obs: 1000, groups: OLRE, 1000; b, 10 |
| 107 | + ## |
| 108 | + ## Fixed effects: |
| 109 | + ## Estimate Std. Error z value Pr(>|z|) |
| 110 | + ## (Intercept) 1.002971 0.127698 7.854 4.02e-15 *** |
| 111 | + ## X 0.298558 0.005803 51.447 < 2e-16 *** |
| 112 | + ## --- |
| 113 | + ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 |
| 114 | + ## |
| 115 | + ## Correlation of Fixed Effects: |
| 116 | + ## (Intr) |
| 117 | + ## X -0.282 |
| 118 | + |
| 119 | +The estimates for \(\beta_0\), \(\beta_1\) and \(\sigma_b\) are |
| 120 | +reasonably close to the ones used to generate the data. |
| 121 | + |
| 122 | +## Check the sanity of the model |
| 123 | + |
| 124 | +Models with an OLRE should be used carefully. Because OLRE can have a |
| 125 | +very strong influence on the model. One should always check the standard |
| 126 | +deviation of the ORLE. High standard deviations are a good indication |
| 127 | +for problems. Let’s show why. |
| 128 | + |
| 129 | +The main model (without OLRE) is |
| 130 | +\(\gamma_{ijk} = \beta_0 + \beta_1 X_i + b_j\). The OLRE correct this |
| 131 | +for overdispersion \(\eta_{ijk} = \gamma_{ijk} + \delta_{ijk}\). Since |
| 132 | +\(\delta_{ijk} \sim N(0, \sigma_{olre})\), the 2.5% and 97.5% quantiles |
| 133 | +of this distribution define a range in which 95% of the plausible OLRE |
| 134 | +values are. The 2.5% quantile (\(\delta_{lcl}\)) indicates a strong but |
| 135 | +plausible downward correction, the 97.5% quantile (\(\delta_{ucl}\)) an |
| 136 | +upward correction. So due to the corrections we have |
| 137 | +\(\eta_{ijk} = (\gamma_{ijk} + \delta_{lcl}; \gamma_{ijk} + \delta_{ucl}) = \gamma_{ijk} + (\delta_{lcl};\delta_{ucl})\). |
| 138 | +In a Poisson distribution with log-link we have |
| 139 | +\(\log(\mu_{ijk}) = \eta_{ijk}\) or |
| 140 | +\(\mu_{ijk} = e ^{\eta_{ijk}} = e ^{\gamma_{ijk} + (\delta_{lcl};\delta_{ucl})}\). |
| 141 | +The ratio \(r_{ijk}\) between the upper and lower boundary of |
| 142 | +\(\mu_{ijk}\) is: |
| 143 | + |
| 144 | +\[r_{ijk} = \frac{e ^{\gamma_{ijk} + \delta_{ucl}}}{e ^{\gamma_{ijk} + \delta_{lcl})}}\] |
0 commit comments