lme4: additive notation #196
Thanks! Thinking a bit more about this, there seem to be three issues:
Some code for illustration is here:
library(equatiomatic)
library(lme4)
#> Loading required package: Matrix
mod <- lmer(score ~ wave + (1 | sid) + (1 | school), data = sim_longitudinal)
# model includes an intercept ("fixed effect")
fixef(mod)
#> (Intercept) wave
#> 97.1122331 0.1736105
# all of the random effects are mean zero
mean(ranef(mod)$school[,1])
#> [1] 9.453623e-13
mean(ranef(mod)$sid[,1])
#> [1] 5.764416e-11
# to predict the first observation: (sid = 1, school = 1, wave = 0)
fixef(mod)[1] + ranef(mod)$sid[1,1] + ranef(mod)$school[1,1]
#> (Intercept)
#> 101.9404
predict(mod)[1] # check
#> 1
#> 101.9404
# whether the data is nested or not is irrelevant for the model
mod2 <- lmer(score ~ (1 | wave) + (1 | sid) + (1 | school), data = sim_longitudinal)
# model includes an intercept ("fixed effect")
fixef(mod2)
#> (Intercept)
#> 97.89348
# all of the random effects are mean zero
mean(ranef(mod2)$wave[,1])
#> [1] 5.902878e-12
mean(ranef(mod2)$school[,1])
#> [1] 1.460903e-10
mean(ranef(mod2)$sid[,1])
#> [1] 1.642559e-10
# to predict the first observation: (sid = 1, school = 1, wave = 0)
fixef(mod2)[1] + ranef(mod2)$sid[1,1] + ranef(mod2)$school[1,1] + ranef(mod2)$wave[1,1]
#> (Intercept)
#> 101.959
predict(mod2)[1] # check
#> 1
#> 101.959
Created on 2021-09-03 by the reprex package (v2.0.1)
Yes, I understand how the parameters add together (I teach advanced courses on multilevel modeling). What I'm not understanding, however, is how the additive nature isn't clear from the screenshot with the means set to zero. The intercept is \alpha, which varies randomly across j, k, and l. Each of these has its own distribution with a mean of zero. If the parameter is specified as being generated by more than one distribution, then additivity is implied (i.e., the single value is generated by two or more distributions). I don't like the idea of including additional Greek notation to imply the additivity, because it makes generalizing to n levels more difficult. For example, in the case you included from Gelman/Hill, we have gamma and delta. Would we move to zeta for the next level? What about for even more levels -- iota, kappa, etc.? That seems overly complicated to me.
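Written out explicitly -- this is a sketch of the reading described above in my own notation, not output from the package -- the mean-zero terms combine with the fixed intercept additively in the mean of the outcome:

```latex
\begin{aligned}
\text{score}_{i} &\sim N\!\left(\beta_{0} + \alpha_{j[i]} + \alpha_{k[i]} + \alpha_{l[i]},\ \sigma^{2}\right) \\
\alpha_{j} &\sim N\!\left(0, \sigma^{2}_{\alpha_{j}}\right), \quad
\alpha_{k} \sim N\!\left(0, \sigma^{2}_{\alpha_{k}}\right), \quad
\alpha_{l} \sim N\!\left(0, \sigma^{2}_{\alpha_{l}}\right)
\end{aligned}
```

Here \beta_{0} plays the role of the overall intercept, and each \alpha is an offset drawn from its own mean-zero distribution.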
I guess that's where I'd prefer a more explicit notation, especially for teaching purposes, and I also wouldn't think about alpha as one parameter, although I see now how one can do that (but don't you still require the overall intercept? -- because the three alphas have mean zero). Here's a proposal for writing this model with fewer Greek letters but explicit additivity. (By the way, the variances have subscripts for the individual indices.)
In this specific case I agree that is clearer. But if you have a random slope for wave, then you have a mean for wave (the fixed effect) plus all those same offsets. Currently, a model like that would look like this:
library(equatiomatic)
m <- lme4::lmer(score ~ wave + (wave | sid) + (wave | school) + (wave | district),
data = sim_longitudinal)
#> boundary (singular) fit: see ?isSingular
extract_eq(m)
Created on 2021-09-03 by the reprex package (v0.3.0)
I do think we should swap each mean for a zero. But how would this model look with the notation you've proposed?
Of course, one could also reorder such that the \alpha terms all come first -- maybe that makes more sense.
This would be another option, although it's fairly noisy (but it's also a complicated model). The problem with the previous version is with the variance components. It's fairly late here, so I'll leave this for now and see if I have another idea tomorrow.
I like it. It's sort of an expanded notation. Perhaps I could include that as an additional argument, so you could have it output either way. I do think we'd still need to specify the mean for wave, though, so perhaps something like this. EDIT: Thinking about this more -- I think I'd probably swap the betas for lowercase b's, but otherwise I think it could work.
How so? Each index corresponds to the given level: j for sid, k for school, and l for district.
The index j implies that for each sid, a separate variance is estimated. That's not the case in this model, but it could be a possible model, so it seems worth distinguishing the two cases. I added the intercept for \beta in this notation. I realize it's verbose, but it seems to avoid all ambiguities.
I guess I'm not clear on what you mean by this. Are you talking about the residual variance structure? I would say that j implies parameter-level variance across sid levels (i.e., the offsets from the main effect). I will think about this more and perhaps try to get some feedback from others. The below seems clear to me, but maybe I'm too close to it.
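As a concrete check on that point -- a sketch assuming the `sim_longitudinal` data shipped with `equatiomatic` -- `lme4` estimates a single variance per random term, shared across all levels of the grouping factor, which is why an index like j on the variance could be read two ways:

```r
library(lme4)
library(equatiomatic)  # provides the sim_longitudinal data

mod <- lmer(score ~ wave + (1 | sid) + (1 | school), data = sim_longitudinal)

# One variance for the sid intercepts, one for the school intercepts,
# and one residual variance -- three rows total, no matter how many
# distinct sids or schools appear in the data.
as.data.frame(VarCorr(mod))
```

A per-sid variance (one sigma for each j) would be a different, richer model than what `lmer` fits here.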
I guess the trouble with the above is it doesn't fully make clear the connection between the main effect and the random effects as yours does. Alright... I think I'm convinced (and yes, this is only a few moments after stating I felt like I needed to think about it more). I will probably try to work on this next week. Thank you for sticking with me on this.
Okay, even more complicated model. What do you think of this? One thing I'm not sure of is all the superscripts denoting the level of variation -- they get pretty burdensome. Notice this model has group-level predictors, and the predictors at the sid level vary at the schid level.
library(tidyverse)
library(lme4)
#> Loading required package: Matrix
#>
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#>
#> expand, pack, unpack
three_lev_link <- "https://stats.idre.ucla.edu/wp-content/uploads/2016/02/eg_hlm.sav"
three_lev <- haven::read_sav(three_lev_link) %>%
haven::as_factor() %>%
mutate(school = as.numeric(as.character(school)),
cid = as.numeric(as.character(cid))) %>%
select(schid = school,
sid = cid,
size:mobility,
female:hispanic, retained,
grade, year,
math)
m <- lmer(math ~ year * female + mobility + lowinc +
(year|sid) + (year * female|schid),
data = three_lev)
#> boundary (singular) fit: see ?isSingular
equatiomatic::extract_eq(m)
Note this is not really a reprex, because I swapped out the equation for one I created by hand.
Created on 2021-09-04 by the reprex package (v0.3.0)
Sorry, one last thing here. Just thinking about this again:
Were you referring specifically to the j in the variance-covariance matrix of the random effects? I agree for sure it should be dropped there. But I'm thinking that if we do that, we could drop all the superscripts for sid, schid, etc. Do you think it would still be ambiguous if we did that?
Thanks. I really prefer having the group-level predictors at the "level" they are at (even though it doesn't make a difference in estimation, it can really help clarity when you have a bunch of predictors at a bunch of different levels). For example, see Gelman and Hill, Equation 12.15, p. 266, or Equation 13.2 on p. 280. The point about the j makes total sense.
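To make that reference concrete -- paraphrasing Gelman and Hill's Equation 12.15 from memory, so treat this as a sketch rather than an exact quotation -- the group-level predictor u_j sits in the group-level regression rather than the observation-level one:

```latex
\begin{aligned}
y_{i} &\sim N\!\left(\alpha_{j[i]} + \beta x_{i},\ \sigma^{2}_{y}\right) \\
\alpha_{j} &\sim N\!\left(\gamma_{0} + \gamma_{1} u_{j},\ \sigma^{2}_{\alpha}\right)
\end{aligned}
```

Substituting the second line into the first recovers the single-equation form, but the two-line version shows at a glance which predictors vary at which level.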
Alright well... starting to feel less certain about this move. I think there are a few things we could clean up with the current notation, but I'm not sure I want to fully move to this proposal. I'm about to release v0.3 and was considering trying to implement some of these changes before then, but I think I'll stick with it as is for now, and we can continue discussing this if you'd like for v0.3.1 or v0.4. To me, this notation is less clear than what I proposed in this comment.
This is not urgent for me at all -- no rush. I would use this purely for teaching (and I think it's great for that!). To be honest, though, with your previous example I have trouble seeing the relationship between the equations and the model... The year covariate and the interaction are missing, and why do the parameters for sid involve g, which is coming from the schid level? If we think generatively about this model, then you would first have to draw from the last multivariate normal to generate the g's, and then make another draw from the sid distribution, for which the means depend on g. I don't think that's the model that is estimated by lmer.
Yeah, I see what you're saying. The current version looks like this: Which I think is mostly clear. The interaction is shown by this part, because female, at the sid level, is predicting the beta_1 slope. I definitely think the variance-covariance part should be cleaned up by removing the subscripts. But the rest of it, from my perspective, is pretty clear. I agree that the additive nature is not explicit, and I like the proposal to get that included, at least as an option. But I'm going to take some time to think on it a bit more before I actually devote any time to it.
If I substitute the higher-level equations back into the first equation, this is what comes out: Maybe it's just that I'm not used to this notation, but I find it hardly clear -- especially in terms of relating the quantities given in the notation to the lmer output. I've updated the other example like this to more clearly distinguish the levels: These quantities relate directly to the lme4 output:
> fixef(m)
(Intercept) year female mobility lowinc year:female
0.148895814 0.757746730 -0.028796146 -0.010256601 -0.007573428 0.012771590
> head(ranef(m)$sid)
(Intercept) year
101480302 0.01228463 -0.01490673
173559292 0.46964220 0.03304830
174743401 -0.09559708 -0.04008357
174755092 0.18573450 -0.02796675
179986251 -1.15298238 -0.13414677
179989511 -1.10176634 -0.14879591
> head(ranef(m)$schid)
(Intercept) year female year:female
2020 0.223138136 0.18956016 -0.17981166 0.007486238
2040 0.097887246 0.11542878 -0.09179669 0.004223772
2180 -0.003541226 -0.11315393 0.03201687 0.001413563
2330 0.144209600 -0.06703731 0.05946655 -0.032387448
2340 0.234471831 -0.03703165 -0.07267607 -0.005913269
2380 -0.043747381 0.11065318 -0.06507523 0.015913536
Yes, now that I'm looking at that one closer I see it does have some issues. Let me look at a different model (and I'll file an issue for that specific model if I end up sticking with (mostly) the current notation to fix it), and I'll try to work up a simulation to illustrate the notation I'm going for.
Okay, I think the primary issue with the notation was that I was trying not to use different Greek letters at each level. But that just really doesn't end up working. If we do use different Greek notation at each level, however, I think it all works. This is kind of a mashup between the Gelman and Hill notation and the Raudenbush and Bryk notation, but I think it works. The model I'm working with is this one:
library(equatiomatic)
library(lme4)
#> Loading required package: Matrix
library(tidyverse)
# calculate district means
dist_mean <- tapply(
sim_longitudinal$score,
sim_longitudinal$district,
mean
)
# put them in a df to merge
dist_mean <- data.frame(
district = names(dist_mean),
dist_mean = dist_mean
)
# create a new df with dist_mean added
d <- merge(sim_longitudinal, dist_mean, by = "district")
d <- d %>%
mutate(prop_low = (prop_low - mean(prop_low)) * 100,
dist_mean = dist_mean - mean(dist_mean))
m <- lmer(score ~ wave + group + treatment + prop_low * dist_mean +
(wave | sid) + (wave | school) + (wave + prop_low | district),
data = d
)
#> boundary (singular) fit: see ?isSingular
Created on 2021-09-08 by the reprex package (v0.3.0)
It's highly overfit, but it illustrates a lot of the complexities I'd like it to: there are group-level predictors, cross-level interactions, and terms varying across various levels (including the group-level predictors). I propose modifying the existing output. Here's a simulation that illustrates this equation formulation. It does take a fair amount of time to run, however (and assumes you've already run the above).
########### Set params ###########
theta0_delta0.gamma0.alpha <- 102 # overall intercept
theta0_delta0.gamma0.beta1 <- 0.2 # fixed effect: wave
gamma1_alpha <- -5.3 # fixed effect: grouplow
gamma2_alpha <- -4.08 # fixed effect: group medium
gamma3_alpha <- -2.12 # fixed effect: treatment1
theta0_delta1.gamma0.alpha <- 0.18 # fixed effect: prop_low
theta1_delta0.gamma0.alpha <- 1.36 # fixed effect: dist_mean
theta1_delta1.gamma0.alpha <- -0.03 # fixed effect: prop_low:dist_mean
# Variance-Covariance matrices
sid_vcv <- VarCorr(m)$sid
sch_vcv <- VarCorr(m)$school
dist_vcv <- matrix(
c(0.15, -0.5, 0.01,
-0.5, 2.09, -0.01,
0.01, 0.01, 0.01),
byrow = TRUE,
ncol = 3,
dimnames = list(c("intercept", "prop_low", "wave"),
c("intercept", "prop_low", "wave"))
)
# Residual variance
m_sigma <- 0.5
# N's
n_districts <- 100
n_schools_per_dist <- 20
stu_per_school <- 50
n_waves <- 8
# create some data; note `d_` is used to denote "data", not parameters
sim_data <- tibble(
district = seq_len(n_districts),
d_dist_mean = rnorm(n_districts, mean(d$dist_mean), sd(d$dist_mean)),
) %>%
mutate(sch_data = map(
seq_len(nrow(.)), function(x) {
tibble(schid = seq_len(n_schools_per_dist),
d_prop_low = rnorm(n_schools_per_dist, mean(d$prop_low), sd(d$prop_low)))
}
)) %>%
unnest(sch_data) %>%
mutate(schid = paste0(district, ".", schid)) %>%
group_by(schid) %>%
nest() %>%
ungroup() %>%
mutate(stu_data = map(
seq_len(nrow(.)), function(x) {
tibble(sid = seq_len(stu_per_school),
d_group_low = c(rep(1, floor(stu_per_school/3)),
rep(0, stu_per_school - floor(stu_per_school/3))),
d_group_medium = c(rep(0, stu_per_school - floor(stu_per_school/3)),
rep(1, floor(stu_per_school/3))),
d_treatment = c(rep(0, stu_per_school/2), rep(1, stu_per_school/2)))
}
)) %>%
unnest(data) %>%
unnest(stu_data) %>%
mutate(sid = paste0(schid, ".", sid)) %>%
nest_by(sid) %>%
mutate(d_wave = list(seq_len(n_waves) - 1)) %>%
ungroup() %>%
unnest(data) %>%
unnest(d_wave)
# simulation function
simulate_obs <- function(means, vcv) {
sim <- MASS::mvrnorm(1, means, vcv)
nms <- names(sim)
matrix(sim, nrow = 1) %>%
as_tibble() %>%
setNames(nms)
}
########### Conduct simulation ###########
# district level
dist_pars <- sim_data %>%
distinct(district, .keep_all = TRUE) %>%
rowwise() %>%
mutate(
intercept = theta0_delta0.gamma0.alpha +
(theta1_delta0.gamma0.alpha * d_dist_mean),
plow = theta0_delta1.gamma0.alpha,
slope = theta0_delta0.gamma0.beta1,
dist = list(
simulate_obs(c(intercept, plow, slope), dist_vcv)
)) %>%
select(district, dist) %>%
unnest(dist) %>%
rename(dist_intercept = intercept,
dist_prop_low = prop_low,
dist_wave = wave)
sim_data <- sim_data %>%
left_join(dist_pars)
# school level, simulated from each corresponding district mean
sch_pars <- sim_data %>%
distinct(schid, .keep_all = TRUE) %>%
rowwise() %>%
mutate(
intercept = dist_intercept +
(dist_prop_low * d_prop_low) +
(theta1_delta1.gamma0.alpha * (d_prop_low * d_dist_mean)),
slope = dist_wave,
sch = list(
simulate_obs(c(intercept, slope), sch_vcv)
)) %>%
select(schid, sch) %>%
unnest(sch) %>%
rename(sch_intercept = `(Intercept)`,
sch_wave = wave)
sim_data <- sim_data %>%
left_join(sch_pars)
# student level, simulated from each corresponding school mean
sid_pars <- sim_data %>%
distinct(sid, .keep_all = TRUE) %>%
rowwise() %>%
mutate(intercept = sch_intercept + (gamma1_alpha * d_group_low) +
(gamma2_alpha * d_group_medium) + (gamma3_alpha * d_treatment),
slope = sch_wave) %>%
mutate(stu = list(
simulate_obs(c(intercept, slope), sid_vcv)
)) %>%
select(sid, stu) %>%
unnest(stu) %>%
rename(stu_intercept = `(Intercept)`,
stu_wave = wave)
sim_data <- sim_data %>%
left_join(sid_pars)
# Observation level, simulated for each student
simulated <- sim_data %>%
rowwise() %>%
mutate(mean = stu_intercept + (stu_wave * d_wave) ,
score = rnorm(1, mean, m_sigma)) %>%
select(sid:d_wave, score)
# estimate model
m_sim <- lmer(score ~ d_wave + d_group_low + d_group_medium + d_treatment +
d_prop_low * d_dist_mean +
(d_wave | sid) + (d_wave | schid) + (d_wave + d_prop_low | district),
data = simulated
)
# compare
tibble(
param = names(fixef(m)),
true = c(102, 0.2, -5.3, -4.08, -2.12, 0.18, 1.36, -0.03),
estimated = fixef(m_sim)
)
#> # A tibble: 8 × 3
#> param true estimated
#> <chr> <dbl> <dbl>
#> 1 (Intercept) 102 101.8998
#> 2 wave 0.2 0.1989466
#> 3 grouplow -5.3 -5.231331
#> 4 groupmedium -4.08 -4.028604
#> 5 treatment1 -2.12 -2.161790
#> 6 prop_low 0.18 0.2271691
#> 7 dist_mean 1.36 1.361268
#> 8 prop_low:dist_mean -0.03 -0.1093143
# true student vcv
sid_vcv
#> (Intercept) wave
#> (Intercept) 86.6246703 -0.57107437
#> wave -0.5710744 0.08556414
#> attr(,"stddev")
#> (Intercept) wave
#> 9.3072375 0.2925135
#> attr(,"correlation")
#> (Intercept) wave
#> (Intercept) 1.0000000 -0.2097616
#> wave -0.2097616 1.0000000
# estimated student vcv
VarCorr(m_sim)$sid
#> (Intercept) d_wave
#> (Intercept) 86.2525184 -0.57587790
#> d_wave -0.5758779 0.08577146
#> attr(,"stddev")
#> (Intercept) d_wave
#> 9.2872234 0.2928676
#> attr(,"correlation")
#> (Intercept) d_wave
#> (Intercept) 1.0000000 -0.2117255
#> d_wave -0.2117255 1.0000000
# true school vcv
sch_vcv
#> (Intercept) wave
#> (Intercept) 9.93157307 -1.399767e-02
#> wave -0.01399767 1.972852e-05
#> attr(,"stddev")
#> (Intercept) wave
#> 3.151439841 0.004441679
#> attr(,"correlation")
#> (Intercept) wave
#> (Intercept) 1.0000000 -0.9999991
#> wave -0.9999991 1.0000000
# estimated school vcv
VarCorr(m_sim)$schid
#> (Intercept) d_wave
#> (Intercept) 9.5328845 -1.157120e-02
#> d_wave -0.0115712 1.404535e-05
#> attr(,"stddev")
#> (Intercept) d_wave
#> 3.087536967 0.003747713
#> attr(,"correlation")
#> (Intercept) d_wave
#> (Intercept) 1 -1
#> d_wave -1 1
# true district vcv
dist_vcv
#> intercept prop_low wave
#> intercept 0.15 -0.50 0.01
#> prop_low -0.50 2.09 -0.01
#> wave 0.01 0.01 0.01
cov2cor(dist_vcv)
#> intercept prop_low wave
#> intercept 1.0000000 -0.89299953 0.25819889
#> prop_low -0.8929995 1.00000000 -0.06917145
#> wave 0.2581989 0.06917145 1.00000000
# estimated district vcv (note the columns/rows are in a different order)
VarCorr(m_sim)$district
#> (Intercept) d_wave d_prop_low
#> (Intercept) 0.22203609 0.03213051 -0.35245848
#> d_wave 0.03213051 0.01099242 0.01759074
#> d_prop_low -0.35245848 0.01759074 1.82081966
#> attr(,"stddev")
#> (Intercept) d_wave d_prop_low
#> 0.4712071 0.1048447 1.3493775
#> attr(,"correlation")
#> (Intercept) d_wave d_prop_low
#> (Intercept) 1.0000000 0.6503679 -0.5543228
#> d_wave 0.6503679 1.0000000 0.1243380
#> d_prop_low -0.5543228 0.1243380 1.0000000
I'm still not going to implement this immediately, but I think this is the direction I'd like to move. I also do think it would be good to have an additional option that includes all of the fixed effects on a single line, similar to how you have outlined above. That could be an additional argument to extract_eq().
Sorry for the late reply. Yes, that seems to fix the problems. I do think that the alternative notation helps with relating the quantities of the equation to the model output, so if you consider putting in a switch to allow for that kind of notation, I'd be all for it.
@datalorax: first, thanks so much for making the equatiomatic package. I want to echo @elbersb's OP, since I am also confused about the lack of additive notation for random effects. For example:
library(lme4);
#> Loading required package: Matrix
library(equatiomatic); # extract math latex formula for LMM
library(texPreview); # display math latex formula
### simulate example multilevel data ###########################
n.patients <- 100;
sim.data.one.patient.one.gene <- function(patient.id = 1, n.foci = 2, n.lesion = 2) {
data <- data.frame(
lesion.id = rep(1:n.lesion, each = n.foci),
foci.id = rep(1:n.foci, times = n.lesion)
);
data$abundance <- rnorm(n = nrow(data));
data$patient.id <- patient.id;
return(data);
}
set.seed(123);
sim.data <- lapply(
X = 1:n.patients,
FUN = function(x) {
sim.data.one.patient.one.gene(patient.id = x)
}
);
sim.data <- do.call(rbind, sim.data);
sim.data <- sim.data[,c('patient.id', colnames(sim.data)[colnames(sim.data) != 'patient.id'])];
sim.data$treat <- sample(
x = c('pre', 'post'),
size = nrow(sim.data),
replace = TRUE
);
sim.data$cohort <- sample(
x = c('cohort.1', 'cohort.2'),
size = nrow(sim.data),
replace = TRUE
);
sim.data$treat <- factor(sim.data$treat);
sim.data$cohort <- factor(sim.data$cohort);
sim.data$patient.id <- factor(sim.data$patient.id);
sim.data$lesion.id <- factor(sim.data$lesion.id);
sim.data$foci.id <- factor(sim.data$foci.id);
### fit lmer ###########################
fit <- lmer(
formula = abundance ~ treat + cohort + (1|patient.id/lesion.id),
data = sim.data
);
#> boundary (singular) fit: see help('isSingular')
tex_preview(extract_eq(fit))
Created on 2024-06-12 with reprex v2.0.2
Question: Am I correct that your random intercept notation implies the former rather than the latter? The latter seems more intuitive to me -- any reasons you went with the former? I suppose if I use your formulas for a publication, I can just include a note explaining the above notation.
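The two candidate forms above were rendered as images in the original post. As a general note on the two readings (a base-R sketch with arbitrary `mu` and `sigma` values, not package output): a normal distribution with non-zero mean and a mean plus a mean-zero offset are the same distribution, which is why both notations can describe the same model.

```r
# The "centered" parameterization: alpha_j ~ N(mu, sigma^2)
# and the "offset" parameterization: alpha_j = mu + u_j, u_j ~ N(0, sigma^2)
# are distributionally identical.
set.seed(1)
mu <- 97
sigma <- 3
a1 <- rnorm(1e5, mean = mu, sd = sigma)       # centered draws
a2 <- mu + rnorm(1e5, mean = 0, sd = sigma)   # offset draws
c(mean(a1), mean(a2))   # both approximately mu
c(sd(a1), sd(a2))       # both approximately sigma
```

The choice between them is purely notational; `lme4` itself reports the offsets (`ranef()`) separately from the means (`fixef()`).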
I think I would use equatiomatic to extract the code but then manually modify it so it matches what is shown above. I unfortunately do not have time to make fixes to this now, but it does have a new maintainer (@phgrosjean), and he may be able to implement a fix.
Got it, will do. I'm having a pretty hard time following the notation in your post above. In any case, I can just manually edit the LaTeX if I find a notation that makes more sense to me.
I took this example from the documentation:
I think the notation of the "alpha" term is somewhat confusing, and also (I think) non-standard, as it is not clear how alpha_j, alpha_k, and alpha_l are combined to produce the mean of score_i. Gelman & Hill (p. 289) use the following additive notation, which feels much more natural to me:
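From memory -- so treat this as a sketch rather than an exact quotation -- the additive notation in that part of Gelman and Hill reads roughly like the following, with each grouping factor contributing a mean-zero offset named by its own Greek letter:

```latex
\begin{aligned}
\text{score}_{i} &\sim N\!\left(\mu + \gamma_{j[i]} + \delta_{k[i]} + \zeta_{l[i]},\ \sigma^{2}_{y}\right) \\
\gamma_{j} &\sim N\!\left(0, \sigma^{2}_{\gamma}\right), \quad
\delta_{k} \sim N\!\left(0, \sigma^{2}_{\delta}\right), \quad
\zeta_{l} \sim N\!\left(0, \sigma^{2}_{\zeta}\right)
\end{aligned}
```

Here the additivity is explicit: \mu is the overall intercept, and the three offsets are simply summed in the mean of score_i.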