Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dropped & misnamed coefficients in multivariate model with subsets & missing observations #1350

Closed
wpetry opened this issue May 11, 2022 · 3 comments
Labels
Milestone

Comments

@wpetry
Copy link

wpetry commented May 11, 2022

Problem

I think brm is dropping coefficients (with a cryptic warning), then misnaming the remaining coefficients in some multivariate models. This behavior shows up in models with categorical predictors (or interactions) when a category doesn't have any observations in the subset used for a given response variable. This is a lot like the model in #360.

I expected to see all of the coefficients in the model summary (the same as those returned by get_prior). When there are no data for a particular category, I expected samples from the prior to be returned. Instead, I think the coefficients for categories that have no observations in the subset are being dropped. The warning traces to rename_pars:
In x$fit@sim$fnames_oi[change$pos] <- change$fnames : number of items to replace is not a multiple of replacement length

More concerning, I think the remaining coefficients are being misnamed. This is easiest to see with an example.

Reproducible example

Two Gaussian response variables (resp1 and resp2), one continuous predictor (pred), one categorical with five levels (trt). The model fits an intercept and slope for each level of trt. Each resp* uses it's own (overlapping) subset of observations. Some fraction of the data are missing. This is the simplest model I can get to reproduce the problem.

library(brms)
set.seed(7429)
dat <- data.frame(s1 = rep(c(TRUE, FALSE), each = 20),
                  s2 = rep(c(TRUE, FALSE), times = 20),
                  trt = sample(letters[1:5], size = 40, replace = TRUE),
                  resp1 = rnorm(40),
                  resp2 = rnorm(40),
                  pred = rnorm(40))
dat$resp1[sample(1:nrow(dat), size = 10)] <- NA_real_
dat$resp2[sample(1:nrow(dat), size = 10)] <- NA_real_

pr <- prior(normal(0, 1), class = "Intercept", resp = "resp1")+
  prior(normal(0, 1), class = "b", resp = "resp1")+
  prior(student_t(3, 0, 1), class = "sigma", resp = "resp1")+
  prior(normal(0, 1), class = "Intercept", resp = "resp2")+
  prior(normal(0, 1), class = "b", resp = "resp2")+
  prior(student_t(3, 0, 1), class = "sigma", resp = "resp2")

# fit multivariate regression with treatment intercepts & slopes
mod <- brm(bf(resp1 | subset(s1) ~ pred * trt)+
              bf(resp2 | subset(s2) ~ pred * trt)+
              set_rescor(FALSE),
            prior = pr,
            data = dat)
fixef(mod)
> fixef(mod)
                    Estimate Est.Error       Q2.5     Q97.5
resp1_Intercept  0.124314029 0.4664846 -0.7981032 1.0396267
resp2_Intercept -0.058489417 0.3845320 -0.8051694 0.6999184
resp1_pred       0.118354734 0.4610578 -0.8209008 0.9688173
resp1_trtb       0.222167999 0.7851979 -1.2886598 1.7857427
resp1_trtc       0.410709053 0.6507444 -0.9056891 1.6619746
resp1_trtd      -0.042645774 0.6260059 -1.2912430 1.1641244
resp1_trte      -0.191413604 0.6205323 -1.3918378 1.0600148
resp1_pred:trtb  0.135878128 0.9086114 -1.6306915 1.8869672
resp1_pred:trtc -0.683845123 0.7098990 -1.9776018 0.8088393
resp1_pred:trtd -0.837813299 0.7326894 -2.2254481 0.6943058
resp1_pred:trte -0.327580081 0.6274094 -1.4918064 0.9576667
resp2_pred      -0.251073355 0.4067055 -1.0393228 0.5849635
resp2_trtb      -0.036095931 0.8524550 -1.6938496 1.6508865
resp2_trtc       0.745312615 0.5520089 -0.3678196 1.8279240
resp2_trtd      -0.087670996 0.5794801 -1.2142046 1.0195655
resp2_trte      -0.005124052 0.7906322 -1.5315402 1.5187297
resp2_pred:trtb  0.171879526 0.7588272 -1.3411928 1.6327431
resp2_pred:trtc  0.565049279 0.6418409 -0.7762201 1.7874330

Notice that b_resp2_pred:trtd and b_resp2_pred:trte are missing from the coefficients. But the it's treatment trtc and trte that have missing observations. I think the parameter renaming step is filling the treatment coefficients alphabetically, but running out.

> xtabs(~is.na(resp2) + trt, data = dat)
            trt
is.na(resp2) a b c d e
       FALSE 4 8 5 6 7
       TRUE  3 2 0 5 0

Additional clues

The expected behavior (= all coefficients in the summary) shows up when:

  1. Reducing to a univariate model with the same data, formula, etc.
mod2 <- brm(resp2 | subset(s2) ~ pred * trt,
            prior = prior(normal(0, 1), class = "Intercept")+
              prior(normal(0, 1), class = "b")+
              prior(student_t(3, 0, 1), class = "sigma"),
            data = dat)
  1. All categories have >=1 observation in the subset used for the affected response. To see this, re-simulate the data after set.seed(7430) and re-fit the multivariate model.

Your work on this package has made so much more possible in my research. Thank you!

@wpetry
Copy link
Author

wpetry commented May 13, 2022

Possibly relates to #1346

@paul-buerkner
Copy link
Owner

Thanks! Will check it out.

@paul-buerkner paul-buerkner added this to the brms 2.17.0++ milestone May 16, 2022
paul-buerkner added a commit that referenced this issue May 25, 2022
@paul-buerkner
Copy link
Owner

paul-buerkner commented May 25, 2022

Should now be fixed on github.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants