dropped & misnamed coefficients in multivariate model with subsets & missing observations #1350

wpetry · 2022-05-11T22:23:46Z

Problem

I think brm is dropping coefficients (with a cryptic warning), then misnaming the remaining coefficients in some multivariate models. This behavior shows up in models with categorical predictors (or interactions) when a category doesn't have any observations in the subset used for a given response variable. This is a lot like the model in #360.

I expected to see all of the coefficients in the model summary (the same as those returned by get_prior). When there are no data for a particular category, I expected samples from the prior to be returned. Instead, I think the coefficients for categories that have no observations in the subset are being dropped. The warning traces to rename_pars:
In x$fit@sim$fnames_oi[change$pos] <- change$fnames : number of items to replace is not a multiple of replacement length

More concerning, I think the remaining coefficients are being misnamed. This is easiest to see with an example.

Reproducible example

Two Gaussian response variables (resp1 and resp2), one continuous predictor (pred), one categorical with five levels (trt). The model fits an intercept and slope for each level of trt. Each resp* uses it's own (overlapping) subset of observations. Some fraction of the data are missing. This is the simplest model I can get to reproduce the problem.

library(brms)
set.seed(7429)
dat <- data.frame(s1 = rep(c(TRUE, FALSE), each = 20),
                  s2 = rep(c(TRUE, FALSE), times = 20),
                  trt = sample(letters[1:5], size = 40, replace = TRUE),
                  resp1 = rnorm(40),
                  resp2 = rnorm(40),
                  pred = rnorm(40))
dat$resp1[sample(1:nrow(dat), size = 10)] <- NA_real_
dat$resp2[sample(1:nrow(dat), size = 10)] <- NA_real_

pr <- prior(normal(0, 1), class = "Intercept", resp = "resp1")+
  prior(normal(0, 1), class = "b", resp = "resp1")+
  prior(student_t(3, 0, 1), class = "sigma", resp = "resp1")+
  prior(normal(0, 1), class = "Intercept", resp = "resp2")+
  prior(normal(0, 1), class = "b", resp = "resp2")+
  prior(student_t(3, 0, 1), class = "sigma", resp = "resp2")

# fit multivariate regression with treatment intercepts & slopes
mod <- brm(bf(resp1 | subset(s1) ~ pred * trt)+
              bf(resp2 | subset(s2) ~ pred * trt)+
              set_rescor(FALSE),
            prior = pr,
            data = dat)
fixef(mod)

> fixef(mod)
                    Estimate Est.Error       Q2.5     Q97.5
resp1_Intercept  0.124314029 0.4664846 -0.7981032 1.0396267
resp2_Intercept -0.058489417 0.3845320 -0.8051694 0.6999184
resp1_pred       0.118354734 0.4610578 -0.8209008 0.9688173
resp1_trtb       0.222167999 0.7851979 -1.2886598 1.7857427
resp1_trtc       0.410709053 0.6507444 -0.9056891 1.6619746
resp1_trtd      -0.042645774 0.6260059 -1.2912430 1.1641244
resp1_trte      -0.191413604 0.6205323 -1.3918378 1.0600148
resp1_pred:trtb  0.135878128 0.9086114 -1.6306915 1.8869672
resp1_pred:trtc -0.683845123 0.7098990 -1.9776018 0.8088393
resp1_pred:trtd -0.837813299 0.7326894 -2.2254481 0.6943058
resp1_pred:trte -0.327580081 0.6274094 -1.4918064 0.9576667
resp2_pred      -0.251073355 0.4067055 -1.0393228 0.5849635
resp2_trtb      -0.036095931 0.8524550 -1.6938496 1.6508865
resp2_trtc       0.745312615 0.5520089 -0.3678196 1.8279240
resp2_trtd      -0.087670996 0.5794801 -1.2142046 1.0195655
resp2_trte      -0.005124052 0.7906322 -1.5315402 1.5187297
resp2_pred:trtb  0.171879526 0.7588272 -1.3411928 1.6327431
resp2_pred:trtc  0.565049279 0.6418409 -0.7762201 1.7874330

Notice that b_resp2_pred:trtd and b_resp2_pred:trte are missing from the coefficients. But the it's treatment trtc and trte that have missing observations. I think the parameter renaming step is filling the treatment coefficients alphabetically, but running out.

> xtabs(~is.na(resp2) + trt, data = dat)
            trt
is.na(resp2) a b c d e
       FALSE 4 8 5 6 7
       TRUE  3 2 0 5 0

Additional clues

The expected behavior (= all coefficients in the summary) shows up when:

Reducing to a univariate model with the same data, formula, etc.

mod2 <- brm(resp2 | subset(s2) ~ pred * trt,
            prior = prior(normal(0, 1), class = "Intercept")+
              prior(normal(0, 1), class = "b")+
              prior(student_t(3, 0, 1), class = "sigma"),
            data = dat)

All categories have >=1 observation in the subset used for the affected response. To see this, re-simulate the data after set.seed(7430) and re-fit the multivariate model.

Your work on this package has made so much more possible in my research. Thank you!

The text was updated successfully, but these errors were encountered:

wpetry · 2022-05-13T18:10:37Z

Possibly relates to #1346

paul-buerkner · 2022-05-16T05:44:44Z

Thanks! Will check it out.

paul-buerkner · 2022-05-25T12:42:07Z

Should now be fixed on github.

paul-buerkner added the bug label May 16, 2022

paul-buerkner added this to the brms 2.17.0++ milestone May 16, 2022

paul-buerkner added a commit that referenced this issue May 25, 2022

fix issue #1350

65bfcdc

paul-buerkner closed this as completed May 25, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dropped & misnamed coefficients in multivariate model with subsets & missing observations #1350

dropped & misnamed coefficients in multivariate model with subsets & missing observations #1350

wpetry commented May 11, 2022

wpetry commented May 13, 2022

paul-buerkner commented May 16, 2022

paul-buerkner commented May 25, 2022 •

edited

Loading

dropped & misnamed coefficients in multivariate model with subsets & missing observations #1350

dropped & misnamed coefficients in multivariate model with subsets & missing observations #1350

Comments

wpetry commented May 11, 2022

Problem

Reproducible example

Additional clues

wpetry commented May 13, 2022

paul-buerkner commented May 16, 2022

paul-buerkner commented May 25, 2022 • edited Loading

paul-buerkner commented May 25, 2022 •

edited

Loading