plot_predictions with random effects #735
Hey @jt-lab, thanks a lot for raising the issue and sharing the code / dataset! At a quick glance, this is because of how the random effects model is handled by

```python
bmb.interpret.predictions(
    model=model,
    idata=idata,
    covariates=["factor3", "factor2", "factor1"],
)
```

Thus, your first plot above seems to be comparing predictions made for a single individual against empirical means averaged over all individuals. I have a couple more comments about this, but don't have the time this morning. I will follow up here in the next day. Thanks!
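For concreteness, a minimal sketch of how one could inspect what the underlying data grid conditions on (whether the group column appears in the returned summary is an assumption about the API, not confirmed in this thread):

```python
# bmb.interpret.predictions returns a pandas DataFrame summarising the
# predictions on the implied data grid.
summary = bmb.interpret.predictions(
    model=model,
    idata=idata,
    covariates=["factor3", "factor2", "factor1"],
)

# If the group column is part of the output, it reveals which individual(s)
# the predictions condition on.
if "individual" in summary.columns:
    print(summary["individual"].unique())
```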
@GStechschulte, thanks a lot! That makes sense. So perhaps we should predict a single out-of-sample individual in this situation. A quick update: the HDI bar issue described above was unrelated to this... Looking forward to your further comments! Many thanks for the support!
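A minimal sketch of predicting for a single out-of-sample individual, assuming a recent Bambi version where `Model.predict` supports `sample_new_groups` (the label "new_id" is made up for illustration):

```python
# Copy the observed data but assign every row to an unseen individual.
new_data = data.copy()
new_data["individual"] = "new_id"

# sample_new_groups=True draws group-specific effects for the unseen level
# from the population distribution instead of failing on the unknown label.
model.predict(idata, data=new_data, sample_new_groups=True)
```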
Of course, anytime! I appreciate getting the feedback. Regarding the HDI bar issue, I am looking into it. Thanks for also pointing this out.
However, {marginaleffects} already averages unit-level predictions over individuals for an equivalent brms model:

```r
library(brms)
library(marginaleffects)

dat <- read.csv("simulated_data_order_prob2.csv")
dat$factor3 <- as.factor(dat$factor3)
dat$factor2 <- as.factor(dat$factor2)
dat$factor1 <- as.factor(dat$factor1)

formula <- bf(correct | trials(count) ~ 0 + factor3:factor2 + factor1 + (0 + factor3:factor2 | individual))
model <- brm(formula, data = dat, family = binomial)

# unit-level predictions averaged over individuals
plot_predictions(
  model,
  by = c("factor3", "factor2", "factor1")
)
```

The plot shows the marginal estimates, since it averages over all individuals. @tomicapretto, what do you think? All the pieces are there in `bmb.interpret` to support this.
Since the majority of the functions are already there, here's a working demo in Bambi now:

```python
model = bmb.Model(
    "p(correct, count) ~ 0 + factor3:factor2 + factor1 + (0 + factor3:factor2 | individual)",
    data,
    family="binomial",
    categorical=["factor3", "individual", "factor1", "factor2"],
    priors=priors,
    noncentered=False,
)
idata = model.fit(
    tune=2000, draws=2000, random_seed=123, init="adapt_diag",
    target_accept=0.9, idata_kwargs={"log_likelihood": True},
)

bmb.interpret.plot_predictions(
    model=model,
    idata=idata,
    average_by=["factor3", "factor2", "factor1"],
    fig_kwargs={"figsize": (12, 4), "sharey": True},
)
```

Side note: I know the {marginaleffects} plot and the Bambi plot aren't the same. I would take the {marginaleffects} plot above with caution, as I ran it quickly and there were divergences, etc. It was meant as an example implementation.
Thank you so much, @GStechschulte! I just came here to thank you for your previous post with the explanations and examples, but this is of course even better! By the way, what do you think of these ideas:

If you like, I could try implementing these! Many thanks again.
@jt-lab thank you! 😄

Right. Not passing any variables into `conditional` then computes unit-level predictions, which `average_by` can aggregate.

I have come to realise that unless the user really studies the docs, it is difficult to understand everything that is being created and computed. Thus, I do like the idea of (optionally) being more transparent. @tomicapretto, do you have any thoughts?

I had not thought of this until I saw you do it. At the moment, I would like to limit the amount of plotting code we introduce (Matplotlib is not the most fun to develop with, and it is difficult to write tests for the contents of plots). Unless more users ask for this, I think I personally won't pursue it. Nonetheless, I liked your solution with seaborn 😄 Thanks for the ideas! 👍🏼
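For illustration, a sketch of the kind of seaborn overlay referred to here (an assumption, not the actual code from the thread; column names follow the simulated dataset):

```python
import seaborn as sns

# Observed proportion correct per design cell, averaged over individuals.
data["prop"] = data["correct"] / data["count"]
obs = data.groupby(["factor1", "factor2", "factor3"], as_index=False)["prop"].mean()

# One panel per factor1 level; observed cell means as points, to be compared
# against the averaged model predictions.
sns.catplot(
    data=obs, x="factor3", y="prop", hue="factor2", col="factor1", kind="point"
)
```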
We just wanted to try the `average_by` solution, but there is no argument `average_by` in `plot_predictions`. I also don't see it in the docs or the code on GitHub; even in your fork it's not there. So maybe I misunderstood that this was an already existing workaround? Or is there some secret branch it is on? :-D

I see, makes sense.

Yes, that works okay-ish. We had some order-related trouble again, as seaborn derives the category order from the order of appearance in the data (if not specified otherwise), while `plot_predictions` uses a different order, so one has to watch out for that.
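A minimal sketch of pinning the order explicitly so both plots agree (the level names "A", "B", "C" are taken from the levels mentioned in this thread and are otherwise an assumption):

```python
import seaborn as sns

# Fix the categorical order instead of relying on the order of appearance
# in the data; `obs` is the per-cell summary from the sketch above.
sns.pointplot(
    data=obs, x="factor1", y="prop",
    order=["A", "B", "C"],
    hue="factor2",
)
```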
Just seeing the issue. Thanks for proactively writing the code and opening the PR :)

I like the idea too! I only want to make two points.

I think one possible approach is to have a configuration instance that comes with a default option, so users can do something like the sketch below.
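A hypothetical sketch of that configuration instance; the class and option names are illustrative, not existing Bambi API:

```python
from dataclasses import dataclass

# Illustrative only: a config object with defaults that users can override.
@dataclass
class InterpretConfig:
    verbose: bool = False   # when True, print the implied data grid, etc.
    hdi_prob: float = 0.94  # default credible-interval width

config = InterpretConfig()
config.verbose = True  # a user opting in to the extra transparency
```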
@jt-lab I agree with @GStechschulte's response here. It's already a lot to maintain the existing functionality, and anything related to observed data should be deferred to the user. @GStechschulte, let's continue any related discussion in the PR where you implement the changes :)
@GStechschulte, we have played around further with `plot_predictions` and came across some behavior we don't understand. It might be an issue with handling random effects, hence I describe it here:

For a model with random effects (`p(correct, count) ~ 0 + factor3:factor2 + factor1 + (0 + factor3:factor2 | individual)`), the predictions seem to be off compared to the data points (e.g. see Factor1=A, Factor2=orange, Factor3=1, but also others).

Also, for level C of factor1, the HDI bars get a bit smaller. It's barely visible here, but in another dataset (which I cannot share) they are about three times smaller than those of levels A and B, apparently without any reason related to the data.

The same model but without the random effects produces predictions that are pretty close to the empirical means.
Many thanks in advance!
Code to reproduce this and the dataset:

Dataset: simulated_data_order_prob2.csv
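A minimal sketch of the reproduction steps, based on the model and fit settings quoted earlier in the thread (priors omitted; an approximation, not the original attachment):

```python
import bambi as bmb
import pandas as pd

data = pd.read_csv("simulated_data_order_prob2.csv")

# Random-effects model as described above.
model = bmb.Model(
    "p(correct, count) ~ 0 + factor3:factor2 + factor1 + (0 + factor3:factor2 | individual)",
    data,
    family="binomial",
    categorical=["factor3", "individual", "factor1", "factor2"],
)
idata = model.fit(tune=2000, draws=2000, random_seed=123, target_accept=0.9)

bmb.interpret.plot_predictions(
    model=model,
    idata=idata,
    covariates=["factor3", "factor2", "factor1"],
)
```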