Colors not showing correctly when specifying "show_data = T" in the plot method of ggpredict object #404

Open
lukasla opened this issue Nov 20, 2023 · 1 comment
Labels
bug 🐛 Something isn't working

lukasla commented Nov 20, 2023

Hi,

I encountered a strange bug when plotting data with 4 levels (it probably also happens with more). The second and/or third color in the vector of specified colors is replaced by some other, arbitrary color when "show_data = T"; with "show_data = F" everything is as expected.

Here is an example (which statistically doesn't make sense, just to show what I mean):

library(ggeffects)
library(splines)
data(efc)

fit <- lm(barthtot ~ c12hour * c161sex + e42dep, data = efc)

pred <- ggpredict(fit, terms = c("c161sex", "c12hour [4,35,77,168]"))

# With raw data shown, the second and third colors do not match the input colors
plot(pred, show_data = TRUE, color = c("purple", "green", "blue", "red"))

# Without raw data, the colors are as expected
plot(pred, show_data = FALSE, color = c("purple", "green", "blue", "red"))

Not a major issue, just something I spent some time on before realizing it's actually a bug and not something I did ;-)

Thanks for providing such a great tool!
Lukas

@strengejacke strengejacke added the bug 🐛 Something isn't working label Nov 20, 2023
strengejacke (Owner) commented

This one is tricky, indeed. If the 2nd variable in terms is continuous, the data may contain many more values than the "grouped" predictions show. In your example, you see predicted values for the values 4, 35, 77 and 168 of c12hour. The raw data for c12hour, however, contains many more distinct values, so the dots receive a gradient color that depends on how "close" each dot (i.e. each data value) is to the requested values (4, 35, 77 and 168). Your provided color scale is therefore passed to ggplot2::scale_color_gradient(), and the resulting colors look different from their original color codes. If you don't show data points, there's no need for a gradient color scale, and the colors match exactly. The same holds when the 2nd term is categorical: since all categories are present in the data, the colors match exactly.

library(ggeffects)
data(efc)
efc <- datawizard::to_factor(efc, c("e42dep", "c161sex"))
fit <- lm(barthtot ~ c161sex * e42dep, data = efc)
pred <- ggpredict(fit, terms = c("c161sex", "e42dep"))

plot(pred, color = c("purple", "green", "blue", "red"))

plot(pred, show_data = TRUE, jitter = TRUE, color = c("purple", "green", "blue", "red"))

Created on 2023-11-21 with reprex v2.0.2
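
To get a feel for why a gradient can shift the middle colors while the first and last stay put, here is a rough base-R sketch. It is only an analogy: it uses grDevices::colorRamp() as a stand-in for a continuous color scale and is not the code that ggeffects or ggplot2 actually run. The helpers to_unit() and color_at() are hypothetical names made up for this illustration, and the assumption that the supplied colors are spread evenly over the range 4 to 168 is mine, not something confirmed in this thread.

```r
# Colors and requested c12hour values taken from the example in this issue.
cols <- c("purple", "green", "blue", "red")
requested <- c(4, 35, 77, 168)

# Stand-in for a continuous color scale (assumption, not ggeffects internals):
# colorRamp() spaces the supplied colors evenly over [0, 1].
ramp <- grDevices::colorRamp(cols)

# Hypothetical helper: rescale a data value to [0, 1] over the requested range.
to_unit <- function(x) {
  (x - min(requested)) / (max(requested) - min(requested))
}

# Hypothetical helper: hex color that the stand-in gradient assigns to a value.
color_at <- function(x) {
  rgb_vals <- ramp(to_unit(x))
  grDevices::rgb(rgb_vals[, 1], rgb_vals[, 2], rgb_vals[, 3],
                 maxColorValue = 255)
}

# The endpoints land exactly on the first and last supplied colors ...
print(color_at(c(4, 168)))
# ... but 35 and 77 are not at 1/3 and 2/3 of the range, so they fall
# between the supplied colors instead of on "green" and "blue".
print(color_at(c(35, 77)))

# Optional: count the distinct raw values of c12hour (needs ggeffects).
if (requireNamespace("ggeffects", quietly = TRUE)) {
  data(efc, package = "ggeffects")
  print(length(unique(stats::na.omit(efc$c12hour))))
}
```

If the real scale behaves anything like this, it would fit the report above that only the second and/or third color look off when the raw data are shown.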
