-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop non-constant aesthetics more thoroughly #4917
Drop non-constant aesthetics more thoroughly #4917
Conversation
@thomasp85 A few more days without proper internet for me. Should be able to look this over carefully early next week. |
@clauswilke no problem at all, was just going through open PRs and this seemed to belong in your review bucket given the issue discussion — no stress |
…-partially-dropped-aes
@clauswilke will you have time for this one within the next couple of weeks? Sorry to keep pinging you — if you are floored by other work just let me know |
@thomasp85 I'll look it over today. |
Thanks. Honestly, I'm not sure if this is the best way to record the dropped columns. Any suggestion is welcome. |
@yutannihilation Yeah, that's why I've been stalling. I'm not entirely sure what to think. The process of making an environment, adding things to it, and then listing all the contents seems very awkward. Also difficult to reason about. Have you considered working with a simple vector of names and using set functions ( |
I think the idea of using an env as a hashmap is not so uncommon. I came up with several ideas including yours, and I just couldn't figure out which one is most efficient (and didn't have time to benchmark them). So, do you mean something like this? dropped_columns <- list()
stats <- mapply(function(new, old) {
...
dropped_columns <- append(dropped_columns, names(old)[!unique_idx])
...
}, stats, groups, SIMPLIFY = FALSE)
...
dropped <- Reduce(dropped_columns, union) |
Would this work?
Either way, much simpler to read than the code using environments. For code that is not performance critical, I'd always go with the code that is easier to read than the one that (potentially) is a little faster. |
Or alternatively:
|
Thanks, makes sense. I'll use the first one. Another one I attempted was returning a list rather than a data.frame (and I forgot to remove stats <- mapply(function(new, old) {
...
dropped <- union(dropped, names(old)[!unique_idx])
result <- vec_cbind(
new,
old[rep(1, nrow(new)), unique_idx, drop = FALSE]
)
list(
result = result,
dropped = dropped
)
}, stats, groups, SIMPLIFY = FALSE)
data_new <- vec_rbind(!!!lapply(stats, function(x) x[["result"]])
dropped <- vec_unique(vctrs_c(!!!lapply(stats, function(x) x[["dropped"]]))) |
@yutannihilation Thanks! I have two more questions:
|
It's defined here. After the vctrs integration (#4868), most of the code uses Line 355 in c86ed7b
Sure. I'll add tests. |
Added tests.
What do you mean by "what happens" here? All get dropped. Do you expect anything more? Sorry, I'm not sure if I understand the intention of this question. |
@yutannihilation I was just trying to understand how the code works. There's something that I'm still missing. You're creating separate data frames for each group, and then you're row-binding them all together. That's fine, but what happens when the data frames for different groups have different columns. Shouldn't row-binding fail in this case? But now looking it up, it seems like |
R/stat-.r
Outdated
if (length(dropped) > 0) { | ||
cli::cli_warn(c( | ||
"The following aesthetics were dropped during statistical transformation: {.field {glue_collapse(dropped, sep = ', ')}}", | ||
"i" = "This can happen when ggplot fails to infer the correct grouping structure in the data.", | ||
"i" = "Did you forget to specify a {.code group} aesthetic or to convert a numerical variable into a factor?" | ||
)) | ||
} | ||
data_new | ||
data_new[, !names(data_new) %in% dropped, drop = FALSE] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure the logic here is quite right. Shouldn't line 155 come before line 147? We want to drop all columns in dropped
, not just the ones that are not in self$dropped_aes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking this again, it seems you are right. Thanks for catching.
Yes, that's why we see this kind of incomplete fill/colour aesthetic. This pull request doesn't change anything here. The code inside |
I added many comments and rewrote the code a bit. I hope this will make it a bit easier to understand the logic... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks great now!
Thanks so much for reviewing this complicated pull request! I hope I can do better next time...! |
I think you did great. Just because there's room for improvement doesn't mean the first iteration was bad. |
Fix #4394
ggplot2 drops an aesthetic when the values are not constant within the group. Currently, this happens per group. Even when the dropping happens, if any of the groups still have the aesthetics, the result keeps the column because
vec_rbind()
kindly fills the mismatch withNA
.ggplot2/R/stat-.r
Line 133 in 10eba36
This creates incomplete result like below. As discussed on #4394, this should drop the aesthetic instead of just showing a warning (which I tried in #4916).
Created on 2022-07-27 by the reprex package (v2.0.1)
This pull request inlines
uniquecols()
to catch the dropped columns and record it to an empty env so that we can refer to it. I feel this is better rather than inferring from the results what columns are dropped, but I'm open to any better idea.