Inconsistent behavior for violin plot for zero variance vs near-zero variance #27
Comments
This ultimately comes from
It may be better to try and fix it there. I don't see any obvious "nice" solution, but I'll think about it some more.
@deepayan Thanks for the info on density. A couple options: I am happy to implement one or the other or both if you like either suggestion.
library(lattice)
simple.data <- data.frame(
    # Constant values (zero variance) in group a
    # Normal data in group b
    values = c(rep(1, 5), rnorm(50, sd = 2)),
    variable = c(rep("a", 5), rep("b", 50))
)
bwplot(values ~ variable,
       data = simple.data,
       ylim = c(-5, 10),
       main = 'Which has the zero variance?',
       panel = function(...) {
           panel.violin(...)
       })
# Not correct at all
# Error is caught silently and nonsense values returned
bwplot(values ~ variable,
       data = simple.data,
       ylim = c(-5, 10),
       main = 'Nonsense values returned',
       panel = function(...) {
           panel.violin(..., cut = c(0, 1))
       })
Created on 2023-02-15 with reprex v2.0.2
Both of these could be worth adding. Doing only 1) won't change the default, but it will allow the user to supply specific parameters for each density plot. I think 2) would be a good default, as the current behavior is unintuitive when a user draws violin plots for multiple categories and one of them has no variance.
Yes, I mostly agree with your analysis. The blanket try() call is definitely not ideal, but the fact that on failure we just draw a line at x[1] suggests that it was intended to catch the 0 variance case.
This sort of works for your original problem if we add
The problem with
So, definitely some version of your suggestion 1 should be implemented. Probably just change
my.density <- function(x) {
    if (sd(x) > 0)
        do.call(stats::density, c(list(x = x), darg))
    else
        list(x = rep(x[1], 3), y = c(0, 1, 0))
}
I will have to think a bit more about the second suggestion, but please do send a patch if you come up with one easily. I am worried about situations where different panels might have a different set of non-empty levels for the grouping variable. A similar problem will happen with
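The `my.density` guard proposed in this comment can be tried outside of `panel.violin()`. The following is only a sketch: the `darg` list here is a hypothetical stand-in for whatever the panel function builds internally, and wrapping the test in `isTRUE()` (so that a single observation, where `sd()` is `NA`, also takes the fallback) is an added assumption, not part of the suggestion itself.

```r
# Hypothetical stand-in for the `darg` list built inside panel.violin();
# here it only carries arguments forwarded to stats::density().
darg <- list(bw = "nrd0", kernel = "gaussian")

# Guarded density: fall back to a three-point spike at x[1] when the
# group has no spread. isTRUE() also routes a single observation
# (where sd() is NA) to the fallback; that choice is an assumption.
my.density <- function(x) {
    if (isTRUE(sd(x) > 0))
        do.call(stats::density, c(list(x = x), darg))
    else
        list(x = rep(x[1], 3), y = c(0, 1, 0))
}

flat <- my.density(rep(1, 5))          # zero variance: spike at 1
spread <- my.density(c(-1, 0, 2, 3))   # ordinary data: a "density" object
```

With this guard the zero-variance group is decided explicitly rather than by relying on `try()` to swallow an error, so other failures inside `density()` would no longer be hidden.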
@stefaneng I see that you have already submitted a PR for 2. Thanks, will take a look tomorrow.
Fixed by commit 130b7cd
Violin plots behave completely differently when the variance is exactly zero versus when it is near zero. In the zero-variance case, the kernel density extends far beyond the single observed value. In the near-zero case, the violin is confined to an extremely small region and appears as a line. The desired behavior seems to be that a zero-variance group is also drawn as a line.
Created on 2023-01-23 with reprex v2.0.2
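One way to see where the discrepancy described above can come from is to compare default bandwidths directly. This sketch assumes the default selector `bw.nrd0()` is in use (as it is for `stats::density()` when no `bw` is given); the variable names are illustrative only. When both the standard deviation and the IQR are zero, `bw.nrd0()` falls back to `abs(x[1])` as its scale, so a constant group gets a bandwidth of ordinary size, while a nearly constant group gets a tiny one.

```r
# Zero variance: sd and IQR are both 0, so bw.nrd0() falls back to abs(x[1]).
x_zero <- rep(1, 5)

# Near-zero variance: the spread is tiny but nonzero, so no fallback applies.
x_near <- 1 + c(-2, -1, 0, 1, 2) * 1e-8

bw_zero <- bw.nrd0(x_zero)   # 0.9 * 1 * 5^(-0.2), roughly 0.65
bw_near <- bw.nrd0(x_near)   # on the order of 1e-8

print(c(zero = bw_zero, near = bw_near))

# With the default cut = 3, density() evaluates the curve out to
# 3 bandwidths beyond the data range, so the constant group spans
# a wide interval while the near-constant group spans almost nothing.
range_zero <- range(density(x_zero)$x)
```

Because the fallback scale is `abs(x[1])`, the width of the zero-variance violin depends on the magnitude of the constant value itself, which is part of why the plot looks arbitrary.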