It seems like with large, skewed datasets the density estimate for a point can be exactly zero. This doesn't make sense to me, since all the points should represent some data. It also presents a technical issue if I, say, wanted to log-transform the color scale.
library(ggplot2)
library(ggpointdensity)

df <- data.frame(x = c(rep(0, 100000), rnorm(100000)),
                 y = c(rep(0, 100000), rnorm(100000)))

p <- ggplot(df, aes(x = x, y = y)) +
  geom_pointdensity()
p
#> geom_pointdensity using method='kde2d' due to large number of points (>20k)

p + scale_color_continuous(trans = "log10")
#> geom_pointdensity using method='kde2d' due to large number of points (>20k)
#> Warning: Transformation introduced infinite values in discrete y-axis
This may be related to the default bandwidth estimator used by MASS::kde2d(). If I supply my own values of h using a different bandwidth estimator (e.g. bw.nrd0()), I don't have this issue or the issue with bandwidth == 0 (#21). Even the documentation says that bw.nrd() "has remained the default for historical and compatibility reasons, rather than as a general recommendation". Perhaps it would be better for stat_pointdensity() to calculate its own bandwidth rather than relying on the defaults for kde2d().
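For illustration, here is a minimal sketch of how the two estimators differ in the degenerate case. It uses a smaller, made-up dataset (not the reprex data above) in which more than half of the values are identical, so the interquartile range is exactly zero. The variable names are just for this example, and this is not necessarily how stat_pointdensity() computes its bandwidth internally.

```r
library(MASS)

set.seed(1)
# Hypothetical data: 300 identical values plus 100 normal draws,
# so both quartiles fall on 0 and the IQR is exactly zero.
x <- c(rep(0, 300), rnorm(100))
y <- c(rep(0, 300), rnorm(100))

# MASS::bandwidth.nrd() takes min(sd, IQR / 1.34), which collapses to 0 here.
h_mass <- MASS::bandwidth.nrd(x)

# stats::bw.nrd0() falls back to the standard deviation when the IQR is 0.
h_nrd0 <- stats::bw.nrd0(x)

print(c(bandwidth.nrd = h_mass, bw.nrd0 = h_nrd0))

# Passing an explicit, strictly positive h to kde2d() avoids the zero bandwidth.
dens <- MASS::kde2d(x, y, h = c(stats::bw.nrd0(x), stats::bw.nrd0(y)), n = 50)
print(all(is.finite(dens$z)))
```

Note that kde2d() divides the supplied h by 4 internally, so values from bw.nrd0() are not on the same scale as those from bandwidth.nrd(); any replacement default would probably need to account for that.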
Created on 2024-02-08 with reprex v2.0.2