[WIP] Density geometry revamp #1157
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,191 @@ | ||
struct DensityGeometry <: Gadfly.GeometryElement | ||
stat::Gadfly.StatisticElement | ||
order::Int | ||
tag::Symbol | ||
end | ||
|
||
function DensityGeometry(; n=256, | ||
bandwidth=-Inf, | ||
adjust=1.0, | ||
kernel=Normal, | ||
trim=false, | ||
scale=:area, | ||
position=:dodge, | ||
orientation=:horizontal, | ||
order=1, | ||
tag=empty_tag) | ||
stat = Gadfly.Stat.DensityStatistic(n, bandwidth, adjust, kernel, trim, | ||
scale, position, orientation, false) | ||
DensityGeometry(stat, order, tag) | ||
end | ||
|
||
DensityGeometry(stat; order=1, tag=empty_tag) = DensityGeometry(stat, order, tag) | ||
|
||
""" | ||
Geom.density(; bandwidth, adjust, kernel, trim, scale, position, orientation, order) | ||
|
||
Draws a kernel density estimate. This is a cousin of [`Geom.histogram`](@ref) | ||
Review comment: It would be nice to document how Geom.density(Stat.identity) behaved. In other words, what aesthetics does Geom.density directly use?
Reply: Good point, I'll add this. |
||
that is especially useful when the datapoints originate from an underlying smooth | ||
distribution. Unlike histograms, density estimates do not suffer from edge | ||
effects from incorrect bin choices. Some caveats do apply: | ||
|
||
1) Plot components do not necessarily correspond to the raw datapoints, but | ||
instead to the kernel density estimation of the underlying distribution | ||
2) Density estimation improves as a function of the number of data points and | ||
can be misleadingly smooth when the number of datapoints is small. | ||
3) Results can be sensitive to the choice of `kernel` and `bandwidth`. | ||
|
||
For the default horizontal orientation, `Geom.density` draws the kernel density | ||
estimate of `x` optionally grouped by `color`. If the `orientation=:vertical` | ||
flag is passed to the function, then densities will be computed along `y`. The | ||
estimates are normalized by default to have areas equal to 1, but this can be | ||
changed by passing `scale=:count` to scale by the raw number of datapoints or | ||
`scale=:peak` to scale by the max height of the estimate. Additionally, multiple | ||
densities can be stacked using the `position=:stack` flag or the conditional | ||
density estimate can be drawn using `position=:fill`. See | ||
[`Stat.DensityStatistic`](@ref Gadfly.Stat.DensityStatistic) for details on | ||
optional parameters that can control the `bandwidth`, `kernel`, etc. used. | ||
|
||
External links | ||
|
||
* [Kernel Density Estimation on Wikipedia](https://en.wikipedia.org/wiki/Kernel_density_estimation) | ||
""" | ||
const density = DensityGeometry | ||
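The `scale` options described in the docstring above can be illustrated with a small standalone sketch. This is not the `Stat.DensityStatistic` code, which this diff does not show; `rescale_density` is a hypothetical helper, and it assumes the density has already been evaluated on an evenly spaced grid.

```julia
# Hypothetical helper, not part of Gadfly: rescale a density curve sampled on
# an evenly spaced grid according to the `scale` options named in the docstring.
function rescale_density(xs::AbstractVector{<:Real}, dens::AbstractVector{<:Real},
                         ndata::Integer, scale::Symbol)
    length(xs) == length(dens) || throw(ArgumentError("xs and dens must have equal length"))
    dx = xs[2] - xs[1]                   # assumes an evenly spaced grid
    area = sum(dens) * dx                # Riemann-sum approximation of the integral
    unit = dens ./ area                  # curve whose area is approximately 1
    if scale == :area
        return unit
    elseif scale == :count
        return unit .* ndata             # area equals the number of data points
    elseif scale == :peak
        return unit ./ maximum(unit)     # tallest point has height 1
    else
        throw(ArgumentError("unknown scale: $scale"))
    end
end

xs = collect(range(-5, stop=5, length=256))
dens = exp.(-xs .^ 2 ./ 2)               # unnormalized Gaussian bump as stand-in data
dx = xs[2] - xs[1]
area_scaled  = rescale_density(xs, dens, 100, :area)
count_scaled = rescale_density(xs, dens, 100, :count)
peak_scaled  = rescale_density(xs, dens, 100, :peak)
```

Under these assumptions, `:area` keeps curves from differently sized groups comparable in shape, `:count` makes larger groups visibly larger, and `:peak` forces every curve to the same maximum height.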
|
||
element_aesthetics(::DensityGeometry) = Symbol[] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. element_aesthetics should contain There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I don't leave this blank, they are filled with autogenerated values so it's impossible to give useful error messages using |
||
default_statistic(geom::DensityGeometry) = Gadfly.Stat.DensityStatistic(geom.stat) | ||
|
||
function render(geom::DensityGeometry, theme::Gadfly.Theme, aes::Gadfly.Aesthetics) | ||
Gadfly.assert_aesthetics_defined("Geom.density", aes, :x, :y) | ||
Gadfly.assert_aesthetics_equal_length("Geom.density", aes, :x, :y) | ||
|
||
grouped_data = Gadfly.groupby(aes, [:color], :y) | ||
densities = Array{NTuple{2, Float64}}[] | ||
colors = [] | ||
|
||
for (keys, belongs) in grouped_data | ||
xs = aes.x[belongs] | ||
ys = aes.y[belongs] | ||
|
||
push!(densities, [(x, y) for (x, y) in zip(xs, ys)]) | ||
push!(colors, keys[1] != nothing ? keys[1] : theme.default_color) | ||
end | ||
|
||
ctx = context(order=geom.order) | ||
# TODO: This should be user controllable | ||
if geom.stat.position == :dodge | ||
compose!(ctx, Compose.polygon(densities, geom.tag), stroke(colors), fill(nothing)) | ||
else | ||
compose!(ctx, Compose.polygon(densities, geom.tag), fill(colors)) | ||
end | ||
|
||
compose!(ctx, svgclass("geometry")) | ||
end | ||
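The branch on `geom.stat.position` in `render` only decides whether the polygons are stroked or filled; the stacking itself would have to happen in the statistic, which is not part of this diff. As a rough sketch of what `position=:stack` and `position=:fill` could mean numerically, assuming all groups were evaluated on the same grid (the helper name `stack_densities` and the matrix layout are illustrative assumptions, not Gadfly API):

```julia
# Hypothetical helper, not part of Gadfly: combine per-group density curves
# evaluated on a shared grid. Rows are grid points, columns are groups.
function stack_densities(dens::AbstractMatrix{<:Real}, position::Symbol)
    if position == :dodge
        return float.(dens)                      # curves are drawn independently
    elseif position == :stack
        return cumsum(dens, dims=2)              # each curve sits on top of the previous ones
    elseif position == :fill
        totals = sum(dens, dims=2)               # total density at every grid point
        return cumsum(dens ./ totals, dims=2)    # last column is 1 wherever totals > 0
    else
        throw(ArgumentError("unknown position: $position"))
    end
end

dens = [0.1 0.3;
        0.2 0.2;
        0.4 0.1]
stacked = stack_densities(dens, :stack)
filled  = stack_densities(dens, :fill)
```

With `:fill`, each column boundary gives the cumulative share of the groups at that grid point, which is one way to read the "conditional density" wording in the docstring.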
|
||
struct ViolinGeometry <: Gadfly.GeometryElement | ||
stat::Gadfly.StatisticElement | ||
order::Int | ||
tag::Symbol | ||
end | ||
|
||
function ViolinGeometry(; n=256, | ||
bandwidth=-Inf, | ||
adjust=1.0, | ||
kernel=Normal, | ||
trim=true, | ||
scale=:area, | ||
orientation=:vertical, | ||
order=1, | ||
tag=empty_tag) | ||
stat = Gadfly.Stat.DensityStatistic(n, bandwidth, adjust, kernel, trim, | ||
scale, :dodge, orientation, true) | ||
ViolinGeometry(stat, order, tag) | ||
end | ||
|
||
""" | ||
Geom.violin[(; bandwidth, adjust, kernel, trim, order)] | ||
|
||
Draws a violin plot, which is a combination of [`Geom.density`](@ref) and | ||
[`Geom.boxplot`](@ref). This plot type is useful for comparing differences in | ||
Review comment: I don't see a call to Geom.boxplot in the Geom.violin code. Is this docstring correct in this regard?
Reply: I meant it stylistically, not technically, but that could change (see #1157 (comment)) |
||
the distribution of quantitative data between categories, especially when the | ||
data is non-normally distributed. See [`Geom.density`](@ref) for some caveats. | ||
|
||
In the case of standard vertical violins, `Geom.violin` draws the density | ||
estimate of `y` optionally grouped categorically by `x` and colored | ||
with `color`. See [`Stat.DensityStatistic`](@ref Gadfly.Stat.DensityStatistic) | ||
for details on optional parameters that can control the `bandwidth`, `kernel`, | ||
etc. used. | ||
|
||
```@example | ||
using RDatasets, Gadfly | ||
|
||
df = dataset("ggplot2", "diamonds") | ||
|
||
p = plot(df, x=:Cut, y=:Carat, color=:Cut, Geom.violin()) | ||
draw(SVG("diamonds_violin1.svg", 10cm, 8cm), p) # hide | ||
nothing # hide | ||
``` | ||
![](diamonds_violin1.svg) | ||
""" | ||
Review comment: No other Geom docstrings currently have examples. I'd suggest moving this to the gallery.
Reply: I think this would be nice to change, but it depends on JuliaDocs/Documenter.jl#736 anyway so I'll remove this for now. |
||
const violin = ViolinGeometry | ||
|
||
element_aesthetics(::ViolinGeometry) = [] | ||
|
||
default_statistic(geom::ViolinGeometry) = Gadfly.Stat.DensityStatistic(geom.stat) | ||
|
||
function render(geom::ViolinGeometry, theme::Gadfly.Theme, aes::Gadfly.Aesthetics) | ||
|
||
Gadfly.assert_aesthetics_defined("Geom.violin", aes, :y, :width) | ||
Gadfly.assert_aesthetics_equal_length("Geom.violin", aes, :y, :width) | ||
|
||
output_dims, groupon = Gadfly.Stat._find_output_dims(geom.stat) | ||
grouped_data = Gadfly.groupby(aes, groupon, output_dims[2]) | ||
violins = Array{NTuple{2, Float64}}[] | ||
|
||
(aes.color == nothing) && (aes.color = fill(theme.default_color, length(aes.x))) | ||
colors = eltype(aes.color)[] | ||
color_opts = unique(aes.color) | ||
split = false | ||
# TODO: Add support for dodging violins (i.e. having more than two colors | ||
# per major category). Also, splitting should not happen automatically, but | ||
# as an optional keyword to Geom.violin | ||
if length(keys(grouped_data)) > 2*length(unique(getfield(aes, output_dims[1]))) | ||
error("Violin plots do not currently support having more than 2 colors per $(output_dims[1]) category") | ||
elseif length(color_opts) == 2 | ||
split = true | ||
end | ||
|
||
for (keys, belongs) in grouped_data | ||
x, color = keys | ||
ys = getfield(aes, output_dims[2])[belongs] | ||
ws = aes.width[belongs] | ||
|
||
if split | ||
pos = findfirst(color_opts, color) | ||
if pos == 1 | ||
push!(violins, [(x - w/2, y) for (y, w) in zip(ys, ws)]) | ||
else | ||
push!(violins, reverse!([(x + w/2, y) for (y, w) in zip(ys, ws)])) | ||
end | ||
push!(colors, color) | ||
else | ||
push!(violins, vcat([(x - w/2, y) for (y, w) in zip(ys, ws)], | ||
reverse!([(x + w/2, y) for (y, w) in zip(ys, ws)]))) | ||
push!(colors, color != nothing ? color : theme.default_color) | ||
end | ||
end | ||
|
||
if geom.stat.orientation == :horizontal | ||
for violin in violins | ||
for i in 1:length(violin) | ||
violin[i] = reverse(violin[i]) | ||
end | ||
end | ||
end | ||
|
||
ctx = context(order=geom.order) | ||
compose!(ctx, Compose.polygon(violins, geom.tag), fill(colors)) | ||
|
||
compose!(ctx, svgclass("geometry")) | ||
|
||
end |
Review comment: This long line will likely create a horizontal slider in the generated doc html. A hard line break would be nice.
Reply: Good catch, I had already corrected this locally.
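The outline construction in `render(geom::ViolinGeometry, ...)` can be tried in isolation. The sketch below mirrors the comprehensions from the diff for the unsplit case: one pass up the left edge, then the mirrored right edge in reverse so the polygon closes. The names `violin_outline` and `flip` are hypothetical and exist only for this illustration.

```julia
# Hypothetical helper mirroring the comprehensions in the diff: build one
# closed violin outline centered at `x` from positions `ys` and widths `ws`.
function violin_outline(x::Real, ys::AbstractVector{<:Real}, ws::AbstractVector{<:Real})
    left  = [(x - w / 2, y) for (y, w) in zip(ys, ws)]            # first to last y
    right = reverse!([(x + w / 2, y) for (y, w) in zip(ys, ws)])  # last to first y
    return vcat(left, right)
end

# For horizontal violins the diff swaps the coordinates of every point.
flip(points) = [reverse(p) for p in points]

outline = violin_outline(1.0, [0.0, 1.0, 2.0], [0.2, 0.6, 0.2])
flipped = flip(outline)
```

In the split case from the diff, only `left` is kept for the first color and only `right` for the second, so each group contributes one half of the shape.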