# 2D frequencies {#frequencies-2D}
## Rectangular binning in plotly.js
\index{add\_trace()@\texttt{add\_trace()}!add\_heatmap()@\texttt{add\_heatmap()}}
\index{add\_trace()@\texttt{add\_trace()}!add\_histogram2d()@\texttt{add\_histogram2d()}}
\index{add\_trace()@\texttt{add\_trace()}!add\_histogram2dcontour()@\texttt{add\_histogram2dcontour()}}
\index{colorbar()@\texttt{colorbar()}!title@\texttt{title}}
The **plotly** package provides two functions for displaying rectangular bins: `add_heatmap()` and `add_histogram2d()`. For numeric data, `add_heatmap()` is a 2D analog of `add_bars()` (bins must be pre-computed), and `add_histogram2d()` is a 2D analog of `add_histogram()` (bins can be computed in the browser). For exploratory purposes, I therefore recommend `add_histogram2d()`, since you don't have to think about how to perform the binning. It also provides a useful [`zsmooth`](https://plot.ly/r/reference/#histogram2d-zsmooth) attribute for effectively increasing the number of bins (currently, "best" performs a [bilinear interpolation](https://en.wikipedia.org/wiki/Bilinear_interpolation), which estimates each value from the four nearest bin centers), and [`nbinsx`](https://plot.ly/r/reference/#histogram2d-nbinsx)/[`nbinsy`](https://plot.ly/r/reference/#histogram2d-nbinsy) attributes for setting the (maximum) number of bins in the x and/or y directions. Figure \@ref(fig:histogram2d) compares three different uses of `add_histogram2d()`: (1) plotly.js's default binning algorithm, (2) the default plus smoothing, and (3) setting the number of bins in the x and y directions. It is also worth noting that filled contours, instead of bins, can be used in any of these cases by using `add_histogram2dcontour()` instead of `add_histogram2d()`.

```r
library(plotly) # attaches ggplot2, which provides the diamonds data

p <- plot_ly(diamonds, x = ~log(carat), y = ~log(price))
subplot(
  add_histogram2d(p) %>%
    colorbar(title = "default") %>%
    layout(xaxis = list(title = "default")),
  add_histogram2d(p, zsmooth = "best") %>%
    colorbar(title = "zsmooth") %>%
    layout(xaxis = list(title = "zsmooth")),
  add_histogram2d(p, nbinsx = 60, nbinsy = 60) %>%
    colorbar(title = "nbins") %>%
    layout(xaxis = list(title = "nbins")),
  shareY = TRUE, titleX = TRUE
)
```
```{r histogram2d, echo = FALSE, fig.cap = "(ref:histogram2d)", out.extra = if (knitr::is_html_output()) 'data-url="/interactives/histogram2d.html"'}
knitr::include_graphics("images/histogram2d.svg")
```
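
To make the `add_heatmap()`/`add_histogram2d()` distinction concrete, the sketch below pre-computes rectangular bins in base R, which is the work `add_histogram2d()` otherwise does for you in the browser. The helper name `bin2d()` and its equal-width binning rule are assumptions for illustration only; they are not plotly.js's default binning algorithm.

```r
# Hypothetical helper (not part of plotly): count observations falling in
# an nbins-by-nbins grid of equal-width rectangles over the data range.
bin2d <- function(x, y, nbins = 30) {
  xbreaks <- seq(min(x), max(x), length.out = nbins + 1)
  ybreaks <- seq(min(y), max(y), length.out = nbins + 1)
  # include.lowest = TRUE keeps the minimum value in the first bin
  xbin <- cut(x, xbreaks, include.lowest = TRUE)
  ybin <- cut(y, ybreaks, include.lowest = TRUE)
  # nbins x nbins contingency table; empty bins are kept as zero counts
  counts <- table(xbin, ybin)
  # Bin midpoints serve as the heatmap's x/y coordinates
  mid <- function(b) (head(b, -1) + tail(b, -1)) / 2
  # expand.grid() varies x fastest, matching the column-major order of counts
  grid <- expand.grid(x = mid(xbreaks), y = mid(ybreaks))
  grid$count <- as.vector(counts)
  grid
}

b <- bin2d(c(0, 1, 2, 3), c(0, 1, 2, 3), nbins = 2)
print(b)
```

With **plotly** loaded, a result like `bin2d(log(diamonds$carat), log(diamonds$price), nbins = 60)` can be passed to `plot_ly(..., x = ~x, y = ~y, z = ~count) %>% add_heatmap()`. Picking `nbins` is now your responsibility, which is exactly the decision `add_histogram2d()` spares you.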
## Rectangular binning in R {#rectangular-binning-in-r}
In Chapter \@ref(bars-histograms), we leveraged a number of algorithms in R for computing the "optimal" number of bins for a histogram, via `hist()`, and routing those results to `add_bars()`. There is a surprising lack of research and computational tools for the 2D analog, and the research that does exist usually depends on characteristics of the unknown underlying distribution, so the typical approach is to assume a Gaussian form [@mde]. Practically speaking, that assumption is not very useful, but 2D kernel density estimation provides a useful alternative that tends to be more robust to changes in distributional form. Although kernel density estimation requires a choice of kernel and a bandwidth parameter, the `kde2d()` function from the **MASS** package provides a well-supported rule of thumb for estimating the bandwidth of a Gaussian kernel density [@MASS]. Figure \@ref(fig:heatmap-corr-diamonds) uses `kde2d()` to estimate a 2D density, scales the estimated density to an approximate count per grid cell, then uses the `add_heatmap()` function to display the results as a heatmap.
\index{Kernel density estimation!MASS::kde2d()@\texttt{MASS::kde2d()}}
```r
kde_count <- function(x, y, ...) {
  # Capture the sample size now: inside with(kde, ...) below,
  # `x` and `y` refer to the grid points, not the data
  n_obs <- length(x)
  kde <- MASS::kde2d(x, y, ...)
  df <- with(kde, setNames(expand.grid(x, y), c("x", "y")))
  # The 'z' returned by kde2d() is a density estimate; multiplying it by
  # the number of observations and the grid-cell area gives a count
  df$count <- with(kde, c(z) * n_obs * diff(x)[1] * diff(y)[1])
  df
}
kd <- with(diamonds, kde_count(log(carat), log(price), n = 30))
plot_ly(kd, x = ~x, y = ~y, z = ~count) %>%
  add_heatmap() %>%
  colorbar(title = "Number of diamonds")
```
```{r heatmap-corr-diamonds, echo = FALSE, fig.cap = "(ref:heatmap-corr-diamonds)", out.extra = if (knitr::is_html_output()) 'data-url="/interactives/heatmap-corr-diamonds.html"'}
knitr::include_graphics("images/heatmap-corr-diamonds.svg")
```
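
The scaling step in `kde_count()` relies on `z` being a density: summing `z` times the grid-cell area should give roughly 1, and multiplying by the number of *observations* (not the number of grid points) turns it into approximate counts. The check below uses simulated standard-normal data as a stand-in for `diamonds`; the 5% tolerance is an assumption, since the grid only spans the observed range and a little probability mass falls outside it.

```r
set.seed(42)
n_obs <- 1000
x <- rnorm(n_obs)
y <- rnorm(n_obs)

kde <- MASS::kde2d(x, y, n = 30)

# The grid spacing is constant, so every cell has the same area
cell_area <- diff(kde$x[1:2]) * diff(kde$y[1:2])

# Riemann sum of the density over the grid: should be close to 1
mass <- sum(kde$z) * cell_area

# Approximate number of observations per grid cell
counts <- kde$z * cell_area * n_obs
c(mass = mass, total = sum(counts))
```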
## Categorical axes
The functions `add_histogram2d()`, `add_histogram2dcontour()`, and `add_heatmap()` all support categorical axes. Thus, `add_histogram2d()` _can_ be used to easily display 2-way contingency tables, but since it is easier to compare positions along a common scale than to compare colors [@graphical-perception], I recommend creating [grouped bar charts](#multiple-discrete-distributions) instead. The `add_heatmap()` function can still be useful for categorical axes, however, as it allows us to display whatever quantity we want along the z axis (color).
\index{colorbar()@\texttt{colorbar()}!limits@\texttt{limits}}
Figure \@ref(fig:correlation) uses `add_heatmap()` to display a correlation matrix. Notice how the `limits` argument of the `colorbar()` function can be used to expand the limits of the color scale to reflect the range of possible correlations (something that is not easily done in plotly.js).
```r
corr <- cor(dplyr::select_if(diamonds, is.numeric))
plot_ly(colors = "RdBu") %>%
  add_heatmap(x = rownames(corr), y = colnames(corr), z = corr) %>%
  colorbar(limits = c(-1, 1))
```
```{r correlation, echo = FALSE, fig.cap = "(ref:correlation)", out.extra = if (knitr::is_html_output()) 'data-url="/interactives/correlation.html"'}
knitr::include_graphics("images/correlation.svg")
```
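
`select_if()` is superseded in recent versions of **dplyr** (the current idiom is `dplyr::select(diamonds, where(is.numeric))`); if you'd rather avoid the dependency altogether, base R's `Filter()` keeps the numeric columns of a data frame. The sketch below uses the built-in `iris` data as a stand-in for `diamonds`, and also confirms that every entry of a correlation matrix lies in [-1, 1], which is why `limits = c(-1, 1)` is the natural color range.

```r
# A data frame is a list of columns, so Filter() tests each column
num <- Filter(is.numeric, iris)

corr <- cor(num)

# A correlation matrix is square, symmetric, and has a unit diagonal
stopifnot(
  identical(dim(corr), c(4L, 4L)),
  isSymmetric(corr),
  all(abs(diag(corr) - 1) < 1e-12)
)

# Observed range of correlations, always within [-1, 1]
range(corr)
```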
# 3D charts
## Markers
As it turns out, by simply adding a `z` attribute, `plot_ly()` automatically renders markers, lines, and paths in three dimensions. That means all the techniques we learned in Sections \@ref(markers) and \@ref(lines) can be reused for 3D charts, as in Figure \@ref(fig:3D-scatterplot):
```r
plot_ly(mpg, x = ~cty, y = ~hwy, z = ~cyl) %>%
  add_markers(color = ~cyl)
```
```{r 3D-scatterplot, echo = FALSE, fig.cap = "(ref:3D-scatterplot)", out.extra = if (knitr::is_html_output()) 'data-url="/interactives/3D-scatterplot.html"'}
knitr::include_graphics("images/3D-scatterplot.svg")
```
## Paths
To make a path in 3D, use `add_paths()` in the same way you would for a 2D path, but add a third variable `z`, as Figure \@ref(fig:3D-paths) does.
```r
plot_ly(mpg, x = ~cty, y = ~hwy, z = ~cyl) %>%
  add_paths(color = ~displ)
```
```{r 3D-paths, echo = FALSE, fig.cap = "(ref:3D-paths)", out.extra = if (knitr::is_html_output()) 'data-url="/interactives/3D-paths.html"'}
knitr::include_graphics("images/3D-paths.png")
```
## Lines
Figure \@ref(fig:3D-lines) uses `add_lines()` instead of `add_paths()` so that the points are connected in order of their x values rather than in row order.
```r
plot_ly(mpg, x = ~cty, y = ~hwy, z = ~cyl) %>%
  add_lines(color = ~displ)
```
```{r 3D-lines, echo = FALSE, fig.cap = "(ref:3D-lines)", out.extra = if (knitr::is_html_output()) 'data-url="/interactives/3D-lines.html"'}
knitr::include_graphics("images/3D-lines.png")
```
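
The difference between `add_paths()` and `add_lines()` comes down to the order in which vertices are visited. Assuming, as described above, that `add_lines()` connects points in order of x, the toy example below shows the two orders in base R; the data frame `d` is made up for illustration.

```r
# Toy data whose rows are deliberately out of x order
d <- data.frame(x = c(3, 1, 2), y = c(30, 10, 20), z = c(1, 3, 2))

# add_paths(): vertices are visited in row order
path_order <- d$x

# add_lines(): vertices are visited in increasing x
lines_order <- d[order(d$x), ]$x

path_order   # 3 1 2
lines_order  # 1 2 3
```

Under that assumption, `plot_ly(d[order(d$x), ], x = ~x, y = ~y, z = ~z) %>% add_paths()` should trace the same line as `add_lines()` applied to the unsorted `d`.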
As with non-3D lines, you can make multiple lines by specifying a grouping variable.
```r
plot_ly(mpg, x = ~cty, y = ~hwy, z = ~cyl) %>%
  group_by(cyl) %>%
  add_lines(color = ~displ)
```
```{r 3D-lines-groups, echo = FALSE, fig.cap = "(ref:3D-lines-groups)", out.extra = if (knitr::is_html_output()) 'data-url="/interactives/3D-lines-groups.html"'}
knitr::include_graphics("images/3D-lines-groups.png")
```
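
Conceptually, grouping produces one polyline per group, and within each group the points are still connected in x order. The base R sketch below mimics that with `split()`; the small data frame is made up and only borrows column names from `mpg`. It illustrates the idea rather than how **plotly** builds the trace internally.

```r
# Made-up data: two cylinder groups, rows shuffled
d <- data.frame(
  cyl = c(6, 4, 6, 4),
  cty = c(18, 21, 15, 25),
  hwy = c(26, 29, 22, 33)
)

# One polyline per group, each with its vertices sorted by x (cty)
lines_by_group <- lapply(split(d, d$cyl), function(g) g[order(g$cty), ])

lines_by_group[["4"]]$cty  # 21 25
lines_by_group[["6"]]$cty  # 15 18
```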
## Axes
\index{layout()@\texttt{layout()}!3D Axes}
For 3D plots, be aware that the axis objects are part of the [`scene`](https://plot.ly/r/reference/#layout-scene) definition, which is part of the `layout()`. That is, if you want to set axis titles (e.g., Figure \@ref(fig:3D-axes)) or something else specific to an axis, the relation between axes (i.e., [`aspectratio`](https://plot.ly/r/reference/#layout-scene-aspectratio)), or the default camera position (i.e., [`camera`](https://plot.ly/r/reference/#layout-scene-camera)), you do so via the `scene`.
```r
plot_ly(mpg, x = ~cty, y = ~hwy, z = ~cyl) %>%
  add_lines(color = ~displ) %>%
  layout(
    scene = list(
      xaxis = list(title = "MPG city"),
      yaxis = list(title = "MPG highway"),
      zaxis = list(title = "Number of cylinders")
    )
  )
```
```{r 3D-axes, echo = FALSE, fig.cap = "(ref:3D-axes)", out.extra = if (knitr::is_html_output()) 'data-url="/interactives/3D-axes.html"'}
knitr::include_graphics("images/3D-axes.png")
```
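
The same `scene` list also holds the aspect ratio and camera settings mentioned above. The sketch below builds one as a plain R list; the particular values (a flattened z axis and a camera placed up and to the side) are arbitrary choices for illustration, and the attribute names follow the plotly.js `layout.scene` reference (`aspectmode`, `aspectratio`, `camera.eye`).

```r
scene <- list(
  xaxis = list(title = "MPG city"),
  yaxis = list(title = "MPG highway"),
  zaxis = list(title = "Number of cylinders"),
  # "manual" asks plotly.js to honor aspectratio instead of computing one
  aspectmode = "manual",
  aspectratio = list(x = 1, y = 1, z = 0.5),
  # Position of the camera's eye relative to the center of the scene
  camera = list(eye = list(x = 1.5, y = 1.5, z = 1))
)
str(scene, max.level = 1)
```

It can then be passed as `layout(p, scene = scene)`, where `p` is any of the 3D charts above.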
## Surfaces
\index{add\_trace()@\texttt{add\_trace()}!add\_surface()@\texttt{add\_surface()}}
Creating 3D surfaces with `add_surface()` is a lot like creating heatmaps with `add_heatmap()`. In fact, you can even create 3D surfaces over categorical x/y (try changing `add_heatmap()` to `add_surface()` in Figure \@ref(fig:correlation))! That being said, there should be a sensible ordering to the x/y axes in a surface plot since plotly.js interpolates z values. Usually the 3D surface is over a continuous region, as is done in Figure \@ref(fig:surface) to display the height of a volcano. If a numeric matrix is provided to z as in Figure \@ref(fig:surface), the x and y attributes do not have to be provided, but if they are, the length of x should match the number of columns in the matrix and y should match the number of rows.
```r
# x indexes the columns of volcano and y indexes its rows
x <- seq_len(ncol(volcano)) + 100
y <- seq_len(nrow(volcano)) + 500
plot_ly() %>% add_surface(x = ~x, y = ~y, z = ~volcano)
```
```{r surface, echo = FALSE, fig.cap = "(ref:surface)", out.extra = if (knitr::is_html_output()) 'data-url="/interactives/surface.html"'}
knitr::include_graphics("images/surface.png")
```
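
Because the pairing of x and y with the matrix dimensions is easy to get backwards, a small check before plotting can help. `check_surface_dims()` is a hypothetical helper that simply encodes the rule stated above (x matches the columns, y matches the rows).

```r
# Hypothetical helper: TRUE when x and y line up with the matrix z
# (x indexes the columns of z, y indexes the rows)
check_surface_dims <- function(x, y, z) {
  length(x) == ncol(z) && length(y) == nrow(z)
}

dim(volcano)  # 87 61

check_surface_dims(seq_len(ncol(volcano)), seq_len(nrow(volcano)), volcano)  # TRUE
check_surface_dims(seq_len(nrow(volcano)), seq_len(ncol(volcano)), volcano)  # FALSE
```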