---
title: "Creating D3 Chord Diagrams"
author: "Matthias Flor"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: yes
    fig_caption: yes
vignette: >
  %\VignetteIndexEntry{Creating D3 Chord Diagrams}
  %\VignetteKeyword{Chord Diagram}
  %\VignetteKeyword{D3}
  %\VignetteKeyword{HTML Widget}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
# Introduction
The `chorddiag` package allows you to create interactive chord diagrams with the JavaScript visualization library D3 (http://d3js.org) from within R, using the `htmlwidgets` framework.
In short, chord diagrams show directed relationships among a group of entities.
The chord diagram layout is explained by Mike Bostock, the creator of D3, in more detail here: https://github.com/mbostock/d3/wiki/Chord-Layout.
To quote the explanation found there:
> "Consider a hypothetical population of people with different hair colors: black, blonde, brown and red. Each person in this population has a preferred hair color for a dating partner; of the 29,630 (hypothetical) people with black hair, 40% (11,975) prefer partners with the same hair color. This preference is asymmetric: for example, only 10% of people with blonde hair prefer black hair, while 20% of people with black hair prefer blonde hair. A chord diagram visualizes these relationships by drawing quadratic Bézier curves between arcs. The source and target arcs represents two mirrored subsets of the total population, such as the number of people with black hair that prefer blonde hair, and the number of people with blonde hair that prefer black hair."
([Mike Bostock](https://github.com/mbostock/d3/wiki/Chord-Layout))
The package's JavaScript code is based on http://bl.ocks.org/mbostock/4062006, with modified fading behaviour and added tooltips.
# Installation
The package is available from GitHub and can be installed with
```{r, eval = FALSE}
devtools::install_github("mattflor/chorddiag")
```
(this requires the `devtools` package).
After installation, the package is loaded via
```{r, eval = FALSE}
library(chorddiag)
```
# Examples
## Hair Color Preference
To create a chord diagram for the hair color preference example stated in the introduction, we need the preferences in matrix format:
```{r}
m <- matrix(c(11975,  5871, 8916, 2868,
               1951, 10048, 2060, 6171,
               8010, 16145, 8090, 8045,
               1013,   990,  940, 6907),
            byrow = TRUE,
            nrow = 4, ncol = 4)
haircolors <- c("black", "blonde", "brown", "red")
dimnames(m) <- list(have = haircolors,
                    prefer = haircolors)
print(m)
```
```{r, echo = FALSE, results = 'asis'}
pander::pandoc.table(m, style = "rmarkdown",
caption = "Hair color preference data. Row names show what hair color people have, and column names show what hair color they prefer for a dating partner.")
```
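The percentages quoted in the introduction can be recovered from this matrix by dividing each row by its total. The check below is our own addition using base R's `prop.table()` with `margin = 1` (it is not part of the `chorddiag` API); it rebuilds the same matrix so that it runs on its own.

```{r}
# Hair color preference data (same values as above)
haircolors <- c("black", "blonde", "brown", "red")
m <- matrix(c(11975,  5871, 8916, 2868,
               1951, 10048, 2060, 6171,
               8010, 16145, 8090, 8045,
               1013,   990,  940, 6907),
            byrow = TRUE, nrow = 4, ncol = 4,
            dimnames = list(have = haircolors, prefer = haircolors))

# Size of each hair color group (row totals)
rowSums(m)

# Share of each group preferring each hair color (each row sums to 1)
pref <- prop.table(m, margin = 1)
round(100 * pref, 1)
```

The black-haired row totals 29,630 people, and the row-wise shares give about 40% for black/black, about 10% for blonde preferring black, and about 20% for black preferring blonde, as stated in the quote.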
Then, we can pass this matrix to the `chorddiag` function to create the chord diagram:
```{r, eval = FALSE, fig.width = 8}
chorddiag(m)
```
![Default chord diagram for the hair color preference dataset. Note that all images in this vignette are static. When generated by the `chorddiag` function, the diagrams are interactive, with chord fading, tooltips, and resizing.](images/chorddiagram-directional-hair-default.png)
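In the directional diagram, each group's arc should be proportional to its row total. The sketch below is modelled on the D3 version 3 chord layout, which divides the circle among the groups after reserving a constant padding angle after each arc; it ignores sorting options and is only an illustration, not `chorddiag`'s own code. The helper name `arc_angles()` is hypothetical.

```{r}
# Hypothetical helper: start/end angles (radians) of each group arc,
# following the D3 v3 chord layout (arc size proportional to row sum,
# constant padding after every group, no sorting).
arc_angles <- function(x, padding = 0) {
  sums <- rowSums(x)
  n <- length(sums)
  k <- (2 * pi - n * padding) / sum(sums)  # radians per unit of flow
  widths <- sums * k
  starts <- c(0, cumsum(widths + padding))[seq_len(n)]
  data.frame(group = names(sums), start = starts, end = starts + widths,
             row.names = NULL)
}

# Same hair color data as above
haircolors <- c("black", "blonde", "brown", "red")
m <- matrix(c(11975, 5871, 8916, 2868, 1951, 10048, 2060, 6171,
              8010, 16145, 8090, 8045, 1013, 990, 940, 6907),
            nrow = 4, byrow = TRUE,
            dimnames = list(have = haircolors, prefer = haircolors))

# Arc sizes in degrees, with a padding of 0.05 radians between groups
arcs <- arc_angles(m, padding = 0.05)
arcs$degrees <- (arcs$end - arcs$start) * 180 / pi
arcs
```

Since the four rows sum to exactly 100,000 people, without padding the black-haired group would take 29.63% of the circle and the red-haired group 9.85%.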
The chord diagram can be customized easily.
Here, we call the function with custom colors and provide some padding to avoid group names overlapping with tick labels:
```{r, eval = FALSE, fig.width = 8}
groupColors <- c("#000000", "#FFDD89", "#957244", "#F26223")
chorddiag(m, groupColors = groupColors, groupnamePadding = 20)
```
![Customized chord diagram for the hair color preference dataset, using custom colors and more padding between the diagram and group labels to avoid overlap with tick labels.](images/chorddiagram-directional-hair.png)
*Interactive* refers to chord fading and tooltip popups on certain mouseover events.
For example, if the mouse pointer hovers over the chord connecting the "blonde" and "red" groups, a tooltip displays the numbers for that chord, and all other chords fade away.
Likewise, when hovering over a group arc, all chords *not* belonging to that group fade away, and a tooltip displays summarized group information.
Fading levels can be set, and tooltip layout can be customized to some degree as well; for details, see the `chorddiag` function's documentation.
![Tooltip and chord fading showcase for the hair color preference chord diagram. In this case, we can see that a considerable fraction of blonde people prefer red-haired dating partners whereas only a small fraction of red-haired people prefer blonde partners.](images/chorddiagram-directional-hair-tooltip.png)
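Each chord has two ends, one for each direction of preference. To make the claim in the caption concrete, the hypothetical helper `chord_ends()` below (our own, not a `chorddiag` function) extracts the counts and row-wise shares behind a single chord.

```{r}
# Hypothetical helper: counts and row-wise shares for both ends of the
# chord between groups a and b of a directional matrix x.
chord_ends <- function(x, a, b) {
  shares <- prop.table(x, margin = 1)  # divide each row by its total
  c(a_to_b = x[a, b], b_to_a = x[b, a],
    share_a_to_b = shares[a, b], share_b_to_a = shares[b, a])
}

# Same hair color data as above
haircolors <- c("black", "blonde", "brown", "red")
m <- matrix(c(11975, 5871, 8916, 2868, 1951, 10048, 2060, 6171,
              8010, 16145, 8090, 8045, 1013, 990, 940, 6907),
            nrow = 4, byrow = TRUE,
            dimnames = list(have = haircolors, prefer = haircolors))

round(chord_ends(m, "blonde", "red"), 3)
```

6,171 of the 20,230 blonde people (roughly 30%) prefer red-haired partners, while 990 of the 9,850 red-haired people (about 10%) prefer blonde partners.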
## Uber Rides
http://bost.ocks.org/mike/uberdata/
## Titanic Survival (Bipartite Chord Diagram)
The default chord diagram type is **directional**, allowing for visualization of asymmetric relationships.
But chord diagrams can also be a useful visualization of frequency distributions for two categories of groups, in other words, contingency tables (also called cross tabulations or crosstabs).
In this package, this type of chord diagram is called **bipartite** (because there are only chords *between* categories but not *within* categories).
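A chord layout operates on a square matrix, so a contingency table with r row categories and k column categories can be pictured as an (r + k) x (r + k) matrix whose row-by-row and column-by-column blocks are zero, with the table and its transpose filling the off-diagonal blocks. The sketch below illustrates this idea only; `to_bipartite_square()` is a hypothetical name, and this is not necessarily how `chorddiag` builds bipartite diagrams internally.

```{r}
# Hypothetical helper: embed an r x k contingency table in a square
# (r + k) x (r + k) matrix whose only non-zero blocks link rows to columns.
to_bipartite_square <- function(x) {
  r <- nrow(x)
  k <- ncol(x)
  labels <- c(rownames(x), colnames(x))
  sq <- matrix(0, r + k, r + k, dimnames = list(labels, labels))
  sq[seq_len(r), r + seq_len(k)] <- x     # row category -> column category
  sq[r + seq_len(k), seq_len(r)] <- t(x)  # mirrored block with the same counts
  sq
}

# A small made-up 2 x 3 contingency table
tab <- matrix(c(5, 2, 7,
                1, 4, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("a1", "a2"), c("b1", "b2", "b3")))
to_bipartite_square(tab)
```

Because the two off-diagonal blocks mirror each other, every chord has the same width at both ends, and no chord can connect two categories on the same side.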
Here is an example for the `Titanic` dataset.
First, we create a contingency table of how many passengers from the different classes and from the crew survived or died when the Titanic sank.
```{r, fig.width = 8, warning = FALSE, message = FALSE}
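library(dplyr)
# One row per cell of the 4-way Titanic table (Class, Sex, Age, Survived, n)
titanic_tbl <- as_tibble(Titanic)
# Turn the grouping columns into factors so their levels can be reused below
titanic_tbl <- titanic_tbl %>%
  mutate(across(Class:Survived, factor))
# Sum the counts over Sex and Age
by_class_survival <- titanic_tbl %>%
  group_by(Class, Survived) %>%
  summarize(Count = sum(n), .groups = "drop")
# Rows are ordered by Class, then Survived, so fill the matrix row by row
titanic.mat <- matrix(by_class_survival$Count, nrow = 4, ncol = 2,
                      byrow = TRUE)
dimnames(titanic.mat) <- list(Class = levels(titanic_tbl$Class),
                              Survival = levels(titanic_tbl$Survived))
print(titanic.mat)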
```
Note that we labeled the dimensions of the matrix by assigning a named list to `dimnames`.
The dimension labels (here: "Class" and "Survival") will automatically be used in the chord diagram.
```{r, echo = FALSE, results = 'asis'}
pander::pandoc.table(titanic.mat,
                     style = "rmarkdown",
                     caption = "Titanic data. Counts of people who died ('No') or survived ('Yes'), grouped by passenger class or crew membership.")
```
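If you prefer to avoid `dplyr`, base R's `margin.table()` offers a shorter route to the class-by-survival counts, summing the built-in `Titanic` array over its Sex and Age dimensions. A sketch (the object name `titanic_base` is our own):

```{r}
# Sum the 4-way Titanic table over Sex and Age, keeping Class (1) and Survived (4)
titanic_base <- unclass(margin.table(Titanic, margin = c(1, 4)))
# Rename the second dimension so the diagram labels read "Class" and "Survival"
names(dimnames(titanic_base)) <- c("Class", "Survival")
titanic_base
```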
We can create a "bipartite" chord diagram for this matrix by setting `type = "bipartite"`.
```{r, eval = FALSE}
groupColors <- c("#2171b5", "#6baed6", "#bdd7e7", "#bababa", "#d7191c", "#1a9641")
chorddiag(titanic.mat, type = "bipartite",
groupColors = groupColors,
tickInterval = 50)
```
![A bipartite chord diagram visualizing survival grouped by class / crew for the `Titanic` data.](images/chorddiagram-bipartite-titanic.png)