-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.Rmd
158 lines (111 loc) · 4.65 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
---
title: "Weighted data survey tables in R"
author: "John Johnson"
date: "3/9/2020"
output:
github_document:
toc: true
toc_depth: 2
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-"
)
```
`pollster` is an R package for making topline and crosstab tables of simple weighted survey data. The package is designed for use with labelled data, like what you might use the `haven` package to import from Stata or SPSS. It follows tidyverse programming conventions, and output tables are also in the form of a tidy data frame, or tibble.
Only simple weights are currently supported. For complex survey designs, we recommend the excellent [`survey` package](http://r-survey.r-forge.r-project.org/survey/).
The core functions are:
* `topline()`
* `crosstab()`
* `crosstab_3way()`
Each of these functions also has a twin version which includes a column for the margin of error calculated to include the design effect of the weights.
* `moe_topline()`
* `moe_crosstab()`
* `moe_crosstab_3way()`
There are also two special functions which calculate the design effect component of the margin of error for each survey wave independently.
* `moe_wave_crosstab()`
* `moe_wave_crosstab_3way()`
Other functions are included to calculate simple weighted summary statistics.
* `wtd_mean()` is a tidy-compliant wrapper around `stats::weighted.mean()`
* `summary_table()` returns a tible with summary statistics similar to the Stata command `sum`
## Installation
Install it this way.
```
install.packages("pollster")
```
Or get the development version.
```
remotes::install_github("jdjohn215/pollster")
```
## Basic usage
`pollster` includes a dataset of Illinois responses to the Current Population Survey's voter registration supplement.
```{r}
library(pollster)
head(illinois)
```
Make a topline table like this. The output is a tibble.
```{r}
topline(df = illinois, variable = maritalstatus, weight = weight)
```
Make a crosstab like this.
```{r}
crosstab(df = illinois, x = educ6, y = maritalstatus, weight = weight)
```
If you prefer, you can also get the output in long format.
```{r}
crosstab(df = illinois, x = educ6, y = maritalstatus, weight = weight, format = "long")
```
A three-way crosstab is just a normal crosstab with a third control variable. Often, this third variable is time.
```{r}
crosstab_3way(df = illinois, x = educ6, y = maritalstatus, z = year, weight = weight)
```
## Making tables and graphs
Wide format is best for displaying table output. Long format is best for making graphs. `pollster` outputs dovetail seamlessly with [`knitr::kable()`](https://www.rdocumentation.org/packages/knitr/versions/1.28/topics/kable) and [`ggplot2::ggplot()`](https://ggplot2.tidyverse.org/). These examples show very basic html table output, but you can customize the appearance of your tables almost endlessly in either html or pdf formats using Hao Zhu's excellent [`kableExtra` package](https://haozhu233.github.io/kableExtra/).
```{r}
library(dplyr)
crosstab(df = illinois, x = sex, y = educ6, weight = weight) %>%
knitr::kable(digits = 0)
```
```{r}
library(ggplot2)
crosstab(df = illinois, x = sex, y = educ6, weight = weight, format = "long") %>%
ggplot(aes(educ6, pct, fill = sex)) +
geom_bar(stat = "identity", position = "dodge")
```
Three-way crosstabs are ideal for plotting time series graphs and/or faceted plots.
```{r}
crosstab_3way(df = illinois, x = sex, y = educ6, z = year, weight = weight, format = "long") %>%
ggplot(aes(year, pct, col = sex)) +
geom_line() +
facet_wrap(facets = vars(educ6))
```
## Margin of error
Each `pollster` function comes with a twin function which includes a margin of error column. For example:
```{r}
moe_topline(df = illinois, variable = voter, weight = weight)
```
By default, `moe_crosstab` output comes in long format, but you can also specify wide format.
```{r}
moe_crosstab(df = illinois, x = raceethnic, y = voter, weight = weight, format = "wide")
```
```{r}
moe_crosstab(df = illinois, x = raceethnic, y = voter, weight = weight) %>%
ggplot(aes(x = pct, y = raceethnic, xmin = (pct - moe), xmax = (pct + moe), color = voter)) +
geom_pointrange(position = position_dodge(width = 0.2))
```
## Summary table
`summary_table()` creates a simple summary table of a weighted numeric variable.
```{r}
summary_table(df = illinois, variable = age, weight = weight)
```
You can choose `name_style = "pretty"` if you want column headings appropriate for a formatted table.
```{r}
summary_table(df = illinois, variable = age,
weight = weight, name_style = "pretty") %>%
knitr::kable()
```