diff --git a/docs/api/covidcast-signals/fb-survey.md b/docs/api/covidcast-signals/fb-survey.md
index 7c6323321..28b0ec55f 100644
--- a/docs/api/covidcast-signals/fb-survey.md
+++ b/docs/api/covidcast-signals/fb-survey.md
@@ -1,10 +1,10 @@
---
-title: Symptom Surveys
+title: COVID-19 Trends and Impact Survey
parent: Data Sources and Signals
grand_parent: COVIDcast Epidata API
---
-# Symptom Surveys
+# COVID-19 Trends and Impact Survey
{: .no_toc}
* **Source name:** `fb-survey`
@@ -17,9 +17,10 @@ grand_parent: COVIDcast Epidata API
## Overview
-This data source is based on symptom surveys run by the Delphi group at Carnegie
-Mellon. Facebook directs a random sample of its users to these surveys, which
-are voluntary. Users age 18 or older are eligible to complete the surveys, and
+This data source is based on the [COVID-19 Trends and Impact Survey
+(CTIS)](../../symptom-survey/) run by the Delphi group at Carnegie Mellon.
+Facebook directs a random sample of its users to these surveys, which are
+voluntary. Users age 18 or older are eligible to complete the surveys, and
their survey responses are held by CMU and are sharable with other health
researchers under a data use agreement. No individual survey responses are
shared back to Facebook. See our [surveys
@@ -575,7 +576,7 @@ $$
where $$\pi_i$$ is an estimated probability (produced by Facebook) that an
individual with the same state-by-age-gender profile as user $$i$$ would be a
-Facebook user and take our CMU survey. The adjustment we make follows a standard
+Facebook user and take our survey. The adjustment we make follows a standard
inverse probability weighting strategy (this being a special case of importance
sampling).
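
Note on the weighting paragraph in this hunk: the inverse probability weighting it describes can be illustrated with a minimal R sketch. The probabilities and responses below are made-up values, the variable names are hypothetical, and normalizing the weights to sum to one is an assumption of this sketch rather than something the hunk states.

```r
# Hypothetical estimated probabilities pi_i that a person with respondent i's
# profile would be a Facebook user and take the survey (illustrative values).
pi_hat <- c(0.10, 0.25, 0.50)

# Hypothetical binary responses (1 = reports the symptom, 0 = does not).
y <- c(1, 0, 1)

# Inverse probability weights: respondents who were less likely to be
# sampled stand in for more people, so they receive larger weights.
w <- 1 / pi_hat

# Normalize so the weights sum to one (an assumption of this sketch).
w_norm <- w / sum(w)

# Weighted estimate of the proportion reporting the symptom.
estimate <- sum(w_norm * y)

# Unweighted mean, for comparison.
unweighted <- mean(y)
```

With these values the weights are 10, 4, and 2, so the weighted estimate is 12/16 = 0.75 against an unweighted mean of 2/3.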
diff --git a/docs/symptom-survey/coding.md b/docs/symptom-survey/coding.md
index 278c6d931..d7fe82c17 100644
--- a/docs/symptom-survey/coding.md
+++ b/docs/symptom-survey/coding.md
@@ -1,15 +1,15 @@
---
title: Questions and Coding
-parent: COVID Symptom Survey
+parent: COVID-19 Trends and Impact Survey
nav_order: 6
---
# Questions and Coding
{: .no_toc}
-The symptom surveys have been deployed in several waves. We have tried to ensure
-the coding of waves is consistent. This page provides the full survey text and
-coding schemes.
+The COVID-19 Trends and Impact Survey (CTIS) has been deployed in several waves.
+We have tried to ensure the coding of waves is consistent. This page provides
+the full survey text and coding schemes.
## Table of contents
@@ -467,7 +467,7 @@ new items were meant to capture reasons for vaccine hesitancy among respondents.
when you use responses from multiple waves of this survey, since they may
shift which occupations respondents choose.
* C14a is a revision of item C14, changed from "the past 5 days" to "the past
- 7 days" to be consistent with other items on the COVID Symptom Survey.
+ 7 days" to be consistent with other items on CTIS.
C14a replaces C14.
* C17a is a revision of item C17, which asked respondents if they have had a
flu vaccination since June 2020. C17a changed the date to July 1, 2020 and
diff --git a/docs/symptom-survey/collaboration-revision.md b/docs/symptom-survey/collaboration-revision.md
index 48aa1616f..e344f5ba9 100644
--- a/docs/symptom-survey/collaboration-revision.md
+++ b/docs/symptom-survey/collaboration-revision.md
@@ -1,16 +1,16 @@
---
title: Collaboration and Survey Revision
-parent: COVID Symptom Survey
+parent: COVID-19 Trends and Impact Survey
nav_order: 1
---
# Collaboration and Survey Revision
-Delphi continues to revise the COVID-19 Symptom Survey instruments in order to
-prioritize items that have the greatest utility for the response to the COVID-19
-pandemic. We conduct revisions in collaboration with data users, fellow
-researchers, and public health officials, to ensure the survey data best serves
-public health and research goals.
+Delphi continues to revise the COVID-19 Trends and Impact Survey (CTIS)
+instruments in order to prioritize items that have the greatest utility for the
+response to the COVID-19 pandemic. We conduct revisions in collaboration with
+data users, fellow researchers, and public health officials, to ensure the
+survey data best serves public health and research goals.
## Proposing Revisions
@@ -18,7 +18,7 @@ If there is a revision or question you would like us to consider, please fill
out [this form requesting details about your
proposal](https://forms.gle/q6NS8fPJJofKQ9mM8). This request can be submitted by
researchers regardless of whether they have a signed Data Use Agreement for the
-individual responses to the COVID Symptom Survey.
+individual responses to the COVID-19 Trends and Impact Survey.
## Collaboration Meetings
diff --git a/docs/symptom-survey/contingency-tables.md b/docs/symptom-survey/contingency-tables.md
index 7604506c6..b2b741ce6 100644
--- a/docs/symptom-survey/contingency-tables.md
+++ b/docs/symptom-survey/contingency-tables.md
@@ -1,6 +1,6 @@
---
title: Contingency Tables
-parent: COVID Symptom Survey
+parent: COVID-19 Trends and Impact Survey
nav_order: 4
---
@@ -8,7 +8,7 @@ nav_order: 4
{: .no_toc}
This documentation describes the fine-resolution contingency tables produced by
-grouping [COVID Symptom Survey](./index.md) individual responses by various
+grouping [COVID-19 Trends and Impact Survey (CTIS)](./index.md) individual responses by various
self-reported demographic features.
* [Weekly files](https://www.cmu.edu/delphi-web/surveys/weekly-rollup/)
@@ -119,6 +119,7 @@ Within a CSV, the first few columns store metadata of the aggregation:
| `ISO_3` | Three-letter ISO country code ("USA") |
| `GID_0` | GADM level 0 ID |
| `state` | State name; "Overall" if aggregation not grouped at the state level |
+| `GID_1` | GADM level 1 ID |
| `state_fips` | State FIPS code; `NA` if aggregation not grouped at the state level |
| `county` | County name; "Overall" if aggregation not grouped at the county level |
| `county_fips` | County FIPS code; `NA` if aggregation not grouped at the county level |
diff --git a/docs/symptom-survey/data-access.md b/docs/symptom-survey/data-access.md
index dd7cfacf2..a0fa103e4 100644
--- a/docs/symptom-survey/data-access.md
+++ b/docs/symptom-survey/data-access.md
@@ -1,16 +1,16 @@
---
title: Getting Data Access
-parent: COVID Symptom Survey
+parent: COVID-19 Trends and Impact Survey
nav_order: 0
---
# Getting Data Access
The Delphi Research Group at Carnegie Mellon University (CMU), in partnership
-with Facebook, has conducted the COVID Symptom Survey to better understand the
-spread of COVID-19 and its effects on public health and well-being. This may
-help improve our local and national responses to the pandemic and our
-understanding of how it has affected society.
+with Facebook, has conducted the COVID-19 Trends and Impact Survey (CTIS) to
+better understand the spread of COVID-19 and its effects on public health and
+well-being. This may help improve our local and national responses to the
+pandemic and our understanding of how it has affected society.
[High-level aggregates](../api/covidcast.md) of select survey items are
publicly available in the [COVIDcast API](../api/covidcast-signals/fb-survey.md).
@@ -25,9 +25,9 @@ Agreement (DUA). To request access to the data please submit the information
requested in [Facebook's page on obtaining data
access](https://dataforgood.fb.com/docs/covid-19-symptom-survey-request-for-data-access/),
which sets out the basic conditions and provides a form to request access. An
-[international version of the COVID Symptom Survey](https://covidmap.umd.edu/)
-is conducted by the University of Maryland (UMD) and access can be requested
-through the same form.
+[international version of CTIS](https://covidmap.umd.edu/) is conducted by the
+University of Maryland (UMD) and access can be requested through the same
+form.
The United States survey protocol has been reviewed by the Carnegie Mellon
University Institutional Review Board with IRB ID STUDY2020_00000162.
diff --git a/docs/symptom-survey/index.md b/docs/symptom-survey/index.md
index f8591c4c0..fb5961d05 100644
--- a/docs/symptom-survey/index.md
+++ b/docs/symptom-survey/index.md
@@ -1,12 +1,12 @@
---
-title: COVID Symptom Survey
+title: COVID-19 Trends and Impact Survey
has_children: true
nav_order: 2
---
-# COVID Symptom Survey
+# COVID-19 Trends and Impact Survey
-Since April 2020, Delphi has conducted a voluntary COVID-19 symptom survey,
+Since April 2020, Delphi has conducted a voluntary survey about COVID-19,
distributed daily to users in the United States via a partnership with Facebook.
This survey asks respondents about COVID-like symptoms, their behavior (such as
social distancing), mental health, and economic and health impacts they have
@@ -29,7 +29,7 @@ If you have questions about the survey or getting access to data, contact us at
## Credits
-The COVID Symptom Survey is a project of the [Delphi
+The COVID-19 Trends and Impact Survey (CTIS) is a project of the [Delphi
Group](https://delphi.cmu.edu/) at Carnegie Mellon University. The Principal
Investigator is [Alex Reinhart](https://www.refsmmat.com/); Wichada La
Motte-Kerr is Survey Coordinator. The survey protocol is reviewed by the
@@ -59,18 +59,30 @@ the survey in publications based on the data. Specifically, we ask that you:
2. Cite this web page for details about the survey. For example, you may cite it
as
- > Delphi Group (2021). COVID Symptom Survey.
+ > Delphi Group (2021). COVID-19 Trends and Impact Survey.
>
A journal article describing the survey and its methods is currently in
preparation, and we will update this page when it is available so that you
can cite it instead.
-3. Send a copy of your publication, once it appears publicly as a preprint or
- journal article, to .
-
-Additionally, please note that the data use agreement requires that if you
-disclose survey microdata, Delphi must agree on the aggregation method that you
-will use to ensure reported estimates do not disclose any individual
-identifiable information, including individual survey results. If you are unsure
-whether a particular aggregation will prevent disclosure of individual survey
-results, please email us at .
+3. The data use agreement requires that if you disclose survey microdata, Delphi
+ must agree on the aggregation method that you will use to ensure reported
+ estimates do not disclose any individual identifiable information, including
+ individual survey results. If you are unsure whether a particular aggregation
+ will prevent disclosure of individual survey results, please email us at
+ .
+4. Finally, send a copy of your publication, once it appears publicly as a
+ preprint or journal article, to .
+
+When referring to the survey in text, we prefer the following formats:
+
+* Long form (such as in an introduction or methods description): "The Delphi
+ Group at Carnegie Mellon University U.S. COVID-19 Trends and Impact Survey, in
+ partnership with Facebook".
+* Short form (used after the long form has been introduced): "The U.S. COVID-19
+ Trends and Impact Survey"
+* Acronym form: "Delphi US CTIS"
+
+Prior to July 2021, the survey was known as the COVID Symptom Survey (CSS), and
+some older documentation and publications may still refer to this name. We prefer
+that new publications and materials refer to the new name.
diff --git a/docs/symptom-survey/modules.md b/docs/symptom-survey/modules.md
index d362ae288..b80118f30 100644
--- a/docs/symptom-survey/modules.md
+++ b/docs/symptom-survey/modules.md
@@ -1,17 +1,17 @@
---
title: Survey Modules & Randomization
-parent: COVID Symptom Survey
+parent: COVID-19 Trends and Impact Survey
nav_order: 7
---
# Questions and Coding
{: .no_toc}
-To reduce the overall length of the instrument and minimize response burden, the
-COVID Symptom Survey will consist of a block of daily core questions and will
-use a randomized module approach for the other topics. Implementation of this
-approach started in [Wave 11](coding.md#wave-11), which launched on May 20,
-2021.
+To reduce the overall length of the instrument and minimize response burden,
+the COVID-19 Trends and Impact Survey (CTIS) will consist of a block of daily
+core questions and will use a randomized module approach for the other topics.
+Implementation of this approach started in [Wave 11](coding.md#wave-11), which
+launched on May 20, 2021.
Each respondent invited to take the survey will be asked the daily core
questions. The daily core questions for Wave 11 include:
diff --git a/docs/symptom-survey/problems.md b/docs/symptom-survey/problems.md
index 44f3e73dd..9313bd300 100644
--- a/docs/symptom-survey/problems.md
+++ b/docs/symptom-survey/problems.md
@@ -1,15 +1,15 @@
---
title: Problems and Data Errors
-parent: COVID Symptom Survey
+parent: COVID-19 Trends and Impact Survey
nav_order: 8
---
# Problems and Data Errors
{: .no_toc}
-Given the scale of the COVID Symptom Survey, we occasionally encounter data
-errors or survey implementation problems that affect the interpretation of
-results. All problems will be logged here.
+Given the scale of the COVID-19 Trends and Impact Survey (CTIS), we occasionally
+encounter data errors or survey implementation problems that affect the
+interpretation of results. All problems will be logged here.
## Table of contents
{: .no_toc .text-delta}
diff --git a/docs/symptom-survey/server-access.md b/docs/symptom-survey/server-access.md
index 720c27ca8..5795f0db7 100644
--- a/docs/symptom-survey/server-access.md
+++ b/docs/symptom-survey/server-access.md
@@ -1,15 +1,16 @@
---
title: SFTP Server Access
-parent: COVID Symptom Survey
+parent: COVID-19 Trends and Impact Survey
nav_order: 2
---
# SFTP Server Access
Researchers with data use agreements to access the raw data from the COVID-19
-symptom survey can access the data over SFTP. (If you do not have a data use
-agreement, see the [main survey page](index.md) for information about getting
-access and about aggregate data that is available for public download.)
+Trends and Impact Survey (CTIS) can access the data over SFTP. (If you do not
+have a data use agreement, see the [main survey page](index.md) for
+information about getting access and about aggregate data that is available
+for public download.)
If you're not familiar with SFTP, it is a protocol for securely accessing and downloading
large amounts of data from remote servers. The instructions below explain how to
diff --git a/docs/symptom-survey/survey-files.md b/docs/symptom-survey/survey-files.md
index 48ca31190..a38425a3a 100644
--- a/docs/symptom-survey/survey-files.md
+++ b/docs/symptom-survey/survey-files.md
@@ -1,16 +1,17 @@
---
title: Response Files
-parent: COVID Symptom Survey
+parent: COVID-19 Trends and Impact Survey
nav_order: 3
---
# Response Files
{: .no_toc}
-Users with access to the [COVID Symptom Survey](./index.md) individual response
-data should have received SFTP credentials for a private server where the data
-are stored. To connect to the server, see the [server access documentation](server-access.md).
-This documentation describes the survey data available on that server.
+Users with access to the [COVID-19 Trends and Impact Survey (CTIS)](./index.md)
+individual response data should have received SFTP credentials for a private
+server where the data are stored. To connect to the server, see the [server
+access documentation](server-access.md). This documentation describes the
+survey data available on that server.
You must sign a Data Use Agreement with Facebook and with CMU to gain
access to the individual survey responses. If you have not done so, aggregate
diff --git a/docs/symptom-survey/survey-utils.R b/docs/symptom-survey/survey-utils.R
index 0c1987220..b30c0ed3f 100644
--- a/docs/symptom-survey/survey-utils.R
+++ b/docs/symptom-survey/survey-utils.R
@@ -12,7 +12,7 @@ library(dplyr)
#' This function extracts the date from each file, determines which files
#' contain reissued data, and produces a single data frame representing the most
#' recent data available for each day. It can read gzip-compressed CSV files,
-#' such as those on the SFTP site, using `readr::read_csv`.
+#' such as those on the SFTP site, using `readr::read_csv()`.
#'
#' This function handles column types correctly for surveys up to Wave 4.
#'
@@ -38,57 +38,83 @@ get_survey_df <- function(directory, pattern = "*.csv.gz$") {
big_df <- map_dfr(
latest_files,
function(f) {
- # stop readr from thinking commas = thousand separators,
- # and from inferring column types incorrectly
- read_csv(file.path(directory, f), locale = locale(grouping_mark = ""),
+ # stop readr from thinking commas = thousand separators, and from
+ # inferring column types incorrectly
+ read_csv(file.path(directory, f),
+ locale = locale(grouping_mark = ""),
col_types = cols(
+ UserLanguage = col_character(),
+ StartDatetime = col_datetime(),
+ EndDatetime = col_datetime(),
+ weight = col_number(),
+ wave = col_integer(),
+ fips = col_character(),
+ A2 = col_number(),
+ A5_1 = col_number(),
+ A5_2 = col_number(),
+ A5_3 = col_number(),
A2b = col_number(),
A3 = col_character(),
A4 = col_number(),
- B2 = col_character(),
- B2_14_TEXT = col_character(),
- B2c = col_character(),
- B2c_14_TEXT = col_character(),
- B4 = col_number(),
- B5 = col_number(),
- B7 = col_character(),
- B10b = col_character(),
- B12a = col_character(),
- C1 = col_character(),
- C3 = col_number(),
- C4 = col_number(),
- C5 = col_number(),
- C7 = col_number(),
- C13 = col_character(),
- C13a = col_character(),
- D1_4_TEXT = col_character(),
- E3 = col_character(),
- fips = col_character(),
- UserLanguage = col_character(),
- StartDatetime = col_character(),
- EndDatetime = col_character(),
- Q65 = col_integer(),
- Q66 = col_integer(),
- Q67 = col_integer(),
- Q68 = col_integer(),
- Q69 = col_integer(),
- Q70 = col_integer(),
- Q71 = col_integer(),
- Q72 = col_integer(),
- Q73 = col_integer(),
- Q74 = col_integer(),
- Q75 = col_integer(),
- Q76 = col_integer(),
- Q77 = col_integer(),
- Q78 = col_integer(),
- Q79 = col_integer(),
- Q80 = col_integer(),
- .default = col_number()))
+ B2b = col_number(),
+ .default = col_character()))
}
)
return(big_df)
}
+#' Split multiselect options into codable form
+#'
+#' Multiselect options are coded by Qualtrics as a comma-separated string of
+#' selected options, like "1,14", or the empty string if no options are
+#' selected. Split these into vectors of selected options, which can be queried
+#' using `is_selected()`.
+#'
+#' @param column vector of selections, like c("1,4", "5", ...)
+#' @return list of same length, each entry of which is a vector of selected
+#' options
+split_options <- function(column) {
+ return(strsplit(column, ",", fixed = TRUE))
+}
+
+#' Test if a specific choice is selected in a multiselect item
+#'
+#' This is used for items that allow respondents to select multiple options from
+#' a list, such as the symptoms items. Checking whether a specific selection is
+#' selected in either "" (empty string) or `NA` responses will produce `NA`s, so
+#' that empty responses are treated as missing, rather than as the item not
+#' being selected.
+#'
+#' @param vec A list whose entries are character vectors, such as c("14", "15"),
+#' as produced by `split_options()`.
+#' @param selection one string, such as "14", representing the answer choice of
+#' interest
+#' @return a logical vector; for each entry in `vec`, the boolean indicates
+#' whether `selection` is contained in the character vector.
+#' @examples
+#' \dontrun{
+#' symptoms <- split_options(data$B2)
+#'
+#' # vector of T/F/NA for each respondent's fever status
+#' fever <- is_selected(symptoms, "1")
+#' }
+is_selected <- function(vec, selection) {
+ selections <- unlist(lapply(
+ vec,
+ function(resp) {
+ if (length(resp) == 0 || all(is.na(resp))) {
+ # Qualtrics files code no selection as "" (empty string), which is
+ # parsed by `read_csv()` as `NA` (missing) by default. Since all our
+ # selection items include "None of the above" or similar, treat both no
+ # selection ("") or missing (NA) as missing, for generality.
+ NA
+ } else {
+ selection %in% resp
+ }
+ }))
+
+ return(selections)
+}
## Helper function to extract dates from each file's filename.
get_file_properties <- function(filename) {
short <- strsplit(filename, ".", fixed = TRUE)[[1]][1]
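
Note on the two helpers added in this file: the sketch below shows how `split_options()` and `is_selected()` behave on answered, empty, and missing responses. The function bodies are condensed copies of the ones in this diff so the example runs on its own; the `responses` vector is hypothetical data, not taken from any survey file.

```r
# Condensed copies of the helpers added in this diff.
split_options <- function(column) {
  strsplit(column, ",", fixed = TRUE)
}

is_selected <- function(vec, selection) {
  unlist(lapply(vec, function(resp) {
    # Empty ("") and missing (NA) responses are both reported as NA.
    if (length(resp) == 0 || all(is.na(resp))) NA else selection %in% resp
  }))
}

# Hypothetical multiselect column: two answered, one empty, one missing.
responses <- c("1,14", "5", "", NA)

symptoms <- split_options(responses)

# Option "1" is selected only by the first respondent.
fever <- is_selected(symptoms, "1")

# Option "4" is selected by nobody: "14" is compared as a whole code,
# so it does not match "4".
option_four <- is_selected(symptoms, "4")
```

Because each response is split into whole option codes before matching, `"4"` does not match `"14"`, which a substring search such as `grepl("4", responses)` would get wrong.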
diff --git a/docs/symptom-survey/weights.md b/docs/symptom-survey/weights.md
index fc8bca191..928a32724 100644
--- a/docs/symptom-survey/weights.md
+++ b/docs/symptom-survey/weights.md
@@ -1,15 +1,16 @@
---
title: Survey Weights
-parent: COVID Symptom Survey
+parent: COVID-19 Trends and Impact Survey
nav_order: 5
---
# Survey Weights
{: .no_toc}
-The symptom survey individual response files contain survey weights calculated
-by Facebook. These weights are also used to produce our [public contingency tables](contingency-tables.md)
-and the geographic aggregates [in the COVIDcast Epidata API](../api/covidcast-signals/fb-survey.md).
+The survey's individual response files contain respondent weights calculated
+by Facebook. These weights are also used to produce our [public contingency
+tables](contingency-tables.md) and the geographic aggregates [in the COVIDcast
+Epidata API](../api/covidcast-signals/fb-survey.md).
Facebook has provided documentation to describe the calculation and usage of
these weights, [available here](symptom-survey-weights.pdf). This documentation