diff --git a/docs/api/covidcast-signals/fb-survey.md b/docs/api/covidcast-signals/fb-survey.md index 7c6323321..28b0ec55f 100644 --- a/docs/api/covidcast-signals/fb-survey.md +++ b/docs/api/covidcast-signals/fb-survey.md @@ -1,10 +1,10 @@ --- -title: Symptom Surveys +title: COVID-19 Trends and Impact Survey parent: Data Sources and Signals grand_parent: COVIDcast Epidata API --- -# Symptom Surveys +# COVID-19 Trends and Impact Survey {: .no_toc} * **Source name:** `fb-survey` @@ -17,9 +17,10 @@ grand_parent: COVIDcast Epidata API ## Overview -This data source is based on symptom surveys run by the Delphi group at Carnegie -Mellon. Facebook directs a random sample of its users to these surveys, which -are voluntary. Users age 18 or older are eligible to complete the surveys, and +This data source is based on the [COVID-19 Trends and Impact Survey +(CTIS)](../../symptom-survey/) run by the Delphi group at Carnegie Mellon. +Facebook directs a random sample of its users to these surveys, which are +voluntary. Users age 18 or older are eligible to complete the surveys, and their survey responses are held by CMU and are sharable with other health researchers under a data use agreement. No individual survey responses are shared back to Facebook. See our [surveys @@ -575,7 +576,7 @@ $$ where $$\pi_i$$ is an estimated probability (produced by Facebook) that an individual with the same state-by-age-gender profile as user $$i$$ would be a -Facebook user and take our CMU survey. The adjustment we make follows a standard +Facebook user and take our survey. The adjustment we make follows a standard inverse probability weighting strategy (this being a special case of importance sampling). 
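The inverse probability weighting described in the `fb-survey.md` hunk above can be illustrated with a toy calculation. This is a minimal R sketch under stated assumptions, not Delphi's production code: it assumes each respondent's weight is 1 / pi_i and that a weighted proportion is the self-normalized ratio of weighted positives to total weight. The response and probability values, and the helper name `ipw_proportion()`, are hypothetical.

```r
# Toy illustration of inverse probability weighting (IPW).
# Hypothetical data: y[i] = 1 if respondent i reports the outcome of interest;
# prob[i] = estimated probability that a person with respondent i's profile is
# a Facebook user who takes the survey (the pi_i in the text).
y    <- c(1, 0, 1, 1, 0)
prob <- c(0.02, 0.05, 0.01, 0.04, 0.02)

# Hypothetical helper: weight each response by 1 / prob, so profiles that are
# less likely to respond count for more, then normalize by the total weight.
ipw_proportion <- function(y, prob) {
  w <- 1 / prob
  sum(w * y) / sum(w)
}

unweighted <- mean(y)                 # 3 / 5 = 0.6
weighted   <- ipw_proportion(y, prob) # 175 / 245, about 0.714

# Base R's weighted.mean() computes the same self-normalized ratio.
stopifnot(isTRUE(all.equal(weighted, weighted.mean(y, 1 / prob))))
```

Here the weighted estimate exceeds the unweighted one because the positive responses come from respondents with lower estimated response probabilities, so each of them stands in for more of the population.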
diff --git a/docs/symptom-survey/coding.md b/docs/symptom-survey/coding.md index 278c6d931..d7fe82c17 100644 --- a/docs/symptom-survey/coding.md +++ b/docs/symptom-survey/coding.md @@ -1,15 +1,15 @@ --- title: Questions and Coding -parent: COVID Symptom Survey +parent: COVID-19 Trends and Impact Survey nav_order: 6 --- # Questions and Coding {: .no_toc} -The symptom surveys have been deployed in several waves. We have tried to ensure -the coding of waves is consistent. This page provides the full survey text and -coding schemes. +The COVID-19 Trends and Impact Survey (CTIS) has been deployed in several waves. +We have tried to ensure the coding of waves is consistent. This page provides +the full survey text and coding schemes. ## Table of contents @@ -467,7 +467,7 @@ new items were meant to capture reasons for vaccine hesitancy among respondents. when you use responses from multiple waves of this survey, since they may shift which occupations respondents choose. * C14a is a revision of item C14, changed from "the past 5 days" to "the past - 7 days" to be consistent with other items on the COVID Symptom Survey. + 7 days" to be consistent with other items on CTIS. C14a replaces C14. * C17a is a revision of item C17, which asked respondents if they have had a flu vaccination since June 2020. C17a changed the date to July 1, 2020 and diff --git a/docs/symptom-survey/collaboration-revision.md b/docs/symptom-survey/collaboration-revision.md index 48aa1616f..e344f5ba9 100644 --- a/docs/symptom-survey/collaboration-revision.md +++ b/docs/symptom-survey/collaboration-revision.md @@ -1,16 +1,16 @@ --- title: Collaboration and Survey Revision -parent: COVID Symptom Survey +parent: COVID-19 Trends and Impact Survey nav_order: 1 --- # Collaboration and Survey Revision -Delphi continues to revise the COVID-19 Symptom Survey instruments in order to -prioritize items that have the greatest utility for the response to the COVID-19 -pandemic.
We conduct revisions in collaboration with data users, fellow -researchers, and public health officials, to ensure the survey data best serves -public health and research goals. +Delphi continues to revise the COVID-19 Trends and Impact Survey (CTIS) +instruments in order to prioritize items that have the greatest utility for the +response to the COVID-19 pandemic. We conduct revisions in collaboration with +data users, fellow researchers, and public health officials, to ensure the +survey data best serves public health and research goals. ## Proposing Revisions @@ -18,7 +18,7 @@ If there is a revision or question you would like us to consider, please fill out [this form requesting details about your proposal](https://forms.gle/q6NS8fPJJofKQ9mM8). This request can be submitted by researchers regardless of whether they have a signed Data Use Agreement for the -individual responses to the COVID Symptom Survey. +individual responses to the COVID-19 Trends and Impact Survey. ## Collaboration Meetings diff --git a/docs/symptom-survey/contingency-tables.md b/docs/symptom-survey/contingency-tables.md index 7604506c6..b2b741ce6 100644 --- a/docs/symptom-survey/contingency-tables.md +++ b/docs/symptom-survey/contingency-tables.md @@ -1,6 +1,6 @@ --- title: Contingency Tables -parent: COVID Symptom Survey +parent: COVID-19 Trends and Impact Survey nav_order: 4 --- @@ -8,7 +8,7 @@ nav_order: 4 {: .no_toc} This documentation describes the fine-resolution contingency tables produced by -grouping [COVID Symptom Survey](./index.md) individual responses by various +grouping [COVID-19 Trends and Impact Survey (CTIS)](./index.md) individual responses by various self-reported demographic features. 
* [Weekly files](https://www.cmu.edu/delphi-web/surveys/weekly-rollup/) @@ -119,6 +119,7 @@ Within a CSV, the first few columns store metadata of the aggregation: | `ISO_3` | Three-letter ISO country code ("USA") | | `GID_0` | GADM level 0 ID | | `state` | State name; "Overall" if aggregation not grouped at the state level | +| `GID_1` | GADM level 1 ID | | `state_fips` | State FIPS code; `NA` if aggregation not grouped at the state level | | `county` | County name; "Overall" if aggregation not grouped at the county level | | `county_fips` | County FIPS code; `NA` if aggregation not grouped at the county level | diff --git a/docs/symptom-survey/data-access.md b/docs/symptom-survey/data-access.md index dd7cfacf2..a0fa103e4 100644 --- a/docs/symptom-survey/data-access.md +++ b/docs/symptom-survey/data-access.md @@ -1,16 +1,16 @@ --- title: Getting Data Access -parent: COVID Symptom Survey +parent: COVID-19 Trends and Impact Survey nav_order: 0 --- # Getting Data Access The Delphi Research Group at Carnegie Mellon University (CMU), in partnership -with Facebook, has conducted the COVID Symptom Survey to better understand the -spread of COVID-19 and its effects on public health and well-being. This may -help improve our local and national responses to the pandemic and our -understanding of how it has affected society. +with Facebook, has conducted the COVID-19 Trends and Impact Survey (CTIS) to +better understand the spread of COVID-19 and its effects on public health and +well-being. This may help improve our local and national responses to the +pandemic and our understanding of how it has affected society. [High-level aggregates](../api/covidcast.md) of select survey items are publicly available in the [COVIDcast API](../api/covidcast-signals/fb-survey.md). @@ -25,9 +25,9 @@ Agreement (DUA). 
To request access to the data please submit the information requested in [Facebook's page on obtaining data access](https://dataforgood.fb.com/docs/covid-19-symptom-survey-request-for-data-access/), which sets out the basic conditions and provides a form to request access. An -[international version of the COVID Symptom Survey](https://covidmap.umd.edu/) -is conducted by the University of Maryland (UMD) and access can be requested -through the same form. +[international version of CTIS](https://covidmap.umd.edu/) is conducted by the +University of Maryland (UMD) and access can be requested through the same +form. The United States survey protocol has been reviewed by the Carnegie Mellon University Institutional Review Board with IRB ID STUDY2020_00000162. diff --git a/docs/symptom-survey/index.md b/docs/symptom-survey/index.md index f8591c4c0..fb5961d05 100644 --- a/docs/symptom-survey/index.md +++ b/docs/symptom-survey/index.md @@ -1,12 +1,12 @@ --- -title: COVID Symptom Survey +title: COVID-19 Trends and Impact Survey has_children: true nav_order: 2 --- -# COVID Symptom Survey +# COVID-19 Trends and Impact Survey -Since April 2020, Delphi has conducted a voluntary COVID-19 symptom survey, +Since April 2020, Delphi has conducted a voluntary survey about COVID-19, distributed daily to users in the United States via a partnership with Facebook. This survey asks respondents about COVID-like symptoms, their behavior (such as social distancing), mental health, and economic and health impacts they have @@ -29,7 +29,7 @@ If you have questions about the survey or getting access to data, contact us at ## Credits -The COVID Symptom Survey is a project of the [Delphi +The COVID-19 Trends and Impact Survey (CTIS) is a project of the [Delphi Group](https://delphi.cmu.edu/) at Carnegie Mellon University. The Principal Investigator is [Alex Reinhart](https://www.refsmmat.com/); Wichada La Motte-Kerr is Survey Coordinator. 
The survey protocol is reviewed by the @@ -59,18 +59,30 @@ the survey in publications based on the data. Specifically, we ask that you: 2. Cite this web page for details about the survey. For example, you may cite it as - > Delphi Group (2021). COVID Symptom Survey. + > Delphi Group (2021). COVID-19 Trends and Impact Survey. > A journal article describing the survey and its methods is currently in preparation, and we will update this page when it is available so that you can cite it instead. -3. Send a copy of your publication, once it appears publicly as a preprint or - journal article, to . - -Additionally, please note that the data use agreement requires that if you -disclose survey microdata, Delphi must agree on the aggregation method that you -will use to ensure reported estimates do not disclose any individual -identifiable information, including individual survey results. If you are unsure -whether a particular aggregation will prevent disclosure of individual survey -results, please email us at . +3. The data use agreement requires that if you disclose survey microdata, Delphi + must agree on the aggregation method that you will use to ensure reported + estimates do not disclose any individual identifiable information, including + individual survey results. If you are unsure whether a particular aggregation + will prevent disclosure of individual survey results, please email us at + . +4. Finally, send a copy of your publication, once it appears publicly as a + preprint or journal article, to . + +When referring to the survey in text, we prefer the following formats: + +* Long form (such as in an introduction or methods description): "The Delphi + Group at Carnegie Mellon University U.S. COVID-19 Trends and Impact Survey, in + partnership with Facebook". +* Short form (used after the long form has been introduced): "The U.S. 
COVID-19 + Trends and Impact Survey". +* Acronym form: "Delphi US CTIS". + +Prior to July 2021, the survey was known as the COVID Symptom Survey (CSS), and +some older documentation and publications may still refer to this name. We prefer +that new publications and materials refer to the new name. diff --git a/docs/symptom-survey/modules.md b/docs/symptom-survey/modules.md index d362ae288..b80118f30 100644 --- a/docs/symptom-survey/modules.md +++ b/docs/symptom-survey/modules.md @@ -1,17 +1,17 @@ --- title: Survey Modules & Randomization -parent: COVID Symptom Survey +parent: COVID-19 Trends and Impact Survey nav_order: 7 --- # Questions and Coding {: .no_toc} -To reduce the overall length of the instrument and minimize response burden, the -COVID Symptom Survey will consist of a block of daily core questions and will -use a randomized module approach for the other topics. Implementation of this -approach started in [Wave 11](coding.md#wave-11), which launched on May 20, -2021. +To reduce the overall length of the instrument and minimize response burden, +the COVID-19 Trends and Impact Survey (CTIS) will consist of a block of daily +core questions and will use a randomized module approach for the other topics. +Implementation of this approach started in [Wave 11](coding.md#wave-11), which +launched on May 20, 2021. Each respondent invited to take the survey will be asked the daily core questions. The daily core questions for Wave 11 include: diff --git a/docs/symptom-survey/problems.md b/docs/symptom-survey/problems.md index 44f3e73dd..9313bd300 100644 --- a/docs/symptom-survey/problems.md +++ b/docs/symptom-survey/problems.md @@ -1,15 +1,15 @@ --- title: Problems and Data Errors -parent: COVID Symptom Survey +parent: COVID-19 Trends and Impact Survey nav_order: 8 --- # Problems and Data Errors {: .no_toc} -Given the scale of the COVID Symptom Survey, we occasionally encounter data -errors or survey implementation problems that affect the interpretation of -results.
All problems will be logged here. +Given the scale of the COVID-19 Trends and Impact Survey (CTIS), we occasionally +encounter data errors or survey implementation problems that affect the +interpretation of results. All problems will be logged here. ## Table of contents {: .no_toc .text-delta} diff --git a/docs/symptom-survey/server-access.md b/docs/symptom-survey/server-access.md index 720c27ca8..5795f0db7 100644 --- a/docs/symptom-survey/server-access.md +++ b/docs/symptom-survey/server-access.md @@ -1,15 +1,16 @@ --- title: SFTP Server Access -parent: COVID Symptom Survey +parent: COVID-19 Trends and Impact Survey nav_order: 2 --- # SFTP Server Access Researchers with data use agreements to access the raw data from the COVID-19 -symptom survey can access the data over SFTP. (If you do not have a data use -agreement, see the [main survey page](index.md) for information about getting -access and about aggregate data that is available for public download.) +Trends and Impact Survey (CTIS) can access the data over SFTP. (If you do not +have a data use agreement, see the [main survey page](index.md) for +information about getting access and about aggregate data that is available +for public download.) If you're not familiar with SFTP, it is a protocol for securely accessing and downloading large amounts of data from remote servers. The instructions below explain how to diff --git a/docs/symptom-survey/survey-files.md b/docs/symptom-survey/survey-files.md index 48ca31190..a38425a3a 100644 --- a/docs/symptom-survey/survey-files.md +++ b/docs/symptom-survey/survey-files.md @@ -1,16 +1,17 @@ --- title: Response Files -parent: COVID Symptom Survey +parent: COVID-19 Trends and Impact Survey nav_order: 3 --- # Response Files {: .no_toc} -Users with access to the [COVID Symptom Survey](./index.md) individual response -data should have received SFTP credentials for a private server where the data -are stored.
To connect to the server, see the [server access documentation](server-access.md). -This documentation describes the survey data available on that server. +Users with access to the [COVID-19 Trends and Impact Survey (CTIS)](./index.md) +individual response data should have received SFTP credentials for a private +server where the data are stored. To connect to the server, see the [server +access documentation](server-access.md). This documentation describes the +survey data available on that server. You must sign a Data Use Agreement with Facebook and with CMU to gain access to the individual survey responses. If you have not done so, aggregate diff --git a/docs/symptom-survey/survey-utils.R b/docs/symptom-survey/survey-utils.R index 0c1987220..b30c0ed3f 100644 --- a/docs/symptom-survey/survey-utils.R +++ b/docs/symptom-survey/survey-utils.R @@ -12,7 +12,7 @@ library(dplyr) #' This function extracts the date from each file, determines which files #' contain reissued data, and produces a single data frame representing the most #' recent data available for each day. It can read gzip-compressed CSV files, -#' such as those on the SFTP site, using `readr::read_csv`. +#' such as those on the SFTP site, using `readr::read_csv()`. #' #' This function handles column types correctly for surveys up to Wave 4. 
#' @@ -38,57 +38,83 @@ get_survey_df <- function(directory, pattern = "*.csv.gz$") { big_df <- map_dfr( latest_files, function(f) { - # stop readr from thinking commas = thousand separators, - # and from inferring column types incorrectly - read_csv(file.path(directory, f), locale = locale(grouping_mark = ""), + # stop readr from thinking commas = thousand separators, and from + # inferring column types incorrectly + read_csv(file.path(directory, f), + locale = locale(grouping_mark = ""), col_types = cols( + UserLanguage = col_character(), + StartDatetime = col_datetime(), + EndDatetime = col_datetime(), + weight = col_number(), + wave = col_integer(), + fips = col_character(), + A2 = col_number(), + A5_1 = col_number(), + A5_2 = col_number(), + A5_3 = col_number(), A2b = col_number(), A3 = col_character(), A4 = col_number(), - B2 = col_character(), - B2_14_TEXT = col_character(), - B2c = col_character(), - B2c_14_TEXT = col_character(), - B4 = col_number(), - B5 = col_number(), - B7 = col_character(), - B10b = col_character(), - B12a = col_character(), - C1 = col_character(), - C3 = col_number(), - C4 = col_number(), - C5 = col_number(), - C7 = col_number(), - C13 = col_character(), - C13a = col_character(), - D1_4_TEXT = col_character(), - E3 = col_character(), - fips = col_character(), - UserLanguage = col_character(), - StartDatetime = col_character(), - EndDatetime = col_character(), - Q65 = col_integer(), - Q66 = col_integer(), - Q67 = col_integer(), - Q68 = col_integer(), - Q69 = col_integer(), - Q70 = col_integer(), - Q71 = col_integer(), - Q72 = col_integer(), - Q73 = col_integer(), - Q74 = col_integer(), - Q75 = col_integer(), - Q76 = col_integer(), - Q77 = col_integer(), - Q78 = col_integer(), - Q79 = col_integer(), - Q80 = col_integer(), - .default = col_number())) + B2b = col_number(), + .default = col_character())) } ) return(big_df) } +#' Split multiselect options into codable form +#' +#' Multiselect options are coded by Qualtrics as a 
comma-separated string of +#' selected options, like "1,14", or the empty string if no options are +#' selected. Split these into vectors of selected options, which can be queried +#' using `is_selected()`. +#' +#' @param column vector of selections, like c("1,4", "5", ...) +#' @return list of same length, each entry of which is a vector of selected +#' options +split_options <- function(column) { + return(strsplit(column, ",", fixed = TRUE)) +} + +#' Test if a specific choice is selected in a multiselect item +#' +#' This is used for items that allow respondents to select multiple options from +#' a list, such as the symptoms items. Checking whether a specific selection is +#' selected in either "" (empty string) or `NA` responses will produce `NA`s, so +#' that empty responses are treated as missing, rather than as the item not +#' being selected. +#' +#' @param vec A list whose entries are character vectors, such as c("14", "15"), +#' as produced by `split_options()`. +#' @param selection one string, such as "14", representing the answer choice of +#' interest +#' @return a logical vector; for each entry in `vec`, the boolean indicates +#' whether `selection` is contained in the character vector. +#' @examples +#' \dontrun{ +#' symptoms <- split_options(data$B2) +#' +#' # vector of T/F/NA for each respondent's fever status +#' fever <- is_selected(symptoms, "1") +#' } +is_selected <- function(vec, selection) { + selections <- unlist(lapply( + vec, + function(resp) { + if (length(resp) == 0 || all(is.na(resp))) { + # Qualtrics files code no selection as "" (empty string), which is + # parsed by `read_csv()` as `NA` (missing) by default. Since all our + # selection items include "None of the above" or similar, treat both no + # selection ("") and missing (NA) as missing, for generality. + NA + } else { + selection %in% resp + } + })) + + return(selections) +} ## Helper function to extract dates from each file's filename.
get_file_properties <- function(filename) { short <- strsplit(filename, ".", fixed = TRUE)[[1]][1] diff --git a/docs/symptom-survey/weights.md b/docs/symptom-survey/weights.md index fc8bca191..928a32724 100644 --- a/docs/symptom-survey/weights.md +++ b/docs/symptom-survey/weights.md @@ -1,15 +1,16 @@ --- title: Survey Weights -parent: COVID Symptom Survey +parent: COVID-19 Trends and Impact Survey nav_order: 5 --- # Survey Weights {: .no_toc} -The symptom survey individual response files contain survey weights calculated -by Facebook. These weights are also used to produce our [public contingency tables](contingency-tables.md) -and the geographic aggregates [in the COVIDcast Epidata API](../api/covidcast-signals/fb-survey.md). +The survey's individual response files contain respondent weights calculated +by Facebook. These weights are also used to produce our [public contingency +tables](contingency-tables.md) and the geographic aggregates [in the COVIDcast +Epidata API](../api/covidcast-signals/fb-survey.md). Facebook has provided documentation to describe the calculation and usage of these weights, [available here](symptom-survey-weights.pdf). This documentation
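The multiselect helpers added to `survey-utils.R` above can be exercised on a few made-up responses. This is a minimal, self-contained R sketch: `split_options()` repeats the splitting rule from the diff, while `is_selected_v()` is a hypothetical `vapply()`-based variant of `is_selected()` with the same `NA` rules. One design difference: `unlist(lapply(...))` returns `NULL` for an empty input, whereas `vapply()` always returns a logical vector of the input's length. The sample responses are invented for illustration.

```r
# Same splitting rule as split_options() in survey-utils.R.
split_options <- function(column) {
  strsplit(column, ",", fixed = TRUE)
}

# Hypothetical vapply()-based variant of is_selected(): same NA rules, but the
# result is always a logical vector with one entry per respondent.
is_selected_v <- function(vec, selection) {
  vapply(
    vec,
    function(resp) {
      # "" splits to character(0) and a missing response stays NA; both -> NA
      if (length(resp) == 0 || all(is.na(resp))) NA else selection %in% resp
    },
    logical(1),
    USE.NAMES = FALSE
  )
}

# Hypothetical raw multiselect values, e.g. from item B2
responses <- c("1,14", "4", "", NA, "14,1")
symptoms  <- split_options(responses)

fever <- is_selected_v(symptoms, "1")  # TRUE FALSE NA NA TRUE
opt4  <- is_selected_v(symptoms, "4")  # FALSE TRUE NA NA FALSE
```

Because the check is an exact match on split tokens (`%in%`), option "4" is not mistaken for part of "14", which a substring test such as `grepl("4", "1,14")` would get wrong.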