add load_data function that wraps load_jhu_data and load_healthdata_data #16

Closed
elray1 opened this issue Dec 4, 2020 · 2 comments

@elray1
Collaborator

elray1 commented Dec 4, 2020

Function description:

#' Assemble a data frame of incident and cumulative deaths or cases due to
#' COVID-19 as they were available as of one or more past dates.
#'
#' @param issue vector of issue dates (i.e. report dates) to use for querying data,
#' either \code{Date} objects or strings in the format 'yyyy-mm-dd'. Data for the
#' requested measures that were reported or updated exactly on the specified
#' issue date(s) will be returned. If multiple issue dates are provided, the result
#' includes the data for all such issue dates.
#' @param as_of character vector of "as of" dates to use for querying truths in
#' format 'yyyy-mm-dd'. For each spatial unit and temporal reporting unit, the last
#' available data with an issue date on or before the given \code{as_of} date are returned.
#' @param spatial_resolution character vector specifying spatial unit types to
#' include: one or more of 'county', 'state' and/or 'national'.
#' @param temporal_resolution string specifying temporal resolution
#' to include: one of 'daily' or 'weekly'
#' @param measure string specifying measure of covid dynamics:
#' one of 'deaths', 'cases', or 'hospitalizations'
#' @param source string specifying data source.  Currently supported sources are
#' "jhu" for the "deaths" or "cases" measures or "healthdata" for the "hospitalizations"
#' measure.
#'
#' @return data frame with columns location (fips code), date, inc, cum, issue_date, as_of
#'
#' @details Data for a specified \code{issue} are only returned if the data were first available
#' on that date, or were updated on that date. A warning is generated for any issue dates
#' for which no data were available.
#' 
#' A query based on an \code{as_of} date returns the data for the most recent
#' \code{issue} date that is on or before the specified \code{as_of} date.
#' A warning is generated for any \code{as_of} dates for which no data were
#' available; this only occurs if the \code{as_of} date is prior to any data release for the
#' specified measure.
#' 
#' If the user provides values for both \code{issue} and \code{as_of}, a warning is generated
#' and the argument for \code{issue} is ignored.
#' 
#' If multiple \code{issue} dates or \code{as_of} dates are provided, the result combines
#' the data for all such dates. If no value is provided for either \code{issue} or \code{as_of},
#' results for the most recent available \code{as_of} date are returned.
#'
#' @export
load_data <- function(
    issue = NULL,
    as_of = NULL,
    spatial_resolution = "state",
    temporal_resolution = "weekly",
    measure = "deaths",
    source = NULL)
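
To make the difference between an issue query and an as_of query concrete, here is a minimal sketch on toy data. The `versions` data frame and the helpers `query_issue()` and `query_as_of()` are hypothetical illustrations, not part of the package; the real loaders may implement this differently.

```r
# Toy versioned data (made up): one location/date, reported in three issues.
versions <- data.frame(
  location = "01",
  date = as.Date("2020-11-28"),
  inc = c(5, 7, 8),
  issue_date = as.Date(c("2020-11-29", "2020-12-02", "2020-12-06")),
  stringsAsFactors = FALSE
)

# Issue query: only rows reported or updated exactly on the given date.
query_issue <- function(df, issue) {
  df[df$issue_date == as.Date(issue), ]
}

# as_of query: for each location and date, keep the row with the latest
# issue date that is on or before as_of.
query_as_of <- function(df, as_of) {
  eligible <- df[df$issue_date <= as.Date(as_of), ]
  if (nrow(eligible) == 0) {
    warning("No data available as of ", as_of)
    return(eligible)
  }
  groups <- split(eligible, paste(eligible$location, eligible$date))
  do.call(rbind, lapply(groups, function(g) {
    g[which.max(as.numeric(g$issue_date)), ]
  }))
}

query_issue(versions, "2020-12-03")  # no rows: nothing was issued that day
query_as_of(versions, "2020-12-03")  # the row issued on 2020-12-02
```

An as_of date that falls before the first issue is the only case where the as_of query comes back empty, which matches the warning condition described in the details above.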

Logic:

  • Validate that spatial resolution is one or more of 'county', 'state' and/or 'national' (multiple may be provided)
  • Validate that temporal resolution is one of 'daily' or 'weekly' (only one may be provided)
  • Validate that measure is one of 'deaths', 'cases', or 'hospitalizations' (only one may be provided)
  • Validate that source is correctly specified. If it is NULL, set it to the appropriate value based on the requested measure. Otherwise, check that it is valid based on the requested measure; throw an error if not.
  • Validate that at most one of issue and as_of is non-NULL. If both are non-NULL, issue a warning and set issue to NULL.
  • Identify the correct load_..._data function to use based on the requested measure and source.
  • if issue is not NULL, use purrr::map_dfr to call the load_..._data function identified above for every requested issue_date and return the combined results.
  • if as_of is not NULL, use purrr::map_dfr to call the load_..._data function identified above for every requested as_of and return the combined results.
  • if both issue and as_of are NULL, call the load_..._data function once with null arguments; it will by default use the latest as_of available.
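
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the package's implementation: `stub_loader()` is a hypothetical stand-in for load_jhu_data and load_healthdata_data (their real signatures are not given in this issue), the `loaders` argument and the `issue_date` argument name are assumptions, and base R `lapply()` plus `rbind()` is used in place of purrr::map_dfr only to keep the sketch dependency-free.

```r
# Hypothetical stand-in for load_jhu_data() / load_healthdata_data().
# It returns one made-up row and records which query argument it received.
stub_loader <- function(issue_date = NULL, as_of = NULL, ...) {
  data.frame(
    location = "01", date = as.Date("2020-11-28"), inc = 1, cum = 10,
    issue_date = if (is.null(issue_date)) NA_character_ else as.character(issue_date),
    as_of = if (is.null(as_of)) NA_character_ else as.character(as_of),
    stringsAsFactors = FALSE
  )
}

load_data_sketch <- function(issue = NULL, as_of = NULL,
                             spatial_resolution = "state",
                             temporal_resolution = "weekly",
                             measure = "deaths",
                             source = NULL,
                             loaders = list(jhu = stub_loader,
                                            healthdata = stub_loader)) {
  # Validate resolutions and measure; match.arg() errors on invalid values.
  spatial_resolution <- match.arg(spatial_resolution,
    c("county", "state", "national"), several.ok = TRUE)
  temporal_resolution <- match.arg(temporal_resolution, c("daily", "weekly"))
  measure <- match.arg(measure, c("deaths", "cases", "hospitalizations"))

  # Fill in or check the source based on the requested measure.
  expected <- if (measure == "hospitalizations") "healthdata" else "jhu"
  if (is.null(source)) {
    source <- expected
  } else if (!identical(source, expected)) {
    stop("source '", source, "' does not support measure '", measure, "'")
  }

  # If both are given, warn and ignore issue.
  if (!is.null(issue) && !is.null(as_of)) {
    warning("Both issue and as_of were provided; ignoring issue.")
    issue <- NULL
  }

  loader <- loaders[[source]]
  call_loader <- function(...) {
    loader(..., spatial_resolution = spatial_resolution,
           temporal_resolution = temporal_resolution, measure = measure)
  }

  if (!is.null(issue)) {
    do.call(rbind, lapply(issue, function(d) call_loader(issue_date = d)))
  } else if (!is.null(as_of)) {
    do.call(rbind, lapply(as_of, function(d) call_loader(as_of = d)))
  } else {
    call_loader()  # the loader's default is the latest available as_of
  }
}

load_data_sketch(as_of = c("2020-11-30", "2020-12-07"))
```

Passing the loaders in as a list keeps the dispatch step a simple lookup by source name and makes the wrapper easy to exercise with stubs.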

Notes:

@nickreich
Member

Under the @param for as_of you have some language about

#' If multiple issue dates are provided, the result combines the data for all such
#' issue dates. A warning is generated for any issue dates for which no data were
#' available.

Should that be "if multiple as_of dates are provided"?

Furthermore, it's not entirely clear to me what should happen if multiple issue dates and as_of dates are both passed. Is the idea that we query the data to return relevant data for every (issue_date, as_of) pair where issue_date <= as_of? If so, this could be stated more clearly.

minor: is "measure of covid prevalence" the right phrasing? maybe "measure of covid dynamics"?

@elray1
Collaborator Author

elray1 commented Dec 16, 2020

I updated the description above to address Nick's questions.

@elray1 elray1 added the asof label Dec 17, 2020
@elray1 elray1 closed this as completed Jan 5, 2021