feat: update for epiprocess R6 refactor

* remove references to R6 and mutation
* use epiprocess correctly
* fix the authors section of DESCRIPTION
* upgrade renv
* update all packages in renv
* integrate Rprofile with user Rprofile
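
For reviewers, a minimal base-R sketch of why the "objects are not copied when modified" caveat is dropped from archive.qmd. It uses a hypothetical `toy_archive` (a plain list holding a `data.frame`) rather than the real `epi_archive` class, so it only illustrates ordinary copy-on-modify behaviour for list-based S3 objects:

```r
# Hypothetical stand-in for an S3 object: a list with a class attribute.
# NOTE: this is NOT the real epi_archive; its `DT` field is a data.table.
x <- structure(
  list(DT = data.frame(percent_cli = c(1.5, 2.5))),
  class = "toy_archive"
)

y <- x                     # y initially shares x's value
y$DT$percent_cli[1] <- 0   # modifying y triggers a copy; x is left alone

x$DT$percent_cli[1]        # unchanged in x
y$DT$percent_cli[1]        # changed in y only
```

The real `DT` is a `data.table`, whose `:=` operator still updates by reference, so this sketch makes no claim about that case.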
cmu-delphi · May 1, 2024 · 1ac91a2
1 parent 4c3830c
Showing 6 changed files with 790 additions and 679 deletions.
diff --git a/.Rprofile b/.Rprofile
@@ -1 +1,7 @@
 source("renv/activate.R")
+
+# Check if user .Rprofile exists
+if (file.exists("~/.Rprofile")) {
+  # Source user .Rprofile
+  source("~/.Rprofile")
+}
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -2,11 +2,12 @@ Package: delphitoolingbook
 Title: Delphi Tooling
 Version: 0.0.0.9999
 Authors@R: c(
-    person("Daniel", "McDonald", "J.", "daniel@stat.ubc.ca", role = c("cre", "aut"),
-    person("Logan", "Brooks", role = c("cre","aut"),
-    person("Rachel", "Lobay", role = "aut"))
-    person("Ryan", "Tibshirani", "J.", "ryantibs@berkeley.edu", role = "aut"),
-Description: 
+    person("Daniel", "McDonald", "J.", "daniel@stat.ubc.ca", role = c("cre", "aut")),
+    person("Logan", "Brooks", role = c("cre","aut")),
+    person("Rachel", "Lobay", role = "aut"),
+    person("Ryan", "Tibshirani", "J.", "ryantibs@berkeley.edu", role = "aut")
+    )
+Description:
   | This book is a longform introduction to analysing and forecasting epidemiological data.
 License: MIT + file LICENSE
 Imports:

diff --git a/archive.qmd b/archive.qmd
@@ -25,9 +25,8 @@ source("_common.R")
 
 ## Getting data into `epi_archive` format
 
-An `epi_archive` object
-can be constructed from a data frame, data table, or tibble, provided that it
-has (at least) the following columns:
+An `epi_archive` object can be constructed from a data frame, data table, or
+tibble, provided that it has (at least) the following columns:
 
 * `geo_value`: the geographic value associated with each row of measurements.
 * `time_value`: the time value associated with each row of measurements.
@@ -55,10 +54,10 @@ class(x)
 print(x)
 ```
 
-An `epi_archive` is special kind of class called an R6 class. Its primary field
-is a data table `DT`, which is of class `data.table` (from the `data.table`
-package), and has columns `geo_value`, `time_value`, `version`, as well as any
-number of additional columns.
+An `epi_archive` is an S3 class. Its primary field is a data table `DT`, which
+is of class `data.table` (from the `data.table` package), and has columns
+`geo_value`, `time_value`, `version`, as well as any number of additional
+columns.
 
 ```{r}
 class(x$DT)
@@ -70,33 +69,18 @@ for the data table, as well as any other specified in the metadata (described
 below). There can only be a single row per unique combination of key variables,
 and therefore the key variables are critical for figuring out how to generate a
 snapshot of data from the archive, as of a given version (also described below).
-   
+
 ```{r, error=TRUE}
 key(x$DT)
 ```
-
-In general, the last version of each observation is carried forward (LOCF) to
-fill in data between recorded versions. **A word of caution:** R6 objects,
-unlike most other objects in R, have reference semantics. An important
-consequence of this is that objects are not copied when modified.
-
-```{r}
-original_value <- x$DT$percent_cli[1]
-y <- x # This DOES NOT make a copy of x
-y$DT$percent_cli[1] = 0
-head(y$DT)
-head(x$DT) 
-x$DT$percent_cli[1] <- original_value
-```
 
-To make a copy, we can use the `clone()` method for an R6 class, as in `y <-
-x$clone()`. You can read more about reference semantics in Hadley Wickham's
-[Advanced R](https://adv-r.hadley.nz/r6.html#r6-semantics) book.
+In general, the last version of each observation is carried forward (LOCF) to
+fill in data between recorded versions.
 
 ## Some details on metadata
 
 The following pieces of metadata are included as fields in an `epi_archive`
-object: 
+object:
 
 * `geo_type`: the type for the geo values.
 * `time_type`: the type for the time values.
@@ -112,10 +96,8 @@ call (as it did in the case above).
 
 A key method of an `epi_archive` class is `as_of()`, which generates a snapshot
 of the archive in `epi_df` format. This represents the most up-to-date values of
-the signal variables as of a given version. This can be accessed via `x$as_of()`
-for an `epi_archive` object `x`, but the package also provides a simple wrapper 
-function `epix_as_of()` since this is likely a more familiar interface for users
-not familiar with R6 (or object-oriented programming).
+the signal variables as of a given version. This can be accessed via
+`epix_as_of()`.
 
 ```{r}
 x_snapshot <- epix_as_of(x, max_version = as.Date("2021-06-01"))
@@ -125,7 +107,7 @@ max(x_snapshot$time_value)
 attributes(x_snapshot)$metadata$as_of
 ```
 
-We can see that the max time value in the `epi_df` object `x_snapshot` that was 
+We can see that the max time value in the `epi_df` object `x_snapshot` that was
 generated from the archive is May 29, 2021, even though the specified version
 date was June 1, 2021. From this we can infer that the doctor's visits signal
 was 2 days latent on June 1. Also, we can see that the metadata in the `epi_df`
@@ -134,7 +116,7 @@ object has the version date recorded in the `as_of` field.
 By default, using the maximum of the `version` column in the underlying data table in an
 `epi_archive` object itself generates a snapshot of the latest values of signal
 variables in the entire archive. The `epix_as_of()` function issues a warning in
-this case, since updates to the current version may still come in at a later 
+this case, since updates to the current version may still come in at a later
 point in time, due to various reasons, such as synchronization issues.
 
 ```{r}
@@ -143,15 +125,15 @@ x_latest <- epix_as_of(x, max_version = max(x$DT$version))
 
 Below, we pull several snapshots from the archive, spaced one month apart. We
 overlay the corresponding signal curves as colored lines, with the version dates
-marked by dotted vertical lines, and draw the latest curve in black (from the 
+marked by dotted vertical lines, and draw the latest curve in black (from the
 latest snapshot `x_latest` that the archive can provide).
 
 ```{r, fig.width = 8, fig.height = 7}
 self_max <- max(x$DT$version)
 versions <- seq(as.Date("2020-06-01"), self_max - 1, by = "1 month")
 snapshots <- map(
-  versions, 
-  function(v) { 
+  versions,
+  function(v) {
     epix_as_of(x, max_version = v) %>% mutate(version = v)
   }) %>%
   list_rbind() %>%
@@ -162,37 +144,35 @@ snapshots <- map(
 ```{r, fig.height=7}
 #| code-fold: true
 ggplot(snapshots %>% filter(!latest),
-            aes(x = time_value, y = percent_cli)) +  
-  geom_line(aes(color = factor(version)), na.rm = TRUE) + 
+            aes(x = time_value, y = percent_cli)) +
+  geom_line(aes(color = factor(version)), na.rm = TRUE) +
   geom_vline(aes(color = factor(version), xintercept = version), lty = 2) +
   facet_wrap(~ geo_value, scales = "free_y", ncol = 1) +
   scale_x_date(minor_breaks = "month", date_labels = "%b %Y") +
   scale_color_viridis_d(option = "A", end = .9) +
-  labs(x = "Date", y = "% of doctor's visits with CLI") + 
+  labs(x = "Date", y = "% of doctor's visits with CLI") +
   theme(legend.position = "none") +
   geom_line(data = snapshots %>% filter(latest),
-               aes(x = time_value, y = percent_cli), 
+               aes(x = time_value, y = percent_cli),
             inherit.aes = FALSE, color = "black", na.rm = TRUE)
 ```
 
 We can see some interesting and highly nontrivial revision behavior: at some
 points in time the provisional data snapshots grossly underestimate the latest
 curve (look in particular at Florida close to the end of 2021), and at others
-they overestimate it (both states towards the beginning of 2021), though not 
+they overestimate it (both states towards the beginning of 2021), though not
 quite as dramatically. Modeling the revision process, which is often called
 *backfill modeling*, is an important statistical problem in and of itself.
 
 
-## Merging `epi_archive` objects 
+## Merging `epi_archive` objects
 
 Now we demonstrate how to merge two `epi_archive` objects together, e.g., so
 that grabbing data from multiple sources as of a particular version can be
-performed with a single `as_of` call. The `epi_archive` class provides a method
-`merge()` precisely for this purpose. The wrapper function is called
-`epix_merge()`; this wrapper avoids mutating its inputs, while `x$merge` will
-mutate `x`. Below we merge the working `epi_archive` of versioned percentage CLI
-from outpatient visits to another one of versioned COVID-19 case reporting data,
-which we fetch the from the [COVIDcast
+performed with a single `as_of` call. The `epiprocess` package provides
+`epix_merge()` for this purpose. Below we merge the working `epi_archive` of
+versioned percentage CLI from outpatient visits to another one of versioned
+COVID-19 case reporting data, which we fetch from the [COVIDcast
 API](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html/), on the
 rate scale (counts per 100,000 people in the population).
 
@@ -209,7 +189,7 @@ When merging archives, unless the archives have identical data release patterns,
   the other).
 
 ```{r, message = FALSE, warning = FALSE,eval=FALSE}
-# This code is for illustration and doesn't run. 
+# This code is for illustration and doesn't run.
 # The result is saved/loaded in the (hidden) next chunk from `{epidatasets}`
 y <- covidcast(
   data_source = "jhu-csse",
@@ -224,24 +204,13 @@ y <- covidcast(
   select(geo_value, time_value, version = issue, case_rate_7d_av = value) %>%
   as_epi_archive(compactify = TRUE)
 
-x$merge(y, sync = "locf", compactify = FALSE)
+x <- epix_merge(x, y, sync = "locf", compactify = FALSE)
 print(x)
 head(x$DT)
 ```
 
-```{r, echo=FALSE}
-x <- archive_cases_dv_subset
-print(x)
-head(x$DT)
-```
-
-Importantly, see that `x$merge` mutated `x` to hold the result of the merge. We
-could also have used `xy = epix_merge(x, y)` to avoid mutating `x`. See the
-documentation for either for more detailed descriptions of what mutation,
-pointer aliasing, and pointer reseating is possible.
-
 ## Sliding version-aware computations
-    
+
 ::: {.callout-note}
 TODO: need a simple example here.
 :::
diff --git a/epiprocess.qmd b/epiprocess.qmd
@@ -15,17 +15,17 @@ contains the most up-to-date values of the signals variables, as of a given
 time.
 
 By convention, functions in the `epiprocess` package that operate on `epi_df`
-objects begin with `epi`. For example: 
+objects begin with `epi`. For example:
 
 - `epi_slide()`, for iteratively applying a custom computation to a variable in
   an `epi_df` object over sliding windows in time;
-  
+
 - `epi_cor()`, for computing lagged correlations between variables in an
   `epi_df` object, (allowing for grouping by geo value, time value, or any other
   variables).
 
 Functions in the package that operate directly on given variables do not begin
-  with `epi`. For example: 
+  with `epi`. For example:
 
 - `growth_rate()`, for estimating the growth rate of a given signal at given
   time values, using various methodologies;
@@ -35,20 +35,18 @@ Functions in the package that operate directly on given variables do not begin
 
 ## `epi_archive`: full version history of a data set
 
-The second main data structure in the package is called
-[`epi_archive`]. This is a special class (R6 format) 
-wrapped around a data table that stores the archive (version history) of some
-signal variables of interest.
+The second main data structure in the package is called [`epi_archive`]. This is
+an S3 class containing a data table that stores the archive (version history) of
+some signal variables of interest.
 
 By convention, functions in the `epiprocess` package that operate on
 `epi_archive` objects begin with `epix` (the "x" is meant to remind you of
-"archive"). These are just wrapper functions around the public methods for the
-`epi_archive` R6 class. For example:
+"archive"). For example:
 
 - `epix_as_of()`, for generating a snapshot in `epi_df` format from the data
   archive, which represents the most up-to-date values of the signal variables,
   as of the specified version;
-  
+
 - `epix_fill_through_version()`, for filling in some fake version data following
   simple rules, for use when downstream methods expect an archive that is more
   up-to-date (e.g., if it is a forecasting deadline date and one of our data