wip feat: replace R6 `epi_archive` with S3 implementation #431
Some minor todos & larger suggestions from a partial first pass; thought I'd just post them now since some probably will be slightly annoying to implement & finishing reviewing's going to take a while.
For future review passes:
- I need to learn the `new_` etc. pattern --- I think Daniel opened an issue about us using this for `epi_df`s & Hadley has a chapter about it & related functions.
- Check that we're achieving the intended non-mutation interface.
- Finish looking over files: the rest of `archive.R`, all of `grouped_epi_archive.R`, ...
- Check `archive_cases_dv_subset.R` [and vignettes].
- [Assess vignette snapshot disk size, testing completeness.]
Force-pushed some changes; just a heads-up, @brookslogan.
R/archive.R
other_keys = NULL,
additional_metadata = NULL,
compactify = NULL,
clobberable_versions_start = NA,
suggest: either (A) eliminate `new_epi_archive` and keep its logic in `as_epi_archive`, or (B) introduce a `new_` and `validate_` that are more minimal, and have `as_epi_archive` be the "helper" as described here [and use `new_epi_archive` internally where we can] [and use the same approach to default handling in `new_` and `as_`]. [`epi_archive()` as the helper seems like it might be confusing; people are used to `data.frame()`, `tibble()`, `list()`, etc., which allow you to construct things in a different way.]
suggest: either (a) make more defaults non-NULL, or (b) make the `clobberable_versions_start` default also NULL (& replace it with NA) if that's possible. I'm not sure about (a) vs. (b). (a) gives more type info, but could also hide other arg names in limited-width autocomplete windows. With the (a) approach, there can also be issues with confusing defaults --- imagine we had a `geo_type = guess_geo_type(x)` default and rebound `x` before `geo_type` was evaluated --- but it doesn't look like we rebind before assigning any of the defaults, so I don't think we're in such a situation.
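To make the rebinding concern concrete: R evaluates a default argument lazily, in the function's own frame, the first time it is used, so a default that mentions `x` sees whatever `x` is bound to at that moment. A minimal sketch; `toy_guess_geo_type()` is a hypothetical stand-in, not the epiprocess helper:

```r
# Hypothetical stand-in for a geo_type guesser (not the epiprocess helper):
# "state" if every value is a 2-letter code, otherwise "county".
toy_guess_geo_type <- function(geo_values) {
  if (all(nchar(geo_values) == 2L)) "state" else "county"
}

# The default is a promise, evaluated in the function's frame on first use.
with_rebinding <- function(x, geo_type = toy_guess_geo_type(x)) {
  x <- paste0(x, "001") # rebinds `x` before `geo_type` is forced
  geo_type              # the default now sees the rebound `x`
}

# Forcing the default up front pins it to the caller's `x`.
with_force <- function(x, geo_type = toy_guess_geo_type(x)) {
  force(geo_type)
  x <- paste0(x, "001")
  geo_type
}

with_rebinding(c("ca", "pa")) # "county": guessed from c("ca001", "pa001")
with_force(c("ca", "pa"))     # "state": guessed from c("ca", "pa")
```

Calling `force()` on such defaults (or simply not rebinding arguments before they are used) avoids the surprise.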
Alright, I added a `validate_epi_archive` and moved the bulk of the validation from `new_epi_archive` in there. Now `as_epi_archive` calls validate and then new. `new_epi_archive` is only used there, so this change should be fine for now, but in the future we can replace some safe internal calls to `as_epi_archive` with `new_epi_archive`. There is still a bunch of validation and construction of the data.table object in `new_epi_archive`, so that's another thing that could be fixed.
I also made `clobberable_versions_start` default to NULL, and then I set it to NA in that case. I just went with the consistent option for now, though we can easily switch it later.
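For reference, the `new_`/`validate_`/helper split described above might look roughly like this for a hypothetical `toy_archive` class (all names and fields are placeholders, not the epiprocess API), including the NULL-then-NA default for `clobberable_versions_start`:

```r
# Hypothetical toy class; names and fields are placeholders, not the epiprocess API.

# new_: low-level constructor, cheap, trusts its inputs.
new_toy_archive <- function(DT, versions_end, clobberable_versions_start) {
  structure(
    list(
      DT = DT,
      versions_end = versions_end,
      clobberable_versions_start = clobberable_versions_start
    ),
    class = "toy_archive"
  )
}

# validate_: checks inputs and errors early; returns invisibly on success.
validate_toy_archive <- function(DT, versions_end, clobberable_versions_start) {
  required <- c("geo_value", "time_value", "version")
  if (!is.data.frame(DT) || !all(required %in% names(DT))) {
    stop("`DT` must be a data frame with columns geo_value, time_value, version",
         call. = FALSE)
  }
  if (length(versions_end) != 1L || is.na(versions_end)) {
    stop("`versions_end` must be a single non-NA value", call. = FALSE)
  }
  if (length(clobberable_versions_start) != 1L) {
    stop("`clobberable_versions_start` must have length 1", call. = FALSE)
  }
  invisible(TRUE)
}

# as_: user-facing helper; fills NULL defaults, validates, then constructs.
as_toy_archive <- function(x, versions_end = NULL,
                           clobberable_versions_start = NULL) {
  if (is.null(versions_end)) versions_end <- max(x$version)
  if (is.null(clobberable_versions_start)) clobberable_versions_start <- NA
  validate_toy_archive(x, versions_end, clobberable_versions_start)
  new_toy_archive(x, versions_end, clobberable_versions_start)
}

df <- data.frame(geo_value = "ca", time_value = 1:2, version = c(2, 3))
arch <- as_toy_archive(df)
arch$versions_end               # 3
arch$clobberable_versions_start # NA
```

Because the helper owns default handling and validation, internal callers that already hold valid inputs could call `new_toy_archive()` directly and skip the checks.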
See #445
Core refactor changes look pretty good! Please:
Please see separate checklist.
R/methods-epi_archive.R
#' operator. Currently, the only situation where there is potentially aliasing
#' is of the `DT` in edge cases with `all_versions = TRUE`, but this may change
#' in the future.
#'
#' @examples
#' # warning message of data latency shown
note: while trying to understand what this old comment was, I ran into this possible mistake: `archive_cases_dv_subset$version` --- which partial-matches (potentially silently by default?) to `$versions_end`. I'm not sure if there's a class we can swap in to prevent this (`vctrs::list_of` requires exact names, but it is designed for homogeneous lists whose elements are too complex for an atomic vector). Since we expect to encapsulate `DT` sometime, maybe this will become less of an issue (users shouldn't expect to be able to get at `version` that easily), but it would probably be nice if there's an easy pre-baked solution. I guess we might be able to just implement `$` for `epi_archive` or roll our own intermediate class...
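A minimal illustration of the partial-matching hazard, plus one possible exact-only `$` method. The `strict_list` class is hypothetical; attaching something similar to `epi_archive` is only an assumption about how it could be done:

```r
# Base R `$` on a plain list partially matches names, silently by default
# (set options(warnPartialMatchDollar = TRUE) to get a warning instead).
x <- list(versions_end = 5)
x$version # 5, via partial match to `versions_end`

# Sketch of an exact-only `$` for a hypothetical "strict_list" class.
`$.strict_list` <- function(x, name) {
  # For `$` methods, `name` arrives as a character string.
  if (!name %in% names(x)) {
    stop(sprintf("no element named `%s` (partial matching is disabled)", name),
         call. = FALSE)
  }
  .subset2(x, name) # exact, non-dispatching extraction
}

y <- structure(list(versions_end = 5), class = "strict_list")
y$versions_end # 5
# y$version    # Error: no element named `version` (partial matching is disabled)
```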
I'm updating some docs in this region in a side branch.
note: also finding that `x$clobberable_versions_start <- <value>` doesn't validate, which is probably suboptimal.
Hm, interesting observations. Partial matching and S3 classes not having private attributes seem like very leaky things to try to guard against. I'm leaning towards just telling users that it is unsafe to directly modify `epi_archive`s outside epiprocess functions. We can then add safe modification functions as we get feature requests for individual cases.
That makes sense. Though I might sort of soft-request the [forbiddance of] partial matching in `$`, and at least name validation in `$<-` and `[[<-`, to also catch errors we make in development. Not part of this PR though.
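A possible shape for the name validation in `$<-` and `[[<-`, sketched on a hypothetical `checked_list` class (not part of this PR or of epiprocess):

```r
# Hypothetical class "checked_list": assignment via `$<-` or `[[<-` may only
# overwrite existing fields, so a typo errors instead of adding a new field.
assert_existing_field <- function(x, name) {
  if (!is.character(name) || length(name) != 1L || !name %in% names(x)) {
    stop(sprintf("`%s` is not an existing field", paste(format(name), collapse = ", ")),
         call. = FALSE)
  }
}

`$<-.checked_list` <- function(x, name, value) {
  assert_existing_field(x, name)
  cls <- oldClass(x)
  x <- unclass(x) # drop the class so the assignment below uses the default method
  x[[name]] <- value
  class(x) <- cls
  x
}

`[[<-.checked_list` <- function(x, i, value) {
  assert_existing_field(x, i)
  cls <- oldClass(x)
  x <- unclass(x)
  x[[i]] <- value
  class(x) <- cls
  x
}

arch <- structure(list(clobberable_versions_start = NA), class = "checked_list")
arch$clobberable_versions_start <- 3  # OK: existing field
# arch$clobberable_version_start <- 3 # Error: `clobberable_version_start` is not an existing field
```

The same hooks could also validate values (e.g. reject a non-scalar `clobberable_versions_start`); that part is left out of this sketch.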
-> #449
@@ -171,8 +256,6 @@ epix_fill_through_version <- function(x, fill_versions_end,
#' as_epi_archive(compactify = TRUE)
#' # merge results stored in a third object:
#' xy <- epix_merge(x, y)
#' # vs. mutating x to hold the merge result:
#' x$merge(y)
#'
#' @importFrom data.table key set setkeyv
#' @export
question: would we want this as an S3 generic as well, or as an implementation of the `merge` generic? Then people using `methods(, "epi_archive")` might get better results.
Not sure. I've never used `methods(, "<class>")` to explore documentation. Let's just see how `epix_*` works out for now?
Sure. Side note: `methods(, "<class>")` or `sloop::{s3,s4}_methods_class()` (better than the former when S4 is involved in ways that confuse `methods()` or its users) is useful for exploring undocumented stuff --- often S3 implementations won't actually be exported, documented, etc. Of course, it's not comprehensive if there are regular functions that can also operate on the class. I guess that was one benefit of R6: potentially having a sort of full list of functionality at your fingertips.
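For the `merge` question above, a rough sketch of what registering an S3 method on base R's `merge()` generic could look like, using a hypothetical `toy_archive` class and a hypothetical `toy_merge()` standing in for an `epix_merge`-like function:

```r
# Hypothetical toy class; `toy_merge()` stands in for an epix_merge-like function.
new_toy_archive <- function(DT) structure(list(DT = DT), class = "toy_archive")

toy_merge <- function(x, y) {
  # Just stacks the rows; a real archive merge would reconcile versions.
  new_toy_archive(rbind(x$DT, y$DT))
}

# A method on base R's merge() generic: merge(x, y) dispatches here,
# and the method is discoverable via methods(class = "toy_archive").
merge.toy_archive <- function(x, y, ...) toy_merge(x, y)

a <- new_toy_archive(data.frame(geo_value = "ca", version = 1))
b <- new_toy_archive(data.frame(geo_value = "pa", version = 1))
ab <- merge(a, b)
nrow(ab$DT) # 2
methods(class = "toy_archive")
```

Inside a package, the method would also need an `S3method(merge, toy_archive)` entry in `NAMESPACE` (roxygen2 generates one from `@export` on a method like this).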
I manually checked the diffs in the vignettes and found no unexpected changes. I've purged the vignette snapshots from this branch by rebasing and force-pushing, FYI @brookslogan. Here is the code used to do vignette snapshots, in case we need it later:
# test-snapshots.R
vignettes <- paste0(here::here("vignettes/"), c(
"advanced.Rmd",
"aggregation.Rmd",
"archive.Rmd",
"epiprocess.Rmd",
"slide.Rmd"
))
for (input_file in vignettes) {
test_that(paste0("snapshot vignette ", basename(input_file)), {
# skip("Skipping snapshot tests by default, as they are slow.")
output_file <- sub("\\.Rmd$", ".html", input_file)
withr::with_file(output_file, {
devtools::build_rmd(input_file)
expect_snapshot_file(output_file)
})
})
}
Instructions:
* remove comment #417
* bump version to 0.7.6 and add NEWS line
- Forbid `NA` `compactify`
- Remove `missing` checks when `is.null` suffices
- Remove redundant default code
- Make local `other_keys` have more consistent typing across branches
- Validate length.
- Tweak message regarding type, since `typeof` is length 1.
- Actually raise an error if NA when NA is not allowed.
- Make tests check the source of the error, since not being specific + R configuration masked some of these issues.
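A sketch of the validation order described in this commit (length, then type, then NA), using a hypothetical `check_scalar_lgl()` helper that signals a classed error, so a test can tell *which* check fired rather than accepting any error:

```r
# Hypothetical classed error for validation failures.
abort_toy_validation <- function(msg) {
  cond <- structure(
    list(message = msg, call = NULL),
    class = c("toy_validation_error", "error", "condition")
  )
  stop(cond)
}

# Hypothetical scalar-logical check: length first, then type, then NA.
check_scalar_lgl <- function(x, arg, na_ok = FALSE) {
  if (length(x) != 1L) {
    abort_toy_validation(sprintf("`%s` must have length 1, not %d", arg, length(x)))
  }
  if (!is.logical(x)) {
    # typeof() returns a single string, so it interpolates cleanly
    abort_toy_validation(sprintf("`%s` must be logical, not %s", arg, typeof(x)))
  }
  if (!na_ok && is.na(x)) {
    abort_toy_validation(sprintf("`%s` must not be NA", arg))
  }
  invisible(x)
}

# Reports which kind of condition (if any) an expression raised.
condition_class <- function(expr) {
  tryCatch(
    {
      expr
      "ok"
    },
    toy_validation_error = function(e) "toy_validation_error",
    error = function(e) "other_error"
  )
}

condition_class(check_scalar_lgl(TRUE, "compactify")) # "ok"
condition_class(check_scalar_lgl(NA, "compactify"))   # "toy_validation_error"
condition_class(stop("unrelated"))                    # "other_error"
```

With testthat, `expect_error(..., class = "toy_validation_error")` expresses the same check.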
- S3 class vectors are ordered, so use `identical`
- Improve class vector formatting
- Tweak other `class` and `typeof` message text
- Improve duplicate colnames message
- Improve vector interpolation formatting
- Fix typo in GCD error messaging
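Why `identical` rather than a set comparison: S3 dispatch walks the class vector from left to right, so two objects with the same classes in a different order can behave differently. A small demo with hypothetical `special`/`base` classes:

```r
x <- structure(list(), class = c("special", "base"))
y <- structure(list(), class = c("base", "special"))

setequal(class(x), class(y))  # TRUE, yet the objects dispatch differently
identical(class(x), class(y)) # FALSE

# A throwaway generic to show which method each object reaches first.
describe <- function(obj) UseMethod("describe")
describe.base <- function(obj) "base method"
describe.special <- function(obj) "special method"

describe(x) # "special method"
describe(y) # "base method"
```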
Print all output to stdout, without using messages. This prevents Rmds from splitting print output into multiple chunks, and lets `capture.output` capture all expected output by default; the same goes for logging utilities that expect regular output to come from stdout.
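The stdout-versus-message distinction can be seen with `capture.output()`, sketched here with a hypothetical `print.toy_archive()` method:

```r
# Hypothetical print method: everything goes to stdout via cat(),
# so capture.output() (and knitr) treat it as ordinary printed output.
print.toy_archive <- function(x, ...) {
  cat("A `toy_archive` object\n")
  cat("* versions_end: ", format(x$versions_end), "\n", sep = "")
  invisible(x)
}

arch <- structure(list(versions_end = 3), class = "toy_archive")
out_cat <- capture.output(print(arch))
out_cat # "A `toy_archive` object" "* versions_end: 3"

# message() writes to stderr, which capture.output() skips by default
# (type = "output"); the text still appears on the console, but is not captured.
out_msg <- capture.output(message("A `toy_archive` object"))
length(out_msg) # 0
```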
This applied for a different default `clobberable_versions_start`.
- Update `epix_as_of` docs further, based on `clobberable_versions_start` now defaulting to `NA`.
- Don't include `max_version =` in example `epix_as_of` calls, as it seems atypical and is a strange name when extracting a snapshot rather than an archive.
We don't want to try to use an `epi_archive` method implementation on a `grouped_epi_archive`, or have `is_epi_archive` succeed on them even with `grouped_okay = FALSE`, to prevent attempted extraction of nonexistent fields.
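A minimal sketch of that rule, with hypothetical `toy_archive`/`grouped_toy_archive` classes and a hypothetical `is_toy_archive()` predicate: the grouped class does not inherit from the ungrouped one, so ungrouped methods never dispatch on it, and the predicate accepts it only when asked to:

```r
# Hypothetical predicate; grouped objects pass only with grouped_okay = TRUE.
is_toy_archive <- function(x, grouped_okay = FALSE) {
  inherits(x, "toy_archive") ||
    (grouped_okay && inherits(x, "grouped_toy_archive"))
}

ungrouped <- structure(list(DT = data.frame()), class = "toy_archive")
# Deliberately NOT c("grouped_toy_archive", "toy_archive"):
grouped <- structure(list(), class = "grouped_toy_archive")

is_toy_archive(ungrouped)                    # TRUE
is_toy_archive(grouped)                      # FALSE
is_toy_archive(grouped, grouped_okay = TRUE) # TRUE
```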
- Use new `%>% clone()` when we want a deep copy
- Use aliasing instead of shallow copies, since with S3 lists we should not have the threat of mutation of the shallow list structure
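The reasoning behind aliasing can be checked directly: lists have copy-on-modify semantics, while environments (which R6 objects are built on) have reference semantics. A sketch with placeholder names:

```r
# Lists: after `y <- x`, modifying `y` copies it, so `x` is untouched.
x <- structure(list(versions_end = 3), class = "toy_archive")
y <- x                # alias, no copy yet
y$versions_end <- 4   # copy-on-modify kicks in here
x$versions_end        # still 3

# Environments: both bindings refer to the same object, which is why
# R6 objects needed an explicit clone().
e <- new.env()
e$versions_end <- 3
f <- e
f$versions_end <- 4
e$versions_end        # 4
```

One caveat, stated as an assumption about why a deep-copying `clone()` is still useful: a data.table held inside the list is shared between aliases, so by-reference updates such as `:=` or `set*()` on one alias's `DT` would be visible through the other.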
* remove is_epi_archive and delete in epix_slide
* simplify group_by_drop_default
* prune library calls in tests
* remove here and waldo from Suggests
* pull most validation work from new_epi_archive into validate_epi_archive
* call validate_epi_archive in as_epi_archive
I think this PR is ready. @nmdefries (thank you!) reproduced my run of exploration-tooling with this branch and found that the forecaster outputs did not change. I found a few things that need to change in https://github.com/cmu-delphi/delphi-tooling-book, so I'm working on a PR there. We can probably merge this and I can have that one ready tomorrow.
Checklist
Please:
PR).
brookslogan, nmdefries.
DESCRIPTION. Always increment the patch version number (the third number), unless you are making a release PR from dev to main, in which case increment the minor version number (the second number).
(backwards-incompatible changes to the documented interface) are noted.
Collect the changes under the next release number (e.g. if you are on 1.7.2, then write your changes under the 1.8 heading).
process.
Change explanations for reviewer
Attention conservation notice: probably don't review this yet; still waiting on downstream A/B tests.
`epi_archive` to use S3 #430
A/B tests TODO (A = epiprocess dev, B = this branch):
Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch
`epi_archive`s #340