291 date period #297

Merged: 26 commits into dev, Mar 27, 2024

Conversation

Contributor
@dajmcdon dajmcdon commented Mar 18, 2024

Checklist

Please:

  • Make sure this PR is against "dev", not "main".
  • Request a review from one of the current epipredict main reviewers:
    dajmcdon.
  • Make sure to bump the version number in DESCRIPTION and NEWS.md.
    Always increment the patch version number (the third number), unless you are
    making a release PR from dev to main, in which case increment the minor
    version number (the second number).
  • Describe changes made in NEWS.md, making sure breaking changes
    (backwards-incompatible changes to the documented interface) are noted.
    Collect the changes under the next release number (e.g. if you are on
    0.7.2, then write your changes under the 0.8 heading).

Change explanations for reviewer

This should allow for weekly and annual forecasts.

  • Ahead/lag always operate on the time_type of the epi_df, but forecast_date / target_date were always dates. These can clash in non-daily forecast tasks.
  • We now validate these against the time_type of the epi_df. This should now work naturally. There's a test case for annual forecasts in the panel-data vignette in Panel Data Vignette (Issue 99) #115.
  • While here, I fixed a few minor things:
    • prefix head and tail with utils:: to avoid R CMD check notes
    • switch a few rlang::abort() calls to cli::cli_abort()
    • layer_population_scaling() was producing messages because the by argument could be empty
    • note the tolower() processing that is commented out
    • lag/ahead are silently coerced to integer
    • in some tests, prep() needed to be given the training data
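A rough sketch of the validation idea described above (the function name, the switch cases, and the error message are illustrative assumptions, not epipredict's actual API):

```r
# Illustrative sketch only; epipredict's real validation differs in detail.
# Check that a forecast_date is compatible with the epi_df's time_type,
# then compute target_date from an integer ahead in time_type units.
check_date_for_time_type <- function(date, time_type) {
  ok <- switch(time_type,
    day = ,
    week = inherits(date, "Date"),
    year = is.numeric(date) && all(date == round(date)),
    FALSE
  )
  if (!ok) {
    stop(sprintf("forecast_date is incompatible with time_type '%s'.", time_type))
  }
  invisible(date)
}

forecast_date <- as.Date("2024-03-18")
check_date_for_time_type(forecast_date, "day")
ahead <- 7L                            # in time_type units: 7 days here
target_date <- forecast_date + ahead   # 2024-03-25
```

Passing, say, the string "a" for a "day" time_type would error here rather than producing a nonsense target_date.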

Magic GitHub syntax to mark associated Issue(s) as resolved when this is merged into the default branch

@dajmcdon dajmcdon linked an issue Mar 18, 2024 that may be closed by this pull request
Comment on lines +166 to +167
# object$df <- object$df %>%
# dplyr::mutate(dplyr::across(tidyselect::where(is.character), tolower))
Contributor Author

I think this is wrong, but I'm not yet certain that it's safe to remove.

Contributor

Doing some time travel, it looks like this has been here in one form or another forever. What's confusing to me is that it looks like it does nothing. Was the point maybe to make all the character columns lowercase (such as geo_value) so they match correctly, but there was a parenthesis problem? Was dplyr::mutate(dplyr::across(tidyselect::where(is.character)), tolower) meant?
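For reference, the variant with the function inside across() does rewrite the character columns, while the misplaced-parenthesis variant selects columns but has no function to apply. A small sketch on made-up data:

```r
library(dplyr)

df <- tibble::tibble(geo_value = c("CA", "NY"), value = 1:2)

# Function inside across(): lowercases every character column.
out <- df %>% mutate(across(tidyselect::where(is.character), tolower))
out$geo_value  # "ca" "ny"

# Misplaced parenthesis: across(tidyselect::where(is.character)) selects
# the character columns but applies no function, leaving them unchanged,
# and tolower becomes a stray extra, unnamed argument to mutate().
```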

Contributor Author

Yeah, I'm not sure why it was there in the first place. I suspect that the included state population dataset used capitals while typical usage from the API gives geos in lower case. But that shouldn't have resulted in hardcoded workarounds here (that are prone to failure).

So I think this should go forever, but I wanted to be sure that if some downstream use errored out, I could find this and try to track it more carefully.

@dajmcdon dajmcdon requested a review from dsweber2 March 18, 2024 19:00
@dajmcdon dajmcdon marked this pull request as ready for review March 18, 2024 19:00
Contributor

@dsweber2 dsweber2 left a comment

Generally looks good to me. Might want to slightly restructure the validate_date error to reduce redundancy and improve readability, but that's not really a huge change.

I tried to clarify some of the tests a little bit, since I was struggling to follow them, but it's nothing essential.
If I were to suggest more tests, it would be a for loop over pairs of time types for layer_add_forecast_date() and latest, with the expectation that all but the matching ones would fail. Probably overkill.

I think it makes sense to merge this before #296, as that also involves date types and working with the ahead.


@@ -11,8 +11,9 @@ latest <- jhu %>%

test_that("layer validation works", {
f <- frosting()
expect_error(layer_add_forecast_date(f, "a"))
Contributor

Is this OK'd so we can add months, e.g. "May"?

Contributor Author

It's not that this is OK; rather, the layer_add_*() function can no longer validate the date format immediately. That has to happen later.

@dshemetov dshemetov self-requested a review March 21, 2024 23:55
Contributor

@dshemetov dshemetov left a comment

Read most of the code, but didn't have enough context to fully follow it. Dropped a few thoughts for now, hopefully they're helpful and not distracting.

}
forecast_date <- coerce_time_type(possible_fd, expected_time_type)
ahead <- extract_argument(the_recipe, "step_epi_ahead", "ahead")
target_date <- forecast_date + ahead
Contributor

Does having this go from max_time_value + ahead to forecast_date + ahead cause this function to behave differently? If yes and testing this PR becomes tough, then we could reduce scope here and punt these changes to another PR.

Contributor

Separate question: how do the units in ahead change depending on the time type? Is this primarily up to the user to make sure they specify aheads in the right units?

Contributor Author

On max_time_value + ahead: this was the default previously, and it remains the fallback (see the lines below). However, if there is a forecast_date, then I think the default for target_date should be to use the specified forecast_date first.

This PR is meant to allow for ahead to always be an integer. Then it should be in units relative to the time_type of the epi_df. It took a lot of effort to make that happen though.
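A concrete sketch of "ahead is an integer in the units of the time_type" (the dates here are made up):

```r
# Daily time_type: ahead counts days.
daily_target <- as.Date("2024-03-18") + 7L        # 2024-03-25

# Weekly time_type: ahead counts weeks, i.e. 7 days per unit on the Date scale.
weekly_target <- as.Date("2024-03-18") + 7L * 2L  # 2 weeks ahead: 2024-04-01

# Yearly time_type: time values are plain integers, so ahead adds directly.
yearly_target <- 2024L + 1L                       # 2025
```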

@dajmcdon dajmcdon merged commit c94d5f9 into dev Mar 27, 2024
3 checks passed
@dajmcdon dajmcdon deleted the 291-date-period branch March 27, 2024 20:57
Successfully merging this pull request may close these issues.

Date processing mismatch
3 participants