
get_test_data returning NA's if there are NA's in the most recent data #267


Open
dsweber2 opened this issue Nov 4, 2023 · 5 comments

@dsweber2
Contributor

dsweber2 commented Nov 4, 2023

This seems like a bug. An example of what I mean:

library(dplyr)
library(epiprocess)
library(epipredict)

jhu <- filter(
  case_death_rate_subset,
  time_value >= "2021-06-04",
  time_value <= "2021-12-31",
  geo_value %in% c("ca", "fl", "tx", "ny", "nj")
)
## Build the recipe on the same data (and columns) that get_test_data() receives
r <- epi_recipe(jhu) %>%
  ## One dummy variable per location
  step_mutate(geo_value_factor = as.factor(geo_value)) %>%
  step_dummy(geo_value_factor) %>%
  ## Occasionally, data reporting errors / corrections result in negative
  ## case / death rates
  step_mutate(case_rate = pmax(case_rate, 0), death_rate = pmax(death_rate, 0)) %>%
  step_epi_lag(case_rate, death_rate, lag = c(0, 7)) %>%
  step_epi_ahead(death_rate, ahead = 7, role = "outcome") %>%
  step_epi_naomit()
geo_values <- jhu$geo_value %>% unique()
## Append two days on which case_rate is NA for every location
one_day_nas <- tibble(
  geo_value = geo_values,
  time_value = as.Date("2022-01-01"),
  case_rate = NA,
  death_rate = runif(length(geo_values))
)
second_day_nas <- one_day_nas %>%
  mutate(time_value = as.Date("2022-01-02"))
jhu_nad <- jhu %>%
  as_tibble() %>%
  bind_rows(one_day_nas, second_day_nas) %>%
  as_epi_df()
## Put as_of a few days past the last (NA-containing) day
attributes(jhu_nad)$metadata$as_of <- max(jhu_nad$time_value) + 3
get_test_data(r, jhu_nad)

The example workflow is unfortunately buried in the guts of exploration tooling. arx_forecaster more or less does the right thing, though it treats the last day present in the data (here, a day whose values are NA) as the last day with data.
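To make the distinction concrete, here is a minimal dplyr sketch on hypothetical toy data (not taken from the issue) contrasting the latest time_value present in the data with the latest day on which each location actually has a non-NA case_rate:

```r
library(dplyr)

# Hypothetical toy data: the most recent rows for each location have NA case_rate
df <- tibble(
  geo_value  = rep(c("ca", "fl"), each = 3),
  time_value = rep(as.Date("2022-01-01") + 0:2, times = 2),
  case_rate  = c(1.5, NA, NA, 2.0, 3.0, NA)
)

# Latest day present in the data, whether or not its values are NA
max(df$time_value)

# Latest day per location on which case_rate is actually observed
last_observed <- df %>%
  filter(!is.na(case_rate)) %>%
  group_by(geo_value) %>%
  summarise(last_obs = max(time_value), .groups = "drop")
last_observed
```

Here the first value is 2022-01-03, while the last observed days are 2022-01-01 for ca and 2022-01-02 for fl; anchoring on the former rather than the latter corresponds to the behavior described above.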

@dsweber2
Contributor Author

Kind of related to #106.

@dsweber2
Contributor Author

dsweber2 commented Jun 3, 2024

@dshemetov how would this interact with the work you've been doing with forecast and get_test_data? Should we hold off on this until that's done?

@dshemetov
Contributor

dshemetov commented Jun 3, 2024

I'm hoping that the work there ends up resolving this, so let's just make sure to follow up on this after that's done.

@dshemetov dshemetov self-assigned this Mar 4, 2025
@dsweber2
Contributor Author

Mostly deprecated in light of #293

@dshemetov
Contributor

IIRC this was addressed through generous use of clear_last_minute_nas, switching to different (weekly) datasets, and introducing our own get_oversized_test_data() function. The reprex above is still worth tracking, though, to make sure get_test_data() in #293 fixes it.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants