Add latency adjustment #279
Thinking about this, it actually needs to tie the test and training data together, right? E.g., the amount to extend the ahead we're training on depends on the latency of the test data. If we're bundling both fit and predict into a forecast function, then we can probably do this; outside that context I think it ends up invalid?
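A minimal sketch of what that bundling might look like (all names here are hypothetical, and I'm assuming an `epi_df`-style `as_of` attribute on the test data):

```r
# Hypothetical sketch, not a real API: a forecast() that ties train and test
# together by measuring latency from the test data and extending the ahead.
forecast <- function(train_data, test_data, ahead, fit_fun, predict_fun) {
  as_of <- attributes(test_data)$metadata$as_of  # epi_df metadata, assumed
  latency <- as.integer(as_of - max(test_data$time_value))
  fit <- fit_fun(train_data, ahead = ahead + latency)  # extended ahead
  predict_fun(fit, test_data)
}
```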
Oh, rereading this, I guess you're thinking do …
Thinking about … A problem example for `step_impute_roll()`:

```r
library(lubridate)

# note: runif() isn't seeded here, so re-running gives different values
# than the output shown below
example_data <-
  data.frame(
    day = ymd("2012-06-07") + days(1:12),
    x1 = round(runif(12), 2),
    x2 = round(runif(12), 2),
    x3 = round(runif(12), 2)
  )
example_data$x1[c(1, 5, 6)] <- NA
example_data$x2[c(1:4, 10)] <- NA
example_data$x2[c(8:12, 10)] <- NA
example_data
```
```
   day          x1    x2    x3
1  2012-06-08   NA    NA  0.67
2  2012-06-09 0.08    NA  0.70
3  2012-06-10 0.55    NA  0.05
4  2012-06-11 0.46    NA  0.45
5  2012-06-12   NA  0.19  0.54
6  2012-06-13   NA  0.57  0.16
7  2012-06-14 0.56  0.51  0.98
8  2012-06-15 0.67    NA  0.21
9  2012-06-16 0.16    NA  0.06
10 2012-06-17 0.99    NA  0.51
11 2012-06-18 0.16    NA  0.34
12 2012-06-19 0.33    NA  0.45
```

Which has more …

```r
library(recipes)

seven_pt <- recipe(~., data = example_data) %>%
  update_role(day, new_role = "time_index") %>%
  step_impute_roll(all_numeric_predictors(), window = 5) %>%
  prep(training = example_data)

# The training set:
bake(seven_pt, new_data = NULL)
```

```
   day         x1    x2    x3
   <date>   <dbl> <dbl> <dbl>
 1 2012-06-08  0.46  0.19  0.67
 2 2012-06-09  0.08  0.19  0.7
 3 2012-06-10  0.55  0.19  0.05
 4 2012-06-11  0.46  0.38  0.45
 5 2012-06-12  0.55  0.19  0.54
 6 2012-06-13  0.56  0.57  0.16
 7 2012-06-14  0.56  0.51  0.98
 8 2012-06-15  0.67  0.54  0.21
 9 2012-06-16  0.16  0.51  0.06
10 2012-06-17  0.99 NA     0.51
11 2012-06-18  0.16 NA     0.34
12 2012-06-19  0.33 NA     0.45
```

…which results in more NAs. It won't accept … So as far as I can tell, either we add a …
I recently learned that …
I think the [shifting / ahead-&-lag-adjustment] approach will give better results than locf imputation for dealing with latency shared between all signals & epikeys (ahead adjustment) and with per-signal, cross-epikey latency (lag adjustment). However, I'm assuming it doesn't handle differences in latency between epikeys for the same signal; that part could be done [--- and probably should be done by default in canned forecasters ---] with locf imputation, just to enable getting some predictions while still requiring only a single fit.

[The locf step should also probably warn if it's locfing very far, or maybe at all, similar to the warnings in these adjustment steps. E.g., if a location is more than a month behind the other locations, something's probably up --- either they stopped reporting and you wouldn't want to forecast, or ingestion is failing and you want to fix that.]

(If we wanted better results, there's an approach that probably performs better: fit, e.g., one model for the regular locations and another model for each distinct set of signal latencies. E.g., normally it might just be VI having some extra latency, so we'd fit a separate model based on the lags it has available, but still geopool across everything. But that's metamodeling, probably not achievable with a step.)
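A rough sketch of that warning behavior (a plain function, not an actual `{recipes}` step; the 30-day threshold and all names are made up):

```r
library(dplyr)
library(tidyr)

# Sketch only: per-geo LOCF that warns when a location's last report for
# `col` is more than `max_gap` days behind the newest date in the data.
locf_impute <- function(df, col, max_gap = 30) {
  latest_overall <- max(df$time_value)
  stale <- df %>%
    filter(!is.na(.data[[col]])) %>%
    group_by(geo_value) %>%
    summarize(last_obs = max(time_value), .groups = "drop") %>%
    filter(latest_overall - last_obs > max_gap)
  if (nrow(stale) > 0) {
    warning("LOCF carrying ", col, " forward more than ", max_gap,
            " days for: ", paste(stale$geo_value, collapse = ", "))
  }
  df %>%
    group_by(geo_value) %>%
    arrange(time_value, .by_group = TRUE) %>%
    fill(all_of(col), .direction = "down") %>%
    ungroup()
}
```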
See the implementation here: https://github.com/cmu-delphi/exploration-tooling/blob/77d7e5eb95e4b17f40567430b0233f6b24cf100a/R/latency_adjusting.R#L11.
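Paraphrasing the idea in that file (not a verbatim copy): the effective ahead is the requested ahead plus the latency implied by the `as_of`:

```r
# Paraphrase of the linked approach, not the actual code: extend the ahead
# by the gap between the snapshot date (as_of) and the newest observation.
extend_ahead <- function(epi_data, ahead) {
  as_of <- attributes(epi_data)$metadata$as_of  # epi_df metadata
  latency <- as.integer(as_of - max(epi_data$time_value))
  ahead + latency
}
```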
… `step_epi_ahead()` … (`.target_date`). If the `target_date` is autocalculated based on the `as_of` for the `epi_df`, the ahead could shift massively (say when using finalized data).
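Concretely, with made-up dates, the same target can imply a very different ahead depending on which `as_of` you compute from:

```r
library(lubridate)

max_time    <- ymd("2024-02-27")  # newest time_value in the training data
as_of_live  <- ymd("2024-03-01")  # real-time snapshot: 3 days of latency
as_of_final <- ymd("2024-09-01")  # finalized data, months later

as.integer(as_of_live - max_time)   # 3
as.integer(as_of_final - max_time)  # 187
# An ahead autocalculated from as_of would differ by half a year between
# these two runs, even though .target_date hasn't changed.
```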
Deliverables:

- … (`{recipes}`)
- … `get_test_data()` as needed