Handling insufficient training data. #272
Comments
I'm mostly a fan of this; a silent result with NA predictions does tell the user that nothing was actually predicted on that day, and fits with other …

For the use cases we've run into so far, what we do here will for the most part be downstream of what we choose above. The case you describe doesn't seem unreasonable though, and having it produce the same kind of behavior as the above cases is probably good from a consistency perspective.

Perhaps better to discuss in person? Or someone could attempt to implement the function and we can iterate from there.
@dsweber2 @dshemetov

Brainstorming about how `epipredict` should handle insufficient training data.

A workflow is a container with (1) a preprocessor; (2) an engine; (3) a postprocessor. By itself, it hasn't seen any data (training or test).
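For concreteness, here's a minimal sketch of such a container, assuming the usual `epi_recipe()` / `frosting()` constructors and a made-up toy `epi_df` (the column names, lags, and ahead below are illustrative, not anything from this issue):

```r
library(epipredict)
library(epiprocess)
library(dplyr)

# A toy epi_df with one predictor and one outcome (made-up values).
toy <- tibble(
  geo_value = "ca",
  time_value = as.Date("2021-01-01") + 0:60,
  case_rate = rnorm(61, mean = 20),
  death_rate = rnorm(61, mean = 0.5)
) %>%
  as_epi_df()

# (1) preprocessor: the lag/lead calculations live in the recipe
r <- epi_recipe(toy) %>%
  step_epi_lag(case_rate, lag = c(0, 7, 14)) %>%
  step_epi_ahead(death_rate, ahead = 7) %>%
  step_epi_naomit()

# (3) postprocessor
f <- frosting() %>%
  layer_predict() %>%
  layer_naomit(.pred)

# (2) engine, all bundled into a workflow; no data has been fit yet
wf <- epi_workflow(r, parsnip::linear_reg()) %>%
  add_frosting(f)
```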
When `fit()` is called on a workflow + training data, the preprocessor processes that training data. Typically, a `check_*()` in the recipe (when placed after any lagging/leading/growth-rate calculations) would error with a message like "insufficient training data". We could make this a warning (or perhaps have an argument to the check that determines whether we get a warning or an error). The most obvious result in that case would be an "unfit workflow". The downstream impact is that you can't `predict()` an unfit workflow (this produces an error through its `workflow` class). This makes sense typically, but perhaps not in the "retrospective evaluation task". Would we actually prefer that `predict(unfit_workflow, new_data)` produce another warning? Silently/verbosely return an empty tibble? Silently/verbosely return a tibble with all the keys in `new_data` but NA predictions?
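To make the "NA predictions" option concrete, here is a rough sketch; the helper names are hypothetical (not existing epipredict functions), and it assumes the recipe check signals an error on insufficient training data:

```r
# Hypothetical wrappers, just to illustrate one possible behavior for the
# retrospective-evaluation task; none of these names exist in epipredict.

# Try to fit; on an error (e.g. "insufficient training data"), warn and
# hand back the unfit workflow instead of stopping.
fit_or_unfit <- function(wf, train_data) {
  tryCatch(
    fit(wf, train_data),
    error = function(e) {
      warning("insufficient training data; returning the unfit workflow")
      wf
    }
  )
}

# Predict if the workflow is trained; otherwise return the keys of
# new_data with NA predictions (the "silent NA" option above).
predict_or_na <- function(wf, new_data) {
  if (!workflows::is_trained_workflow(wf)) {
    return(
      dplyr::distinct(new_data, geo_value, time_value) %>%
        dplyr::mutate(.pred = NA_real_)
    )
  }
  predict(wf, new_data)
}
```

The "empty tibble" option would just replace the `distinct()`/`mutate()` lines with an empty tibble of the expected columns.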
It's possible that the `new_data` could have insufficient information to create the necessary lags, so even from a fit workflow we would not be able to produce a prediction. `get_test_data()` tries to ensure that there is enough (and only enough), but you could pass in your own (say, if you don't wish to refit the model but want to predict using the most recent available data with the old model). So we could add a `check_*()` here as well. Again, should `predict(fit_workflow, bad_new_data)` produce a warning? Silently/verbosely return an empty tibble? Silently/verbosely return a tibble with all the keys in `new_data` but NA predictions?
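Continuing the sketch above (same `r`, `wf`, `toy` objects), the supported path goes through `get_test_data()`, and a pre-`predict()` guard for hand-built `new_data` might look roughly like this (the `enough_for_lags()` helper and its threshold are assumptions, not epipredict API):

```r
# Supported path: get_test_data() assembles just enough recent rows
# for the lags declared in the recipe.
fit_wf <- fit(wf, toy)
test_data <- get_test_data(r, toy)
predict(fit_wf, test_data)

# Rough guard for hand-built new_data (hypothetical helper): does every
# geo have more distinct days than the largest lag in the recipe?
enough_for_lags <- function(new_data, max_lag) {
  new_data %>%
    dplyr::group_by(geo_value) %>%
    dplyr::summarise(n_days = dplyr::n_distinct(time_value), .groups = "drop") %>%
    dplyr::pull(n_days) %>%
    min() > max_lag
}

if (!enough_for_lags(test_data, max_lag = 14)) {
  warning("new_data has insufficient history to build the required lags")
}
```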
Finally, we have `arx_forecaster()` (or any of the other canned forecasters). It's not just a `workflow`; it also sees the training data (and the testing data) at workflow build time, so we have more flexibility. We could do any combination of the above options, or we could check at the beginning, outside the workflow, and return an appropriate object. For your current task, you're only using `$predictions`, so the rest doesn't matter (it also returns the fit workflow and a list of metadata). So we could just skip the postprocessor and return an empty tibble or NA predictions (even if we do something different in general).
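One possible shape for that "check at the beginning, outside the workflow" option is sketched below; the wrapper name, the row-count heuristic, and the columns of the fallback tibble are all assumptions, not anything implemented in epipredict:

```r
# Hypothetical wrapper around arx_forecaster(): check training-data
# sufficiency up front and short-circuit with NA predictions rather
# than erroring inside the workflow.
safe_arx_forecaster <- function(epi_data, outcome, predictors,
                                args_list = arx_args_list(),
                                min_train_rows = 30) {
  n_per_geo <- epi_data %>%
    dplyr::count(geo_value) %>%
    dplyr::pull(n)

  if (min(n_per_geo) < min_train_rows) {
    warning("insufficient training data; returning NA predictions")
    return(
      dplyr::as_tibble(epi_data) %>%
        dplyr::distinct(geo_value) %>%
        dplyr::mutate(
          .pred = NA_real_,
          forecast_date = max(epi_data$time_value)
        )
    )
  }

  arx_forecaster(epi_data, outcome, predictors, args_list = args_list)$predictions
}
```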
I suspect there are some other paths in the tree I'm not thinking of at the moment. What do you guys think?