Closed
Description
The args list help entries for the canned forecasters, including ones that don't use lags, is:
nafill_buffer: At predict time, recent values of the training data are
used to create a forecast. However, these can be 'NA' due to,
e.g., data latency issues. By default, any missing values
will get filled with less recent data. Setting this value to
'NULL' will result in 1 extra recent row (beyond those
required for lag creation) to be used. Note that we require
at least 'min(lags)' rows of recent data per 'geo_value' to
create a prediction. For this reason, setting 'nafill_buffer
< min(lags)' will be treated as _additional_ allowed recent
data rather than the total amount of recent data to examine.
This confused me, and I think there may be some implementation bug(s).
Requests:
- State the default value, Inf, and what it does.
- State what "regular" values (non-NULL, not < min(lags)) do.
- "Setting this value to 'NULL' will result in 1 extra recent row (beyond those required for lag creation) to be used." ---
- An extra row from where, for what? --- I think I understand now after a while of thinking about the name and looking at the implementation, but things could be clearer.
- Why would this be expected to work so much that it's a special value?
- It looks like "those required for lag creation" corresponds to
max_lags + max_horizon
, but factors intokeep
inx <- dplyr::filter(x, forecast_date - time_value <= keep)
, which- if
forecast_date
is the default maxtime_value
, seems like more than is required for lag creation. - if
forecast_date
has been passed as the actual forecast date, makes it seem thatmax_horizon
should be something around the reporting latency instead, although maybe this is off by one sincekeep
is at minimummin_required + 1
notmin_required
. - (Also, this type of filter calls into question the naming
n_recent
.)
- if
- Actually, trying to use this, I get
out <- arx_forecaster(
case_death_rate_subset, "death_rate",
"death_rate",
args_list = arx_args_list(
ahead = 16,
lags = 7,
n_training = 30,
nafill_buffer = NULL
)
)
#> Error in if (is.finite(nafill_buffer)) arg_is_pos_int(nafill_buffer, allow_null = TRUE) :
#> argument is of length zero
- ("For this reason, setting 'nafill_buffer < min(lags)' will be treated as additional allowed recent data rather than the total amount of recent data to examine" --- still sounds like a footgun to have < min(lags) and >= min(lags) do different things.)
- Note/remind that this NA filling is not used in the training process?
- For forecasters that don't use lags, provide an explanation that doesn't mention lags?
Metadata
Metadata
Assignees
Labels
No labels