Clarify documentation for and sanity-check `nafill_buffer` usage

The args list help entries for the canned forecasters, including ones that don't use lags, is:
```
nafill_buffer: At predict time, recent values of the training data are
          used to create a forecast. However, these can be 'NA' due to,
          e.g., data latency issues. By default, any missing values
          will get filled with less recent data. Setting this value to
          'NULL' will result in 1 extra recent row (beyond those
          required for lag creation) to be used. Note that we require
          at least 'min(lags)' rows of recent data per 'geo_value' to
          create a prediction. For this reason, setting 'nafill_buffer
          < min(lags)' will be treated as _additional_ allowed recent
          data rather than the total amount of recent data to examine.
```
This confused me, and I think there may be some implementation bug(s).

Requests:
- State the default value, Inf, and what it does.
- State what "regular" values (non-NULL, not < min(lags)) do.
- "Setting this value to 'NULL' will result in 1 extra recent row (beyond those required for lag creation) to be used." ---
  - An extra row from where, for what?  --- I think I understand now after a while of thinking about the name and looking at the implementation, but things could be clearer.
  - Why would this be expected to work so much that it's a special value?
  - It looks like "those required for lag creation" corresponds to `max_lags + max_horizon`, but factors into `keep` in `x <- dplyr::filter(x, forecast_date - time_value <= keep)`, which
    - if `forecast_date` is the default max `time_value`, seems like more than is required for lag creation.
    - if `forecast_date` has been passed as the actual forecast date, makes it seem that `max_horizon` should be something around the reporting latency instead, although maybe this is off by one since `keep` is at minimum `min_required + 1` not `min_required`.
    - (Also, this type of filter calls into question the naming `n_recent`.)
  - Actually, trying to use this, I get
```r
out <- arx_forecaster(
  case_death_rate_subset, "death_rate",
  "death_rate",
  args_list = arx_args_list(
    ahead = 16,
    lags = 7,
    n_training = 30,
    nafill_buffer = NULL
  )
)
#> Error in if (is.finite(nafill_buffer)) arg_is_pos_int(nafill_buffer, allow_null = TRUE) : 
#>   argument is of length zero
```
- ("For this reason, setting 'nafill_buffer < min(lags)' will be treated as _additional_ allowed recent data rather than the total amount of recent data to examine" --- still sounds like a footgun to have < min(lags) and >= min(lags) do different things.)
- Note/remind that this NA filling is not used in the training process?
- For forecasters that don't use lags, provide an explanation that doesn't mention lags?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Clarify documentation for and sanity-check `nafill_buffer` usage #320

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Clarify documentation for and sanity-check nafill_buffer usage #320

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Clarify documentation for and sanity-check `nafill_buffer` usage #320