Skip to content

min_train_window, quantile_by_key args are not used #169

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
brookslogan opened this issue Apr 26, 2023 · 6 comments
Open

min_train_window, quantile_by_key args are not used #169

brookslogan opened this issue Apr 26, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@brookslogan
Copy link
Contributor

Grepping around, @dshemetov and I were unable to see min_train_window or quantile_by_key actually used by arx_forecaster, arx_classifier, flatline_forecaster, or any other forecaster I missed. We did find:

  • There might be a PR for the min_train_window part; see comment here.
  • quantile_by_key docs reference layer_residual_quantiles()' by_key parameter, but I can't see it actually being forwarded along.
@brookslogan brookslogan added the bug Something isn't working label Apr 26, 2023
@brookslogan
Copy link
Contributor Author

Some thoughts on naming things here and how they should function.

Naming:

  • @dshemetov, can you please add your suggestions?
  • max_n_recent, min_n_recent
  • max_n_times_per_epikey, min_n_times_per_epikey

Functionality:

  • Seems best to line up the ideas of training window and min train window, so if they request a min & max of 30, it won't drop stuff / error / etc. if one location is a little behind on reporting. (Something slightly suboptimal / a tradeoff: if say VI is behind on reporting, then its training time set is different than all the others. But I think we want to take this tradeoff.)
  • Can't think of other actually-competing options.

@brookslogan
Copy link
Contributor Author

This also ties into discussions about alternative interpretations of n in epix_slide, or making epipredict naturally act on epi_archives. Right now, epiprocess main has a different interpretation with epix_slide's n parameter, as does the dev branch's replacement, the before parameter.

@brookslogan
Copy link
Contributor Author

The min training window would also be useful as a step, or as part of step_training_window.

@brookslogan
Copy link
Contributor Author

A potential bigger tradeoff we're taking with a max_n_recent, min_n_recent approach is potential incompatibility with how things are treated at test time. At train time, we're happy to go search for training data farther in the past. But at test time (since we're fitting one model across all locations/etc.), we are being more demanding: we don't output forecasts for locations that have the expected predictor data missing. Not sure there's a better alternative though.

@brookslogan
Copy link
Contributor Author

brookslogan commented Apr 27, 2023

And another trade-off: it'll be harder to get epix_slide to feed an appropriate time window of data in using its current approach. But we were considering changing how it does time windowing anyway. [On current epiprocess dev branch: we could just use before = 365000L to get everything, but that's not efficient. So then we have to think about what the max number of additional times might be grabbed by this n_recent approach.]

@dshemetov
Copy link
Contributor

  • I like min_n_recent and max_n_recent
  • min_n_recent being mostly informational / for debugging sounds good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants