Added support for prediction intervals for VARMAX regressor #4267

Merged
christopherbunn merged 5 commits into main from TML-7894_pred_int_varmax
Aug 14, 2023

Conversation

christopherbunn
Contributor

Resolves #4262

@codecov

codecov bot commented Aug 9, 2023

Codecov Report

Merging #4267 (2890f04) into main (daa8568) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #4267     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        355     355             
  Lines      38915   38956     +41     
=======================================
+ Hits       38794   38835     +41     
  Misses       121     121             
Files Changed                                            Coverage Δ
...tors/regressors/exponential_smoothing_regressor.py    100.0% <100.0%> (ø)
...mponents/estimators/regressors/varmax_regressor.py    100.0% <100.0%> (ø)
...lml/tests/component_tests/test_varmax_regressor.py    100.0% <100.0%> (ø)

Collaborator

@jeremyliweishih jeremyliweishih left a comment


LGTM

Contributor

@eccabay eccabay left a comment


Looks solid! Just a few questions and some testing suggestions

# anchor represents where the simulations should start from (forecasting is done from the "end")
y_pred = self._component_obj._fitted_forecaster.simulate(
    nsimulations=X.shape[0],
    repetitions=400,
Contributor


Why is this fixed at 400?

Contributor Author


This implementation is based on the one we have for exponential smoothing and this is the value that is set there. Do you think we should have it passed in as a parameter?

Contributor


Hmm, poking around in our exponential smoother and statsmodels' docs on the subject, it's unclear to me why this was set at 400. I think at least setting it as a constant would be good, since the number seems arbitrary.

Contributor Author

@christopherbunn christopherbunn Aug 14, 2023


Sounds good, will update to include _N_REPETITIONS=400

@@ -217,9 +217,43 @@ def get_prediction_intervals(
Returns:
dict: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
Contributor


I think this needs to be updated, since the return here will be a nested, per-series dictionary. Do I have that correct?

Contributor Author


Yep, updated the doc string!
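
For illustration, the nested return shape described here might look like the sketch below. The thread does not show the final structure, so this layout is an assumption: an outer key per series, and inner keys in the `{coverage}_lower` / `{coverage}_upper` format from the docstring. The `nest_intervals` helper, the series names, and the numbers are made up.

```python
def nest_intervals(bounds_by_series):
    """Build {series: {"<coverage>_lower": [...], "<coverage>_upper": [...]}}."""
    result = {}
    for series, per_coverage in bounds_by_series.items():
        series_result = {}
        for coverage, (lower, upper) in per_coverage.items():
            series_result[f"{coverage}_lower"] = lower
            series_result[f"{coverage}_upper"] = upper
        result[series] = series_result
    return result


# Hypothetical bounds for two series at a single coverage level.
bounds = {
    "series_a": {0.95: ([1.0, 1.5], [3.0, 3.5])},
    "series_b": {0.95: ([-2.0, -2.5], [0.0, 0.5])},
}
intervals = nest_intervals(bounds)
```

A caller would then read one bound with something like `intervals["series_a"]["0.95_lower"]`.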

)
prediction_interval_result = {}
for series in self._component_obj._fitted_forecaster.model.endog_names:
Contributor


What are endog_names, where do those come from? Is that the columns of y in unstacked/dataframe format?

Contributor Author


Yes, they are, and they are set internally in statsmodels during the fit process. I can add a comment describing this. I don't think there's a better way to access this info, other than storing it as a class variable during the fit process?

Comment on lines 318 to 327
@pytest.mark.parametrize("use_covariates", [True, False])
def test_varmax_regressor_prediction_intervals(use_covariates, ts_multiseries_data):
    X_train, X_test, y_train = ts_multiseries_data(no_features=not use_covariates)
Contributor


I think an interesting test here would be to check the cases where X is None and use_covariates is True, and where X is not None and use_covariates is False. We have lots of checks for those cases, so it'd be nice to ensure we handle them smoothly.

Contributor Author


To clarify, you mean the cases where X is passed in fit() but not in get_prediction_intervals(), right? I can add that case in!


@christopherbunn christopherbunn merged commit 3572300 into main Aug 14, 2023
24 checks passed
@christopherbunn christopherbunn deleted the TML-7894_pred_int_varmax branch August 14, 2023 18:03
Development

Successfully merging this pull request may close these issues.

Prediction interval support for VARMAX
3 participants