Added support for prediction intervals for VARMAX regressor #4267
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4267 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 355 355
Lines 38915 38956 +41
=======================================
+ Hits 38794 38835 +41
Misses 121 121
Force-pushed from a6b738c to c435cd1
LGTM
Looks solid! Just a few questions and some testing suggestions
# anchor represents where the simulations should start from (forecasting is done from the "end")
y_pred = self._component_obj._fitted_forecaster.simulate(
    nsimulations=X.shape[0],
    repetitions=400,
Why is this fixed at 400?
This implementation is based on the one we have for exponential smoothing, and 400 is the value set there. Do you think we should pass it in as a parameter?
Hmm, poking around in our exponential smoother and statsmodels' docs on the subject, it's unclear to me why this was set at 400. I think at least setting it as a constant would be good, since the number seems arbitrary.
Sounds good, will update to include _N_REPETITIONS=400
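For context, the idea behind the fixed repetition count is plain Monte Carlo: simulate `_N_REPETITIONS` sample paths and take empirical quantiles per step. Here is a minimal numpy sketch of that technique, independent of statsmodels; the random draws stand in for the forecaster's `simulate()` output, and the function name is hypothetical.

```python
import numpy as np

# Number of simulated sample paths; 400 mirrors the exponential smoothing
# implementation, though the value itself is essentially arbitrary.
_N_REPETITIONS = 400


def interval_from_simulations(simulations, coverage=0.95):
    """Empirical prediction interval from an (n_steps, n_repetitions) array."""
    alpha = 1 - coverage
    lower = np.quantile(simulations, alpha / 2, axis=1)
    upper = np.quantile(simulations, 1 - alpha / 2, axis=1)
    return lower, upper


rng = np.random.default_rng(0)
# Stand-in for forecaster.simulate(nsimulations=5, repetitions=_N_REPETITIONS):
# 400 simulated paths over a 5-step horizon, noise around a level of 10.
sims = rng.normal(loc=10.0, scale=1.0, size=(5, _N_REPETITIONS))
lower, upper = interval_from_simulations(sims, coverage=0.95)
```

With 400 repetitions the 2.5% and 97.5% empirical quantiles land close to the true ±1.96σ bounds, which is why a few hundred paths is usually enough in practice.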
@@ -217,9 +217,43 @@ def get_prediction_intervals(
    Returns:
        dict: Prediction intervals, keys are in the format {coverage}_lower or {coverage}_upper.
I think this needs to be updated since the return here will be a nested, per series dictionary - do I have that correct?
Yep, updated the doc string!
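To illustrate the shape being discussed, here is a hedged sketch of a nested, per-series result: one entry per target series, each holding `{coverage}_lower`/`{coverage}_upper` keys. The helper and series names are hypothetical, not the PR's actual code.

```python
def build_interval_dict(series_names, coverages, bounds):
    """Assemble a nested per-series dict from per-(series, coverage) bounds.

    bounds maps (series, coverage) -> (lower, upper) sequences.
    """
    result = {}
    for series in series_names:
        result[series] = {}
        for coverage in coverages:
            lower, upper = bounds[(series, coverage)]
            result[series][f"{coverage}_lower"] = lower
            result[series][f"{coverage}_upper"] = upper
    return result


intervals = build_interval_dict(
    ["target_a", "target_b"],
    [0.95],
    {
        ("target_a", 0.95): ([1.0, 2.0], [3.0, 4.0]),
        ("target_b", 0.95): ([0.5, 1.5], [2.5, 3.5]),
    },
)
# intervals["target_a"] holds the "0.95_lower" and "0.95_upper" keys.
```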
)
prediction_interval_result = {}
for series in self._component_obj._fitted_forecaster.model.endog_names:
What are endog_names, and where do those come from? Is that the columns of y in unstacked/dataframe format?
Yes, they are, and they are set internally in statsmodels during the fit process. I can add a comment describing this. I don't think there's a better way to access this info other than storing it as a class variable during the fit process?
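A small sketch of the relationship being described, assuming (per this thread) that statsmodels derives the names from the columns of the endogenous DataFrame passed at fit time; the DataFrame here is illustrative.

```python
import pandas as pd

# Unstacked multiseries target: one column per series.
y = pd.DataFrame({"series_a": [1.0, 2.0], "series_b": [3.0, 4.0]})

# Assumption from the thread: this is what ends up in
# _fitted_forecaster.model.endog_names after fitting on y.
endog_names = list(y.columns)
```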
@pytest.mark.parametrize("use_covariates", [True, False])
def test_varmax_regressor_prediction_intervals(use_covariates, ts_multiseries_data):
    X_train, X_test, y_train = ts_multiseries_data(no_features=not use_covariates)
I think an interesting test here would be to check the cases where X is None and use_covariates is True, and where X is not None and use_covariates is False. We have lots of checks for those cases, so it'd be nice to ensure we handle them smoothly.
To clarify, you mean the cases where X is passed in fit() but not in get_prediction_intervals(), right? I can add that case in!
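As a rough illustration of the mismatch cases under discussion, the guard below shows one way such a component might reconcile X with the use_covariates flag; the function and its behavior are hypothetical, not the PR's actual logic.

```python
def select_exog(X, use_covariates):
    """Return the exogenous data the forecaster should actually see.

    Hypothetical guard: X supplied with covariates disabled is ignored,
    and X=None with covariates enabled falls through as None so the
    caller can simulate without exogenous regressors instead of raising.
    """
    if not use_covariates:
        return None
    return X


# X given but covariates disabled: the extra features are dropped.
exog = select_exog([[1.0, 2.0]], use_covariates=False)

# X missing but covariates enabled: degrades to a no-exog forecast.
exog_missing = select_exog(None, use_covariates=True)
```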
Force-pushed from dd72b1b to 962d3e7
Force-pushed from 962d3e7 to 2890f04
Resolves #4262