Feat/conformal prediction #2552 (Open)

dennisbader wants to merge 67 commits into master
Conversation

@dennisbader (Collaborator) commented Oct 3, 2024

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Fixes #1704, fixes #2161.

Short Summary

  • Adds the first two conformal prediction models: ConformalNaiveModel and ConformalQRModel (read more below).
  • Adds 3 new quantile interval metrics (plus their aggregated versions):
    • Interval Winkler Score iws(), and Mean Interval Winkler Scores miws() (time-aggregated) (source)
    • Interval Coverage ic() (binary metric indicating whether the observation lies within the quantile interval), and Mean Interval Coverage mic() (time-aggregated)
    • Interval Non-Conformity Score for Quantile Regression incs_qr(), and Mean Interval Non-Conformity Score for Quantile Regression mincs_qr() (time-aggregated) (source)
  • Adds support for overlap_end=True in ForecastingModel.residuals(). This computes historical forecasts and residuals that can extend beyond the end of the target series. With this, all returned residual values have the same length per forecast (the last residuals will contain missing values if the forecasts extend beyond the end of the target series); see the sketch below.
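
A rough sketch of the new behavior (assuming a fitted Darts forecasting model `model` and its target `series`; all parameters other than `overlap_end` are part of the existing `residuals()` API):

```python
# Sketch: residuals whose forecasts may extend beyond the end of `series`.
residuals = model.residuals(
    series,
    forecast_horizon=3,
    last_points_only=False,  # one residual series per historical forecast
    overlap_end=True,        # allow forecasts to extend past the series end
)
# All returned residual series now have the same length (3 here); values for
# time steps beyond the end of `series` are missing (NaN).
```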

Summary

Adds the first conformal prediction models to Darts. Conformal models can be applied to any of Darts' global forecasting models, as long as the model has been fitted beforehand. In general, the workflow to produce one forecast/prediction is as follows (a minimal usage sketch follows the list):

  • Extract a calibration set:
    The number of calibration examples from the most recent past to use for one conformal prediction can be defined at model creation with parameter cal_length. To make your life simpler, we support two modes:
    • Automatic extraction of the calibration set from the past of your input series (series, past_covariates, ...). This is the default mode, and the predict/forecast/backtest/... API is identical to that of any other forecasting model.
    • Supply a fixed calibration set with parameters cal_series, cal_past_covariates, ... .
  • Generate historical forecasts on the calibration set (using the forecasting model)
  • Compute the errors/non-conformity scores (specific to each conformal model) on these historical forecasts
  • Compute the quantile values from the errors / non-conformity scores (using the desired quantiles set at model creation with parameter quantiles).
  • Compute the conformal prediction: Add the calibrated intervals to (or adjust the existing intervals of) the forecasting model's predictions.
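
A minimal end-to-end sketch of this workflow, based on the API shown in this PR (the dataset, the regression model, and the chosen `quantiles`/`cal_length` values are illustrative assumptions, as is the `darts.models` import path):

```python
from darts.datasets import AirPassengersDataset
from darts.models import ConformalNaiveModel, LinearRegressionModel

series = AirPassengersDataset().load()

# pre-train any GlobalForecastingModel on the target series
forecasting_model = LinearRegressionModel(lags=24).fit(series)

# wrap it in a conformal model; quantiles must be centered around the median
model = ConformalNaiveModel(
    model=forecasting_model,
    quantiles=[0.05, 0.5, 0.95],
    cal_length=36,  # use the 36 most recent calibration examples
)

# the calibration set is extracted automatically from the past of `series`
pred = model.predict(n=12, series=series)
```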

Notes:

  • When computing historical_forecasts(), backtest(), residuals(), ..., the above is applied for each forecast
  • For multi-horizon forecasts, the above is applied for each step in the horizon separately
  • The focus was on keeping it as efficient as possible, using mostly "vectorized" operations

Input Support

All added conformal models support the following input (depending on the fitted forecasting model):

  • uni/multivariate target series
  • past/future/static covariates
  • single/multiple series

Forecast/Output Support

All models support the following prediction modes:

  • single/multi-horizon forecasts. For multi-horizon, the calibration process is repeated per step in the forecast horizon.
  • single/multiple quantile intervals: any number of quantile intervals is supported, as long as they are centered around the median (e.g., quantiles=[0.05, 0.2, 0.5, 0.8, 0.95]).
  • historical forecasts with expanding or rolling calibration sets with parameter cal_length (to make the algorithm adaptive)
  • direct quantile predictions using predict_likelihood_parameters=True, num_samples=1 in all prediction methods
  • sampled predictions drawn from these quantile predictions using num_samples>>1 in all prediction methods (see the sketch below)
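
A short sketch of the last two prediction modes, reusing the conformal `model` and `series` from the sketch above:

```python
# direct quantile predictions: one component per quantile
pred_quantiles = model.predict(
    n=3, series=series, predict_likelihood_parameters=True, num_samples=1
)

# stochastic samples drawn from the calibrated quantile predictions
pred_samples = model.predict(n=3, series=series, num_samples=500)
```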

Requirements to use a conformal model:

  • Any pre-trained GlobalForecastingModel (global baselines, all regression models, all torch models)
  • A long enough calibration set, depending on the forecast horizon n: it must be possible to generate at least n + cal_length historical forecasts from the calibration input series (e.g., with n=12 and cal_length=36, at least 48 historical forecasts).

Added Algorithms

Added two algorithms, each with two symmetry modes (a small numeric sketch of the naive non-conformity scores follows the list):

  • ConformalNaiveModel: Adds calibrated intervals around the median forecast from the forecasting model.
    • symmetric=True:
      • The lower and upper interval bounds are calibrated by the same magnitude.
      • Non-conformity scores: uses metric ae() (absolute error) to compute the non-conformity scores
    • symmetric=False
      • The lower and upper interval bounds are calibrated separately
      • Non-conformity scores: uses metric err() (error) to compute the non-conformity scores of the upper bounds, and -err() for the lower bounds.
  • ConformalQRModel (Conformalized Quantile Regression, source): Calibrates the quantile predictions from a probabilistic forecasting model.
    • symmetric=True:
      • The lower and upper interval bounds are calibrated by the same magnitude.
      • Non-conformity scores: uses metric incs_qr(symmetric=True) (Quantile Regression Non-Conformity Score) to compute the non-conformity scores
    • symmetric=False
      • The lower and upper interval bounds are calibrated separately
      • Non-conformity scores: uses metric incs_qr(symmetric=False) (Quantile Regression Non-Conformity Score) to compute the non-conformity scores for the upper and lower bound separately.
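
To make the two symmetry modes of ConformalNaiveModel concrete, here is a small NumPy sketch of the non-conformity scores described above (a simplified illustration rather than the PR's implementation; it assumes err() is defined as actual minus predicted and ignores finite-sample quantile corrections):

```python
import numpy as np

y_true = np.array([10.0, 12.0, 11.0, 13.0])  # calibration observations
y_pred = np.array([9.0, 13.0, 11.5, 12.0])   # median historical forecasts

# symmetric=True: one score per example, shared by both bounds (ae())
scores_sym = np.abs(y_true - y_pred)

# symmetric=False: separate scores for upper (err()) and lower (-err()) bounds
scores_upper = y_true - y_pred
scores_lower = -(y_true - y_pred)

# calibrated bounds for a (0.05, 0.95) interval in the symmetric case
q_hat = np.quantile(scores_sym, 0.90)  # 0.95 - 0.05 -> 90% target coverage
lower, upper = y_pred - q_hat, y_pred + q_hat
```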

@dennisbader dennisbader self-assigned this Oct 3, 2024

codecov bot commented Oct 15, 2024

Codecov Report

Attention: Patch coverage is 95.17820% with 23 lines in your changes missing coverage. Please review.

Project coverage is 94.12%. Comparing base (bb24999) to head (1272bfc).

Files with missing lines                       Patch %   Lines
darts/models/forecasting/conformal_models.py   94.81%    17 Missing ⚠️
darts/utils/utils.py                           89.79%     5 Missing ⚠️
darts/utils/historical_forecasts/utils.py      96.87%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2552      +/-   ##
==========================================
- Coverage   94.14%   94.12%   -0.02%     
==========================================
  Files         139      140       +1     
  Lines       14884    15311     +427     
==========================================
+ Hits        14013    14412     +399     
- Misses        871      899      +28     


@madtoinou (Collaborator) left a comment

Amazing work @dennis!

Some small comments, mostly documentation.

@@ -787,16 +790,19 @@ def merr(
Returns
-------
float
A single metric score for:
A single metric score for (with `len(q) <= 1`):

The wording is a bit confusing here; I would order the words a bit differently:

Suggested change
A single metric score for (with `len(q) <= 1`):
A single metric score (when `len(q) <= 1`) for:


- single univariate series.
- single multivariate series with `component_reduction`.
- sequence (list) of uni/multivariate series with `series_reduction` and `component_reduction`.
- a sequence (list) of uni/multivariate series with `series_reduction` and `component_reduction`.

If we add an "a" here, we should add it to the bullet points above as well to stay consistent


- the input from the `float` return case above but with `len(q) > 1`.

I don't find this bullet point very clear; do you mean that when `len(q) > 1`, the bullet points from above are also applicable here? I wonder if it wouldn't be clearer to repeat them here as well.

For time series that are overlapping in time without having the same time index, setting `True`
will consider the values only over their common time interval (intersection in time).
q_interval
The quantile interval(s) to compute the metric on. Must be a tuple (single interval) or sequence tuples

Suggested change
The quantile interval(s) to compute the metric on. Must be a tuple (single interval) or sequence tuples
The quantile interval(s) to compute the metric on. Must be a tuple (single interval) or sequence of tuples

Comment on lines +140 to +143
for q_high, q_low in zip(
self.quantiles[self.idx_median + 1 :][::-1],
self.quantiles[: self.idx_median],
)

It would be a bit simpler to reuse the list of tuples stored in self.q_interval instead of iterating over the array again.

model.predict(n=1)

pred = model.predict(n=self.horizon, series=self.ts_pass_train, **pred_lklp)
assert pred.n_components == self.ts_pass_train.n_components * 3

Suggested change
assert pred.n_components == self.ts_pass_train.n_components * 3
assert pred.n_components == self.ts_pass_train.n_components * len(kwargs["quantiles"])

len(pred_list) == 2
), f"Model {model_cls} did not return a list of prediction"
for pred, pred_fc in zip(pred_list, pred_fc_list):
assert pred.n_components == self.ts_pass_train.n_components * 3

Suggested change
assert pred.n_components == self.ts_pass_train.n_components * 3
assert pred.n_components == self.ts_pass_train.n_components * len(kwargs["quantiles"])

with pytest.raises(ValueError):
covs = cov_kwargs_train[cov_name]
covs = {cov_name: covs.stack(covs)}
_ = model.predict(n=OUT_LEN + 1, **covs, **pred_lklp)

To eliminate the other possible source of error, n should be OUT_LEN.

with pytest.raises(ValueError):
covs = cov_kwargs_notrain[cov_name]
covs = {cov_name: covs.stack(covs)}
_ = model.predict(n=OUT_LEN + 1, **covs, **pred_lklp)

n should be OUT_LEN.

model = ConformalNaiveModel(model=train_model(series), quantiles=quantiles)
# direct quantile predictions
pred_quantiles = model.predict(n=3, series=series, **pred_lklp)
# smapled predictions

Suggested change
# smapled predictions
# sampled predictions

Successfully merging this pull request may close these issues:

  • Adding bootstrapping functionnality from residuals of a model
  • Conformal Predictions