Feat/conformal prediction #2552 (Open)

dennisbader wants to merge 67 commits into master
Conversation

@dennisbader (Collaborator) commented Oct 3, 2024

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Fixes #1704, fixes #2161.

Short Summary

  • Adds the first two conformal prediction models: ConformalNaiveModel and ConformalQRModel (read more below).
  • Adds 3 new quantile interval metrics (plus their aggregated versions):
    • Interval Winkler Score iws(), and Mean Interval Winkler Scores miws() (time-aggregated) (source)
    • Interval Coverage ic() (binary metric indicating whether the observation lies within the quantile interval), and Mean Interval Coverage mic() (time-aggregated)
    • Interval Non-Conformity Score for Quantile Regression incs_qr(), and Mean Interval Non-Conformity Score for Quantile Regression mincs_qr() (time-aggregated) (source)
  • Adds support for overlap_end=True in ForecastingModel.residuals(). This computes historical forecasts and residuals that can extend beyond the end of the target series. With this, all returned residual values have the same length per forecast (the last residuals will contain missing values if the forecasts extend beyond the end of the target series); see the sketch below.
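
A rough sketch of the new behavior (assuming a fitted Darts forecasting model `model` and its target `series`; all parameters other than `overlap_end` are part of the existing `residuals()` API):

```python
# Sketch: residuals whose forecasts may extend beyond the end of `series`.
residuals = model.residuals(
    series,
    forecast_horizon=3,
    last_points_only=False,  # one residual series per historical forecast
    overlap_end=True,        # allow forecasts to extend past the series end
)
# All returned residual series now have the same length (3 here); values for
# time steps beyond the end of `series` are missing (NaN).
```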

Summary

Adds the first conformal prediction models to Darts. Conformal models can be applied to any of Darts' global forecasting models, as long as the model has been fitted beforehand. In general, the workflow to produce one forecast/prediction is as follows (a minimal usage sketch follows the list):

  • Extract a calibration set:
    The number of calibration examples from the most recent past to use for one conformal prediction can be defined at model creation with parameter cal_length. To make your life simpler, we support two modes:
    • Automatic extraction of the calibration set from the past of your input series (series, past_covariates, ...). This is the default mode, and the predict/forecast/backtest/... API is identical to that of any other forecasting model.
    • Supply a fixed calibration set with parameters cal_series, cal_past_covariates, ... .
  • Generate historical forecasts on the calibration set (using the forecasting model)
  • Compute the errors/non-conformity scores (specific to each conformal model) on these historical forecasts
  • Compute the quantile values from the errors / non-conformity scores (using the desired quantiles set at model creation with parameter quantiles).
  • Compute the conformal prediction: Add the calibrated intervals to (or adjust the existing intervals of) the forecasting model's predictions.
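
A minimal end-to-end sketch of this workflow, based on the API shown in this PR (the dataset, the regression model, and the chosen `quantiles`/`cal_length` values are illustrative assumptions, as is the `darts.models` import path):

```python
from darts.datasets import AirPassengersDataset
from darts.models import ConformalNaiveModel, LinearRegressionModel

series = AirPassengersDataset().load()

# pre-train any GlobalForecastingModel on the target series
forecasting_model = LinearRegressionModel(lags=24).fit(series)

# wrap it in a conformal model; quantiles must be centered around the median
model = ConformalNaiveModel(
    model=forecasting_model,
    quantiles=[0.05, 0.5, 0.95],
    cal_length=36,  # use the 36 most recent calibration examples
)

# the calibration set is extracted automatically from the past of `series`
pred = model.predict(n=12, series=series)
```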

Notes:

  • When computing historical_forecasts(), backtest(), residuals(), ..., the above is applied for each forecast
  • For multi-horizon forecasts, the above is applied for each step in the horizon separately
  • The focus was on keeping it as efficient as possible, using mostly "vectorized" operations

Input Support

All added conformal models support the following input (depending on the fitted forecasting model):

  • uni/multivariate target series
  • past/future/static covariates
  • single/multiple series

Forecast/Output Support

All models support the following prediction modes:

  • single/multi-horizon forecasts. For multi-horizon, the calibration process is repeated per step in the forecast horizon.
  • single/multiple quantile intervals: any number of quantile intervals is supported, as long as they are centered around the median (e.g., quantiles=[0.05, 0.2, 0.5, 0.8, 0.95]).
  • historical forecasts with expanding or rolling calibration sets with parameter cal_length (to make the algorithm adaptive)
  • direct quantile predictions using predict_likelihood_parameters=True, num_samples=1 in all prediction methods
  • sampled predictions drawn from these quantile predictions using num_samples>>1 in all prediction methods (see the sketch below)
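
A short sketch of the last two prediction modes, reusing the conformal `model` and `series` from the sketch above:

```python
# direct quantile predictions: one component per quantile
pred_quantiles = model.predict(
    n=3, series=series, predict_likelihood_parameters=True, num_samples=1
)

# stochastic samples drawn from the calibrated quantile predictions
pred_samples = model.predict(n=3, series=series, num_samples=500)
```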

Requirements to use a conformal model:

  • Any pre-trained GlobalForecastingModel (global baselines, all regression models, all torch models)
  • A long enough calibration set, depending on the forecast horizon n: it must be possible to generate at least n + cal_length historical forecasts from the calibration input series (e.g., with n=12 and cal_length=36, at least 48 historical forecasts).

Added Algorithms

Added two algorithms, each with two symmetry modes (a small numeric sketch of the naive non-conformity scores follows the list):

  • ConformalNaiveModel: Adds calibrated intervals around the median forecast from the forecasting model.
    • symmetric=True:
      • The lower and upper interval bounds are calibrated by the same magnitude.
      • Non-conformity scores: uses metric ae() (absolute error) to compute the non-conformity scores
    • symmetric=False
      • The lower and upper interval bounds are calibrated separately
      • Non-conformity scores: uses metric err() (error) to compute the non-conformity scores of the upper bounds, and -err() for the lower bounds.
  • ConformalQRModel (Conformalized Quantile Regression, source): Calibrates the quantile predictions from a probabilistic forecasting model.
    • symmetric=True:
      • The lower and upper interval bounds are calibrated by the same magnitude.
      • Non-conformity scores: uses metric incs_qr(symmetric=True) (Quantile Regression Non-Conformity Score) to compute the non-conformity scores
    • symmetric=False
      • The lower and upper interval bounds are calibrated separately
      • Non-conformity scores: uses metric incs_qr(symmetric=False) (Quantile Regression Non-Conformity Score) to compute the non-conformity scores for the upper and lower bound separately.
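
To make the two symmetry modes of ConformalNaiveModel concrete, here is a small NumPy sketch of the non-conformity scores described above (a simplified illustration rather than the PR's implementation; it assumes err() is defined as actual minus predicted and ignores finite-sample quantile corrections):

```python
import numpy as np

y_true = np.array([10.0, 12.0, 11.0, 13.0])  # calibration observations
y_pred = np.array([9.0, 13.0, 11.5, 12.0])   # median historical forecasts

# symmetric=True: one score per example, shared by both bounds (ae())
scores_sym = np.abs(y_true - y_pred)

# symmetric=False: separate scores for upper (err()) and lower (-err()) bounds
scores_upper = y_true - y_pred
scores_lower = -(y_true - y_pred)

# calibrated bounds for a (0.05, 0.95) interval in the symmetric case
q_hat = np.quantile(scores_sym, 0.90)  # 0.95 - 0.05 -> 90% target coverage
lower, upper = y_pred - q_hat, y_pred + q_hat
```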

@dennisbader dennisbader self-assigned this Oct 3, 2024

codecov bot commented Oct 15, 2024

Codecov Report

Attention: Patch coverage is 95.17820% with 23 lines in your changes missing coverage. Please review.

Project coverage is 94.12%. Comparing base (bb24999) to head (1272bfc).

Files with missing lines                       Patch %   Lines
darts/models/forecasting/conformal_models.py   94.81%    17 Missing ⚠️
darts/utils/utils.py                           89.79%     5 Missing ⚠️
darts/utils/historical_forecasts/utils.py      96.87%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2552      +/-   ##
==========================================
- Coverage   94.14%   94.12%   -0.02%     
==========================================
  Files         139      140       +1     
  Lines       14884    15311     +427     
==========================================
+ Hits        14013    14412     +399     
- Misses        871      899      +28     


@madtoinou (Collaborator) left a comment

Amazing work @dennis!

Some small comments, mostly documentation.

@@ -787,16 +790,19 @@ def merr(
Returns
-------
float
A single metric score for:
A single metric score for (with `len(q) <= 1`):

The wording is a bit confusing here; I would order the words a bit differently:

Suggested change
A single metric score for (with `len(q) <= 1`):
A single metric score (when `len(q) <= 1`) for:


- single univariate series.
- single multivariate series with `component_reduction`.
- sequence (list) of uni/multivariate series with `series_reduction` and `component_reduction`.
- a sequence (list) of uni/multivariate series with `series_reduction` and `component_reduction`.

If we add an "a" here, we should add it to the bullet points above as well to stay consistent


- the input from the `float` return case above but with `len(q) > 1`.

I don't find this bullet point very clear; do you mean that when `len(q) > 1`, the bullet points from above are also applicable here? I wonder if it wouldn't be clearer to repeat them here as well.

For time series that are overlapping in time without having the same time index, setting `True`
will consider the values only over their common time interval (intersection in time).
q_interval
The quantile interval(s) to compute the metric on. Must be a tuple (single interval) or sequence tuples

Suggested change
The quantile interval(s) to compute the metric on. Must be a tuple (single interval) or sequence tuples
The quantile interval(s) to compute the metric on. Must be a tuple (single interval) or sequence of tuples

Comment on lines +140 to +143
for q_high, q_low in zip(
self.quantiles[self.idx_median + 1 :][::-1],
self.quantiles[: self.idx_median],
)

It would be a bit simpler to reuse the list of tuples stored in self.q_interval instead of iterating over the array again.

model.predict(n=1)

pred = model.predict(n=self.horizon, series=self.ts_pass_train, **pred_lklp)
assert pred.n_components == self.ts_pass_train.n_components * 3

Suggested change
assert pred.n_components == self.ts_pass_train.n_components * 3
assert pred.n_components == self.ts_pass_train.n_components * len(kwargs["quantiles"])

len(pred_list) == 2
), f"Model {model_cls} did not return a list of prediction"
for pred, pred_fc in zip(pred_list, pred_fc_list):
assert pred.n_components == self.ts_pass_train.n_components * 3

Suggested change
assert pred.n_components == self.ts_pass_train.n_components * 3
assert pred.n_components == self.ts_pass_train.n_components * len(kwargs["quantiles"])

with pytest.raises(ValueError):
covs = cov_kwargs_train[cov_name]
covs = {cov_name: covs.stack(covs)}
_ = model.predict(n=OUT_LEN + 1, **covs, **pred_lklp)

To eliminate the other possible source of error, n should be OUT_LEN.

with pytest.raises(ValueError):
covs = cov_kwargs_notrain[cov_name]
covs = {cov_name: covs.stack(covs)}
_ = model.predict(n=OUT_LEN + 1, **covs, **pred_lklp)

n should be OUT_LEN.

model = ConformalNaiveModel(model=train_model(series), quantiles=quantiles)
# direct quantile predictions
pred_quantiles = model.predict(n=3, series=series, **pred_lklp)
# smapled predictions

Suggested change
# smapled predictions
# sampled predictions

Successfully merging this pull request may close these issues:

  • Adding bootstrapping functionnality from residuals of a model
  • Conformal Predictions