Anomaly Detection (anomaly model, scorer, detector, aggregator) #1256

Merged: 151 commits, merged on Dec 22, 2022

Commits
85c7a7d
AD first Ver
Oct 4, 2022
5ac2fd5
AD first Version
Oct 4, 2022
5743d58
added ForecastingAnomalyModel/FilteringAnomalyModel, and scorers: Kme…
Oct 5, 2022
2450e9a
implemented GaussianMixtureScorer and allow multiple scorer inputs
Oct 7, 2022
900b95f
Added comments and possibility to input a list of scorers in AnomalyM…
Oct 10, 2022
8408c8f
Clean whitespace
Oct 10, 2022
e9880e1
Clean whitespace2
Oct 10, 2022
6b65a0e
Clean whitespace2
Oct 10, 2022
626da3d
Clean whitespace with VScode
Oct 10, 2022
887f933
Merge branch 'master' into feat/anomaly_detection_API
julien12234 Oct 14, 2022
babe8c7
Changed diff() position and added characteristic_length parameters
Oct 14, 2022
cd48097
renamed submodule
hrzn Oct 15, 2022
7d0b369
small changes
hrzn Oct 15, 2022
56eaf9d
small improvements
hrzn Oct 16, 2022
1c0d4f4
small changes
hrzn Oct 16, 2022
72799b0
Accepts all types UTS, MTS, list(UTS or MTS)
Oct 28, 2022
7f20166
move _diff() in child, so that scorers have all the same signature
Oct 28, 2022
7a37038
replaced L1, L2, and Abs_diff with Norm
Oct 31, 2022
e6a72da
add component_wise to WassersteinScorer
Oct 31, 2022
b117bd7
add component_wise to Kmeans
Oct 31, 2022
c63cef8
add component_wise to LOF
Oct 31, 2022
247001c
add component_wise to GaussianMixture
Oct 31, 2022
9782272
Accept num_samples for probabilistic models forecasting
Nov 3, 2022
729a1d9
Minor changes
Nov 4, 2022
d8fb10f
add comments, add likelihood
Nov 9, 2022
8661d6d
add laplace, + window parameter + parameter alllow_retrain
Nov 10, 2022
3b83b11
add cauchy and gamma likelihood
Nov 11, 2022
71f12dc
add utils.py, detectors, aggregators
julien12234 Nov 14, 2022
71e3a6e
removed show function for now
julien12234 Nov 14, 2022
e17a046
add show_anomalies() and show_anomalies_from_scores()
julien12234 Nov 16, 2022
c0bd73f
small changes
julien12234 Nov 16, 2022
a34c479
Merge branch 'master' of github.com:unit8co/darts into feat/anomaly_d…
hrzn Nov 20, 2022
c89c1c2
Merge branch 'master' into feat/anomaly_detection_API
hrzn Nov 21, 2022
fc29b78
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Nov 21, 2022
ca551e3
Some docstring improvements to AnomalyModels
hrzn Nov 21, 2022
1f8c52a
corrected Kmeans, LFO and Gaussian Scorer + added input from PR
julien12234 Nov 22, 2022
fa0618e
test commit
julien12234 Nov 22, 2022
003febd
negative LFO and gaussian
julien12234 Nov 23, 2022
45757b4
Merge branch 'master' into feat/anomaly_detection_API
hrzn Nov 25, 2022
b63e65c
Merge branch 'master' into feat/anomaly_detection_API
julien12234 Nov 28, 2022
7683bae
pre pull
julien12234 Nov 28, 2022
323fbc1
from prediciton structure
julien12234 Nov 28, 2022
0765d49
improved show_anomalies, changed structure _from_prediction
julien12234 Nov 30, 2022
8ffd0c1
small mistake in eval_accuracy in utils.py
julien12234 Nov 30, 2022
c8bea7f
return type of eval_acc
julien12234 Dec 1, 2022
17f5d68
changed way eval_acc returns in anomaly_model
julien12234 Dec 1, 2022
fa46383
added test for agg, dect, and scorers. upgrade agg trainable
julien12234 Dec 2, 2022
92013d1
added parameter return_UTS, and added test for scorers and anomaly_model
julien12234 Dec 3, 2022
124a221
small mistake in anomaly_model
julien12234 Dec 3, 2022
441bf24
New structure in files
julien12234 Dec 6, 2022
c3a56f5
Added warnings
julien12234 Dec 7, 2022
9987ad6
small change in wasserstein
julien12234 Dec 8, 2022
0a8b3f7
Merge branch 'master' into feat/anomaly_detection_API
hrzn Dec 9, 2022
d294740
filtering_am and forecasting_am
julien12234 Dec 9, 2022
c3efb69
Small improvements
hrzn Dec 9, 2022
13a365d
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 9, 2022
b5ad2fa
Fix test names
hrzn Dec 9, 2022
7f1582f
add pyod to requirements
hrzn Dec 9, 2022
6d771b2
rename scorers
hrzn Dec 9, 2022
7c02b1d
scorers imports
hrzn Dec 9, 2022
5c67ce2
Changed handling of kwargs in AD models
hrzn Dec 9, 2022
6948b61
update tests
hrzn Dec 9, 2022
3f1b21e
return single TimeSeries from score() in some cases
hrzn Dec 10, 2022
ec6baf4
small naming improvements
hrzn Dec 10, 2022
5e6f65e
Some improvements to anomaly models
hrzn Dec 10, 2022
68d388d
Small improvements to scorers
hrzn Dec 11, 2022
11ee748
Some small improvements
hrzn Dec 11, 2022
0d3464b
Fix tests
hrzn Dec 11, 2022
40aa67f
Merge branch 'master' into feat/anomaly_detection_API
hrzn Dec 12, 2022
e7640d7
Norm scorer docstring
hrzn Dec 12, 2022
d6e79af
test toy example agg and detectors
julien12234 Dec 12, 2022
7f5c30b
small docstring improvements
hrzn Dec 12, 2022
759ec9a
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 12, 2022
b9e50fb
Add vectorization todos
hrzn Dec 12, 2022
0695719
test toy example scorers
julien12234 Dec 12, 2022
a76759a
test toy example scorers
julien12234 Dec 12, 2022
b0047c5
test toy example PyOD
julien12234 Dec 13, 2022
0b09b8f
test toy example NLL scorers
julien12234 Dec 13, 2022
3f1ddf6
test toy example poisson nll scorer
julien12234 Dec 13, 2022
fcd623e
test toy example univariate anomaly_models
julien12234 Dec 13, 2022
b0d405a
test toy example univariate covariates forecasting_anomaly_models
julien12234 Dec 13, 2022
af4a489
update threshold detector docstring
hrzn Dec 13, 2022
34ca833
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 13, 2022
6df9ed6
change way to output string messages
julien12234 Dec 13, 2022
e3f486c
first implementation of julien H's PR review
julien12234 Dec 13, 2022
c160234
first implementation of julien H's PR review 2
julien12234 Dec 13, 2022
f745a45
anomaly_model forecasting multivariate test
julien12234 Dec 14, 2022
457ad5b
anomaly_model multivariate, w=1,2, len()=2 test for NLL scorers
julien12234 Dec 14, 2022
b40c7c7
changed NLL scorers: call scipy.stats function
julien12234 Dec 14, 2022
6d9279f
changed in anomaly_models (inner to outer for series and scorers)
julien12234 Dec 14, 2022
7b80625
Small changes to PyOD detector
hrzn Dec 15, 2022
4164c57
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 15, 2022
d5cc56a
improvements to wasserstein scorer docstring
hrzn Dec 15, 2022
10c8d6b
change in eval acc
julien12234 Dec 15, 2022
5cdfc6d
change in eval acc, new function _eval_accuracy_from_scores
julien12234 Dec 15, 2022
cb59cd4
Small improvements to aggregators
hrzn Dec 15, 2022
10a79ab
Small docstrings improvements
hrzn Dec 15, 2022
a435f6a
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 15, 2022
7a0f5e7
Utils docstring
hrzn Dec 15, 2022
33aeea4
change in detectors (vectorization and accepts list of param if multi…
julien12234 Dec 15, 2022
afd7c61
remove exp in PyODScorer... and updated test
julien12234 Dec 15, 2022
080e0f9
new test with np.testing
julien12234 Dec 15, 2022
883d587
agg accept only MTS or sequence of MTS
julien12234 Dec 16, 2022
409f215
removed old detectors
julien12234 Dec 16, 2022
005003a
new multivariate test for filtering anomaly model
julien12234 Dec 16, 2022
fa7f271
small changes to utils docstrings
hrzn Dec 16, 2022
aef8127
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 16, 2022
a4b8e53
test assert_array_almost_equal decimal 2
julien12234 Dec 16, 2022
81554b0
test assert_array_almost_equal decimal 1
julien12234 Dec 16, 2022
0e70677
test assert_array_almost_equal decimal 1
julien12234 Dec 16, 2022
7a7b553
second implementation of julien H's PR review
julien12234 Dec 16, 2022
1f1a9b1
vectorization of NLL scorers
julien12234 Dec 16, 2022
09a8ac6
problem with test_univariate_FilteringAnomalyModel
julien12234 Dec 16, 2022
107ebaf
replace abs by __abs__ in test_univariate_covariate_ForecastingAnomal…
julien12234 Dec 16, 2022
6a5bed4
Merge branch 'master' into feat/anomaly_detection_API
hrzn Dec 16, 2022
cb2a127
replace abs by __abs__ in ALL test_univariate_covariate_ForecastingAn…
julien12234 Dec 17, 2022
4a9619b
Increase coverage of scorers tests
hrzn Dec 20, 2022
816377b
Imports in submodules
hrzn Dec 20, 2022
323bda8
Some improvements to utils
hrzn Dec 20, 2022
edad060
Some improvements
hrzn Dec 20, 2022
8ea8ea7
significant rework of quantile detector
hrzn Dec 21, 2022
f4ef944
Rework threshold detector
hrzn Dec 21, 2022
ff62c3a
Rework NLL scorers
hrzn Dec 22, 2022
7e59cea
Rename NLL scorers files
hrzn Dec 22, 2022
ca9efc3
vectorize windowing in k-means
hrzn Dec 22, 2022
b1a73d9
vectorization of windowing in PyOD and Wasserstein
hrzn Dec 22, 2022
6041a67
Docstring improvements
hrzn Dec 22, 2022
b545d18
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
96206fa
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
5899096
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
3213467
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
18f9f49
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
2b8c9a9
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
eb714ef
Update darts/ad/anomaly_model/__init__.py
hrzn Dec 22, 2022
8e0a488
Update darts/ad/anomaly_model/__init__.py
hrzn Dec 22, 2022
22d9474
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
46b7603
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
18613c1
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
eb00de2
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
cbcbe1b
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
5a45fe6
Update darts/ad/scorers/__init__.py
hrzn Dec 22, 2022
b5ffd08
Update darts/ad/scorers/scorers.py
hrzn Dec 22, 2022
f597d21
Update darts/ad/scorers/scorers.py
hrzn Dec 22, 2022
b366367
Update darts/ad/scorers/scorers.py
hrzn Dec 22, 2022
ffbd101
Update darts/ad/scorers/kmeans_scorer.py
hrzn Dec 22, 2022
986aa2a
Merge branch 'master' into feat/anomaly_detection_API
hrzn Dec 22, 2022
f362b5a
PR comments
hrzn Dec 22, 2022
05f4378
Formatting
hrzn Dec 22, 2022
d5195ab
Update darts/ad/scorers/pyod_scorer.py
hrzn Dec 22, 2022
a13edd2
Small docstring improvement
hrzn Dec 22, 2022
82b65cb
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 22, 2022

2 changes: 2 additions & 0 deletions .gitignore
@@ -19,6 +19,8 @@ coverage.xml
docs_env
.DS_Store
.gradle
*.csv
*.ipynb

# used by CI to build with latest versions of dependencies
requirements-latest.txt
353 changes: 353 additions & 0 deletions darts/anomaly_detection/anomaly_model.py
@@ -0,0 +1,353 @@
"""
AnomalyModel
------------
An anomaly model combines a model with one or more scorers: it takes a time series as input and returns its
anomaly score as a time series.

The model can be a forecasting model (ForecastingAnomalyModel) or a filtering model (FilteringAnomalyModel).
The main methods are `fit()`, `score()` and `score_metric()`. `fit()` trains the model and/or the scorer(s)
(whichever are trainable) on the history of a single time series. `score()` applies the model to the input
series, then applies the scorer(s) to the model's output and the input series, and returns the anomaly score
of the input series. `score_metric()` works like `score()`, but instead returns the value of a
threshold-agnostic metric (AUC-ROC or AUC-PR) computed between the predicted anomaly score series and a binary
ground-truth series indicating the presence of anomalies.
"""

from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Sequence, Union

from darts.anomaly_detection.score import Scorer
from darts.logging import raise_if_not
from darts.models.filtering.filtering_model import FilteringModel
from darts.models.forecasting.forecasting_model import ForecastingModel
from darts.timeseries import TimeSeries


class AnomalyModel(ABC):
    """Base class for all anomaly models."""

    @abstractmethod
    def fit(
        self, series: Union[TimeSeries, Sequence[TimeSeries]]
    ) -> "AnomalyModel":
        pass

    @abstractmethod
    def score(
        self, series: Union[TimeSeries, Sequence[TimeSeries]]
    ) -> Union[TimeSeries, Sequence[TimeSeries]]:
        pass

    @abstractmethod
    def score_metric(
        self,
        series: Union[TimeSeries, Sequence[TimeSeries]],
        true_anomalies: Union[TimeSeries, Sequence[TimeSeries]],
    ) -> Union[float, Sequence[float]]:
        pass


class ForecastingAnomalyModel(AnomalyModel):
    def __init__(
        self, model: ForecastingModel, scorer: Union[Scorer, Sequence[Scorer]]
    ):
        """Forecasting Anomaly Model

        Parameters
        ----------
        model : ForecastingModel
            A Darts forecasting model, used to predict the actual time series.
        scorer : Scorer or Sequence[Scorer]
            A scorer used to convert the actual and predicted time series into an anomaly score
            time series. If a list of n scorers is given, the anomaly model applies each of them
            and outputs n anomaly scores.
        """
        super().__init__()

        raise_if_not(
            isinstance(model, ForecastingModel),
            f"Model must be a darts ForecastingModel, not a {type(model)}",
        )
        self.model = model

        if isinstance(scorer, Sequence):
            self.scorers = list(scorer)
        else:
            self.scorers = [scorer]

        for s in self.scorers:
            raise_if_not(
                isinstance(s, Scorer),
                f"Scorer must be a darts.anomaly_detection.score Scorer, not a {type(s)}",
            )

    def fit(
        self,
        series: TimeSeries,
        model_fit_params: Optional[Dict[str, Any]] = None,
        hist_forecasts_params: Optional[Dict[str, Any]] = None,
    ):
        """Train the forecasting model (if it has not been fitted yet) and the scorer(s) on the given time series.

        Parameters
        ----------
        series : TimeSeries
            The time series to train on.
        model_fit_params : dict, optional
            Keyword arguments passed to the forecasting model's `fit()` method.
        hist_forecasts_params : dict, optional
            Keyword arguments passed to the forecasting model's `historical_forecasts()` method.

        Returns
        -------
        self
            The fitted anomaly model (forecasting model and scorer(s)).
        """

        if model_fit_params is None:
            model_fit_params = {}

        if hist_forecasts_params is None:
            hist_forecasts_params = {}

        # fit the forecasting model, unless it has already been fitted
        if hasattr(self.model, "fit") and not self.model._fit_called:
            self.model.fit(series, **model_fit_params)

        # fit the trainable scorer(s); the historical forecasts are computed once and shared
        pred = None
        for scorer in self.scorers:
            if hasattr(scorer, "fit"):
                if pred is None:
                    pred = self.model.historical_forecasts(
                        series, retrain=False, **hist_forecasts_params
                    )
                scorer.fit(pred, series)

        return self

    def score(
        self, series: TimeSeries, hist_forecasts_params: Optional[Dict[str, Any]] = None
    ):
        """Predict the given time series with the forecasting model, then apply the scorer(s)
        to the prediction and the given time series. Returns the anomaly score(s) of the
        given time series.

        Parameters
        ----------
        series : TimeSeries
            The time series to score.
        hist_forecasts_params : dict, optional
            Keyword arguments passed to the forecasting model's `historical_forecasts()` method.

        Returns
        -------
        TimeSeries or list of TimeSeries
            The anomaly score time series; a list with one entry per scorer if several scorers were given.
        """

        if hist_forecasts_params is None:
            hist_forecasts_params = {}

        raise_if_not(
            self.model._fit_called,
            f"Model {self.model} has not been trained. Please call .fit()",
        )

        pred = self.model.historical_forecasts(
            series, retrain=False, **hist_forecasts_params
        )

        anomaly_scores = [scorer.compute(pred, series) for scorer in self.scorers]

        # return a single TimeSeries when there is only one scorer, otherwise a list
        if len(anomaly_scores) == 1:
            return anomaly_scores[0]
        return anomaly_scores

    def score_metric(
        self,
        series: TimeSeries,
        true_anomalies: TimeSeries,
        hist_forecasts_params: Optional[Dict[str, Any]] = None,
        metric: str = "AUC_ROC",
    ):
        """Predict the given time series with the forecasting model, then apply the scorer(s)
        to the prediction and the given time series. Returns the value(s) of a threshold-agnostic
        metric computed from the anomaly score(s) given by the scorer(s).

        Parameters
        ----------
        series : TimeSeries
            The time series to score.
        true_anomalies : TimeSeries
            Binary ground truth of the anomalies (1 if it is an anomaly, 0 otherwise).
        hist_forecasts_params : dict, optional
            Keyword arguments passed to the forecasting model's `historical_forecasts()` method.
        metric : str
            The metric to use: 'AUC_ROC' (default) or 'AUC_PR'.

        Returns
        -------
        float or list of float
            The metric value; a list with one entry per scorer if several scorers were given.
        """

        if hist_forecasts_params is None:
            hist_forecasts_params = {}

        raise_if_not(
            self.model._fit_called,
            f"Model {self.model} has not been trained. Please call .fit()",
        )

        pred = self.model.historical_forecasts(
            series, retrain=False, **hist_forecasts_params
        )

        metric_values = [
            scorer.compute_score(true_anomalies, pred, series, metric)
            for scorer in self.scorers
        ]

        if len(metric_values) == 1:
            return metric_values[0]
        return metric_values


class FilteringAnomalyModel(AnomalyModel):
    def __init__(self, filter: FilteringModel, scorer: Union[Scorer, Sequence[Scorer]]):
        """Filtering Anomaly Model

        Parameters
        ----------
        filter : FilteringModel
            A Darts filtering model, used to filter the actual time series.
        scorer : Scorer or Sequence[Scorer]
            A scorer used to convert the actual and filtered time series into an anomaly score
            time series. If a list of n scorers is given, the anomaly model applies each of them
            and outputs n anomaly scores.
        """

        super().__init__()

        raise_if_not(
            isinstance(filter, FilteringModel),
            f"Filter must be a darts FilteringModel, not a {type(filter)}",
        )
        self.filter = filter

        if isinstance(scorer, Sequence):
            self.scorers = list(scorer)
        else:
            self.scorers = [scorer]

        for s in self.scorers:
            raise_if_not(
                isinstance(s, Scorer),
                f"Scorer must be a darts.anomaly_detection.score Scorer, not a {type(s)}",
            )

    def fit(
        self, series: TimeSeries, filter_fit_params: Optional[Dict[str, Any]] = None
    ):
        """Train the filter and the scorer(s) on the given time series.

        Parameters
        ----------
        series : TimeSeries
            The time series to train on.
        filter_fit_params : dict, optional
            Keyword arguments passed to the filtering model's `fit()` method.

        Returns
        -------
        self
            The fitted anomaly model (filtering model and scorer(s)).
        """

        if filter_fit_params is None:
            filter_fit_params = {}

        # fit the filtering model, if it is trainable
        if hasattr(self.filter, "fit"):
            # TODO: check if filter is already fitted (for now fit it regardless -> only Kalman)
            self.filter.fit(series, **filter_fit_params)

        # fit the trainable scorer(s); the filtered series is computed once and shared
        pred = None
        for scorer in self.scorers:
            if hasattr(scorer, "fit"):
                if pred is None:
                    pred = self.filter.filter(series)
                scorer.fit(pred, series)

        return self

    def score(self, series: TimeSeries, filter_params: Optional[Dict[str, Any]] = None):
        """Filter the given time series with the filtering model, then apply the scorer(s)
        to the filtered series and the given time series. Returns the anomaly score(s) of
        the given time series.

        Parameters
        ----------
        series : TimeSeries
            The time series to score.
        filter_params : dict, optional
            Keyword arguments passed to the filtering model's `filter()` method.

        Returns
        -------
        TimeSeries or list of TimeSeries
            The anomaly score time series; a list with one entry per scorer if several scorers were given.
        """

        if filter_params is None:
            filter_params = {}

        pred = self.filter.filter(series, **filter_params)

        anomaly_scores = [scorer.compute(pred, series) for scorer in self.scorers]

        # return a single TimeSeries when there is only one scorer, otherwise a list
        if len(anomaly_scores) == 1:
            return anomaly_scores[0]
        return anomaly_scores

    def score_metric(
        self,
        series: TimeSeries,
        true_anomalies: TimeSeries,
        filter_params: Optional[Dict[str, Any]] = None,
        metric: str = "AUC_ROC",
    ):
        """Filter the given time series with the filtering model, then apply the scorer(s)
        to the filtered series and the given time series. Returns the value(s) of a
        threshold-agnostic metric computed from the anomaly score(s) given by the scorer(s).

        Parameters
        ----------
        series : TimeSeries
            The time series to score.
        true_anomalies : TimeSeries
            Binary ground truth of the anomalies (1 if it is an anomaly, 0 otherwise).
        filter_params : dict, optional
            Keyword arguments passed to the filtering model's `filter()` method.
        metric : str
            The metric to use: 'AUC_ROC' (default) or 'AUC_PR'.

        Returns
        -------
        float or list of float
            The metric value; a list with one entry per scorer if several scorers were given.
        """
        if filter_params is None:
            filter_params = {}

        pred = self.filter.filter(series, **filter_params)

        metric_values = [
            scorer.compute_score(true_anomalies, pred, series, metric)
            for scorer in self.scorers
        ]

        if len(metric_values) == 1:
            return metric_values[0]
        return metric_values
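The filter-then-score flow of `FilteringAnomalyModel.score()` can be illustrated without Darts. The following is a minimal, self-contained sketch on plain Python lists, assuming a trailing moving average as a stand-in for a Darts `FilteringModel` and simple difference scorers as stand-ins for a `Scorer`. All names here (`moving_average`, `AbsDiffScorer`, `SquaredDiffScorer`, `ToyFilteringAnomalyModel`) are hypothetical and not part of the PR's API. The sketch also mirrors the convention that `score()` returns a single result for one scorer and a list otherwise.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Hypothetical stand-in for a filtering model: trailing moving average.

    The first (window - 1) points average over the values available so far.
    """
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


class AbsDiffScorer:
    """Hypothetical non-trainable scorer: |actual - filtered| at each step."""

    def compute(self, pred: Sequence[float], actual: Sequence[float]) -> List[float]:
        return [abs(a - p) for a, p in zip(actual, pred)]


class SquaredDiffScorer:
    """Hypothetical non-trainable scorer: (actual - filtered)^2 at each step."""

    def compute(self, pred: Sequence[float], actual: Sequence[float]) -> List[float]:
        return [(a - p) ** 2 for a, p in zip(actual, pred)]


class ToyFilteringAnomalyModel:
    """Sketch of the filter -> scorer(s) pipeline (not the Darts class)."""

    def __init__(self, window, scorer):
        self.window = window
        # accept a single scorer or a list of scorers, like the PR's models
        self.scorers = list(scorer) if isinstance(scorer, (list, tuple)) else [scorer]

    def score(self, series: Sequence[float]):
        filtered = moving_average(series, self.window)
        scores = [s.compute(filtered, series) for s in self.scorers]
        # one scorer -> one score series; several scorers -> a list of them
        return scores[0] if len(scores) == 1 else scores


if __name__ == "__main__":
    series = [1.0, 1.0, 1.0, 10.0, 1.0, 1.0]
    print(ToyFilteringAnomalyModel(3, AbsDiffScorer()).score(series))
    # [0.0, 0.0, 0.0, 6.0, 3.0, 3.0]
```

The spike at index 3 gets the highest score, while the two following points are also raised because the spike is still inside the filter's window; a Kalman filter or another smoother would spread the effect differently.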
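`score_metric()` reduces an anomaly score series and a binary ground-truth series to a single threshold-agnostic number. As a hedged illustration of what AUC-ROC measures (the PR itself delegates the computation to each scorer's `compute_score()`), here is a dependency-free sketch: AUC-ROC is the probability that a randomly chosen anomalous time step gets a higher score than a randomly chosen normal one, with ties counted as one half. The function name `auc_roc` is hypothetical.

```python
from typing import Sequence


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC-ROC via pairwise comparison (equivalent to the Mann-Whitney U statistic).

    scores: anomaly score at each time step (higher = more anomalous)
    labels: binary ground truth at each time step (1 = anomaly, 0 = normal)
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one anomalous and one normal point")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.0, 0.0, 0.0, 6.0, 3.0, 3.0]
    print(auc_roc(scores, [0, 0, 0, 1, 0, 0]))  # 1.0
    print(auc_roc(scores, [0, 0, 0, 1, 1, 0]))  # 0.9375
```

The pairwise loop is quadratic in the series length; for long series a sort-based computation, or `sklearn.metrics.roc_auc_score`, is the more practical choice.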
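The other metric accepted by `score_metric()`, AUC-PR, summarizes the precision-recall curve, which is often more informative than AUC-ROC when anomalies are rare. One common estimate is average precision: rank the time steps by decreasing anomaly score and average the precision measured at each true anomaly. This is a sketch that assumes distinct scores (tied scores would need to be grouped, as `sklearn.metrics.average_precision_score` does); the name `average_precision` is hypothetical, and other AUC-PR estimators such as trapezoidal integration can give slightly different values.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Average precision, a common estimate of the area under the precision-recall curve.

    Assumes distinct scores; tied scores would need to be grouped.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("AUC-PR needs at least one anomalous point")
    # rank time steps from most to least anomalous
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            # precision at this cutoff, counted once per recovered anomaly
            precision_sum += true_pos / rank
    return precision_sum / n_pos


if __name__ == "__main__":
    print(average_precision([0.1, 0.9, 0.3, 0.8], [0, 1, 0, 1]))  # 1.0
```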
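In both `fit()` methods, scorers that expose a `fit` method are trained on the model's output for the training series (`scorer.fit(pred, series)`) before being used in `score()`. As a hedged sketch of what such a trainable scorer might learn, assuming a simple z-score of the residuals (this is not one of the PR's scorers, and `ZScoreScorer` is a hypothetical name), `fit` stores the mean and standard deviation of the training residuals and `compute` standardizes new residuals with them.

```python
import statistics
from typing import List, Sequence


class ZScoreScorer:
    """Hypothetical trainable scorer: |residual - mean| / std, with mean and std learned in fit()."""

    def __init__(self):
        self.mean = None
        self.std = None

    def fit(self, pred: Sequence[float], actual: Sequence[float]) -> "ZScoreScorer":
        residuals = [a - p for a, p in zip(actual, pred)]
        self.mean = statistics.mean(residuals)
        # population standard deviation; fall back to 1.0 for constant residuals
        self.std = statistics.pstdev(residuals) or 1.0
        return self

    def compute(self, pred: Sequence[float], actual: Sequence[float]) -> List[float]:
        if self.mean is None:
            raise RuntimeError("call fit() before compute()")
        return [abs((a - p) - self.mean) / self.std for a, p in zip(actual, pred)]


if __name__ == "__main__":
    scorer = ZScoreScorer().fit([0.0, 0.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0])
    print(scorer.compute([0.0, 0.0], [3.0, 0.5]))  # [3.0, 0.5]
```

Fitting on the training residuals means the score is expressed relative to the model's usual error level, so the same threshold can be reused across series with different noise levels.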
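`ForecastingAnomalyModel.score()` calls `historical_forecasts(series, retrain=False, ...)`: the already-fitted model produces forecasts only for the later part of the series, so the forecast and the input series cover different time spans and a scorer has to compare them on their common time steps. The sketch below illustrates that alignment with a hypothetical naive "repeat the last value" forecaster on plain dicts mapping time step to value. `naive_historical_forecasts` and `align` are illustrative names, and `start` only plays a role similar to the `start` parameter of Darts' `historical_forecasts`; it is not meant to reproduce its exact semantics.

```python
from typing import Dict, List, Tuple


def naive_historical_forecasts(series: Dict[int, float], start: int) -> Dict[int, float]:
    """One-step-ahead forecasts from a fixed (not retrained) naive model.

    For every time step t >= start, the forecast is the observed value at the previous step.
    """
    times = sorted(series)
    return {t: series[prev] for prev, t in zip(times, times[1:]) if t >= start}


def align(
    pred: Dict[int, float], actual: Dict[int, float]
) -> Tuple[List[int], List[float], List[float]]:
    """Keep only the time steps present in both series, as a scorer must."""
    common = sorted(set(pred) & set(actual))
    return common, [pred[t] for t in common], [actual[t] for t in common]


if __name__ == "__main__":
    series = {0: 1.0, 1: 1.0, 2: 1.0, 3: 10.0, 4: 1.0}
    pred = naive_historical_forecasts(series, start=2)
    times, p, a = align(pred, series)
    print(times, [abs(x - y) for x, y in zip(a, p)])  # [2, 3, 4] [0.0, 9.0, 9.0]
```

The naive forecaster also flags the step right after the spike, because its forecast for that step repeats the spike; a better forecasting model reduces this kind of echo.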