Anomaly Detection (anomaly model, scorer, detector, aggregator) #1256

Merged
merged 151 commits into from
Dec 22, 2022
85c7a7d
AD first Ver
Oct 4, 2022
5ac2fd5
AD first Version
Oct 4, 2022
5743d58
added ForecastingAnomalyModel/FilteringAnomalyModel, and scorers: Kme…
Oct 5, 2022
2450e9a
implemented GaussianMixtureScorer and allow multiple scorer inputs
Oct 7, 2022
900b95f
Added comments and possibility to input a list of scorers in AnomalyM…
Oct 10, 2022
8408c8f
Clean whitespace
Oct 10, 2022
e9880e1
Clean whitespace2
Oct 10, 2022
6b65a0e
Clean whitespace2
Oct 10, 2022
626da3d
Clean whitespace with VScode
Oct 10, 2022
887f933
Merge branch 'master' into feat/anomaly_detection_API
julien12234 Oct 14, 2022
babe8c7
Changed diff() position and added characteristic_length parameters
Oct 14, 2022
cd48097
renamed submodule
hrzn Oct 15, 2022
7d0b369
small changes
hrzn Oct 15, 2022
56eaf9d
small improvements
hrzn Oct 16, 2022
1c0d4f4
small changes
hrzn Oct 16, 2022
72799b0
Accepts all types UTS, MTS, list(UTS or MTS)
Oct 28, 2022
7f20166
move _diff() in child, so that scorers have all the same signature
Oct 28, 2022
7a37038
replaced L1, L2, and Abs_diff with Norm
Oct 31, 2022
e6a72da
add component_wise to WassersteinScorer
Oct 31, 2022
b117bd7
add component_wise to Kmeans
Oct 31, 2022
c63cef8
add component_wise to LOF
Oct 31, 2022
247001c
add component_wise to GaussianMixture
Oct 31, 2022
9782272
Accept num_samples for probabilistic models forecasting
Nov 3, 2022
729a1d9
Minor changes
Nov 4, 2022
d8fb10f
add comments, add likelihood
Nov 9, 2022
8661d6d
add laplace, + window parameter + parameter alllow_retrain
Nov 10, 2022
3b83b11
add cauchy and gamma likelihood
Nov 11, 2022
71f12dc
add utils.py, detectors, aggregators
julien12234 Nov 14, 2022
71e3a6e
removed show function for now
julien12234 Nov 14, 2022
e17a046
add show_anomalies() and show_anomalies_from_scores()
julien12234 Nov 16, 2022
c0bd73f
small changes
julien12234 Nov 16, 2022
a34c479
Merge branch 'master' of github.com:unit8co/darts into feat/anomaly_d…
hrzn Nov 20, 2022
c89c1c2
Merge branch 'master' into feat/anomaly_detection_API
hrzn Nov 21, 2022
fc29b78
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Nov 21, 2022
ca551e3
Some docstring improvements to AnomalyModels
hrzn Nov 21, 2022
1f8c52a
corrected Kmeans, LFO and Gaussian Scorer + added input from PR
julien12234 Nov 22, 2022
fa0618e
test commit
julien12234 Nov 22, 2022
003febd
negative LFO and gaussian
julien12234 Nov 23, 2022
45757b4
Merge branch 'master' into feat/anomaly_detection_API
hrzn Nov 25, 2022
b63e65c
Merge branch 'master' into feat/anomaly_detection_API
julien12234 Nov 28, 2022
7683bae
pre pull
julien12234 Nov 28, 2022
323fbc1
from prediciton structure
julien12234 Nov 28, 2022
0765d49
improved show_anomalies, changed structure _from_prediction
julien12234 Nov 30, 2022
8ffd0c1
small mistake in eval_accuracy in utils.py
julien12234 Nov 30, 2022
c8bea7f
return type of eval_acc
julien12234 Dec 1, 2022
17f5d68
changed way eval_acc returns in anomaly_model
julien12234 Dec 1, 2022
fa46383
added test for agg, dect, and scorers. upgrade agg trainable
julien12234 Dec 2, 2022
92013d1
added parameter return_UTS, and added test for scorers and anomaly_model
julien12234 Dec 3, 2022
124a221
small mistake in anomaly_model
julien12234 Dec 3, 2022
441bf24
New structure in files
julien12234 Dec 6, 2022
c3a56f5
Added warnings
julien12234 Dec 7, 2022
9987ad6
small change in wasserstein
julien12234 Dec 8, 2022
0a8b3f7
Merge branch 'master' into feat/anomaly_detection_API
hrzn Dec 9, 2022
d294740
filtering_am and forecasting_am
julien12234 Dec 9, 2022
c3efb69
Small improvements
hrzn Dec 9, 2022
13a365d
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 9, 2022
b5ad2fa
Fix test names
hrzn Dec 9, 2022
7f1582f
add pyod to requirements
hrzn Dec 9, 2022
6d771b2
rename scorers
hrzn Dec 9, 2022
7c02b1d
scorers imports
hrzn Dec 9, 2022
5c67ce2
Changed handling of kwargs in AD models
hrzn Dec 9, 2022
6948b61
update tests
hrzn Dec 9, 2022
3f1b21e
return single TimeSeries from score() in some cases
hrzn Dec 10, 2022
ec6baf4
small naming improvements
hrzn Dec 10, 2022
5e6f65e
Some improvements to anomaly models
hrzn Dec 10, 2022
68d388d
Small improvements to scorers
hrzn Dec 11, 2022
11ee748
Some small improvements
hrzn Dec 11, 2022
0d3464b
Fix tests
hrzn Dec 11, 2022
40aa67f
Merge branch 'master' into feat/anomaly_detection_API
hrzn Dec 12, 2022
e7640d7
Norm scorer docstring
hrzn Dec 12, 2022
d6e79af
test toy example agg and detectors
julien12234 Dec 12, 2022
7f5c30b
small docstring improvements
hrzn Dec 12, 2022
759ec9a
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 12, 2022
b9e50fb
Add vectorization todos
hrzn Dec 12, 2022
0695719
test toy example scorers
julien12234 Dec 12, 2022
a76759a
test toy example scorers
julien12234 Dec 12, 2022
b0047c5
test toy example PyOD
julien12234 Dec 13, 2022
0b09b8f
test toy example NLL scorers
julien12234 Dec 13, 2022
3f1ddf6
test toy example poisson nll scorer
julien12234 Dec 13, 2022
fcd623e
test toy example univariate anomaly_models
julien12234 Dec 13, 2022
b0d405a
test toy example univariate covariates forecasting_anomaly_models
julien12234 Dec 13, 2022
af4a489
update threshold detector docstring
hrzn Dec 13, 2022
34ca833
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 13, 2022
6df9ed6
change way to output string messages
julien12234 Dec 13, 2022
e3f486c
first implementation of julien H's PR review
julien12234 Dec 13, 2022
c160234
first implementation of julien H's PR review 2
julien12234 Dec 13, 2022
f745a45
anomaly_model forecasting multivariate test
julien12234 Dec 14, 2022
457ad5b
anomaly_model multivariate, w=1,2, len()=2 test for NLL scorers
julien12234 Dec 14, 2022
b40c7c7
changed NLL scorers: call scipy.stats function
julien12234 Dec 14, 2022
6d9279f
changed in anomaly_models (inner to outer for series and scorers)
julien12234 Dec 14, 2022
7b80625
Small changes to PyOD detector
hrzn Dec 15, 2022
4164c57
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 15, 2022
d5cc56a
improvements to wasserstein scorer docstring
hrzn Dec 15, 2022
10c8d6b
change in eval acc
julien12234 Dec 15, 2022
5cdfc6d
change in eval acc, new function _eval_accuracy_from_scores
julien12234 Dec 15, 2022
cb59cd4
Small improvements to aggregators
hrzn Dec 15, 2022
10a79ab
Small docstrings improvements
hrzn Dec 15, 2022
a435f6a
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 15, 2022
7a0f5e7
Utils docstring
hrzn Dec 15, 2022
33aeea4
change in detectors (vectorization and accepts list of param if multi…
julien12234 Dec 15, 2022
afd7c61
remove exp in PyODScorer... and updated test
julien12234 Dec 15, 2022
080e0f9
new test with np.testing
julien12234 Dec 15, 2022
883d587
agg accept only MTS or sequence of MTS
julien12234 Dec 16, 2022
409f215
removed old detectors
julien12234 Dec 16, 2022
005003a
new multivariate test for filtering anomaly model
julien12234 Dec 16, 2022
fa7f271
small changes to utils docstrings
hrzn Dec 16, 2022
aef8127
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 16, 2022
a4b8e53
test assert_array_almost_equal decimal 2
julien12234 Dec 16, 2022
81554b0
test assert_array_almost_equal decimal 1
julien12234 Dec 16, 2022
0e70677
test assert_array_almost_equal decimal 1
julien12234 Dec 16, 2022
7a7b553
second implementation of julien H's PR review
julien12234 Dec 16, 2022
1f1a9b1
vectorization of NLL scorers
julien12234 Dec 16, 2022
09a8ac6
problem with test_univariate_FilteringAnomalyModel
julien12234 Dec 16, 2022
107ebaf
replace abs by __abs__ in test_univariate_covariate_ForecastingAnomal…
julien12234 Dec 16, 2022
6a5bed4
Merge branch 'master' into feat/anomaly_detection_API
hrzn Dec 16, 2022
cb2a127
replace abs by __abs__ in ALL test_univariate_covariate_ForecastingAn…
julien12234 Dec 17, 2022
4a9619b
Increase coverage of scorers tests
hrzn Dec 20, 2022
816377b
Imports in submodules
hrzn Dec 20, 2022
323bda8
Some improvements to utils
hrzn Dec 20, 2022
edad060
Some improvements
hrzn Dec 20, 2022
8ea8ea7
significant rework of quantile detector
hrzn Dec 21, 2022
f4ef944
Rework threshold detector
hrzn Dec 21, 2022
ff62c3a
Rework NLL scorers
hrzn Dec 22, 2022
7e59cea
Rename NLL scorers files
hrzn Dec 22, 2022
ca9efc3
vectorize windowing in k-means
hrzn Dec 22, 2022
b1a73d9
vectorization of windowing in PyOD and Wasserstein
hrzn Dec 22, 2022
6041a67
Docstring improvements
hrzn Dec 22, 2022
b545d18
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
96206fa
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
5899096
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
3213467
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
18f9f49
Update darts/ad/anomaly_model/filtering_am.py
hrzn Dec 22, 2022
2b8c9a9
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
eb714ef
Update darts/ad/anomaly_model/__init__.py
hrzn Dec 22, 2022
8e0a488
Update darts/ad/anomaly_model/__init__.py
hrzn Dec 22, 2022
22d9474
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
46b7603
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
18613c1
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
eb00de2
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
cbcbe1b
Update darts/ad/anomaly_model/forecasting_am.py
hrzn Dec 22, 2022
5a45fe6
Update darts/ad/scorers/__init__.py
hrzn Dec 22, 2022
b5ffd08
Update darts/ad/scorers/scorers.py
hrzn Dec 22, 2022
f597d21
Update darts/ad/scorers/scorers.py
hrzn Dec 22, 2022
b366367
Update darts/ad/scorers/scorers.py
hrzn Dec 22, 2022
ffbd101
Update darts/ad/scorers/kmeans_scorer.py
hrzn Dec 22, 2022
986aa2a
Merge branch 'master' into feat/anomaly_detection_API
hrzn Dec 22, 2022
f362b5a
PR comments
hrzn Dec 22, 2022
05f4378
Formatting
hrzn Dec 22, 2022
d5195ab
Update darts/ad/scorers/pyod_scorer.py
hrzn Dec 22, 2022
a13edd2
Small docstring improvement
hrzn Dec 22, 2022
82b65cb
Merge branch 'feat/anomaly_detection_API' of github.com:unit8co/darts…
hrzn Dec 22, 2022
add component_wise to WassersteinScorer
Julien Adda committed Oct 31, 2022
commit e6a72da157fb10c977369dadc8eee1237802ece3
118 changes: 94 additions & 24 deletions darts/ad/scorers.py
@@ -775,64 +775,93 @@ def _score_core(
)


class WasserteinScorer(FittableAnomalyScorer):
"""WasserteinScorer anomaly score
class WassersteinScorer(FittableAnomalyScorer):
"""WassersteinScorer anomaly score

Wrapped around the Wassertein scipy.stats functon.
Wrapped around the Wasserstein scipy.stats function.
Source code: <https://github.com/scipy/scipy/blob/v1.9.3/scipy/stats/_stats_py.py#L8675-L8749>.
"""

def __init__(self, window: Optional[int] = None, reduced_function=None) -> None:
def __init__(
self, window: Optional[int] = None, reduced_function=None, component_wise=False
) -> None:
"""
A Wassertein model is trained on the training data when the ``fit()`` method is called.
The ``score()`` method will return the wassertein distance bewteen the training distribution
A Wasserstein model is trained on the training data when the ``fit()`` method is called.
The ``score()`` method will return the Wasserstein distance between the training distribution
and the window sample distribution. Both distributions are 1D.

TODO:
- understand better the math behind the Wassertein distance (ex: when the test distribution contains
- understand better the math behind the Wasserstein distance (ex: when the test distribution contains
only one sample)
- check if there is an equivalent wassertein distance for d-D distributions (currently only accepts 1D)
- check if there is an equivalent Wasserstein distance for d-D distributions (currently only accepts 1D)

If 2 time series are given in the ``fit()`` or ``score()`` methods, a reduced function, given as a parameter
in the __init__ method (reduced_function), will be applied to transform the 2 time series into 1.
Default: "abs_diff"

component_wise is a boolean parameter in the __init__ method indicating how the model should behave with input
that is a multivariate series. If set to True, the model will treat each width/dimension of the series
independently. If set to False, the model will concatenate the widths in the considered window and compute
the score.

Training:

The input can be a series (univariate or multivariate) or a list of series. All the values will be concatenated
to form one continuous array. If the series is of length n and width d, the array will be of length n*d.
The input can be a series (univariate or multivariate) or a list of series. The elements of a list will be
concatenated to form one continuous array (by definition, all elements have the same width/dimensions).

If the series is of length n and width d, the array will be of length n*d. If component_wise is True, each
width d is treated independently and the data is stored in a list of size d.
Each element is an array of length n.

If a list of series is given of length l, each series will be reduced to an array, and the l arrays will then
be concatenated to form a continuous array of length l*d*w.
be concatenated to form a continuous array of length l*d*n. If component_wise is True, the data is stored in a
list of size d. Each element is an array of length l*n.

The array will be kept in memory, representing the training data distribution.
In practice, the series or list of series would represent residuals that can be considered independent
and identically distributed (iid).


Compute score:

The input is a series (univariate or multivariate) or a list of series.

- If the series is multivariate of width d:
- if component_wise is set to False: it will return a univariate series (width=1). It represents
the anomaly score of the entire series in the considered window at each timestamp.
- if component_wise is set to True: it will return a multivariate series of width d. Each dimension
represents the anomaly score of the corresponding dimension of the input.

- If the series is univariate, it will return a univariate series regardless of the parameter
component_wise.

A window of size w (given as a parameter named window) is rolled on the series with a stride equal to 1.
At each timestamp, the previous w values will be used to form a vector of size w * width of the series.
The Wassertein distance will be computed between this vector and the train distribution.
The Wasserstein distance will be computed between this vector and the train distribution.
The function will return a float number indicating how different these two distributions are.
The output will be a series of width 1 and length n-w+1, with n being the length of the input series.
Each value will represent how anomalous the sample of the w previous values is.

If a list is given, a for loop will iterate through the list, and the function ``_score_core()``
will be applied independently on each series.

If component_wise is set to True, the algorithm will be applied to each width independently,
and be compared to their corresponding training data samples computed in the ``fit()`` method.

Parameters
----------
window
Size of the sliding window that represents the number of samples in the testing distribution to compare
with the training distribution in the Wassertein function
with the training distribution in the Wasserstein function
reduced_function
Optionally, reduced function to use if two series are given. It will transform the two series into one.
This allows the WasserteinScorer to compute the Wassertein distance on the original series or on its
This allows the WassersteinScorer to compute the Wasserstein distance on the original series or on its
residuals (difference between the prediction and the original series).
Must be one of "abs_diff" and "diff" (defined in ``_diff()``).
Default: "abs_diff"
component_wise
Boolean value indicating if the score needs to be computed for each width/dimension independently (True)
or by concatenating the widths in the considered window to compute one score (False).
Default: False
"""

if window is None:
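The docstring's TODO asks what the distance does when the test distribution holds a single sample. A minimal sketch of the standard empirical-CDF formula for the 1D first Wasserstein distance may help here. The helper name `wasserstein_1d` is hypothetical and not part of darts or scipy; it is only meant to illustrate what `scipy.stats.wasserstein_distance` computes for unweighted samples.

```python
import numpy as np


def wasserstein_1d(u, v):
    """First Wasserstein distance between two 1D empirical samples (illustrative helper)."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    # Evaluate both empirical CDFs on the merged, sorted support.
    support = np.sort(np.concatenate([u, v]))
    gaps = np.diff(support)
    u_cdf = np.searchsorted(u, support[:-1], side="right") / u.size
    v_cdf = np.searchsorted(v, support[:-1], side="right") / v.size
    # The distance is the area between the two step functions.
    return float(np.sum(np.abs(u_cdf - v_cdf) * gaps))


# With a single test sample x, the distance reduces to mean(|u - x|).
single_sample = wasserstein_1d([1.0, 2.0, 3.0], [2.0])
# Shifting a sample by a constant gives a distance equal to that shift.
shifted = wasserstein_1d([0.0, 1.0, 3.0], [5.0, 6.0, 8.0])
```

Under this formula a one-sample window is still well defined, which is why a small `window` produces valid but noisy scores.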
@@ -845,6 +874,8 @@ def __init__(self, window: Optional[int] = None, reduced_function=None) -> None:
"(preferably higher than 10 as it is the number of samples of the test distribution)",
         )

+        self.component_wise = component_wise
+
     def __str__(self):
         return f"WasserteinScorer (window={self.window}, reduced_function={self.reduced_function})"

@@ -860,9 +891,26 @@ def _fit_core(
             list_series = self._diff_sequence(list_series_1, list_series_2)

         self._fit_called = True
-        self.training_data = np.concatenate(
-            [s.all_values(copy=False).flatten() for s in list_series]
-        )
+        self.width_trained_on = list_series[0].width
+
+        if self.component_wise and self.width_trained_on > 1:
+
+            concatenated_data = np.concatenate(
+                [s.all_values(copy=False) for s in list_series]
+            )
+
+            training_data = []
+            for width in range(self.width_trained_on):
+                training_data.append(concatenated_data[:, width].flatten())
+
+            self.training_data = training_data
+
+        else:
+            self.training_data = [
+                np.concatenate(
+                    [s.all_values(copy=False).flatten() for s in list_series]
+                )
+            ]

     def _score_core(
         self, series_1: TimeSeries, series_2: TimeSeries = None
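The training-data layout produced by `_fit_core` can be sketched with plain NumPy arrays. This assumes each array has shape `(n, d, 1)` as a stand-in for what `all_values()` returns for a deterministic series; the helper name `build_training_data` is hypothetical and only mirrors the branching in the hunk above.

```python
import numpy as np


def build_training_data(arrays, component_wise=False):
    """arrays: list of l arrays shaped (n, d, 1), standing in for `all_values()`."""
    width = arrays[0].shape[1]
    if component_wise and width > 1:
        stacked = np.concatenate(arrays)  # shape (l * n, d, 1)
        # One 1D array of length l * n per component.
        return [stacked[:, k].flatten() for k in range(width)]
    # A single pooled 1D array of length l * n * d.
    return [np.concatenate([a.flatten() for a in arrays])]


# Two series of length n=5 and width d=3.
series_list = [np.arange(15, dtype=float).reshape(5, 3, 1) for _ in range(2)]
pooled = build_training_data(series_list)
per_component = build_training_data(series_list, component_wise=True)
```

Returning a list in both branches means the scoring code can always index `training_data[k]`, with `k = 0` in the pooled case.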
@@ -873,16 +921,38 @@ def _score_core(
         else:
             series = self._diff(series_1, series_2)

+        raise_if_not(
+            self.width_trained_on == series.width,
+            f"Input must have the same width of the data used for training the Wassertein model, \
+                found width: {self.width_trained_on} and {series.width}",
+        )
+
         distance = []
-        np_series = series.all_values(copy=False).flatten()
+
+        np_series = series.all_values(copy=False)

         for i in range(len(series) - self.window + 1):
-            distance.append(
-                wasserstein_distance(
-                    self.training_data,
-                    np_series[i * series.width : (i + self.window + 1) * series.width],
+
+            temp_test = np_series[i : i + self.window + 1]
+
+            if self.component_wise:
+                width_result = []
+                for width in range(self.width_trained_on):
+                    width_result.append(
+                        wasserstein_distance(
+                            self.training_data[width], temp_test[width].flatten()
+                        )
+                    )
+
+                distance.append(width_result)
+
+            else:
+                distance.append(
+                    wasserstein_distance(
+                        self.training_data[0],
+                        temp_test.flatten(),
+                    )
                 )
-            )

         return TimeSeries.from_times_and_values(
             series._time_index[self.window - 1 :], distance
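As I read the scoring loop in this hunk, `np_series[i : i + self.window + 1]` selects `window + 1` rows rather than `window`, and `temp_test[width]` indexes the time axis rather than the component axis. A minimal sketch of the behavior the docstring describes, assuming plain NumPy arrays of shape `(n, d)` in place of `TimeSeries` (the helper name `rolling_wasserstein` is hypothetical, not part of darts):

```python
import numpy as np
from scipy.stats import wasserstein_distance


def rolling_wasserstein(train, test, window, component_wise=False):
    """Score each window of `test` (n, d) against `train` (m, d).

    Returns an array of shape (n - window + 1, d) if component_wise,
    otherwise (n - window + 1, 1).
    """
    n, d = test.shape
    scores = []
    for start in range(n - window + 1):
        chunk = test[start : start + window]  # exactly `window` rows
        if component_wise:
            # Compare each component with its own training distribution.
            scores.append(
                [wasserstein_distance(train[:, k], chunk[:, k]) for k in range(d)]
            )
        else:
            # Pool all components into one 1D sample on both sides.
            scores.append([wasserstein_distance(train.ravel(), chunk.ravel())])
    return np.asarray(scores)


train = np.zeros((20, 2))
test = np.column_stack([np.ones(10), np.full(10, 3.0)])
per_component = rolling_wasserstein(train, test, window=4, component_wise=True)
pooled = rolling_wasserstein(train, test, window=4)
```

With this toy input the component-wise scores separate the two columns, while the pooled score averages them, which is the trade-off the `component_wise` flag exposes.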