From a699cce6037ff038f036452193312681f16c3739 Mon Sep 17 00:00:00 2001 From: Stephen0620 <41546633+Stephen0620@users.noreply.github.com> Date: Sun, 30 Jun 2019 18:12:23 -0700 Subject: [PATCH] Updated the documentation for time series (#158) * Update readme with latest feedback (#39) Updating readme with latest feedback. * Add THIRD-PARTY-NOTICES.txt and move CONTRIBUTING.md to root. (#40) * Initial checkin * Move to Hosted Mac pool * Update README.md * Manually copied naming changes over from master. * Revert "Merge remote-tracking branch 'upstream/temp/docs'" This reverts commit 93c73476e42e687c48889b58eb678b826dcbc41e, reversing changes made to 23500695a07b587f4b15420c874514940b42c74b. * Improve documentation regarding contributors. * Fix email address. * Create CODE_OF_CONDUCT.md * Update issue templates * Create PULL_REQUEST_TEMPLATE.md * Update issue templates * Update issue templates * Update issue templates * Fixing link in CONTRIBUTING.md (#44) * Update contributing.md link. (#43) * Initial checkin for ML.NET 0.7 upgrade * fix tests * put back columndropper * fix tests * Update scikit-learn links to use https instead of http * restart dotnetcore2 package work * fix build * fix mac & linux * fix build * fix build * dbg build * fix build * fix build * handle py 2.7 * handle py27 * fix py27 * fix build * fix build * fix build * ensure dependencies * ignore exceptions from ensure dependencies * up version * Update cv.py add case for X is data frame * Update cv.py add a space * add a test for cv with data frame * set DOTNET_SYSTEM_GLOBALIZATION_INVARIANT to true to fix app domain error * fix build * up version * Add instructions for editing docstrings. (#51) * Add instructions for editing docstrings. * Add footnote giving more information. * Fix build failures caused by dotnetcore2 module. (#67) * Fix importing of the dotnetcore2 module because it has inconsistent folder naming. * Fix file check for unix platforms. * Fix indentation levels. 
* Reduce number of build legs for PR validations and add nightly build definition with more robust build matrix. (#69) * Increase version to 0.6.5. (#71) * Update clr helper function to search multiple folders for clr binaries. (#72) * Update clr helper function to search multiple folders for clr binaries. * Moved responsibility for Python version checking to utility functions. * Add clarifying comments. * Fix call to get_nimbusml_libs() * fix drop column param name * Remove restricted permissions on build.sh script. * Fix lightgbm test failures by updating runtime dependencies. * fix TensorFlowScorer model_location parameter name * Fix build.sh defaults so that it detects when running on a mac. * Since OneHotHashVectorizer is broken for output kind Key in ML.NET 0.7, use ToKey() for unit tests * fix tests * fix pyproj test * fix win 3.6 build * fix comments * expose "parallel" to the fit/fit_transform function by including **param to the argument * add a test for the parallel * update parallel thread * fix tests comparison * Update thread, retry build * modify tests * specify pytest-cov version * update pytest-cov version in build command for linux * for windows use the latest pytest-cov * Enabled strong naming for DotNetBridge.dll (to be used for InternalsVisibleTo in ML.NET) * Changed the keys to be the same as other internal repos * Changed the key filename * Update to ML.NET 0.10.preview (#77) * Updating ML.NET nugets to latest 0.9 preview. * --generate_entrypoints phase 1 * Fixed Models.CrossValidator * Updated all entrypoints * New manifest.json, picked from Monte's branch * Updated API codegen * Replace ISchema and SchemaImpl with Schema and SchemaBuilder. * Revert "Replace ISchema and SchemaImpl with Schema and SchemaBuilder." This reverts commit dcd749d6a7d13c8768a62c4b8db377b3b8d62eaf. * Refactor IRowCursor to RowCursor. * Update ML.NET version in build.csproj. 
* Update manifest.json to ml.net commit 92e762686989215ddf45d9db3f0a1c989ee54d11 * Updated RunGraph.cs to ml.net 0.10 * Refactor Vbuffer * Added override to RowCursor methods * Update to NimbusML-privileged nugets from ML.NET. * Update to Microsoft.ML namespace without Runtime. * Schema and VBuffer fixes in NativeDataInterop. * API fixes for IRandom and IsText in RmlEnvironment and NativeDataView. * Work on getting VBuffer pointers from Spans. * Some VBuffer fixes * fix some class names * Fix Register Assembly names. * Remove ML.PipelineInference * fixed more classes * Add back columndropper for backward compatibility. * Register Entrypoints assembly in environment. * Fix homebrew update problem on VS Hosted Mac images. * Updated all the nuget versions to be the same. * Attempt to fix the dataframe unit tests * Fixed test_pyproj * Optimized VBuffer changes * Changed bridge version value to 0.10 * Addressed PR comments * Simplify by using six.string_types (#89) * Simplify by using six.string_types * Force a retest * Removed ISchema from DotNetBridge (#90) * Removed ISchema * Fixed the tests * Addressed PR comments * Addressed Wei-Sheng's comments about documenting the purpose of Column.DetachedColumn. 
* add configuration for python 3.7 (#101) * add configuration for python 3.7 * fix broken unit test * Update build.sh * fix build for Windows * Linux py3.7 build * fix pytest version * upgrade pytest * fix pytest-cov version * fix isinstance(., int) for python 2.7 * build urls for Mac * final fixes * fix libomp * Removing 3.7 for now as its not in PyPI * Upgrade to ML.NET version 1.0.0 (#100) * ref v0.10 ML.NET * fix build * hook up to v0.11.0 ML.NET * fix build errors * fix build * include Microsoft.Data.DataView.dll in build * typo * remove protobuf dll * Regenerate code due to manifest changes * fix missing ep * Update to ML.NET 1.0.0-preview * fix .net build * update nuget for ML.NET * remove Data namespace dll * rollback nuget changes * move to final RC ML.NET * Regenerate classes as per updated manifest * fix maximum_number_of_iterations param name * fix parameter names * fix names * reference official v1.0 of ML.NET * fix tests * fix label column * Fix tests * fix lightgbm tests * fix OLS * fix tests * fix more tests * fix more tests * fix weight column name * more tests * fix normalized metrics * more errors * Fix CV * rename feature_column to feature_column_name * fix cv ranker * Fix lightgbm tests * fix changes due to upgrade of NGramFeaturizer * fix ngram featurizer * fix FactorizationMachine assert error * disable test which is not working now due to change in LightGbm version * fix model name * typo * handle nan in arrays * fix tests * fix tests * fix more tests * fix data type * fix AUC exception * kick the build * fix tests due to data change * fix ngram test * fix mutual info tests * copy libiomp lib * fix mac build * disable SymSgdNative for now * disable SymSgdBinary classifier tests for Linux * fix linux tests * fix linux tests * try linux * fix linux * skip SymSgdBinaryClassifier checks * fix entrypoint compiler * fix entry point generation * fix example tests run * fix typo * fix documentation regression * fix parameter name * fix examples * 
fix examples * fix tests * fix tests * fix linux * kick build * Fix code_fixer * fix skip take filters * fix estimator checks * Fix latest Windows build issues. (#105) * Fix build issue on Windows when VS2019 is installed. Note: The -version option could not be added directly to the FOR command due to a command script parsing issue. * Add missing arguments to fix build issue with latest version of autoflake. * Fixes #50 - summary() fails if called a second time. (#107) * Fixes #50 - summary() fails if called a second time. * Fixes #99. Do not use hardcoded file separator. (#108) Fixes #99. Do not use hard coded file separator. * Delete the cached summaries when refitting a pipeline or a predictor. (#109) * Fix build issue on Windows when VS2019 is installed. Note: The -version option could not be added directly to the FOR command due to a command script parsing issue. * Add missing arguments to fix build issue with latest version of autoflake. * Delete the cached summaries when refitting a pipeline or a predictor. Fixes #106 * Simplify the code that deletes cached summaries when calling fit. * Fix signature import error when using latest version of scikit-learn. (#116) * Fix signature import error when using latest version of scikit-learn. Fixes #111 * Move the conditional import of the signature method in to the utils package. * Package System.Drawing.Common.dll as its missing in dotnetcore2 (#120) * package System.Drawings.Common.dll as its missing in dotnetcore2 * typo * Add png for Image examples * try linux fix * rollback scikit learn version * test * debug * rollback test * rollback * fix fontconfig err * fix tests * print platform * get os names * test * test * fix linux * Upgrade the pytest-remotedata package to fix missing attribute error. (#121) * Upgrade the pytest-remotedata package to fix missing attribute error. Fixes #117 * Remove the RlsMacPy3.6 configuration from .vsts-ci.yml. 
* Upgrade version (#122) * package System.Drawings.Common.dll as its missing in dotnetcore2 * typo * Add png for Image examples * try linux fix * rollback scikit learn version * test * debug * rollback test * rollback * fix fontconfig err * fix tests * print platform * get os names * test * test * fix linux * Upgrade version * Support quoted strings by default (#124) * upgrade to ML.NET 1.1 (#126) * upgrade to ML.NET 1.1 * by default quote is + * assert changes due to quote * fix tensor flow example * Put long running tests in to their own folder to shorten build times. (#136) * Temporarily remove the dataframe examples from the test run to see how much that effects the test length. * Remove all examples from the tests to see how it impacts the CI run. * Put long running tests in to their own folder to shorten build times. * Update nimbusml.pyproj to reflect the newly moved test files. Forgot to save the nimbusml.pyproj in visual studio. * Expose ML.NET SSA & IID spike & changepoint detectors. (#135) * Initial creation of the IidSpikeDetector files to see what works and what doesn't. * Import the Microsoft.ML.TimeSeries assembly in to the project. * Use 'PassAs' in manifest.json to fix the source parameter name. * Use float32 for data dtype in IidSpikeDetector example. * Convert IidSpikeDetector to a standard transform. Add examples and tests. * Add pre-transform to IidSpikeDetector to fix incompatible data types. * Fix issues with the test_estimator_checks IidSpikeDetector tests. * Remove unnecessary TypeConverter import in IidSpikeDetector example. * Initial implementation of IidChangePointDetector. * Initial implementation of SsaSpikeDetector. * Initial implementation of SsaChangePointDetector. * Fix incorrect SsaSpikeDetector instance in test_estimator_checks. * Fix a few minor issues with time series unit tests and examples. 
(#139) * Skip Image.py and Image_df.py tests for Ubuntu 14 (#149) * * Fixed the script for generating the documentation (#144) * Moved _static to ci_script to solve an error while using sphinx * Removed amek_md.bat and merge the commands of it to make_yaml.bat * Moved metrics.rst to concepts * Rename time_series package to timeseries. (#150) * Initial checkin * Move to Hosted Mac pool * Manually copied naming changes over from master. * merge master to temp/docs for updating the documentation (#134) * merge master to documentation branch * fixed the ModuleNotFoundError for WordEmbedding_df.py * Merge branch 'documentation' into temp/docs (#143) * merge master to documentation branch * fixed the ModuleNotFoundError for WordEmbedding_df.py * Fixed the issue when generating the documentation guide and concepts * Moved _static to the right folder, and change PY36 to PY37 now * Made it work with Python3.6 * Put long running tests in to their own folder to shorten build times. (#136) * Temporarily remove the dataframe examples from the test run to see how much that effects the test length. * Remove all examples from the tests to see how it impacts the CI run. * Put long running tests in to their own folder to shorten build times. * Update nimbusml.pyproj to reflect the newly moved test files. Forgot to save the nimbusml.pyproj in visual studio. 
* Added underscores to the files of time series --- src/python/nimbusml.pyproj | 10 +- .../timeseries/_iidchangepointdetector.py | 107 +++++++++++++ .../core/timeseries/_iidspikedetector.py | 91 +++++++++++ .../timeseries/_ssachangepointdetector.py | 138 ++++++++++++++++ .../core/timeseries/_ssaspikedetector.py | 129 +++++++++++++++ src/python/nimbusml/timeseries/__init__.py | 10 +- .../timeseries/_iidchangepointdetector.py | 119 ++++++++++++++ .../nimbusml/timeseries/_iidspikedetector.py | 101 ++++++++++++ .../timeseries/_ssachangepointdetector.py | 147 ++++++++++++++++++ .../nimbusml/timeseries/_ssaspikedetector.py | 136 ++++++++++++++++ 10 files changed, 978 insertions(+), 10 deletions(-) create mode 100644 src/python/nimbusml/internal/core/timeseries/_iidchangepointdetector.py create mode 100644 src/python/nimbusml/internal/core/timeseries/_iidspikedetector.py create mode 100644 src/python/nimbusml/internal/core/timeseries/_ssachangepointdetector.py create mode 100644 src/python/nimbusml/internal/core/timeseries/_ssaspikedetector.py create mode 100644 src/python/nimbusml/timeseries/_iidchangepointdetector.py create mode 100644 src/python/nimbusml/timeseries/_iidspikedetector.py create mode 100644 src/python/nimbusml/timeseries/_ssachangepointdetector.py create mode 100644 src/python/nimbusml/timeseries/_ssaspikedetector.py diff --git a/src/python/nimbusml.pyproj b/src/python/nimbusml.pyproj index 6c3542a6..47809618 100644 --- a/src/python/nimbusml.pyproj +++ b/src/python/nimbusml.pyproj @@ -592,11 +592,11 @@ - - - - - + + + + + diff --git a/src/python/nimbusml/internal/core/timeseries/_iidchangepointdetector.py b/src/python/nimbusml/internal/core/timeseries/_iidchangepointdetector.py new file mode 100644 index 00000000..ae874a1c --- /dev/null +++ b/src/python/nimbusml/internal/core/timeseries/_iidchangepointdetector.py @@ -0,0 +1,107 @@ +# -------------------------------------------------------------------------------------------- +# Copyright (c) Microsoft 
Corporation. All rights reserved. +# Licensed under the MIT License. +# -------------------------------------------------------------------------------------------- +# - Generated by tools/entrypoint_compiler.py: do not edit by hand +""" +IidChangePointDetector +""" + +__all__ = ["IidChangePointDetector"] + + +from ...entrypoints.timeseriesprocessingentrypoints_iidchangepointdetector import \ + timeseriesprocessingentrypoints_iidchangepointdetector +from ...utils.utils import trace +from ..base_pipeline_item import BasePipelineItem, DefaultSignature + + +class IidChangePointDetector(BasePipelineItem, DefaultSignature): + """ + + This transform detects the change-points in an i.i.d. sequence using + adaptive kernel density estimation and martingales. + + .. remarks:: + ``IIDChangePointDetector`` assumes a sequence of data points that are + independently sampled from one + stationary distribution. `Adaptive kernel density estimation + `_ + is used to model the distribution. + + This transform detects + change points by calculating the martingale score for the sliding + window based on the estimated distribution. + The idea is based on the `Exchangeability + Martingales `_ that + detects a change of distribution over a stream of i.i.d. values. In + short, the value of the + martingale score starts increasing significantly when a sequence of + small p-values are detected in a row; this + indicates the change of the distribution of the underlying data + generation process. + + :param confidence: The confidence for change point detection in the range + [0, 100]. Used to set the threshold of the martingale score for + triggering alert. + + :param change_history_length: The length of the sliding window on p-value + for computing the martingale score. + + :param martingale: The type of martingale betting function used for + computing the martingale score. Available options are {``Power``, + ``Mixture``}. 
+ + :param power_martingale_epsilon: The epsilon parameter for the Power + martingale if martingale is set to ``Power``. + + :param params: Additional arguments sent to compute engine. + + .. seealso:: + :py:func:`IIDSpikeDetector + `, + :py:func:`SsaSpikeDetector + `, + :py:func:`SsaChangePointDetector + `. + + .. index:: models, timeseries, transform + + Example: + .. literalinclude:: + /../nimbusml/examples/IidSpikeChangePointDetector.py + :language: python + """ + + @trace + def __init__( + self, + confidence=95.0, + change_history_length=20, + martingale='Power', + power_martingale_epsilon=0.1, + **params): + BasePipelineItem.__init__( + self, type='transform', **params) + + self.confidence = confidence + self.change_history_length = change_history_length + self.martingale = martingale + self.power_martingale_epsilon = power_martingale_epsilon + + @property + def _entrypoint(self): + return timeseriesprocessingentrypoints_iidchangepointdetector + + @trace + def _get_node(self, **all_args): + algo_args = dict( + source=self.source, + name=self._name_or_source, + confidence=self.confidence, + change_history_length=self.change_history_length, + martingale=self.martingale, + power_martingale_epsilon=self.power_martingale_epsilon) + + all_args.update(algo_args) + return self._entrypoint(**all_args) diff --git a/src/python/nimbusml/internal/core/timeseries/_iidspikedetector.py b/src/python/nimbusml/internal/core/timeseries/_iidspikedetector.py new file mode 100644 index 00000000..00712d77 --- /dev/null +++ b/src/python/nimbusml/internal/core/timeseries/_iidspikedetector.py @@ -0,0 +1,91 @@ +# -------------------------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. 
+# -------------------------------------------------------------------------------------------- +# - Generated by tools/entrypoint_compiler.py: do not edit by hand +""" +IidSpikeDetector +""" + +__all__ = ["IidSpikeDetector"] + + +from ...entrypoints.timeseriesprocessingentrypoints_iidspikedetector import \ + timeseriesprocessingentrypoints_iidspikedetector +from ...utils.utils import trace +from ..base_pipeline_item import BasePipelineItem, DefaultSignature + + +class IidSpikeDetector(BasePipelineItem, DefaultSignature): + """ + + This transform detects the spikes in an i.i.d. sequence using adaptive + kernel density estimation. + + .. remarks:: + ``IIDSpikeDetector`` assumes a sequence of data points that are + independently sampled from one stationary + distribution. `Adaptive kernel density estimation + `_ + is used to model the distribution. + The `p-value score + indicates the likelihood of the current observation according to + the estimated distribution. The lower its value, the more likely the + current point is an outlier. + + :param confidence: The confidence for spike detection in the range [0, + 100]. + + :param side: The argument that determines whether to detect positive or + negative anomalies, or both. Available options are {``Positive``, + ``Negative``, ``TwoSided``}. + + :param pvalue_history_length: The size of the sliding window for computing + the p-value. + + :param params: Additional arguments sent to compute engine. + + .. seealso:: + :py:func:`IIDChangePointDetector + `, + :py:func:`SsaSpikeDetector + `, + :py:func:`SsaChangePointDetector + `. + + .. index:: models, timeseries, transform + + Example: + .. 
literalinclude:: /../nimbusml/examples/IidSpikePointDetector.py + :language: python + """ + + @trace + def __init__( + self, + confidence=99.0, + side='TwoSided', + pvalue_history_length=100, + **params): + BasePipelineItem.__init__( + self, type='transform', **params) + + self.confidence = confidence + self.side = side + self.pvalue_history_length = pvalue_history_length + + @property + def _entrypoint(self): + return timeseriesprocessingentrypoints_iidspikedetector + + @trace + def _get_node(self, **all_args): + algo_args = dict( + source=self.source, + name=self._name_or_source, + confidence=self.confidence, + side=self.side, + pvalue_history_length=self.pvalue_history_length) + + all_args.update(algo_args) + return self._entrypoint(**all_args) diff --git a/src/python/nimbusml/internal/core/timeseries/_ssachangepointdetector.py b/src/python/nimbusml/internal/core/timeseries/_ssachangepointdetector.py new file mode 100644 index 00000000..297fae42 --- /dev/null +++ b/src/python/nimbusml/internal/core/timeseries/_ssachangepointdetector.py @@ -0,0 +1,138 @@ +# -------------------------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. +# -------------------------------------------------------------------------------------------- +# - Generated by tools/entrypoint_compiler.py: do not edit by hand +""" +SsaChangePointDetector +""" + +__all__ = ["SsaChangePointDetector"] + + +from ...entrypoints.timeseriesprocessingentrypoints_ssachangepointdetector import \ + timeseriesprocessingentrypoints_ssachangepointdetector +from ...utils.utils import trace +from ..base_pipeline_item import BasePipelineItem, DefaultSignature + + +class SsaChangePointDetector(BasePipelineItem, DefaultSignature): + """ + + This transform detects the change-points in a seasonal time-series + using Singular Spectrum Analysis (SSA). + + .. 
remarks:: + `Singular Spectrum Analysis (SSA) + `_ is a + powerful framework for decomposing the time-series into trend, + seasonality and noise components as well as forecasting the future + values of the time-series. In order to remove the + effect of such components on anomaly detection, this transform adds + SSA as a time-series modeler component in the detection pipeline. + + The SSA component will be trained and it predicts the next expected + value on the time-series under normal condition; this expected value + is + further used to calculate the amount of deviation from the normal + behavior at that timestamp. + The distribution of this deviation is then modeled using `Adaptive + kernel density estimation + `_. + + This transform detects + change points by calculating the martingale score for the sliding + window based on the estimated distribution of deviations. + The idea is based on the `Exchangeability + Martingales `_ that + detects a change of distribution over a stream of i.i.d. values. In + short, the value of the + martingale score starts increasing significantly when a sequence of + small p-values are detected in a row; this + indicates the change of the distribution of the underlying data + generation process. + + :param training_window_size: The number of points, N, from the beginning + of the sequence used to train the SSA model. + + :param confidence: The confidence for change point detection in the range + [0, 100]. + + :param seasonal_window_size: An upper bound, L, on the largest relevant + seasonality in the input time-series, which also + determines the order of the autoregression of SSA. It must satisfy 2 + < L < N/2. + + :param change_history_length: The length of the sliding window on p-value + for computing the martingale score. + + :param error_function: The function used to compute the error between the + expected and the observed value. 
Possible values are: + {``SignedDifference``, ``AbsoluteDifference``, ``SignedProportion``, + ``AbsoluteProportion``, ``SquaredDifference``}. + + :param martingale: The type of martingale betting function used for + computing the martingale score. Available options are {``Power``, + ``Mixture``}. + + :param power_martingale_epsilon: The epsilon parameter for the Power + martingale if martingale is set to ``Power``. + + :param params: Additional arguments sent to compute engine. + + .. seealso:: + :py:func:`IIDChangePointDetector + `, + :py:func:`IIDSpikeDetector + `, + :py:func:`SsaSpikeDetector + `. + + .. index:: models, timeseries, transform + + Example: + .. literalinclude:: /../nimbusml/examples/SsaChangePointDetector.py + :language: python + """ + + @trace + def __init__( + self, + training_window_size=100, + confidence=95.0, + seasonal_window_size=10, + change_history_length=20, + error_function='SignedDifference', + martingale='Power', + power_martingale_epsilon=0.1, + **params): + BasePipelineItem.__init__( + self, type='transform', **params) + + self.training_window_size = training_window_size + self.confidence = confidence + self.seasonal_window_size = seasonal_window_size + self.change_history_length = change_history_length + self.error_function = error_function + self.martingale = martingale + self.power_martingale_epsilon = power_martingale_epsilon + + @property + def _entrypoint(self): + return timeseriesprocessingentrypoints_ssachangepointdetector + + @trace + def _get_node(self, **all_args): + algo_args = dict( + source=self.source, + name=self._name_or_source, + training_window_size=self.training_window_size, + confidence=self.confidence, + seasonal_window_size=self.seasonal_window_size, + change_history_length=self.change_history_length, + error_function=self.error_function, + martingale=self.martingale, + power_martingale_epsilon=self.power_martingale_epsilon) + + all_args.update(algo_args) + return self._entrypoint(**all_args) diff --git 
a/src/python/nimbusml/internal/core/timeseries/_ssaspikedetector.py b/src/python/nimbusml/internal/core/timeseries/_ssaspikedetector.py new file mode 100644 index 00000000..6a1097f8 --- /dev/null +++ b/src/python/nimbusml/internal/core/timeseries/_ssaspikedetector.py @@ -0,0 +1,129 @@ +# -------------------------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. +# -------------------------------------------------------------------------------------------- +# - Generated by tools/entrypoint_compiler.py: do not edit by hand +""" +SsaSpikeDetector +""" + +__all__ = ["SsaSpikeDetector"] + + +from ...entrypoints.timeseriesprocessingentrypoints_ssaspikedetector import \ + timeseriesprocessingentrypoints_ssaspikedetector +from ...utils.utils import trace +from ..base_pipeline_item import BasePipelineItem, DefaultSignature + + +class SsaSpikeDetector(BasePipelineItem, DefaultSignature): + """ + + This transform detects the spikes in a seasonal time-series using + Singular Spectrum Analysis (SSA). + + .. remarks:: + `Singular Spectrum Analysis (SSA) + `_ is a + powerful + framework for decomposing the time-series into trend, seasonality and + noise components as well as forecasting + the future values of the time-series. In order to remove the effect + of such components on anomaly detection, + this transform adds SSA as a time-series modeler component in the + detection pipeline. + + The SSA component will be trained and it predicts the next expected + value on the time-series under normal condition; this expected value + is + further used to calculate the amount of deviation from the normal + (predicted) behavior at that timestamp. + The distribution of this deviation is then modeled using `Adaptive + kernel density estimation + `_. + + The `p-value score for the + current deviation is calculated based on the + estimated distribution. 
The lower its value, the more likely the + current point is an outlier. + + :param training_window_size: The number of points, N, from the beginning + of the sequence used to train the SSA + model. + + :param confidence: The confidence for spike detection in the range [0, + 100]. + + :param seasonal_window_size: An upper bound, L, on the largest relevant + seasonality in the input time-series, which + also determines the order of the autoregression of SSA. It must + satisfy 2 < L < N/2. + + :param side: The argument that determines whether to detect positive or + negative anomalies, or both. Available + options are {``Positive``, ``Negative``, ``TwoSided``}. + + :param pvalue_history_length: The size of the sliding window for computing + the p-value. + + :param error_function: The function used to compute the error between the + expected and the observed value. Possible + values are {``SignedDifference``, ``AbsoluteDifference``, + ``SignedProportion``, ``AbsoluteProportion``, + ``SquaredDifference``}. + + :param params: Additional arguments sent to compute engine. + + .. seealso:: + :py:func:`IIDChangePointDetector + `, + :py:func:`IIDSpikeDetector + `, + :py:func:`SsaChangePointDetector + `. + + .. index:: models, timeseries, transform + + Example: + .. 
literalinclude:: /../nimbusml/examples/SsaSpikeDetector.py + :language: python + """ + + @trace + def __init__( + self, + training_window_size=100, + confidence=99.0, + seasonal_window_size=10, + side='TwoSided', + pvalue_history_length=100, + error_function='SignedDifference', + **params): + BasePipelineItem.__init__( + self, type='transform', **params) + + self.training_window_size = training_window_size + self.confidence = confidence + self.seasonal_window_size = seasonal_window_size + self.side = side + self.pvalue_history_length = pvalue_history_length + self.error_function = error_function + + @property + def _entrypoint(self): + return timeseriesprocessingentrypoints_ssaspikedetector + + @trace + def _get_node(self, **all_args): + algo_args = dict( + source=self.source, + name=self._name_or_source, + training_window_size=self.training_window_size, + confidence=self.confidence, + seasonal_window_size=self.seasonal_window_size, + side=self.side, + pvalue_history_length=self.pvalue_history_length, + error_function=self.error_function) + + all_args.update(algo_args) + return self._entrypoint(**all_args) diff --git a/src/python/nimbusml/timeseries/__init__.py b/src/python/nimbusml/timeseries/__init__.py index 64e66add..626bcbc3 100644 --- a/src/python/nimbusml/timeseries/__init__.py +++ b/src/python/nimbusml/timeseries/__init__.py @@ -1,8 +1,8 @@ -from .iidspikedetector import IidSpikeDetector -from .iidchangepointdetector import IidChangePointDetector -from .ssaspikedetector import SsaSpikeDetector -from .ssachangepointdetector import SsaChangePointDetector -from .ssaforecaster import SsaForecaster +from ._iidspikedetector import IidSpikeDetector +from ._iidchangepointdetector import IidChangePointDetector +from ._ssaspikedetector import SsaSpikeDetector +from ._ssachangepointdetector import SsaChangePointDetector +from ._ssaforecaster import SsaForecaster __all__ = [ 'IidSpikeDetector', diff --git a/src/python/nimbusml/timeseries/_iidchangepointdetector.py 
b/src/python/nimbusml/timeseries/_iidchangepointdetector.py new file mode 100644 index 00000000..0df53ba7 --- /dev/null +++ b/src/python/nimbusml/timeseries/_iidchangepointdetector.py @@ -0,0 +1,119 @@ +# -------------------------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. +# -------------------------------------------------------------------------------------------- +# - Generated by tools/entrypoint_compiler.py: do not edit by hand +""" +IidChangePointDetector +""" + +__all__ = ["IidChangePointDetector"] + + +from sklearn.base import TransformerMixin + +from ..base_transform import BaseTransform +from ..internal.core.timeseries._iidchangepointdetector import \ + IidChangePointDetector as core +from ..internal.utils.utils import trace + + +class IidChangePointDetector( + core, + BaseTransform, + TransformerMixin): + """ + + This transform detects the change-points in an i.i.d. sequence using + adaptive kernel density estimation and martingales. + + .. remarks:: + ``IIDChangePointDetector`` assumes a sequence of data points that are + independently sampled from one + stationary distribution. `Adaptive kernel density estimation + `_ + is used to model the distribution. + + This transform detects + change points by calculating the martingale score for the sliding + window based on the estimated distribution. + The idea is based on the `Exchangeability + Martingales `_ that + detects a change of distribution over a stream of i.i.d. values. In + short, the value of the + martingale score starts increasing significantly when a sequence of + small p-values are detected in a row; this + indicates the change of the distribution of the underlying data + generation process. + + :param columns: see `Columns `_. + + :param confidence: The confidence for change point detection in the range + [0, 100]. 
Used to set the threshold of the martingale score for + triggering an alert. + + :param change_history_length: The length of the sliding window on p-value + for computing the martingale score. + + :param martingale: The type of martingale betting function used for + computing the martingale score. Available options are {``Power``, + ``Mixture``}. + + :param power_martingale_epsilon: The epsilon parameter for the Power + martingale if martingale is set to ``Power``. + + :param params: Additional arguments sent to the compute engine. + + .. seealso:: + :py:func:`IIDSpikeDetector + `, + :py:func:`SsaSpikeDetector + `, + :py:func:`SsaChangePointDetector + `. + + .. index:: models, timeseries, transform + + Example: + .. literalinclude:: + /../nimbusml/examples/IidSpikeChangePointDetector.py + :language: python + """ + + @trace + def __init__( + self, + confidence=95.0, + change_history_length=20, + martingale='Power', + power_martingale_epsilon=0.1, + columns=None, + **params): + + if columns: + params['columns'] = columns + BaseTransform.__init__(self, **params) + core.__init__( + self, + confidence=confidence, + change_history_length=change_history_length, + martingale=martingale, + power_martingale_epsilon=power_martingale_epsilon, + **params) + self._columns = columns + + def get_params(self, deep=False): + """ + Get the parameters for this operator. + """ + return core.get_params(self) + + def _nodes_with_presteps(self): + """ + Inserts preprocessing before this one.
+ """ + from ..preprocessing.schema import TypeConverter + return [ + TypeConverter( + result_type='R4')._steal_io(self), + self] diff --git a/src/python/nimbusml/timeseries/_iidspikedetector.py b/src/python/nimbusml/timeseries/_iidspikedetector.py new file mode 100644 index 00000000..51582ae8 --- /dev/null +++ b/src/python/nimbusml/timeseries/_iidspikedetector.py @@ -0,0 +1,101 @@ +# -------------------------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. +# -------------------------------------------------------------------------------------------- +# - Generated by tools/entrypoint_compiler.py: do not edit by hand +""" +IidSpikeDetector +""" + +__all__ = ["IidSpikeDetector"] + + +from sklearn.base import TransformerMixin + +from ..base_transform import BaseTransform +from ..internal.core.timeseries._iidspikedetector import \ + IidSpikeDetector as core +from ..internal.utils.utils import trace + + +class IidSpikeDetector(core, BaseTransform, TransformerMixin): + """ + + This transform detects the spikes in an i.i.d. sequence using adaptive + kernel density estimation. + + .. remarks:: + ``IIDSpikeDetector`` assumes a sequence of data points that are + independently sampled from one stationary + distribution. `Adaptive kernel density estimation + `_ + is used to model the distribution. + The `p-value score + indicates the likelihood of the current observation according to + the estimated distribution. The lower its value, the more likely the + current point is an outlier. + + :param columns: see `Columns `_. + + :param confidence: The confidence for spike detection in the range [0, + 100]. + + :param side: The argument that determines whether to detect positive or + negative anomalies, or both. Available options are {``Positive``, + ``Negative``, ``TwoSided``}.
+ + :param pvalue_history_length: The size of the sliding window for computing + the p-value. + + :param params: Additional arguments sent to compute engine. + + .. seealso:: + :py:func:`IIDChangePointDetector + `, + :py:func:`SsaSpikeDetector + `, + :py:func:`SsaChangePointDetector + `. + + .. index:: models, timeseries, transform + + Example: + .. literalinclude:: /../nimbusml/examples/IidSpikePointDetector.py + :language: python + """ + + @trace + def __init__( + self, + confidence=99.0, + side='TwoSided', + pvalue_history_length=100, + columns=None, + **params): + + if columns: + params['columns'] = columns + BaseTransform.__init__(self, **params) + core.__init__( + self, + confidence=confidence, + side=side, + pvalue_history_length=pvalue_history_length, + **params) + self._columns = columns + + def get_params(self, deep=False): + """ + Get the parameters for this operator. + """ + return core.get_params(self) + + def _nodes_with_presteps(self): + """ + Inserts preprocessing before this one. + """ + from ..preprocessing.schema import TypeConverter + return [ + TypeConverter( + result_type='R4')._steal_io(self), + self] diff --git a/src/python/nimbusml/timeseries/_ssachangepointdetector.py b/src/python/nimbusml/timeseries/_ssachangepointdetector.py new file mode 100644 index 00000000..3b02d49e --- /dev/null +++ b/src/python/nimbusml/timeseries/_ssachangepointdetector.py @@ -0,0 +1,147 @@ +# -------------------------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. 
+# -------------------------------------------------------------------------------------------- +# - Generated by tools/entrypoint_compiler.py: do not edit by hand +""" +SsaChangePointDetector +""" + +__all__ = ["SsaChangePointDetector"] + + +from sklearn.base import TransformerMixin + +from ..base_transform import BaseTransform +from ..internal.core.timeseries._ssachangepointdetector import \ + SsaChangePointDetector as core +from ..internal.utils.utils import trace + + +class SsaChangePointDetector( + core, + BaseTransform, + TransformerMixin): + """ + + This transform detects the change-points in a seasonal time-series + using Singular Spectrum Analysis (SSA). + + .. remarks:: + `Singular Spectrum Analysis (SSA) + `_ is a + powerful framework for decomposing the time-series into trend, + seasonality and noise components as well as forecasting the future + values of the time-series. In order to remove the + effect of such components on anomaly detection, this transform adds + SSA as a time-series modeler component in the detection pipeline. + + The SSA component will be trained and it predicts the next expected + value on the time-series under normal condition; this expected value + is + further used to calculate the amount of deviation from the normal + behavior at that timestamp. + The distribution of this deviation is then modeled using `Adaptive + kernel density estimation + `_. + + This transform detects + change points by calculating the martingale score for the sliding + window based on the estimated distribution of deviations. + The idea is based on the `Exchangeability + Martingales `_ that + detects a change of distribution over a stream of i.i.d. values. In + short, the value of the + martingale score starts increasing significantly when a sequence of + small p-values are detected in a row; this + indicates the change of the distribution of the underlying data + generation process. + + :param columns: see `Columns `_.
+ + :param training_window_size: The number of points, N, from the beginning + of the sequence used to train the SSA model. + + :param confidence: The confidence for change point detection in the range + [0, 100]. + + :param seasonal_window_size: An upper bound, L, on the largest relevant + seasonality in the input time-series, which also + determines the order of the autoregression of SSA. It must satisfy 2 + < L < N/2. + + :param change_history_length: The length of the sliding window on p-value + for computing the martingale score. + + :param error_function: The function used to compute the error between the + expected and the observed value. Possible values are: + {``SignedDifference``, ``AbsoluteDifference``, ``SignedProportion``, + ``AbsoluteProportion``, ``SquaredDifference``}. + + :param martingale: The type of martingale betting function used for + computing the martingale score. Available options are {``Power``, + ``Mixture``}. + + :param power_martingale_epsilon: The epsilon parameter for the Power + martingale if martingale is set to ``Power``. + + :param params: Additional arguments sent to compute engine. + + .. seealso:: + :py:func:`IIDChangePointDetector + `, + :py:func:`IIDSpikeDetector + `, + :py:func:`SsaSpikeDetector + `. + + .. index:: models, timeseries, transform + + Example: + .. 
literalinclude:: /../nimbusml/examples/SsaChangePointDetector.py + :language: python + """ + + @trace + def __init__( + self, + training_window_size=100, + confidence=95.0, + seasonal_window_size=10, + change_history_length=20, + error_function='SignedDifference', + martingale='Power', + power_martingale_epsilon=0.1, + columns=None, + **params): + + if columns: + params['columns'] = columns + BaseTransform.__init__(self, **params) + core.__init__( + self, + training_window_size=training_window_size, + confidence=confidence, + seasonal_window_size=seasonal_window_size, + change_history_length=change_history_length, + error_function=error_function, + martingale=martingale, + power_martingale_epsilon=power_martingale_epsilon, + **params) + self._columns = columns + + def get_params(self, deep=False): + """ + Get the parameters for this operator. + """ + return core.get_params(self) + + def _nodes_with_presteps(self): + """ + Inserts preprocessing before this one. + """ + from ..preprocessing.schema import TypeConverter + return [ + TypeConverter( + result_type='R4')._steal_io(self), + self] diff --git a/src/python/nimbusml/timeseries/_ssaspikedetector.py b/src/python/nimbusml/timeseries/_ssaspikedetector.py new file mode 100644 index 00000000..ad831a15 --- /dev/null +++ b/src/python/nimbusml/timeseries/_ssaspikedetector.py @@ -0,0 +1,136 @@ +# -------------------------------------------------------------------------------------------- +# Copyright (c) Microsoft Corporation. All rights reserved. +# Licensed under the MIT License. 
+# -------------------------------------------------------------------------------------------- +# - Generated by tools/entrypoint_compiler.py: do not edit by hand +""" +SsaSpikeDetector +""" + +__all__ = ["SsaSpikeDetector"] + + +from sklearn.base import TransformerMixin + +from ..base_transform import BaseTransform +from ..internal.core.timeseries._ssaspikedetector import \ + SsaSpikeDetector as core +from ..internal.utils.utils import trace + + +class SsaSpikeDetector(core, BaseTransform, TransformerMixin): + """ + + This transform detects the spikes in a seasonal time-series using + Singular Spectrum Analysis (SSA). + + .. remarks:: + `Singular Spectrum Analysis (SSA) + `_ is a + powerful + framework for decomposing the time-series into trend, seasonality and + noise components as well as forecasting + the future values of the time-series. In order to remove the effect + of such components on anomaly detection, + this transform adds SSA as a time-series modeler component in the + detection pipeline. + + The SSA component will be trained and it predicts the next expected + value on the time-series under normal condition; this expected value + is + further used to calculate the amount of deviation from the normal + (predicted) behavior at that timestamp. + The distribution of this deviation is then modeled using `Adaptive + kernel density estimation + `_. + + The `p-value score for the + current deviation is calculated based on the + estimated distribution. The lower its value, the more likely the + current point is an outlier. + + :param columns: see `Columns `_. + + :param training_window_size: The number of points, N, from the beginning + of the sequence used to train the SSA + model. + + :param confidence: The confidence for spike detection in the range [0, + 100]. + + :param seasonal_window_size: An upper bound, L, on the largest relevant + seasonality in the input time-series, which + also determines the order of the autoregression of SSA. 
It must + satisfy 2 < L < N/2. + + :param side: The argument that determines whether to detect positive or + negative anomalies, or both. Available + options are {``Positive``, ``Negative``, ``TwoSided``}. + + :param pvalue_history_length: The size of the sliding window for computing + the p-value. + + :param error_function: The function used to compute the error between the + expected and the observed value. Possible + values are {``SignedDifference``, ``AbsoluteDifference``, + ``SignedProportion``, ``AbsoluteProportion``, + ``SquaredDifference``}. + + :param params: Additional arguments sent to compute engine. + + .. seealso:: + :py:func:`IIDChangePointDetector + `, + :py:func:`IIDSpikeDetector + `, + :py:func:`SsaChangePointDetector + `. + + .. index:: models, timeseries, transform + + Example: + .. literalinclude:: /../nimbusml/examples/SsaSpikeDetector.py + :language: python + """ + + @trace + def __init__( + self, + training_window_size=100, + confidence=99.0, + seasonal_window_size=10, + side='TwoSided', + pvalue_history_length=100, + error_function='SignedDifference', + columns=None, + **params): + + if columns: + params['columns'] = columns + BaseTransform.__init__(self, **params) + core.__init__( + self, + training_window_size=training_window_size, + confidence=confidence, + seasonal_window_size=seasonal_window_size, + side=side, + pvalue_history_length=pvalue_history_length, + error_function=error_function, + **params) + self._columns = columns + + def get_params(self, deep=False): + """ + Get the parameters for this operator. + """ + return core.get_params(self) + + def _nodes_with_presteps(self): + """ + Inserts preprocessing before this one. + """ + from ..preprocessing.schema import TypeConverter + return [ + TypeConverter( + result_type='R4')._steal_io(self), + self]
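Reviewer note: the docstrings above state two things that a small sketch may make easier to check: the constraint 2 < L < N/2 between `seasonal_window_size` (L) and `training_window_size` (N), and the idea that a low p-value over a sliding history marks a spike at a given `confidence`. The code below is only an illustration under stated assumptions. Both helper names (`check_ssa_window_sizes`, `empirical_spike_flags`) are hypothetical and not part of nimbusml, and the p-value is a plain empirical rank over the window, swapped in for the adaptive kernel density estimation the transforms actually use.

```python
from collections import deque


def check_ssa_window_sizes(training_window_size, seasonal_window_size):
    """Hypothetical helper: True when 2 < L < N/2 holds.

    L is seasonal_window_size and N is training_window_size, as in the
    docstrings of the SSA transforms.
    """
    return 2 < seasonal_window_size < training_window_size / 2


def empirical_spike_flags(values, confidence=99.0, pvalue_history_length=100):
    """Hypothetical, simplified two-sided spike flagging.

    Assumption: the p-value is an empirical rank of the current absolute
    deviation from the window mean, not an adaptive kernel density
    estimate as in the real transforms.
    """
    alpha = 1.0 - confidence / 100.0
    history = deque(maxlen=pvalue_history_length)
    flags = []
    for x in values:
        if len(history) < 2:
            # Not enough history to estimate anything yet.
            flags.append(False)
        else:
            mean = sum(history) / len(history)
            deviation = abs(x - mean)
            # Count past points that deviate at least as much as x does.
            at_least = sum(1 for h in history if abs(h - mean) >= deviation)
            p_value = (at_least + 1) / (len(history) + 1)
            flags.append(p_value < alpha)
        history.append(x)
    return flags


series = [1.0] * 30 + [10.0] + [1.0] * 5
flags = empirical_spike_flags(series, confidence=95.0)
```

With the defaults shown in the patch (`training_window_size=100`, `seasonal_window_size=10`), the window-size check passes. In the toy series, only the single jump to 10.0 is flagged, because the rank-based p-value needs enough history before it can drop below the threshold.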