Updated the documentation for time series (microsoft#158)
* Update readme with latest feedback (microsoft#39)

Updating readme with latest feedback.

* Add THIRD-PARTY-NOTICES.txt and move CONTRIBUTING.md to root. (microsoft#40)

* Initial checkin

* Move to Hosted Mac pool

* Update README.md

* Manually copied naming changes over from master.

* Revert "Merge remote-tracking branch 'upstream/temp/docs'"

This reverts commit 93c7347, reversing
changes made to 2350069.

* Improve documentation regarding contributors.

* Fix email address.

* Create CODE_OF_CONDUCT.md

* Update issue templates

* Create PULL_REQUEST_TEMPLATE.md

* Update issue templates

* Update issue templates

* Update issue templates

* Fixing link in CONTRIBUTING.md (microsoft#44)

* Update contributing.md link. (microsoft#43)

* Initial checkin for ML.NET 0.7 upgrade

* fix tests

* put back columndropper

* fix tests

* Update scikit-learn links to use https instead of http

* restart dotnetcore2 package work

* fix build

* fix mac & linux

* fix build

* fix build

* dbg build

* fix build

* fix build

* handle py 2.7

* handle py27

* fix py27

* fix build

* fix build

* fix build

* ensure dependencies

* ignore exceptions from ensure dependencies

* up version

* Update cv.py

add case for X is data frame

* Update cv.py

add a space

* add a test for cv with data frame

* set DOTNET_SYSTEM_GLOBALIZATION_INVARIANT to true to fix app domain error

* fix build

* up version

* Add instructions for editing docstrings. (microsoft#51)

* Add instructions for editing docstrings.

* Add footnote giving more information.

* Fix build failures caused by dotnetcore2 module. (microsoft#67)

* Fix importing of the dotnetcore2 module because it has inconsistent folder naming.

* Fix file check for unix platforms.

* Fix indentation levels.

* Reduce number of build legs for PR validations and add nightly build definition with more robust build matrix. (microsoft#69)

* Increase version to 0.6.5. (microsoft#71)

* Update clr helper function to search multiple folders for clr binaries. (microsoft#72)

* Update clr helper function to search multiple folders for clr binaries.

* Moved responsibility for Python version checking to utility functions.

* Add clarifying comments.

* Fix call to get_nimbusml_libs()

* fix drop column param name

* Remove restricted permissions on build.sh script.

* Fix lightgbm test failures by updating runtime dependencies.

* fix TensorFlowScorer model_location parameter name

* Fix build.sh defaults so that it detects when running on a mac.

* Since OneHotHashVectorizer is broken for output kind Key in ML.NET 0.7, use ToKey() for unit tests

* fix tests

* fix pyproj test

* fix win 3.6 build

* fix comments

* expose "parallel" to the fit/fit_transform function by including **param to the argument

* add a test for the parallel

* update parallel thread

* fix tests comparison

* Update thread, retry build

* modify tests

* specify pytest-cov version

* update pytest-cov version in build command for linux

* for windows use the latest pytest-cov

* Enabled strong naming for DotNetBridge.dll (to be used for InternalsVisibleTo in ML.NET)

* Changed the keys to be the same as other internal repos

* Changed the key filename

* Update to ML.NET 0.10.preview (microsoft#77)

* Updating ML.NET nugets to latest 0.9 preview.

* --generate_entrypoints phase 1

* Fixed Models.CrossValidator

* Updated all entrypoints

* New manifest.json, picked from Monte's branch

* Updated API codegen

* Replace ISchema and SchemaImpl with Schema and SchemaBuilder.

* Revert "Replace ISchema and SchemaImpl with Schema and SchemaBuilder."

This reverts commit dcd749d.

* Refactor IRowCursor to RowCursor.

* Update ML.NET version in build.csproj.

* Update manifest.json to ml.net commit 92e762686989215ddf45d9db3f0a1c989ee54d11

* Updated RunGraph.cs to ml.net 0.10

* Refactor Vbuffer

* Added override to RowCursor methods

* Update to NimbusML-privileged nugets from ML.NET.

* Update to Microsoft.ML namespace without Runtime.

* Schema and VBuffer fixes in NativeDataInterop.

* API fixes for IRandom and IsText in RmlEnvironment and NativeDataView.

* Work on getting VBuffer pointers from Spans.

* Some VBuffer fixes

* fix some class names

* Fix Register Assembly names.

* Remove ML.PipelineInference

* fixed more classes

* Add back columndropper for backward compatibility.

* Register Entrypoints assembly in environment.

* Fix homebrew update problem on VS Hosted Mac images.

* Updated all the nuget versions to be the same.

* Attempt to fix the dataframe unit tests

* Fixed test_pyproj

* Optimized VBuffer changes

* Changed bridge version value to 0.10

* Addressed PR comments

* Simplify by using six.string_types (microsoft#89)

* Simplify by using six.string_types

* Force a retest

* Removed ISchema from DotNetBridge (microsoft#90)

* Removed ISchema

* Fixed the tests

* Addressed PR comments

* Addressed Wei-Sheng's comments about documenting the purpose of Column.DetachedColumn.

* add configuration for python 3.7 (microsoft#101)

* add configuration for python 3.7

* fix broken unit test

* Update build.sh

* fix build for Windows

* Linux py3.7 build

* fix pytest version

* upgrade pytest

* fix pytest-cov version

* fix isinstance(., int) for python 2.7

* build urls for Mac

* final fixes

* fix libomp

* Removing 3.7 for now as its not in PyPI

* Upgrade to ML.NET version 1.0.0 (microsoft#100)

* ref v0.10 ML.NET

* fix build

* hook up to v0.11.0 ML.NET

* fix build errors

* fix build

* include Microsoft.Data.DataView.dll in build

* typo

* remove protobuf dll

* Regenerate code due to manifest changes

* fix missing ep

* Update to ML.NET 1.0.0-preview

* fix .net build

* update nuget for ML.NET

* remove Data namespace dll

* rollback nuget changes

* move to final RC ML.NET

* Regenerate classes as per updated manifest

* fix maximum_number_of_iterations param name

* fix parameter names

* fix names

* reference official v1.0 of ML.NET

* fix tests

* fix label column

* Fix tests

* fix lightgbm tests

* fix OLS

* fix tests

* fix more tests

* fix more tests

* fix weight column name

* more tests

* fix normalized metrics

* more errors

* Fix CV

* rename feature_column to feature_column_name

* fix cv ranker

* Fix lightgbm tests

* fix changes due to upgrade of NGramFeaturizer

* fix ngram featurizer

* fix FactorizationMachine assert error

* disable test which is not working now due to change in LightGbm version

* fix model name

* typo

* handle nan in arrays

* fix tests

* fix tests

* fix more tests

* fix data type

* fix AUC exception

* kick the build

* fix tests due to data change

* fix ngram test

* fix mutual info tests

* copy libiomp lib

* fix mac build

* disable SymSgdNative for now

* disable SymSgdBinary classifier tests for Linux

* fix linux tests

* fix linux tests

* try linux

* fix linux

* skip SymSgdBinaryClassifier checks

* fix entrypoint compiler

* fix entry point generation

* fix example tests run

* fix typo

* fix documentation regression

* fix parameter name

* fix examples

* fix examples

* fix tests

* fix tests

* fix linux

* kick build

* Fix code_fixer

* fix skip take filters

* fix estimator checks

* Fix latest Windows build issues. (microsoft#105)

* Fix build issue on Windows when VS2019 is installed.

Note: The -version option could not be added directly
to the FOR command due to a command script parsing issue.

* Add missing arguments to fix build issue with latest version of autoflake.

* Fixes microsoft#50 - summary() fails if called a second time. (microsoft#107)

* Fixes microsoft#50 - summary() fails if called a second time.

* Fixes microsoft#99. Do not use hardcoded file separator. (microsoft#108)

Fixes microsoft#99. Do not use hard coded file separator.

* Delete the cached summaries when refitting a pipeline or a predictor. (microsoft#109)

* Fix build issue on Windows when VS2019 is installed.

Note: The -version option could not be added directly
to the FOR command due to a command script parsing issue.

* Add missing arguments to fix build issue with latest version of autoflake.

* Delete the cached summaries when refitting a pipeline or a predictor.
Fixes microsoft#106

* Simplify the code that deletes cached summaries when calling fit.

* Fix signature import error when using latest version of scikit-learn. (microsoft#116)

* Fix signature import error when using latest version of scikit-learn.
Fixes microsoft#111

* Move the conditional import of the signature method in to the utils package.

* Package System.Drawing.Common.dll as its missing in dotnetcore2 (microsoft#120)

* package System.Drawings.Common.dll as its missing in dotnetcore2

* typo

* Add png for Image examples

* try linux fix

* rollback scikit learn version

* test

* debug

* rollback test

* rollback

* fix fontconfig err

* fix tests

* print platform

* get os names

* test

* test

* fix linux

* Upgrade the pytest-remotedata package to fix missing attribute error. (microsoft#121)

* Upgrade the pytest-remotedata package to fix missing attribute error.
Fixes microsoft#117

* Remove the RlsMacPy3.6 configuration from .vsts-ci.yml.

* Upgrade version (microsoft#122)

* package System.Drawings.Common.dll as its missing in dotnetcore2

* typo

* Add png for Image examples

* try linux fix

* rollback scikit learn version

* test

* debug

* rollback test

* rollback

* fix fontconfig err

* fix tests

* print platform

* get os names

* test

* test

* fix linux

* Upgrade version

* Support quoted strings by default (microsoft#124)

* upgrade to ML.NET 1.1 (microsoft#126)

* upgrade to ML.NET 1.1

* by default quote is +

* assert changes due to quote

* fix tensor flow example

* Put long running tests in to their own folder to shorten build times. (microsoft#136)

* Temporarily remove the dataframe examples from the test run
to see how much that affects the test length.

* Remove all examples from the tests to see how it impacts the CI run.

* Put long running tests in to their own folder to shorten build times.

* Update nimbusml.pyproj to reflect the newly moved test files.
Forgot to save the nimbusml.pyproj in visual studio.

* Expose ML.NET SSA & IID spike & changepoint detectors. (microsoft#135)

* Initial creation of the IidSpikeDetector files to see what works and
what doesn't.

* Import the Microsoft.ML.TimeSeries assembly in to the project.

* Use 'PassAs' in manifest.json to fix the source parameter name.

* Use float32 for data dtype in IidSpikeDetector example.

* Convert IidSpikeDetector to a standard transform. Add examples and tests.

* Add pre-transform to IidSpikeDetector to fix incompatible data types.

* Fix issues with the test_estimator_checks IidSpikeDetector tests.

* Remove unnecessary TypeConverter import in IidSpikeDetector example.

* Initial implementation of IidChangePointDetector.

* Initial implementation of SsaSpikeDetector.

* Initial implementation of SsaChangePointDetector.

* Fix incorrect SsaSpikeDetector instance in test_estimator_checks.

* Fix a few minor issues with time series unit tests and examples. (microsoft#139)

* Skip Image.py and Image_df.py tests for Ubuntu 14 (microsoft#149)

* Fixed the script for generating the documentation (microsoft#144)

* Moved _static to ci_script to solve an error while using sphinx
* Removed make_md.bat and merged its commands into make_yaml.bat
* Moved metrics.rst to concepts

* Rename time_series package to timeseries. (microsoft#150)

* Initial checkin

* Move to Hosted Mac pool

* Manually copied naming changes over from master.

* merge master to temp/docs for updating the documentation  (microsoft#134)

* merge master to documentation branch

* fixed the ModuleNotFoundError for WordEmbedding_df.py

* Merge branch 'documentation' into temp/docs (microsoft#143)

* merge master to documentation branch

* fixed the ModuleNotFoundError for WordEmbedding_df.py

* Fixed the issue when generating the documentation guide and concepts

* Moved _static to the right folder, and change PY36 to PY37 now

* Made it work with Python3.6

* Put long running tests in to their own folder to shorten build times. (microsoft#136)

* Temporarily remove the dataframe examples from the test run
to see how much that affects the test length.

* Remove all examples from the tests to see how it impacts the CI run.

* Put long running tests in to their own folder to shorten build times.

* Update nimbusml.pyproj to reflect the newly moved test files.
Forgot to save the nimbusml.pyproj in visual studio.

* Added underscores to the time series file names
Stephen0620 authored and Tsung-Sheng Huang (Hi-Tech Talents LLC) committed Jul 4, 2019
1 parent 562d892 commit e5a0452
Showing 10 changed files with 977 additions and 11 deletions.
9 changes: 4 additions & 5 deletions src/python/nimbusml.pyproj
@@ -592,11 +592,10 @@
 <Compile Include="nimbusml\tests\timeseries\test_ssaspikedetector.py" />
 <Compile Include="nimbusml\tests\timeseries\test_iidspikedetector.py" />
 <Compile Include="nimbusml\tests\timeseries\__init__.py" />
-<Compile Include="nimbusml\timeseries\iidchangepointdetector.py" />
-<Compile Include="nimbusml\timeseries\iidspikedetector.py" />
-<Compile Include="nimbusml\timeseries\ssachangepointdetector.py" />
-<Compile Include="nimbusml\timeseries\ssaforecaster.py" />
-<Compile Include="nimbusml\timeseries\ssaspikedetector.py" />
+<Compile Include="nimbusml\timeseries\_iidchangepointdetector.py" />
+<Compile Include="nimbusml\timeseries\_iidspikedetector.py" />
+<Compile Include="nimbusml\timeseries\_ssachangepointdetector.py" />
+<Compile Include="nimbusml\timeseries\_ssaspikedetector.py" />
 <Compile Include="nimbusml\timeseries\__init__.py" />
 <Compile Include="tests\test_estimator_checks.py" />
 <Compile Include="nimbusml\tests\feature_extraction\text\test_lightlda.py" />
@@ -0,0 +1,107 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------------------------
# - Generated by tools/entrypoint_compiler.py: do not edit by hand
"""
IidChangePointDetector
"""

__all__ = ["IidChangePointDetector"]


from ...entrypoints.timeseriesprocessingentrypoints_iidchangepointdetector import \
    timeseriesprocessingentrypoints_iidchangepointdetector
from ...utils.utils import trace
from ..base_pipeline_item import BasePipelineItem, DefaultSignature


class IidChangePointDetector(BasePipelineItem, DefaultSignature):
    """
    This transform detects the change-points in an i.i.d. sequence using
    adaptive kernel density estimation and martingales.

    .. remarks::
        ``IIDChangePointDetector`` assumes a sequence of data points that are
        independently sampled from one stationary distribution. `Adaptive
        kernel density estimation
        <https://en.wikipedia.org/wiki/Variable_kernel_density_estimation>`_
        is used to model the distribution.

        This transform detects change points by calculating the martingale
        score for the sliding window based on the estimated distribution.
        The idea is based on `Exchangeability Martingales
        <https://icml.cc/Conferences/2012/papers/808.pdf>`_, which detect a
        change of distribution over a stream of i.i.d. values. In short, the
        martingale score starts increasing significantly when several small
        p-values are detected in a row; this indicates a change in the
        distribution of the underlying data generation process.

    :param confidence: The confidence for change point detection in the range
        [0, 100]. Used to set the threshold of the martingale score for
        triggering an alert.

    :param change_history_length: The length of the sliding window on p-value
        for computing the martingale score.

    :param martingale: The type of martingale betting function used for
        computing the martingale score. Available options are {``Power``,
        ``Mixture``}.

    :param power_martingale_epsilon: The epsilon parameter for the Power
        martingale if martingale is set to ``Power``.

    :param params: Additional arguments sent to compute engine.

    .. seealso::
        :py:func:`IIDSpikeDetector
        <nimbusml.preprocessing.timeseries.IIDSpikeDetector>`,
        :py:func:`SsaSpikeDetector
        <nimbusml.preprocessing.timeseries.SsaSpikeDetector>`,
        :py:func:`SsaChangePointDetector
        <nimbusml.preprocessing.timeseries.SsaChangePointDetector>`.

    .. index:: models, timeseries, transform

    Example:
       .. literalinclude::
        /../nimbusml/examples/IidSpikeChangePointDetector.py
              :language: python
    """

    @trace
    def __init__(
            self,
            confidence=95.0,
            change_history_length=20,
            martingale='Power',
            power_martingale_epsilon=0.1,
            **params):
        BasePipelineItem.__init__(
            self, type='transform', **params)

        self.confidence = confidence
        self.change_history_length = change_history_length
        self.martingale = martingale
        self.power_martingale_epsilon = power_martingale_epsilon

    @property
    def _entrypoint(self):
        return timeseriesprocessingentrypoints_iidchangepointdetector

    @trace
    def _get_node(self, **all_args):
        # This transform's own settings override any caller-supplied values.
        algo_args = dict(
            source=self.source,
            name=self._name_or_source,
            confidence=self.confidence,
            change_history_length=self.change_history_length,
            martingale=self.martingale,
            power_martingale_epsilon=self.power_martingale_epsilon)

        all_args.update(algo_args)
        return self._entrypoint(**all_args)
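
The docstring above says the martingale score climbs when several small p-values arrive in a row. The following is a minimal sketch of that behaviour, assuming the textbook power-martingale betting function `epsilon * p ** (epsilon - 1)` from the exchangeability-martingale literature. The helper name `log_power_martingale` is hypothetical, and this is not the ML.NET implementation, which this diff does not show.

```python
import math


def log_power_martingale(p_values, epsilon=0.1):
    """Return the running log of a power martingale over a p-value stream.

    Assumption: the textbook betting function epsilon * p ** (epsilon - 1).
    Working in log space avoids overflow when many small p-values arrive.
    """
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    log_score = 0.0
    history = []
    for p in p_values:
        # Clamp away from zero so that log() stays finite.
        p = max(p, 1e-12)
        log_score += math.log(epsilon) + (epsilon - 1.0) * math.log(p)
        history.append(log_score)
    return history


# Unremarkable p-values: every bet loses, so the log score drifts down.
calm = log_power_martingale([0.5, 0.6, 0.4, 0.7])
# A run of very small p-values: every bet wins, so the log score climbs.
shifted = log_power_martingale([0.001, 0.002, 0.001, 0.003])
print(calm[-1], shifted[-1])
```

Under this assumption, a detector would raise an alert once the score crosses a threshold derived from `confidence`; how ML.NET maps `confidence` to that threshold is not stated in the docstring.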
91 changes: 91 additions & 0 deletions src/python/nimbusml/internal/core/timeseries/_iidspikedetector.py
@@ -0,0 +1,91 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------------------------
# - Generated by tools/entrypoint_compiler.py: do not edit by hand
"""
IidSpikeDetector
"""

__all__ = ["IidSpikeDetector"]


from ...entrypoints.timeseriesprocessingentrypoints_iidspikedetector import \
    timeseriesprocessingentrypoints_iidspikedetector
from ...utils.utils import trace
from ..base_pipeline_item import BasePipelineItem, DefaultSignature


class IidSpikeDetector(BasePipelineItem, DefaultSignature):
    """
    This transform detects the spikes in an i.i.d. sequence using adaptive
    kernel density estimation.

    .. remarks::
        ``IIDSpikeDetector`` assumes a sequence of data points that are
        independently sampled from one stationary distribution. `Adaptive
        kernel density estimation
        <https://en.wikipedia.org/wiki/Variable_kernel_density_estimation>`_
        is used to model the distribution.

        The `p-value <https://en.wikipedia.org/wiki/P-value>`_ score
        indicates the likelihood of the current observation according to
        the estimated distribution. The lower its value, the more likely the
        current point is an outlier.

    :param confidence: The confidence for spike detection in the range [0,
        100].

    :param side: The argument that determines whether to detect positive or
        negative anomalies, or both. Available options are {``Positive``,
        ``Negative``, ``TwoSided``}.

    :param pvalue_history_length: The size of the sliding window for computing
        the p-value.

    :param params: Additional arguments sent to compute engine.

    .. seealso::
        :py:func:`IIDChangePointDetector
        <nimbusml.preprocessing.timeseries.IIDChangePointDetector>`,
        :py:func:`SsaSpikeDetector
        <nimbusml.preprocessing.timeseries.SsaSpikeDetector>`,
        :py:func:`SsaChangePointDetector
        <nimbusml.preprocessing.timeseries.SsaChangePointDetector>`.

    .. index:: models, timeseries, transform

    Example:
       .. literalinclude:: /../nimbusml/examples/IidSpikePointDetector.py
              :language: python
    """

    @trace
    def __init__(
            self,
            confidence=99.0,
            side='TwoSided',
            pvalue_history_length=100,
            **params):
        BasePipelineItem.__init__(
            self, type='transform', **params)

        self.confidence = confidence
        self.side = side
        self.pvalue_history_length = pvalue_history_length

    @property
    def _entrypoint(self):
        return timeseriesprocessingentrypoints_iidspikedetector

    @trace
    def _get_node(self, **all_args):
        # This transform's own settings override any caller-supplied values.
        algo_args = dict(
            source=self.source,
            name=self._name_or_source,
            confidence=self.confidence,
            side=self.side,
            pvalue_history_length=self.pvalue_history_length)

        all_args.update(algo_args)
        return self._entrypoint(**all_args)
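
To make the p-value idea in the docstring above concrete, here is a rough sketch of spike detection over a sliding window. It is an illustration only and rests on several assumptions: a fixed-bandwidth Gaussian kernel density estimate (Silverman's rule of thumb) stands in for the adaptive estimator, an alert fires when the p-value drops below `1 - confidence / 100`, and the names `kde_tail_probs`, `detect_spikes` and `min_history` are hypothetical rather than part of nimbusml or ML.NET.

```python
import math
import statistics
from collections import deque


def kde_tail_probs(window, x):
    """Return (P(V <= x), P(V >= x)) under a Gaussian KDE of the window."""
    n = len(window)
    spread = statistics.pstdev(window)
    # Silverman's rule of thumb; fall back to 1.0 for a constant window.
    h = 1.06 * spread * n ** -0.2 if spread > 0 else 1.0
    upper = sum(0.5 * math.erfc((x - v) / (h * math.sqrt(2)))
                for v in window) / n
    return 1.0 - upper, upper


def detect_spikes(values, confidence=99.0, pvalue_history_length=100,
                  side="TwoSided", min_history=5):
    """Flag each value whose p-value falls below 1 - confidence / 100.

    Assumption: min_history is a warm-up length invented for this sketch so
    that the first few points are never scored against a tiny window.
    """
    threshold = 1.0 - confidence / 100.0
    window = deque(maxlen=pvalue_history_length)
    alerts = []
    for x in values:
        alert = False
        if len(window) >= min_history:
            lower, upper = kde_tail_probs(window, x)
            if side == "Positive":
                p_value = upper
            elif side == "Negative":
                p_value = lower
            else:  # "TwoSided"
                p_value = min(1.0, 2.0 * min(lower, upper))
            alert = p_value < threshold
        alerts.append(alert)
        window.append(x)
    return alerts


series = [5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.1, 4.9, 5.2, 4.8, 12.0, 5.0, 5.1]
alerts = detect_spikes(series)
print([i for i, flagged in enumerate(alerts) if flagged])
```

Lowering `confidence` raises the p-value threshold and so makes the sketch more sensitive, which mirrors how the docstring describes the parameter.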
@@ -0,0 +1,138 @@
# --------------------------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------------------------------------
# - Generated by tools/entrypoint_compiler.py: do not edit by hand
"""
SsaChangePointDetector
"""

__all__ = ["SsaChangePointDetector"]


from ...entrypoints.timeseriesprocessingentrypoints_ssachangepointdetector import \
    timeseriesprocessingentrypoints_ssachangepointdetector
from ...utils.utils import trace
from ..base_pipeline_item import BasePipelineItem, DefaultSignature


class SsaChangePointDetector(BasePipelineItem, DefaultSignature):
    """
    This transform detects the change-points in a seasonal time-series
    using Singular Spectrum Analysis (SSA).

    .. remarks::
        `Singular Spectrum Analysis (SSA)
        <https://en.wikipedia.org/wiki/Singular_spectrum_analysis>`_ is a
        powerful framework for decomposing the time-series into trend,
        seasonality and noise components as well as forecasting the future
        values of the time-series. In order to remove the effect of such
        components on anomaly detection, this transform adds SSA as a
        time-series modeler component in the detection pipeline.

        The SSA component is trained to predict the next expected value of
        the time-series under normal conditions; this expected value is then
        used to calculate the amount of deviation from the normal behavior
        at that timestamp.
        The distribution of this deviation is then modeled using `Adaptive
        kernel density estimation
        <https://en.wikipedia.org/wiki/Variable_kernel_density_estimation>`_.

        This transform detects change points by calculating the martingale
        score for the sliding window based on the estimated distribution of
        deviations.
        The idea is based on `Exchangeability Martingales
        <https://icml.cc/Conferences/2012/papers/808.pdf>`_, which detect a
        change of distribution over a stream of i.i.d. values. In short, the
        martingale score starts increasing significantly when several small
        p-values are detected in a row; this indicates a change in the
        distribution of the underlying data generation process.

    :param training_window_size: The number of points, N, from the beginning
        of the sequence used to train the SSA model.

    :param confidence: The confidence for change point detection in the range
        [0, 100].

    :param seasonal_window_size: An upper bound, L, on the largest relevant
        seasonality in the input time-series, which also determines the
        order of the autoregression of SSA. It must satisfy 2 < L < N/2.

    :param change_history_length: The length of the sliding window on p-value
        for computing the martingale score.

    :param error_function: The function used to compute the error between the
        expected and the observed value. Possible values are:
        {``SignedDifference``, ``AbsoluteDifference``, ``SignedProportion``,
        ``AbsoluteProportion``, ``SquaredDifference``}.

    :param martingale: The type of martingale betting function used for
        computing the martingale score. Available options are {``Power``,
        ``Mixture``}.

    :param power_martingale_epsilon: The epsilon parameter for the Power
        martingale if martingale is set to ``Power``.

    :param params: Additional arguments sent to compute engine.

    .. seealso::
        :py:func:`IIDChangePointDetector
        <nimbusml.preprocessing.timeseries.IIDChangePointDetector>`,
        :py:func:`IIDSpikeDetector
        <nimbusml.preprocessing.timeseries.IIDSpikeDetector>`,
        :py:func:`SsaSpikeDetector
        <nimbusml.preprocessing.timeseries.SsaSpikeDetector>`.

    .. index:: models, timeseries, transform

    Example:
       .. literalinclude:: /../nimbusml/examples/SsaChangePointDetector.py
              :language: python
    """

    @trace
    def __init__(
            self,
            training_window_size=100,
            confidence=95.0,
            seasonal_window_size=10,
            change_history_length=20,
            error_function='SignedDifference',
            martingale='Power',
            power_martingale_epsilon=0.1,
            **params):
        BasePipelineItem.__init__(
            self, type='transform', **params)

        self.training_window_size = training_window_size
        self.confidence = confidence
        self.seasonal_window_size = seasonal_window_size
        self.change_history_length = change_history_length
        self.error_function = error_function
        self.martingale = martingale
        self.power_martingale_epsilon = power_martingale_epsilon

    @property
    def _entrypoint(self):
        return timeseriesprocessingentrypoints_ssachangepointdetector

    @trace
    def _get_node(self, **all_args):
        # This transform's own settings override any caller-supplied values.
        algo_args = dict(
            source=self.source,
            name=self._name_or_source,
            training_window_size=self.training_window_size,
            confidence=self.confidence,
            seasonal_window_size=self.seasonal_window_size,
            change_history_length=self.change_history_length,
            error_function=self.error_function,
            martingale=self.martingale,
            power_martingale_epsilon=self.power_martingale_epsilon)

        all_args.update(algo_args)
        return self._entrypoint(**all_args)
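
The parameter descriptions above state the constraint 2 < L < N/2 and list five error functions by name only. The sketch below checks that constraint and gives one plausible reading of each error function. The formulas, in particular the two proportion variants and their zero-denominator handling, are assumptions inferred from the names, and `check_ssa_windows`, `ERROR_FUNCTIONS` and `deviation` are hypothetical helpers that do not exist in nimbusml.

```python
def check_ssa_windows(training_window_size, seasonal_window_size):
    """Raise ValueError unless 2 < L < N/2, as the docstring requires."""
    n, window = training_window_size, seasonal_window_size
    if not 2 < window < n / 2:
        raise ValueError(
            "seasonal_window_size must satisfy 2 < L < N/2; "
            "got L=%s, N=%s" % (window, n))


# Assumed definitions: the source only names these functions.
ERROR_FUNCTIONS = {
    "SignedDifference": lambda actual, expected: actual - expected,
    "AbsoluteDifference": lambda actual, expected: abs(actual - expected),
    "SignedProportion": lambda actual, expected:
        (actual - expected) / expected if expected != 0 else 0.0,
    "AbsoluteProportion": lambda actual, expected:
        abs((actual - expected) / expected) if expected != 0 else 0.0,
    "SquaredDifference": lambda actual, expected: (actual - expected) ** 2,
}


def deviation(actual, expected, error_function="SignedDifference"):
    """Score how far an observation is from the SSA forecast."""
    return ERROR_FUNCTIONS[error_function](actual, expected)


# The defaults in __init__ (N=100, L=10) satisfy the constraint.
check_ssa_windows(100, 10)
print(deviation(3.0, 1.0), deviation(3.0, 1.0, "SquaredDifference"))
```

In the pipeline the docstring describes, these deviations (rather than the raw values) would feed the density estimate and the martingale score.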