This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Add azureml-dataprep support for dataflow objects #181

Merged
merged 18 commits into from
Jul 12, 2019
Member

@ganik ganik commented Jul 10, 2019

fixes #146

@ganik ganik requested review from pieths and najeeb-kazmi July 10, 2019 23:03
@@ -284,7 +289,7 @@ class UnixMlNetInterface
     // TRUSTED_PLATFORM_ASSEMBLIES
     tpaList.c_str(),
     // APP_PATHS
-    libsRoot,
+    dprepDirRoot,
Member

@eerhardt eerhardt Jul 11, 2019

This seems wrong. Why is the DataPrep directory being passed as APP_PATHS? Shouldn't it be added to tpaList instead? #Resolved

Member Author

@ganik ganik Jul 11, 2019

The DataPrep folder contains numerous DLLs that the azureml-dataprep package needs in order to run. Not all of them are needed for the in-process integration that NimbusML is doing here. Many of the DLLs also duplicate those in the dotnetcore2 package (the .NET Core CLR runtime). Putting the DataPrep DLLs on the TPA list would:

  1. make the list huge;
  2. introduce a lot of duplicates, such as the System.* DLLs. I could of course filter them out, but that would be an additional logic step;
  3. leave no guarantee as to which of the duplicates is used at runtime.

I want to avoid this and use only the dotnetcore2, ML.NET, Microsoft.DataPrep.* and Microsoft.DPrep.* DLLs; these are put on the TPA list. In case I missed any DLLs that Dprep needs, I have set the probing path to the Dprep folder.
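
To illustrate the selection rule described above, here is a minimal sketch in Python (the actual host code in this PR is C++). The function name `select_dprep_assemblies`, the simple prefix matching, and the sample file names in the usage note are assumptions made for illustration, not code from the PR.

```python
from typing import Iterable, List, Tuple

# Prefixes taken from the comment above. Treating them as plain file-name
# prefixes is an assumption of this sketch.
DPREP_PREFIXES: Tuple[str, ...] = ("Microsoft.DataPrep.", "Microsoft.DPrep.")


def select_dprep_assemblies(
    file_names: Iterable[str],
    prefixes: Tuple[str, ...] = DPREP_PREFIXES,
) -> List[str]:
    """Return the .dll names that would be listed explicitly on the TPA list.

    Everything else in the folder is left to the APP_PATHS probing path.
    """
    selected = [
        name
        for name in file_names
        if name.lower().endswith(".dll") and name.startswith(prefixes)
    ]
    return sorted(selected)
```

Called with a hypothetical folder listing such as `["System.Runtime.dll", "Microsoft.DPrep.Core.dll"]`, only the second name would be selected; the `System.*` duplicate never reaches the TPA list.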



Member

@eerhardt eerhardt Jul 12, 2019

Why are we treating the ML.NET and the DataPrep libraries differently? Maybe we should just put both locations on the APP_PATHS.

That way, when the next library comes along that we need to do this for, it is obvious to just add it to APP_PATHS. #Resolved

Member Author

@ganik ganik Jul 12, 2019

Yes, the next library will go into APP_PATHS the same way as Dprep. The reasons I treat ML.NET and DataPrep differently:

  1. Dprep is an optional dependency. If it is not installed, NimbusML should still work in all scenarios except taking input from Dprep files. ML.NET is a mandatory core dependency here, and I am very familiar with all the DLLs it needs, so I felt the TPA list was more appropriate for listing all of them.
  2. I want to be on the safe side. I am not familiar with how probing with APP_PATHS works. If there are duplicate DLLs in the ML.NET and Dprep folders, does the first one win? With the TPA list this does not seem to be guaranteed, so it could be the same with APP_PATHS. I want to ensure that NimbusML's core uses of ML.NET are untouched by this, so I used the TPA list for ML.NET.
  3. There seem to be tons of Dprep DLLs, and it is difficult to figure out which ones are necessary for my case and which are not. So I packaged only the few (<3 MB total) that are needed for JIT compilation when running NimbusML, and set APP_PATHS to the Dprep folder for the rest.
    Another issue: there are version mismatches between the *Dprep.dll files exposed in NuGet and those installed with the azureml-dataprep package. During testing I found that I need to keep the versions of *Dprep.dll that I built against, but for the rest of the supporting DLLs I can point to the Dprep folder. We will have to do further testing to figure out which versions of azureml-dataprep need to be installed to work with the DLLs NimbusML was built against. For now, it seems that the latest azureml-dataprep package works.
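
The "optional dependency" behaviour in point 1 could be checked on the Python side roughly as follows. This is a sketch only: the helper name `is_module_available` and the `HAS_DPREP` flag are hypothetical, and the PR may detect the package differently. It assumes the azureml-dataprep package is importable as `azureml.dataprep`.

```python
import importlib.util


def is_module_available(module_name: str) -> bool:
    """Return True if `module_name` can be located without fully importing it."""
    try:
        # For a dotted name, find_spec imports the parent package first and
        # raises ModuleNotFoundError (an ImportError) if the parent is missing.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


# Hypothetical flag: Dprep-specific code paths would be enabled only when
# the optional package is present, so everything else keeps working.
HAS_DPREP = is_module_available("azureml.dataprep")
```

Tests that need Dprep input could then be skipped when `HAS_DPREP` is false, which matches the "run dprep tests optionally" intent without making the package mandatory.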


Member

@eerhardt eerhardt Jul 12, 2019

After reading https://docs.microsoft.com/en-us/dotnet/core/tutorials/netcore-hosting#step-3---prepare-runtime-properties, and talking with a CoreCLR dev, the recommendation here is to use the TRUSTED_PLATFORM_ASSEMBLIES for all the assemblies. From the doc:

"Because the host has more control over which assemblies are loaded using the TPA list, it is a best practice for hosts to determine which assemblies they expect to load and list them explicitly."

I don't want to block you going this route, since I haven't really worked in NimbusML, but I just figured I'd give you as much information as possible to make an informed decision. #Resolved
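
If everything did go on the TPA list, one way to keep the choice between duplicate assemblies deterministic would be to de-duplicate by file name before joining the paths. The sketch below is a hypothetical Python illustration (the real host is C++): `build_tpa_list` and the directory names are invented for the example. It relies on the TPA property being a list of full assembly paths separated by the platform path separator (`;` on Windows, `:` elsewhere), which is what `os.pathsep` provides.

```python
import os
from typing import Dict, Iterable, List, Tuple


def build_tpa_list(
    directories: Iterable[Tuple[str, List[str]]],
    separator: str = os.pathsep,
) -> str:
    """Join assembly paths into one TPA string; the first directory wins.

    `directories` is an ordered sequence of (directory, file names) pairs.
    When the same .dll name appears in several directories, only the path
    from the earliest directory is kept.
    """
    seen: Dict[str, str] = {}  # lower-cased file name -> chosen full path
    for directory, file_names in directories:
        for name in file_names:
            if not name.lower().endswith(".dll"):
                continue
            key = name.lower()
            if key not in seen:
                seen[key] = os.path.join(directory, name)
    # dicts preserve insertion order, so the output order is deterministic
    return separator.join(seen.values())
```

Listing the runtime directory first means its copy of a shared `System.*` assembly would always be the one chosen, which addresses the "which duplicate wins" concern raised earlier in the thread.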

Member Author

thank you Eric!



@ganik ganik requested a review from daholste July 12, 2019 04:26
@ganik ganik merged commit c2f2b6b into microsoft:master Jul 12, 2019
@ganik ganik deleted the ganik/dprep1 branch August 8, 2019 19:40
pieths added a commit that referenced this pull request Aug 12, 2019
* Update readme with latest feedback (#39)

Updating readme with latest feedback.

* Add THIRD-PARTY-NOTICES.txt and move CONTRIBUTING.md to root. (#40)

* Initial checkin

* Move to Hosted Mac pool

* Update README.md

* Manually copied naming changes over from master.

* Revert "Merge remote-tracking branch 'upstream/temp/docs'"

This reverts commit 93c7347, reversing
changes made to 2350069.

* Improve documentation regarding contributors.

* Fix email address.

* Create CODE_OF_CONDUCT.md

* Update issue templates

* Create PULL_REQUEST_TEMPLATE.md

* Update issue templates

* Update issue templates

* Update issue templates

* Fixing link in CONTRIBUTING.md (#44)

* Update contributing.md link. (#43)

* Initial checkin for ML.NET 0.7 upgrade

* fix tests

* put back columndropper

* fix tests

* Update scikit-learn links to use https instead of http

* restart dotnetcore2 package work

* fix build

* fix mac & linux

* fix build

* fix build

* dbg build

* fix build

* fix build

* handle py 2.7

* handle py27

* fix py27

* fix build

* fix build

* fix build

* ensure dependencies

* ignore exceptions from ensure dependencies

* up version

* Update cv.py

add case for X is data frame

* Update cv.py

add a space

* add a test for cv with data frame

* set DOTNET_SYSTEM_GLOBALIZATION_INVARIANT to true to fix app domain error

* fix build

* up version

* Add instructions for editing docstrings. (#51)

* Add instructions for editing docstrings.

* Add footnote giving more information.

* Fix build failures caused by dotnetcore2 module. (#67)

* Fix importing of the dotnetcore2 module because it has inconsistent folder naming.

* Fix file check for unix platforms.

* Fix indentation levels.

* Reduce number of build legs for PR validations and add nightly build definition with more robust build matrix. (#69)

* Increase version to 0.6.5. (#71)

* Update clr helper function to search multiple folders for clr binaries. (#72)

* Update clr helper function to search multiple folders for clr binaries.

* Moved responsibility for Python version checking to utility functions.

* Add clarifying comments.

* Fix call to get_nimbusml_libs()

* fix drop column param name

* Remove restricted permissions on build.sh script.

* Fix lightgbm test failures by updating runtime dependencies.

* fix TensorFlowScorer model_location parameter name

* Fix build.sh defaults so that it detects when running on a mac.

* Since OneHotHashVectorizer is broken for output kind Key in ML.NET 0.7, use ToKey() for unit tests

* fix tests

* fix pyproj test

* fix win 3.6 build

* fix comments

* expose "parallel" to the fit/fit_transform function by including **param to the argument

* add a test for the parallel

* update parallel thread

* fix tests comparison

* Update thread, retry build

* modify tests

* specify pytest-cov version

* update pytest-cov version in build command for linux

* for windows use the latest pytest-cov

* Enabled strong naming for DotNetBridge.dll (to be used for InternalsVisibleTo in ML.NET)

* Changed the keys to be the same as other internal repos

* Changed the key filename

* Update to ML.NET 0.10.preview (#77)

* Updating ML.NET nugets to latest 0.9 preview.

* --generate_entrypoints phase 1

* Fixed Models.CrossValidator

* Updated all entrypoints

* New manifest.json, picked from Monte's branch

* Updated API codegen

* Replace ISchema and SchemaImpl with Schema and SchemaBuilder.

* Revert "Replace ISchema and SchemaImpl with Schema and SchemaBuilder."

This reverts commit dcd749d.

* Refactor IRowCursor to RowCursor.

* Update ML.NET version in build.csproj.

* Update manifest.json to ml.net commit 92e762686989215ddf45d9db3f0a1c989ee54d11

* Updated RunGraph.cs to ml.net 0.10

* Refactor Vbuffer

* Added override to RowCursor methods

* Update to NimbusML-privileged nugets from ML.NET.

* Update to Microsoft.ML namespace without Runtime.

* Schema and VBuffer fixes in NativeDataInterop.

* API fixes for IRandom and IsText in RmlEnvironment and NativeDataView.

* Work on getting VBuffer pointers from Spans.

* Some VBuffer fixes

* fix some class names

* Fix Register Assembly names.

* Remove ML.PipelineInference

* fixed more classes

* Add back columndropper for backward compatibility.

* Register Entrypoints assembly in environment.

* Fix homebrew update problem on VS Hosted Mac images.

* Updated all the nuget versions to be the same.

* Attempt to fix the dataframe unit tests

* Fixed test_pyproj

* Optimized VBuffer changes

* Changed bridge version value to 0.10

* Addressed PR comments

* Simplify by using six.string_types (#89)

* Simplify by using six.string_types

* Force a retest

* Removed ISchema from DotNetBridge (#90)

* Removed ISchema

* Fixed the tests

* Addressed PR comments

* Addressed Wei-Sheng's comments about documenting the purpose of Column.DetachedColumn.

* add configuration for python 3.7 (#101)

* add configuration for python 3.7

* fix broken unit test

* Update build.sh

* fix build for Windows

* Linux py3.7 build

* fix pytest version

* upgrade pytest

* fix pytest-cov version

* fix isinstance(., int) for python 2.7

* build urls for Mac

* final fixes

* fix libomp

* Removing 3.7 for now as it's not in PyPI

* Upgrade to ML.NET version 1.0.0 (#100)

* ref v0.10 ML.NET

* fix build

* hook up to v0.11.0 ML.NET

* fix build errors

* fix build

* include Microsoft.Data.DataView.dll in build

* typo

* remove protobuf dll

* Regenerate code due to manifest changes

* fix missing ep

* Update to ML.NET 1.0.0-preview

* fix .net build

* update nuget for ML.NET

* remove Data namespace dll

* rollback nuget changes

* move to final RC ML.NET

* Regenerate classes as per updated manifest

* fix maximum_number_of_iterations param name

* fix parameter names

* fix names

* reference official v1.0 of ML.NET

* fix tests

* fix label column

* Fix tests

* fix lightgbm tests

* fix OLS

* fix tests

* fix more tests

* fix more tests

* fix weight column name

* more tests

* fix normalized metrics

* more errors

* Fix CV

* rename feature_column to feature_column_name

* fix cv ranker

* Fix lightgbm tests

* fix changes due to upgrade of NGramFeaturizer

* fix ngram featurizer

* fix FactorizationMachine assert error

* disable test which is not working now due to change in LightGbm version

* fix model name

* typo

* handle nan in arrays

* fix tests

* fix tests

* fix more tests

* fix data type

* fix AUC exception

* kick the build

* fix tests due to data change

* fix ngram test

* fix mutual info tests

* copy libiomp lib

* fix mac build

* disable SymSgdNative for now

* disable SymSgdBinary classifier tests for Linux

* fix linux tests

* fix linux tests

* try linux

* fix linux

* skip SymSgdBinaryClassifier checks

* fix entrypoint compiler

* fix entry point generation

* fix example tests run

* fix typo

* fix documentation regression

* fix parameter name

* fix examples

* fix examples

* fix tests

* fix tests

* fix linux

* kick build

* Fix code_fixer

* fix skip take filters

* fix estimator checks

* Fix latest Windows build issues. (#105)

* Fix build issue on Windows when VS2019 is installed.

Note: The -version option could not be added directly
to the FOR command due to a command script parsing issue.

* Add missing arguments to fix build issue with latest version of autoflake.

* Fixes #50 - summary() fails if called a second time. (#107)

* Fixes #50 - summary() fails if called a second time.

* Fixes #99. Do not use hardcoded file separator. (#108)

Fixes #99. Do not use hard coded file separator.

* Delete the cached summaries when refitting a pipeline or a predictor. (#109)

* Fix build issue on Windows when VS2019 is installed.

Note: The -version option could not be added directly
to the FOR command due to a command script parsing issue.

* Add missing arguments to fix build issue with latest version of autoflake.

* Delete the cached summaries when refitting a pipeline or a predictor.
Fixes #106

* Simplify the code that deletes cached summaries when calling fit.

* Fix signature import error when using latest version of scikit-learn. (#116)

* Fix signature import error when using latest version of scikit-learn.
Fixes #111

* Move the conditional import of the signature method in to the utils package.

* Package System.Drawing.Common.dll as it's missing in dotnetcore2 (#120)

* package System.Drawing.Common.dll as it's missing in dotnetcore2

* typo

* Add png for Image examples

* try linux fix

* rollback scikit learn version

* test

* debug

* rollback test

* rollback

* fix fontconfig err

* fix tests

* print platform

* get os names

* test

* test

* fix linux

* Upgrade the pytest-remotedata package to fix missing attribute error. (#121)

* Upgrade the pytest-remotedata package to fix missing attribute error.
Fixes #117

* Remove the RlsMacPy3.6 configuration from .vsts-ci.yml.

* Upgrade version (#122)

* package System.Drawing.Common.dll as it's missing in dotnetcore2

* typo

* Add png for Image examples

* try linux fix

* rollback scikit learn version

* test

* debug

* rollback test

* rollback

* fix fontconfig err

* fix tests

* print platform

* get os names

* test

* test

* fix linux

* Upgrade version

* Support quoted strings by default (#124)

* upgrade to ML.NET 1.1 (#126)

* upgrade to ML.NET 1.1

* by default quote is +

* assert changes due to quote

* fix tensor flow example

* Put long running tests in to their own folder to shorten build times. (#136)

* Temporarily remove the dataframe examples from the test run
to see how much that affects the test length.

* Remove all examples from the tests to see how it impacts the CI run.

* Put long running tests in to their own folder to shorten build times.

* Update nimbusml.pyproj to reflect the newly moved test files.
Forgot to save the nimbusml.pyproj in visual studio.

* Expose ML.NET SSA & IID spike & changepoint detectors. (#135)

* Initial creation of the IidSpikeDetector files to see what works and
what doesn't.

* Import the Microsoft.ML.TimeSeries assembly in to the project.

* Use 'PassAs' in manifest.json to fix the source parameter name.

* Use float32 for data dtype in IidSpikeDetector example.

* Convert IidSpikeDetector to a standard transform. Add examples and tests.

* Add pre-transform to IidSpikeDetector to fix incompatible data types.

* Fix issues with the test_estimator_checks IidSpikeDetector tests.

* Remove unnecessary TypeConverter import in IidSpikeDetector example.

* Initial implementation of IidChangePointDetector.

* Initial implementation of SsaSpikeDetector.

* Initial implementation of SsaChangePointDetector.

* Fix incorrect SsaSpikeDetector instance in test_estimator_checks.

* Fix a few minor issues with time series unit tests and examples. (#139)

* Skip Image.py and Image_df.py tests for Ubuntu 14 (#149)

* Fixed the script for generating the documentation (#144)

* Moved _static to ci_script to solve an error while using sphinx
* Removed amek_md.bat and merged its commands into make_yaml.bat
* Moved metrics.rst to concepts

* Rename time_series package to timeseries. (#150)

* Fixed the issue of Ubuntu14 not skipping Image.py and Image_df.py (#161)

* Updated CharTokenizer.py example (#153)

* Skip CharTokenizer.py for extended tests (#163)

* Add support for returning custom values when overriding Pipeline.predict. (#155)

* Initial creation of the release-next.md file. (#165)

* Initial creation of the release-next.md file.

* Point the time series example links to the head of the master branch.

* Initial implementation of the SsaForecaster entry point. (#164)

* Final updates for release 1.2.0 (#167)

* Update the LightGbm entry point with the latest version from the manifest.

* Add SsaForecasting examples to the release notes.

* Add documentation modification to the release notes.

* Create the official 1.2.0 release notes. They have been put in the
docs/release-notes folder to closely match the ml.net directory
structure.

* Add correct version to the release notes title.

* Re-enable the SsaForecaster tests.

* Update to the latest version of ml.net. Update the NimbusML version.

* Fix issues with the summary unit tests.

* Comment out the SymSgdBinaryClassifier summary test. It does not
appear to be working on linux.

* Revert change b5eb937 to see if it (#168)

fixes the signed build issue.

* Bring back build.cmd commit. It did not fix the signed build issue. (#169)

* Revert change b5eb937 to see if it
fixes the signed build issue.

* Bring back commit b5eb937. It did
not fix the signed build issue.

* Bring back the build.cmd change from b5eb937. (#170)

It did not fix the signed build issue.

* Use restored dotnet CLI for signing (#171)

* Update README.md

* Enable LinearSvmBinaryClassifier (#180)

* Enable LinearSvmBinaryClassifier, add examples, add test, and update docs

* Add test for predict_proba() and decision_function()

* Setup destructors for data passed to python (#184)

* pass destructor to python

* indent

* Add azureml-dataprep support for dataflow objects (#181)

* draft code

* draft

* delete

* add dprep dependency

* rollback

* rollback

* rollback

* test & example on using DprepDataStream

* add dprep path

* add dprep path

* fix mlnetpath

* optional dependency on dprep

* run dprep tests optionally

* fix typo

* Up sdk version

* fix linux dprep tests

* up version (#188)

* Save the model file when pickling a NimbusML Pipeline. (#189)

* Save the model file when pickling a NimbusML Pipeline.

* Add version to the pickled Pipeline.

* Add the steps attribute to a pickled Pipeline instance.

* Add extra unit test for pickled nimbusml pipelines.

* Add export_version to pickled base_pipeline_items.
Remove unnecessary export_version attribute from an unpickled Pipeline.

* Remove stored references to X and y in BasePredictor. (#195)

* Remove stored references to X and y in BasePredictor.

* Remove unnecessary scikit-learn import.

* Add observation level feature contributions to Pipeline and BasePredictor (#196)

* Add get_feature_contributions to Pipeline and BasePredictor, add example

* Add tests

* Update release-next.md

* Add classes_ to Pipeline and/or predictor when calling predict_proba. (#200)

* Add classes_ to Pipeline and/or predictor when calling predict_proba.

* Update test_estimator_checks.py to skip the check_dict_unchanged
test for any estimator which supports predict_proba or decision_function.

* Update Handler, Filter, and Indicator to automatically convert the input columns to float before performing the transform. (#204)

Fixes #203.

* Combine models from transforms, predictors and pipelines in to one model. (#208)

* Initial test implementation of combining 2 or more models in to one.

* Added support to Pipeline.combine_models for combining other types of items
and transform only inputs.

* Combine Pipeline._evaluation_infer and _evaluation in to one method.
This fixes an issue where a classifier graph would not contain the
correct nodes after calling Pipeline._predict().

* Missing part of previous check-in.

* Fix the Pipeline.combine_models signature to work with Python 2.7.

* Fix build (#209)

* T

* Fix cert

* Update release-next.md. (#211)

* Update release-next.md

* Update release-next.md

* Update release-next.md

* Add classifier and FileDataStream unit tests to test_pipeline_combining. (#212)

Add classifier and FileDataStream unit tests to test_pipeline_combining.

* Update release-next.md

* up version (#210)

* up version

* Up the version

* renamed factorization lib

* remove matrix factorization lib ref

* dbg libs

* fix libtensorflow framework

* package more libs

* add mkl proxy

* Enable EnsembleClassifier and EnsembleRegressor (#207)

* Enable EnsembleClassifier

* nit

* Enable EnsembleRegressor

* Add output combiners

* Add sub model selectors

* Update examples

* Add documentation for components

* Add diversity measure

* Improve examples

* Add tests

* Fix test_estimator_checks

* Create release notes for version 1.3.0. (#214)

* Update release-1.3.0.md

* Add --installPythonPackages flag to build scripts (#215)

* Add --installPythonPackages flag to build scripts

* close if statement in build.sh

* fix --runTestsOnly

* Fix a bug with the classes_ attribute when no y input is specified during fitting. (#218)

Fixes #216

* Add NumSharp.Core.dll (#220)

* Add timeseries documentation to the master branch. (#221)

* Ensure manifest.json is the latest version and run entry point compiler.

* Remove the non-underscore files that were introduced during the merge.

* Fix remaining unintended differences between branch and master.

* Remove unnecessary underscores from nimbusml.pyproj
najeeb-kazmi added a commit that referenced this pull request Oct 10, 2019
* Update readme with latest feedback (#39)

Updating readme with latest feedback.

* Add THIRD-PARTY-NOTICES.txt and move CONTRIBUTING.md to root. (#40)

* Initial checkin

* Move to Hosted Mac pool

* Update README.md

* Manually copied naming changes over from master.

* Revert "Merge remote-tracking branch 'upstream/temp/docs'"

This reverts commit 93c7347, reversing
changes made to 2350069.

* Improve documentation regarding contributors.

* Fix email address.

* Create CODE_OF_CONDUCT.md

* Update issue templates

* Create PULL_REQUEST_TEMPLATE.md

* Update issue templates

* Update issue templates

* Update issue templates

* Fixing link in CONTRIBUTING.md (#44)

* Update contributing.md link. (#43)

* Initial checkin for ML.NET 0.7 upgrade

* fix tests

* put back columndropper

* fix tests

* Update scikit-learn links to use https instead of http

* restart dotnetcore2 package work

* fix build

* fix mac & linux

* fix build

* fix build

* dbg build

* fix build

* fix build

* handle py 2.7

* handle py27

* fix py27

* fix build

* fix build

* fix build

* ensure dependencies

* ignore exceptions from ensure dependencies

* up version

* Update cv.py

add case for X is data frame

* Update cv.py

add a space

* add a test for cv with data frame

* set DOTNET_SYSTEM_GLOBALIZATION_INVARIANT to true to fix app domain error

* fix build

* up version

* Add instructions for editing docstrings. (#51)

* Add instructions for editing docstrings.

* Add footnote giving more information.

* Fix build failures caused by dotnetcore2 module. (#67)

* Fix importing of the dotnetcore2 module because it has inconsistent folder naming.

* Fix file check for unix platforms.

* Fix indentation levels.

* Reduce number of build legs for PR validations and add nightly build definition with more robust build matrix. (#69)

* Increase version to 0.6.5. (#71)

* Update clr helper function to search multiple folders for clr binaries. (#72)

* Update clr helper function to search multiple folders for clr binaries.

* Moved responsiblity for Python version checking to utility functions.

* Add clarifying comments.

* Fix call to get_nimbusml_libs()

* fix drop column param name

* Remove restricted permissions on build.sh script.

* Fix lightgbm test failures by updating runtime dependencies.

* fix TensorFlowScorer model_location paramter name

* Fix build.sh defaults so that it detects when running on a mac.

* Since OneHotHashVectorizer is broken for output kind Key in ML.NET 0.7, usse ToKey() for unit tests

* fix tests

* fix pyproj test

* fix win 3.6 build

* fix comments

* expose "parallel" to the fit/fit_transform function by including **param to the argument

* add a test for the parallel

* update parallel thread

* fix tests comparison

* Update thread, retry build

* modify tests

* specify pytest-cov version

* update pytest-cov version in build command for linux

* for windows use the latest pytest-cov

* Enabled strong naming for DoNetBridge.dll (to be used for InternalsVisibleTo in ML.NET)

* Changed the keys to be the same as other internal repos

* Changed the key filename

* Update to ML.NET 0.10.preview (#77)

* Updating ML.NET nugets to latest 0.9 preview.

* --generate_entrypoints phase 1

* Fixed Models.CrossValidator

* Updated all entrypoints

* New manifest.json, picket from Monte's branch

* Updated API codegen

* Replace ISchema and SchemaImpl with Schema and SchemaBuilder.

* Revert "Replace ISchema and SchemaImpl with Schema and SchemaBuilder."

This reverts commit dcd749d.

* Refactor IRowCursor to RowCursor.

* Update ML.NET version in build.csproj.

* Update manifest.json to ml.net commit 92e762686989215ddf45d9db3f0a1c989ee54d11

* Updated RunGraph.cs to ml.net 0.10

* Refactor Vbuffer

* Added override to RowCursor methods

* Update to NimbusML-privileged nugets from ML.NET.

* Update to Microsoft.ML namespace without Runtime.

* Schema and VBuffer fixes in NativeDataInterop.

* API fixes for IRandom and IsText in RmlEnvironment and NativeDataView.

* Work on getting VBuffer pointers from Spans.

* Some VBuffer fixes

* fix some class names

* Fix Register Assembly names.

* Remove ML.PipelineInference

* fixed more classes

* Add back columndropper for backward compatability.

* Register Entrypoints assembly in environment.

* Fix homebrew update problem on VS Hosted Mac images.

* Updated all the nuget versions to be the same.

* Attempt to fix the dataframe unit tests

* Fixed test_pyproj

* Optimized VBuffer changes

* Changed bridge version value to 0.10

* Addressed PR comments

* Simplify by using six.string_types (#89)

* Simplify by using six.string_types

* Force a retest

* Removed ISchema from DotNetBridge (#90)

* Removed ISchema

* Fixed the tests

* Addressed PR comments

* Addressed Wei-Sheng's comments about documenting the purpose of Column.DetachedColumn.

* add configuration for python 3.7 (#101)

* add configuration for python 3.7

* fix broken unit test

* Update build.sh

* fix build for Windows

* Linux py3.7 build

* fix pytest version

* upgrade pytest

* fix pytest-cov version

* fix isinstance(., int) for python 2.7

* build urls for Mac

* final fixes

* fix libomp

* Removing 3.7 for now as its not in PyPI

* Upgrade to ML.NET version 1.0.0 (#100)

* ref v0.10 ML.NET

* fix build

* hook up to v0.11.0 ML.NET

* fix build errors

* fix build

* include Microsoft.Data.DataView.dll in build

* typo

* remove protobuf dll

* Regenerate code due to manifest changes

* fix missing ep

* Update to ML.NET 1.0.0-preview

* fix .net build

* update nuget for ML.NET

* remove Data namespace dll

* rollback nuget changes

* move to final RC ML.NET

* Regenerate classes as per updated manifest

* fix maximum_number_of_iterations param name

* fix parameter names

* fix names

* reference official v1.0 of ML.NET

* fix tests

* fix label column

* Fix tests

* fix lightgbm tests

* fix OLS

* fix tests

* fix more tests

* fix more tests

* fix weight column name

* more tests

* fix normalized metrics

* more errors

* Fix CV

* rename feature_column to feature_column_name

* fix cv ranker

* Fix lightgbm tests

* fix changes due to upgrade of NGramFeaturizer

* fix ngram featurizer

* fix FactorizationMachine assert error

* disable test which is not working now due to change in LightGbm version

* fix model name

* typo

* handle nan in arrays

* fix tests

* fix tests

* fix more tests

* fix data type

* fix AUC exception

* kick the build

* fix tests due to data change

* fix ngram test

* fix mutual info tests

* copy libiomp lib

* fix mac build

* disable SymSgdNative for now

* disable SymSgdBinary classifier tests for Linux

* fix linux tests

* fix linux tests

* try linux

* fix linux

* skip SymSgdBinaryClassifier checks

* fix entrypoint compiler

* fix entry point generation

* fix example tests run

* fix typo

* fix documentation regression

* fix parameter name

* fix examples

* fix examples

* fix tests

* fix tests

* fix linux

* kick build

* Fix code_fixer

* fix skip take filters

* fix estimator checks

* Fix latest Windows build issues. (#105)

* Fix build issue on Windows when VS2019 is installed.

Note: The -version option could not be added directly
to the FOR command due to a command script parsing issue.

* Add missing arguments to fix build issue with latest version of autoflake.

* Fixes #50 - summary() fails if called a second time. (#107)

* Fixes #50 - summary() fails if called a second time.

* Fixes #99. Do not use hardcoded file separator. (#108)

Fixes #99. Do not use hard coded file separator.

* Delete the cached summaries when refitting a pipeline or a predictor. (#109)

* Fix build issue on Windows when VS2019 is installed.

Note: The -version option could not be added directly
to the FOR command due to a command script parsing issue.

* Add missing arguments to fix build issue with latest version of autoflake.

* Delete the cached summaries when refitting a pipeline or a predictor.
Fixes #106

* Simplify the code that deletes cached summaries when calling fit.

* Fix signature import error when using latest version of scikit-learn. (#116)

* Fix signature import error when using latest version of scikit-learn.
Fixes #111

* Move the conditional import of the signature method in to the utils package.

* Package System.Drawing.Common.dll as its missing in dotnetcore2 (#120)

* package System.Drawings.Common.dll as its missing in dotnetcore2

* typo

* Add png for Image examples

* try linux fix

* rollback scikit learn version

* test

* debug

* rollback test

* rollback

* fix fontconfig err

* fix tests

* print platform

* get os names

* test

* test

* fix linux

* Upgrade the pytest-remotedata package to fix missing attribute error. (#121)

* Upgrade the pytest-remotedata package to fix missing attribute error.
Fixes #117

* Remove the RlsMacPy3.6 configuration from .vsts-ci.yml.

* Upgrade version (#122)

* package System.Drawings.Common.dll as its missing in dotnetcore2

* typo

* Add png for Image examples

* try linux fix

* rollback scikit learn version

* test

* debug

* rollback test

* rollback

* fix fontconfig err

* fix tests

* print platform

* get os names

* test

* test

* fix linux

* Upgrade version

* Support quoted strings by default (#124)

* upgrade to ML.NET 1.1 (#126)

* upgrade to ML.NET 1.1

* by default quote is +

* assert changes due to quote

* fix tensor flow example

* Put long running tests in to their own folder to shorten build times. (#136)

* Temporarily remove the dataframe examples from the test run
to see how much that effects the test length.

* Remove all examples from the tests to see how it impacts the CI run.

* Put long running tests in to their own folder to shorten build times.

* Update nimbusml.pyproj to reflect the newly moved test files.
Forgot to save the nimbusml.pyproj in visual studio.

* Expose ML.NET SSA & IID spike & changepoint detectors. (#135)

* Initial creation of the IidSpikeDetector files to see what works and
what doesn't.

* Import the Microsoft.ML.TimeSeries assembly in to the project.

* Use 'PassAs' in manifest.json to fix the source parameter name.

* Use float32 for data dtype in IidSpikeDetector example.

* Convert IidSpikeDetector to a standard transform. Add examples and tests.

* Add pre-transform to IidSpikeDetector to fix incompatible data types.

* Fix issues with the test_estimator_checks IidSpikeDetector tests.

* Remove unnecessary TypeConverter import in IidSpikeDetector example.

* Initial implementation of IidChangePointDetector.

* Initial implementation of SsaSpikeDetector.

* Initial implementation of SsaChangePointDetector.

* Fix incorrect SsaSpikeDetector instance in test_estimator_checks.

* Fix a few minor issues with time series unit tests and examples. (#139)

* Skip Image.py and Image_df.py tests for Ubuntu 14 (#149)

* Fixed the script for generating the documentation (#144)

* Moved _static to ci_script to solve an error while using sphinx
* Removed make_md.bat and merged its commands into make_yaml.bat
* Moved metrics.rst to concepts

* Rename time_series package to timeseries. (#150)

* Fixed the issue of Ubuntu14 not skipping Image.py and Image_df.py (#161)

* Updated CharTokenizer.py example (#153)

* Skip CharTokenizer.py for extended tests (#163)

* Add support for returning custom values when overriding Pipeline.predict. (#155)

* Initial creation of the release-next.md file. (#165)

* Initial creation of the release-next.md file.

* Point the time series example links to the head of the master branch.

* Initial implementation of the SsaForecaster entry point. (#164)

* Final updates for release 1.2.0 (#167)

* Update the LightGbm entry point with the latest version from the manifest.

* Add SsaForecasting examples to the release notes.

* Add documentation modification to the release notes.

* Create the official 1.2.0 release notes. They have been put in the
docs/release-notes folder to closely match the ml.net directory
structure.

* Add correct version to the release notes title.

* Re-enable the SsaForecaster tests.

* Update to the latest version of ml.net. Update the NimbusML version.

* Fix issues with the summary unit tests.

* Comment out the SymSgdBinaryClassifier summary test. It does not
appear to be working on linux.

* Revert change b5eb937 to see if it (#168)

fixes the signed build issue.

* Bring back build.cmd commit. It did not fix the signed build issue. (#169)

* Revert change b5eb937 to see if it
fixes the signed build issue.

* Bring back commit b5eb937. It did
not fix the signed build issue.

* Bring back the build.cmd change from b5eb937. (#170)

It did not fix the signed build issue.

* Use restored dotnet CLI for signing (#171)

* Update README.md

* Enable LinearSvmBinaryClassifier (#180)

* Enable LinearSvmBinaryClassifier, add examples, add test, and update docs

* Add test for predict_proba() and decision_function()

* Setup destructors for data passed to python (#184)

* pass destructor to python

* indent

* Add azureml-dataprep support for dataflow objects (#181)

* draft code

* draft

* delete

* add dprep dependency

* rollback

* rollback

* rollback

* test & example on using DprepDataStream

* add dprep path

* add dprep path

* fix mlnetpath

* optional dependency on dprep

* run dprep tests optionally

* fix typo

* Up sdk version

* fix linux dprep tests

* up version (#188)

* Save the model file when pickling a NimbusML Pipeline. (#189)

* Save the model file when pickling a NimbusML Pipeline.

* Add version to the pickled Pipeline.

* Add the steps attribute to a pickled Pipeline instance.

* Add extra unit test for pickled nimbusml pipelines.

* Add export_version to pickled base_pipeline_items.
Remove unnecessary export_version attribute from an unpickled Pipeline.

* Remove stored references to X and y in BasePredictor. (#195)

* Remove stored references to X and y in BasePredictor.

* Remove unnecessary scikit-learn import.

* Add observation level feature contributions to Pipeline and BasePredictor (#196)

* Add get_feature_contributions to Pipeline and BasePredictor, add example

* Add tests

* Update release-next.md

* Add classes_ to Pipeline and/or predictor when calling predict_proba. (#200)

* Add classes_ to Pipeline and/or predictor when calling predict_proba.

* Update test_estimator_checks.py to skip the check_dict_unchanged
test for any estimator which supports predict_proba or decision_function.

* Update Handler, Filter, and Indicator to automatically convert the input columns to float before performing the transform. (#204)

Fixes #203.

* Combine models from transforms, predictors and pipelines in to one model. (#208)

* Initial test implementation of combining 2 or more models in to one.

* Added support to Pipeline.combine_models for combining other types of items
and transform only inputs.

* Combine Pipeline._evaluation_infer and _evaluation in to one method.
This fixes an issue where a classifier graph would not contain the
correct nodes after calling Pipeline._predict().

* Missing part of previous check-in.

* Fix the Pipeline.combine_models signature to work with Python 2.7.

* Fix build (#209)

* T

* Fix cert

* Update release-next.md. (#211)

* Update release-next.md

* Update release-next.md

* Update release-next.md

* Add classifier and FileDataStream unit tests to test_pipeline_combining. (#212)

Add classifier and FileDataStream unit tests to test_pipeline_combining.

* Update release-next.md

* up version (#210)

* up version

* Up the version

* renamed factorization lib

* remove matrix factorization lib ref

* dbg libs

* fix libtensorflow framework

* package more libs

* add mkl proxy

* Enable EnsembleClassifier and EnsembleRegressor (#207)

* Enable EnsembleClassifier

* nit

* Enable EnsembleRegressor

* Add output combiners

* Add sub model selectors

* Update examples

* Add documentation for components

* Add diversity measure

* Improve examples

* Add tests

* Fix test_estimator_checks

* Create release notes for version 1.3.0. (#214)

* Update release-1.3.0.md

* Add --installPythonPackages flag to build scripts (#215)

* Add --installPythonPackages flag to build scripts

* close if statement in build.sh

* fix --runTestsOnly

* Fix a bug with the classes_ attribute when no y input is specified during fitting. (#218)

Fixes #216

* Add NumSharp.Core.dll (#220)

* Add timeseries documentation to the master branch. (#221)

* Docs update (#224)

* Fix documentation

* Few more

* More doc fixes (#228)

* More doc fixes

* A few nits

* Pass python path to Dprep (#232)

* remove Dprep* dll from wheel (#235)

* remove Dprep* dll from wheel

* Move Dprep calls into separate class

* test

* remove DprepLoader

* clean unused code (#236)

* clean unused code

* fix tests changes due to seed changes

* remove max_slots from graph

* delete Dprep dlls from python2.7

* fix linux extended tests for TensorFlow

* fix tests

* fix tests

* rollback

* fix tests

* disable estimator check

* fix tests

* fix tests again

* fix tabbing
removing -r from rm command

* remove experimental

* Enable scoring of ML.NET models saved with new TransformerChain format (#230)

* Handle new ML.NET model format for predictions

* fix

* use with{} statement with ZipFile

* Add initial implementation of DatasetTransformer. (#240)

* Update release-next for the 1.4 release. (#252)

* Update release-next.md

* Upgrade to ML.NET 1.4 (#251)

* Upgrade to ML.NET 1.4

* preview bits

* update refs

* Fix casing for the installPythonPackages build.sh argument. (#256)

* Rename lambda_ to l2_regularization in LinearSvmBinaryClassifier (#259)

* Initial implementation of csr_matrix output support. (#250)

* Initial implementation of csr_matrix output support.

* Whitespace change to kick off another build. The CentOs test run crashed.

* Rename as per comment

* Initial implementation of LpNormalizer. (#253)

* Initial implementation of LpNormalizer.

* Rename to LpScaler

* fix build

* fix casing

* up version (#262)

* Remove scikit-learn testing module from normal flow (#265)

* remove scikit learn testing module from normal flow

* fix build

* fix build

* Fix issue when using predict_proba or decision_function with combined models. (#272)

* Output predictor model file optionally (#270)

* Output predictor model file optionally

* fix comment

* fix unit tests

* Draft of ColumnConcat transform that takes in a prefix instead

* fix test

* fix test

* PrefixColumnConcat transform

* fix entrypoint namespace

* fix exception

* Handle no match scenario

* add example & test

* add test

* fix comments

* fix comments

* fix example

* Providing error message to python in exception (#273)

* spit out error message to python
upgrade patch version

* fix the test

* another test

* rollback

* Add I8 support to CSR matrix output. (#276)

* Get column names for transform model (#278)

* draft for schema

* resolve conflict

* debug pieces

* Few perf tricks

* rollback prints

* few perf tricks

* perf tricks

* fix csr

* set 0 byte

* Update schema example.

* Convert return value to list.

* Update schema example to use new list return value.

* Fix naming in Pipeline.get_schema.

* Add initial unit tests for Pipeline.get_schema().

* Check length in Pipeline.get_schema unit tests.

* few perf tricks

* fix linux tests

* rollback

* Temporarily use 'inclusive' test instead of positional test for columns since order is not valid in Python 2.6 and 3.5.

* fix comments

* Add variable length vector support (#267)

* Update Schema.py to remove the non-ASCII character (#291)

* Fix Pipeline._extract_classes_from_headers was not checking for valid steps. (#292)

* Save predictor_model when pickling a pipeline. (#295)

* Initial implementation of the WordTokenizer transform. (#296)

* Remove summary validation in Pipeline and enable the summary tests for the tree based predictors. (#298)

* Turn on dprep unit tests for all platforms and python versions except 2.7 (#303)

* Fix bug in Pipeline.transform() (#294)

* Remove unnecessary code from Pipeline.transform that was causing a bug

* Update release-next.md

* Remove y argument from transform() method

* Update release-next.md

* Fix test

* Fixed building of NimbusML with Python 3.5 on Windows (and other versions of Python) (#297)

* Update Schema.py to remove the non-ASCII character

* Update build.cmd

* Update build.cmd

* Update build.cmd

* Revert "Update build.cmd"

This reverts commit cb79b9d.

* Upgrade pip for all Python versions

* Update release notes. (#306)

* Added libtensorflow_framework.so.1 (#310)

* Add Permutation Feature Importance (PFI) (#279)

* Add PFI entrypoint

* Add PFI to Pipeline and BasePipelineItem, and examples

* Improved docs and sample

* Load model as PredictorModel, and remove label column and group ID column from EP inputs

* schema example reference

* Add test

* nit

* Update release-next.md

* Add tests to check PFI from loaded model

* Make SgdBinaryClassifier deterministic in test_estimator_checks.py

* Update ML.NET nugets to 1.4.0-preview2 and 0.16.0-preview2

* Fix test baseline values

* Fix Ranking PFI column names to work with Py2.7 and Py3.5

* Initial implementation of DateTime input and output column support. (#290)

* Add support for DateTime output.

* Add support for DateTime input columns.

* Add unit test for DateTime column input and output.

* Fix DateTime.Kind == Unspecified output from dprep.

* Update the csproj files to point to the latest nuget packages.

* Update the Tensorflow.NET library version.

* Fix azureml dprep not available for Python 2.7

* Fix missing sys import.

* Fix broken assertEqual on Python 3.5.

* Fix BinaryDataStream not valid as input for transformer. (#307)

* Add test for fitting a BinaryDataStream.

* Use BinaryDataStream schema for retrieving feature columns in _init_graph_nodes.

* Add idv schema to BinaryDataStream.

* Fix DprepDataStream was passing in incorrect value to base class constructor.

* Remove column position check from unit test since it is unreliable on Python 3.5 and 2.7.

* Issue 300 (#311)

* Temporarily change running Mac pipeline to Python 3.6

* Temporary addition to view state of "result" in MacOS with Python 3.6

* Added additional temporary Python builds on Mac

* Added libtensorflow_framework.so.1 (#310)

* Revert "Temporary addition to view state of "result" in MacOS with Python 3.6"

This reverts commit d116dc8.

* Updated test_data_with_missing.test_input_conversion_to_float()

* Update test_data_with_missing.py

* Revert "Added additional temporary Python builds on Mac"

This reverts commit 1aa1526.

* Revert "Temporarily change running Mac pipeline to Python 3.6"

This reverts commit 4ec36fb.

* allow csr_matrix as input to predict_proba() (#305)

* draft

* draft

* rollback

* new entrypoint

* add assert

* rollback

* no print in test

* up version

* only Single type is allowed for Feature vector

* fix comments, rename entrypoint

* convert to single

* fix type

* add feature contribution test

* rename pipeline.get_schema() to pipeline.get_output_columns()

* fix build

* Update release notes. (#312)

* Turn off shuffling for FactorizationMachineBinaryClassifier. (#316)

* Fix imports

* Fix few more conflicts and build

* Fix one more import

* Fix nimbusml.pyproj
Development

Successfully merging this pull request may close these issues.

Dataprep integration
3 participants