y shouldn't be here to begin with, as it serves no purpose in [...]. Removing this may cause back-compat issues. Thoughts? @ganik #Resolved Refers to: src/python/nimbusml/pipeline.py:2252 in b74c64d.
X, y_temp, columns_renamed, feature_columns, label_column, \
    schema, weights, weight_column = self._preprocess_X_y(X, y)

if not isinstance(y, (str, tuple)):
tuple: when could y be a tuple? How is y_temp used now? #Resolved
Tests pass without it, but I was more concerned about users having called transform(X, y). There would be no reason to, but because the argument is there, someone might have. In reply to: 537914880. Refers to: src/python/nimbusml/pipeline.py:2252 in b74c64d.
I am OK with these breaking changes. It's possible this could break someone; however, it's an easy fix. Also, since y is not needed, I expect only a few users to be affected. In reply to: 538134034. Refers to: src/python/nimbusml/pipeline.py:2252 in b74c64d.
The scikit-learn Pipeline does not offer .transform() when the last step is not a transform. When the last step is a transform, it does not accept .transform(X, y), only .transform(X). In reply to: 538150331. Refers to: src/python/nimbusml/pipeline.py:2252 in b74c64d.
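To make the trade-off in this thread concrete, here is a minimal sketch in plain Python. It is not NimbusML or scikit-learn code: `StrictTransformer`, `LenientTransformer`, and `call_with_legacy_y` are hypothetical names invented for illustration. The strict class mirrors the outcome agreed on above (transform takes X only, so a legacy transform(X, y) call fails with a TypeError), while the lenient class shows one alternative a project could pick if it wanted a softer migration, namely accepting and ignoring y with a deprecation warning.

```python
import warnings


class StrictTransformer:
    """Hypothetical stand-in: transform() accepts X only."""

    def transform(self, X):
        # Placeholder transformation: double every value.
        return [2 * value for value in X]


class LenientTransformer(StrictTransformer):
    """Hypothetical back-compat shim: tolerates a legacy y argument."""

    def transform(self, X, y=None):
        if y is not None:
            # y never influences the output, so it is ignored with a warning.
            warnings.warn(
                "y is ignored by transform() and may be removed later",
                DeprecationWarning,
                stacklevel=2,
            )
        return super().transform(X)


def call_with_legacy_y(transformer, X, y):
    """Call transform(X, y) and report whether the call is accepted."""
    try:
        return "ok", transformer.transform(X, y)
    except TypeError as error:
        return "TypeError", str(error)


# Legacy call against the strict signature: rejected by Python itself.
strict_status, strict_detail = call_with_legacy_y(StrictTransformer(), [1, 2], [0, 1])

# Legacy call against the lenient signature: accepted, with a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lenient_status, lenient_out = call_with_legacy_y(
        LenientTransformer(), [1, 2], [0, 1]
    )
```

With the strict signature, the fix on the caller's side is simply to drop the second argument, which is why the breaking change was judged acceptable here.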
if not isinstance(y, (str, tuple)):
    y = y_temp
X, _, _, _, _, schema, _, _ = self._preprocess_X_y(X)
_preprocess_X_y: Do you need to call this at all? #Resolved
* Update readme with latest feedback (#39) Updating readme with latest feedback. * Add THIRD-PARTY-NOTICES.txt and move CONTRIBUTING.md to root. (#40) * Initial checkin * Move to Hosted Mac pool * Update README.md * Manually copied naming changes over from master. * Revert "Merge remote-tracking branch 'upstream/temp/docs'" This reverts commit 93c7347, reversing changes made to 2350069. * Improve documentation regarding contributors. * Fix email address. * Create CODE_OF_CONDUCT.md * Update issue templates * Create PULL_REQUEST_TEMPLATE.md * Update issue templates * Update issue templates * Update issue templates * Fixing link in CONTRIBUTING.md (#44) * Update contributing.md link. (#43) * Initial checkin for ML.NET 0.7 upgrade * fix tests * put back columndropper * fix tests * Update scikit-learn links to use https instead of http * restart dotnetcore2 package work * fix build * fix mac & linux * fix build * fix build * dbg build * fix build * fix build * handle py 2.7 * handle py27 * fix py27 * fix build * fix build * fix build * ensure dependencies * ignore exceptions from ensure dependencies * up version * Update cv.py add case for X is data frame * Update cv.py add a space * add a test for cv with data frame * set DOTNET_SYSTEM_GLOBALIZATION_INVARIANT to true to fix app domain error * fix build * up version * Add instructions for editing docstrings. (#51) * Add instructions for editing docstrings. * Add footnote giving more information. * Fix build failures caused by dotnetcore2 module. (#67) * Fix importing of the dotnetcore2 module because it has inconsistent folder naming. * Fix file check for unix platforms. * Fix indentation levels. * Reduce number of build legs for PR validations and add nightly build definition with more robust build matrix. (#69) * Increase version to 0.6.5. (#71) * Update clr helper function to search multiple folders for clr binaries. (#72) * Update clr helper function to search multiple folders for clr binaries. 
* Moved responsiblity for Python version checking to utility functions. * Add clarifying comments. * Fix call to get_nimbusml_libs() * fix drop column param name * Remove restricted permissions on build.sh script. * Fix lightgbm test failures by updating runtime dependencies. * fix TensorFlowScorer model_location paramter name * Fix build.sh defaults so that it detects when running on a mac. * Since OneHotHashVectorizer is broken for output kind Key in ML.NET 0.7, usse ToKey() for unit tests * fix tests * fix pyproj test * fix win 3.6 build * fix comments * expose "parallel" to the fit/fit_transform function by including **param to the argument * add a test for the parallel * update parallel thread * fix tests comparison * Update thread, retry build * modify tests * specify pytest-cov version * update pytest-cov version in build command for linux * for windows use the latest pytest-cov * Enabled strong naming for DoNetBridge.dll (to be used for InternalsVisibleTo in ML.NET) * Changed the keys to be the same as other internal repos * Changed the key filename * Update to ML.NET 0.10.preview (#77) * Updating ML.NET nugets to latest 0.9 preview. * --generate_entrypoints phase 1 * Fixed Models.CrossValidator * Updated all entrypoints * New manifest.json, picket from Monte's branch * Updated API codegen * Replace ISchema and SchemaImpl with Schema and SchemaBuilder. * Revert "Replace ISchema and SchemaImpl with Schema and SchemaBuilder." This reverts commit dcd749d. * Refactor IRowCursor to RowCursor. * Update ML.NET version in build.csproj. * Update manifest.json to ml.net commit 92e762686989215ddf45d9db3f0a1c989ee54d11 * Updated RunGraph.cs to ml.net 0.10 * Refactor Vbuffer * Added override to RowCursor methods * Update to NimbusML-privileged nugets from ML.NET. * Update to Microsoft.ML namespace without Runtime. * Schema and VBuffer fixes in NativeDataInterop. * API fixes for IRandom and IsText in RmlEnvironment and NativeDataView. 
* Work on getting VBuffer pointers from Spans. * Some VBuffer fixes * fix some class names * Fix Register Assembly names. * Remove ML.PipelineInference * fixed more classes * Add back columndropper for backward compatability. * Register Entrypoints assembly in environment. * Fix homebrew update problem on VS Hosted Mac images. * Updated all the nuget versions to be the same. * Attempt to fix the dataframe unit tests * Fixed test_pyproj * Optimized VBuffer changes * Changed bridge version value to 0.10 * Addressed PR comments * Simplify by using six.string_types (#89) * Simplify by using six.string_types * Force a retest * Removed ISchema from DotNetBridge (#90) * Removed ISchema * Fixed the tests * Addressed PR comments * Addressed Wei-Sheng's comments about documenting the purpose of Column.DetachedColumn. * add configuration for python 3.7 (#101) * add configuration for python 3.7 * fix broken unit test * Update build.sh * fix build for Windows * Linux py3.7 build * fix pytest version * upgrade pytest * fix pytest-cov version * fix isinstance(., int) for python 2.7 * build urls for Mac * final fixes * fix libomp * Removing 3.7 for now as its not in PyPI * Upgrade to ML.NET version 1.0.0 (#100) * ref v0.10 ML.NET * fix build * hook up to v0.11.0 ML.NET * fix build errors * fix build * include Microsoft.Data.DataView.dll in build * typo * remove protobuf dll * Regenerate code due to manifest changes * fix missing ep * Update to ML.NET 1.0.0-preview * fix .net build * update nuget for ML.NET * remove Data namespace dll * rollback nuget changes * move to final RC ML.NET * Regenerate classes as per updated manifest * fix maximum_number_of_iterations param name * fix parameter names * fix names * reference official v1.0 of ML.NET * fix tests * fix label column * Fix tests * fix lightgbm tests * fix OLS * fix tests * fix more tests * fix more tests * fix weight column name * more tests * fix normalized metrics * more errors * Fix CV * rename feature_column to 
feature_column_name * fix cv ranker * Fix lightgbm tests * fix changes due to upgrade of NGramFeaturizer * fix ngram featurizer * fix FactorizationMachine assert error * disable test which is not working now due to change in LightGbm version * fix model name * typo * handle nan in arrays * fix tests * fix tests * fix more tests * fix data type * fix AUC exception * kick the build * fix tests due to data change * fix ngram test * fix mutual info tests * copy libiomp lib * fix mac build * disable SymSgdNative for now * disable SymSgdBinary classifier tests for Linux * fix linux tests * fix linux tests * try linux * fix linux * skip SymSgdBinaryClassifier checks * fix entrypoint compiler * fix entry point generation * fix example tests run * fix typo * fix documentation regression * fix parameter name * fix examples * fix examples * fix tests * fix tests * fix linux * kick build * Fix code_fixer * fix skip take filters * fix estimator checks * Fix latest Windows build issues. (#105) * Fix build issue on Windows when VS2019 is installed. Note: The -version option could not be added directly to the FOR command due to a command script parsing issue. * Add missing arguments to fix build issue with latest version of autoflake. * Fixes #50 - summary() fails if called a second time. (#107) * Fixes #50 - summary() fails if called a second time. * Fixes #99. Do not use hardcoded file separator. (#108) Fixes #99. Do not use hard coded file separator. * Delete the cached summaries when refitting a pipeline or a predictor. (#109) * Fix build issue on Windows when VS2019 is installed. Note: The -version option could not be added directly to the FOR command due to a command script parsing issue. * Add missing arguments to fix build issue with latest version of autoflake. * Delete the cached summaries when refitting a pipeline or a predictor. Fixes #106 * Simplify the code that deletes cached summaries when calling fit. 
* Fix signature import error when using latest version of scikit-learn. (#116) * Fix signature import error when using latest version of scikit-learn. Fixes #111 * Move the conditional import of the signature method in to the utils package. * Package System.Drawing.Common.dll as its missing in dotnetcore2 (#120) * package System.Drawings.Common.dll as its missing in dotnetcore2 * typo * Add png for Image examples * try linux fix * rollback scikit learn version * test * debug * rollback test * rollback * fix fontconfig err * fix tests * print platform * get os names * test * test * fix linux * Upgrade the pytest-remotedata package to fix missing attribute error. (#121) * Upgrade the pytest-remotedata package to fix missing attribute error. Fixes #117 * Remove the RlsMacPy3.6 configuration from .vsts-ci.yml. * Upgrade version (#122) * package System.Drawings.Common.dll as its missing in dotnetcore2 * typo * Add png for Image examples * try linux fix * rollback scikit learn version * test * debug * rollback test * rollback * fix fontconfig err * fix tests * print platform * get os names * test * test * fix linux * Upgrade version * Support quoted strings by default (#124) * upgrade to ML.NET 1.1 (#126) * upgrade to ML.NET 1.1 * by default quote is + * assert changes due to quote * fix tensor flow example * Put long running tests in to their own folder to shorten build times. (#136) * Temporarily remove the dataframe examples from the test run to see how much that effects the test length. * Remove all examples from the tests to see how it impacts the CI run. * Put long running tests in to their own folder to shorten build times. * Update nimbusml.pyproj to reflect the newly moved test files. Forgot to save the nimbusml.pyproj in visual studio. * Expose ML.NET SSA & IID spike & changepoint detectors. (#135) * Initial creation of the IidSpikeDetector files to see what works and what doesn't. * Import the Microsoft.ML.TimeSeries assembly in to the project. 
* Use 'PassAs' in manifest.json to fix the source parameter name. * Use float32 for data dtype in IidSpikeDetector example. * Convert IidSpikeDetector to a standard transform. Add examples and tests. * Add pre-transform to IidSpikeDetector to fix incompatible data types. * Fix issues with the test_estimator_checks IidSpikeDetector tests. * Remove unnecessary TypeConverter import in IidSpikeDetector example. * Initial implementation of IidChangePointDetector. * Initial implementation of SsaSpikeDetector. * Initial implementation of SsaChangePointDetector. * Fix incorrect SsaSpikeDetector instance in test_estimator_checks. * Fix a few minor issues with time series unit tests and examples. (#139) * Skip Image.py and Image_df.py tests for Ubuntu 14 (#149) * * Fixed the script for generating the documentation (#144) * Moved _static to ci_script to solve an error while using sphinx * Removed amek_md.bat and merge the commands of it to make_yaml.bat * Moved metrics.rst to concepts * Rename time_series package to timeseries. (#150) * Fixed the issue of Ubuntu14 not skipping Image.py and Image_df.py (#161) * Updated CharTokenizer.py example (#153) * Skip CharTokenizer.py for extended tests (#163) * Add support for returning custom values when overriding Pipeline.predict. (#155) * Initial creation of the release-next.md file. (#165) * Initial creation of the release-next.md file. * Point the time series example links to the head of the master branch. * Initial implementation of the SsaForecaster entry point. (#164) * Final updates for release 1.2.0 (#167) * Update the LightGbm entry point with the latest version from the manifest. * Add SsaForecasting examples to the release notes. * Add documentation modification to the release notes. * Create the official 1.2.0 release notes. They have been put in the docs/release-notes folder to closely match the ml.net directory structure. * Add correct version to the release notes title. * Re-enable the SsaForecaster tests. 
* Update to the latest version of ml.net. Update the NimbusML version. * Fix issues with the summary unit tests. * Comment out the SymSgdBinaryClassifier summary test. It does not appear to be working on linux. * Revert change b5eb937 to see if it (#168) fixes the signed build issue. * Bring back build.cmd commit. It did not fix the signed build issue. (#169) * Revert change b5eb937 to see if it fixes the signed build issue. * Bring back commit b5eb937. It did not fixed the signed build issue. * Bring back the build.cmd change from b5eb937. (#170) It did not fix the signed build issue. * Use restored dotnet CLI for signing (#171) * Update README.md * Enable LinearSvmBinaryClassifier (#180) * Enable LinearSvmBinaryClassifier, add examples, add test, and update docs * Add test for predict_proba() and decision_function() * Setup destructors for data passed to python (#184) * pass destructor to python * indent * Add azureml-dataprep support for dataflow objects (#181) * draft code * draft * delete * add dprep dependency * rollback * rollback * rollback * test & example on using DprepDataStream * add dprep path * add dprep path * fix mlnetpath * optional dependency on dprep * run dprep tests optionally * fix typo * Up sdk version * fix linux dprep tests * up version (#188) * Save the model file when pickling a NimbusML Pipeline. (#189) * Save the model file when pickling a NimbusML Pipeline. * Add version to the pickled Pipeline. * Add the steps attribute to a pickled Pipeline instance. * Add extra unit test for pickled nimbusml pipelines. * Add export_version to pickled base_pipeline_items. Remove unnecessary export_version attribute from an unpickled Pipeline. * Remove stored references to X and y in BasePredictor. (#195) * Remove stored references to X and y in BasePredictor. * Remove unnecessary scikit-learn import. 
* Add observation level feature contributions to Pipeline and BasePredictor (#196) * Add get_feature_contributions to Pipeline and BasePredictor, add example * Add tests * Update release-next.md * Add classes_ to Pipeline and/or predictor when calling predict_proba. (#200) * Add classes_ to Pipeline and/or predictor when calling predict_proba. * Update test_estimator_checks.py to skip the check_dict_unchanged test for any estimator which supports predict_proba or decision_function. * Update Handler, Filter, and Indicator to automatically convert the input columns to float before performing the transform. (#204) Fixes #203. * Combine models from transforms, predictors and pipelines in to one model. (#208) * Initial test implementation of combining 2 or more models in to one. * Added support to Pipeline.combine_models for combining other types of items and transform only inputs. * Combine Pipeline._evaluation_infer and _evaluation in to one method. This fixes an issue where a classifier graph would not contain the correct nodes after calling Pipeline._predict(). * Missing part of previous check-in. * Fix the Pipeline.combine_models signature to work with Python 2.7. * Fix build (#209) * T * Fix cert * Update release-next.md. (#211) * Update release-next.md * Update release-next.md * Update release-next.md * Add classifier and FileDataStream unit tests to test_pipeline_combining. (#212) Add classifier and FileDataStream unit tests to test_pipeline_combining. 
* Update release-next.md * up version (#210) * up version * Up the version * renamed factorization lib * remove matrix factorization lib ref * dbg libs * fix libtensorflow framework * package more libs * add mkl proxy * Enable EnsembleClassifier and EnsembleRegressor (#207) * Enable EnsembleClassifier * nit * Enable EnsembleRegressor * Add output combiners * Add sub model selectors * Update examples * Add documentation for components * Add diversity measure * Improve examples * Add tests * Fix test_estimator_checks * Create release notes for version 1.3.0. (#214) * Update release-1.3.0.md * Add --installPythonPackages flag to build scripts (#215) * Add --installPythonPackages flag to build scripts * close if statement in build.sh * fix --runTestsOnly * Fix a bug with the classes_ attribute when no y input is specified during fitting. (#218) Fixes #216 * Add NumSharp.Core.dll (#220) * Add timeseries documentation to the master branch. (#221) * Docs update (#224) * Fix documentation * Few more * More doc fixes (#228) * More doc fixes * A few nits * Pass python path to Dprep (#232) * remove Dprep* dll from wheel (#235) * remove Dprep* dll from wheel * Move Dprep calls into separate class * test * remove DprepLoader * clean unused code (#236) * clean unused code * fix tests changes due to seed changes * remove max_slots from graph * delete Dprep dlls from python2.7 * fix linux extended tests for TensorFlow * fix tests * fix tests * rollback * fix tests * disable estimator check * fix tests * fix tests again * fix tabbing removing -r from rm command * remove experimental * Enable scoring of ML.NET models saved with new TransformerChain format (#230) * Handle new ML.NET model format for predictions * fix * use with{} statement with ZipFile * Add initial implementation of DatasetTransformer. (#240) * Update release-next for the 1.4 release. 
(#252) * Update release-next.md * Upgrade to ML.NET 1.4 (#251) * Upgrade to ML.NET 1.4 * preview bits * update refs * Fix casing for the installPythonPackages build.sh argument. (#256) * Rename lambda_ to l2_regularization in LinearSvmBinaryClasifier (#259) * Initial implementation of csr_matrix output support. (#250) * Initial implementation of csr_matrix output support. * Whitespace change to kick off another build. The CentOs test run crashed. * Rename as per comment * Initial implementation of LpNormalizer. (#253) * Initial implementation of LpNormalizer. * Rename to LpScaler * fix build * fix casing * up version (#262) * Remove scikit-learn testing module from normal flow (#265) * remove scikit learn testing module from normal flow * fix build * fix build * Fix issue when using predict_proba or decision_function with combined models. (#272) * Output predictor model file optionally (#270) * Output predictor model file optionally * fix comment * fix unit tests * Draft of ColumnConcat transform that takes in a prefix instead * fix test * fix test * PrefixColumnConcat transform * fix entrypoint namespace * fix exception * Handle no match scenario * add exampl & test * add test * fix comments * fix comments * fix example * Providing error message to python in exception (#273) * spit out error message to python upgrade patch version * fix the test * another test * rollback * Add I8 support to CSR matrix output. (#276) * Get column names for transform model (#278) * draft for schema * resolve conflict * debug pieces * Few perf tricks * rollback prints * few perf tricks * perf tricks * fix csr * set 0 byte * Update schema example. * Convert return value to list. * Update schema example to use new list return value. * Fix naming in Pipeline.get_schema. * Add initial unit tests for Pipeline.get_schema(). * Check length in Pipeline.get_schema unit tests. 
* few perf tricks * fix linux tests * rollback * Temporarily use 'inclusive' test instead of positional test for columns since order is not valid in Python 2.6 and 3.5. * fix comments * Add variable length vector support (#267) * Update Schema.py to remove the non-ASCII character (#291) * Fix Pipeline._extract_classes_from_headers was not checking for valid steps. (#292) * Save predictor_model when pickling a pipeline. (#295) * Initial implementation of the WordTokenizer transform. (#296) * Remove summary validation in Pipeline and enable the summary tests for the tree based predictors. (#298) * Turn on dprep unit tests for all platforms and python versions except 2.7 (#303) * Fix bug in Pipeline.transform() (#294) * Remove unnecessary code from Pipeline.transform that was causing a bug * Update release-next.md * Remove y argument from transform() method * Update release-next.md * Fix test * Fixed building of NimbusML with Python 3.5 on Windows (and other versions of Python) (#297) * Update Schema.py to remove the non-ASCII character * Update build.cmd * Update build.cmd * Update build.cmd * Revert "Update build.cmd" This reverts commit cb79b9d. * Upgreate pip for all Python versions * Update release notes. (#306) * Added libtensorflow_framework.so.1 (#310) * Add Permutation Feature Importance (PFI) (#279) * Add PFI entrypoint * Add PFI to Pipeline and BasePipelineItem, and examples * Improved docs and sample * Load model as PredictorModel, and remove label column and group ID column from EP inputs * schema example reference * Add test * nit * Update release-next.md * Add tests to check PFI from loaded model * Make SgdBinaryClassifier deterministic in test_estimator_checks.py * Update ML.NET nugets to 1.4.0-preview2 and 0.16.0-preview2 * Fix test baseline values * Fix Ranking PFI column names to work with with Py2.7 and Py3.5 * Initial implementation of DateTime input and output column support. (#290) * Add support for DateTime output. 
* Add support for DateTime input columns. * Add unit test for DateTime column input and output. * Fix DateTime.Kind == Unspecified output from dprep. * Update the csproj files to point to the latest nuget packages. * Update the Tensorflow.NET library version. * Fix azureml dprep not available for Python 2.7 * Fix missing sys import. * Fix broken assertEqual on Python 3.5. * Fix BinaryDataStream not valid as input for transformer. (#307) * Add test for fitting a BinaryDataStream. * Use BinaryDataStream schema for retrieving feature columns in _init_graph_nodes. * Add idv schema to BinaryDataStream. * Fix DprepDataStream was passing in incorrect value to base class constructor. * Remove column position check from unit test since it is unreliable on Python 3.5 and 2.7. * Issue 300 (#311) * Temporarily change running Mac pipeline to Python 3.6 * Temporary addition to view state of "result" in MacOS with Python 3.6 * Added additional temporary Python builds on Mac * Added libtensorflow_framework.so.1 (#310) * Revert "Temporary addition to view state of "result" in MacOS with Python 3.6" This reverts commit d116dc8. * Updated test_data_with_missing.test_input_conversion_to_float() * Update test_data_with_missing.py * Revert "Added additional temporary Python builds on Mac" This reverts commit 1aa1526. * Revert "Temporarily change running Mac pipeline to Python 3.6" This reverts commit 4ec36fb. * allow csr_matrix as input to predict_proba() (#305) * draft * draft * rollback * new entrypoint * add assert * rollback * no print in test * up version * only Single type is allowed for Feature vector * fix comments, rename entrypoint * convert to single * fix type * add feature contribution test * rename pipeline.get_schema() to pipeline.gat_output_columns() * fix build * Update release notes. (#312) * Turn off shuffling for FactorizationMachineBinaryClassifier. (#316) * Fix imports * Fix few more conflicts and build * Fix one more import * Fix nimbusml.pyroj
Fixes #271
Remove unnecessary code from Pipeline.transform() that caused a bug described in the issue.