Sync with Branch 0.18 #21

Merged
merged 110 commits into from
Dec 28, 2020

Conversation

daxiongshu
Owner

No description provided.

venkywonka and others added 30 commits November 2, 2020 11:11
* splitting `cpp/src/metrics.cu` into separately compiled files

* updated CHANGELOG.md

* file-naming cleanup from camelCase to under_score

* adding related changes from PR rapidsai#3072 that affected this PR
* Speeding up MNMG KNN Cl&Re testing

* Update changelog

* Testing with extreme values
Fixes rapidsai#3057

Co-authored-by: Corey J. Nolet <cjnolet@users.noreply.github.com>
* Use single random seed in kmeans tests

* Prune redundant kmeans parameterization tests

* Update changelog

* Add extra k-means|| test

Co-authored-by: Dante Gama Dessavre <dante.gamadessavre@gmail.com>
* Speed up test_lightgbm

* Speed up test_fil_regression

* Update changelog

* Test FIL predict() with binary classifier

* Add a TODO comment

* Explicitly indicate skipped tests in test_fil_skl_classification

* Test n_classes=25 with n_estimators=1

* Address reviewer's feedback

* Fix style
…e to underscore format (rapidsai#3065)

* splitting `cpp/src/metrics.cu` into separately compiled files

* updated CHANGELOG.md

* file-naming cleanup from camelCase to under_score

* refactoring randIndex instances to rand_index

* refactored `silhouetteScore` instances to `silhouette_score`

* refactoring all `adjustedRandIndex` and `adjustedrandindex` to
`adjusted_rand_index`

* adjusted_rand_index more fixes

* refactored `klDivergence` instances to `kl_divergence`

* refactoring `mutualInfoScore` instances to `mutual_info_score`

* refactoring `homogeneityScore` instances to `homogeneity_score`

* refactoring `completenessScore` instances to `completeness_score`

* refactoring `vMeasure` instances to `v_measure`

* refactoring `pairwiseDistance` and related instances to `pairwise_distance`

* preserving camelcase in relevant places

* rand_index refactoring further nooks and corners

* updating CHANGELOG.md

* FIX clang-format fixes

* flake8 fix

* adding related changes from PR rapidsai#3072 that affected this PR

* resolving function name conflicts in the cython layer
  * adding a `cython_` prefix to cython headers wherever conflicted
  * updating appropriately in `__init__.py` files
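The renames listed above all follow one camelCase to snake_case rule. A minimal Python sketch of that rule, assuming a simple regex-based conversion (the helper name `camel_to_snake` is hypothetical and not part of the PR):

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase identifier to snake_case."""
    # Put an underscore at every boundary where a lowercase letter or digit
    # is followed by an uppercase letter, then lowercase the whole string.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


# Identifiers named in the commit messages above.
OLD_NAMES = [
    "randIndex", "silhouetteScore", "adjustedRandIndex", "klDivergence",
    "mutualInfoScore", "homogeneityScore", "completenessScore",
    "vMeasure", "pairwiseDistance",
]
RENAMED = {old: camel_to_snake(old) for old in OLD_NAMES}
```

A rule like this cannot recover word boundaries from an all-lowercase spelling such as `adjustedrandindex`, and it has no notion of the places where camelCase was deliberately preserved, so those cases would still need manual handling.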
* ENH speed test_array

* DOC Added entry to changelog
* Speedup umap MNMG tests by lowering data sizes and removing parameters to test

* Removing accidental change

* Updating changelog

Co-authored-by: Dante Gama Dessavre <dante.gamadessavre@gmail.com>
…not fit with probability=True [skip-ci] (rapidsai#3114)

* Fixed typo in AttributeError (line 464)

`with` at the end of the second line and `probability` at the beginning of the third line did not have a space between them.

* Update CHANGELOG.md
* FIX Fix memset args for benchmark

* DOC Update changelog
* Adding ability to build with --linetrace=1 to support cython codecov

* Adding PR to CHANGELOG

* Style cleanup

* Converting BUILD_PYTHON_ARGS to be an argument in build.sh
* Update README

* UPDATE changelog

* Apply suggestions from code review

Co-authored-by: Dante Gama Dessavre <dante.gamadessavre@gmail.com>

Co-authored-by: Nanthini Balasubramanian <nathanb@nvidia.com>
Co-authored-by: Dante Gama Dessavre <dante.gamadessavre@gmail.com>
* Return Python string from dump_as_json() of RF

* Add changelog
…3117)

* Patch and test for RF crash rapidsai#3107

* Cleanups of RF regression fixes

* Add failing tests to RF regression

* Expand experimental backend testing and align pointers

* Expand python RF regression test

* Updates based on review feedback

* Update changelog

* Add classification tests

* Review comments and style fixes for RF
* draft 1 of better test parameter specification

* refactor using variadic macros; move fil enums to own namespace

* changelog; fixed fil.pyx enum import

* simpler FIL_TEST_PARAMS macro, remove the ::enums:: changes

* leaner change

* renamed struct responsible for non default FIL test parameters

* style
…apidsai#2956)

* Change get_params and set_params to a property params

* Update deprecated docstring

* Update changelog

* Fix style

* Change ARIMA parameters into cuML arrays, write variant of llf to avoid unnecessary memory copies, rename setter/getter, override get_params and set_params with NotImplementedError

* Mark get_param_names as not implemented instead of get_params and set_params

* Cleanup PR, remove redundancy, more efficient pack/unpack

* Fix Python style

Co-authored-by: John Zedlewski <904524+JohnZed@users.noreply.github.com>
rapidsai#3134)

* Improving the deprecation message formatting in pydocs

* Adding PR to CHANGELOG
…ators [skip-ci] (rapidsai#3040)

* Adding additional checking for incorrect use cases. Added CumlArrayDescriptor

* Cleaning up more use cases

* Initial commit of CumlArrayDescriptor in PCA

* Incrementally updating CumlArray uses

* Adding some improvements to decorators to auto detect certain scenarios where a function returns CumlArray

* Adding internals.func_utils to test wrapping all functions and checking output types

* Commit before merging upstream

* Updating naive_bayes

* Partial working state

* Updating KMeans

* Partial pass over all Base subclasses

* Mostly complete pass of removing to_output

* Completed cleanup of Base method removal

* Cleaning up more to_output uses. Fixing test errors

* Adding tartet_arg property and fixing tests that can use it

* More cleanup and test fixing

* Updating types derived from Base to properly use get_param_names and allow setting Base values in constructor

* Fixing import order. Adding support for sparse arrays

* Attempting to fix nearest neighbors

* Removing commented code

* Fixing failing tests

* Fixing more tests

* Adding PR to CHANGELOG and style fixes

* Fixing missing import

* Removing protocol interface for python 3.7

* Fixing ARIMA. Required including changes from PR#2956

* Fixing labelbinarizer and KNN failing tests

* Removing "invalid syntax" so flake8 can run

* Adding more wrappers to ARIMA so tests pass.

* Committing CI change to allow tests to run.

* Moving memory check to plugin

* Adding ability to load SPD environment variables to the logger

* Changing pytest import-mode to better support development

* Changing relative imports to absolute

* Adding first iteration of dev guide to see how it looks

* Improving the quick_run plugin

* Removing skip_* from cuml decorators

* Fixing cuml_decorators test.

* Removing the logger environment addition

* Updating non-Base methods to use decorators

* Large cleanup of remaining to_output, with_cupy_rmm and input_to_dev_ptr

* Style cleanup

* Apply John's suggestions from code review on Dev Guide

Co-authored-by: John Zedlewski <904524+JohnZed@users.noreply.github.com>

* Large update to Estimator Guide incorporating feedback from JohnZ

* Removing array tracking and putting in plugin

* Removing PR Description file

* Removing ArrayOutputable

* Removing test plugins

* Cleaning up code to remove unnecessary diffs

* Style cleanup

* Defaulting to cp array instead of np, per feedback

* Adding additional tests

* Separating func_tools into separate files

* Removing extra changes to conftest.py which should not have been committed.

* Renaming base.py back to base.pyx

* Apply suggestions from code review

Co-authored-by: Dante Gama Dessavre <dante.gamadessavre@gmail.com>

* Incorporating feedback from Dante's code review

* Removing straggling TODO

* Applying Dante's Revisions to ESTIMATOR_GUIDE

Co-authored-by: Dante Gama Dessavre <dante.gamadessavre@gmail.com>

* Updating ESTIMATOR_GUIDE from feedback from Dante

* Cleaning up straggling to_output

* Another iteration on code review feedback

* Style cleanup

* More small items from code review

* One final change to ESTIMATOR_GUIDE

* Updating all *_mg.pyx files to use the new naming conventions and CumlArrayDescriptor

Co-authored-by: John Zedlewski <904524+JohnZed@users.noreply.github.com>
Co-authored-by: Dante Gama Dessavre <dante.gamadessavre@gmail.com>
* Update all DistanceType references

* Style fix

* Update changelog
…rapidsai#3069)

* Maintain dataframe output for single-series frames

* Add unit test for single-series input type check

* Update changelog

* Add test for Series to DataFrame preprocessing

* Handle output from preprocessors increasing dims

* Allow norms to be returned as Series
* Fix Stochastic Gradient Descent Example

The example that is currently in the docs does not run. dtype, penalty, lrate, loss are not defined. This new version sets the default values for the parameters of cumlSGD, and copies Mini Batch SGD Regression's dtype for pred_data['col1'], pred_data['col2']. When running this example, I also got slightly different values for the output, so these were also updated.

* Added PR rapidsai#3136 to 0.17 Bug Fixes
`#include <cuml/manifold/umap.hpp>` works now.

Co-authored-by: Corey J. Nolet <cjnolet@users.noreply.github.com>
…3137)

* Moving conftest.py files around and adding quick_run plugin

* Adding PR to CHANGELOG

* Incorporating feedback from code review
* Initial cython test commit

* Update changelog

* Style fixes

Co-authored-by: Nanthini Balasubramanian <nathanb@nvidia.com>
Co-authored-by: Dante Gama Dessavre <dante.gamadessavre@gmail.com>
…precation warnings (rapidsai#3155)

* Get rid of warnings in random projections test

* Update changelog

* Fix style

* Update other deprecated make_blob imports
* FIX Force local install by specifying exact build string

* DOC Update changelog

* Update ci/gpu/build.sh

Co-authored-by: AJ Schmidt <ajschmidt8@users.noreply.github.com>

Co-authored-by: AJ Schmidt <ajschmidt8@users.noreply.github.com>
* Update flake8 config to join python/cython configuration and improve setup to check __init__.py files

* Fixing linting issues in previously ignored __init__.py files

* Update flake8 config to join python/cython configuration and improve setup to check __init__.py files

* Fixing linting issues in previously ignored __init__.py files

* Adding PR to CHANGELOG

* Incorporating feedback from code review

* Fixing style issues after merge with branch-0.17

Co-authored-by: Corey J. Nolet <cjnolet@users.noreply.github.com>
Co-authored-by: Dante Gama Dessavre <dante.gamadessavre@gmail.com>
…kip-ci] (rapidsai#3144)

* Adding ability to set arbitrary cmake flags in ./build.sh via the $CUML_ADDL_CMAKE_ARGS variable

* Adding PR to CHANGELOG

* Adding more help info requested from code review.

Co-authored-by: John Zedlewski <904524+JohnZed@users.noreply.github.com>
GPUtester and others added 28 commits December 3, 2020 17:11
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
…'s RF(rapidsai#3245)

Rename rows_sample -> max_samples to be consistent with sklearn's RF.

From https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html:

> **max_samples**: int or float, default=None
> If bootstrap is True, the number of samples to draw from X to train each base estimator.
> If None (default), then draw X.shape[0] samples.
> If int, then draw max_samples samples.
> If float, then draw max_samples * X.shape[0] samples. Thus, max_samples should be in the interval (0, 1).
> New in version 0.22.

Authors:
  - Hyunsu Cho <chohyu01@cs.washington.edu>

Approvers:
  - John Zedlewski

URL: rapidsai#3245
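The scikit-learn rules quoted above can be written out as a small function. This is a sketch of the quoted documentation only; the helper name `resolve_max_samples` is hypothetical, the rounding of the float case is an assumption, and it does not claim to show how cuML interprets the renamed parameter.

```python
from typing import Optional, Union


def resolve_max_samples(n_rows: int,
                        max_samples: Optional[Union[int, float]]) -> int:
    """Rows drawn per tree, following the quoted scikit-learn description."""
    if max_samples is None:
        # Default: draw X.shape[0] samples.
        return n_rows
    if isinstance(max_samples, bool):
        raise TypeError("max_samples must be None, an int, or a float")
    if isinstance(max_samples, int):
        # An int is an absolute number of rows.
        if not 1 <= max_samples <= n_rows:
            raise ValueError("int max_samples must be in [1, n_rows]")
        return max_samples
    if isinstance(max_samples, float):
        # A float is a fraction of the rows, in the open interval (0, 1).
        if not 0.0 < max_samples < 1.0:
            raise ValueError("float max_samples must be in (0, 1)")
        # Assumption: round to the nearest row count and draw at least one.
        return max(1, round(max_samples * n_rows))
    raise TypeError("max_samples must be None, an int, or a float")
```

For example, `resolve_max_samples(1000, 0.5)` yields 500 rows per tree, while `resolve_max_samples(1000, None)` yields all 1000.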
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
…tical(rapidsai#3243)

Closes rapidsai#3231 
Closes rapidsai#3128
Partially addresses rapidsai#3188 

The degenerate case (labels all identical in a node) is now robustly handled, by computing the MSE metric separately for each of the three nodes (the parent node, the left child node, and the right child node). Doing so ensures that the gain is 0 for the degenerate case.

The degenerate case may occur in some real-world regression problems, e.g. house price data where the price label is rounded up to the nearest 100k.

As a result, the MSE gain is now computed in much the same way as the MAE gain.

Disadvantage: now we always make two passes over data to compute the gain.
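
The idea can be illustrated with a plain-Python sketch (not the CUDA implementation). It assumes the usual CART convention of weighting each child's MSE by its share of the samples; the description above only states that the metric is computed separately for the parent, left child, and right child. The function names are hypothetical.

```python
from typing import List, Sequence


def mse(labels: Sequence[float]) -> float:
    """MSE of the labels around their mean, using two passes over the data."""
    n = len(labels)
    if n == 0:
        return 0.0
    mean = sum(labels) / n                             # pass 1: mean
    return sum((y - mean) ** 2 for y in labels) / n    # pass 2: squared error


def mse_gain(labels: Sequence[float], go_left: Sequence[bool]) -> float:
    """Gain of a split: parent MSE minus the size-weighted child MSEs."""
    left: List[float] = [y for y, flag in zip(labels, go_left) if flag]
    right: List[float] = [y for y, flag in zip(labels, go_left) if not flag]
    n = len(labels)
    return (mse(labels)
            - (len(left) / n) * mse(left)
            - (len(right) / n) * mse(right))
```

With four identical labels such as `[300000.0] * 4`, all three MSE values are 0, so any split has gain 0. For `[1.0, 1.0, 3.0, 3.0]` split down the middle, the parent MSE is 1 and both children are pure, so the gain is 1.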

cc @teju85 @vinaydes @JohnZed

Authors:
  - Hyunsu Cho <chohyu01@cs.washington.edu>
  - Philip Hyunsu Cho <chohyu01@cs.washington.edu>

Approvers:
  - Thejaswi Rao
  - John Zedlewski

URL: rapidsai#3243
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Authors:
  - Corey J. Nolet <cjnolet@gmail.com>
  - Corey J. Nolet <cjnolet@users.noreply.github.com>

Approvers:
  - John Zedlewski

URL: rapidsai#3250
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
…pidsai#3258)

* Hide silhouette_score Python binding

Remove this feature due to memory issues in C++ implementation for
anything but modest numbers of samples

* Remove silhouette_score tests

* Update changelog

* Remove unused import

* Remove silhouette_score from new features list

* Add note on reason for hiding silhouette_score

* Update docstrings with silhouette_score warning

Also remove silhouette_score from api.rst docs

* Update CHANGELOG to restore reference to reverted PR
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
Answers rapidsai#3232.
Explicitly specify `batch_size` as parameter to MNMG KNN models in order to make it visible in the documentation.

Authors:
  - viclafargue <viclafargue@nvidia.com>
  - Corey J. Nolet <cjnolet@gmail.com>

Approvers:
  - Corey J. Nolet
  - John Zedlewski

URL: rapidsai#3246
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
…rapidsai#3282)

* FIX Add secondary test to kernel explainer pytests for stability in Volta

* DOC Added entry to changelog

* FIX PR review feedback
[gpuCI] Auto-merge branch-0.17 to branch-0.18 [skip ci]
…sai#3279)

* Correct pure virtual declaration in manifold_inputs_t

* Update changelog
Remove keyword "stops" from call to cudf.core.column.string.slice, which no longer accepts arbitrary keywords.

cuDF change introduced in rapidsai/cudf#6750.

Authors:
  - William Hicks <whicks@nvidia.com>

Approvers:
  - John Zedlewski
  - Micka

URL: rapidsai#3289
Linear SVR has the coef_ attribute in the python layer. In the C++ unit test the same vector is denoted by _w_, and it is defined as a linear combination of the support vectors

![image](https://user-images.githubusercontent.com/3671106/101908077-ce3d9e80-3bbb-11eb-98ff-e7be90828dde.png)

The number of elements in _w_ is n_cols. One of the SVR tests defined only 1 expected value for _w_, instead of the expected n_cols=2 values, which led to accessing an uninitialized value. This would fail the test unless the value happened to be zero-initialized. Surprisingly, such failures were extremely rare.

This PR fixes the expected value _w_exp_.
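
To see why _w_ must have n_cols entries, here is a short Python sketch of the textbook form of the linear combination, w = sum_i a_i * x_i over the support vectors. The formula in the image above is not reproduced here, so this standard form, the coefficient name `dual_coefs`, and the function name are assumptions.

```python
from typing import List, Sequence


def primal_weights(dual_coefs: Sequence[float],
                   support_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Return w = sum_i dual_coefs[i] * support_vectors[i]."""
    n_cols = len(support_vectors[0])
    w = [0.0] * n_cols  # one weight per feature column
    for coef, vector in zip(dual_coefs, support_vectors):
        for j in range(n_cols):
            w[j] += coef * vector[j]
    return w


# Two support vectors with n_cols = 2 give a weight vector of length 2.
w = primal_weights([0.5, -0.5], [[1.0, 2.0], [3.0, 1.0]])
```

Because each support vector has n_cols components, so does the result, and a test that compares against a single expected value leaves the second component unchecked or, as described above, reads past the expected data.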

Authors:
  - Tamas Bela Feher <tfeher@nvidia.com>

Approvers:
  - Dante Gama Dessavre

URL: rapidsai#3294
Closes rapidsai#1780

Adding kNN graph input functionality to t-SNE, a request broken off from issue rapidsai#1733. t-SNE gathers kNN indices and distances in the first stage of its computation; by supplying their own kNN graph, users can skip this step. This should follow rapidsai#1815 as closely as possible.

**Benefits of this**:
- lets users run their own kNN algorithm
- can use a distance function other than t-SNE's Euclidean default
- allows a speedup in grid search by storing and reusing the kNN graph

**Includes:**
- [x] Abstracted `extract_knn_graph` so it can be used for both UMAP and t-SNE
- [x] Implemented kNN graph input to Python/Cython layer and C++/CUDA layer
- [x] C++/CUDA Barnes Hut and Exact t-SNE tests
- [x] Python t-SNE tests
- [x] General code cleanup wherever needed
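
As an illustration of the first two benefits, the sketch below precomputes a kNN graph (neighbor indices and distances) with a non-Euclidean metric in plain Python. It is a brute-force stand-in with hypothetical names, not the `extract_knn_graph` code, and it leaves out the step of handing the graph to t-SNE because the exact argument is not stated in this description. Excluding each point from its own neighbor list is a convention chosen for this sketch.

```python
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def manhattan(p: Point, q: Point) -> float:
    """L1 distance, used here as an example of a non-Euclidean metric."""
    return sum(abs(a - b) for a, b in zip(p, q))


def knn_graph(points: Sequence[Point], k: int,
              distance: Callable[[Point, Point], float]
              ) -> Tuple[List[List[int]], List[List[float]]]:
    """Brute-force kNN: per point, the k nearest other points and distances."""
    indices: List[List[int]] = []
    distances: List[List[float]] = []
    for i, p in enumerate(points):
        # Score every other point, then keep the k closest.
        scored = sorted((distance(p, q), j)
                        for j, q in enumerate(points) if j != i)
        nearest = scored[:k]
        indices.append([j for _, j in nearest])
        distances.append([d for d, _ in nearest])
    return indices, distances


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
knn_indices, knn_distances = knn_graph(points, k=2, distance=manhattan)
```

Computed once, `knn_indices` and `knn_distances` can be stored and reused across every configuration in a grid search, which is where the third benefit comes from.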

Authors:
  - Aleksander Ficek <alex.ficek99@gmail.com>
  - Corey J. Nolet <cjnolet@gmail.com>
  - Ray Douglass <3107146+raydouglass@users.noreply.github.com>
  - Corey J. Nolet <cjnolet@users.noreply.github.com>

Approvers:
  - Corey J. Nolet

URL: rapidsai#2592
* FEA Consolidate linear model gemm based predicts on one function on C++

* FEA Consolidate linear model gemm based predicts on one function on Python

* DOC Added entry to changelog

* FIX PEP8 fixes

* FIX Forgot clang-format

* FIX Remove C++ sync calls and unnecessary delete on Python based on PR feedback

* DOC Remove changelog entry
…apidsai#3292)

* Refactoring: move internal FIL interface to a separate file.

- move the functions not related to treelite import, prediction
  or freeing the model to a separate file

* Fixed style errors.
This PR will enable the usage of multiple KNN strategies as alternatives to the current default bruteforce method. See rapidsai#574

Authors:
  - wxbn <wxbn@live.fr>
  - viclafargue <viclafargue@nvidia.com>
  - Corey J. Nolet <cjnolet@gmail.com>

Approvers:
  - Corey J. Nolet

URL: rapidsai#2780
…#3291)

This PR fixes CI failures that occur in `test_naive_bayes` when the machine can't download the 20 newsgroups dataset.

It closes rapidsai#3260

Authors:
  - Mickael Ide <ide.mickael@gmail.com>

Approvers:
  - John Zedlewski

URL: rapidsai#3291
* Adding NotFittedError to PCA

* Fixed typo in PCA import

* Fixed check_is_fitted call

* Fixed missing parenthesis

* Added test on svd_flip

* fix style ipca

* Fixed whitespace style

* Removed useless test
- only the node types without the `_t` suffix are now used
- removed the functions necessary to handle node types with the `_t` suffix
Ensure that the 100th quantile value returned by cupy.percentile is the maximum of the input array rather than (possibly) NaN due to cupy/cupy#4451. This eliminates an intermittent failure observed in tests of KBinsDiscretizer, which makes use of cupy.percentile. Note that this includes an alteration of the included sklearn code and should be reverted once the upstream cupy issue is resolved.
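
The shape of this workaround can be sketched in plain Python. The code below is not the patched sklearn code: `percentile` is a simple linear-interpolation stand-in for `cupy.percentile`, and `flaky_percentile` is a hypothetical function that simulates the NaN failure so the guard has something to correct.

```python
import math
from typing import Callable, List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile for 0 <= q <= 100."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def flaky_percentile(values: Sequence[float], q: float) -> float:
    """Hypothetical stand-in that returns NaN for the 100th percentile."""
    return float("nan") if q == 100 else percentile(values, q)


def safe_percentiles(values: Sequence[float], quantiles: Sequence[float],
                     percentile_fn: Callable[[Sequence[float], float], float]
                     ) -> List[float]:
    """Compute percentiles, forcing the 100th to be the maximum of the input."""
    results = [percentile_fn(values, q) for q in quantiles]
    top = max(values)
    return [top if q == 100 else r for q, r in zip(quantiles, results)]


edges = safe_percentiles([3.0, 1.0, 2.0, 10.0], [0, 50, 100], flaky_percentile)
```

Overwriting only the `q == 100` entries keeps every other quantile untouched, which makes the guard easy to remove once the upstream issue is fixed.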

Resolve failure due to ValueError described in rapidsai#2933.

Authors:
  - William Hicks <whicks@nvidia.com>

Approvers:
  - Dante Gama Dessavre
  - Victor Lafargue

URL: rapidsai#3315
…#3275)

This PR converts the confusion matrix to an integer type when possible, to avoid scientific notation in the output.

See this example:

![image](https://user-images.githubusercontent.com/9810050/101400035-9808d200-38d0-11eb-9f81-4d217a5ff202.png)
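
One way to read "when possible" is: convert only if every entry is a whole number, so no information is lost. The plain-Python sketch below follows that reading; it is an assumption rather than the PR's implementation, and the function name is hypothetical.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def to_int_if_lossless(matrix: Sequence[Sequence[Number]]
                       ) -> List[List[Number]]:
    """Return an int copy if all entries are whole numbers, else a plain copy."""
    if all(float(v).is_integer() for row in matrix for v in row):
        return [[int(v) for v in row] for row in matrix]
    # Fractional entries (for example from non-integer sample weights) stay
    # as they are.
    return [list(row) for row in matrix]


counts = to_int_if_lossless([[1200000.0, 3.0], [0.0, 45.0]])
weighted = to_int_if_lossless([[0.5, 1.0], [2.0, 0.25]])
```

Here `counts` holds Python ints, so a large cell prints as `1200000` rather than in scientific notation, while `weighted` keeps its float entries.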

Authors:
  - Mickael Ide <ide.mickael@gmail.com>
  - Mickael Ide <mide@nvidia.com>

Approvers:
  - John Zedlewski

URL: rapidsai#3275
…rapidsai#3281)

Replace "constexpr static" member variables in the DecisionTree unit test fixture with "const" member variables for compliance with C++14, which otherwise requires that const static data members be separately defined at namespace scope if they are ODR-used (see sections 3.2 and 9.4.2 of the C++11 standard, which remain relevant until C++17).

Authors:
  - William Hicks <whicks@nvidia.com>

Approvers:
  - Dante Gama Dessavre

URL: rapidsai#3281
@daxiongshu daxiongshu merged commit 39a38b3 into fea_stratified_kfold Dec 28, 2020
@daxiongshu daxiongshu deleted the branch-0.18 branch July 11, 2021 02:20