Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Random Forest Cython API and related refactoring #3089

Open
vinaydes opened this issue Oct 30, 2020 · 4 comments
Open

[BUG] Random Forest Cython API and related refactoring #3089

vinaydes opened this issue Oct 30, 2020 · 4 comments
Labels
bug Something isn't working inactive-30d inactive-90d

Comments

@vinaydes
Copy link
Contributor

Describe the bug
We should consider following refactorings in the RF Cython and related C++ code.

  1. Break RF parameters in to two categories:
    • Strictly RF parameters: Parameters that are consistent with scikit-learn RF parameters. Ex max_features, max_depth etc.
    • Implementation/tuning parameters: Parameters that tweak implementation, ex. split_algo, max_batch_size.
      A user should be required to only specify RF parameters. The implementation should try to choose best possible values for the tuning parameters for given problem. However if advanced user want to try different things by tweaking implementation parameters, there would still be a way.
      Typical code with this change look like
...
clf =  cuml.ensemble.RandomForestClassifier(... RF parameters ...)
# Optional
clf.set_impl_params(... Implmenetation parameters ...)
clf.fit(X, y)
...
  1. Too many APIs for setting RF and decision tree parameters:

    • set_rf_params
    • set_all_rf_params
    • set_rf_class_obj
    • set_tree_params
      Ideally only two should be enough one for RF and other for decision tree.
  2. The DecisionTreeParams struct in randomforest_shared.pxd link is incomplete and unused. May be it can be removed?

  3. The _params_names list is incomplete link. Is it possible to add a automatic change that would capture?

  4. Code duplication: Almost everything is duplicated between randomforest_classifier.pyx and randomforest_regressor.pyx

Anything more that might come up after we take a relook at Cython code for RF.

@vinaydes vinaydes added ? - Needs Triage Need team to review and classify bug Something isn't working labels Oct 30, 2020
@vinaydes
Copy link
Contributor Author

@venkywonka Would be taking look at the issues reported here.

@github-actions
Copy link

This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

rapids-bot bot pushed a commit that referenced this issue Mar 17, 2021
* This PR partially solves the issue raised [here](#3089 (comment)).
* Removes unused `DecisionTreeParams` struct in `randomforest_shared.pxd`.
* Unifies the different APIs (namely `set_rf_params`, `set_all_rf_params`, `set_rf_class_obj`) into a single point of parameter initialization (as `set_rf_params`) in the C++ layer; and propagating the changes.

Authors:
  - Venkat (@venkywonka)
  - John Zedlewski (@JohnZed)

Approvers:
  - Philip Hyunsu Cho (@hcho3)
  - John Zedlewski (@JohnZed)
  - Thejaswi. N. S (@teju85)

URL: #3358
@github-actions
Copy link

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@venkywonka
Copy link
Contributor

venkywonka commented Jun 15, 2021

Work items yet to be done:

  • complete the __params_names list
  • fuse the code duplication in randomforest_classifier.pyx and randomforest_regressor.pyx
  • separate API for cuml-only implementation params

rapids-bot bot pushed a commit that referenced this issue Jul 1, 2021
* Prunes RF/DT C++ layers by purging legacy code and wrapper classes
* Unifies Regression and Classification under a singular class for DecisionTree and RandomForest code-base
* some bug fixes
* effort to tackle issue #3999 and issue #3089 

---
EDIT: 
Tasks list:
- [x] Unify and eliminate code duplication in `DecisionTreeClassifier`, `DecisionTreeRegressor` and `DecisionTreeBase` 
- [x] Unify and eliminate code duplication in `rf` , `rfClassifier`, `rfRegressor`
- [x] file naming rearrangements (get rid of `*_impl.cuh` files )
- [x] Remove exposed Decision Tree C++ `fit`, `predict` API as it's currently not being used
- [x] Tune/clean up metric/timing calculation in RF and remove unused variables
- [x] cython layer refactorings for checks and warnings pertaining to keyword-arguments

Authors:
  - Venkat (https://github.com/venkywonka)
  - Rory Mitchell (https://github.com/RAMitchell)

Approvers:
  - Rory Mitchell (https://github.com/RAMitchell)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4005
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this issue Oct 9, 2023
* Prunes RF/DT C++ layers by purging legacy code and wrapper classes
* Unifies Regression and Classification under a singular class for DecisionTree and RandomForest code-base
* some bug fixes
* effort to tackle issue rapidsai#3999 and issue rapidsai#3089 

---
EDIT: 
Tasks list:
- [x] Unify and eliminate code duplication in `DecisionTreeClassifier`, `DecisionTreeRegressor` and `DecisionTreeBase` 
- [x] Unify and eliminate code duplication in `rf` , `rfClassifier`, `rfRegressor`
- [x] file naming rearrangements (get rid of `*_impl.cuh` files )
- [x] Remove exposed Decision Tree C++ `fit`, `predict` API as it's currently not being used
- [x] Tune/clean up metric/timing calculation in RF and remove unused variables
- [x] cython layer refactorings for checks and warnings pertaining to keyword-arguments

Authors:
  - Venkat (https://github.com/venkywonka)
  - Rory Mitchell (https://github.com/RAMitchell)

Approvers:
  - Rory Mitchell (https://github.com/RAMitchell)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4005
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working inactive-30d inactive-90d
Projects
None yet
Development

No branches or pull requests

3 participants