[BUG] Random Forest Cython API and related refactoring #3089
@venkywonka will be taking a look at the issues reported here.
This issue has been marked rotten due to no recent activity in the past 90d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
* This PR partially solves the issue raised [here](#3089 (comment)).
* Removes the unused `DecisionTreeParams` struct in `randomforest_shared.pxd`.
* Unifies the different APIs (namely `set_rf_params`, `set_all_rf_params`, `set_rf_class_obj`) into a single point of parameter initialization (`set_rf_params`) in the C++ layer, and propagates the changes.

Authors:
- Venkat (@venkywonka)
- John Zedlewski (@JohnZed)

Approvers:
- Philip Hyunsu Cho (@hcho3)
- John Zedlewski (@JohnZed)
- Thejaswi. N. S (@teju85)

URL: #3358
This issue has been labeled
Work items yet to be done:
* Prunes the RF/DT C++ layers by purging legacy code and wrapper classes
* Unifies Regression and Classification under a single class for the DecisionTree and RandomForest code base
* Some bug fixes
* Effort to tackle issue #3999 and issue #3089

---

EDIT: Tasks list:
- [x] Unify and eliminate code duplication in `DecisionTreeClassifier`, `DecisionTreeRegressor` and `DecisionTreeBase`
- [x] Unify and eliminate code duplication in `rf`, `rfClassifier`, `rfRegressor`
- [x] File naming rearrangements (get rid of `*_impl.cuh` files)
- [x] Remove the exposed Decision Tree C++ `fit`, `predict` API, as it is currently not being used
- [x] Tune/clean up metric/timing calculation in RF and remove unused variables
- [x] Cython layer refactorings for checks and warnings pertaining to keyword arguments

Authors:
- Venkat (https://github.com/venkywonka)
- Rory Mitchell (https://github.com/RAMitchell)

Approvers:
- Rory Mitchell (https://github.com/RAMitchell)
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #4005
Describe the bug
We should consider the following refactorings in the RF Cython and related C++ code:
Separate the RF parameters (e.g. `max_features`, `max_depth`, etc.) from the implementation tuning parameters (e.g. `split_algo`, `max_batch_size`). A user should only be required to specify RF parameters; the implementation should try to choose the best possible values of the tuning parameters for the given problem. However, if advanced users want to try different things by tweaking implementation parameters, there would still be a way to do so.

Typical code with this change would look like:
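A minimal Python sketch of this split, assuming hypothetical `RandomForestParams` and `TuningParams` containers (not cuML's actual API). Tuning knobs default to `None`, meaning "let the implementation decide", and a hypothetical `resolve_tuning` helper fills them in with placeholder heuristics unless the user has set them explicitly.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TuningParams:
    # Hypothetical implementation knobs; None means "let the library choose".
    split_algo: Optional[str] = None
    max_batch_size: Optional[int] = None


@dataclass
class RandomForestParams:
    # Hypothetical model hyperparameters that every user reasons about.
    n_estimators: int = 100
    max_depth: int = 16
    max_features: float = 1.0
    # Advanced users may override individual tuning knobs here.
    tuning: TuningParams = field(default_factory=TuningParams)


def resolve_tuning(params: RandomForestParams, n_rows: int) -> TuningParams:
    """Return tuning knobs with every unset value replaced by a default.

    The heuristics below are placeholders, not tuned recommendations.
    """
    t = params.tuning
    split_algo = t.split_algo if t.split_algo is not None else "hist"
    # Placeholder heuristic: never pick a batch larger than the number of rows.
    max_batch_size = (
        t.max_batch_size if t.max_batch_size is not None else min(4096, max(1, n_rows))
    )
    return TuningParams(split_algo=split_algo, max_batch_size=max_batch_size)


# Typical user: only model hyperparameters are specified.
print(resolve_tuning(RandomForestParams(n_estimators=50, max_depth=10), n_rows=1000))
# TuningParams(split_algo='hist', max_batch_size=1000)

# Advanced user: one knob is pinned explicitly and is left untouched.
advanced = RandomForestParams(tuning=TuningParams(max_batch_size=256))
print(resolve_tuning(advanced, n_rows=10**6).max_batch_size)  # 256
```

Nesting the tuning knobs means the common case never mentions them, while an explicit override always wins over the heuristic.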
Too many APIs for setting RF and decision tree parameters: `set_rf_params`, `set_all_rf_params`, `set_rf_class_obj` and `set_tree_params`. Ideally, only two should be needed: one for RF and the other for decision trees.
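One way the API surface could shrink to two entry points, sketched in Python with hypothetical `TreeParams`/`ForestParams` containers and hypothetical `make_tree_params`/`make_rf_params` builders (the real `set_rf_params` and `set_tree_params` are C++ functions whose signatures are not reproduced here). The forest parameters embed a tree-parameter object, so every setting flows through exactly one constructor per level.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class TreeParams:
    # Hypothetical per-tree settings (illustrative subset).
    max_depth: int = 16
    max_leaves: int = -1
    max_features: float = 1.0
    min_samples_split: int = 2


@dataclass(frozen=True)
class ForestParams:
    # Hypothetical ensemble-level settings plus one embedded TreeParams.
    n_estimators: int = 100
    bootstrap: bool = True
    seed: int = 0
    tree: TreeParams = field(default_factory=TreeParams)


def make_tree_params(**kwargs) -> TreeParams:
    """Single entry point for tree parameters; unknown keys raise TypeError."""
    return TreeParams(**kwargs)


def make_rf_params(tree: Optional[TreeParams] = None, **kwargs) -> ForestParams:
    """Single entry point for forest parameters; unknown keys raise TypeError."""
    return ForestParams(tree=tree if tree is not None else TreeParams(), **kwargs)


rf = make_rf_params(n_estimators=10, tree=make_tree_params(max_depth=8))
print(rf.n_estimators, rf.tree.max_depth)  # 10 8

try:
    make_tree_params(max_dpeth=8)  # the typo is rejected instead of silently ignored
except TypeError as err:
    print("rejected:", err)
```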
The `DecisionTreeParams` struct in `randomforest_shared.pxd` is incomplete and unused. Maybe it can be removed?

The `_params_names` list is incomplete. Is it possible to add an automatic mechanism that would capture all the parameter names?

Code duplication: almost everything is duplicated between `randomforest_classifier.pyx` and `randomforest_regressor.pyx`.
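A sketch of how a shared base class could address both of the last two points, modeled on the signature-introspection approach scikit-learn uses for `get_params`. `BaseRandomForest`, `RFClassifier` and `RFRegressor` are hypothetical stand-ins, not the cuML classes, and their defaults are illustrative. The parameter list is derived from each subclass's `__init__` signature, so it cannot drift out of sync the way a hand-maintained `_params_names` list can.

```python
import inspect


class BaseRandomForest:
    """Hypothetical shared base holding everything common to both estimators."""

    def __init__(self, n_estimators=100, max_depth=16, max_features=1.0,
                 split_criterion=None):
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.max_features = max_features
        self.split_criterion = split_criterion

    @classmethod
    def _get_param_names(cls):
        # Read the constructor signature instead of maintaining a manual list.
        sig = inspect.signature(cls.__init__)
        return sorted(
            name for name, p in sig.parameters.items()
            if name != "self" and p.kind is not inspect.Parameter.VAR_KEYWORD
        )

    def get_params(self):
        return {name: getattr(self, name) for name in self._get_param_names()}


class RFClassifier(BaseRandomForest):
    # Only the task-specific defaults differ from the regressor.
    def __init__(self, n_estimators=100, max_depth=16, max_features="sqrt",
                 split_criterion="gini"):
        super().__init__(n_estimators, max_depth, max_features, split_criterion)


class RFRegressor(BaseRandomForest):
    def __init__(self, n_estimators=100, max_depth=16, max_features=1.0,
                 split_criterion="mse"):
        super().__init__(n_estimators, max_depth, max_features, split_criterion)


print(RFRegressor._get_param_names())
# ['max_depth', 'max_features', 'n_estimators', 'split_criterion']
print(RFClassifier(max_depth=8).get_params()["max_depth"])  # 8
```

The subclasses still repeat their signatures, which introspection requires, but storing, listing and returning parameters lives in one place.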
Anything else that might come up after we take another look at the Cython code for RF.