-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-15767][ML][SparkR] Decision Tree wrapper in SparkR #17981
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #76926 has finished for PR 17981 at commit
|
|
Test build #76927 has finished for PR 17981 at commit
|
|
Jenkins, please retest this please |
|
Test build #76930 has finished for PR 17981 at commit
|
|
Test build #76938 has finished for PR 17981 at commit
|
|
@felixcheung I send this PR following your implementation of |
R/pkg/R/mllib_tree.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2.3.0 please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and all other instances...
R/pkg/R/mllib_utils.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please keep this sorted
R/pkg/R/mllib_utils.R
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep this sorted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove double empty line
|
@felixcheung Updated. Thanks for your reviewing! |
|
Test build #76966 has finished for PR 17981 at commit
|
| #' @param cacheNodeIds If FALSE, the algorithm will pass trees to executors to match instances with | ||
| #' nodes. If TRUE, the algorithm will cache node IDs for each instance. Caching | ||
| #' can speed up training of deeper trees. Users can set how often should the | ||
| #' cache be checkpointed or disable it by setting checkpointInterval. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is kind of confusing
Users can set how often should the cache be checkpointed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wording can be improved a bit I guess but this matches the Scaladoc...
felixcheung
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
| function(data, formula, type = c("regression", "classification"), | ||
| maxDepth = 5, maxBins = 32, impurity = NULL, seed = NULL, | ||
| minInstancesPerNode = 1, minInfoGain = 0.0, checkpointInterval = 10, | ||
| maxMemoryInMB = 256, cacheNodeIds = FALSE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider adding thresholds parameter - possibly as a follow up PR.
|
any more comment? |
|
merged to master. thanks! |
## What changes were proposed in this pull request? support decision tree in R ## How was this patch tested? added tests Author: Zheng RuiFeng <ruifengz@foxmail.com> Closes apache#17981 from zhengruifeng/dt_r.
What changes were proposed in this pull request?
support decision tree in R
How was this patch tested?
added tests