-
-
Notifications
You must be signed in to change notification settings - Fork 8.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update R docs based on deprecated parameters/behaviour #9437
Conversation
Thank you for the nice work, I will review the changes next week. Need some time to go through the document. |
|
||
```{r, results='hide'} | ||
data(Arthritis) | ||
df <- data.table(Arthritis, keep.rownames = FALSE) | ||
``` | ||
|
||
> `data.table` is 100% compliant with **R** `data.frame` but its syntax is more consistent and its performance for large dataset is [best in class](https://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly) (`dplyr` from **R** and `Pandas` from **Python** [included](https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A-Grouping)). Some parts of **XGBoost** **R** package use `data.table`. | ||
> `data.table` is 100% compliant with **R** `data.frame` but its syntax is more consistent and its performance for large dataset is [best in class](http://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-one-do-something-well-the-other-cant-or-does-poorly) (`dplyr` from **R** and `Pandas` from **Python** [included](https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A-Grouping)). Some parts of **XGBoost's** **R** package use `data.table`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why remove the TLS address?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I think that was a typo. I reverted that change back to https
|
||
`Cover` measures the relative quantity of observations concerned by a feature. | ||
`Cover` measures the relative percent of observations concerned by a feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, it's the hessian, which serves as a proxy, not sure how to phrase it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the text, let me know if that looks good now!
|
||
`Frequency` is a simpler way to measure the `Gain`. It just counts the number of times a feature is used in all generated trees. You should not use it (unless you know why you want to use it). | ||
|
||
#### Improvement in the interpretability of feature importance data.table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance we can revise this part instead of removing it entirely? We still have the properties representing global feature importance, including weight
, gain
, cover
, total_gain
, and total_cover
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I removed this section is because it seems to discuss importance information that is now deprecated. The data
and label
args are now deprecated here, so the code that was here before can no longer compute RealCover
or RealCover %
. As far as I can tell, it seems like total_gain
and total_cover
are already discussed above under the name Gain
and Cover
(based on this) so I'm not sure what else would be useful to talk about.
Alternatively, I could file an issue to support those metrics again, or I could write some R code in this vignette to compute those values if that'd be useful!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, let's remove it.
doc/R-package/discoverYourData_files/figure-markdown_strict/unnamed-chunk-12-1.png
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for updating the document!
This PR is meant to resolve one of the issues on the Roadmap features issue; in particular it does this task "Update the R tutorial "Discover your data" to reflect the deletion of RealCover".
Close #5293 .