[R] Cannot plot trees with categorical splits #9925
Comments
@mayer79 Perhaps you would like to work on this issue?
I hope we can remove the regex if possible. XGB can output a graphviz dump. I can help add other formats if necessary.
@trivialfis It would be very helpful to add a format "table" which would output the same as python function
Yes, I did a proof of concept before, but didn't submit a PR because at the time I was wondering how to export the data to Arrow. I can make another attempt.
I don't think Arrow is necessary here. These tables are going to be rather small in most cases, so perhaps a plain JSON with one entry per column in the table would do.
It's more future-proof. We have already had feature requests for representing the model as a table, which means XGBoost needs to be able to save and load models as tables. Currently, the
Another thing about Arrow is that the performance is just a bonus. I believe the goal is to have a protocol-like class that can be used by other projects. For example, the Spark framework uses Arrow as the underlying representation of a table and uses it to transfer data frames from Java processes to Python processes, presumably to R as well. As a result, if we are dealing with data frames, exporting directly to Arrow might be the most efficient and useful way to do it.
This is fixed in the latest by #10989. Will look into dataframe export separately.
@trivialfis Looks like the option
library(xgboost)
set.seed(123)
y <- rnorm(100)
x <- sample(3, size=100*3, replace=TRUE) |> matrix(nrow=100)
x <- x - 1
dm <- xgb.DMatrix(data=x, label=y)
setinfo(dm, "feature_type", c("c", "c", "c"))
model <- xgb.train(
  data=dm,
  params=list(
    tree_method="hist",
    max_depth=3
  ),
  nrounds=2
)
xgb.plot.tree(model=model, with_stats=TRUE)
Also
@david-cortes I just ran your script and saved the result to HTML; the categories and the leaf node hessian (cover) look correct:
gr <- xgb.plot.tree(model=model, with_stats=TRUE)
htmlwidgets::saveWidget(gr, 'plot.html')
Stats for categorical internal splits are not available in the tree dump yet; I need to add them for all types of tree dump (json, text, graphviz).
Thank you for sharing. Yes, we can throw an error. Also need to check
ref #9810
Currently, attempting to plot trees that have categorical splits in R will result in an error:
This is due to the regexes used to parse the dumps not having been updated for the format used in categorical splits:
xgboost/R-package/R/xgb.model.dt.tree.R
Line 123 in a197899
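To illustrate the kind of mismatch described above, here is a minimal base-R sketch, not the package's actual parsing code. It assumes a numeric split appears in the text dump as `[f0<0.5]` and a categorical split as `[f1:{0,2}]`; the sample lines, the two patterns, and the helper `parse_split` are all hypothetical and only meant to show why a pattern written for `<` splits fails on a category set.

```r
# Hypothetical dump lines; the exact categorical format is an assumption.
numeric_line     <- "0:[f0<0.5] yes=1,no=2,missing=1,gain=10.5,cover=100"
categorical_line <- "0:[f1:{0,2}] yes=1,no=2,missing=2,gain=3.25,cover=100"

# Illustrative patterns (not the ones used in xgb.model.dt.tree.R).
num_rx <- "^\\s*(\\d+):\\[f(\\d+)<([^]]+)\\].*$"
cat_rx <- "^\\s*(\\d+):\\[f(\\d+):\\{([^}]*)\\}\\].*$"

# Parse one split line, trying the numeric form first, then the categorical one.
parse_split <- function(line) {
  if (grepl(num_rx, line, perl = TRUE)) {
    list(
      type      = "numeric",
      feature   = as.integer(sub(num_rx, "\\2", line, perl = TRUE)),
      threshold = as.numeric(sub(num_rx, "\\3", line, perl = TRUE))
    )
  } else if (grepl(cat_rx, line, perl = TRUE)) {
    cats <- strsplit(sub(cat_rx, "\\3", line, perl = TRUE), ",", fixed = TRUE)[[1]]
    list(
      type       = "categorical",
      feature    = as.integer(sub(cat_rx, "\\2", line, perl = TRUE)),
      categories = as.integer(cats)
    )
  } else {
    stop("unrecognized split format: ", line)
  }
}

num_split <- parse_split(numeric_line)
cat_split <- parse_split(categorical_line)
```

A parser that only knows `num_rx` would reach the `stop()` branch on `categorical_line`, which is the failure mode this issue is about. The sketch also assumes default `f<index>` feature names; dumps with custom feature names would need a looser pattern.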