
Performance comparison report with regard to processing categorical features? #8495

Closed
yananchen1989 opened this issue Nov 29, 2022 · 1 comment

Comments

@yananchen1989

Hi, as we know, the traditional method for processing categorical features is one-hot encoding: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

From versions 1.6 and 1.7 on, XGBoost supports splitting directly on categorical features.

Is there any performance comparison between these different approaches, for example on some public Kaggle datasets?
Do they perform at the same level?

Thanks.

@trivialfis
Member

We have some initial benchmarks, but we are still working on parameter tuning to finalize the default set of parameters. We will publish them once ready. The issue is tracked at #7899.
