
Performance comparison report with regard to processing categorical features? #8495

Closed
yananchen1989 opened this issue Nov 29, 2022 · 1 comment

Comments

@yananchen1989

Hi, as we know, the traditional method for processing categorical features is one-hot encoding: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

From versions 1.6 and 1.7 on, XGBoost supports splitting directly on categorical features.

Is there any performance comparison between these different approaches, for example on some public Kaggle datasets?
Do they perform at the same level?

Thanks.

@trivialfis
Member

We have some initial benchmarks, but we are still working on parameter tuning to finalize the default set of parameters. We will publish them once ready. The issue is tracked at #7899.
