set min_child_weight as a float instead of int #5976
Hi, all

Based on the xgboost documentation, min_child_weight is treated as the number of observations for the rmse objective and is usually given as an int; see https://xgboost.readthedocs.io/en/latest/python/python_api.html#xgboost.XGBRegressor.

However, I recently read an article saying that "With XGBoost, you can specify this parameter as a float as well. If you do, it will use a percentage of samples to determine leaf nodes." (https://kevinvecmanis.io/machine%20learning/hyperparameter%20tuning/dataviz/python/2019/05/11/XGBoost-Tuning-Visual-Guide.html#child_weight). I cannot find a second source that uses min_child_weight as a float.

So I am confused: is it true that we can set min_child_weight as a float? Any confirmation would be appreciated.

Comments

sklearn's min_samples_leaf supports float as a percentage of the total sample size, which makes sense to me. It would be great if xgboost supported similar functionality.

Sorry for the mistake in our doc. Also, the parameter you referred …

Hi @trivialfis, thanks so much for your timely reply. May I double-check: if min_child_weight is set to, for example, 0.01, will it be treated as n_sample * 0.01, i.e. 1% of the total sample?

No. It's the minimum value of accumulated Hessian for each leaf. Hessian is often used as a proxy for data in gradient boosting.

I see. Thanks for answering.
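To make the answer above concrete, here is a minimal sketch (not from the thread; the synthetic dataset and all hyperparameter values are invented for illustration). It shows that the scikit-learn wrapper accepts min_child_weight as a float, and that the value is an absolute threshold on the accumulated Hessian in a leaf, not a fraction of the training set:

```python
import numpy as np
import xgboost as xgb

# Toy regression data, invented for this example.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0] + rng.normal(scale=0.1, size=1000)

# For reg:squarederror the per-sample Hessian is 1, so the accumulated
# Hessian in a leaf equals the number of samples in it; min_child_weight=50
# then behaves like "at least ~50 samples per leaf".
reg_int = xgb.XGBRegressor(objective="reg:squarederror",
                           n_estimators=10,
                           min_child_weight=50)
reg_int.fit(X, y)

# A float is accepted as well, but it is still an absolute Hessian-sum
# threshold, not a fraction of the data: 0.01 means "at least 0.01 total
# Hessian per leaf" (almost no constraint at all), NOT 1% of the samples.
reg_float = xgb.XGBRegressor(objective="reg:squarederror",
                             n_estimators=10,
                             min_child_weight=0.01)
reg_float.fit(X, y)
```

For other objectives the sample-count reading breaks down entirely: with binary:logistic, for instance, each sample contributes a Hessian of p(1 - p), which is at most 0.25, so a leaf holding many confidently predicted samples can still have a small accumulated Hessian. This is why the documentation describes min_child_weight as a minimum number of instances only for the squared-error regression case.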