Customizable early stopping tolerance #4982

I have a situation where the default numerical tolerance (0.001) for early stopping is too large. My target has a gamma distribution, and the XGBRegressor converges too early: the numerous low target values are already well approximated while the few large values are still underestimated. When I deactivate early stopping, I can see the loss metric still improving at the fourth or higher decimal digit past the best iteration found with early stopping. It would be great to be able to set the tolerance manually.
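[Editor's note: a workaround available today is a custom callback that applies an explicit tolerance. The sketch below assumes the xgboost.callback.TrainingCallback API introduced in 1.3; the class name TolerantEarlyStopping and its tol parameter are illustrative, not part of the XGBoost API.]

```python
import xgboost as xgb

class TolerantEarlyStopping(xgb.callback.TrainingCallback):
    """Stop when the metric fails to improve by more than `tol` for
    `rounds` consecutive iterations. The class name and the `tol`
    parameter are illustrative, not part of the XGBoost API."""

    def __init__(self, rounds, metric_name, data_name, tol=1e-4):
        super().__init__()
        self.rounds = rounds
        self.metric_name = metric_name
        self.data_name = data_name
        self.tol = tol
        self.best = float("inf")  # assumes a metric that is minimized
        self.stagnant = 0

    def after_iteration(self, model, epoch, evals_log):
        score = evals_log[self.data_name][self.metric_name][-1]
        if score < self.best - self.tol:
            self.best = score
            self.stagnant = 0
        else:
            self.stagnant += 1
        return self.stagnant >= self.rounds  # returning True stops training
```

It could then be passed as, e.g., xgb.train(..., callbacks=[TolerantEarlyStopping(200, "rmse", "validation_1")]).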
Can you adjust early_stopping_rounds?
early_stopping_rounds has no effect on the numerical tolerance, or am I wrong?
No, I'm asking whether your aim is better met by adjusting early_stopping_rounds.
Sorry, I don't understand how adjusting the number of early stopping rounds would achieve a smaller numerical tolerance on convergence. Can you give an example?
@kryptonite0 Actually, let us ask this question first: can you point me to where the numerical tolerance is defined? I don't see it in the code (xgboost/python-package/xgboost/callback.py, lines 231 to 249 at 96cd7ec).
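[Editor's note: the improvement check in that callback is, schematically, a strict comparison against the best score so far. The sketch below is a simplified paraphrase, not the verbatim source; it shows that no epsilon/tolerance term enters the comparison.]

```python
# Simplified paraphrase of the improvement check in callback.py:
# any strict change in the right direction counts as an improvement,
# with no configurable tolerance anywhere in the comparison.
def improved(score: float, best_score: float, maximize_score: bool) -> bool:
    return score > best_score if maximize_score else score < best_score
```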
@kryptonite0 Potential cause: #4665 (comment)
I was hoping you could find that out for me; I couldn't see it in the code either. But since the tolerance is implemented and configurable in sklearn.ensemble.GradientBoostingRegressor and in the H2OXGBoostEstimator of the h2o library, I assumed there was something similar in XGBoost as well. Apologies if I'm wrong; I'll have a better look at the source and let you know. From what I've seen in the verbose output, it looked like the tolerance was 0.001.
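[Editor's note: the 0.001 figure is consistent with the evaluation log simply printing metrics to three decimal places (the issue linked above): two distinct scores can print identically, which looks like a tolerance but is only display precision. A minimal demonstration:]

```python
# Two distinct RMSE values that print identically at the log's
# three-decimal precision, mimicking an apparent 0.001 "tolerance".
a, b = 128.80712, 128.80749
print(f"{a:.3f}", f"{b:.3f}")  # 128.807 128.807
print(a == b)                  # False
```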
Can you post a reproducible example?
A reproducible example is a bit tricky here, as I cannot share the data I am using, and I would need to find a dummy dataset with the same behavior. Maybe you can have a look at the regressor's output log instead? Here I'm using {'objective': 'reg:squarederror', 'n_estimators': 10000, 'early_stopping_rounds': 200} and a fixed random seed. The algorithm finds the best iteration at 6096 with validation_1-rmse = 128.807, but iteration 6128 has the same metric value (128.807). Then, if I use 1000 early_stopping_rounds, the same number 128.807 appears at iteration 6128, reappears at iteration 6310 (less than 200 rounds later), and the run eventually finds a minimum of 128.800 at iteration 6609. Hence my suspicion about the numerical tolerance. Below is the log of the case with 200 stopping rounds:
[6084] validation_0-mae:72.8091 validation_0-rmse:133.387 validation_1-mae:70.4796 validation_1-rmse:128.809
and the log with 1000 stopping rounds:
[5728] validation_0-mae:73.1478 validation_0-rmse:133.798 validation_1-mae:70.5739 validation_1-rmse:128.862
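[Editor's note: one way to check this without sharing data is to inspect the unrounded metric history after training; the sklearn wrapper exposes it via evals_result(). A sketch, where X_train, y_train, X_valid, y_valid are placeholders for your data and the fit-time early_stopping_rounds argument is the one available in the versions discussed here:]

```python
from xgboost import XGBRegressor

# X_train, y_train, X_valid, y_valid are placeholders for your data.
model = XGBRegressor(objective="reg:squarederror", n_estimators=10000)
model.fit(
    X_train, y_train,
    eval_set=[(X_train, y_train), (X_valid, y_valid)],
    early_stopping_rounds=200,
    verbose=False,
)

# Full-precision values that the training log rounds to three decimals.
history = model.evals_result()["validation_1"]["rmse"]
best = model.best_iteration
for i in range(best, min(best + 3, len(history))):
    print(i, history[i])
```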
Hi @trivialfis, |
@thifil Could you please take a look at #6942 and see if that's what you want?
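[Editor's note: if #6942 is the change that adds a min_delta-style tolerance to the xgboost.callback.EarlyStopping callback (recent releases do expose a min_delta argument), usage would look roughly like the sketch below; params, dtrain, and dvalid are placeholders, and the exact argument names should be checked against the released signature.]

```python
import xgboost as xgb

# Sketch assuming xgboost.callback.EarlyStopping with a min_delta
# argument, as exposed in recent releases; params, dtrain, and dvalid
# are placeholders for your configuration and DMatrix data.
es = xgb.callback.EarlyStopping(
    rounds=200,
    metric_name="rmse",
    data_name="validation_1",
    min_delta=1e-4,  # require at least this much improvement to reset the counter
)
booster = xgb.train(
    params,
    dtrain,
    num_boost_round=10000,
    evals=[(dtrain, "validation_0"), (dvalid, "validation_1")],
    callbacks=[es],
)
```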
Is this a valid callback option for XGBoost in R? |
Unfortunately not supported for R at the moment. |