[XGBoost4J-Spark] Early stopping and best iteration #6893

candalfigomoro · 2021-04-22T11:06:57Z

This has been asked before (e.g., #3140 (comment)), but no answer was ever given.

In XGBoost4J-Spark we can use early stopping by using setNumEarlyStoppingRounds.

When I call transform(), does it use by default the best iteration (the best number of trees) or the best iteration + num_early_stopping_rounds?
If it uses the best iteration + num_early_stopping_rounds, how can I extract the value of the best iteration so I can set treeLimit to the best iteration?

Thanks

trivialfis · 2021-04-22T19:25:47Z

@wbo4958 probably has some insight.

candalfigomoro · 2021-04-26T08:45:31Z

@CodingCat

candalfigomoro · 2021-04-30T09:25:22Z

@hcho3

wbo4958 · 2021-04-30T11:39:13Z

@candalfigomoro According to the code at https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j/src/main/java/ml/dmlc/xgboost4j/java/XGBoost.java#L253, it looks like "it uses the best iteration + num_early_stopping_rounds". I have no idea how to get the value of the best iteration. It seems we need to support this.

candalfigomoro · 2021-04-30T12:40:41Z

> @candalfigomoro According to the code at https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j/src/main/java/ml/dmlc/xgboost4j/java/XGBoost.java#L253, it looks like "it uses the best iteration + num_early_stopping_rounds". I have no idea how to get the value of the best iteration. It seems we need to support this.

@wbo4958
Thank you for your reply.
I think we need to expose bestScore and bestIteration attributes, both to be consistent with the Python package (see https://xgboost.readthedocs.io/en/latest/python/python_intro.html#early-stopping) and because I think it's a pretty important feature.

trivialfis · 2021-04-30T13:56:49Z

@wbo4958 If you are interested in the feature, you can use the SetAttr function in XGBoost to store these attributes inside the model. Also, we support model slicing to slice up trees for returning the best model. Feel free to ping me if you have any questions.

wbo4958 · 2021-04-30T21:46:38Z

OK, I will add this feature.

naveenkb · 2021-06-10T04:34:40Z

Hi @wbo4958, I too want to use this feature in Spark. I just wanted to know whether you have been able to work on it. If not, I can give it a try.

wbo4958 · 2021-06-11T02:46:27Z

> Hi @wbo4958, I too want to use this feature in Spark. I just wanted to know whether you have been able to work on it. If not, I can give it a try.

Sorry, @naveenkb, I have been busy with other things recently. Please go ahead and add it. Thanks very much.

naveenkb · 2021-06-11T06:20:06Z

The XGBoostClassificationModel object has a method called getVersion(), but the documentation says little about it. Based on my experiments, booster.getVersion() / 2 always returns the latest iteration, even with early stopping. So ((booster.getVersion() / 2) - earlyStoppingRound) gives the bestIteration. Can anyone confirm this, or point out any cases where it won't work?

@trivialfis or @wbo4958 or @CodingCat ?

candalfigomoro · 2021-06-11T16:38:10Z

@naveenkb
Suppose that you set max iterations=100, num_early_stopping_rounds=10 and the best iteration is iteration 95. If you take the number of iterations - num_early_stopping_rounds you get iteration 90 instead of iteration 95. So it doesn't work when num_early_stopping_rounds > max iterations - best iteration. The clean solution would be to expose bestScore and bestIteration.
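
The counterexample above can be checked with a small simulation. This is only a sketch in plain Scala and does not call XGBoost4J: `EarlyStoppingSketch`, `simulate`, and `naiveBestRound` are hypothetical names, the metric is assumed to be one that is minimized, and iterations are 0-based inside the helper.

```scala
// Hypothetical simulation for illustration; not part of XGBoost4J.
object EarlyStoppingSketch {
  /** Replays per-round metric values (lower is better) and returns
    * (bestIteration, lastIteration), both 0-based. Training stops once
    * `earlyStoppingRounds` rounds have passed without a new best value,
    * or when the metric values run out (max iterations reached). */
  def simulate(metric: IndexedSeq[Double], earlyStoppingRounds: Int): (Int, Int) = {
    require(metric.nonEmpty && earlyStoppingRounds > 0)
    var best = 0
    var last = 0
    var i = 0
    var stopped = false
    while (i < metric.length && !stopped) {
      last = i
      if (metric(i) < metric(best)) best = i
      if (i - best >= earlyStoppingRounds) stopped = true
      i += 1
    }
    (best, last)
  }

  /** The heuristic under discussion: rounds trained minus the early stopping window. */
  def naiveBestRound(last: Int, earlyStoppingRounds: Int): Int =
    (last + 1) - earlyStoppingRounds
}
```

With 100 rounds, a window of 10, and a metric whose minimum falls at round 95 (index 94), `simulate` runs all 100 rounds, so `naiveBestRound` gives 90 rather than 95. If the minimum falls at round 50 instead, training stops at round 60 and the heuristic gives 50, which matches.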

naveenkb · 2021-06-17T09:51:12Z

> @wbo4958 If you are interested in the feature, you can use the SetAttr function in XGBoost to store these attributes inside the model. Also, we support model slicing to slice up trees for returning the best model. Feel free to ping me if you have any questions.

I have added bestIteration using the SetAttr function. Regarding model slicing, I wanted to confirm that it is not implemented in Java yet, right? Please let me know if I am missing something.

candalfigomoro · 2021-06-17T12:58:27Z

> > @wbo4958 If you are interested in the feature, you can use the SetAttr function in XGBoost to store these attributes inside the model. Also, we support model slicing to slice up trees for returning the best model. Feel free to ping me if you have any questions.
>
> I have added bestIteration using the SetAttr function. Regarding model slicing, I wanted to confirm that it is not implemented in Java yet, right? Please let me know if I am missing something.

There's a treeLimit parameter (see https://xgboost.readthedocs.io/en/latest/jvm/scaladocs/xgboost4j-spark/ml/dmlc/xgboost4j/scala/spark/XGBoostClassificationModel.html#setTreeLimit(value:Int):XGBoostClassificationModel.this.type), but I've never tried it.

trivialfis · 2021-06-17T17:25:44Z

We are in the process of replacing that parameter with the more robust iteration_range. Python and R have already made the transition, and JVM is next.

candalfigomoro · 2021-07-08T10:56:38Z

@naveenkb
Are you going to submit a Pull Request to expose bestIteration and bestScore?

naveenkb · 2021-07-08T11:30:09Z

@candalfigomoro Sure. Sorry for the delay. I will raise a PR in a few days.

jon-targaryen1995 · 2021-07-13T09:44:18Z

Hello,

I was going through the XGBoost4J-Spark parameters documented at

https://xgboost.readthedocs.io/en/latest/jvm/scaladocs/xgboost4j-spark/ml/dmlc/xgboost4j/scala/spark/XGBoostClassificationModel.html#setTreeLimit(value:Int):XGBoostClassificationModel.this.type

The definition of numEarlyStoppingRounds is as follows:

If non-zero, the training will be stopped after a specified number of consecutive increases in any evaluation metric.

But shouldn't it be "the training will be stopped after a specified number of consecutive non-increases (same or decrease) in any evaluation metric"?

Is there any parameter through which I can set a threshold for early stopping, so that training stops if the evaluation metric doesn't improve by at least the threshold within the early stopping rounds?

Thanks,
Akshay

candalfigomoro · 2021-07-13T15:16:53Z

> But shouldn't it be "the training will be stopped after a specified number of consecutive non-increases (same or decrease) in any evaluation metric"?

This is tricky because some metrics need to be minimized (e.g. MSE) while other metrics need to be maximized (e.g. accuracy). See also the setMaximizeEvaluationMetrics() method.
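
To make the direction issue concrete, here is a minimal sketch of an improvement check that depends on whether the metric is maximized. `MetricDirection` and `isImprovement` are hypothetical names, not XGBoost4J API, and treating ties as "no improvement" is an assumption rather than a statement of what the library does.

```scala
// Hypothetical helper for illustration; not part of XGBoost4J.
object MetricDirection {
  /** True when `current` beats `best`, given whether the metric should be
    * maximized (e.g. accuracy) or minimized (e.g. MSE). Ties do not count
    * as an improvement (an assumption of this sketch). */
  def isImprovement(current: Double, best: Double, maximize: Boolean): Boolean =
    if (maximize) current > best else current < best
}
```

The same pair of values is an improvement under one direction and a regression under the other, which is why a single wording such as "consecutive increases" cannot describe both kinds of metric.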

jon-targaryen1995 · 2021-07-27T16:18:52Z

@candalfigomoro @naveenkb

How do you expose the bestIteration and bestScore attained during training?
Is it implemented in the package?

naveenkb · 2021-08-01T07:12:13Z

> How do you expose the bestIteration and bestScore attained during training?
> Is it implemented in the package?

val xgbClassificationModel = xgbClassifier.fit(train)

val bestScore = xgbClassificationModel.nativeBooster.getAttr("bestScore")
val bestIteration = xgbClassificationModel.nativeBooster.getAttr("bestIteration")
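
Attribute values read this way come back as strings. The sketch below shows one defensive way to turn such a value into an integer. It assumes the value may be null or malformed (for example, if the model was trained without early stopping); `BoosterAttrs` and `parseIntAttr` are hypothetical names, not XGBoost4J API.

```scala
import scala.util.Try

// Hypothetical helper for illustration; not part of XGBoost4J.
object BoosterAttrs {
  /** Parses an attribute value as an Int. Returns None when the value is
    * null, blank, or not a valid integer. */
  def parseIntAttr(raw: String): Option[Int] =
    Option(raw).map(_.trim).filter(_.nonEmpty).flatMap(s => Try(s.toInt).toOption)
}
```

It could be applied to the bestIteration value read above, for example as `BoosterAttrs.parseIntAttr(bestIteration)`, so that a missing attribute becomes `None` instead of causing an exception.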

trivialfis · 2021-08-02T20:13:43Z

TODO: Follow up with documents.

Shadyelgewily · 2021-08-17T08:25:39Z

This feature would be very much appreciated for the XGBoost4J (non-Spark) library as well. We have a situation where the evaluation function does not necessarily decrease as the loss decreases. In fact, in our situation the evaluation function can increase when the loss decreases too far. This is deliberate: we use a quantile loss function and a custom evaluation metric to ensure that the loss function does not decrease to zero (if the loss is zero, the predictions are no longer quantiles).

The current implementation means that the model that is returned after early stopping rounds is far from optimal for many of our models, while a good performance was reached at earlier iterations.

trivialfis added the feature-request label May 13, 2021

naveenkb mentioned this issue Jul 8, 2021: [XGBoost4J-Spark] bestIteration and bestScore for early stopping #7095 (merged)

trivialfis mentioned this issue Aug 2, 2021: [Roadmap] 1.5.0 Roadmap #6846 (closed)

trivialfis added the Blocking label Sep 22, 2021

trivialfis mentioned this issue Sep 23, 2021: [jvm-packages] Create demo and test for xgboost4j early stopping. #7252 (merged)

trivialfis closed this as completed in #7252 Sep 24, 2021
