Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[XGBoost4J-Spark] bestIteration and bestScore for early stopping #7095

Merged
merged 3 commits into from
Jul 19, 2021

Conversation

naveenkb
Copy link
Contributor

@naveenkb naveenkb commented Jul 8, 2021

As per discussions in #6893, adding bestIteration and bestScore attributes.

sample code snippet to access the attributes

.
.
.
val xgbClassificationModel = xgbClassifier.fit(train)

val bestScore = xgbClassificationModel.nativeBooster.getAttr("bestScore")
val bestIteration = xgbClassificationModel.nativeBooster.getAttr("bestIteration")

Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I'm not familiar with the jvm package, just providing suggestions. Also is there a way we can have an unittest?

cc @wbo4958

@@ -242,11 +242,15 @@ public static Booster trainAndSaveCheckpoint(
if (score > bestScore) {
bestScore = score;
bestIteration = iter;
booster.setAttr("bestIteration", String.valueOf(bestIteration));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to use best_iteration to align with other language bindings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the best iteration and best score are decided by all other language bindings? @trivialfis

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we'd better to keep the same naming for all language bindings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trivialfis - Using camel case is the norm in Java so I went ahead with bestIteration and bestScore. This is in line with other naming convention used in Booster.java & XGBoost.java code files. Let me know if you still would like to change it to best_iteration.

Copy link
Member

@trivialfis trivialfis Jul 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the best iteration and best score are decided by all other language bindings? @trivialfis

Yes.

@trivialfis - Using camel case is the norm in Java so I went ahead with bestIteration and bestScore

For java attributes that make sense. But for the string stored inside XGBoost, please use snake case.

booster.setAttr("best_iteration", String.valueOf(bestIteration));

Notice the difference between string and java symbol. This way the model is more portable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Updated with snake case

Copy link
Contributor

@wbo4958 wbo4958 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi naveenkb, Could we add a unit test for this PR and Better update the doc for this, @trivialfis is there any doc to describe the best iteration/score things?

@@ -643,7 +643,7 @@ public void testSetAndGetAttrs() throws XGBoostError {
}});

Map<String, String> attr = booster.getAttrs();
TestCase.assertEquals(attr.size(), 4);
TestCase.assertEquals(attr.size(), 6);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some comments why the attr.size() is equal to 6?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wbo4958 maybe because 2 attributes got added?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know the extra 2 attrs, but others may be so confused. Better to add some comments here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wbo4958 Yes as attr.size() is 6 because 2 additional attributes are added. The test case is checking setter and getter method and also total number of attributes in the booster object.
I will update the doc to explain best iteration/score.

Any suggestion on adding unit test for these attribute ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking if the booster is containing best_iteration and best_score?

@@ -242,11 +242,15 @@ public static Booster trainAndSaveCheckpoint(
if (score > bestScore) {
bestScore = score;
bestIteration = iter;
booster.setAttr("bestIteration", String.valueOf(bestIteration));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the best iteration and best score are decided by all other language bindings? @trivialfis

@trivialfis
Copy link
Member

trivialfis commented Jul 14, 2021

Please ignore the failure for now. Something on our CI updated we are still trying to figure out what's broken.

Copy link
Member

@trivialfis trivialfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, please rebase onto the master branch to get the CI passed. Thanks for the feature!

@codecov-commenter
Copy link

codecov-commenter commented Jul 17, 2021

Codecov Report

Merging #7095 (e07af5d) into master (d7c1449) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #7095   +/-   ##
=======================================
  Coverage   81.60%   81.60%           
=======================================
  Files          13       13           
  Lines        3903     3903           
=======================================
  Hits         3185     3185           
  Misses        718      718           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d7c1449...e07af5d. Read the comment docs.

@trivialfis trivialfis merged commit 9f7f8b9 into dmlc:master Jul 19, 2021
@naveenkb naveenkb deleted the bestIteration branch July 19, 2021 12:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants