ENH more stable gradient of CrossEntropy #6327
Conversation
Thanks for this! I'll defer to @shiyu1994 and @guolinke to review. Until then, can you please update this to the latest…
Thanks for the contribution. I just left a few comments about the correctness of the hessian computation.
src/objective/xentropy_objective.hpp (Outdated)

```cpp
if (score[i] > -37.0) {
  const double exp_tmp = std::exp(-score[i]);
  gradients[i] = static_cast<score_t>(((1.0f - label_[i]) - label_[i] * exp_tmp) / (1.0f + exp_tmp));
  hessians[i] = static_cast<score_t>(exp_tmp / (1 + exp_tmp) * (1 + exp_tmp));
```
The `/` followed by a `*` simply returns `exp_tmp`, which is not the expected hessian.
```diff
- hessians[i] = static_cast<score_t>(exp_tmp / (1 + exp_tmp) * (1 + exp_tmp));
+ hessians[i] = static_cast<score_t>(exp_tmp / ((1 + exp_tmp) * (1 + exp_tmp)));
```
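To make the precedence issue concrete, here is a minimal standalone check (my own sketch, not code from this PR) showing that the unparenthesized expression reduces to `exp_tmp`, while the suggested fix matches the sigmoid hessian `p * (1 - p)`:

```cpp
// Minimal check (not part of the PR): with left-to-right evaluation,
// a / b * b is (a / b) * b, which is just a up to rounding.
#include <cmath>
#include <cstdio>

int main() {
  const double score = 2.0;
  const double exp_tmp = std::exp(-score);
  const double buggy = exp_tmp / (1 + exp_tmp) * (1 + exp_tmp);    // evaluates to exp_tmp
  const double fixed = exp_tmp / ((1 + exp_tmp) * (1 + exp_tmp));  // intended hessian
  const double p = 1.0 / (1.0 + exp_tmp);                          // sigmoid(score)
  std::printf("buggy   = %.17g\n", buggy);                         // ~0.135 (== exp(-2))
  std::printf("fixed   = %.17g\n", fixed);                         // ~0.105
  std::printf("p*(1-p) = %.17g\n", p * (1.0 - p));                 // agrees with fixed up to rounding
}
```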
Is it possible that `hessians[i] = static_cast<score_t>(exp_tmp / (1 + exp_tmp) / (1 + exp_tmp));` could be more numerically stable?
First of all, yes, I forgot the parentheses. Thanks for spotting it. It is surprising that all the tests still pass (with this bug).

As for stability: `exp_tmp / (1 + exp_tmp) / (1 + exp_tmp)` is more numerically stable in the sense that it could prevent overflow. But in this branch `exp_tmp < exp(37) ≈ 1e16`, and squaring that stays well within even single-precision range (max ≈ 3e38); note that `exp_tmp` is double precision anyway.
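As a quick sanity check of those magnitudes, a small standalone sketch (mine, not code from this PR), using the `-37.0` threshold from the diff above:

```cpp
// Sketch backing up the bound: in the score[i] > -37.0 branch,
// exp_tmp = exp(-score[i]) < exp(37) ~ 1.17e16, so (1 + exp_tmp)^2
// (~1.37e32) fits comfortably even in single precision (FLT_MAX ~ 3.4e38).
#include <cfloat>
#include <cmath>
#include <cstdio>

int main() {
  const double exp_tmp = std::exp(37.0);            // worst case in this branch
  const double sq = (1 + exp_tmp) * (1 + exp_tmp);  // ~1.37e32, no overflow
  std::printf("exp(37)         = %g\n", exp_tmp);
  std::printf("(1 + exp(37))^2 = %g\n", sq);
  std::printf("FLT_MAX         = %g\n", static_cast<double>(FLT_MAX));
  std::printf("hessian at -37  = %g\n", exp_tmp / sq);  // finite, ~8.5e-17
}
```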
Thanks. Could you also fix the hessian calculation in the else branch?
```cpp
} else {
  const double exp_tmp = std::exp(score[i]);
  gradients[i] = static_cast<score_t>(exp_tmp - label_[i]);
  hessians[i] = static_cast<score_t>(exp_tmp);
```
```diff
- hessians[i] = static_cast<score_t>(exp_tmp);
+ hessians[i] = static_cast<score_t>(exp_tmp / ((1 + exp_tmp) * (1 + exp_tmp)));
```
This is not needed: here `exp_tmp < 1e-16` is tiny, so `(1 + exp_tmp)` is just 1. Stated differently, the implemented formula is the first-order Taylor series in `exp_tmp`.
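A tiny standalone sketch (mine, not from this PR) confirming that rounding behavior at the branch boundary:

```cpp
// Sketch illustrating the else branch (score[i] <= -37.0):
// exp_tmp = exp(score[i]) <= exp(-37) ~ 8.5e-17 is below half an ulp of 1.0,
// so 1.0 + exp_tmp rounds to exactly 1.0 and the full formula
// exp_tmp / (1 + exp_tmp)^2 returns exp_tmp bit-for-bit.
#include <cmath>
#include <cstdio>

int main() {
  const double exp_tmp = std::exp(-37.0);  // ~8.5e-17 < DBL_EPSILON / 2
  const double full = exp_tmp / ((1 + exp_tmp) * (1 + exp_tmp));
  std::printf("1 + exp_tmp == 1 ? %d\n", (1.0 + exp_tmp) == 1.0);  // prints 1
  std::printf("full == exp_tmp ? %d\n", full == exp_tmp);          // prints 1
}
```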
I see. That makes sense.
But maybe it would still be better to write the original calculation formula explicitly to avoid ambiguity?
What do you mean by "ambiguity"?
Writing the formula out explicitly would not avoid the branch, and the simplified form is a tiny bit more efficient.
Similar to scikit-learn/scikit-learn#28048: there is a small runtime cost to pay, but gradient computation is not the main bottleneck of histogram-based gradient boosting.