Change centipawn fallback to account for sharper WDL with high WDLCalibrationElo #2075

Merged: 2 commits merged on Oct 18, 2024


Naphthalin (Contributor) commented on Oct 11, 2024

The previous centipawn formula (#1193) is also used as the fallback for `WDL_mu` (introduced in #1791), mainly to reproduce the extreme centipawn values in clearly decisive positions that users are accustomed to. However, the formula was calibrated on the raw (unsharpened) WDL output of the NN. Combined with the WDL sharpening applied at a high `WDLCalibrationElo`, it climbs too quickly and therefore takes over earlier than intended. Without the fallback, `WDL_mu` usually produces meaningful evals up to around +4.5, which is roughly where the fallback should take over. With `WDLCalibrationElo: 3600`, however, the takeover already happens below +2. This causes a discrepancy between the Lc0 and SF evals: in the range where SF shows +2 to +5, Leela regularly shows about double SF's eval.

The fallback formula is therefore updated in two ways: the scaling factor is halved, and the constant is slightly increased so that the formula still reaches +128 at wl = 1.0.
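A minimal sketch of this update, assuming the fallback has the form cp(q) = A * tan(K * q) (in centipawns, with q = W - L) and assuming the old constants A = 111.714640912 and K = 1.5620688421. These constants are our reading of the #1193 formula and are not stated in this PR. Halving A and then solving A_new * tan(K_new) = 12800 (i.e. +128.00 at wl = 1.0) gives the slightly larger K_new:

```python
import math

# ASSUMPTION: old fallback constants, believed to match #1193 (not stated in this PR).
OLD_SCALE = 111.714640912
OLD_K = 1.5620688421

# Convention kept by the update: q = W - L = 1.0 displays +128.00, i.e. 12800 cp.
TARGET_CP_AT_Q1 = 12800.0

# Halve the scaling factor, then choose K so that NEW_SCALE * tan(NEW_K) == 12800.
NEW_SCALE = OLD_SCALE / 2.0
NEW_K = math.atan(TARGET_CP_AT_Q1 / NEW_SCALE)


def old_fallback_cp(q: float) -> float:
    """Old centipawn fallback for q = W - L in [-1, 1]."""
    return OLD_SCALE * math.tan(OLD_K * q)


def new_fallback_cp(q: float) -> float:
    """Halved fallback, re-anchored so that q = 1.0 still maps to +128.00."""
    return NEW_SCALE * math.tan(NEW_K * q)


if __name__ == "__main__":
    print(f"NEW_K = {NEW_K:.10f}")
    for q in (0.25, 0.5, 0.75, 0.9, 0.99, 1.0):
        print(f"q={q:5.2f}  old={old_fallback_cp(q):9.1f}  new={new_fallback_cp(q):9.1f}")
```

For small q, tan(K * q) is close to K * q, so the new curve is roughly half the old one. Apart from q = 0, the two curves meet again only at q = ±1.0, which is what preserves the +128 convention.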

With WDL sharpening at 3600 Elo (the most commonly used value, e.g. in TCEC for both playing and kibitzing), the old centipawn calibration is off by about a factor of 2 compared to Stockfish. It generally takes over too early, around +2.00, whereas it should only take over around +4.00, since up to that point `WDL_mu` behaves well enough.

With a lower calibration Elo (e.g. for analyzing human games or openings), the broader WDL yields a lower Q, so the takeover point is significantly later and this change has no practical effect there.
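To illustrate why a sharper WDL moves the takeover point, here is a toy model. It assumes a logistic WDL with W = sigmoid((mu - 1) / s) and L = sigmoid((-1 - mu) / s), where mu is the eval in pawns (+1.00 being the 50% win point) and s is a hypothetical width parameter (smaller s means a sharper WDL, as with a high `WDLCalibrationElo`). It further assumes, purely for illustration, that the fallback "takes over" at the smallest mu where the fallback centipawns exceed the `WDL_mu` eval of 100 * mu. Neither the model, s, nor this selection rule is taken from Lc0's code, and the fallback constants are the same assumed A * tan(K * q) values (old, and halved with K re-anchored to +128 at q = 1.0).

```python
import math

# ASSUMPTION: old fallback constants (believed to match #1193) and the halved version.
OLD_SCALE, OLD_K = 111.714640912, 1.5620688421
NEW_SCALE = OLD_SCALE / 2.0
NEW_K = math.atan(12800.0 / NEW_SCALE)  # keeps +128.00 at q = 1.0


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def q_from_mu(mu: float, s: float) -> float:
    """Toy logistic WDL model (hypothetical): q = W - L for eval mu (pawns), width s."""
    w = sigmoid((mu - 1.0) / s)
    l = sigmoid((-1.0 - mu) / s)
    return w - l


def takeover_mu(s, scale, k, step=0.01, mu_max=100.0):
    """Smallest mu > 0 where scale * tan(k * q) exceeds the WDL_mu eval of 100 * mu cp.

    Scans upward in `step` increments, then bisects the first bracketing interval.
    Returns None if no crossing is found below mu_max.
    """
    def excess(mu):
        return scale * math.tan(k * q_from_mu(mu, s)) - 100.0 * mu

    lo, mu = 0.0, step  # excess(0) == 0, so lo starts on the "not exceeding" side
    while mu <= mu_max:
        if excess(mu) > 0:
            hi = mu
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if excess(mid) > 0:
                    hi = mid
                else:
                    lo = mid
            return hi
        lo, mu = mu, mu + step
    return None


if __name__ == "__main__":
    for s in (0.5, 1.0, 2.0):
        old = takeover_mu(s, OLD_SCALE, OLD_K)
        new = takeover_mu(s, NEW_SCALE, NEW_K)
        print(f"s={s}: old takeover ~ +{old:.2f}, new takeover ~ +{new:.2f}")
```

In this toy model a smaller s (sharper WDL) moves the old takeover point toward lower evals, and halving the fallback scale pushes it back out. The concrete numbers depend entirely on the assumed s and are not meant to reproduce the TCEC observations above.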

This does not yet fix the jumpy eval behavior in drawish positions with a very low W (resp. L) but a substantial remaining L (resp. W).
Initial oversight: a +1 position (wl = 1.0) should still display +128; that should not change.
borg323 merged commit cab6395 into LeelaChessZero:master on Oct 18, 2024
3 checks passed
borg323 pushed a commit that referenced this pull request Oct 20, 2024
…ibrationElo (#2075)

* half eval fallback formula 

* changed factor to +128 convention
uwuplant pushed a commit to uwuplant/lc0 that referenced this pull request Nov 29, 2024
…ibrationElo (LeelaChessZero#2075)