Make the sharpness limit in WDLRescale configurable, and fix the Elo --> Contempt calculation #1941
While testing this PR, I found that there was an actual bug in the way the
src/mcts/search.cc
@@ -313,7 +313,7 @@ void Search::SendUciInfo() REQUIRES(nodes_mutex_) REQUIRES(counters_mutex_) {
         contempt_mode_ == ContemptMode::NONE
             ? 0
             : params_.GetWDLRescaleDiff() * params_.GetWDLEvalObjectivity(),
-        sign, true);
+        sign, true, params_.GetWDLMaxReasonableS());
is there any convention on the order of arguments, i.e. does it make more sense to put parameters first and flags last?
The last two commits added a conversion formula, translating regular Elo (as defined by the expected outcome following a logistic curve) which is also used when the alternative Contempt settings where It still is supposed to represent (relatively fast) rapid Elo, so to get classic Elo, add something between 40 and 70 Elo per time doubling. The conversion formula approximates the model prediction for regular Elo from +1.00 openings, which itself is based on Stockfish-level self-play data used to estimate the approximate draw rate and WDL sharpness, using official-stockfish/Stockfish#4341.
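For readers unfamiliar with the "regular Elo" definition mentioned above, here is a minimal sketch of the standard logistic Elo curve and its inverse. It is not the conversion formula added by this PR; the function names are hypothetical, and `AdjustForTimeControl` simply reads the "40 to 70 Elo per time doubling" remark literally as an additive correction.

```cpp
#include <cmath>

// Expected score (win = 1, draw = 0.5, loss = 0) of a player rated
// `elo_diff` points above the opponent, under the logistic Elo model.
double ExpectedScore(double elo_diff) {
  return 1.0 / (1.0 + std::pow(10.0, -elo_diff / 400.0));
}

// Inverse of ExpectedScore; `score` must lie strictly between 0 and 1.
double EloFromExpectedScore(double score) {
  return -400.0 * std::log10(1.0 / score - 1.0);
}

// Hypothetical helper (an assumption, not part of lc0): shifts a rapid
// Elo value by a fixed amount per doubling of the time control.
// The comment above suggests 40 to 70 for `elo_per_doubling`.
double AdjustForTimeControl(double rapid_elo, double time_doublings,
                            double elo_per_doubling) {
  return rapid_elo + time_doublings * elo_per_doubling;
}
```

Under this model a 400 Elo advantage corresponds to an expected score of 10/11, and the helper with two doublings at 50 Elo each would shift a rapid value by 100.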
…--> Contempt calculation (LeelaChessZero#1941) (cherry picked from commit 5d83073)
…--> Contempt calculation (LeelaChessZero#1941)
While implementing #1791, we used a preliminary version in TCEC Swiss 4 together with a net with a mixed training set, and found that the Contempt effect was sometimes exaggerated. To address this, a number of measures were taken:
Contempt taken from UCI_RatingAdv
via a hidden setting ContemptMaxValue
(s for scale in the WDLRescale function, following the logistic distribution nomenclature) to a hardcoded value of 1.4 (approx. twice the value of startpos)
While these measures together fixed the problem for good, any 2 of the 4 combined would likely already have helped, and it turns out that the hardcoded limit of 1.4 is a bit too conservative, which is especially noticeable when using it for material odds, as in https://lczero.org/blog/2023/11/play-with-knight-odds-against-lc0-on-lichess/. This PR allows increasing the limit, thus addressing the original comment on the hardcoded constant.
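To illustrate what a limit on the sharpness parameter s means, here is a minimal sketch assuming a logistic WDL model in which the win probability is logistic((mu - 1) / s) and the loss probability is logistic((-mu - 1) / s). The struct and function names are hypothetical and this is not the lc0 `WDLRescale` implementation; it only shows how mu and s can be recovered from win and loss probabilities and how a configurable cap on s would apply.

```cpp
#include <algorithm>
#include <cmath>

// Parameters of the assumed logistic WDL model: mu is the position
// of the evaluation on the model's axis, s is the scale ("sharpness").
struct LogisticParams {
  double mu;
  double s;
};

double Logistic(double x) { return 1.0 / (1.0 + std::exp(-x)); }

double WinProb(const LogisticParams& p) { return Logistic((p.mu - 1.0) / p.s); }
double LossProb(const LogisticParams& p) { return Logistic((-p.mu - 1.0) / p.s); }

// Recovers (mu, s) from win and loss probabilities and caps s at `max_s`.
// Requires 0 < w, 0 < l and w + l < 1 (a nonzero draw probability).
LogisticParams FromWinLoss(double w, double l, double max_s) {
  const double a = std::log(1.0 / l - 1.0);  // equals (1 + mu) / s
  const double b = std::log(1.0 / w - 1.0);  // equals (1 - mu) / s
  const double s = std::min(max_s, 2.0 / (a + b));
  const double mu = (a - b) / (a + b);
  return {mu, s};
}
```

With a cap of 1.4, any position whose fitted s exceeds 1.4 is treated as if s were exactly 1.4, which is the behaviour a larger configurable limit would relax.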