CCC unexpected underperformance #4175
To begin, Stockfish isn't necessarily underperforming. If you run 10K or so games of SF vs Lc0, for example, SF, while winning the majority of game pairs, may still lose some. In CCC, the most likely explanation is that an unlucky game pair like the one mentioned occurred, or that SF failed to win some game pairs where other engines were luckier with the dice roll. This is why both CCC and TCEC are considered small sample size (SSS) tournaments, like many other tournaments: they play far too few games to draw firm conclusions from their results.
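To put rough numbers on the small-sample point, here is a minimal Python sketch that turns a win/draw/loss record into an Elo estimate with an approximate 95% interval. It is an illustration under stated assumptions, not how CCC, TCEC or fishtest compute anything: it uses the standard logistic Elo formula with a normal approximation, treats every game as independent (ignoring that CCC plays game pairs), and the function names and W/D/L records are hypothetical.

```python
import math


def elo_from_score(score: float) -> float:
    """Convert an expected score in (0, 1) to an Elo difference (logistic Elo model)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, low, high) for a hypothetical W/D/L record.

    Assumption: normal approximation with every game treated as independent,
    i.e. the pairing of games in CCC-style game pairs is ignored.
    Also assumes the record is neither all wins nor all losses.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    # Clamp the score bounds so the Elo conversion stays finite.
    low = max(score - z * stderr, 1e-9)
    high = min(score + z * stderr, 1.0 - 1e-9)
    return elo_from_score(score), elo_from_score(low), elo_from_score(high)


if __name__ == "__main__":
    # Hypothetical records with the same 60% score at two sample sizes.
    for record in [(6, 12, 2), (600, 1200, 200)]:
        elo, low, high = elo_interval(*record)
        print(f"W/D/L {record}: {elo:+.0f} Elo, 95% CI [{low:+.0f}, {high:+.0f}]")
```

Both records give the same point estimate, but the 20-game interval is far wider than the 2000-game one and even includes negative Elo, which is the sense in which a handful of CCC games cannot settle whether SF regressed.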
I completely agree with you regarding the small sample size, and we will know better after more games. The problem is that we want to be sure as soon as possible because the DivP submission is looming. Until we are sure, it may be prudent to submit the version prior to the last two patches.
I mean, I could run 5K LTC games between SF and Ethereal, and I can say with confidence that SF would come out with the higher Elo.
The latest SF is currently performing quite a bit worse than expected (behind both Lc0 and Komodo) at https://www.chess.com/computer-chess-championship#, including losing a game pair to Ethereal. Could one of the latest patches be a regression, or perhaps scale very poorly?
@vondele What do you think about scheduling a progress test? We need to select a version to submit for TCEC Premier very soon.