CCC unexpected underperformance #4175

Closed
mstembera opened this issue Sep 20, 2022 · 3 comments

Comments

@mstembera
Contributor

mstembera commented Sep 20, 2022

The latest SF is currently performing quite a bit worse than expected (behind both LC0 and Komodo) at https://www.chess.com/computer-chess-championship# including losing a game pair to Ethereal. Could one of the latest patches be a regression, or perhaps scale very poorly?
@vondele What do you think about scheduling a progress test? We need to select a version to submit for TCEC Premier very soon.

@TheBlackPlague

To begin, Stockfish isn't necessarily underperforming. If you run 10K or so games of SF vs. Lc0, for example, SF may win the majority of game pairs and still lose some of them.

In CCC, the most likely explanation is that an unlucky game pair like the one mentioned occurred, or that game pairs where SF couldn't get a win occurred while other engines got luckier in the dice roll.

That's why both CCC and TCEC are considered small sample size (SSS) tournaments (much like many other tournaments): they play far too few games to draw firm conclusions from their results.
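
To put rough numbers on the SSS point, here is a minimal sketch (the W/D/L counts are made up for illustration, not taken from CCC) of how the 95% confidence interval on an Elo estimate shrinks with the number of games, using a normal approximation on the mean per-game score:

```python
import math

def elo_from_score(score: float) -> float:
    """Convert an expected score (0..1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, lower, upper) for an approximate 95% confidence interval,
    using a normal approximation on the mean per-game score."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # per-game variance of the score, from the observed W/D/L proportions
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = z * math.sqrt(var / n)
    lo = max(score - margin, 1e-6)
    hi = min(score + margin, 1.0 - 1e-6)
    return elo_from_score(score), elo_from_score(lo), elo_from_score(hi)

# Same 55% score at two very different sample sizes (numbers are made up):
for w, d, l in [(10, 13, 7), (1000, 1300, 700)]:   # 30 games vs. 3000 games
    elo, lo, hi = elo_interval(w, d, l)
    print(f"{w + d + l:5d} games: {elo:+6.1f} Elo, 95% CI [{lo:+6.1f}, {hi:+6.1f}]")
```

With an identical 55% score, the 30-game interval spans well over a hundred Elo, while the 3000-game interval is only about ±10 Elo wide, which is the gist of why round-robin events with a few dozen games per pairing can't separate the top engines.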

@mstembera
Contributor Author

mstembera commented Sep 20, 2022

I completely agree with you regarding small sample size, and we will know better after more games. The problem is that we want to be sure as soon as possible because the DivP submission deadline is looming. Until we are sure, it may be prudent to submit the version prior to the last two patches.

@TheBlackPlague

> I completely agree with you regarding small sample size, and we will know better after more games. The problem is that we want to be sure as soon as possible because the DivP submission deadline is looming.

I mean, I could run 5K LTC games between SF and Ethereal and say confidently that SF will come out higher in Elo.
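
For what it's worth, that kind of confidence is exactly what the standard likelihood-of-superiority (LOS) formula quantifies from the decisive games of a match. A minimal sketch (the win/loss counts below are hypothetical, not from any actual run):

```python
import math

def los(wins: int, losses: int) -> float:
    """Likelihood of superiority: probability that the first engine is
    genuinely stronger, estimated from decisive games only (draws ignored)."""
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * (wins + losses))))

# Hypothetical decisive-game counts, for illustration only:
print(f"+30  -20  -> LOS {los(30, 20):.3f}")     # ~0.92: suggestive, not conclusive
print(f"+300 -200 -> LOS {los(300, 200):.4f}")   # ~1.0: effectively certain
```

The point being that a few hundred decisive games already push LOS to effectively 1, whereas a handful of game pairs cannot.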
