Update Elo estimates for terms in search. #2401
This updates estimates from 1.5yr ago, and adds missing terms.

All tests run at 10+0.1 (STC), 20000 games, error bars +- 3 Elo.

http://tests.stockfishchess.org/tests/view/5dc58b620ebc5902562bbd47
http://tests.stockfishchess.org/tests/view/5dc58b240ebc5902562bbd3f
http://tests.stockfishchess.org/tests/view/5dc58b810ebc5902562bbd4b
http://tests.stockfishchess.org/tests/view/5dc58b170ebc5902562bbd3d
http://tests.stockfishchess.org/tests/view/5dc58c0c0ebc5902562bbd59
http://tests.stockfishchess.org/tests/view/5dc58e800ebc5902562bbd83
http://tests.stockfishchess.org/tests/view/5dc58c560ebc5902562bbd63
http://tests.stockfishchess.org/tests/view/5dc58b320ebc5902562bbd41
http://tests.stockfishchess.org/tests/view/5dc58c490ebc5902562bbd61
http://tests.stockfishchess.org/tests/view/5dc58b700ebc5902562bbd49
http://tests.stockfishchess.org/tests/view/5dc58e450ebc5902562bbd7e
http://tests.stockfishchess.org/tests/view/5dc58af40ebc5902562bbd38
http://tests.stockfishchess.org/tests/view/5dc58b030ebc5902562bbd3b
http://tests.stockfishchess.org/tests/view/5dc58beb0ebc5902562bbd55
http://tests.stockfishchess.org/tests/view/5dc58b8e0ebc5902562bbd4d
http://tests.stockfishchess.org/tests/view/5dc58c190ebc5902562bbd5b
http://tests.stockfishchess.org/tests/view/5dc58ac30ebc5902562bbd34
http://tests.stockfishchess.org/tests/view/5dc58c290ebc5902562bbd5d
http://tests.stockfishchess.org/tests/view/5dc58c380ebc5902562bbd5f
http://tests.stockfishchess.org/tests/view/5dc58bfa0ebc5902562bbd57
http://tests.stockfishchess.org/tests/view/5dc58b4f0ebc5902562bbd45
http://tests.stockfishchess.org/tests/view/5dc58ae30ebc5902562bbd36

Noteworthy changes are step 7 (futility pruning) going from ~30 to ~49 Elo and step 14 (pruning at shallow depth) going from ~170 to ~204 Elo.

No functional change.
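For readers wondering where an error bar of roughly 3 Elo over 20000 games comes from, here is a minimal sketch that turns win/draw/loss counts into a logistic Elo estimate with a 95% interval. The W/D/L counts are hypothetical, picked only to resemble a draw-heavy STC match, and fishtest's own statistics may well be computed differently.

```python
import math

def elo_from_score(score: float) -> float:
    """Logistic Elo difference implied by an expected score in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_with_error(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (elo, half_width) using a normal approximation of the mean score."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (win = 1, draw = 0.5, loss = 0).
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(variance / games)
    low = elo_from_score(score - z * stderr)
    high = elo_from_score(score + z * stderr)
    return elo_from_score(score), (high - low) / 2.0

# Hypothetical result of a 20000-game match with a 60% draw rate.
elo, margin = elo_with_error(wins=4700, draws=12000, losses=3300)
print(f"{elo:+.1f} +- {margin:.1f} Elo")
```

With counts like these the half-width lands close to 3 Elo, which is consistent with the error bar quoted above.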
Would it make sense to run some of those measurements at LTC? This would highlight the depth-sensitive areas of search.
Interesting, but I suspect that for the terms with small Elo impact we would need much more accurate estimates to do such a test. Are there any of the large Elo terms that you would expect to be TC sensitive? Generally, I'm a bit critical of TC sensitivity. For me relevant numbers are: Counter question... would you (or any eval expert) be interested in doing something similar in Eval?
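To put a number on "much more accurate estimates", here is a rough sketch of how many games a tighter error bar would take. It assumes the per-game variance stays the same, so the error shrinks as 1/sqrt(games), and it uses the 20000 games at +- 3 Elo from this PR as its reference point. The function name is made up for illustration.

```python
import math

def games_for_error(target_error: float,
                    ref_games: int = 20000,
                    ref_error: float = 3.0) -> int:
    """Games needed for a target error bar, assuming error ~ 1/sqrt(games)."""
    return math.ceil(ref_games * (ref_error / target_error) ** 2)

for target in (3.0, 2.0, 1.0, 0.5):
    print(f"+- {target} Elo: about {games_for_error(target)} games")
```

Under that assumption, resolving a term worth only a few Elo to within +- 1 Elo would take about nine times as many games as were used here.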
So, I've added two LTC measurements to the queue: futility and singular extension. Both have large contributions, and futility applies only at low depth (<7) while SE applies at high depth (>=6)... let's see.
@vondele I have created something similar for eval terms, you can find it here: But it may be a bit outdated by now and might need some updating.
Such tests might help discover a simplification or two. To start with, we could run at least a rough estimate of threats(), passed(), space(), initiative(). It would also be interesting to see the impact of using A third set of tests would be to disable the respective piece eval in piece() or pawn eval contribution, or psqt or mobility. A fourth set of tests would disable each individual bonus. Another area of research would be to test each individual bonus with 50% value and with 150% value. It is quite possible that despite all the tuning, some bonuses are stuck at a local maximum. Looking back at Fauzi's results, it seems that One bonus which we still have is MinorBehindPawn. Removing it "as is" will not work, but it might be removed if we adjust some mg psqt values and a few other bonuses cleverly.
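The "disable" and "50% / 150%" experiments suggested above could be enumerated along the following lines. This is only a sketch: the bonus names and values are hypothetical placeholders, not Stockfish's actual bonuses, and `Score` here just mimics a middlegame/endgame pair.

```python
from typing import NamedTuple

class Score(NamedTuple):
    mg: int  # middlegame value
    eg: int  # endgame value

def scaled(score: Score, percent: int) -> Score:
    """Scale both components of a bonus to the given percentage."""
    return Score(round(score.mg * percent / 100),
                 round(score.eg * percent / 100))

# Hypothetical bonuses, for illustration only.
bonuses = {"BonusA": Score(18, 4), "BonusB": Score(30, 60)}

# One test per (bonus, percentage): 0% disables the bonus entirely.
variants = {(name, pct): scaled(score, pct)
            for name, score in bonuses.items()
            for pct in (0, 50, 150)}

for (name, pct), score in variants.items():
    print(f"{name} at {pct}%: S({score.mg}, {score.eg})")
```

Each entry would correspond to one test against master, so the number of tests grows as three per bonus.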
Here is a more direct link to Fauzi's work, which is about one year old, in case someone knows how to replace the dead link: Stockfish Feature's Estimated Elo worth (1).xlsx. A few bonuses have been introduced or modified since then.
I've updated the link on the wiki (just click 'Edit' on the top of the page). If you find the time, please submit the Eval tests, I think that would be useful.
Has anyone tried removing these two?
I'd like an Elo estimate for the has_game_cycle() check in search.cpp
Thanks for running the tests! My suggestion would be to keep the same pattern for Elo estimates in the code as in current master, using a scale of ~2, ~5, ~10, ~15, ~20, ~30, ~40, ~50, etc. instead of writing last-digit accuracy which we don't have. This is to avoid people running Elo experiments every two weeks to see if the last digits have changed...
@snicolet, let's keep the result as obtained from the tests. Rounding numbers needlessly increases the error. Not rerunning these tests often should be just a policy (and hasn't been a problem so far).
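To make the trade-off in the two comments above concrete, here is a small sketch that snaps a measured value to the proposed scale and reports the rounding error it introduces. The scale up to ~50 is the one quoted above; continuing in steps of 10 beyond 50 is an assumption about what "etc." means.

```python
def scale_values(limit: int) -> list[int]:
    """Scale quoted in the thread, extended in steps of 10 (assumed) up to limit."""
    base = [2, 5, 10, 15, 20, 30, 40, 50]
    return base + list(range(60, limit + 10, 10))

def round_to_scale(elo: float) -> int:
    """Nearest value on the scale; ties resolve to the lower value."""
    candidates = scale_values(int(elo) + 10)
    return min(candidates, key=lambda v: abs(v - elo))

# The two measured values highlighted in this PR.
for measured in (49, 204):
    rounded = round_to_scale(measured)
    print(f"{measured} -> ~{rounded} (rounding error {abs(rounded - measured)})")
```

For these two values the rounding error is 1 and 4 Elo respectively, to be compared with the +- 3 Elo error bar of the measurement itself.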
@Rocky640 made the suggestion to look at TC dependence of these terms. I picked two large terms, so with small relative error. It turns out it is actually quite interesting. Contrary to my expectation, early futility pruning is pretty TC sensitive, while singular extension is not. Going back to the old measurement of futility pruning (30 Elo vs. 49 Elo today), the code is actually identical. It seems like a nice example of how connected terms in search really are, i.e. the value of early futility pruning increased significantly due to changes elsewhere in search.
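One simple way to judge whether a term is TC sensitive is to check whether the difference between two measurements exceeds their combined error bar. The sketch below assumes the two measurements are independent, so their errors add in quadrature. The STC value is the one from this PR, while both LTC values and their error bars are made-up placeholders, not the measured results.

```python
import math

def difference_with_error(elo_a: float, err_a: float,
                          elo_b: float, err_b: float) -> tuple[float, float]:
    """Difference of two independent Elo estimates and its combined error bar."""
    # Independent errors add in quadrature.
    return elo_a - elo_b, math.hypot(err_a, err_b)

def differs_beyond_error(elo_a: float, err_a: float,
                         elo_b: float, err_b: float) -> bool:
    """True when the difference is larger than the combined error bar."""
    diff, err = difference_with_error(elo_a, err_a, elo_b, err_b)
    return abs(diff) > err

stc = (49.0, 3.0)  # futility pruning at STC, from this PR
hypothetical_ltc = {"LTC case A": (30.0, 3.0), "LTC case B": (47.0, 3.0)}

for label, ltc in hypothetical_ltc.items():
    diff, err = difference_with_error(*stc, *ltc)
    verdict = differs_beyond_error(*stc, *ltc)
    print(f"{label}: {diff:+.1f} +- {err:.1f} Elo, beyond error: {verdict}")
```

This is also why picking large terms helps: an error of a few Elo is small relative to a term worth 50 Elo, but would swamp a term worth only 5.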
Could you do a measurement for the multicut part of singular extension search?
The futility pruning code is identical, but the futility margin itself is vastly different.
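A simplified sketch of why the margin matters even when the code does not change: the pruning condition below is a generic form of this kind of futility pruning (cut when the static eval exceeds beta by a depth-dependent margin, at depth < 7 as mentioned earlier in the thread). It is not Stockfish's actual code, which has further conditions, and the margins and node values are hypothetical.

```python
def futility_prunes(static_eval: int, beta: int, depth: int,
                    margin_per_depth: int, max_depth: int = 7) -> bool:
    """True when a shallow node would be cut because eval - margin still beats beta."""
    return depth < max_depth and static_eval - margin_per_depth * depth >= beta

def count_pruned(nodes, margin_per_depth: int) -> int:
    """Number of nodes cut by the same condition under a given margin."""
    return sum(futility_prunes(e, b, d, margin_per_depth) for e, b, d in nodes)

# Hypothetical (static_eval, beta, depth) triples.
nodes = [(350, 0, 3), (250, 0, 1), (900, 100, 6), (500, 0, 8), (120, 50, 2)]

for margin in (100, 200):  # hypothetical margins per unit of depth
    print(f"margin {margin}: {count_pruned(nodes, margin)} of {len(nodes)} nodes pruned")
```

The same condition prunes far fewer nodes under the larger margin, so an Elo measurement of "the same code" taken at two points in time is not measuring the same behavior.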
This updates estimates from 1.5 years ago, and adds missing terms. All estimates come from tests run on fishtest at 10+0.1 (STC), 20000 games, error bars +- 3 Elo; see the original message in the pull request for the full list of tests. Noteworthy changes are step 7 (futility pruning) going from ~30 to ~50 Elo and step 13 (pruning at shallow depth) going from ~170 to ~200 Elo.

Full list of tests: #2401

@Rocky640 made the suggestion to look at time control dependence of these terms. I picked two large terms (early futility pruning and singular extension), so with small relative error. It turns out it is actually quite interesting (see figure 1). Contrary to my expectation, the Elo gain for early futility pruning is pretty time control sensitive, while singular extension gain is not.

Figure 1: TC dependence of two search terms

![elo_search_tc](http://cassio.free.fr/divers/elo_search_tc.png)

Going back to the old measurement of futility pruning (30 Elo vs. 50 Elo today), the code is actually identical but the margins have changed. It seems like a nice example of how connected terms in search really are, i.e. the value of early futility pruning increased significantly due to changes elsewhere in search.

No functional change.
Merged via 114ddb7, thanks :-)
This updates estimates from 2 years ago (official-stockfish#2401), and adds missing terms. All tests run at 10+0.1 (STC), 20000 games, error bars +- 1.8 Elo, book 8moves_v3.pgn. A table of Elo values with links to the corresponding tests can be found in the PR.

closes official-stockfish#3868

Non-functional change