-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Winning evaluation with tablebases in cursed win #5175
Comments
(Apparently present since sf6, according to discord) |
I translated the code to pencil & paper math, and AFAICS it all checks out. It is simply getting the wrong result from the DTZ probe. Why that is.... IDK. |
I can test later but there are some alternative dtz tablebases called "nr" which seemed to fixed the problem for crafty and pere. It makes more sense that the issue was not in Stockfish in the first place. |
Using dtz_nr tablebases
Using dtz tablebases
|
Mh.. is there some information on how those were generated/fixed/edited? Maybe niklasf knows something? |
As I understand it, dtz tables don't assume a 50 move rule, but dtr does. So neither is wrong, just different assumptions. https://chess.stackexchange.com/questions/28520/what-do-dtm-dtz-dtc-dtr-dtz50-and-dtzr-mean |
@niklasf said on Discord: |
So my understanding of the situation is that sf does not report correct tb results in the affected positions when the user provides the widely available standard 3-6men syzygy files. Only if the tb files were generated with this patch does sf work correctly. Is that correct? If that is so, should we warn the user on TB load if they use outdated tb files (if that is possible, by e.g. probing the position from this issue) ? And what is the situation for the 7men files? |
To clarify: Stockfish works correctly in game play. That is, after the capture that crosses into tablebase territory, moves will be ranked correctly and Stockfish will achieve the best possible outcome. That's what the tablebases are designed for - they are not broken and they don't need to be fixed. The issue occurs only in analysis, when setting up arbitrary positions that do not arise from optimal play following a capture. In that case there are really 7 possible probe results:
Avoiding this, by patching the table generator, produces larger tables for no playing strength gain. Handling the ambiguous results will require more code for no playing strength gain. For the analysis board on Lichess, I did both: 6 piece tables with no rounding, and (because 7 piece tables are too much effort to regenerate) a user interface that correctly displays ambiguous results for 7 piece tables. For Stockfish, I am not sure what to do, if anything. |
It's very hard for me to consider them not broken, moves will be ranked correctly at this point but we stopped caring about optimal play or guiding SF to the correct Win long ago, as SF more or less is expected to find these wins mostly, for me I only care about the eval. |
I believe we should re-open this issue. (At least to highlight to end-users that it exists, and to remind ourselves.) At present stockfish returns incorrect analysis results for some FENs with nonzero half move counters when it is supplied with the default (and probably most widely used) 6men syzygy EGTBs, or with the only available format for 7men syzygy EGTBs. |
I think we can reopen, though niklasf probably answered this quite clearly. |
This is clearly false, as niklasf repeatedly stated this is specifically and exactly aimed at improving compression and reducing overall TB size. Syzygy was specifically designed to be the smallest TB, and this compression feature is a noticeable step towards this goal (to the tune of several percent, or so I've heard). I do concur with re-opening in the short run. Altho bestmove selection is unaffected, it is indeed deeply confusing for users to see a winning evaluation for a drawn position. The best idea I've seen is that we should adjust the evaluation of such ambiguous probes to be less than a proven win. Maybe 100 instead of 200 or something? |
It's impossible to imagine that design wise you need to mess this requirement up while you do everything right. we can produce two/three/four bugs such that the so called Syzygy efficiency is optimized more.
|
What is actually returned after probing ? I.e. a |
If we continue to play the game from the original FEN optimally, we reach |
For even more clarity @niklasf ... You are saying that Stockfish avoids this. This is because Stockfish will rank the root moves using this code, and refuse to play any root move with a rank worse than optimal?
The important part here is the Restated: If a repetition has been made, then Stockfish will only play moves with equal-DTZ in winning positions. If a repetition has not been made, then Stockfish will play any move which wins -before- the 100th, avoiding moves which zero on the 100th ply? I don't explicitly see how this guarantees protection from the stated issue in ALL cases. Can a case exist where all moves appear to have the same DTZ=99 or DTZ=100, and then despite best intentions from above, you end up in the same ambiguous situation? IE all moves fall into the "WIN or CURSED WIN" ambiguous bucket? Ref: 108f0da |
The intended effect is to give Stockfish some freedom, but reliable switch to nearly (i.e. 1 ply may be squandered due to rounding) DTZ-optimal play before it's too late. This works when we approach the threshold
Edit: With regard to Edit 2: I misread the implementation and it does not have the problem that I think I saw. |
Thank you for your response, Niklas. I'll take your word that the below code is intended for this purpose. Seems sensible -- and explains why your NR tables hijacked the line above to get the desired result. if (total_stats_w[DRAW_RULE] || total_stats_b[STAT_MATE - DRAW_RULE])
ply_accurate_w = 1; Not sure what this all will do to benefit Stockfish, since it seems Stockfish does it all correctly -- but this will fix some things in Ethereal and Torch, and inevitably many more engines in the openbench space as a result of the Ethereal changes. |
That works, but I think it would have to be min-max rather than a linear walk. Among moves with equal DTZ, there may be some that were precise and some that have been rounded. |
OK, I see, added complication. |
Currently (after official-stockfish#5407), SF has the property that any PV line with a decisive TB score contains the corresponding TB position, with a score that correctly identifies the depth at which TB are entered. The PV line that follows might not preserve the game outcome, but can easily be verified and extended based on TB information. This patch provides this functionality, simply extending the PV lines on output, this doesn't affect search. Indeed, if DTZ tables are available, search based PV lines that correspond to decisive TB scores are verified to preserve game outcome, truncating the line as needed. Subsequently, such PV lines are extended with a game outcome preserving line until mate, as a possible continuation. These lines are not optimal mating lines, but are similar to what a user could produce on a website like https://syzygy-tables.info/ clicking always the top ranked move, i.e. minimizing or maximizing DTZ (with a simple tie-breaker for moves that have identical DTZ), and are thus an just an illustration of how to game can be won. A similar approach is already in established in https://github.com/joergoster/Stockfish/tree/matefish2 This also contributes to addressing official-stockfish#5175 where SF can give an incorrect TB win/loss for positions in TB with a movecounter that doesn't reflect optimal play. While the full solution requires either TB generated differently, or a search when ranking rootmoves, current SF will eventually find a draw in these cases, in practice quite quickly, e.g. `1kq5/q2r4/5K2/8/8/8/8/7Q w - - 96 1` `8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106` As repeated DTZ probing could be slow a procedure (100ms+ on HDD, a few ms on SSD), the extension is only done as long as the time taken is less than half the `Move Overhead` parameter. For tournaments where these lines might be of interest to the user, a suitable `Move Overhead` might be needed (e.g. TCEC has 1000ms already). No functional change
Currently (after official-stockfish#5407), SF has the property that any PV line with a decisive TB score contains the corresponding TB position, with a score that correctly identifies the depth at which TB are entered. The PV line that follows might not preserve the game outcome, but can easily be verified and extended based on TB information. This patch provides this functionality, simply extending the PV lines on output, this doesn't affect search. Indeed, if DTZ tables are available, search based PV lines that correspond to decisive TB scores are verified to preserve game outcome, truncating the line as needed. Subsequently, such PV lines are extended with a game outcome preserving line until mate, as a possible continuation. These lines are not optimal mating lines, but are similar to what a user could produce on a website like https://syzygy-tables.info/ clicking always the top ranked move, i.e. minimizing or maximizing DTZ (with a simple tie-breaker for moves that have identical DTZ), and are thus an just an illustration of how to game can be won. A similar approach is already in established in https://github.com/joergoster/Stockfish/tree/matefish2 This also contributes to addressing official-stockfish#5175 where SF can give an incorrect TB win/loss for positions in TB with a movecounter that doesn't reflect optimal play. While the full solution requires either TB generated differently, or a search when ranking rootmoves, current SF will eventually find a draw in these cases, in practice quite quickly, e.g. `1kq5/q2r4/5K2/8/8/8/8/7Q w - - 96 1` `8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106` As repeated DTZ probing could be slow a procedure (100ms+ on HDD, a few ms on SSD), the extension is only done as long as the time taken is less than half the `Move Overhead` parameter. For tournaments where these lines might be of interest to the user, a suitable `Move Overhead` might be needed (e.g. TCEC has 1000ms already). No functional change
Currently (after official-stockfish#5407), SF has the property that any PV line with a decisive TB score contains the corresponding TB position, with a score that correctly identifies the depth at which TB are entered. The PV line that follows might not preserve the game outcome, but can easily be verified and extended based on TB information. This patch provides this functionality, simply extending the PV lines on output, this doesn't affect search. Indeed, if DTZ tables are available, search based PV lines that correspond to decisive TB scores are verified to preserve game outcome, truncating the line as needed. Subsequently, such PV lines are extended with a game outcome preserving line until mate, as a possible continuation. These lines are not optimal mating lines, but are similar to what a user could produce on a website like https://syzygy-tables.info/ clicking always the top ranked move, i.e. minimizing or maximizing DTZ (with a simple tie-breaker for moves that have identical DTZ), and are thus an just an illustration of how to game can be won. A similar approach is already in established in https://github.com/joergoster/Stockfish/tree/matefish2 This also contributes to addressing official-stockfish#5175 where SF can give an incorrect TB win/loss for positions in TB with a movecounter that doesn't reflect optimal play. While the full solution requires either TB generated differently, or a search when ranking rootmoves, current SF will eventually find a draw in these cases, in practice quite quickly, e.g. `1kq5/q2r4/5K2/8/8/8/8/7Q w - - 96 1` `8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106` As repeated DTZ probing could be slow a procedure (100ms+ on HDD, a few ms on SSD), the extension is only done as long as the time taken is less than half the `Move Overhead` parameter. For tournaments where these lines might be of interest to the user, a suitable `Move Overhead` might be needed (e.g. TCEC has 1000ms already). No functional change
Currently (after official-stockfish#5407), SF has the property that any PV line with a decisive TB score contains the corresponding TB position, with a score that correctly identifies the depth at which TB are entered. The PV line that follows might not preserve the game outcome, but can easily be verified and extended based on TB information. This patch provides this functionality, simply extending the PV lines on output, this doesn't affect search. Indeed, if DTZ tables are available, search based PV lines that correspond to decisive TB scores are verified to preserve game outcome, truncating the line as needed. Subsequently, such PV lines are extended with a game outcome preserving line until mate, as a possible continuation. These lines are not optimal mating lines, but are similar to what a user could produce on a website like https://syzygy-tables.info/ clicking always the top ranked move, i.e. minimizing or maximizing DTZ (with a simple tie-breaker for moves that have identical DTZ), and are thus an just an illustration of how to game can be won. A similar approach is already in established in https://github.com/joergoster/Stockfish/tree/matefish2 This also contributes to addressing official-stockfish#5175 where SF can give an incorrect TB win/loss for positions in TB with a movecounter that doesn't reflect optimal play. While the full solution requires either TB generated differently, or a search when ranking rootmoves, current SF will eventually find a draw in these cases, in practice quite quickly, e.g. `1kq5/q2r4/5K2/8/8/8/8/7Q w - - 96 1` `8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106` Gives the same results as master on an extended set of test positions from mcostalba/Stockfish@9173d29 with the exception of the above mentioned fen where this commit improves. With https://github.com/vondele/matetrack using 6men TB, all generated PVs verify: ``` Using ../Stockfish/src/stockfish.syzygyExtend on matetrack.epd with --nodes 1000000 --syzygyPath /chess/syzygy/3-4-5-6/WDL:/chess/syzygy/3-4-5-6/DTZ Engine ID: Stockfish dev-20240704-ff227954 Total FENs: 6555 Found mates: 3299 Best mates: 2582 Found TB wins: 568 ``` As repeated DTZ probing could be slow a procedure (100ms+ on HDD, a few ms on SSD), the extension is only done as long as the time taken is less than half the `Move Overhead` parameter. For tournaments where these lines might be of interest to the user, a suitable `Move Overhead` might be needed (e.g. TCEC has 1000ms already). closes official-stockfish#5414 No functional change
Currently (after official-stockfish#5407), SF has the property that any PV line with a decisive TB score contains the corresponding TB position, with a score that correctly identifies the depth at which TB are entered. The PV line that follows might not preserve the game outcome, but can easily be verified and extended based on TB information. This patch provides this functionality, simply extending the PV lines on output, this doesn't affect search. Indeed, if DTZ tables are available, search based PV lines that correspond to decisive TB scores are verified to preserve game outcome, truncating the line as needed. Subsequently, such PV lines are extended with a game outcome preserving line until mate, as a possible continuation. These lines are not optimal mating lines, but are similar to what a user could produce on a website like https://syzygy-tables.info/ clicking always the top ranked move, i.e. minimizing or maximizing DTZ (with a simple tie-breaker for moves that have identical DTZ), and are thus an just an illustration of how to game can be won. A similar approach is already in established in https://github.com/joergoster/Stockfish/tree/matefish2 This also contributes to addressing official-stockfish#5175 where SF can give an incorrect TB win/loss for positions in TB with a movecounter that doesn't reflect optimal play. While the full solution requires either TB generated differently, or a search when ranking rootmoves, current SF will eventually find a draw in these cases, in practice quite quickly, e.g. `1kq5/q2r4/5K2/8/8/8/8/7Q w - - 96 1` `8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106` Gives the same results as master on an extended set of test positions from mcostalba/Stockfish@9173d29 with the exception of the above mentioned fen where this commit improves. With https://github.com/vondele/matetrack using 6men TB, all generated PVs verify: ``` Using ../Stockfish/src/stockfish.syzygyExtend on matetrack.epd with --nodes 1000000 --syzygyPath /chess/syzygy/3-4-5-6/WDL:/chess/syzygy/3-4-5-6/DTZ Engine ID: Stockfish dev-20240704-ff227954 Total FENs: 6555 Found mates: 3299 Best mates: 2582 Found TB wins: 568 ``` As repeated DTZ probing could be slow a procedure (100ms+ on HDD, a few ms on SSD), the extension is only done as long as the time taken is less than half the `Move Overhead` parameter. For tournaments where these lines might be of interest to the user, a suitable `Move Overhead` might be needed (e.g. TCEC has 1000ms already). closes official-stockfish#5414 No functional change
Currently (after official-stockfish#5407), SF has the property that any PV line with a decisive TB score contains the corresponding TB position, with a score that correctly identifies the depth at which TB are entered. The PV line that follows might not preserve the game outcome, but can easily be verified and extended based on TB information. This patch provides this functionality, simply extending the PV lines on output, this doesn't affect search. Indeed, if DTZ tables are available, search based PV lines that correspond to decisive TB scores are verified to preserve game outcome, truncating the line as needed. Subsequently, such PV lines are extended with a game outcome preserving line until mate, as a possible continuation. These lines are not optimal mating lines, but are similar to what a user could produce on a website like https://syzygy-tables.info/ clicking always the top ranked move, i.e. minimizing or maximizing DTZ (with a simple tie-breaker for moves that have identical DTZ), and are thus an just an illustration of how to game can be won. A similar approach is already in established in https://github.com/joergoster/Stockfish/tree/matefish2 This also contributes to addressing official-stockfish#5175 where SF can give an incorrect TB win/loss for positions in TB with a movecounter that doesn't reflect optimal play. While the full solution requires either TB generated differently, or a search when ranking rootmoves, current SF will eventually find a draw in these cases, in practice quite quickly, e.g. `1kq5/q2r4/5K2/8/8/8/8/7Q w - - 96 1` `8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106` Gives the same results as master on an extended set of test positions from mcostalba/Stockfish@9173d29 with the exception of the above mentioned fen where this commit improves. With https://github.com/vondele/matetrack using 6men TB, all generated PVs verify: ``` Using ../Stockfish/src/stockfish.syzygyExtend on matetrack.epd with --nodes 1000000 --syzygyPath /chess/syzygy/3-4-5-6/WDL:/chess/syzygy/3-4-5-6/DTZ Engine ID: Stockfish dev-20240704-ff227954 Total FENs: 6555 Found mates: 3299 Best mates: 2582 Found TB wins: 568 ``` As repeated DTZ probing could be slow a procedure (100ms+ on HDD, a few ms on SSD), the extension is only done as long as the time taken is less than half the `Move Overhead` parameter. For tournaments where these lines might be of interest to the user, a suitable `Move Overhead` might be needed (e.g. TCEC has 1000ms already). closes official-stockfish#5414 No functional change
Currently (after official-stockfish#5407), SF has the property that any PV line with a decisive TB score contains the corresponding TB position, with a score that correctly identifies the depth at which TB are entered. The PV line that follows might not preserve the game outcome, but can easily be verified and extended based on TB information. This patch provides this functionality, simply extending the PV lines on output, this doesn't affect search. Indeed, if DTZ tables are available, search based PV lines that correspond to decisive TB scores are verified to preserve game outcome, truncating the line as needed. Subsequently, such PV lines are extended with a game outcome preserving line until mate, as a possible continuation. These lines are not optimal mating lines, but are similar to what a user could produce on a website like https://syzygy-tables.info/ clicking always the top ranked move, i.e. minimizing or maximizing DTZ (with a simple tie-breaker for moves that have identical DTZ), and are thus an just an illustration of how to game can be won. A similar approach is already in established in https://github.com/joergoster/Stockfish/tree/matefish2 This also contributes to addressing official-stockfish#5175 where SF can give an incorrect TB win/loss for positions in TB with a movecounter that doesn't reflect optimal play. While the full solution requires either TB generated differently, or a search when ranking rootmoves, current SF will eventually find a draw in these cases, in practice quite quickly, e.g. `1kq5/q2r4/5K2/8/8/8/8/7Q w - - 96 1` `8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106` Gives the same results as master on an extended set of test positions from mcostalba/Stockfish@9173d29 with the exception of the above mentioned fen where this commit improves. With https://github.com/vondele/matetrack using 6men TB, all generated PVs verify: ``` Using ../Stockfish/src/stockfish.syzygyExtend on matetrack.epd with --nodes 1000000 --syzygyPath /chess/syzygy/3-4-5-6/WDL:/chess/syzygy/3-4-5-6/DTZ Engine ID: Stockfish dev-20240704-ff227954 Total FENs: 6555 Found mates: 3299 Best mates: 2582 Found TB wins: 568 ``` As repeated DTZ probing could be slow a procedure (100ms+ on HDD, a few ms on SSD), the extension is only done as long as the time taken is less than half the `Move Overhead` parameter. For tournaments where these lines might be of interest to the user, a suitable `Move Overhead` might be needed (e.g. TCEC has 1000ms already). closes official-stockfish#5414 No functional change
Describe the issue
Winning evaluation in a cursed win even when using tablebases
https://lichess.org/analysis/standard/8/8/6k1/3B4/3K4/4N3/8/8_w_-_-_54_106
Expected behavior
An evaluation (ideally at depth 1?) of 0.00
Steps to reproduce
Operating system
All
Stockfish version
master
The text was updated successfully, but these errors were encountered: