Winning evaluation with tablebases in cursed win #5175

dav1312 · 2024-04-14T20:06:33Z

Describe the issue

Winning evaluation in a cursed win even when using tablebases

https://lichess.org/analysis/standard/8/8/6k1/3B4/3K4/4N3/8/8_w_-_-_54_106

Expected behavior

An evaluation (ideally at depth 1?) of 0.00

Steps to reproduce

Stockfish dev-20240413-c55ae376 by the Stockfish developers (see AUTHORS file)
setoption name SyzygyPath value tb345
info string Found 145 tablebases
position fen 8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106
go infinite
info string NNUE evaluation using nn-ae6a388e4a1a.nnue
info string NNUE evaluation using nn-baff1ede1f90.nnue
info depth 1 seldepth 2 multipv 1 score cp 20000 nodes 1 nps 333 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 2 seldepth 2 multipv 1 score cp 20000 nodes 2 nps 666 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 3 seldepth 2 multipv 1 score cp 20000 nodes 3 nps 1000 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 4 seldepth 2 multipv 1 score cp 20000 nodes 4 nps 1333 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 5 seldepth 3 multipv 1 score cp 20000 nodes 11 nps 3666 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 6 seldepth 4 multipv 1 score cp 20000 nodes 41 nps 10250 hashfull 0 tbhits 26 time 4 pv d4e5 g6g7 e3g2
info depth 7 seldepth 7 multipv 1 score cp 20000 nodes 194 nps 48500 hashfull 0 tbhits 26 time 4 pv d4e5 g6g7 e3f5 g7f8
info depth 8 seldepth 7 multipv 1 score cp 20000 nodes 442 nps 110500 hashfull 0 tbhits 26 time 4 pv d4e5 g6g7 e3g2 g7f8 d5c4
info depth 9 seldepth 8 multipv 1 score cp 20000 nodes 1119 nps 223800 hashfull 0 tbhits 26 time 5 pv d4e5 g6h6 e5e4 h6g7 e4d3 g7f8
info depth 10 seldepth 9 multipv 1 score cp 20000 nodes 1518 nps 303600 hashfull 0 tbhits 26 time 5 pv d4e5 g6g7 e5f5 g7h8

Operating system

All

Stockfish version

master

The text was updated successfully, but these errors were encountered:

Disservin · 2024-04-14T22:01:00Z

(Apparently present since sf6, according to discord)

jhellis3 · 2024-04-15T00:37:28Z

I translated the code to pencil & paper math, and AFAICS it all checks out. It is simply getting the wrong result from the DTZ probe. Why that is.... IDK.

dav1312 · 2024-04-15T06:45:40Z

I can test later but there are some alternative dtz tablebases called "nr" which seemed to fixed the problem for crafty and pere. It makes more sense that the issue was not in Stockfish in the first place.
https://tablebase.lichess.ovh/tables/standard/

dav1312 · 2024-04-15T10:05:24Z

Using dtz_nr tablebases

Stockfish dev-20240413-c55ae376 by the Stockfish developers (see AUTHORS file)
setoption name SyzygyPath value tb345;tb345_dtz_nr
info string Found 145 tablebases
position fen 8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106
go depth 10
info string NNUE evaluation using nn-ae6a388e4a1a.nnue
info string NNUE evaluation using nn-baff1ede1f90.nnue
info depth 1 seldepth 2 multipv 1 score cp 25 nodes 1 nps 333 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 2 seldepth 2 multipv 1 score cp 25 nodes 2 nps 666 hashfull 0 tbhits 26 time 3 pv d4e5
info depth 3 seldepth 2 multipv 1 score cp 25 nodes 3 nps 750 hashfull 0 tbhits 26 time 4 pv d4e5
info depth 4 seldepth 2 multipv 1 score cp 25 nodes 4 nps 1000 hashfull 0 tbhits 26 time 4 pv d4e5
info depth 5 seldepth 3 multipv 1 score cp 25 nodes 11 nps 2750 hashfull 0 tbhits 26 time 4 pv d4e5
info depth 6 seldepth 4 multipv 1 score cp 25 nodes 41 nps 10250 hashfull 0 tbhits 26 time 4 pv d4e5 g6g7 e3g2
info depth 7 seldepth 7 multipv 1 score cp 25 nodes 194 nps 38800 hashfull 0 tbhits 26 time 5 pv d4e5 g6g7 e3f5 g7f8
info depth 8 seldepth 7 multipv 1 score cp 25 nodes 442 nps 88400 hashfull 0 tbhits 26 time 5 pv d4e5 g6g7 e3g2 g7f8 d5c4
info depth 9 seldepth 8 multipv 1 score cp 25 nodes 1119 nps 186500 hashfull 0 tbhits 26 time 6 pv d4e5 g6h6 e5e4 h6g7 e4d3 g7f8
info depth 10 seldepth 9 multipv 1 score cp 25 nodes 1518 nps 253000 hashfull 0 tbhits 26 time 6 pv d4e5 g6g7 e5f5 g7h8
bestmove d4e5 ponder g6g7

Using dtz tablebases

setoption name SyzygyPath value tb345;tb345_dtz
info string Found 145 tablebases
position fen 8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106
go depth 10
info string NNUE evaluation using nn-ae6a388e4a1a.nnue
info string NNUE evaluation using nn-baff1ede1f90.nnue
info depth 1 seldepth 2 multipv 1 score cp 20000 nodes 1 nps 142 hashfull 0 tbhits 26 time 7 pv d4e5
info depth 2 seldepth 2 multipv 1 score cp 20000 nodes 2 nps 250 hashfull 0 tbhits 26 time 8 pv d4e5
info depth 3 seldepth 2 multipv 1 score cp 20000 nodes 3 nps 375 hashfull 0 tbhits 26 time 8 pv d4e5
info depth 4 seldepth 2 multipv 1 score cp 20000 nodes 4 nps 500 hashfull 0 tbhits 26 time 8 pv d4e5
info depth 5 seldepth 3 multipv 1 score cp 20000 nodes 11 nps 1375 hashfull 0 tbhits 26 time 8 pv d4e5
info depth 6 seldepth 4 multipv 1 score cp 20000 nodes 41 nps 4555 hashfull 0 tbhits 26 time 9 pv d4e5 g6g7 e3g2
info depth 7 seldepth 7 multipv 1 score cp 20000 nodes 194 nps 19400 hashfull 0 tbhits 26 time 10 pv d4e5 g6g7 e3f5 g7f8
info depth 8 seldepth 7 multipv 1 score cp 20000 nodes 442 nps 40181 hashfull 0 tbhits 26 time 11 pv d4e5 g6g7 e3g2 g7f8 d5c4
info depth 9 seldepth 8 multipv 1 score cp 20000 nodes 1119 nps 93250 hashfull 0 tbhits 26 time 12 pv d4e5 g6h6 e5e4 h6g7 e4d3 g7f8
info depth 10 seldepth 9 multipv 1 score cp 20000 nodes 1518 nps 126500 hashfull 0 tbhits 26 time 12 pv d4e5 g6g7 e5f5 g7h8
bestmove d4e5 ponder g6g7

Disservin · 2024-04-15T10:58:37Z

I can test later but there are some alternative dtz tablebases called "nr" which seemed to fixed the problem for crafty and pere. It makes more sense that the issue was not in Stockfish in the first place. https://tablebase.lichess.ovh/tables/standard/

Mh.. is there some information on how those were generated/fixed/edited? Maybe niklasf knows something?

whelanh · 2024-04-15T11:04:24Z

As I understand it, dtz tables don't assume a 50 move rule, but dtr does. So neither is wrong, just different assumptions. https://chess.stackexchange.com/questions/28520/what-do-dtm-dtz-dtc-dtr-dtz50-and-dtzr-mean

dav1312 · 2024-04-15T12:02:51Z

I can test later but there are some alternative dtz tablebases called "nr" which seemed to fixed the problem for crafty and pere. It makes more sense that the issue was not in Stockfish in the first place. tablebase.lichess.ovh/tables/standard

Mh.. is there some information on how those were generated/fixed/edited? Maybe niklasf knows something?

@niklasf said on Discord:
some normal dtz tables store rounded values (described on https://syzygy-tables.info/metrics). it's a bit confusing, so i generated no rounding tables, at least up to 6 pieces

robertnurnberg · 2024-04-16T06:08:17Z

So my understanding of the situation is that sf does not report correct tb results in the affected positions when the user provides the widely available standard 3-6men syzygy files.

Only if the tb files were generated with this patch does sf work correctly.

Is that correct?

If that is so, should we warn the user on TB load if they use outdated tb files (if that is possible, by e.g. probing the position from this issue) ?

And what is the situation for the 7men files?

niklasf · 2024-04-16T11:44:02Z

To clarify: Stockfish works correctly in game play. That is, after the capture that crosses into tablebase territory, moves will be ranked correctly and Stockfish will achieve the best possible outcome. That's what the tablebases are designed for - they are not broken and they don't need to be fixed.

The issue occurs only in analysis, when setting up arbitrary positions that do not arise from optimal play following a capture. In that case there are really 7 possible probe results:

Loss
Loss or blessed loss
Blessed loss
Draw
Cursed win
Cursed win or win
Win

Avoiding this, by patching the table generator, produces larger tables for no playing strength gain. Handling the ambiguous results will require more code for no playing strength gain.

For the analysis board on Lichess, I did both: 6 piece tables with no rounding, and (because 7 piece tables are too much effort to regenerate) a user interface that correctly displays ambiguous results for 7 piece tables. For Stockfish, I am not sure what to do, if anything.

peregrineshahin · 2024-04-16T12:03:13Z

It's very hard for me to consider them not broken, moves will be ranked correctly at this point but we stopped caring about optimal play or guiding SF to the correct Win long ago, as SF more or less is expected to find these wins mostly, for me I only care about the eval.
Also, it's pretty easy to notice that this must be a bug/laziness/some oversight turned into a feature..

robertnurnberg · 2024-04-17T06:10:39Z

I believe we should re-open this issue. (At least to highlight to end-users that it exists, and to remind ourselves.)

At present stockfish returns incorrect analysis results for some FENs with nonzero half move counters when it is supplied with the default (and probably most widely used) 6men syzygy EGTBs, or with the only available format for 7men syzygy EGTBs.

vondele · 2024-04-17T06:12:58Z

I think we can reopen, though niklasf probably answered this quite clearly.

dubslow · 2024-04-17T06:29:11Z

Also, it's pretty easy to notice that this must be a bug/laziness/some oversight turned into a feature..

This is clearly false, as niklasf repeatedly stated this is specifically and exactly aimed at improving compression and reducing overall TB size. Syzygy was specifically designed to be the smallest TB, and this compression feature is a noticeable step towards this goal (to the tune of several percent, or so I've heard).

I do concur with re-opening in the short run. Altho bestmove selection is unaffected, it is indeed deeply confusing for users to see a winning evaluation for a drawn position.

The best idea I've seen is that we should adjust the evaluation of such ambiguous probes to be less than a proven win. Maybe 100 instead of 200 or something?

peregrineshahin · 2024-04-17T06:37:36Z

It's impossible to imagine that design wise you need to mess this requirement up while you do everything right. we can produce two/three/four bugs such that the so called Syzygy efficiency is optimized more.

This is clearly false, as niklasf repeatedly stated this is specifically and exactly aimed at improving compression and reducing overall TB size. Syzygy was specifically designed to be the smallest TB, and this compression feature is a noticeable step towards this goal (to the tune of several percent, or so I've heard).

Disservin · 2024-04-17T06:52:55Z

What is actually returned after probing ? I.e. a WDLCursedWin or a WDLWin ?

Torom · 2024-04-17T08:22:29Z

If we continue to play the game from the original FEN optimally, we reach 8/8/8/8/6B1/4N3/5K1k/8 w - - 98 128.
Giving Stockfish this position we get: info depth 245 seldepth 3 multipv 1 score cp 20000 nodes 490 nps 163333 hashfull 0 tbhits 20 time 3 pv e3f1 h2h1.
So we output a two move PV that ends in a 50-move draw, but still output 200.00.

AndyGrant · 2024-05-16T12:54:59Z

To clarify: Stockfish works correctly in game play. That is, after the capture that crosses into tablebase territory, moves will be ranked correctly and Stockfish will achieve the best possible outcome. That's what the tablebases are designed for - they are not broken and they don't need to be fixed.

For even more clarity @niklasf ... You are saying that Stockfish avoids this. This is because Stockfish will rank the root moves using this code, and refuse to play any root move with a rank worse than optimal?

        // Better moves are ranked higher. Certain wins are ranked equally.
        // Losing moves are ranked equally unless a 50-move draw is in sight.
        int r    = dtz > 0 ? (dtz + cnt50 <= 99 && !rep ? MAX_DTZ : MAX_DTZ - (dtz + cnt50))
                 : dtz < 0 ? (-dtz * 2 + cnt50 < 100 ? -MAX_DTZ : -MAX_DTZ + (-dtz + cnt50))
                           : 0;
        m.tbRank = r;

The important part here is the <= 99 condition, which is intentionally not <= 100, in order to avoid the off-by-one rounding issue (at least when delivering the mate?). Also, the !rep condition is present to serve a similar purpose for accidentally letting a WIN become a CURSED WIN?

Restated: If a repetition has been made, then Stockfish will only play moves with equal-DTZ in winning positions. If a repetition has not been made, then Stockfish will play any move which wins -before- the 100th, avoiding moves which zero on the 100th ply?

I don't explicitly see how this guarantees protection from the stated issue in ALL cases. Can a case exist where all moves appear to have the same DTZ=99 or DTZ=100, and then despite best intentions from above, you end up in the same ambiguous situation? IE all moves fall into the "WIN or CURSED WIN" ambiguous bucket?

Ref: 108f0da

niklasf · 2024-05-25T15:00:41Z

~~@AndyGrant I think you found a bug in Stockfish's ranking.~~

The intended effect is to give Stockfish some freedom, but reliable switch to nearly (i.e. 1 ply may be squandered due to rounding) DTZ-optimal play before it's too late.

This works when we approach the threshold <= 99 and eventually switch. Note that zeroing or mating on half-move clock 100 is still a win.

~~But~~ And there are indeed endgames that are so tight that immediate DTZ-optimal play is required and even losing 1 ply to rounding would change the outcome. Rounding is turned off for these endgames, so we've got precise DTZ values like 99 and 100 on our hands. ~~But Stockfish does not distinguish those by rank.~~

Edit: With regard to rep: After switching to optimal play, we may have to repeat a position from Stockfish's previous failed conversion attempts one more time. That's safe if there's only ever been one repetition, so we better switch to optimal play before allowing a second repetition. ~~But then ranking all moves equally, regardless of DTZ seems wrong.~~

Edit 2: I misread the implementation and it does not have the problem that I think I saw.

AndyGrant · 2024-05-26T01:24:08Z

Thank you for your response, Niklas. I'll take your word that the below code is intended for this purpose. Seems sensible -- and explains why your NR tables hijacked the line above to get the desired result.

  if (total_stats_w[DRAW_RULE] || total_stats_b[STAT_MATE - DRAW_RULE])
    ply_accurate_w = 1;

Not sure what this all will do to benefit Stockfish, since it seems Stockfish does it all correctly -- but this will fix some things in Ethereal and Torch, and inevitably many more engines in the openbench space as a result of the Ethereal changes.

vondele · 2024-07-04T07:39:38Z

@niklasf while working on #5414 I think I realized how this could be fixed without TB regeneration. If the TB win is on the edge from the DTZ / r50c, walk the TB moves until mate or draw, and score accordingly.

niklasf · 2024-07-04T07:56:25Z

That works, but I think it would have to be min-max rather than a linear walk. Among moves with equal DTZ, there may be some that were precise and some that have been rounded.

vondele · 2024-07-04T07:57:03Z

OK, I see, added complication.

Currently (after official-stockfish#5407), SF has the property that any PV line with a decisive TB score contains the corresponding TB position, with a score that correctly identifies the depth at which TB are entered. The PV line that follows might not preserve the game outcome, but can easily be verified and extended based on TB information. This patch provides this functionality, simply extending the PV lines on output, this doesn't affect search. Indeed, if DTZ tables are available, search based PV lines that correspond to decisive TB scores are verified to preserve game outcome, truncating the line as needed. Subsequently, such PV lines are extended with a game outcome preserving line until mate, as a possible continuation. These lines are not optimal mating lines, but are similar to what a user could produce on a website like https://syzygy-tables.info/ clicking always the top ranked move, i.e. minimizing or maximizing DTZ (with a simple tie-breaker for moves that have identical DTZ), and are thus an just an illustration of how to game can be won. A similar approach is already in established in https://github.com/joergoster/Stockfish/tree/matefish2 This also contributes to addressing official-stockfish#5175 where SF can give an incorrect TB win/loss for positions in TB with a movecounter that doesn't reflect optimal play. While the full solution requires either TB generated differently, or a search when ranking rootmoves, current SF will eventually find a draw in these cases, in practice quite quickly, e.g. `1kq5/q2r4/5K2/8/8/8/8/7Q w - - 96 1` `8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106` As repeated DTZ probing could be slow a procedure (100ms+ on HDD, a few ms on SSD), the extension is only done as long as the time taken is less than half the `Move Overhead` parameter. For tournaments where these lines might be of interest to the user, a suitable `Move Overhead` might be needed (e.g. TCEC has 1000ms already). No functional change

Currently (after official-stockfish#5407), SF has the property that any PV line with a decisive TB score contains the corresponding TB position, with a score that correctly identifies the depth at which TB are entered. The PV line that follows might not preserve the game outcome, but can easily be verified and extended based on TB information. This patch provides this functionality, simply extending the PV lines on output, this doesn't affect search. Indeed, if DTZ tables are available, search based PV lines that correspond to decisive TB scores are verified to preserve game outcome, truncating the line as needed. Subsequently, such PV lines are extended with a game outcome preserving line until mate, as a possible continuation. These lines are not optimal mating lines, but are similar to what a user could produce on a website like https://syzygy-tables.info/ clicking always the top ranked move, i.e. minimizing or maximizing DTZ (with a simple tie-breaker for moves that have identical DTZ), and are thus an just an illustration of how to game can be won. A similar approach is already in established in https://github.com/joergoster/Stockfish/tree/matefish2 This also contributes to addressing official-stockfish#5175 where SF can give an incorrect TB win/loss for positions in TB with a movecounter that doesn't reflect optimal play. While the full solution requires either TB generated differently, or a search when ranking rootmoves, current SF will eventually find a draw in these cases, in practice quite quickly, e.g. `1kq5/q2r4/5K2/8/8/8/8/7Q w - - 96 1` `8/8/6k1/3B4/3K4/4N3/8/8 w - - 54 106` Gives the same results as master on an extended set of test positions from mcostalba/Stockfish@9173d29 with the exception of the above mentioned fen where this commit improves. With https://github.com/vondele/matetrack using 6men TB, all generated PVs verify: ``` Using ../Stockfish/src/stockfish.syzygyExtend on matetrack.epd with --nodes 1000000 --syzygyPath /chess/syzygy/3-4-5-6/WDL:/chess/syzygy/3-4-5-6/DTZ Engine ID: Stockfish dev-20240704-ff227954 Total FENs: 6555 Found mates: 3299 Best mates: 2582 Found TB wins: 568 ``` As repeated DTZ probing could be slow a procedure (100ms+ on HDD, a few ms on SSD), the extension is only done as long as the time taken is less than half the `Move Overhead` parameter. For tournaments where these lines might be of interest to the user, a suitable `Move Overhead` might be needed (e.g. TCEC has 1000ms already). closes official-stockfish#5414 No functional change

dav1312 closed this as completed Apr 15, 2024

vondele reopened this Apr 17, 2024

vondele mentioned this issue Jul 4, 2024

Correct and extend PV lines with decisive TB score #5414

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Winning evaluation with tablebases in cursed win #5175

Winning evaluation with tablebases in cursed win #5175

dav1312 commented Apr 14, 2024 •

edited

Loading

Disservin commented Apr 14, 2024

jhellis3 commented Apr 15, 2024

dav1312 commented Apr 15, 2024

dav1312 commented Apr 15, 2024 •

edited

Loading

Disservin commented Apr 15, 2024

whelanh commented Apr 15, 2024

dav1312 commented Apr 15, 2024

robertnurnberg commented Apr 16, 2024

niklasf commented Apr 16, 2024 •

edited

Loading

peregrineshahin commented Apr 16, 2024

robertnurnberg commented Apr 17, 2024

vondele commented Apr 17, 2024

dubslow commented Apr 17, 2024

peregrineshahin commented Apr 17, 2024

Disservin commented Apr 17, 2024

Torom commented Apr 17, 2024

AndyGrant commented May 16, 2024 •

edited

Loading

niklasf commented May 25, 2024 •

edited

Loading

AndyGrant commented May 26, 2024

vondele commented Jul 4, 2024

niklasf commented Jul 4, 2024

vondele commented Jul 4, 2024

Winning evaluation with tablebases in cursed win #5175

Winning evaluation with tablebases in cursed win #5175

Comments

dav1312 commented Apr 14, 2024 • edited Loading

Describe the issue

Expected behavior

Steps to reproduce

Operating system

Stockfish version

Disservin commented Apr 14, 2024

jhellis3 commented Apr 15, 2024

dav1312 commented Apr 15, 2024

dav1312 commented Apr 15, 2024 • edited Loading

Disservin commented Apr 15, 2024

whelanh commented Apr 15, 2024

dav1312 commented Apr 15, 2024

robertnurnberg commented Apr 16, 2024

niklasf commented Apr 16, 2024 • edited Loading

peregrineshahin commented Apr 16, 2024

robertnurnberg commented Apr 17, 2024

vondele commented Apr 17, 2024

dubslow commented Apr 17, 2024

peregrineshahin commented Apr 17, 2024

Disservin commented Apr 17, 2024

Torom commented Apr 17, 2024

AndyGrant commented May 16, 2024 • edited Loading

niklasf commented May 25, 2024 • edited Loading

AndyGrant commented May 26, 2024

vondele commented Jul 4, 2024

niklasf commented Jul 4, 2024

vondele commented Jul 4, 2024

dav1312 commented Apr 14, 2024 •

edited

Loading

dav1312 commented Apr 15, 2024 •

edited

Loading

niklasf commented Apr 16, 2024 •

edited

Loading

AndyGrant commented May 16, 2024 •

edited

Loading

niklasf commented May 25, 2024 •

edited

Loading