
wrong evaluation of draw position #4103

Closed
mendi80 opened this issue Jul 6, 2022 · 20 comments

Comments

@mendi80

mendi80 commented Jul 6, 2022

Position: q7/8/2p5/B2p2pp/5pp1/2N3k1/6P1/7K w - - 0 1
The latest version (5/07/2022) misses the draw and evaluates the position at -4.8, compared to an older version (14/05/2022) that sees the draw and evaluates it at 0.00.
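For anyone trying to reproduce this, the FEN can be pasted into any UCI build of Stockfish; a minimal session (the depth limit here is just an illustrative choice, not what the reporter used) looks like:

```text
uci
position fen q7/8/2p5/B2p2pp/5pp1/2N3k1/6P1/7K w - - 0 1
go depth 40
```

The engine then prints `info ... score cp ...` lines; a build that sees the fortress should eventually settle at `score cp 0`.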

@ErdoganSeref

The issue is similar to #3894. @vondele closed it by saying the following:

There will always be positions that are wrongly evaluated, especially constructed ones. Not much we can do to fix particular positions, but fortunately, the number of them gets smaller over time.

@vondele
Member

vondele commented Sep 4, 2022

I think this is an interesting, hard puzzle for engines, in that it basically needs to resolve to the 50-move rule to see the draw evaluation. For this, one needs to extend the search very deep, which causes something like #3911... on the other hand, the right move is found almost instantly.
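To make the 50-move-rule point concrete: the fifth field of a FEN is the halfmove clock, and a draw can be claimed once it reaches 100 half-moves. In the issue's position the clock is 0, so the search has to look roughly 100 plies ahead to prove the fortress. A minimal stdlib sketch (the function name is my own):

```python
def plies_until_fifty_move_claim(fen: str) -> int:
    """Plies left before a 50-move-rule draw can be claimed.

    The halfmove clock (fifth FEN field) counts plies since the last
    capture or pawn move; a claim becomes possible at 100 half-moves.
    """
    halfmove_clock = int(fen.split()[4])
    return max(0, 100 - halfmove_clock)

# Position from this issue: clock is 0, so the draw only resolves
# about 100 plies deep, far beyond typical search horizons.
fen = "q7/8/2p5/B2p2pp/5pp1/2N3k1/6P1/7K w - - 0 1"
print(plies_until_fifty_move_claim(fen))  # 100
```

This is why the eval only collapses to 0.00 once the search effectively proves that no progress is possible before the clock runs out.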

@Craftyawesome

There is also the reverse of this problem. Position: rnb1q2k/2ppnr1p/p4p1B/3PpP1N/1P1bP1R1/2NB4/1PP3PP/5R1K w - - 5 22. SF15 doesn't see the loss and evaluates 0.00 (draw), compared to an older version, SF14, which sees the loss and evaluates -2.00.

Do you have the full solution? SF14 also eventually drops to near 0.

@dav1312
Contributor

dav1312 commented Sep 9, 2022

This is the line I got from some quick analysis:

[FEN "rnb1q2k/2ppnr1p/p4p1B/3PpP1N/1P1bP1R1/2NB4/1PP3PP/5R1K w - - 5 22"]

22.Nxf6 Qd8 23.d6 cxd6 24.Bc4 d5 25.Ncxd5 Nxd5 26.Nxd5 Ra7 27.c3 d6
28.Rf3 Nc6 29.Rg5 Rab7 30.h4 Ba7 31.Kh2 Ne7 32.Nxe7 Qxe7 33.f6 Qe8 34.Bxf7 Qxf7
35.Rg7 Be6 36.g4 *

But then the latest dev refuted the whole thing with 24.Rf3, which is winning for White.

@Craftyawesome

SF14 at depth 40 evaluates it at -1.50

Keep in mind that Lichess's SF14 uses a smaller net that is a few Elo worse on average. It looks like it missed 24.Rf3, as mentioned above.

@dav1312
Contributor

dav1312 commented Sep 9, 2022

White is winning! Therefore, the position is not a draw.

That is the wrong conclusion. It's a draw because Black can play 22...Rxf6.

@peregrineshahin
Contributor

peregrineshahin commented Sep 11, 2022

@MaiaChess I will assume you're asking in good faith.
First, the concept of mistakes and inaccuracies is not embedded in Stockfish; it is just Lichess's way of interpreting the size of an eval shift. And yes, you can lose on chess.com or Lichess without playing any mistakes or inaccuracies, by being outplayed bit by bit with no large jumps in the eval. That's one thing.

Second, Lichess uses a different net, and possibly a modified SF binary, to run smoothly in the browser.
Third, it is not SF 15; it shows as Stockfish 14+ on my end.
Fourth, you cannot compare engines running in a browser to find the critical eval shift between two engines; do it on good hardware.
Fifth, this is unrelated to the issue.
Sixth, it's not a wrong eval of the whole game.
I could go on and on.

@peregrineshahin
Contributor

Simply put, Black is lost because it got outplayed bit by bit. There are millions or billions of possibilities, and some variations differ so little in eval that choosing between them is the battleground between engines.
Download SF 15 yourself, run it with good parameters, and analyze the game properly; you will certainly find moments with larger eval shifts.
When two engines play against each other, this is the expected behavior: outplaying the other engine can happen in lines that differ by 0.04 or even less, and that makes sense because those differences accumulate.
In any case, the goal of SF 15 was never to be a chess evaluator; it is an engine that needs to make good moves and win. If recent versions evaluate better, that is a byproduct of the recent Elo gains (the product of getting better results than the previous version), not the main goal of Stockfish development. And even if it were the goal, it would be unobtainable anyway, because nobody knows the real truth about a position.
I suggest you think of SF 15 as an engine that wouldn't make the mistakes SF 8 did, and not primarily as a better evaluator (which it nonetheless already is).

@peregrineshahin
Contributor

For god's sake, what do you mean by a Lichess server analysis? The requested analysis takes about one second to finish for the whole game; how is that remotely trustworthy?
Don't get me wrong, I love Lichess, but even running the analysis in your browser is better than the request-server-analysis button. When you want to analyze games between two engines, you can't get a full report with one click in one second; maybe you can in 2090, but not in 2022.
Look at the Lichess source code, or ask somebody at Lichess, and they will completely agree that this feature can't and shouldn't give reports about engine-vs-engine games.
You are repeating yourself again and again; just go see how many nodes this feature calculates per move. It is not a good indicator. It works for human games, though not even for the top players out there.

@bftjoe
Contributor

bftjoe commented Sep 11, 2022

Lichess uses fixed-nodes analysis at a low hash size; it means nothing for engine-vs-engine play.

@dav1312
Contributor

dav1312 commented Sep 11, 2022

@MaiaChess Please don't use Lichess's server analysis for engine-vs-engine games.
Their analysis is limited to 1.5M nodes and is intended for games between humans.
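In UCI terms, a fixed-node budget like the one described is just a `go nodes` search rather than a depth- or time-limited one; assuming the 1.5M figure above, the equivalent commands would be:

```text
position fen rnb1q2k/2ppnr1p/p4p1B/3PpP1N/1P1bP1R1/2NB4/1PP3PP/5R1K w - - 5 22
go nodes 1500000
```

1.5M nodes is a fraction of a second of search for modern Stockfish on desktop hardware, which is why such evals say little about engine-vs-engine games.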

@Craftyawesome

Which one is correct, SF15 or SF11?

The position is completely lost; which move is the best try to save the game is subjective. Neither is really right or wrong.

And if you want to argue the subjective point, Leela also prefers e4 (-5 to -11).

@jxu

jxu commented Sep 16, 2022

@MaiaChess Are you impersonating the creators of https://github.com/CSSLab/maia-chess?

@shorome2

Just one inaccuracy can't lose a game; an inaccuracy is defined by the delta function used to evaluate mistakes.
You will see this game analyzed in the future by an SF20 that can find the blunders and mistakes in it.
Please note that SF8 did not have NNUE when it played against AlphaZero.

I should have grabbed popcorn for this...

Your view of computer chess is a bit simplistic. SF's NN is tuned on ~5k CPUs. The tests you gave weren't even performed on TCEC equipment, much less all of Fishtest. The accuracy of a specific move in the middlegame isn't objective; it's subjective to the ideas being calculated versus what the opponent is actually doing. If your opponent has already played this game, won it, and remembers it perfectly, do you expect an engine with very limited computing power to think far enough ahead to draw these games?

I expect SF, using max threads and hash and a 7-man TB, to correctly evaluate the positions you gave. The only problem is that you're expecting moves calculated by a supercomputer to be refuted by your PC at low depth, low hash, and low thread count, without an EGTB.

Should there be better scalability between what SF can do on a supercomputer vs. your PC? Yes.
But, as was mentioned earlier, you should probably use a supercomputer to analyze a game from another supercomputer. Depth 40 is nothing; I have a third-gen Ryzen 7 with only 16 GB of RAM, and it can reach depth 70 (selective depth 100) in 3-5 minutes on these game positions.

@RogerThiede

@MaiaChess, what is it that you're trying to convey? Do you believe you have found an actual issue in the source code? Do you believe you have found a systemic issue in the latest neural network?
We can find many positions that are drawn by Syzygy tablebase lookup but for which the latest neural network gives vastly different evaluations.

@jxu

jxu commented Sep 17, 2022

Should there be better scalability between what SF can do on a supercomputer vs your PC?

Yes, that's right. But remember Kasparov's loss to an amateur opponent: Kasparov thought his opponent had an idea behind each move, but the opponent was just playing his simple game.

SF15 is stronger than SF11, but this strength will cause its own weaknesses, because it is not perfect.

#4103 (comment)

What does this even mean? It's too vague to tell, and there's no point in comparing engine calculations to human calculations. Maybe Kasparov lost a simul or blitz game to an amateur once, but it sounds more like a made-up urban myth.

@jxu

jxu commented Sep 17, 2022

I mean that sometimes simple moves are more powerful than each moves

?????????????

@nathan-lc0

The original position that started this issue no longer seems to be a problem. The latest Stockfish finds Ne4 and evaluates it at 0.00 almost instantly on a single thread. Perhaps this can be closed.

@Craftyawesome

I am getting -2.08 on a single thread. Multithreaded search seems to flatline at some random eval, sometimes 0. Either way, the issue tracker might not be the best place for positions SF gets wrong.

@vondele
Copy link
Member

vondele commented Nov 21, 2022

I'm closing this in light of the last two comments.

@vondele vondele closed this as completed Nov 21, 2022