Another book with unbalanced human openings #39

Merged Oct 21, 2023

vondele (Member) commented Oct 21, 2023

A new book derived from Lichess games, with a model draw rate between 48% and 52%

It attempts to address the following points, relative to the currently used book:

  • about 10x larger (2.6M pos), i.e. more variety while testing on fishtest, no repeated openings for any single test played.
  • advantages for both White and Black, around ±1.0
  • positions at all game plies between 1 and 16

The construction process involved

  1. Parsing all 15B lichess games in the database https://database.lichess.org/ for the period Jan - Sept 2023.
    Extract from these the popular positions, i.e. seen at least twice, within the first 16 plies played, exploring newly added games to at most 8 previously unseen plies.
$ ./fastpopular --dir /mnt/md0/chess/lichessgames/2023/ --minCount 2 --stopEarly --countStopEarly 8 --maxPlies 16 --concurrency 9 -o popular_Lichess_JanSept_maxPlies16_stopEarly8.epd
Looking for pgn files in /mnt/md0/chess/lichessgames/2023/
Found 9 .pgn(.gz) files, creating 9 chunks for processing.
Processed 9 files
Retained 296993424 positions from 1127228493 unique visited in 15251265926 games.
Total time for processing: 7374.5 s

fastpopular is available at https://github.com/vondele/fastpopular
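
The counting in step 1 can be illustrated with a minimal Python sketch. It is not fastpopular's implementation: the function name `popular_positions`, its parameters, and the reading of `--countStopEarly` (stop following a game once it has produced more than 8 positions not seen before) are assumptions made for illustration. Each game is assumed to be given as a list of position keys, one per ply, with the PGN parsing left out.

```python
from collections import Counter


def popular_positions(games, min_count=2, max_plies=16, max_new=8):
    """Count positions over many games and keep the popular ones.

    games: iterable of lists of hashable position keys (one key per ply).
    This is an illustrative reading of the fastpopular options, not its code.
    """
    counts = Counter()
    for positions in games:
        new_seen = 0
        for key in positions[:max_plies]:
            if key not in counts:
                new_seen += 1
                if new_seen > max_new:
                    # Assumed stop-early rule: give up on this game once it
                    # has wandered too far into previously unseen positions.
                    break
            counts[key] += 1
    return {key: n for key, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "b", "d"], ["a", "x"]]
    print(popular_positions(demo))
```

With the three toy games above, only the positions reached at least twice (`a` and `b`) are retained.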

  2. Score all these 296M positions with a modified Stockfish, based on master, that analyses each position up to depth 24 for as long as the predicted draw rate (UCI_ShowWDL) stays near 50%.
    Positions whose draw rate is already far from 50% at low depth are only analysed to low depth.
    From these scored positions, extract those with a draw rate in the range 48 - 52%.
    That modified branch is available at https://github.com/vondele/Stockfish/tree/createUHO
   ./stockfish.createUHO bench 128 1 24 popular_Lichess_JanSept_maxPlies16_stopEarly8.epd > popular_Lichess_JanSept_maxPlies16_stopEarly8_scored.epd
   awk '{if ($15>480 && $15<520) print $0}' popular_Lichess_JanSept_maxPlies16_stopEarly8_scored.epd | cut -d';' -f1 | sed "s/ $//g" > UHO_Lichess_4852_v1.epd
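
For readers less familiar with awk, the filter can be sketched in Python as follows. The sketch assumes, as the thresholds 480 and 520 suggest, that whitespace-separated field 15 of each scored line holds the draw rate in permille. The function name `filter_balanced` and the sample line layout are hypothetical, since the exact output format of the createUHO branch is not shown here. Unlike awk, which would fall back to string comparison, this sketch skips lines whose field 15 is not numeric.

```python
def filter_balanced(lines, lo=480, hi=520, field=15):
    """Keep the EPD part of lines whose draw-rate field lies strictly in (lo, hi)."""
    kept = []
    for raw in lines:
        line = raw.rstrip("\n")
        fields = line.split()
        if len(fields) < field:
            continue  # too short to contain the draw-rate field
        try:
            value = float(fields[field - 1])
        except ValueError:
            continue  # non-numeric field: skipped in this sketch
        if lo < value < hi:
            epd = line.split(";", 1)[0]   # same as: cut -d';' -f1
            if epd.endswith(" "):         # same as: sed "s/ $//g"
                epd = epd[:-1]
            kept.append(epd)
    return kept


if __name__ == "__main__":
    # Hypothetical scored line: 6 FEN fields, ';', 7 placeholders, draw rate.
    sample = ("rnbqkbnr/p1pppppp/8/1p6/3P4/8/PPP1PPPP/RNBQKBNR w KQkq - 0 2 ; "
              "f8 f9 f10 f11 f12 f13 f14 500")
    print(filter_balanced([sample]))
```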

Short initial testing at STC shows the draw rate is, as expected, close to 50% for self-play games:

Score of master1 vs master2: 1048 - 1031 - 1921 [] 4000
Elo difference: 1.48 +/- 7.75, LOS: 64.54 %, DrawRatio: 48.02 %
Ptnml:        WW     WD  DD/WL     LD     LL
Distr:        21    473   1026    462     18
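
As a sanity check, the reported Elo difference and draw ratio can be recomputed from the win/loss/draw counts above using the standard logistic Elo formula. The sketch below is illustrative only (the function names are made up here) and does not reproduce the error bar or LOS, whose exact computation in the match tool is not shown. It also checks that the pentanomial distribution gives the same overall score.

```python
import math


def elo_and_draw_ratio(wins, losses, draws):
    """Return (Elo difference, draw ratio in percent) from game counts."""
    games = wins + losses + draws
    score = (wins + 0.5 * draws) / games
    elo = 400.0 * math.log10(score / (1.0 - score))
    return elo, 100.0 * draws / games


def pentanomial_score(ww, wd, dd_wl, ld, ll):
    """Average score per game from game-pair outcomes (2, 1.5, 1, 0.5, 0 points)."""
    pairs = ww + wd + dd_wl + ld + ll
    points = 2.0 * ww + 1.5 * wd + 1.0 * dd_wl + 0.5 * ld
    return points / (2.0 * pairs)


if __name__ == "__main__":
    elo, draw_ratio = elo_and_draw_ratio(1048, 1031, 1921)
    print(f"Elo difference: {elo:.2f}, DrawRatio: {draw_ratio:.3f} %")
    print(f"Pentanomial score: {pentanomial_score(21, 473, 1026, 462, 18):.6f}")
```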

@vondele vondele merged commit 426eca4 into official-stockfish:master Oct 21, 2023
robertnurnberg (Contributor) commented Mar 1, 2024

  • positions at all game plies between 1 and 16

Just a tiny correction. The earliest game ply I could find is 2, e.g. for the position rnbqkbnr/p1pppppp/8/1p6/3P4/8/PPP1PPPP/RNBQKBNR w KQkq - 0 2.

Edit: Here is the complete list of frequencies.

game ply  2: 5 times
game ply  3: 47 times
game ply  4: 642 times
game ply  5: 3454 times
game ply  6: 12996 times
game ply  7: 29984 times
game ply  8: 60510 times
game ply  9: 99575 times
game ply 10: 156793 times
game ply 11: 217136 times
game ply 12: 288550 times
game ply 13: 353868 times
game ply 14: 420058 times
game ply 15: 470702 times
game ply 16: 517716 times
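
A tally like this can be reproduced with a short Python sketch, assuming each book line carries the side to move and the fullmove counter as in the FEN quoted above. The convention used (game ply = number of half-moves played, so the quoted position is at ply 2) is inferred from that one example, and the function names are hypothetical. The frequencies listed above sum to 2,632,036, consistent with the "2.6M pos" in the description.

```python
from collections import Counter


def game_ply(fen):
    """Half-moves played so far, from the side-to-move and fullmove fields."""
    fields = fen.split()
    side_to_move = fields[1]
    fullmove = int(fields[5])
    return 2 * (fullmove - 1) + (1 if side_to_move == "b" else 0)


def ply_histogram(fens):
    """Count how many positions occur at each game ply."""
    return Counter(game_ply(fen) for fen in fens)


# Frequencies as reported in the comment above, keyed by game ply.
REPORTED = {2: 5, 3: 47, 4: 642, 5: 3454, 6: 12996, 7: 29984, 8: 60510,
            9: 99575, 10: 156793, 11: 217136, 12: 288550, 13: 353868,
            14: 420058, 15: 470702, 16: 517716}

if __name__ == "__main__":
    example = "rnbqkbnr/p1pppppp/8/1p6/3P4/8/PPP1PPPP/RNBQKBNR w KQkq - 0 2"
    print(game_ply(example), sum(REPORTED.values()))
```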
