
About using different players for training game generation #56

Open
remdu opened this issue Mar 1, 2018 · 6 comments

@remdu commented Mar 1, 2018

I have a question related to a similar project for Go: https://github.com/gcp/leela-zero
In that project, self-play games are generated by the same player playing against itself, so black and white share the same random seed and share a search tree through tree reuse.
If I'm reading the code right, in reversi-alpha-zero two independent players are used to generate the self-play games, each with its own search tree and its own random seed.
I am very curious about the effects of these two approaches. What have your results been?

@mokemokechicken (Owner)

Hi @eddh,

> I am very curious about the effects of these two approaches. What have your results been?

I also enabled sharing search-tree information via share_mtcs_info_in_self_play, but I don't see a clear difference between sharing and separating the trees.
My feeling is that perfectly separating them (between black and white within a game) wastes some computation, while sharing them across games invites a kind of overfitting or mode collapse.

If I had rich computational resources, it might be better to separate them completely, because that brings a little extra randomness.
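For reference, a minimal sketch of the two setups being compared (this is illustrative code, not from either repo; the class and parameter names are hypothetical). In the shared setup, one player object, with one RNG and one visit table standing in for the search tree, plays both colors; in the separate setup, each color gets its own tree and its own seed:

```python
import random

class MCTSPlayer:
    """Toy player: keeps per-(state, move) visit statistics (a stand-in for
    a real MCTS tree)."""
    def __init__(self, tree=None, seed=None):
        self.tree = tree if tree is not None else {}   # (state, move) -> visits
        self.rng = random.Random(seed)

    def move(self, state, legal_moves):
        # Prefer the least-visited successor (a crude proxy for real search),
        # breaking ties with this player's own RNG.
        legal_moves = list(legal_moves)
        self.rng.shuffle(legal_moves)
        chosen = min(legal_moves, key=lambda m: self.tree.get((state, m), 0))
        self.tree[(state, chosen)] = self.tree.get((state, chosen), 0) + 1
        return chosen

# leela-zero style: one player (one tree, one seed) plays both colors,
# so statistics accumulated as black are visible when playing white.
shared_tree = {}
black = white = MCTSPlayer(tree=shared_tree, seed=42)

# reversi-alpha-zero style (as described above): two independent players
# with separate trees and separate seeds.
black2 = MCTSPlayer(seed=1)
white2 = MCTSPlayer(seed=2)
```

The design trade-off discussed in this thread is visible here: the shared table never recomputes statistics the other color already gathered, while the separate tables make the two sides' searches (and their randomness) fully independent.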

@remdu (Author) commented Mar 26, 2018

Thank you for the answer. I have been curious about this, but maybe it has less of an effect than I expected. Did you do tests regarding reusing tree information and the effect it has on the effectiveness of Dirichlet noise? In other related projects, the consensus seems to be that tree reuse does make the Dirichlet noise less effective, but that as long as it doesn't completely prevent the discovery of new moves, the speed boost is worth the cost.
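To make the interaction concrete, here is a small sketch of AlphaZero-style root noise and PUCT selection (all function names and constants here are illustrative, not taken from reversi-alpha-zero). The point is that inherited visit counts on a reused root divide the exploration term, so the same Dirichlet noise shifts well-visited moves much less than it would on a fresh root:

```python
import numpy as np

def noised_priors(priors, rng, eps=0.25, alpha=0.5):
    """AlphaZero-style root noise: P'(a) = (1 - eps) * P(a) + eps * Dir(alpha)."""
    noise = rng.dirichlet([alpha] * len(priors))
    return (1 - eps) * np.asarray(priors) + eps * noise

def puct_scores(priors, visits, values, c_puct=1.5):
    """PUCT score at the root: Q(a) + c * P(a) * sqrt(sum N) / (1 + N(a))."""
    visits = np.asarray(visits, dtype=float)
    u = c_puct * np.asarray(priors) * np.sqrt(visits.sum() + 1.0) / (1.0 + visits)
    return np.asarray(values) + u

rng = np.random.default_rng(0)
priors = noised_priors([0.7, 0.2, 0.1], rng)

# Fresh root: no visits yet, so the noised priors fully drive selection.
fresh = puct_scores(priors, visits=[0, 0, 0], values=[0.0, 0.0, 0.0])

# Reused root: inherited visit counts shrink the exploration term, so the
# same noise perturbs the scores of well-visited moves far less.
reused = puct_scores(priors, visits=[400, 80, 20], values=[0.0, 0.0, 0.0])
```

This matches the consensus mentioned above: noise injected at an already well-visited reused root has to overcome the inherited counts before it can redirect the search.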

@mokemokechicken (Owner) commented Mar 27, 2018

> Did you do tests regarding reusing tree information and the effect it has on the effectiveness of Dirichlet noise?

I tested reusing tree information and inspected the resulting moves.
In the early phase of training, even when the tree was reused across several games, there were no (or very few) games with completely identical moves.
In the late phase, however, many games had identical moves even without reusing tree information.

Although this is a slightly different topic:

Reversi has draws.
I think that if both black and white believe "the best result from this position is a draw", the game tends to end in a draw.
Because they can only find "lose" and "draw" moves, they select the known "draw" moves; they have little motivation to search for new "win" moves.
If there were no draws (as in Go), they could only find "lose" or "win" moves, so each side would select the moves it believes win, and the losing side would be driven to discover new moves.
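The incentive gap described above can be sketched in a few lines (my own toy illustration, not code from the repo; values use the usual -1 = loss, 0 = draw, +1 = win convention):

```python
def pick_move(value_by_move):
    """Greedy choice over estimated move values in [-1, 1]."""
    return max(value_by_move, key=value_by_move.get)

# With draws (reversi): every explored move is a known loss except one known
# draw. The agent keeps replaying the draw; nothing it knows promises more
# than 0, and an untried move is not obviously better than the safe draw.
with_draws = {"a": -1.0, "b": -1.0, "c": 0.0}

# Without draws (like Go): the losing side's best known move is still a loss
# (-1), so any untried move with a neutral value estimate looks strictly
# better, which pushes the search toward new moves.
no_draws = {"a": -1.0, "b": -1.0}
untried_estimate = 0.0   # hypothetical prior value for an unexplored move
```

Under this toy model, `pick_move(with_draws)` locks onto the known draw, while in the no-draw case `untried_estimate` exceeds every known value, so exploration pays.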

I was troubled by this "many draw games (80~90%)" problem.
It was difficult to break out of this situation.

Reusing tree information tends to aggravate the problem.
So I think it might be better to separate the trees completely, because that brings a little extra randomness.

@gooooloo (Contributor)
Maybe that is just the nature of the game of Reversi; see the "Othello 8 x 8" section of https://en.wikipedia.org/wiki/Computer_Othello. That being said, even if enough randomness is guaranteed during training, play will still converge to a draw in the end.

As that page says:

> Regarding the three main openings of diagonal, perpendicular and parallel, it appears that both diagonal and perpendicular openings lead to drawing lines, while the parallel opening is a win for black.

Is your model playing the diagonal opening or the perpendicular opening?

@mokemokechicken (Owner)

> Is your model playing the diagonal opening or the perpendicular opening?

Several openings, including the diagonal and the perpendicular, were played.
If the model played the best moves there would be no problem; however, the model lost against NTest at level 9 and above.

@gooooloo (Contributor)

I see. Looking forward to a solution being found~
