
About using different players for training game generation #56

Open
remdu opened this issue Mar 1, 2018 · 6 comments

@remdu commented Mar 1, 2018

I have a question related to a similar project for Go: https://github.com/gcp/leela-zero
In that project, self-play games are generated by the same player playing against itself, so black and white share the same random seed and share a search tree through tree reuse.
If I'm reading the code right, in reversi-alpha-zero two independent players are used to generate the self-play games, each with its own search tree and its own random seed.
I am very curious about the effects of these two approaches. What have your results been?

@mokemokechicken (Owner)

Hi @eddh,

> I am very curious about the effects of these two approaches. What have your results been?

I also enabled sharing search-tree information via share_mtcs_info_in_self_play, but I don't see a clear difference between sharing and separating the trees.
My feeling is that perfectly separating them (between black and white within a game) wastes some computation, while sharing them across games invites a kind of overfitting or mode collapse.

If I had rich computational resources, it might be better to separate them completely, because that brings a little extra randomness.
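For reference, a minimal sketch of the two setups being compared (this is illustrative code, not from either repo; the class and parameter names are hypothetical). In the shared setup, one player object, with one RNG and one visit table standing in for the search tree, plays both colors; in the separate setup, each color gets its own tree and its own seed:

```python
import random

class MCTSPlayer:
    """Toy player: keeps per-(state, move) visit statistics (a stand-in for
    a real MCTS tree)."""
    def __init__(self, tree=None, seed=None):
        self.tree = tree if tree is not None else {}   # (state, move) -> visits
        self.rng = random.Random(seed)

    def move(self, state, legal_moves):
        # Prefer the least-visited successor (a crude proxy for real search),
        # breaking ties with this player's own RNG.
        legal_moves = list(legal_moves)
        self.rng.shuffle(legal_moves)
        chosen = min(legal_moves, key=lambda m: self.tree.get((state, m), 0))
        self.tree[(state, chosen)] = self.tree.get((state, chosen), 0) + 1
        return chosen

# leela-zero style: one player (one tree, one seed) plays both colors,
# so statistics accumulated as black are visible when playing white.
shared_tree = {}
black = white = MCTSPlayer(tree=shared_tree, seed=42)

# reversi-alpha-zero style (as described above): two independent players
# with separate trees and separate seeds.
black2 = MCTSPlayer(seed=1)
white2 = MCTSPlayer(seed=2)
```

The design trade-off discussed in this thread is visible here: the shared table never recomputes statistics the other color already gathered, while the separate tables make the two sides' searches (and their randomness) fully independent.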

@remdu (Author) commented Mar 26, 2018

Thank you for the answer. I have been curious about this, but maybe it has less of an effect than I expected. Did you do tests regarding reusing tree information and the effect it has on the effectiveness of Dirichlet noise? In other related projects, the consensus seems to be that tree reuse does make the Dirichlet noise less effective, but that as long as it doesn't completely prevent the discovery of new moves, the speed boost is worth the cost.
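To make the interaction concrete, here is a small sketch of AlphaZero-style root noise and PUCT selection (all function names and constants here are illustrative, not taken from reversi-alpha-zero). The point is that inherited visit counts on a reused root divide the exploration term, so the same Dirichlet noise shifts well-visited moves much less than it would on a fresh root:

```python
import numpy as np

def noised_priors(priors, rng, eps=0.25, alpha=0.5):
    """AlphaZero-style root noise: P'(a) = (1 - eps) * P(a) + eps * Dir(alpha)."""
    noise = rng.dirichlet([alpha] * len(priors))
    return (1 - eps) * np.asarray(priors) + eps * noise

def puct_scores(priors, visits, values, c_puct=1.5):
    """PUCT score at the root: Q(a) + c * P(a) * sqrt(sum N) / (1 + N(a))."""
    visits = np.asarray(visits, dtype=float)
    u = c_puct * np.asarray(priors) * np.sqrt(visits.sum() + 1.0) / (1.0 + visits)
    return np.asarray(values) + u

rng = np.random.default_rng(0)
priors = noised_priors([0.7, 0.2, 0.1], rng)

# Fresh root: no visits yet, so the noised priors fully drive selection.
fresh = puct_scores(priors, visits=[0, 0, 0], values=[0.0, 0.0, 0.0])

# Reused root: inherited visit counts shrink the exploration term, so the
# same noise perturbs the scores of well-visited moves far less.
reused = puct_scores(priors, visits=[400, 80, 20], values=[0.0, 0.0, 0.0])
```

This matches the consensus mentioned above: noise injected at an already well-visited reused root has to overcome the inherited counts before it can redirect the search.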

@mokemokechicken (Owner) commented Mar 27, 2018

> Did you do tests regarding reusing tree information and the effect it has on the effectiveness of Dirichlet noise?

I tested reusing tree information and inspected the resulting moves.
In the early phase of training, even when the tree was reused across several games, there were no (or very few) games with completely identical moves.
In the late phase, however, many games had identical moves even without reusing tree information.

Although this is a slightly different topic:

Reversi has draws.
I think that if both black and white believe "the best result from this position is a draw", the game tends to end in a draw.
Because they can only find "lose" and "draw" moves, they select the known "draw" moves; they have little motivation to search for new "win" moves.
If there were no draws (as in Go), they could only find "lose" or "win" moves, so each side would select the moves it believes win, and the losing side would be driven to discover new moves.
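The incentive gap described above can be sketched in a few lines (my own toy illustration, not code from the repo; values use the usual -1 = loss, 0 = draw, +1 = win convention):

```python
def pick_move(value_by_move):
    """Greedy choice over estimated move values in [-1, 1]."""
    return max(value_by_move, key=value_by_move.get)

# With draws (reversi): every explored move is a known loss except one known
# draw. The agent keeps replaying the draw; nothing it knows promises more
# than 0, and an untried move is not obviously better than the safe draw.
with_draws = {"a": -1.0, "b": -1.0, "c": 0.0}

# Without draws (like Go): the losing side's best known move is still a loss
# (-1), so any untried move with a neutral value estimate looks strictly
# better, which pushes the search toward new moves.
no_draws = {"a": -1.0, "b": -1.0}
untried_estimate = 0.0   # hypothetical prior value for an unexplored move
```

Under this toy model, `pick_move(with_draws)` locks onto the known draw, while in the no-draw case `untried_estimate` exceeds every known value, so exploration pays.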

I was troubled by this "many draw games (80~90%)" problem.
It was difficult to break out of this situation.

Reusing tree information tends to aggravate the problem.
So I think it might be better to separate the trees completely, because that brings a little extra randomness.

@gooooloo (Contributor)
Maybe that is just the nature of the game of Reversi; see the "Othello 8 x 8" section of https://en.wikipedia.org/wiki/Computer_Othello. That being said, even if enough randomness is guaranteed during training, play will still converge to a draw in the end.

As that page says:

> Regarding the three main openings of diagonal, perpendicular and parallel, it appears that both diagonal and perpendicular openings lead to drawing lines, while the parallel opening is a win for black.

Is your model playing the diagonal opening or the perpendicular opening?

@mokemokechicken (Owner)

> Is your model playing the diagonal opening or the perpendicular opening?

Several openings, including the diagonal and the perpendicular, were played.
If the model played the best moves there would be no problem; however, the model lost against NTest at level 9 and above.

@gooooloo (Contributor)

I see. Looking forward to a solution being found~
