Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm #13
Oh... I didn't generate 8 symmetries for each position.
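For reference, a minimal sketch of generating those 8 symmetries (the dihedral group of the square), assuming the position is a square numpy array; note the policy target must be transformed with the same mapping so moves stay aligned with the board:

```python
import numpy as np

def board_symmetries(board: np.ndarray) -> list:
    """Return the 8 dihedral symmetries of a square board:
    4 rotations, each with and without a left-right flip."""
    syms = []
    for k in range(4):
        rotated = np.rot90(board, k)
        syms.append(rotated)
        syms.append(np.fliplr(rotated))
    return syms
```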
For Reversi, would it be better for α to be around 0.3~0.5?
Re-normalising the policy over legal moves may be important for the balance between the value and policy outputs.
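A minimal sketch of that re-normalisation, assuming `policy` holds the network's move probabilities and `legal_mask` is a 0/1 array of the same length (both names are hypothetical):

```python
import numpy as np

def renormalise_policy(policy: np.ndarray, legal_mask: np.ndarray) -> np.ndarray:
    """Zero out illegal moves, then renormalise so MCTS receives
    a proper prior distribution over legal moves only."""
    masked = policy * legal_mask
    total = masked.sum()
    if total > 0:
        return masked / total
    # Degenerate case: the network put ~all mass on illegal moves,
    # so fall back to a uniform prior over the legal ones.
    return legal_mask / legal_mask.sum()
```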
Wow!!
Agreed. Go 19x19 has around 180 legal actions on average, while in Reversi it may be around 10. So, following the new paper's inverse scaling, something like 10 × 0.03 ≈ 0.3 seems more reasonable.
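A sketch of the scaled root noise, assuming `priors` has already been restricted to the legal root moves; `EPSILON = 0.25` is the mixing weight from the paper, and `alpha ≈ 0.3` is just the 10 × 0.03 estimate above, not a tuned value:

```python
import numpy as np

EPSILON = 0.25  # root-noise weight, as in the AlphaZero paper

def add_root_noise(priors: np.ndarray, alpha: float = 0.3,
                   rng: np.random.Generator = None) -> np.ndarray:
    """Mix Dirichlet noise into the root priors:
    P(s, a) = (1 - eps) * p_a + eps * eta_a, with eta ~ Dir(alpha).
    alpha is scaled in inverse proportion to the typical
    number of legal moves in the game."""
    rng = rng or np.random.default_rng()
    noise = rng.dirichlet([alpha] * len(priors))
    return (1 - EPSILON) * priors + EPSILON * noise
```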
What is the main difference between AlphaGo Zero and AlphaZero?
Hi @apollo-time. I think the main differences, from pages 3~4 of the AlphaZero paper, are:

- AlphaZero estimates the expected game outcome (including draws), whereas AlphaGo Zero estimated and optimised the probability of winning.
- AlphaZero does not augment the training data with rotations and reflections of the board, since those symmetries do not hold in chess and shogi.
- AlphaZero maintains a single network that is updated continually, and self-play games are always generated with the latest parameters; there is no evaluation step that selects a "best player".
- AlphaZero reuses the same hyperparameters for all games, except that the Dirichlet noise α is scaled in inverse proportion to the typical number of legal moves.
So MCTS is also run without transforming the board position.
FYI: https://arxiv.org/abs/1712.01815