Training alpha_zero_torch on Backgammon and other games #1089

Gedol · 2023-06-26T06:45:51Z

Hi, we tried training alpha_zero_torch on Backgammon (and some other games) and ran into a libtorch error.

After building OpenSpiel from source:

./build/examples/alpha_zero_torch_example --game=backgammon

terminate called after throwing an instance of 'c10::Error'
what(): from is out of bounds for float
...

Full repro steps here: https://colab.research.google.com/drive/1D1L5PDSg0if6H2txuwbGYWxKQW5eQnIT#scrollTo=UDFmr_YVLifX
(see the last line for the error message)

The line that breaks is:
https://github.com/deepmind/open_spiel/blob/master/open_spiel/algorithms/alpha_zero_torch/model.cc#L308

full stack trace: https://etherpad.wikimedia.org/p/azas_os_trace

When we run
./build/examples/alpha_zero_torch_example --game=chess
we do not get this error, and training appears to work.

Long story short, the model set up by alpha_zero_torch uses the game's observation_tensor_shape to build the model config; see these lines:
https://github.com/deepmind/open_spiel/blob/master/open_spiel/algorithms/alpha_zero_torch/model.cc#L275-L277
The observation_tensor_shape needs to have 3 nonzero values for a game to work with alpha_zero_torch (as Chess does). For other games (Backgammon, gin_rummy, pig, etc., and all the non-deterministic games we tried) the observation_tensor_shape does not have 3 nonzero values, which produces the error above. More details are below. Do you have suggestions for how to fix this? Should the alpha_zero_torch code be modified, or are all games supposed to implement an ObservationTensorShape() that returns a 3-valued vector? Or something else?

Thanks!

UPDATE 6/28: We realized that, in our setup, the command "./build/examples/alpha_zero_torch_example --game=backgammon --nn_model=mlp" does not give this error, so the error is specific to the default resnet model (https://github.com/deepmind/open_spiel/blob/master/open_spiel/algorithms/alpha_zero_torch/model.cc#L274).
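
One way to guard against this before launching a run is to look at the observation shape first and fall back to the MLP when it is not a 3-value shape with all entries nonzero. The following is only a sketch under assumptions: pick_nn_model is a hypothetical helper (not part of OpenSpiel), the shape is passed in as a plain list (for example, as obtained from pyspiel's game.observation_tensor_shape()), and the example shapes are made up for illustration.

```python
def pick_nn_model(observation_shape):
    """Suggest a value for --nn_model based on the observation shape.

    Hypothetical helper: returns "resnet" only when the shape has exactly
    three nonzero entries (channels, height, width), otherwise "mlp".
    """
    dims = list(observation_shape)
    if len(dims) == 3 and all(d > 0 for d in dims):
        return "resnet"
    return "mlp"


# Illustrative shapes only, not values read from OpenSpiel.
board_like = [20, 8, 8]   # three nonzero values
flat_vector = [200]       # a single 1D vector

print(pick_nn_model(board_like))
print(pick_nn_model(flat_vector))
```

The returned string could then be passed as --nn_model when starting alpha_zero_torch_example.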

--- details ---

If we print out the output_config passed to libtorch (line 308 of model.cc: https://github.com/deepmind/open_spiel/blob/master/open_spiel/algorithms/alpha_zero_torch/model.cc#L308),
for Chess it is:
$1 = {input_channels = 128, value_filters = 1, policy_filters = 2, kernel_size = 1, padding = 0, value_linear_in_features = 64, value_linear_out_features = 128, policy_linear_in_features = 128,
policy_linear_out_features = 4672, value_observation_size = 64, policy_observation_size = 128}

while for backgammon it is:
$1 = {input_channels = 128, value_filters = 1, policy_filters = 2, kernel_size = 1, padding = 0, value_linear_in_features = 0, value_linear_out_features = 128, policy_linear_in_features = 0,
policy_linear_out_features = 241, value_observation_size = 0, policy_observation_size = 0}

Notice that value_linear_in_features, policy_linear_in_features, value_observation_size, and policy_observation_size are all zero; we think that is what causes the "out of bounds for float" error in libtorch. They are 0 because the height and width variables are 0 here:
https://github.com/deepmind/open_spiel/blob/master/open_spiel/algorithms/alpha_zero_torch/model.cc#L277
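
The printed values are consistent with the head input sizes being computed as filters * height * width (64 = 1 * 8 * 8 and 128 = 2 * 8 * 8 for Chess). The sketch below reproduces that arithmetic in Python. It is an inference from the dumps above, not a copy of model.cc; head_sizes is a hypothetical name, and the 8x8 plane for Chess is an assumption.

```python
def head_sizes(height, width, value_filters=1, policy_filters=2):
    """Compute the linear-layer input sizes of the value and policy heads.

    Assumed relationship (inferred from the printed configs):
    in_features = filters * height * width.
    """
    return {
        "value_linear_in_features": value_filters * height * width,
        "policy_linear_in_features": policy_filters * height * width,
    }


chess_like = head_sizes(height=8, width=8)   # assumed 8x8 board plane
flat_like = head_sizes(height=0, width=0)    # height and width read as 0

print(chess_like)
print(flat_like)
```

With height and width at 0, every product collapses to 0, which matches the Backgammon config shown above.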

We believe these are set here:
https://github.com/deepmind/open_spiel/blob/1300e75223b2969816542668bc21f4a76eb3b318/open_spiel/algorithms/alpha_zero_torch/vpnet.cc#L100

Here is ObservationTensorShape() for Backgammon, which results in width being set to 0:
https://github.com/deepmind/open_spiel/blob/master/open_spiel/games/backgammon.h#L307

For comparison, here is the same method for Chess:
https://github.com/deepmind/open_spiel/blob/1300e75223b2969816542668bc21f4a76eb3b318/open_spiel/games/chess.h#L49

@alexjshim

Answered by lanctot

lanctot (Maintainer) · 2023-07-16T11:48:43Z

Oh sorry, I don't check this tab often and completely missed this. I am away now and will respond in a few days.

lanctot (Maintainer) · 2023-07-16T12:01:19Z

Ah yes, this makes sense, I think: the backgammon observation tensor is just a 1D vector, but I suspect the resnet model expects a 2D observation since it uses conv nets (which would explain why it requires 3 nonzero values in the observation shape).

If you want to use a resnet, you will likely need to modify the observation tensor to be 2D.
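
As a rough illustration of what "making the observation 2D" could look like, the sketch below zero-pads a flat vector and lays it out as a single-channel [1, height, width] grid. This is only one possible layout under assumptions: to_chw is a hypothetical helper, the width is chosen arbitrarily, and the input is toy data rather than a real Backgammon observation.

```python
import math


def to_chw(flat, width):
    """Zero-pad a flat observation and reshape it to [1, height, width]."""
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(flat) / width)
    # Pad with zeros so the vector fills the grid exactly.
    padded = list(flat) + [0.0] * (height * width - len(flat))
    rows = [padded[r * width:(r + 1) * width] for r in range(height)]
    return [rows]  # one channel


# Toy data: 10 values laid out on a grid of width 4.
obs = [float(i) for i in range(10)]
grid = to_chw(obs, width=4)
shape = (len(grid), len(grid[0]), len(grid[0][0]))
print(shape)  # (1, 3, 4)
```

In the C++ game itself, the equivalent change would presumably mean having ObservationTensorShape() report the matching 3-value shape. A layout that keeps related features next to each other (rather than arbitrary padding) is likely to suit convolutions better, but that is a design choice to evaluate per game.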
