Backgammon is a two-player game that combines the movement of checkers with the roll of the dice. The goal of each player is to move all of their checkers off the board.
This repository contains a Backgammon game implementation in OpenAI Gym.
Given the current state of the board, a roll of the dice, and the current player, it iteratively computes all the legal actions/moves that the current player can execute. The legal actions are generated in such a way that they use the highest possible number of dice for that state and player.
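The "highest number of dice" rule can be illustrated with a minimal sketch. It assumes the candidate plays have already been enumerated and that each play is a tuple with one (source, target) move per die used. The helper name keep_max_dice_plays and the sample candidates are hypothetical and are not part of this repository.

```python
def keep_max_dice_plays(candidate_plays):
    """Keep only the plays that use the largest number of dice.

    Each play is a tuple of (source, target) moves, one move per die used.
    """
    if not candidate_plays:
        return set()
    most_dice = max(len(play) for play in candidate_plays)
    return {play for play in candidate_plays if len(play) == most_dice}


# Hypothetical candidates: one play uses a single die, two plays use both dice.
candidates = {
    ((0, 1),),
    ((0, 1), (0, 3)),
    ((11, 14), (14, 15)),
}
legal = keep_max_dice_plays(candidates)
```

Here the single-die play is discarded because a play that uses both dice exists.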
git clone https://github.com/dellalibera/gym-backgammon.git
cd gym-backgammon/
pip3 install -e .
The encoding used to represent the state is inspired by the one used by Gerald Tesauro[1].
Type: Box(198)
Num | Observation | Min | Max |
---|---|---|---|
0 | WHITE - 1st point, 1st component | 0.0 | 1.0 |
1 | WHITE - 1st point, 2nd component | 0.0 | 1.0 |
2 | WHITE - 1st point, 3rd component | 0.0 | 1.0 |
3 | WHITE - 1st point, 4th component | 0.0 | 6.0 |
4 | WHITE - 2nd point, 1st component | 0.0 | 1.0 |
5 | WHITE - 2nd point, 2nd component | 0.0 | 1.0 |
6 | WHITE - 2nd point, 3rd component | 0.0 | 1.0 |
7 | WHITE - 2nd point, 4th component | 0.0 | 6.0 |
... | |||
92 | WHITE - 24th point, 1st component | 0.0 | 1.0 |
93 | WHITE - 24th point, 2nd component | 0.0 | 1.0 |
94 | WHITE - 24th point, 3rd component | 0.0 | 1.0 |
95 | WHITE - 24th point, 4th component | 0.0 | 6.0 |
96 | WHITE - BAR checkers | 0.0 | 7.5 |
97 | WHITE - OFF bar checkers | 0.0 | 1.0 |
98 | BLACK - 1st point, 1st component | 0.0 | 1.0 |
99 | BLACK - 1st point, 2nd component | 0.0 | 1.0 |
100 | BLACK - 1st point, 3rd component | 0.0 | 1.0 |
101 | BLACK - 1st point, 4th component | 0.0 | 6.0 |
... | |||
190 | BLACK - 24th point, 1st component | 0.0 | 1.0 |
191 | BLACK - 24th point, 2nd component | 0.0 | 1.0 |
192 | BLACK - 24th point, 3rd component | 0.0 | 1.0 |
193 | BLACK - 24th point, 4th component | 0.0 | 6.0 |
194 | BLACK - BAR checkers | 0.0 | 7.5 |
195 | BLACK - OFF bar checkers | 0.0 | 1.0 |
196 - 197 | Current player | 0.0 | 1.0 |
Encoding of a single point (it indicates the number of checkers in that point):
Checkers | Encoding |
---|---|
0 | [0.0, 0.0, 0.0, 0.0] |
1 | [1.0, 0.0, 0.0, 0.0] |
2 | [1.0, 1.0, 0.0, 0.0] |
>= 3 | [1.0, 1.0, 1.0, (checkers - 3.0) / 2.0] |
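The table above can be written as a small function. This is a sketch that follows the table only; the name encode_point is an assumption and may not match the function used in the repository.

```python
def encode_point(checkers):
    """Encode the number of checkers on one point as four features."""
    if checkers < 0:
        raise ValueError("checkers must be non-negative")
    if checkers == 0:
        return [0.0, 0.0, 0.0, 0.0]
    if checkers == 1:
        return [1.0, 0.0, 0.0, 0.0]
    if checkers == 2:
        return [1.0, 1.0, 0.0, 0.0]
    # Three or more: the 4th component grows with the extra checkers.
    return [1.0, 1.0, 1.0, (checkers - 3.0) / 2.0]
```

With all 15 checkers on a single point the 4th component is (15 - 3) / 2 = 6.0, which matches the Max column of the observation table.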
Encoding of BAR checkers:
Checkers | Encoding |
---|---|
0 - 14 | [bar_checkers / 2.0] |
Encoding of OFF bar checkers:
Checkers | Encoding |
---|---|
0 - 14 | [off_checkers / 15.0] |
Encoding of the current player:
Player | Encoding |
---|---|
WHITE | [1.0, 0.0] |
BLACK | [0.0, 1.0] |
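Putting the four encodings together gives the 198 features: 24 points x 4 components, plus BAR and OFF, for each color, plus 2 features for the current player. The sketch below is illustrative only. The function names, the argument layout, and the assumption that the "1st point" corresponds to board index 0 are not taken from the repository.

```python
WHITE, BLACK = 0, 1


def encode_point(checkers):
    # One unit per checker for the first three, then a scaled remainder.
    return [
        1.0 if checkers >= 1 else 0.0,
        1.0 if checkers >= 2 else 0.0,
        1.0 if checkers >= 3 else 0.0,
        (checkers - 3.0) / 2.0 if checkers >= 3 else 0.0,
    ]


def encode_state(points, bar, off, current_player):
    """points[color] is a list of 24 checker counts; bar/off map color -> count."""
    features = []
    for color in (WHITE, BLACK):
        for count in points[color]:
            features.extend(encode_point(count))
        features.append(bar[color] / 2.0)   # BAR checkers
        features.append(off[color] / 15.0)  # OFF checkers
    features.extend([1.0, 0.0] if current_player == WHITE else [0.0, 1.0])
    return features


# Starting position as drawn in this README (X = WHITE, O = BLACK).
white_points = [0] * 24
black_points = [0] * 24
for index, count in {5: 5, 7: 3, 12: 5, 23: 2}.items():
    white_points[index] = count
for index, count in {0: 2, 11: 5, 16: 3, 18: 5}.items():
    black_points[index] = count

state = encode_state(
    points={WHITE: white_points, BLACK: black_points},
    bar={WHITE: 0, BLACK: 0},
    off={WHITE: 0, BLACK: 0},
    current_player=BLACK,
)
```

The resulting list has 198 entries, with WHITE occupying indices 0 to 97, BLACK indices 98 to 195, and the current player the last two.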
The valid actions that an agent can execute depend on the current state and the roll of the dice. So, there is no fixed shape for the action space.
The reward is +1 if player WHITE wins, and 0 if player BLACK wins.
All the episodes/games start in the same starting position:
| 12 | 13 | 14 | 15 | 16 | 17 | BAR | 18 | 19 | 20 | 21 | 22 | 23 | OFF |
|--------Outer Board----------| |-------P=O Home Board--------| |
| X | | | | O | | | O | | | | | X | |
| X | | | | O | | | O | | | | | X | |
| X | | | | O | | | O | | | | | | |
| X | | | | | | | O | | | | | | |
| X | | | | | | | O | | | | | | |
|-----------------------------| |-----------------------------| |
| O | | | | | | | X | | | | | | |
| O | | | | | | | X | | | | | | |
| O | | | | X | | | X | | | | | | |
| O | | | | X | | | X | | | | | O | |
| O | | | | X | | | X | | | | | O | |
|--------Outer Board----------| |-------P=X Home Board--------| |
| 11 | 10 | 9 | 8 | 7 | 6 | BAR | 5 | 4 | 3 | 2 | 1 | 0 | OFF |
- One of the 2 players wins the game
- Episode length is greater than 10000
The method reset() returns:
- the player that will move first (0 for the WHITE player, 1 for the BLACK player)
- the first roll of the dice, a tuple with the dice rolled, e.g. (1, 3) for the BLACK player or (-1, -3) for the WHITE player
- observation features from the starting position
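The sign convention for the roll (negative dice for WHITE, positive dice for BLACK) can be sketched as follows. The helper roll_dice is hypothetical and only illustrates the convention; it does not model how the environment decides which player moves first.

```python
import random

WHITE, BLACK = 0, 1


def roll_dice(player, rng=random):
    """Roll two dice; WHITE gets negative values, BLACK positive ones."""
    first, second = rng.randint(1, 6), rng.randint(1, 6)
    return (-first, -second) if player == WHITE else (first, second)


rng = random.Random(0)  # seeded generator so the example is reproducible
white_roll = roll_dice(WHITE, rng)
black_roll = roll_dice(BLACK, rng)
```

A roll produced this way can be passed along with the player whose sign it carries, so the two never get out of sync.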
If render(mode = 'rgb_array') or render(mode = 'state_pixels') is selected, this is the output obtained (over multiple steps):
To run a simple example (both agents, WHITE and BLACK, select an action randomly):
cd examples/
python3 play_random_agent.py
An internal variable, current player, is used to keep track of the player in turn (it represents the color of the player).
To get all the valid actions:
actions = env.get_valid_actions(roll)
The legal actions are represented as a set of tuples.
Each action is a tuple of tuples, in the form ((source, target), (source, target)).
Each inner tuple represents a single move in the form (source, target).
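Since the legal actions form a set, a random agent only needs to pick one element. The sketch below is an assumption about how such an agent could be written and is not taken from the example script. The helper name choose_random_action is hypothetical, and returning None when the set is empty is a choice made for this sketch.

```python
import random


def choose_random_action(actions, rng=random):
    """Return one action from the set, or None when no legal action exists."""
    if not actions:
        return None
    # Order by repr so a seeded generator gives a reproducible choice.
    return rng.choice(sorted(actions, key=repr))


actions = {
    ((0, 1), (0, 3)),
    ((11, 14), (14, 15)),
    ((18, 19), (18, 21)),
}
action = choose_random_action(actions, random.Random(0))
```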
The actions of offering a double and accepting or rejecting a double are not available.
Given the following configuration (starting position, BLACK player in turn, roll = (1, 3)):
| 12 | 13 | 14 | 15 | 16 | 17 | BAR | 18 | 19 | 20 | 21 | 22 | 23 | OFF |
|--------Outer Board----------| |-------P=O Home Board--------| |
| X | | | | O | | | O | | | | | X | |
| X | | | | O | | | O | | | | | X | |
| X | | | | O | | | O | | | | | | |
| X | | | | | | | O | | | | | | |
| X | | | | | | | O | | | | | | |
|-----------------------------| |-----------------------------| |
| O | | | | | | | X | | | | | | |
| O | | | | | | | X | | | | | | |
| O | | | | X | | | X | | | | | | |
| O | | | | X | | | X | | | | | O | |
| O | | | | X | | | X | | | | | O | |
|--------Outer Board----------| |-------P=X Home Board--------| |
| 11 | 10 | 9 | 8 | 7 | 6 | BAR | 5 | 4 | 3 | 2 | 1 | 0 | OFF |
Current player=1 (O - Black) | Roll=(1, 3)
The legal actions are:
Legal Actions:
((11, 14), (14, 15))
((0, 1), (11, 14))
((18, 19), (18, 21))
((11, 14), (18, 19))
((0, 1), (0, 3))
((0, 1), (16, 19))
((16, 17), (16, 19))
((18, 19), (19, 22))
((0, 1), (18, 21))
((16, 17), (18, 21))
((0, 3), (18, 19))
((16, 19), (18, 19))
((16, 19), (19, 20))
((0, 1), (1, 4))
((16, 17), (17, 20))
((0, 3), (16, 17))
((18, 21), (21, 22))
((0, 3), (3, 4))
((11, 14), (16, 17))
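As a sanity check on the list above, each action should use both dice of the roll (1, 3) exactly once. The sketch below verifies this by comparing target - source with the dice values. It assumes BLACK moves toward higher point numbers, as the listed actions suggest, and it does not cover moves from the bar or bearing off.

```python
ROLL = (1, 3)

# The 19 legal actions listed above for the starting position.
LEGAL_ACTIONS = [
    ((11, 14), (14, 15)), ((0, 1), (11, 14)), ((18, 19), (18, 21)),
    ((11, 14), (18, 19)), ((0, 1), (0, 3)), ((0, 1), (16, 19)),
    ((16, 17), (16, 19)), ((18, 19), (19, 22)), ((0, 1), (18, 21)),
    ((16, 17), (18, 21)), ((0, 3), (18, 19)), ((16, 19), (18, 19)),
    ((16, 19), (19, 20)), ((0, 1), (1, 4)), ((16, 17), (17, 20)),
    ((0, 3), (16, 17)), ((18, 21), (21, 22)), ((0, 3), (3, 4)),
    ((11, 14), (16, 17)),
]


def dice_used(action):
    """Return the sorted distances travelled by the moves of an action."""
    return sorted(target - source for source, target in action)


all_use_both_dice = all(
    dice_used(action) == sorted(ROLL) for action in LEGAL_ACTIONS
)
```

Every action in the list moves one checker by 1 and one checker by 3 (possibly the same checker twice), so the check passes.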
The above description refers to backgammon-v0.
The state is represented by a (96, 96, 3) feature array. This is the only difference with respect to backgammon-v0.
An example of the board representation:
- [1] Implementation Details TD-Gammon
- [2] Practical Issues in Temporal Difference Learning
- Rules of Backgammon:
- Other Implementation of TD-Gammon and the game of Backgammon:
- Other Implementation of the Backgammon OpenAI Gym Environment: