Gym Retro is a wrapper for video game emulator cores using the Libretro API to turn them into Gym environments. It includes support for multiple classic game consoles and a dataset of different games. It runs on Linux, macOS and Windows with Python 3.5 and 3.6 support.
Each game has files listing memory locations for in-game variables, reward functions based on those variables, episode end conditions, savestates at the beginning of levels and a file containing hashes of ROMs that work with these files. Please note that ROMs are not included and you must obtain them yourself.
Currently supported systems:
- Atari 2600 (via Stella)
- Sega Genesis/Mega Drive (via Genesis Plus GX)
See LICENSES.md for information on the licenses of the individual cores.
Gym Retro requires Python 3.5 or 3.6. Please make sure to install the appropriate distribution for your OS beforehand. Please note that due to compatibility issues with some of the cores, 32-bit operating systems are not supported.
Building Gym Retro requires at least either gcc 5 or clang 3.4.
If you are on macOS, you need 10.11 or newer. Also, since LuaJIT does not work properly on macOS, you must first install Lua 5.1 from Homebrew:
brew install pkg-config lua@5.1
pip3 install gym-retro
To build Gym Retro you must first install CMake.
You can do this through your package manager, by downloading it from the official site, or by running pip3 install cmake.
If you're using the official installer on Windows, make sure to tell CMake to add itself to the system PATH.
If you are not on Windows, please skip to the next section. Otherwise, you will also need to download and install Git and MSYS2 x86_64. When you install Git, choose to use Git from the Windows Command Prompt.
After you have installed MSYS2, open an MSYS2 MinGW 64-bit prompt (under Start > MSYS2 64bit) and run this command:
pacman -Sy make mingw-w64-x86_64-gcc
Once that's done, close the prompt, open a Git CMD prompt (under Start > Git) and run these commands. If you installed MSYS2 into an alternate directory, please use that instead of C:\msys64 in the command.
path %PATH%;C:\msys64\mingw64\bin;C:\msys64\usr\bin
set MSYSTEM=MINGW64
Then in the same prompt, without closing it first, continue with the steps in the next section. If you close the prompt you will need to rerun the last commands before you can rebuild.
git clone --recursive https://github.com/openai/retro.git gym-retro
cd gym-retro
pip3 install -e .
When doing a git pull, sometimes submodules will be updated. Usually this should be handled automatically, but in case of errors this can be quickly fixed by running the following steps before rebuilding:
git submodule deinit -f --all
rm -rf .git/modules
git submodule update --init
import retro
env = retro.make(game='SonicTheHedgehog-Genesis', state='GreenHillZone.Act1')
import retro
env = retro.make(game='SonicTheHedgehog-Genesis', state='GreenHillZone.Act1', record='.')
env.reset()
while True:
    _obs, _rew, done, _info = env.step(env.action_space.sample())
    if done:
        break
import retro
movie = retro.Movie('SonicTheHedgehog-Genesis-GreenHillZone.Act1-0000.bk2')
movie.step()
env = retro.make(game=movie.get_game(), state=retro.STATE_NONE, use_restricted_actions=retro.ACTIONS_ALL)
env.initial_state = movie.get_state()
env.reset()
while movie.step():
    keys = []
    for i in range(env.NUM_BUTTONS):
        keys.append(movie.get_key(i))
    _obs, _rew, _done, _info = env.step(keys)
python scripts/playback_movie.py SonicTheHedgehog-Genesis-GreenHillZone.Act1-0000.bk2
What environments are there?
import retro
retro.list_games()
What initial states are there?
import retro
for game in retro.list_games():
    print(game, retro.list_states(game))
In the examples directory there are example scripts.
random_agent.py loads up a given game and state file and picks random actions every step. It will print the current reward and will exit when the scenario is done. Note that it will throw an exception if no reward or scenario data is defined for that game. This script is useful to see if a scenario is properly set up and that the reward function isn't too generous.
There are a handful of distinct file formats used.
ROM files contain the game itself. Each system has a unique file extension to denote which system a given ROM runs on:
- .md: Sega Genesis (also known as Mega Drive)
- .a26: Atari 2600
Sometimes ROMs from these systems use different extensions, e.g. .gen for Genesis, .bin for Atari, etc. Please rename the ROMs to use the aforementioned extensions in these cases.
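The renaming step can be scripted. The following is a minimal sketch, not part of Gym Retro: rename_roms and EXTENSION_MAP are hypothetical names, and the mapping assumes that every .bin file in the directory really is an Atari 2600 ROM, which you should check before running it on your own collection.

```python
from pathlib import Path

# Alternate extension -> extension expected by the importer (from the text above).
# Assumption: every .bin file in the directory is an Atari 2600 ROM.
EXTENSION_MAP = {".gen": ".md", ".bin": ".a26"}


def rename_roms(directory):
    """Rename ROMs in `directory` to the expected extensions; return the new paths."""
    renamed = []
    for path in sorted(Path(directory).iterdir()):
        new_suffix = EXTENSION_MAP.get(path.suffix.lower())
        if not path.is_file() or new_suffix is None:
            continue
        target = path.with_suffix(new_suffix)
        if target.exists():
            continue  # never overwrite a file that is already there
        path.rename(target)
        renamed.append(target)
    return renamed
```

Files that already use .md or .a26 are left untouched, so the function can be run more than once on the same directory.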
You can import your ROMs using retro.import:
python -m retro.import /path/to/your/ROMs/directory/
The following non-commercial ROMs are included with Gym Retro for testing purposes:
- Dekadrive by Dekadence
- Automaton by Derek Ledbetter
- Airstriker by Electrokinesis
Emulation allows the entire state of a video game system to be stored to disk and restored. These files are specific to the emulator, but always end with .state. They are identical to the savestates used by the standalone versions of the emulators, except that they are gzipped.
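Since the files are gzipped, Python's standard gzip module should be enough to move between a .state file and the raw savestate bytes. This is a minimal sketch; load_state and save_state are hypothetical helper names rather than Gym Retro functions, and the contents of the decompressed bytes remain emulator-specific.

```python
import gzip


def load_state(path):
    """Return the decompressed savestate bytes stored in a gzipped .state file."""
    with gzip.open(path, "rb") as f:
        return f.read()


def save_state(path, raw):
    """Gzip raw emulator savestate bytes into a .state file."""
    with gzip.open(path, "wb") as f:
        f.write(raw)
```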
Information about the inner workings of games is stored alongside the ROM in a file named data.json. This JSON file documents "ground truth" information about a game, including the locations and formats of variables in memory. These manifests are separated into sections, although only one section currently is defined:
The info section of the manifest lists game variables' memory addresses. Each entry in the info section consists of a key naming the memory address and the following values:
- address: The address into a RAM array of the first byte of the variable.
- type: A type descriptor for this variable. See the addendum below for the format of this value.
The following manifest shows an example of a game that has one variable, score, located at byte 128 that is 4 bytes wide in unsigned big endian format:
{
  "info": {
    "score": {
      "address": 128,
      "type": ">u4"
    }
  }
}
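To make the manifest concrete, here is a minimal sketch of how such an entry could be used to pull a value out of a RAM dump. It is not how Gym Retro itself reads memory: the RAM contents are made up, read_big_endian_unsigned is a hypothetical helper, and only the ">u4" descriptor from the example is handled.

```python
import json

MANIFEST = """
{"info": {"score": {"address": 128, "type": ">u4"}}}
"""


def read_big_endian_unsigned(ram, address, size):
    """Decode `size` bytes starting at `address` as a big endian unsigned integer."""
    return int.from_bytes(ram[address:address + size], byteorder="big", signed=False)


entry = json.loads(MANIFEST)["info"]["score"]
assert entry["type"] == ">u4"  # this sketch only understands this one descriptor

# Made-up 256-byte RAM with the value 0x00012345 stored at byte 128.
ram = bytearray(256)
ram[128:132] = bytes([0x00, 0x01, 0x23, 0x45])

score = read_big_endian_unsigned(ram, entry["address"], 4)
print(score)  # 74565
```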
The types consist of three parts, in order:
- Endianness
- Format
- Bytes
Endianness refers to the order of the bytes in memory.
For example, take the hex string 0x01020304, which can be stored many ways:
- Big endian: 0x01 0x02 0x03 0x04
- Little endian: 0x04 0x03 0x02 0x01
- Middle endian (big outside/little inside): 0x02 0x01 0x04 0x03
- Middle endian (little outside/big inside): 0x03 0x04 0x01 0x02
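The four layouts can be reproduced with a few lines of Python. This is an illustrative sketch only; byte_layouts is a hypothetical helper, and it treats "outside" as the order of the two 16-bit halves and "inside" as the order of the bytes within each half, which matches the list above.

```python
def byte_layouts(value):
    """Return the four 4-byte layouts listed above for a 32-bit unsigned value."""
    big = value.to_bytes(4, "big")
    little = value.to_bytes(4, "little")
    # Big outside/little inside: halves stay in big endian order,
    # the two bytes inside each half are swapped.
    big_little = bytes([big[1], big[0], big[3], big[2]])
    # Little outside/big inside: halves are swapped,
    # the two bytes inside each half stay in big endian order.
    little_big = bytes([big[2], big[3], big[0], big[1]])
    return {
        "big": big,
        "little": little,
        "middle (big/little)": big_little,
        "middle (little/big)": little_big,
    }


for name, layout in byte_layouts(0x01020304).items():
    print(name, " ".join(f"0x{b:02x}" for b in layout))
```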
The following sigils correspond to the endiannesses:
- <: Little
- >: Big
- ><: Middle (big/little)
- <>: Middle (little/big)
- =: Native (little on most computers)
- >=: Middle (big/native)
- <=: Middle (little/native)
- |: Don't care (only useful for single-byte values)
NB: Middle endian is very rare, but some systems store 16-bit values in native endian and 32-bit values as two 16-bit values in big endian order. One such example is the emulator Genesis Plus GX. Thus, on a big endian system the format appears to be =u4 (aka >u4) when it appears as >=u4 on little endian systems. As such some data may require manual grooming.
Format refers to how a value is stored in memory.
For example, take the hex byte 0x81. It could mean four things in decimal:
- Unsigned: 129
- Signed: -127
- Binary-coded decimal: 81
- Low-nybble Binary-coded decimal: 1
NB: The nybbles 0xA-0xF cannot occur in binary-coded decimal.
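The four readings of a single byte can be checked with a short sketch. interpret_byte is a hypothetical helper written for illustration; it assumes two's complement for the signed reading and returns None where a nybble falls outside 0-9 and therefore is not valid binary-coded decimal.

```python
def interpret_byte(byte):
    """Interpret one byte (0-255) under each of the four formats described above."""
    high, low = byte >> 4, byte & 0x0F
    return {
        "unsigned": byte,
        # Two's complement: values with the top bit set are negative.
        "signed": byte - 256 if byte >= 0x80 else byte,
        # Each nybble is one decimal digit; 0xA-0xF are not valid digits.
        "bcd": high * 10 + low if high <= 9 and low <= 9 else None,
        # Only the low nybble carries a digit; the high nybble is ignored.
        "low_nybble_bcd": low if low <= 9 else None,
    }


print(interpret_byte(0x81))
# {'unsigned': 129, 'signed': -127, 'bcd': 81, 'low_nybble_bcd': 1}
```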
The following characters correspond to formats:
- i: Signed
- u: Unsigned
- d: Binary-coded Decimal
- n: Low-nybble Binary-coded Decimal
Finally, the last piece refers to how many bytes a value occupies in memory. Ideally, this should be a power of two, e.g. 1, 2, 4, 8, etc. However, non-power of two sizes are used by some games (e.g. the score in Super Mario Bros. is 6 bytes long), so non-power of two variables are supported.
NB: Native endian and middle endian don't work with non-power of two sizes or sizes less than 4 bytes. Currently only 4-byte middle endian is properly supported.
Some examples follow:
- <u2: Little endian two-byte unsigned value (i.e. 0x0102 -> 0x02 0x01)
- <>u4: Middle endian (little/big) four-byte unsigned value (i.e. 0x01020304 -> 0x03 0x04 0x01 0x02)
- >d2: Big endian two-byte binary-coded decimal value (i.e. 1234 -> 0x12 0x34)
- |u1: Single unsigned byte
- <u3: Little endian three-byte unsigned value, a non-power of two size (i.e. 0x010203 -> 0x03 0x02 0x01)
- =n2: Native endian two-byte low-nybble binary-coded decimal value (i.e. 12 -> 0x02 0x01 on Intel and most ARM CPUs, 0x01 0x02 on PowerPC CPUs)
Some non-examples:
- |i2: Valid but not recommended: Two signed bytes, order undefined
- <u1: Valid but not recommended: One byte has no order
- ?u4: Invalid: undefined endianness
- >q2: Invalid: undefined format
- =i0: Invalid: zero bytes
- ><u3: Invalid: Non-power of two middle endian bytes
- <=u2: Invalid: Middle endian does not make sense for two byte values
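The rules above can be condensed into a small validator. This is a sketch under stated assumptions, not the parser Gym Retro uses: parse_type is a hypothetical function, and the only size rules it enforces are "at least one byte" and "middle endian must be exactly 4 bytes", following the note that only 4-byte middle endian is properly supported.

```python
import re

# Two-character sigils come first so that "><" is not read as ">" followed by "<".
_DESCRIPTOR = re.compile(r"^(><|<>|>=|<=|<|>|=|\|)([iudn])(\d+)$")
_MIDDLE_ENDIAN = {"><", "<>", ">=", "<="}


def parse_type(descriptor):
    """Split a descriptor such as '>u4' into (endianness, format, size in bytes)."""
    match = _DESCRIPTOR.match(descriptor)
    if match is None:
        raise ValueError(f"unknown endianness or format in {descriptor!r}")
    endianness, fmt, size = match.group(1), match.group(2), int(match.group(3))
    if size == 0:
        raise ValueError("a value must occupy at least one byte")
    if endianness in _MIDDLE_ENDIAN and size != 4:
        raise ValueError("only 4-byte middle endian values are supported")
    return endianness, fmt, size


print(parse_type(">u4"))   # ('>', 'u', 4)
print(parse_type("<>u4"))  # ('<>', 'u', 4)
```

With these rules, all five descriptors marked invalid above raise ValueError, while the two "valid but not recommended" ones are accepted.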
Information pertaining to reward functions and done conditions can be specified either by manually overriding functions in retro.RetroEnv or by writing a scenario file. Scenario files contain information that is used to compute the reward function and done condition from variables defined in the information manifest. Each variable specified in the scenario file is multiplied by a reward value if positive and a penalty value if negative, and the results are summed up to create the reward for that step. Similarly, states of these variables can be checked to see if the game is over. By default the scenario file will be loaded from scenario.json, but alternative scenario files can be specified in the retro.RetroEnv constructor.
Scenario files are again JSON and specified with the following sections:
The reward section is used to calculate the reward function, and it is split into the following subsections:
The variables subsection is used for defining how to calculate the reward function from the current state of memory. For each variable in the variables section, a value is calculated, multiplied by a coefficient, then added to the reward function for this step. How a value is extracted is specified by the op/measurement/reference values (see the addendum below on operations for the meanings of these). The default measurement is delta. There is no default op, and by default the value is passed through raw.
- reward: A coefficient multiplied by the value when the value is positive.
- penalty: A coefficient multiplied by the value when the value is negative.
NB: A negative penalty would imply addition to the reward function instead of subtraction as the value to be multiplied by the coefficient is negative.
The time subsection is used for creating rewards based off of how many steps are taken. Two values can be specified:
- reward: A value to be added to the reward function every step.
- penalty: A value to be subtracted from the reward function every step.
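The arithmetic described for the variables and time subsections can be sketched as follows. This is an illustration rather than Gym Retro's implementation: step_reward is a hypothetical function, the scenario is a plain dict laid out as described above, the variable names are made up, ops are left out (values pass through raw), and a missing reward or penalty coefficient is assumed to be 0.

```python
def step_reward(scenario, previous, current):
    """Compute one step's reward from two snapshots of the game variables."""
    reward_cfg = scenario.get("reward", {})
    total = 0.0
    for name, spec in reward_cfg.get("variables", {}).items():
        # The default measurement in the reward section is delta.
        if spec.get("measurement", "delta") == "delta":
            value = current[name] - previous[name]
        else:
            value = current[name]
        if value > 0:
            total += value * spec.get("reward", 0.0)   # assumption: defaults to 0
        elif value < 0:
            total += value * spec.get("penalty", 0.0)  # negative value * coefficient
    time_cfg = reward_cfg.get("time", {})
    total += time_cfg.get("reward", 0.0)   # added every step
    total -= time_cfg.get("penalty", 0.0)  # subtracted every step
    return total


scenario = {
    "reward": {
        "variables": {"score": {"reward": 1.0}, "lives": {"penalty": 5.0}},
        "time": {"penalty": 0.5},
    }
}
print(step_reward(scenario, {"score": 100, "lives": 3}, {"score": 150, "lives": 2}))
# 44.5
```

In this example the score rises by 50 (+50), one life is lost (-1 times 5 gives -5) and the per-step time penalty removes another 0.5.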
The done section is used to calculate if the end of a game has been reached. At the top level the following property is available:
- condition: Specifies how the done conditions should be combined
  - any: Any of the conditions in the done section is fulfilled. This is the default.
  - all: All of the conditions in the done section are fulfilled.
Currently it has one subsection:
The variables subsection specifies how to calculate the done condition from the current state of memory. Each variable in the variables subsection is extracted per the op/measurement/reference values (see the addendum below on operations for the meanings of these). The default measurement is absolute. There is no default op, and by default the value is ignored.
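As an illustration of how the condition property and the variables subsection interact, here is a minimal sketch. is_done and the variable names are hypothetical, only three operations are implemented to keep it short, and a done section with no usable conditions is assumed to mean "not done".

```python
# A small subset of the operations, enough for the example below.
OPS = {
    "zero": lambda value, reference: value == 0,
    "equal": lambda value, reference: value == reference,
    "less-than": lambda value, reference: value < reference,
}


def is_done(scenario, previous, current):
    """Evaluate the done section against two snapshots of the game variables."""
    done_cfg = scenario.get("done", {})
    results = []
    for name, spec in done_cfg.get("variables", {}).items():
        op = spec.get("op")
        if op is None:
            continue  # no default op: the variable is ignored
        # The default measurement in the done section is absolute.
        if spec.get("measurement", "absolute") == "delta":
            value = current[name] - previous[name]
        else:
            value = current[name]
        results.append(OPS[op](value, spec.get("reference")))
    if not results:
        return False  # assumption: nothing to check means the game goes on
    combine = all if done_cfg.get("condition", "any") == "all" else any
    return combine(results)


variables = {"lives": {"op": "zero"}, "level": {"op": "equal", "reference": 3}}
snapshot = {"lives": 0, "level": 1}
print(is_done({"done": {"variables": variables}}, snapshot, snapshot))
# True
print(is_done({"done": {"condition": "all", "variables": variables}}, snapshot, snapshot))
# False
```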
Games can store information in memory in many different ways, and as such the specific information needed can vary in form too. The basic premise is that once a raw value is extracted from memory, an operation may be defined to transform it into a useful form. Furthermore, we may want raw values in a given step or the deltas between two steps. Thus three properties are defined:
- measurement: The method used for extracting the raw value. May be absolute for the current value or delta for the difference between the current and previous value. The default varies based on context.
- op: The specific operation to apply to this value. Valid operations are defined below.
- reference: The reference value for an operation, if needed.
The following operations are defined:
- nonzero: Returns 0 if the value is 0, 1 otherwise.
- zero: Returns 1 if the value is 0, 0 otherwise.
- positive: Returns 1 if the value is positive, 0 otherwise.
- negative: Returns 1 if the value is negative, 0 otherwise.
- sign: Returns 1 if the value is positive, -1 if the value is negative, 0 otherwise.
- equal: Returns 1 if the value is equal to the reference value, 0 otherwise.
- not-equal: Returns 1 if the value is not equal to the reference value, 0 otherwise.
- less-than: Returns 1 if the value is less than the reference value, 0 otherwise.
- greater-than: Returns 1 if the value is greater than the reference value, 0 otherwise.
- less-or-equal: Returns 1 if the value is less than or equal to the reference value, 0 otherwise.
- greater-or-equal: Returns 1 if the value is greater than or equal to the reference value, 0 otherwise.
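The operations translate almost directly into a lookup table. The sketch below mirrors the definitions above one for one; OPERATIONS and apply_op are hypothetical names used for illustration and are not part of the Gym Retro API.

```python
# Each operation takes the extracted value and the reference (unused by some).
OPERATIONS = {
    "nonzero": lambda v, ref: int(v != 0),
    "zero": lambda v, ref: int(v == 0),
    "positive": lambda v, ref: int(v > 0),
    "negative": lambda v, ref: int(v < 0),
    "sign": lambda v, ref: (v > 0) - (v < 0),
    "equal": lambda v, ref: int(v == ref),
    "not-equal": lambda v, ref: int(v != ref),
    "less-than": lambda v, ref: int(v < ref),
    "greater-than": lambda v, ref: int(v > ref),
    "less-or-equal": lambda v, ref: int(v <= ref),
    "greater-or-equal": lambda v, ref: int(v >= ref),
}


def apply_op(op, value, reference=None):
    """Apply the named operation to a value extracted from memory."""
    return OPERATIONS[op](value, reference)


print(apply_op("sign", -7))              # -1
print(apply_op("less-or-equal", 3, 3))   # 1
print(apply_op("nonzero", 0))            # 0
```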