Deep reinforcement learning implementations with TensorFlow and TensorFlow probability.

akensert/reinforceable

Reinforceable

Status

Work in progress.

Applications

About

  • Deep reinforcement learning (RL) implementations built on TensorFlow Probability, with a focus on agents that use recurrent neural networks (RNNs).

  • Compatible with Keras' Functional API.

  • Currently implemented and optimized for a non-distributed setup (i.e., a single CPU and/or GPU), although this may change in the future.

A quick benchmark of the recurrent PPO algorithm in the Atari environments (using a single processor plus a GPU, and 32 parallel environments) shows that it processes roughly 6-12M frames per hour, or approximately 1700-3300 frames per second (FPS).
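The conversion behind those figures is plain arithmetic and can be checked with a short sketch. The per-environment rate at the end assumes that the reported total is aggregated over all 32 parallel environments, which the benchmark description suggests but does not state outright.

```python
SECONDS_PER_HOUR = 3600
NUM_ENVS = 32  # parallel environments used in the benchmark


def frames_per_second(frames_per_hour: float) -> float:
    """Convert a throughput in frames per hour to frames per second."""
    return frames_per_hour / SECONDS_PER_HOUR


low = frames_per_second(6_000_000)
high = frames_per_second(12_000_000)
print(round(low), round(high))  # 1667 3333

# Assumption: the total is summed over all parallel environments.
print(round(low / NUM_ENVS), round(high / NUM_ENVS))  # 52 104
```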

Highlights

  • Python RL environments (e.g., Gym(nasium) environments such as Classic Control and Atari) can be run on the TF graph, allowing the complete agent-environment interaction loop to run non-eagerly. See Driver.

  • Hybrid action spaces.

  • A PPO algorithm that deals with partial observability is implemented (RecurrentPPOAgent). RecurrentPPOAgent uses stateful RNNs to pass hidden states between time steps, allowing the agent to make decisions based on past states as well as the current state (Figure B). This contrasts with typical PPO implementations, in which the agent makes decisions based on the current state only (Figure A).

[Figure: PPO. (A) typical PPO; (B) recurrent PPO.]

The use of hidden states is a clever way to pass experiences through time. One limitation of this approach, however, is that in each training iteration the hidden states correspond to incomplete trajectories (chunks of trajectories). This limitation is most pronounced for longer episodes and for off-policy RL (using experience replay). For further reading, see the R2D2 paper.
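A minimal sketch of why chunked hidden states can become a limitation, using a toy scalar RNN in plain Python rather than the library's agent. The cell, its weights, the observations, and the chunk size are all hypothetical. It illustrates one reading of the issue: hidden states stored at chunk boundaries reproduce the full trajectory only while the weights are unchanged, and go stale once the network is updated.

```python
import math


def rnn_step(h, x, w_h=0.5, w_x=1.0):
    """Toy scalar RNN cell (hypothetical): h' = tanh(w_h * h + w_x * x)."""
    return math.tanh(w_h * h + w_x * x)


def run(xs, h0=0.0, **weights):
    """Unroll the cell over xs and return the hidden state after each step."""
    hs, h = [], h0
    for x in xs:
        h = rnn_step(h, x, **weights)
        hs.append(h)
    return hs


observations = [0.1, -0.3, 0.7, 0.2, -0.5, 0.4]
chunk_size = 3

# Collect the trajectory once and store the hidden state at each chunk start.
full = run(observations)
starts = range(0, len(observations), chunk_size)
chunks = [observations[i:i + chunk_size] for i in starts]
stored_h0 = [0.0 if i == 0 else full[i - 1] for i in starts]

# With unchanged weights, replaying chunk by chunk matches the full unroll.
replayed = [h for c, h0 in zip(chunks, stored_h0) for h in run(c, h0)]
print(replayed == full)  # True

# After a weight update, the stored states no longer match what the updated
# network would have produced, so later chunks start from stale states.
new_weights = {"w_h": 0.8, "w_x": 1.0}
fresh = run(observations, **new_weights)
stale = [h for c, h0 in zip(chunks, stored_h0) for h in run(c, h0, **new_weights)]
print(stale[:chunk_size] == fresh[:chunk_size])  # True: first chunk starts at zero
print(stale[chunk_size] == fresh[chunk_size])    # False: second chunk is stale
```

The longer the episode, or the older the replayed data, the more chunks start from such outdated states.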

Implementations

For hybrid action spaces, just combine action layers:

from keras import Model
from reinforceable import layers
# ... 
action_1 = layers.DenseNormal((2,), [-1., 1.])(x)   # continuous action, dim=2
action_2 = layers.DenseCategorical((10,))(x)        # discrete action, n=10
policy_network = Model(inputs, (action_1, action_2))
# ...
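As a rough illustration of what a hybrid action involves, the sketch below samples one continuous and one discrete sub-action and sums their log-probabilities, which is how independent action heads are commonly combined. It is plain Python, not the library's API: the helper functions and distribution parameters are hypothetical, the [-1, 1] bounds of the continuous head are ignored, and whether reinforceable combines the heads this way is an assumption.

```python
import math
import random


def normal_log_prob(x, mean, std):
    """Log-density of a univariate normal distribution at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def categorical_log_prob(index, probs):
    """Log-probability of a category under a categorical distribution."""
    return math.log(probs[index])


rng = random.Random(0)

# Hypothetical outputs of the two action heads for a single state.
mean, std = [0.2, -0.4], [0.5, 0.5]  # continuous head, dim=2
probs = [0.1] * 10                   # discrete head, n=10 (uniform here)

continuous = [rng.gauss(m, s) for m, s in zip(mean, std)]
discrete = rng.choices(range(len(probs)), weights=probs)[0]

# Assuming the heads are independent given the state, the joint
# log-probability of the hybrid action is the sum over all components.
continuous_lp = sum(normal_log_prob(x, m, s) for x, m, s in zip(continuous, mean, std))
discrete_lp = categorical_log_prob(discrete, probs)
joint_lp = continuous_lp + discrete_lp
```

A joint log-probability of this kind is the quantity a PPO-style probability ratio would be computed from when the action has several components.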

Examples

See examples/example.ipynb.

Dependencies

  • Python (3.10)
    • tensorflow (2.13.0)
    • tensorflow-probability (0.20.1)
    • gymnasium[all] (0.26.2)

For Atari environments, the Atari ROMs need to be installed. See here.

Installation

With SSH:

git clone git@github.com:akensert/reinforceable.git
cd reinforceable
pip install -e .

With HTTPS:

git clone https://github.com/akensert/reinforceable.git
cd reinforceable
pip install -e .
