Work in progress.
- Deep reinforcement learning (RL) implementations using TensorFlow Probability, focusing on agents that use recurrent neural networks (RNNs).
- Compatible with Keras' Functional API.
- Currently implemented and optimized for a non-distributed setup (i.e., a single CPU and/or GPU), though this may change in the future. A quick benchmark of the recurrent PPO algorithm in the Atari environments (a single processor plus GPU, with 32 parallel environments) shows that it processes roughly 6-12M frames per hour, or approximately 1700-3300 frames per second (FPS).
- Python RL environments (e.g., Gym(nasium) environments such as Classic Control and Atari) can be run on the TF graph, allowing the complete agent-environment interaction loop to run non-eagerly; see Driver. A minimal sketch of the idea follows after this list.
- Hybrid action spaces (see the code example further below).
- A PPO algorithm that deals with partial observability is implemented (`RecurrentPPOAgent`). `RecurrentPPOAgent` uses stateful RNNs to pass hidden states between time steps, allowing the agent to make decisions based on past states as well as the current state (Figure B). This contrasts with typical PPO implementations, wherein the agent makes decisions based on the current state only (Figure A). The use of hidden states is a clever way to pass experiences through time. One limitation of this approach, however, is that the hidden states correspond to incomplete trajectories (chunks of trajectories) for each training iteration, a limitation that becomes especially pronounced for longer episodes and for off-policy RL (using experience replay). For further reading, see the R2D2 paper. A sketch of the hidden-state mechanism also follows after this list.
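To make the graph-mode environment feature concrete, here is a minimal sketch using plain Gymnasium and `tf.numpy_function` directly, rather than the library's actual Driver and Environment classes:

```python
import gymnasium as gym
import numpy as np
import tensorflow as tf

# A minimal sketch of the underlying idea (not the library's actual
# Driver/Environment API): wrap a Python environment's step in
# tf.numpy_function so the interaction loop can run inside a tf.function.
env = gym.make("CartPole-v1")
env.reset(seed=42)

def _py_step(action):
    obs, reward, terminated, truncated, _ = env.step(int(action))
    return (obs.astype(np.float32),
            np.float32(reward),
            np.bool_(terminated or truncated))

@tf.function
def tf_step(action):
    obs, reward, done = tf.numpy_function(
        _py_step, [action], (tf.float32, tf.float32, tf.bool))
    obs.set_shape((4,))  # CartPole's observation is a 4-vector
    return obs, reward, done

obs, reward, done = tf_step(tf.constant(1))  # Python env driven from the graph
```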
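Similarly, the hidden-state mechanism behind `RecurrentPPOAgent` can be illustrated with a plain Keras GRU cell; this is hypothetical code, not the library's `StatefulRNN` API:

```python
import tensorflow as tf

# A hypothetical sketch with a plain Keras GRU cell (not the library's
# StatefulRNN): the hidden state is explicitly carried from one time step
# to the next, so the policy conditions on past observations as well.
cell = tf.keras.layers.GRUCell(64)

batch_size, obs_dim = 32, 8
state = [tf.zeros((batch_size, 64))]  # initial hidden state

for t in range(5):  # a toy agent-environment loop
    obs = tf.random.normal((batch_size, obs_dim))  # placeholder observations
    features, state = cell(obs, state)  # new state is reused at the next step
    # ... `features` would feed the policy's action/value heads here
```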
- Agents
  - `RecurrentPPOAgent` - a PPO agent that handles partial observability via stateful RNNs.
- Layers
  - `DenseNormal` - for continuous actions.
  - `DenseCategorical` - for categorical actions.
  - `DenseBernoulli` - for binary actions.
  - `StatefulRNN` - for passing information between time steps.
- Distributions
  - `BoundedNormal` - a bounded normal distribution, inheriting from `TransformedDistribution`.
- Environments
  - `Environment` - an abstract environment that wraps a gym environment's `reset` and `step` in `tf.numpy_function` and converts their output to a `Timestep`.
  - `AsyncEnvironment` - allows multiple independent environments to run in parallel; inherits from `Environment`.
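As an aside, the general technique behind a bounded normal distribution can be sketched with TensorFlow Probability's `TransformedDistribution`; the library's `BoundedNormal` may differ in its exact parameterization:

```python
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

# A sketch of the general technique (the actual BoundedNormal may be
# parameterized differently): squash a Normal into (low, high) by chaining
# Tanh with an affine rescaling. Chain applies bijectors right-to-left.
low, high = 0.0, 2.0
base = tfd.Normal(loc=0.0, scale=1.0)
squash = tfb.Chain([
    tfb.Shift((high + low) / 2.0),  # 3) recenter onto [low, high]
    tfb.Scale((high - low) / 2.0),  # 2) rescale (-1, 1) to the target width
    tfb.Tanh(),                     # 1) squash the real line into (-1, 1)
])
bounded = tfd.TransformedDistribution(base, bijector=squash)
samples = bounded.sample(5)  # all samples lie in (0, 2)
```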
For hybrid action spaces, simply combine action layers:

```python
from keras import Model
from reinforceable import layers

# ...
action_1 = layers.DenseNormal((2,), [-1., 1.])(x)  # continuous action, dim=2
action_2 = layers.DenseCategorical((10,))(x)       # discrete action, n=10
policy_network = Model(inputs, (action_1, action_2))
# ...
```
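Since each action layer is backed by TensorFlow Probability, the policy network presumably outputs one distribution per head; sampling both heads then yields the continuous and discrete parts of a single hybrid action.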
- Python (3.10)
- tensorflow (2.13.0)
- tensorflow-probability (0.20.1)
- gymnasium[all] (0.26.2)
For Atari environments, Atari ROMs need to be installed. See here.
With SSH:

```
git clone git@github.com:akensert/reinforceable.git
cd reinforceable  # assumed step: pip install -e . must run from the repo root
pip install -e .
```

With HTTPS:

```
git clone https://github.com/akensert/reinforceable.git
cd reinforceable
pip install -e .
```
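To verify the installation, a quick import check should suffice:

```
python -c "import reinforceable"
```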