GBRL_SB3 provides implementations for actor-critic algorithms based on Gradient Boosting Trees (GBT) within the stable_baselines3 RL package.
- Integration with GBRL: Leverages the GBRL library, optimized for reinforcement learning, to provide efficient GBT-based implementations capable of handling complex, high-dimensional RL tasks with millions of interactions.
- Implemented Algorithms: GBRL_SB3 includes GBT-based implementations of Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C), and Advantage Weighted Regression (AWR).
- High Performance in Structured Environments: GBRL-based implementations are competitive with NNs across a range of environments. In addition, as in supervised learning, GBRL outperforms NNs on categorical tasks.
GBRL_SB3 supports the following environments:
- Gymnasium environments [2]
- Atari-ram [3]
- Football [4]
- MiniGrid [5] including a custom categorical feature wrapper
This repository integrates GBRL with actor-critic algorithms.
GBRL's tree ensemble parameterizes the actor's policy and the critic's value function. At each training iteration, the algorithm collects a rollout and computes the gradient of the objective function. This gradient is then used to fit the next tree added to the ensemble. The process repeats, with each iteration fitting a new tree, refining the parameterization, and expanding the ensemble. The full process is illustrated in the following diagram:
The following results demonstrate the performance of PPO with GBRL compared to neural networks across various scenarios and environments:
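The fit-a-tree-to-the-gradient loop described above can be sketched in plain Python. This is only an illustration under strong assumptions: it is not the GBRL API, the "rollout" is a fixed toy dataset, the objective is a squared error rather than an actor-critic loss, and `fit_stump` and `predict` are hypothetical helpers written for this example.

```python
import random


def fit_stump(xs, targets):
    """Fit a depth-1 regression tree (stump) to `targets` on scalar inputs `xs`."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_val = sum(left) / len(left)
        right_val = sum(right) / len(right)
        err = sum((t - left_val) ** 2 for t in left)
        err += sum((t - right_val) ** 2 for t in right)
        if best is None or err < best[0]:
            best = (err, threshold, left_val, right_val)
    return best[1:]


def predict(ensemble, x, lr):
    """Sum the scaled outputs of every stump in the ensemble for one input."""
    return sum(lr * (lv if x <= th else rv) for th, lv, rv in ensemble)


random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(64)]  # stand-in for observations
ys = [1.0 if x > 0 else -1.0 for x in xs]            # stand-in learning signal

ensemble, lr, losses = [], 0.5, []
for _ in range(20):
    preds = [predict(ensemble, x, lr) for x in xs]
    losses.append(sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs))
    # Negative gradient of 0.5 * (pred - y)^2 with respect to the prediction.
    neg_grads = [y - p for p, y in zip(preds, ys)]
    # The new tree is fit to the descent direction and appended to the ensemble.
    ensemble.append(fit_stump(xs, neg_grads))

print(len(ensemble), losses[0], losses[-1])
```

Each iteration adds exactly one tree, so the ensemble grows with training while earlier trees stay fixed.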
- Docker 19 or newer.
- Access to the NVIDIA Docker Catalog. Visit the NGC website and follow the instructions. This grants access to the base docker image (from the Dockerfile) and the ability to run on NVIDIA GPUs using the nvidia runtime flag.
Building docker
docker build -f Dockerfile -t <your-image-name:tag> .
Running docker
docker run --runtime=nvidia -it <your-image-name:tag> /bin/bash
GBRL_SB3 is based on the GBRL library, stable_baselines3, and other popular Python libraries. To run GBRL_SB3 locally, install the necessary dependencies by running:
pip install -r requirements.txt
The Google Research Football installation is not part of the requirements because of additional non-Python dependencies, and it should be installed separately (see the gfootball repository).
For GPU support, GBRL looks for the CUDA_PATH or CUDA_HOME environment variables. If neither is found, GBRL will automatically compile only for CPU.
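Before installing, it can help to check which of these variables is set. The snippet below is a small sketch, not part of GBRL: `find_cuda_root` is a hypothetical helper, and checking CUDA_PATH before CUDA_HOME is an assumption about precedence that the text above does not specify.

```python
import os


def find_cuda_root(environ=os.environ):
    """Return the first non-empty value among CUDA_PATH and CUDA_HOME, else None."""
    for name in ("CUDA_PATH", "CUDA_HOME"):  # lookup order is an assumption
        value = environ.get(name)
        if value:
            return value
    return None


cuda_root = find_cuda_root()
print("CUDA toolkit root:", cuda_root or "not set (expect a CPU-only build)")
```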
Verify that the GPU is visible by running:
import gbrl
gbrl.cuda_available()
OPTIONAL
For GBRL tree visualization, make sure graphviz is installed before installing GBRL.
A general training script is located at scripts/train.py.
A configuration YAML with default (untuned) hyperparameters is provided at config/defaults.yaml.
Valid CLI arguments are listed in config/args.py.
Example, running from the project root directory:
python3 scripts/train.py --algo_type=ppo_gbrl --batch_size=512 --clip_range=0.2 --device=cuda --ent_coef=0 --env_name=MiniGrid-Unlock-v0 --env_type=minigrid --gae_lambda=0.95 --gamma=0.99 --grow_policy=oblivious --n_epochs=20 --n_steps=256 --num_envs=16 --policy_lr=0.17 --total_n_steps=1000000 --value_lr=0.01
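When launching many runs, the same command can be assembled from a dictionary of hyperparameters. The sketch below only reuses the flags shown in the example above; `build_command` is a hypothetical helper and is not part of the repository.

```python
import shlex

# Flags and values copied from the example command above.
hyperparams = {
    "algo_type": "ppo_gbrl", "batch_size": 512, "clip_range": 0.2,
    "device": "cuda", "ent_coef": 0, "env_name": "MiniGrid-Unlock-v0",
    "env_type": "minigrid", "gae_lambda": 0.95, "gamma": 0.99,
    "grow_policy": "oblivious", "n_epochs": 20, "n_steps": 256,
    "num_envs": 16, "policy_lr": 0.17, "total_n_steps": 1000000,
    "value_lr": 0.01,
}


def build_command(script, params):
    """Turn a dict of hyperparameters into a `--key=value` argument list."""
    return ["python3", script] + [f"--{key}={value}" for key, value in params.items()]


command = build_command("scripts/train.py", hyperparams)
print(shlex.join(command))  # shell-quoted form, ready to copy into a terminal
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the project root to start training.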
For tracking with Weights & Biases, use the CLI args:
- project=<project_name>
- wandb=true
- group_name=<group_name>
- run_name=<run_name>
Exact training reproduction with GBRL is not possible because GPU training is non-deterministic, owing to the non-deterministic nature of floating-point summation. However, running the training scripts with the reported hyperparameters is expected to produce similar results.
Experiment scripts are located at experiments/.
Running is done via a bash script per algorithm per environment with the following two arguments: scenario_name and seed. For example, the run command for GBRL PPO on CartPole-v1 with seed=0 is:
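The underlying reason is that floating-point addition is not associative, so a sum depends on the order in which its terms are combined. The minimal CPU-only illustration below shows the effect with three numbers; on a GPU, parallel reductions may combine terms in an order that varies between runs, which is assumed here to be the source of the run-to-run differences.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # add a and b first
right = a + (b + c)  # add b and c first

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```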
experiments/gym/ppo_gbrl.sh CartPole-v1 0
- SAC
- DQN
[1] Raffin et al. Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1–8, 2021. URL http://jmlr.org/papers/v22/20-1364.html.
[2] Towers et al. Gymnasium, March 2023. URL https://zenodo.org/record/8127025.
[3] Bellemare et al. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, June 2013. ISSN 1076-9757. doi: 10.1613/jair.3912. URL http://dx.doi.org/10.1613/jair.3912.
[4] Kurach et al. Google research football: A novel reinforcement learning environment, 2020.
[5] Chevalier-Boisvert et al. Minigrid & miniworld: Modular & customizable reinforcement learning environments for goal-oriented tasks. CoRR, abs/2306.13831, 2023.
@article{gbrl,
title={Gradient Boosting Reinforcement Learning},
author={Benjamin Fuhrer and Chen Tessler and Gal Dalal},
year={2024},
eprint={2407.08250},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2407.08250},
}
Copyright © 2024, NVIDIA Corporation. All rights reserved.
This work is made available under the NVIDIA Source Code License-NC. Click here to view a copy of this license.
For license information regarding the stable_baselines3 repository, please refer to [its repository](https://github.com/DLR-RM/stable-baselines3).