The bipedal skills benchmark is a suite of reinforcement learning environments implemented for the MuJoCo physics simulator. It aims to provide a set of tasks that demand a variety of motor skills beyond locomotion, and is intended for evaluating skill discovery and hierarchical learning methods. The majority of tasks exhibit a sparse reward structure.
This benchmark was introduced in Hierarchical Skills for Efficient Exploration.
Running the environments requires a working MuJoCo setup (version 2.0 or higher). You can follow the installation steps of dm_control to set it up.
Afterwards, install the Python package with pip:
```sh
pip install bipedal-skills
```
To install the package from a working copy, do:
```sh
pip install .
```
All tasks are exposed and registered as Gym environments once the `bisk` module is imported:
```python
import gym
import bisk  # importing bisk registers the environments with Gym

env = gym.make('BiskHurdles-v1', robot='Walker')
# Alternatively
env = gym.make('BiskHurdlesWalker-v1')
```
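The two registration forms differ only in where the robot name goes: either as a `robot` keyword argument to a task-level ID, or folded into the ID itself. The sketch below builds both forms from a task and robot name. The helpers `bisk_env_id` and `bisk_env_kwargs` are hypothetical, and the `Bisk<Task><Robot>-v1` pattern is an assumption inferred from the single `BiskHurdlesWalker-v1` example, not a documented naming rule.

```python
# Hypothetical helpers: build environment IDs from a task and a robot name,
# assuming the 'Bisk<Task><Robot>-v<version>' pattern of 'BiskHurdlesWalker-v1'.
def bisk_env_id(task: str, robot: str, version: int = 1) -> str:
    """Combined form, e.g. 'BiskHurdlesWalker-v1'."""
    return f'Bisk{task}{robot}-v{version}'


def bisk_env_kwargs(task: str, robot: str, version: int = 1) -> tuple:
    """Task-level form: an (ID, kwargs) pair such as ('BiskHurdles-v1', {'robot': 'Walker'})."""
    return f'Bisk{task}-v{version}', {'robot': robot}


print(bisk_env_id('Hurdles', 'Walker'))      # BiskHurdlesWalker-v1
print(bisk_env_kwargs('Hurdles', 'Walker'))  # ('BiskHurdles-v1', {'robot': 'Walker'})
```

With Gym and bisk installed, the pair could be used as `env_id, kwargs = bisk_env_kwargs('Hurdles', 'Walker')` followed by `env = gym.make(env_id, **kwargs)`.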
A detailed description of the tasks can be found in the corresponding publication.
For evaluating agents, we recommend estimating returns on 50 environment instances with distinct seeds. This can be achieved sequentially or by using one of Gym's vector wrappers:
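```python
# Sequential evaluation
import gym
import bisk  # registers the Bisk environments

env = gym.make('BiskHurdlesWalker-v1')
retrns = []
for i in range(50):
    obs, _ = env.reset(seed=i)
    retrn = 0
    while True:
        # Retrieve `action` from agent; random actions serve as a placeholder
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        retrn += reward
        if terminated or truncated:
            # End of episode: store the accumulated return, not the last reward
            retrns.append(retrn)
            break
print(f'Average return: {sum(retrns)/len(retrns)}')
```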
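The sequential loop can also be packaged as a reusable function that additionally reports the standard error of the mean, which indicates how precise the 50-seed estimate is. This is a sketch, not part of the bisk package: `evaluate`, `mean_and_stderr` and `ToyEnv` are hypothetical names, and `ToyEnv` is only a stand-in with the same Gym >= 0.26 `reset`/`step` signature so the code runs without MuJoCo.

```python
import math
import random
from typing import Callable, List, Tuple


def evaluate(env, policy: Callable, n_episodes: int = 50) -> List[float]:
    """Run one episode per seed 0..n_episodes-1 and return the undiscounted returns.

    Assumes the Gym >= 0.26 API: reset(seed=...) -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    """
    returns = []
    for seed in range(n_episodes):
        obs, _ = env.reset(seed=seed)
        episode_return = 0.0
        while True:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            episode_return += reward
            if terminated or truncated:
                break
        returns.append(episode_return)
    return returns


def mean_and_stderr(values: List[float]) -> Tuple[float, float]:
    """Sample mean and standard error of the mean (sample std / sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    return mean, math.sqrt(var / n)


class ToyEnv:
    """Hypothetical stand-in: each episode lasts a seed-dependent number of
    steps (1 to 10) and yields a reward of 1.0 per step."""

    def reset(self, seed=None):
        self._remaining = random.Random(seed).randint(1, 10)
        return 0, {}

    def step(self, action):
        self._remaining -= 1
        return 0, 1.0, self._remaining == 0, False, {}


returns = evaluate(ToyEnv(), policy=lambda obs: None)
mean, stderr = mean_and_stderr(returns)
print(f'Average return: {mean:.2f} +/- {stderr:.2f} (standard error)')
```

To evaluate a real agent, `ToyEnv()` would be replaced by `gym.make('BiskHurdlesWalker-v1')` and the lambda by the agent's action selection.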
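```python
# Batched evaluation
import gym
import bisk  # registers the Bisk environments
import numpy as np
from gym.vector import SyncVectorEnv

n = 50
env = SyncVectorEnv([lambda: gym.make('BiskHurdlesWalker-v1')] * n)
retrns = np.zeros(n)
dones = np.zeros(n, dtype=bool)
# An integer seed is expanded to seeds 0, 1, ..., n-1 for the sub-environments
obs, _ = env.reset(seed=0)
while not dones.all():
    # Retrieve `action` from agent; random actions serve as a placeholder
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    # Finished sub-environments are reset automatically, so mask out rewards
    # of any episode after the first one
    retrns += reward * np.logical_not(dones)
    dones |= (terminated | truncated)
print(f'Average return: {retrns.mean()}')
```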
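The `dones` mask in the batched loop matters because vector environments start a new episode as soon as a sub-environment terminates, while the loop keeps stepping until all of them have finished. The NumPy-only sketch below emulates that autoreset behaviour with hypothetical episode lengths and a constant reward of 1.0 per step, so the correct first-episode return equals the episode length. It contrasts the masked accumulator with a naive one that keeps adding rewards from later episodes.

```python
import numpy as np

# Hypothetical episode lengths for 4 sub-environments; reward is 1.0 per step,
# so the return of each sub-environment's first episode equals its length.
episode_lengths = np.array([3, 5, 2, 4])
n = len(episode_lengths)
steps_in_episode = np.zeros(n, dtype=int)  # step counter of the current episode

retrns = np.zeros(n)  # masked accumulator, as in the batched evaluation loop
naive = np.zeros(n)   # unmasked accumulator, for comparison
dones = np.zeros(n, dtype=bool)
while not dones.all():
    reward = np.ones(n)
    steps_in_episode += 1
    terminated = steps_in_episode == episode_lengths
    truncated = np.zeros(n, dtype=bool)
    # Emulate autoreset: finished sub-environments immediately start a new episode
    steps_in_episode[terminated | truncated] = 0
    # The mask is applied before updating dones, so the final reward still counts
    retrns += reward * np.logical_not(dones)
    naive += reward
    dones |= (terminated | truncated)

print(retrns)  # [3. 5. 2. 4.]
print(naive)   # [5. 5. 5. 5.]
```

Without the mask, every sub-environment accumulates rewards for as long as the slowest one runs, which inflates the returns of short episodes.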
The bipedal skills benchmark is MIT licensed, as found in the LICENSE file.
Model definitions have been adapted from:
- Gym (HalfCheetah)
- dm_control (Walker, Humanoid)