The Bipedal Skills Benchmark

The bipedal skills benchmark is a suite of reinforcement learning environments implemented for the MuJoCo physics simulator. It aims to provide a set of tasks that demand a variety of motor skills beyond locomotion, and is intended for evaluating skill discovery and hierarchical learning methods. The majority of tasks exhibit a sparse reward structure.

This benchmark was introduced in the paper Hierarchical Skills for Efficient Exploration.

Usage

Running the environments requires a working MuJoCo setup (version 2.0 or higher). The installation steps in the dm_control documentation describe how to set it up.

Afterwards, install the Python package with pip:

pip install bipedal-skills

To install the package from a local clone of the repository instead, run:

pip install .

All tasks are exposed and registered as Gym environments once the bisk module is imported:

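import gym
import bisk  # importing bisk registers the Bisk environments with Gym

env = gym.make('BiskHurdles-v1', robot='Walker')
# Equivalently, with the robot encoded in the environment ID
env = gym.make('BiskHurdlesWalker-v1')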
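
The two calls above differ only in where the robot is specified. When looping over several robots in an experiment script, the naming pattern can be captured in a small helper. This is a hypothetical sketch: `bisk_env_id` is not part of the bisk package, and it assumes that every task/robot combination follows the `Bisk<Task><Robot>-v<N>` pattern seen in the example above.

```python
from typing import Optional


def bisk_env_id(task: str, robot: Optional[str] = None, version: int = 1) -> str:
    """Hypothetical helper: build a Bisk environment ID.

    Assumes IDs look like 'Bisk<Task>-v<N>' (robot passed to gym.make as a
    keyword argument) or 'Bisk<Task><Robot>-v<N>' (robot encoded in the ID),
    following the 'BiskHurdles-v1' / 'BiskHurdlesWalker-v1' examples.
    """
    robot_part = robot if robot is not None else ''
    return f'Bisk{task}{robot_part}-v{version}'


print(bisk_env_id('Hurdles', 'Walker'))  # BiskHurdlesWalker-v1
```

For instance, `gym.make(bisk_env_id('Hurdles', 'Walker'))` should then be equivalent to `gym.make('BiskHurdles-v1', robot='Walker')`.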

A detailed description of the tasks can be found in the corresponding publication.

Evaluation Protocol

For evaluating agents, we recommend estimating returns on 50 environment instances with distinct seeds. This can be achieved sequentially or with one of Gym's vector wrappers:

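# Sequential evaluation
import gym
import bisk  # registers the Bisk environments with Gym

env = gym.make('BiskHurdlesWalker-v1')
retrns = []
for i in range(50):
    obs, _ = env.reset(seed=i)
    retrn = 0.0
    while True:
        # Retrieve `action` from the agent; random actions serve as a placeholder
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        retrn += reward
        if terminated or truncated:
            # End of episode: record the accumulated return, not the last reward
            retrns.append(retrn)
            break
print(f'Average return: {sum(retrns)/len(retrns)}')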
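
For reuse across agents, the sequential protocol can be wrapped in a function that takes a policy callable. The sketch below is not part of bisk: `evaluate`, `summarize` and `StubEnv` are hypothetical names, and `StubEnv` is a toy stand-in that only mimics the `reset`/`step` signatures used above (its episode length depends on the seed), so the loop can be checked without MuJoCo. It also reports the standard error of the mean, which the protocol itself does not prescribe.

```python
import math
import statistics
from typing import Any, Callable, List, Optional, Tuple


class StubEnv:
    """Hypothetical stand-in for a Bisk environment (no MuJoCo required).

    Mimics the Gym API used above: reset(seed=...) -> (obs, info) and
    step(action) -> (obs, reward, terminated, truncated, info).
    Every step yields reward 1.0; an episode lasts 1 + seed % 5 steps.
    """

    def reset(self, seed: Optional[int] = None) -> Tuple[int, dict]:
        self.t = 0
        self.length = 1 + (seed or 0) % 5
        return self.t, {}

    def step(self, action: Any) -> Tuple[int, float, bool, bool, dict]:
        self.t += 1
        terminated = self.t >= self.length
        return self.t, 1.0, terminated, False, {}


def evaluate(env, policy: Callable[[Any], Any], n_episodes: int = 50) -> List[float]:
    """Run one episode per seed 0..n_episodes-1 and return the undiscounted returns."""
    returns = []
    for seed in range(n_episodes):
        obs, _ = env.reset(seed=seed)
        episode_return = 0.0
        while True:
            action = policy(obs)
            obs, reward, terminated, truncated, _ = env.step(action)
            episode_return += reward
            if terminated or truncated:
                break
        returns.append(episode_return)  # the full episode return, not the last reward
    return returns


def summarize(returns: List[float]) -> Tuple[float, float]:
    """Return the mean return and its standard error across episodes."""
    mean = statistics.mean(returns)
    stderr = statistics.stdev(returns) / math.sqrt(len(returns))
    return mean, stderr


returns = evaluate(StubEnv(), policy=lambda obs: None)
mean, stderr = summarize(returns)
print(f'Average return: {mean:.3f} +/- {stderr:.3f} (standard error)')
```

With a real environment, `StubEnv()` would be replaced by `gym.make('BiskHurdlesWalker-v1')` and the placeholder lambda by the agent's action selection.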

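# Batched evaluation
from gym.vector import SyncVectorEnv
import numpy as np
n = 50
env = SyncVectorEnv([lambda: gym.make('BiskHurdlesWalker-v1')] * n)
retrns = np.zeros(n)
dones = np.zeros(n, dtype=bool)
obs, _ = env.reset(seed=0)  # sub-environment i is seeded with 0 + i
while not dones.all():
    # Retrieve `action` from the agent; random actions serve as a placeholder
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    # Vector envs reset finished sub-environments automatically, so only
    # count rewards until each sub-environment's first episode ends
    retrns += reward * np.logical_not(dones)
    dones |= (terminated | truncated)
print(f'Average return: {retrns.mean()}')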
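
The masking in the batched loop matters because Gym's vector environments automatically reset a sub-environment once its episode ends, so `env.step` keeps returning rewards from a fresh episode for that slot. The following stdlib-only sketch replays the same bookkeeping on hand-written reward and done sequences to show which rewards get counted. The function name and the example inputs are hypothetical; they stand in for what an auto-resetting vector environment would report.

```python
from typing import List, Sequence


def masked_first_episode_returns(
    rewards: Sequence[Sequence[float]],
    dones_per_step: Sequence[Sequence[bool]],
) -> List[float]:
    """Sum each sub-environment's rewards up to and including its first episode end.

    rewards[t][i] and dones_per_step[t][i] are the reward and the
    (terminated or truncated) flag of sub-environment i at vector step t.
    """
    n = len(rewards[0])
    returns = [0.0] * n
    finished = [False] * n
    for step_rewards, step_dones in zip(rewards, dones_per_step):
        for i in range(n):
            if not finished[i]:
                # Same role as `reward * np.logical_not(dones)` in the loop above
                returns[i] += step_rewards[i]
            # Same role as `dones |= (terminated | truncated)`
            finished[i] = finished[i] or step_dones[i]
        if all(finished):
            break
    return returns


example_rewards = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
example_dones = [[False, True], [True, False], [False, False]]
print(masked_first_episode_returns(example_rewards, example_dones))  # [2.0, 1.0]
```

Here the second sub-environment ends its first episode at step 0, so its rewards at later steps, which belong to an auto-reset episode, are excluded from its return.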

License

The bipedal skills benchmark is MIT licensed, as found in the LICENSE file.

Model definitions have been adapted from:

Gym (HalfCheetah)
dm_control (Walker, Humanoid)
