Support other gymnasium spaces in Direct workflow #1117

Merged: 26 commits (Oct 17, 2024)
Changes from 22 commits
Commits (26)
338e17d
Add new space attributes and deprecation notes for the old ones
Toni-SM Sep 30, 2024
48b59ed
Show deprecation message and overwrite configuration
Toni-SM Sep 30, 2024
b7da9f8
Add a utility to generate Gymnasium spaces given a space specification
Toni-SM Sep 30, 2024
4f674d9
Set up spaces using the utility function
Toni-SM Sep 30, 2024
aaf0c36
Add sentinel data types to avoid circular import error
Toni-SM Sep 30, 2024
9b9a7b9
Comment out actions initialization
Toni-SM Sep 30, 2024
ca305d0
Update Cartpole-Camera-Direct tasks to use new space definitions
Toni-SM Sep 30, 2024
ccd5269
Improve spaces' docstrings
Toni-SM Sep 30, 2024
deffdc0
Remove num_actions, num_observations and num_states properties
Toni-SM Sep 30, 2024
1f73596
Add new space attributes and deprecation notes for the old ones in mu…
Toni-SM Sep 30, 2024
9e7b33d
Show deprecation message and overwrite configuration in multi-agent env
Toni-SM Sep 30, 2024
151d0c5
Defer utility import to avoid circular import error
Toni-SM Sep 30, 2024
42ebe5c
Remove num_actions, num_observations and num_states properties in mul…
Toni-SM Oct 1, 2024
eb098fd
Fix multi-agent state space computation implementation
Toni-SM Oct 1, 2024
9ae8dd6
Merge branch 'main' into feature/support_other_gym_spaces
Toni-SM Oct 1, 2024
9c45b5c
Sample a tensorized space
Toni-SM Oct 1, 2024
1a0d1aa
Merge branch 'main' into feature/support_other_gym_spaces
Toni-SM Oct 2, 2024
9198027
Replace deprecated properties in DirectRLEnvCfg and DirectMARLEnvCfg …
Toni-SM Oct 2, 2024
08392b0
Update test files to support different spaces
Toni-SM Oct 2, 2024
e34e860
Update extensions version and changelog
Toni-SM Oct 2, 2024
001cb71
Update deprecated properties in docs
Toni-SM Oct 2, 2024
70ca75b
Merge branch 'main' into feature/support_other_gym_spaces
Toni-SM Oct 7, 2024
6270f9e
Merge branch 'main' into feature/support_other_gym_spaces
Toni-SM Oct 16, 2024
2a3dff6
Add test for spaces utils
Toni-SM Oct 16, 2024
0649c89
Replace carb.log_warn with omni.log.warn
Toni-SM Oct 16, 2024
d405418
Move spaces and MARL utils to an internal folder
Toni-SM Oct 16, 2024
2 changes: 1 addition & 1 deletion docs/source/features/hydra.rst
@@ -115,7 +115,7 @@ For example, for the configuration of the Cartpole camera depth environment:
:emphasize-lines: 16

If the user were to modify the width of the camera, i.e. ``env.tiled_camera.width=128``, then the parameter
- ``env.num_observations=10240`` (1*80*128) must be updated and given as input as well.
+ ``env.observation_space=[80,128,1]`` must be updated and given as input as well.

Similarly, the ``__post_init__`` method is not updated with the command line inputs. In the ``LocomotionVelocityRoughEnvCfg``, for example,
the post init update is as follows:
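As a quick cross-check of the new notation, a minimal sketch of the space such a shape list denotes (assuming, as elsewhere in this PR, that a plain list is interpreted as the shape of an unbounded ``Box``):

```python
import gymnasium as gym
import numpy as np

# ``env.observation_space=[80,128,1]`` denotes an unbounded Box of that shape,
# replacing the flattened element count num_observations=10240 (80*128*1).
observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(80, 128, 1))
assert int(np.prod(observation_space.shape)) == 10240
```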
18 changes: 9 additions & 9 deletions docs/source/migration/migrating_from_isaacgymenvs.rst
@@ -45,9 +45,9 @@ Below is an example skeleton of a task config class:
# env
decimation = 2
episode_length_s = 5.0
- num_actions = 1
- num_observations = 4
- num_states = 0
+ action_space = 1
+ observation_space = 4
+ state_space = 0
# task-specific parameters
...

@@ -135,9 +135,9 @@ The following parameters must be set for each environment config:

decimation = 2
episode_length_s = 5.0
- num_actions = 1
- num_observations = 4
- num_states = 0
+ action_space = 1
+ observation_space = 4
+ state_space = 0

Note that the maximum episode length parameter (now ``episode_length_s``) is in seconds instead of steps as it was
in IsaacGymEnvs. To convert from step count to seconds, use the equation:
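(The conversion formula itself is collapsed in this view; in Isaac Lab's convention it is, assuming the standard step/decimation relation, ``episode_length_s = dt * decimation * num_steps``.)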
@@ -569,9 +569,9 @@ Task Config
| | decimation = 2 |
| asset: | episode_length_s = 5.0 |
| assetRoot: "../../assets" | action_scale = 100.0 # [N] |
- | assetFileName: "urdf/cartpole.urdf" | num_actions = 1 |
- | | num_observations = 4 |
- | enableCameraSensors: False | num_states = 0 |
+ | assetFileName: "urdf/cartpole.urdf" | action_space = 1 |
+ | | observation_space = 4 |
+ | enableCameraSensors: False | state_space = 0 |
| | # reset |
| sim: | max_cart_pos = 3.0 |
| dt: 0.0166 # 1/60 s | initial_pole_angle_range = [-0.25, 0.25] |
18 changes: 9 additions & 9 deletions docs/source/migration/migrating_from_omniisaacgymenvs.rst
@@ -46,9 +46,9 @@ Below is an example skeleton of a task config class:
# env
decimation = 2
episode_length_s = 5.0
- num_actions = 1
- num_observations = 4
- num_states = 0
+ action_space = 1
+ observation_space = 4
+ state_space = 0
# task-specific parameters
...

@@ -158,9 +158,9 @@ The following parameters must be set for each environment config:

decimation = 2
episode_length_s = 5.0
- num_actions = 1
- num_observations = 4
- num_states = 0
+ action_space = 1
+ observation_space = 4
+ state_space = 0


RL Config Setup
@@ -501,9 +501,9 @@ Task config in Isaac Lab can be split into the main task configuration class and
| clipObservations: 5.0 | decimation = 2 |
| clipActions: 1.0 | episode_length_s = 5.0 |
| controlFrequencyInv: 2 # 60 Hz | action_scale = 100.0 # [N] |
- | | num_actions = 1 |
- | sim: | num_observations = 4 |
- | | num_states = 0 |
+ | | action_space = 1 |
+ | sim: | observation_space = 4 |
+ | | state_space = 0 |
| dt: 0.0083 # 1/120 s | # reset |
| use_gpu_pipeline: ${eq:${...pipeline},"gpu"} | max_cart_pos = 3.0 |
| gravity: [0.0, 0.0, -9.81] | initial_pole_angle_range = [-0.25, 0.25] |
4 changes: 2 additions & 2 deletions docs/source/refs/snippets/tutorial_modify_direct_rl_env.py
@@ -28,8 +28,8 @@
# [end-h1_env-import]

# [start-h1_env-spaces]
- num_actions = 19
- num_observations = 69
+ action_space = 19
+ observation_space = 69
# [end-h1_env-spaces]

# [start-h1_env-robot]
6 changes: 3 additions & 3 deletions docs/source/tutorials/03_envs/create_direct_rl_env.rst
@@ -48,9 +48,9 @@ config should define the number of actions and observations for the environment.
@configclass
class CartpoleEnvCfg(DirectRLEnvCfg):
...
- num_actions = 1
- num_observations = 4
- num_states = 0
+ action_space = 1
+ observation_space = 4
+ state_space = 0

The config class can also be used to define task-specific attributes, such as scaling for reward terms
and thresholds for reset conditions.
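Beyond plain integers, the renamed attributes also accept richer specifications. A hedged sketch under the conventions this PR introduces (ints and shape lists become unbounded ``Box`` spaces, and Gymnasium spaces pass through unchanged; the class name here is hypothetical):

```python
import gymnasium as gym

from omni.isaac.lab.envs import DirectRLEnvCfg
from omni.isaac.lab.utils import configclass


@configclass
class MyCameraEnvCfg(DirectRLEnvCfg):
    ...
    action_space = 1                  # int -> Box(-inf, inf, shape=(1,))
    observation_space = [80, 128, 1]  # shape list -> Box(-inf, inf, shape=(80, 128, 1))
    state_space = gym.spaces.Dict(    # or pass any Gymnasium space directly
        {"critic": gym.spaces.Box(low=-1.0, high=1.0, shape=(12,))}
    )
```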
2 changes: 1 addition & 1 deletion source/extensions/omni.isaac.lab/config/extension.toml
@@ -1,7 +1,7 @@
[package]

# Note: Semantic Versioning is used: https://semver.org/
version = "0.24.19"
version = "0.24.20"

# Description
title = "Isaac Lab framework for Robot Learning"
19 changes: 19 additions & 0 deletions source/extensions/omni.isaac.lab/docs/CHANGELOG.rst
@@ -1,6 +1,25 @@
Changelog
---------

0.24.20 (2024-10-07)
~~~~~~~~~~~~~~~~~~~~

Added
^^^^^

* Added support for different Gymnasium spaces (``Box``, ``Discrete``, ``MultiDiscrete``, ``Tuple`` and ``Dict``)
to define observation, action and state spaces in the direct workflow.
* Added the :meth:`sample_space` utility to environment utils to sample supported spaces where the data containers are torch tensors (a usage sketch follows below).
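A minimal usage sketch of the new utility, mirroring the call made later in this diff in ``DirectMARLEnv._configure_env_spaces`` (the positional ``device`` argument and keyword names follow that call; the returned container is assumed to be a torch tensor batched over environments):

```python
import gymnasium as gym

from omni.isaac.lab.envs.utils import sample_space  # import path used in this PR

# Zero-filled, batched "sample" of a Box space: assumed to yield a torch tensor
# of shape (4096, 2) on the requested device.
action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(2,))
zero_actions = sample_space(action_space, "cuda:0", batch_size=4096, fill_value=0)
```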

Changed
^^^^^^^

* Marked the :attr:`num_observations`, :attr:`num_actions` and :attr:`num_states` attributes in :class:`DirectRLEnvCfg` as deprecated
  in favor of :attr:`observation_space`, :attr:`action_space` and :attr:`state_space` respectively (see the mapping sketched below).
* Marked the :attr:`num_observations`, :attr:`num_actions` and :attr:`num_states` attributes in :class:`DirectMARLEnvCfg` as deprecated
  in favor of :attr:`observation_spaces`, :attr:`action_spaces` and :attr:`state_space` respectively.
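Concretely, the deprecated integer attributes map one-to-one onto the new space attributes; a sketch of the equivalence (assuming plain ints keep the previous unbounded ``Box`` behavior):

```python
# Deprecated (DirectRLEnvCfg)        Replacement
# num_actions = 1               ->   action_space = 1        # Box(-inf, inf, (1,))
# num_observations = 4          ->   observation_space = 4   # Box(-inf, inf, (4,))
# num_states = 0                ->   state_space = 0         # no state (state() is None)
```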


0.24.19 (2024-10-05)
~~~~~~~~~~~~~~~~~~~~

@@ -5,6 +5,7 @@

from __future__ import annotations

+ import gymnasium as gym
import torch
from typing import Dict, Literal, TypeVar

@@ -62,6 +63,9 @@ class ViewerCfg:
# Types.
##

+ SpaceType = TypeVar("SpaceType", gym.spaces.Space, int, set, tuple, list, dict)
+ """A sentinel object to indicate a valid space type to specify states, observations and actions."""

VecEnvObs = Dict[str, torch.Tensor | Dict[str, torch.Tensor]]
"""Observation returned by the environment.

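These are the specification types consumed by the ``spec_to_gym_space`` utility imported later in this diff. Its implementation is not shown here, so the following is only a hedged sketch of the conversion it plausibly performs for the most common specs:

```python
import gymnasium as gym
import numpy as np


def spec_to_gym_space(spec) -> gym.spaces.Space:
    """Convert a space specification into a Gymnasium space (assumed behavior).

    Set- and tuple-valued specs from ``SpaceType`` are omitted in this sketch.
    """
    if isinstance(spec, gym.spaces.Space):  # already a space: pass through
        return spec
    if isinstance(spec, int):  # n -> unbounded Box of shape (n,)
        return gym.spaces.Box(low=-np.inf, high=np.inf, shape=(spec,))
    if isinstance(spec, list):  # shape list -> unbounded Box of that shape
        return gym.spaces.Box(low=-np.inf, high=np.inf, shape=tuple(spec))
    if isinstance(spec, dict):  # nested specs -> Dict space
        return gym.spaces.Dict({k: spec_to_gym_space(v) for k, v in spec.items()})
    raise ValueError(f"Unsupported space specification: {spec}")
```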
@@ -14,6 +14,7 @@
import weakref
from abc import abstractmethod
from collections.abc import Sequence
+ from dataclasses import MISSING
from typing import Any, ClassVar

import carb
@@ -164,10 +165,6 @@ def __init__(self, cfg: DirectMARLEnvCfg, render_mode: str | None = None, **kwar
# -- init buffers
self.episode_length_buf = torch.zeros(self.num_envs, device=self.device, dtype=torch.long)
self.reset_buf = torch.zeros(self.num_envs, dtype=torch.bool, device=self.sim.device)
- self.actions = {
- agent: torch.zeros(self.num_envs, self.cfg.num_actions[agent], device=self.sim.device)
- for agent in self.cfg.possible_agents
- }

# setup the observation, state and action spaces
self._configure_env_spaces()
@@ -406,16 +403,19 @@ def state(self) -> StateType | None:
"""Returns the state for the environment.

The state-space is used for centralized training or asymmetric actor-critic architectures. It is configured
- using the :attr:`DirectMARLEnvCfg.num_states` parameter.
+ using the :attr:`DirectMARLEnvCfg.state_space` parameter.

Returns:
- The states for the environment, or None if :attr:`DirectMARLEnvCfg.num_states` parameter is zero.
+ The states for the environment, or None if :attr:`DirectMARLEnvCfg.state_space` parameter is zero.
"""
- if not self.cfg.num_states:
+ if not self.cfg.state_space:
return None
# concatenate and return the observations as state
- if self.cfg.num_states < 0:
- self.state_buf = torch.cat([self.obs_dict[agent] for agent in self.cfg.possible_agents], dim=-1)
+ # FIXME: This implementation assumes the spaces are fundamental ones. Fix it to support composite spaces
+ if isinstance(self.cfg.state_space, int) and self.cfg.state_space < 0:
+ self.state_buf = torch.cat(
+ [self.obs_dict[agent].reshape(self.num_envs, -1) for agent in self.cfg.possible_agents], dim=-1
+ )
# compute and return custom environment state
else:
self.state_buf = self._get_states()
@@ -564,29 +564,52 @@ def set_debug_vis(self, debug_vis: bool) -> bool:
"""

def _configure_env_spaces(self):
"""Configure the spaces for the environment."""
+ # defer import to avoid circular import error
+ from omni.isaac.lab.envs.utils import sample_space, spec_to_gym_space

self.agents = self.cfg.possible_agents
self.possible_agents = self.cfg.possible_agents

+ # show deprecation message and overwrite configuration
+ if self.cfg.num_actions is not None:
+ carb.log_warn("DirectMARLEnvCfg.num_actions is deprecated. Use DirectMARLEnvCfg.action_spaces instead.")
+ if isinstance(self.cfg.action_spaces, type(MISSING)):
+ self.cfg.action_spaces = self.cfg.num_actions
+ if self.cfg.num_observations is not None:
+ carb.log_warn(
+ "DirectMARLEnvCfg.num_observations is deprecated. Use DirectMARLEnvCfg.observation_spaces instead."
+ )
+ if isinstance(self.cfg.observation_spaces, type(MISSING)):
+ self.cfg.observation_spaces = self.cfg.num_observations
+ if self.cfg.num_states is not None:
+ carb.log_warn("DirectMARLEnvCfg.num_states is deprecated. Use DirectMARLEnvCfg.state_space instead.")
+ if isinstance(self.cfg.state_space, type(MISSING)):
+ self.cfg.state_space = self.cfg.num_states

# set up observation and action spaces
self.observation_spaces = {
- agent: gym.spaces.Box(low=-np.inf, high=np.inf, shape=(self.cfg.num_observations[agent],))
- for agent in self.cfg.possible_agents
+ agent: spec_to_gym_space(self.cfg.observation_spaces[agent]) for agent in self.cfg.possible_agents
}
self.action_spaces = {
- agent: gym.spaces.Box(low=-np.inf, high=np.inf, shape=(self.cfg.num_actions[agent],))
- for agent in self.cfg.possible_agents
+ agent: spec_to_gym_space(self.cfg.action_spaces[agent]) for agent in self.cfg.possible_agents
}

# set up state space
- if not self.cfg.num_states:
+ if not self.cfg.state_space:
self.state_space = None
- if self.cfg.num_states < 0:
- self.state_space = gym.spaces.Box(
- low=-np.inf, high=np.inf, shape=(sum(self.cfg.num_observations.values()),)
+ if isinstance(self.cfg.state_space, int) and self.cfg.state_space < 0:
+ self.state_space = gym.spaces.flatten_space(
+ gym.spaces.Tuple([self.observation_spaces[agent] for agent in self.cfg.possible_agents])
)
else:
- self.state_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(self.cfg.num_states,))
+ self.state_space = spec_to_gym_space(self.cfg.state_space)

+ # instantiate actions (needed for tasks for which the observations computation is dependent on the actions)
+ self.actions = {
+ agent: sample_space(self.action_spaces[agent], self.sim.device, batch_size=self.num_envs, fill_value=0)
+ for agent in self.cfg.possible_agents
+ }

def _reset_idx(self, env_ids: Sequence[int]):
"""Reset environments based on specified indices.
@@ -664,8 +687,8 @@ def _get_observations(self) -> dict[AgentID, ObsType]:
def _get_states(self) -> StateType:
"""Compute and return the states for the environment.

- This method is only called (and therefore has to be implemented) when the :attr:`DirectMARLEnvCfg.num_states`
- parameter is greater than zero.
+ This method is only called (and therefore has to be implemented) when the :attr:`DirectMARLEnvCfg.state_space`
+ parameter is not a number less than or equal to zero.

Returns:
The states for the environment.
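Putting the multi-agent pieces together, a hedged sketch of per-agent specs in a ``DirectMARLEnvCfg``-style config (agent names hypothetical; per the branch above, a negative ``state_space`` requests the flattened concatenation of all agents' observations, while ``0`` disables the state entirely):

```python
import gymnasium as gym

# Per-agent specs may mix the supported forms: ints, shape lists, or spaces.
possible_agents = ["robot_0", "robot_1"]
action_spaces = {"robot_0": 2, "robot_1": gym.spaces.Discrete(5)}
observation_spaces = {"robot_0": 8, "robot_1": [3, 64, 64]}
state_space = -1  # negative int -> state is the flattened concat of observations
```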