Skip to content

Commit

Permalink
Migrate/mujoco binds (#18)
Browse files Browse the repository at this point in the history
Migrating from gym to gymnasium and mujoco_py to mujocos own python wrappers.
Added changelog.
  • Loading branch information
Domattee authored Aug 10, 2023
1 parent 483ee7b commit 3eaedf8
Show file tree
Hide file tree
Showing 31 changed files with 959 additions and 1,147 deletions.
28 changes: 14 additions & 14 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,49 +2,49 @@

<img src="/docs/source/imgs/showroom.png" width="400" align="right">

MIMo is a platform for the research of the cognitive development of infants. It consists of an [OpenAI gym](https://github.com/openai/gym) environment using [MuJoCo](https://mujoco.readthedocs.io) for the physical simulation and multiple modules that can produce simulated sensory input for vision, touch, proprioception and the vestibular system.
MIMo is a platform for the research of the cognitive development of infants. It consists of a [Gymnasium](https://github.com/Farama-Foundation/Gymnasium) environment using [MuJoCo](https://mujoco.readthedocs.io) for the physical simulation and multiple modules that can produce simulated sensory input for vision, touch, proprioception and the vestibular system.

[//]: # (See "MIMo: A Multi-Modal Infant Model for Studying Cognitive Development in Humans and AIs".)

[A full API documentation is available on ReadTheDocs.](https://mimo.readthedocs.io)

## Installation

First install mujoco 2.10 and mujoco-py 2.1 following their instructions.
Then clone this repository, install other dependencies with `pip install -r requirements.txt` and finally run `pip install -e .`

> **Note**
> A version using gymnasium 0.28.1 and the official mujoco wrappers instead of mujoco-py is available on the branch migrate/mujoco_binds.
- Clone this repository
- `pip install -r requirements.txt`
- `pip install -e .`

## The MIMo environment

The main class of the codebase is `MIMoEnv`. It is an openAI gym style environment, implementing all the relevant gym interfaces. It is the base class that is subclassed by all the experiment specific environments. It takes a MuJoCo XML and a series of parameter dictionaries for the sensory modalities and builds all the specific attributes, such as the observation space, from these initial inputs.

The MuJoCo XML defines the simulated geometries and their degrees of freedom. We have set ours up in a modular fashion to avoid duplication as much as possible. MIMos kinematic tree is defined in `mimo_model.xml` while the associated actuators and sensors are located in `mimo_meta.xml`. A scene then includes both of these files. This allows multiple scenes to share the same base model with different actuation models and ancillary objects.
The MuJoCo XMLs defines the simulated geometries and their degrees of freedom. We have set ours up in a modular fashion to avoid duplication as much as possible. MIMos kinematic tree is defined in `mimo_model.xml` while the associated actuators and sensors are located in `mimo_meta.xml`. A scene then includes both of these files. This allows multiple scenes to share the same base model with different actuation models and ancillary objects.

The action space of the gym environment is generated automatically from the underlying MuJoCo XML. Each actuator whose name starts with 'act:' is included in the action space. Each actuator has a range from -1 to 1, with full torque in opposite directions at -1 and 1 and a linear response in between.

The observation space is a dictionary type space built automatically based on the configuration of the sensor modules. An entry 'observation' is always included and always returns relative joint positions. Enabling more sensor modules adds extra entries. For example, each camera of the vision system will store its image in a separate entry in the observation space, named after the associated camera.

### Observation spaces and `done`

By default this environment follows the behaviour of the old `Robot` environments in gym. This means that the `done` return value from `step` is always False, and the calling method has to figure out when to stop or reset. In addition the observation space includes two entries with the desired and the currently achieved goal (populated by `_sample_goal` and `_get_achieved_goal`).
By default, this environment follows the behaviour of the old `Robot` environments in gym. This means that the 'done' return value from `step` is always `False`, and the calling method has to figure out when to stop or reset. In addition, the observation space includes two entries with the desired and the currently achieved goal (populated by `sample_goal` and `get_achieved_goal`).

This behaviour can be changed with two parameters during initialization of the environment.
1. `goals_in_observation` : If this parameter is False, the goal entries will not be included in the observation space. By default set to True.
2. `done_active` : If this parameter is True, `done` is True if either `_is_success` or `_is_failure` returns True. If set to False, `done` is always False. By default set to False. Note that this behaviour is defined in the `_is_done` function. If you overwrite this function you can ignore this parameter.
1. `goals_in_observation` : If this parameter is `False`, the goal entries will not be included in the observation space. Set to `True` by default.
2. `done_active` : If this parameter is `True`, 'done' is `True` if either `is_success` or `is_failure` returns `True`. If set to `False`, 'done' is always `False`. By default, set to `False`. Note that this behaviour is defined in the `_is_done` function. If you overwrite this function you can ignore this parameter.

## Sensor modules
## Actuation and sensory modules

All of the sensor modules follow the same pattern. They are initialized with a MuJoCo gym environment and a dictionary of parameters and their observations can be collected by calling their `get_<modality>_touch` function. The return of this function is generally a single array containing the flattened/concatenated output. Modules can be disabled or reduced to a minimum by passing an empty dictionary. Each module also has an attribute `sensor_outputs` that stores the unflattened outputs as a dictionary. The parameter structure and workings of each module are described in more detail in the documentation.
We provide two different actuation models with different trade-offs between run time and accuracy. They can be swapped out without any other adjustments to the environment.
All the sensor modules follow the same pattern. They are initialized with a MuJoCo gym environment and a dictionary of parameters and their observations can be collected by calling their `get_<modality>_obs` function. The return of this function is generally a single array containing the flattened/concatenated output. Modules can be disabled or reduced to a minimum by passing an empty dictionary. Each module also has an attribute `sensor_outputs` that stores the unflattened outputs as a dictionary. The parameter structure and workings of each module are described in more detail in the documentation.

## Sample Environments

We provide several sample environments with some simple tasks for demonstration purposes. These come with both an openAI environment in `mimoEnv/envs` as well as simple training scripts using stable-baselines3, in `mimoEnv`. These environments include:

1. `reach` - A stripped down version where MIMo is tasked with reaching for a ball hovering in front of him. By default only the proprioceptive sensors are used. MIMo can only move his right arm and his head is manually fixed to track the ball. The initial position of both the ball and MIMo is slightly randomized.
1. `reach` - A stripped down version where MIMo is tasked with reaching for a ball hovering in front of him. By default, only the proprioceptive sensors are used. MIMo can only move his right arm and his head is manually fixed to track the ball. The initial position of both the ball and MIMo is slightly randomized.
2. `standup` - MIMo is tasked with standing up. At the start he is in a low crouch with his hands gripping the bars of a crib. Proprioception and the vestibular sensors are included by default.
3. `test` - This is a simple dummy environment set up to demonstrate and visualize most of the sensor modalities. MIMo is set to fall from a short height. During this, the visual and haptic outputs are rendered and saved to hard drive.
3. `selfbody` - MIMo is sitting on the ground and rewarded for touching a specific body part with his right arm. The target body part is randomized each episode.
4. `catch` - A ball is dropped from a short height onto MIMo's outstretched arm. Episodes are completed successfully when MIMo holds onto the ball continuously for a full second.

## License

Expand Down
6 changes: 4 additions & 2 deletions docs/requirements.txt
Original file line number Diff line number Diff line change
@@ -1,12 +1,14 @@
numpy<=1.22.0
numpy
matplotlib
glfw
imageio
scipy
trimesh
cachetools
CacheToolsUtils
networkx
rtree
mujoco
gymnasium==0.28.1
opencv-python
sphinx==4.2.0
sphinx_rtd_theme==1.0.0
Expand Down
11 changes: 10 additions & 1 deletion docs/source/api/utility.rst
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,16 @@ Assorted functions
:nosignatures:

mimoEnv.utils.determine_geom_masses


Data fields
-----------

.. autodata:: mimoEnv.utils.MUJOCO_JOINT_SIZES
:no-value:

.. autodata:: mimoEnv.utils.MUJOCO_DOF_SIZES
:no-value:

Detail documentation
----------------------

Expand Down
66 changes: 66 additions & 0 deletions docs/source/changelog.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
Changelog
=========

Version 1.1.0 (The Big Migration)
---------------------------------

This version moves MIMo from gym and the mujoco_py wrappers to gymnasium and
MuJoCo's own python wrappers. With this move come several breaking changes:

- Swapping python wrappers means two key changes:

1. The ``env.sim`` attribute is gone. Instead, there are now two
attributes ``env.model`` and ``env.data`` which correspond quite closely
to the ``sim.model`` and ``sim.data`` objects from mujoco_py
2. some functions are no longer available. In particular, ``model.body_name2id``
and similar functions are not available with the MuJoCo wrappers. Instead they
have named access, i.e. ``model.body(id).name``. See the
:doc:`MuJoCo Documentation <mujoco:python>` for more details.

- Gym has ben replaced with Gymnasium throughout. The key changes due to this are
listed below:

1. The step function now returns 5 values, ``(obs, reward, done, trunc, info)``. The
new value 'trunc', indicates if the episode has ended for any reason other than
reaching a terminal state, such as a time limit. As a result, any code such as
``if done: env.reset()`` needs to be changed to ``ìf done or trunc: env.reset()``.
2. The reset function has also gained an extra return value. This needs to be caught
to avoid potentially obscure unpacking errors. ``obs = env.reset()`` ->
``obs, _ = env.reset()`` will do.
3. Rendering has changed significantly, depending on use case. If you wish to
use the interactive window you can pass 'render_mode="human"' to the constructor
and then call ``env.render()``.
If you want to render images from multiple different cameras, or use both an
interactive window and also render arrays (for example to save as a video), you
should use gymnasiums MuJoCoRenderer with
``img = env.mujoco_renderer.render(render_mode="rgb_array", ...)``,
similar to the old interface.

In addition to these changes there were also some adjustments to the actuation models
to allow multiple to be attached to the same environment without conflicts.

Version 1.0.0 (Initial release)
-------------------------------

First full release.

In addition to many, many small updates, this version brings two major changes that
"complete" the initial release:

1. A new, five-fingered version of MIMo. This allows for experiments which
require dexterous manipulation.
2. A system to handle actuation models, with three models/implementations to start
with. The first is the Spring-Damper model from the conference version. The
second is a new approach in which each actuator is modeled as two opposing,
independently controllable "muscles". Finally we have a positional model which
allows locking MIMo into or moving him through defined poses.

The actuation systems come in a new package 'mimoActuation', which defines the
interfaces and functions that actuation models must provide, similar to the
sensory modules.
Performance was also somewhat improved.

Version 0.1.0 (Paper release)
-----------------------------

Conference paper version.
4 changes: 2 additions & 2 deletions docs/source/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
'sphinx.ext.napoleon',
'sphinx.ext.autodoc.preserve_defaults', # Do not resolve the names of default arguments
]
autodoc_mock_imports = ["mujoco_py", "gym"]
autodoc_mock_imports = [] # "mujoco"
autodoc_member_order = 'bysource'
# autodoc_class_signature = 'separated' # Moves the class signatures into separate init function docu.
numpydoc_class_members_toctree = False
Expand All @@ -46,7 +46,7 @@
'sphinx': ('https://www.sphinx-doc.org/en/master/', None),
'mimo': ('https://mimo.readthedocs.io/en/latest/', None),
'mujoco': ('https://mujoco.readthedocs.io/en/latest', None),
'gym': ('https://www.gymlibrary.dev', None),
'gym': ('https://gymnasium.farama.org', None),
}
intersphinx_disabled_domains = ['std']

Expand Down
2 changes: 0 additions & 2 deletions docs/source/guides/installation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,6 @@ Installation

Installation is very straight forward::

Install MuJoCo 2.10
Install mujoco-py 2.1
Clone this repository
pip install -r requirements.txt
pip install -e .
Expand Down
49 changes: 22 additions & 27 deletions docs/source/guides/new_experiment.rst
Original file line number Diff line number Diff line change
Expand Up @@ -201,48 +201,49 @@ parent class.
class MIMoStandupEnv(MIMoEnv):
def __init__(self,
model_path=STANDUP_XML,
initial_qpos={},
n_substeps=2,
proprio_params=DEFAULT_PROPRIOCEPTION_PARAMS,
touch_params=None,
vision_params=None,
vestibular_params=DEFAULT_VESTIBULAR_PARAMS,
done_active=False,
**kwargs,
):

super().__init__(model_path=model_path,
initial_qpos=initial_qpos,
n_substeps=n_substeps,
proprio_params=proprio_params,
touch_params=touch_params,
vision_params=vision_params,
vestibular_params=vestibular_params,
goals_in_observation=False,
done_active=False)
done_active=done_active,
**kwargs,)

Next we need to override all the abstract functions. We will use the head height as our goal
variable::

def _get_achieved_goal(self):
return self.sim.data.get_body_xpos('head')[2]
def get_achieved_goal(self):
return self.data.body('head').xpos[2]

Since we want fixed length episodes and have disabled `done_active` we don't need any of the
other goal related functions and just implement them as dummy functions::

def _is_success(self, achieved_goal, desired_goal):
def is_success(self, achieved_goal, desired_goal):
return False

def _is_failure(self, achieved_goal, desired_goal):
def is_failure(self, achieved_goal, desired_goal):
return False

def _sample_goal(self):
def is_truncated(self):
return False

def sample_goal(self):
return 0.0

The only things still missing are the reward and the reset functions. The reward will consist
of a positive component based on the head height, determined in ``_get_achieved_goal``, and
of a positive component based on the head height, determined in ``get_achieved_goal``, and
a penalty for large actions::

def compute_reward(self, achieved_goal, desired_goal, info):
quad_ctrl_cost = 0.01 * np.square(self.sim.data.ctrl).sum()
quad_ctrl_cost = 0.01 * np.square(self.data.ctrl).sum()
reward = achieved_goal - 0.2 - quad_ctrl_cost
return reward

Expand All @@ -253,30 +254,24 @@ those from the randomization. The crib does not have joints and other joints in
scene belong to MIMo. We then set the state with our new randomized positions and let the
simulation settle for a few timesteps::

def _reset_sim(self):
self.sim.set_state(self.initial_state)
default_state = self.sim.get_state()
qpos = self.sim.data.qpos
def reset_model(self):
self.set_state(self.init_qpos, self.init_qvel)
qpos = self.init_crouch_position

# set initial positions stochastically
qpos[7:] = qpos[7:] + self.np_random.uniform(low=-0.1, high=0.1, size=len(qpos[6:]))
qpos[7:] = qpos[7:] + self.np_random.uniform(low=-0.01, high=0.01, size=len(qpos[7:]))

# set initial velocities to zero
qvel = np.zeros(self.sim.data.qvel.shape)
qvel = np.zeros(self.data.qvel.ravel().shape)

new_state = mujoco_py.MjSimState(
default_state.time, qpos, qvel, default_state.act, default_state.udd_state
)
self.sim.set_state(new_state)
self.sim.forward()
self.set_state(qpos, qvel)

# perform 100 steps with no actions to stabilize initial position
actions = np.zeros(self.action_space.shape)
self._set_action(actions)
for _ in range(100):
self.sim.step()
mujoco.mj_step(self.model, self.data, nstep=100)

return True
return self._get_obs()

Finally we register our new environment with gym by adding these lines to
``mimoEnv/__init__.py``, which also lets us set our fixed episode length::
Expand Down
3 changes: 2 additions & 1 deletion docs/source/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ Welcome to MIMo's documentation!
:alt: MIMo with some toys

MIMo is a library for the research of cognitive development and multi-sensory learning.
We follow the :doc:`API of the OpenAI gym <gym:content/api>` environments, using
We follow the :doc:`API of Gymnasium <gym:index>` environments, using
:doc:`MuJoCo <mujoco:overview>` for the physics simulation.


Expand All @@ -23,6 +23,7 @@ Contents

guides/index.rst
api/index.rst
changelog.rst

MIMo is a simple, childlike robot physically simulated using MuJoCo. He has multiple sensory
modalities including touch, vision, proprioception and a vestibular system.
Expand Down
Loading

0 comments on commit 3eaedf8

Please sign in to comment.