Migrate/mujoco binds (#18)

Migrating from gym to gymnasium and mujoco_py to mujocos own python wrappers. Added changelog.
trieschlab · Aug 10, 2023 · 3eaedf8 · 3eaedf8
1 parent 483ee7b
commit 3eaedf8
Show file tree

Hide file tree

Showing 31 changed files with 959 additions and 1,147 deletions.
diff --git a/README.md b/README.md
@@ -2,49 +2,49 @@
 
 <img src="/docs/source/imgs/showroom.png" width="400" align="right">
 
-MIMo is a platform for the research of the cognitive development of infants. It consists of an [OpenAI gym](https://github.com/openai/gym) environment using [MuJoCo](https://mujoco.readthedocs.io) for the physical simulation and multiple modules that can produce simulated sensory input for vision, touch, proprioception and the vestibular system.
+MIMo is a platform for the research of the cognitive development of infants. It consists of a [Gymnasium](https://github.com/Farama-Foundation/Gymnasium) environment using [MuJoCo](https://mujoco.readthedocs.io) for the physical simulation and multiple modules that can produce simulated sensory input for vision, touch, proprioception and the vestibular system.
 
 [//]: # (See "MIMo: A Multi-Modal Infant Model for Studying Cognitive Development in Humans and AIs".)
 
 [A full API documentation is available on ReadTheDocs.](https://mimo.readthedocs.io)
 
 ## Installation
 
-First install mujoco 2.10 and mujoco-py 2.1 following their instructions.
-Then clone this repository, install other dependencies with `pip install -r requirements.txt` and finally run `pip install -e .`
-
-> **Note**
-> A version using gymnasium 0.28.1 and the official mujoco wrappers instead of mujoco-py is available on the branch migrate/mujoco_binds.
+- Clone this repository 
+- `pip install -r requirements.txt` 
+- `pip install -e .`
 
 ## The MIMo environment
 
 The main class of the codebase is `MIMoEnv`. It is an openAI gym style environment, implementing all the relevant gym interfaces. It is the base class that is subclassed by all the experiment specific environments. It takes a MuJoCo XML and a series of parameter dictionaries for the sensory modalities and builds all the specific attributes, such as the observation space, from these initial inputs.
 
-The MuJoCo XML defines the simulated geometries and their degrees of freedom. We have set ours up in a modular fashion to avoid duplication as much as possible. MIMos kinematic tree is defined in `mimo_model.xml` while the associated actuators and sensors are located in `mimo_meta.xml`. A scene then includes both of these files. This allows multiple scenes to share the same base model with different actuation models and ancillary objects.
+The MuJoCo XMLs defines the simulated geometries and their degrees of freedom. We have set ours up in a modular fashion to avoid duplication as much as possible. MIMos kinematic tree is defined in `mimo_model.xml` while the associated actuators and sensors are located in `mimo_meta.xml`. A scene then includes both of these files. This allows multiple scenes to share the same base model with different actuation models and ancillary objects.
 
 The action space of the gym environment is generated automatically from the underlying MuJoCo XML. Each actuator whose name starts with 'act:' is included in the action space. Each actuator has a range from -1 to 1, with full torque in opposite directions at -1 and 1 and a linear response in between.
 
 The observation space is a dictionary type space built automatically based on the configuration of the sensor modules. An entry 'observation' is always included and always returns relative joint positions. Enabling more sensor modules adds extra entries. For example, each camera of the vision system will store its image in a separate entry in the observation space, named after the associated camera.
 
 ### Observation spaces and `done`
 
-By default this environment follows the behaviour of the old `Robot` environments in gym. This means that the `done` return value from `step` is always False, and the calling method has to figure out when to stop or reset. In addition the observation space includes two entries with the desired and the currently achieved goal (populated by `_sample_goal` and `_get_achieved_goal`).
+By default, this environment follows the behaviour of the old `Robot` environments in gym. This means that the 'done' return value from `step` is always `False`, and the calling method has to figure out when to stop or reset. In addition, the observation space includes two entries with the desired and the currently achieved goal (populated by `sample_goal` and `get_achieved_goal`).
 
 This behaviour can be changed with two parameters during initialization of the environment. 
-  1. `goals_in_observation` : If this parameter is False, the goal entries will not be included in the observation space. By default set to True.
-  2. `done_active` : If this parameter is True, `done` is True if either `_is_success` or `_is_failure` returns True. If set to False, `done` is always False. By default set to False. Note that this behaviour is defined in the `_is_done` function. If you overwrite this function you can ignore this parameter.
+  1. `goals_in_observation` : If this parameter is `False`, the goal entries will not be included in the observation space. Set to `True` by default.
+  2. `done_active` : If this parameter is `True`, 'done' is `True` if either `is_success` or `is_failure` returns `True`. If set to `False`, 'done' is always `False`. By default, set to `False`. Note that this behaviour is defined in the `_is_done` function. If you overwrite this function you can ignore this parameter.
 
-## Sensor modules
+## Actuation and sensory modules
 
-All of the sensor modules follow the same pattern. They are initialized with a MuJoCo gym environment and a dictionary of parameters and their observations can be collected by calling their `get_<modality>_touch` function. The return of this function is generally a single array containing the flattened/concatenated output. Modules can be disabled or reduced to a minimum by passing an empty dictionary. Each module also has an attribute `sensor_outputs` that stores the unflattened outputs as a dictionary. The parameter structure and workings of each module are described in more detail in the documentation.
+We provide two different actuation models with different trade-offs between run time and accuracy. They can be swapped out without any other adjustments to the environment.
+All the sensor modules follow the same pattern. They are initialized with a MuJoCo gym environment and a dictionary of parameters and their observations can be collected by calling their `get_<modality>_obs` function. The return of this function is generally a single array containing the flattened/concatenated output. Modules can be disabled or reduced to a minimum by passing an empty dictionary. Each module also has an attribute `sensor_outputs` that stores the unflattened outputs as a dictionary. The parameter structure and workings of each module are described in more detail in the documentation.
 
 ## Sample Environments
 
 We provide several sample environments with some simple tasks for demonstration purposes. These come with both an openAI environment in `mimoEnv/envs` as well as simple training scripts using stable-baselines3, in `mimoEnv`. These environments include:
 
-  1. `reach` - A stripped down version where MIMo is tasked with reaching for a ball hovering in front of him. By default only the proprioceptive sensors are used. MIMo can only move his right arm and his head is manually fixed to track the ball. The initial position of both the ball and MIMo is slightly randomized.
+  1. `reach` - A stripped down version where MIMo is tasked with reaching for a ball hovering in front of him. By default, only the proprioceptive sensors are used. MIMo can only move his right arm and his head is manually fixed to track the ball. The initial position of both the ball and MIMo is slightly randomized.
   2. `standup` - MIMo is tasked with standing up. At the start he is in a low crouch with his hands gripping the bars of a crib. Proprioception and the vestibular sensors are included by default.
-  3. `test` - This is a simple dummy environment set up to demonstrate and visualize most of the sensor modalities. MIMo is set to fall from a short height. During this, the visual and haptic outputs are rendered and saved to hard drive.
+  3. `selfbody` - MIMo is sitting on the ground and rewarded for touching a specific body part with his right arm. The target body part is randomized each episode.
+  4. `catch` - A ball is dropped from a short height onto MIMo's outstretched arm. Episodes are completed successfully when MIMo holds onto the ball continuously for a full second.
 
 ## License
 

diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -1,12 +1,14 @@
-numpy<=1.22.0
+numpy
 matplotlib
-glfw
+imageio
 scipy
 trimesh
 cachetools
 CacheToolsUtils
 networkx
 rtree
+mujoco
+gymnasium==0.28.1
 opencv-python
 sphinx==4.2.0
 sphinx_rtd_theme==1.0.0

diff --git a/docs/source/api/utility.rst b/docs/source/api/utility.rst
@@ -102,7 +102,16 @@ Assorted functions
    :nosignatures:
 
    mimoEnv.utils.determine_geom_masses
-
+
+Data fields
+-----------
+
+.. autodata:: mimoEnv.utils.MUJOCO_JOINT_SIZES
+   :no-value:
+
+.. autodata:: mimoEnv.utils.MUJOCO_DOF_SIZES
+   :no-value:
+
 Detail documentation
 ----------------------
 

diff --git a/docs/source/changelog.rst b/docs/source/changelog.rst
@@ -0,0 +1,66 @@
+Changelog
+=========
+
+Version 1.1.0 (The Big Migration)
+---------------------------------
+
+This version moves MIMo from gym and the mujoco_py wrappers to gymnasium and
+MuJoCo's own python wrappers. With this move come several breaking changes:
+
+ - Swapping python wrappers means two key changes:
+
+    1. The ``env.sim`` attribute is gone. Instead, there are now two
+       attributes ``env.model`` and ``env.data`` which correspond quite closely
+       to the ``sim.model`` and ``sim.data`` objects from mujoco_py
+    2. some functions are no longer available. In particular, ``model.body_name2id``
+       and similar functions are not available with the MuJoCo wrappers. Instead they
+       have named access, i.e. ``model.body(id).name``. See the
+       :doc:`MuJoCo Documentation <mujoco:python>` for more details.
+
+ - Gym has ben replaced with Gymnasium throughout. The key changes due to this are
+   listed below:
+
+    1. The step function now returns 5 values, ``(obs, reward, done, trunc, info)``. The
+       new value 'trunc', indicates if the episode has ended for any reason other than
+       reaching a terminal state, such as a time limit. As a result, any code such as
+       ``if done: env.reset()`` needs to be changed to ``ìf done or trunc: env.reset()``.
+    2. The reset function has also gained an extra return value. This needs to be caught
+       to avoid potentially obscure unpacking errors. ``obs = env.reset()`` ->
+       ``obs, _ = env.reset()`` will do.
+    3. Rendering has changed significantly, depending on use case. If you wish to
+       use the interactive window you can pass 'render_mode="human"' to the constructor
+       and then call ``env.render()``.
+       If you want to render images from multiple different cameras, or use both an
+       interactive window and also render arrays (for example to save as a video), you
+       should use gymnasiums MuJoCoRenderer with
+       ``img = env.mujoco_renderer.render(render_mode="rgb_array", ...)``,
+       similar to the old interface.
+
+In addition to these changes there were also some adjustments to the actuation models
+to allow multiple to be attached to the same environment without conflicts.
+
+Version 1.0.0 (Initial release)
+-------------------------------
+
+First full release.
+
+In addition to many, many small updates, this version brings two major changes that
+"complete" the initial release:
+
+ 1. A new, five-fingered version of MIMo. This allows for experiments which
+    require dexterous manipulation.
+ 2. A system to handle actuation models, with three models/implementations to start
+    with. The first is the Spring-Damper model from the conference version. The
+    second is a new approach in which each actuator is modeled as two opposing,
+    independently controllable "muscles". Finally we have a positional model which
+    allows locking MIMo into or moving him through defined poses.
+
+The actuation systems come in a new package 'mimoActuation', which defines the
+interfaces and functions that actuation models must provide, similar to the
+sensory modules.
+Performance was also somewhat improved.
+
+Version 0.1.0 (Paper release)
+-----------------------------
+
+Conference paper version.
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -20,7 +20,7 @@
     'sphinx.ext.napoleon',
     'sphinx.ext.autodoc.preserve_defaults',  # Do not resolve the names of default arguments
 ]
-autodoc_mock_imports = ["mujoco_py", "gym"]
+autodoc_mock_imports = []  # "mujoco"
 autodoc_member_order = 'bysource'
 # autodoc_class_signature = 'separated' # Moves the class signatures into separate init function docu.
 numpydoc_class_members_toctree = False
@@ -46,7 +46,7 @@
     'sphinx': ('https://www.sphinx-doc.org/en/master/', None),
     'mimo': ('https://mimo.readthedocs.io/en/latest/', None),
     'mujoco': ('https://mujoco.readthedocs.io/en/latest', None),
-    'gym': ('https://www.gymlibrary.dev', None),
+    'gym': ('https://gymnasium.farama.org', None),
 }
 intersphinx_disabled_domains = ['std']
 

diff --git a/docs/source/guides/installation.rst b/docs/source/guides/installation.rst
@@ -3,8 +3,6 @@ Installation
 
 Installation is very straight forward::
 
-    Install MuJoCo 2.10
-    Install mujoco-py 2.1
     Clone this repository
     pip install -r requirements.txt
     pip install -e .

diff --git a/docs/source/guides/new_experiment.rst b/docs/source/guides/new_experiment.rst
@@ -201,48 +201,49 @@ parent class.
     class MIMoStandupEnv(MIMoEnv):
         def __init__(self,
                  model_path=STANDUP_XML,
-                 initial_qpos={},
-                 n_substeps=2,
                  proprio_params=DEFAULT_PROPRIOCEPTION_PARAMS,
                  touch_params=None,
                  vision_params=None,
                  vestibular_params=DEFAULT_VESTIBULAR_PARAMS,
+                 done_active=False,
+                 **kwargs,
                  ):
 
             super().__init__(model_path=model_path,
-                             initial_qpos=initial_qpos,
-                             n_substeps=n_substeps,
                              proprio_params=proprio_params,
                              touch_params=touch_params,
                              vision_params=vision_params,
                              vestibular_params=vestibular_params,
-                             goals_in_observation=False,
-                             done_active=False)
+                             done_active=done_active,
+                             **kwargs,)
 
 Next we need to override all the abstract functions. We will use the head height as our goal
 variable::
 
-    def _get_achieved_goal(self):
-        return self.sim.data.get_body_xpos('head')[2]
+    def get_achieved_goal(self):
+        return self.data.body('head').xpos[2]
 
 Since we want fixed length episodes and have disabled `done_active` we don't need any of the
 other goal related functions and just implement them as dummy functions::
 
-    def _is_success(self, achieved_goal, desired_goal):
+    def is_success(self, achieved_goal, desired_goal):
         return False
 
-    def _is_failure(self, achieved_goal, desired_goal):
+    def is_failure(self, achieved_goal, desired_goal):
         return False
 
-    def _sample_goal(self):
+    def is_truncated(self):
+        return False
+
+    def sample_goal(self):
         return 0.0
 
 The only things still missing are the reward and the reset functions. The reward will consist
-of a positive component based on the head height, determined in ``_get_achieved_goal``, and
+of a positive component based on the head height, determined in ``get_achieved_goal``, and
 a penalty for large actions::
 
     def compute_reward(self, achieved_goal, desired_goal, info):
-        quad_ctrl_cost = 0.01 * np.square(self.sim.data.ctrl).sum()
+        quad_ctrl_cost = 0.01 * np.square(self.data.ctrl).sum()
         reward = achieved_goal - 0.2 - quad_ctrl_cost
         return reward
 
@@ -253,30 +254,24 @@ those from the randomization. The crib does not have joints and other joints in
 scene belong to MIMo. We then set the state with our new randomized positions and let the
 simulation settle for a few timesteps::
 
-    def _reset_sim(self):
-        self.sim.set_state(self.initial_state)
-        default_state = self.sim.get_state()
-        qpos = self.sim.data.qpos
+    def reset_model(self):
+        self.set_state(self.init_qpos, self.init_qvel)
+        qpos = self.init_crouch_position
 
         # set initial positions stochastically
-        qpos[7:] = qpos[7:] + self.np_random.uniform(low=-0.1, high=0.1, size=len(qpos[6:]))
+        qpos[7:] = qpos[7:] + self.np_random.uniform(low=-0.01, high=0.01, size=len(qpos[7:]))
 
         # set initial velocities to zero
-        qvel = np.zeros(self.sim.data.qvel.shape)
+        qvel = np.zeros(self.data.qvel.ravel().shape)
 
-        new_state = mujoco_py.MjSimState(
-            default_state.time, qpos, qvel, default_state.act, default_state.udd_state
-        )
-        self.sim.set_state(new_state)
-        self.sim.forward()
+        self.set_state(qpos, qvel)
 
         # perform 100 steps with no actions to stabilize initial position
         actions = np.zeros(self.action_space.shape)
         self._set_action(actions)
-        for _ in range(100):
-            self.sim.step()
+        mujoco.mj_step(self.model, self.data, nstep=100)
 
-        return True
+        return self._get_obs()
 
 Finally we register our new environment with gym by adding these lines to
 ``mimoEnv/__init__.py``, which also lets us set our fixed episode length::

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -7,7 +7,7 @@ Welcome to MIMo's documentation!
    :alt: MIMo with some toys
 
 MIMo is a library for the research of cognitive development and multi-sensory learning. 
-We follow the :doc:`API of the OpenAI gym <gym:content/api>` environments, using
+We follow the :doc:`API of Gymnasium <gym:index>` environments, using
 :doc:`MuJoCo <mujoco:overview>` for the physics simulation.
 
 
@@ -23,6 +23,7 @@ Contents
 
    guides/index.rst
    api/index.rst
+   changelog.rst
 
 MIMo is a simple, childlike robot physically simulated using MuJoCo. He has multiple sensory
 modalities including touch, vision, proprioception and a vestibular system.