[Question] Observation in Humanoid/Ant-v2 #1636

Rowing0914 · 2019-08-02T15:48:56Z

Hi,

Recently I've been working on some experiments using MuJoCo/OpenAI Gym.
While checking the observations returned by env.step() on Humanoid-v2 and Ant-v2, I noticed that most entries of the returned vector are zeros, so I looked into the source code a bit more:

Humanoid get_obs: https://github.com/openai/gym/blob/master/gym/envs/mujoco/humanoid.py#L22
Ant get_obs: https://github.com/openai/gym/blob/master/gym/envs/mujoco/ant.py#L35
cfrc_ext impl: https://github.com/openai/mujoco-py/blob/master/mujoco_py/pxd/mjdata.pxd#L285

I also read issue #585, but nobody there seems to be asking about the problem I have: the observation values from Humanoid/Ant are dominated by zeros.

Does anyone else get observations like mine?

=== My environment ===

gym: v0.14.0
MuJoCo: v2.0.0
Python: 3.6.6
Observation from Ant-v2:

[ 4.86801671e-01  9.81827799e-01 -1.64617166e-01 -1.56627964e-02
  9.31130374e-02 -5.24809883e-01  5.23265547e-01  5.24312273e-01
 -5.21959648e-01 -5.24233225e-01 -1.22195542e+00  5.24128510e-01
  5.23526435e-01 -2.90538579e-07 -6.56325909e-07 -1.38336952e-15
  1.22402905e-06 -8.01568735e-07 -2.76759106e-07  2.80583420e-15
 -3.52882797e-15 -2.03051639e-15  9.77448673e-16  2.40798716e-16
 -8.00517472e-16  2.80357298e-15  1.24554476e-15  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00]


christopherhesse · 2019-08-09T22:48:48Z

Going to close this in favor of #585, but in this particular case it looks like the zeros are in cfrc_ext, which according to http://www.mujoco.org/book/reference.html holds the external forces acting on the centers of mass of the model's bodies. It's likely that many parts of the model simply have no external forces acting on them. But, as this observation I gathered from Ant-v2 seems to show, they're not always all zero:

[ 2.58922679e-01  1.63536825e-01 -6.37013970e-01 -7.49357197e-01
  7.70240215e-02  5.88669942e-01  1.10035780e+00  5.40773286e-01
 -5.23651238e-01  5.25538532e-01 -5.21748144e-01 -5.70826370e-01
  1.21903678e+00  5.45502006e-02 -6.04703699e-02  3.60616908e-02
  1.65534538e-01  2.15688587e-01 -4.97538018e-02  5.60827621e-01
 -3.84989316e+00  4.09290318e-01 -8.74945315e-02 -3.56383019e-02
  2.64558270e-03  1.28401105e+00  2.24424241e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  1.00000000e+00  7.16574164e-02  3.75826113e-02
 -3.02589527e-01  1.00000000e+00  1.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
  0.00000000e+00  0.00000000e+00  0.00000000e+00]
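To see where the zeros sit, a flat observation can be split according to each environment's _get_obs(). The sketch below is plain Python and assumes the gym v0.14 models, where Ant-v2 returns qpos[2:] (13), qvel (14) and cfrc_ext clipped to [-1, 1] (14 bodies x 6 = 84), 111 values in total, and Humanoid-v2 returns qpos[2:] (22), qvel (23), cinert (140), cvel (84), qfrc_actuator (23) and cfrc_ext (84), 376 in total. These sizes are derived from the models rather than stated in this thread, and split_obs / nonzero_bodies are hypothetical helper names.

```python
# Hypothetical helpers for inspecting flat Gym MuJoCo observations.
# Assumption: layouts match gym v0.14 Ant-v2 / Humanoid-v2 (14 bodies each).
OBS_LAYOUTS = {
    # qpos[2:] (15 - 2), qvel (14), clip(cfrc_ext, -1, 1) (14 bodies x 6)
    "Ant-v2": [("qpos[2:]", 13), ("qvel", 14), ("cfrc_ext", 84)],
    # qpos[2:], qvel, cinert (14 x 10), cvel (14 x 6), qfrc_actuator, cfrc_ext
    "Humanoid-v2": [("qpos[2:]", 22), ("qvel", 23), ("cinert", 140),
                    ("cvel", 84), ("qfrc_actuator", 23), ("cfrc_ext", 84)],
}


def split_obs(obs, env_id):
    """Return {part_name: list_of_values} for a flat observation."""
    layout = OBS_LAYOUTS[env_id]
    expected = sum(size for _, size in layout)
    if len(obs) != expected:
        raise ValueError(f"{env_id} expects {expected} values, got {len(obs)}")
    parts, start = {}, 0
    for name, size in layout:
        parts[name] = list(obs[start:start + size])
        start += size
    return parts


def nonzero_bodies(cfrc_ext, n_fields=6):
    """Indices of bodies whose 6-D cfrc_ext entry has any nonzero component."""
    return [i // n_fields
            for i in range(0, len(cfrc_ext), n_fields)
            if any(v != 0.0 for v in cfrc_ext[i:i + n_fields])]
```

Applied to the Ant-v2 observation just above, nonzero_bodies(split_obs(obs, "Ant-v2")["cfrc_ext"]) should flag only body index 1 (the torso in the Ant model), while the first observation in this thread has no nonzero cfrc_ext entries at all.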

anyboby · 2020-10-09T21:20:33Z

Hi,
it's a rather old thread, but in case anyone else is wondering about the zero terms:
This issue comes from the combination of mujoco-py >= 2.0 and MuJoCo 200 (see also #1541), in which the contact forces are not necessarily computed by MuJoCo.
Possible solutions include downgrading either mujoco-py to 1.5x or MuJoCo to 150.
Another option is to override the Ant environment's step() so that it forces the computation explicitly:

    import numpy as np
    import mujoco_py as mjp
    from gym.envs.mujoco.ant import AntEnv

    class ContactForceAntEnv(AntEnv):  # hypothetical subclass name
        def step(self, a):
            xposbefore = self.get_body_com("torso")[0]
            self.do_simulation(a, self.frame_skip)
            # Explicitly compute cfrc_ext (contact forces). Works around the
            # mujoco-py >= 2.0 / MuJoCo 200 mismatch that leaves them at zero.
            mjp.functions.mj_rnePostConstraint(self.sim.model, self.sim.data)
            xposafter = self.get_body_com("torso")[0]
            forward_reward = (xposafter - xposbefore) / self.dt
            ctrl_cost = 0.5 * np.square(a).sum()
            contact_cost = 0.5 * 1e-3 * np.sum(
                np.square(np.clip(self.sim.data.cfrc_ext, -1, 1)))
            survive_reward = 1.0
            reward = forward_reward - ctrl_cost - contact_cost + survive_reward
            # Episode ends if the state blows up or the torso height leaves [0.2, 1.0].
            state = self.state_vector()
            notdone = np.isfinite(state).all() \
                and state[2] >= 0.2 and state[2] <= 1.0
            done = not notdone
            ob = self._get_obs()
            return ob, reward, done, dict(
                reward_forward=forward_reward,
                reward_ctrl=-ctrl_cost,
                reward_contact=-contact_cost,
                reward_survive=survive_reward)

PS: I think this is actually a rather important issue, since contact forces are part of the reward function, and the reward should not change across the mujoco-py/MuJoCo version combinations being tested.
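To put a number on how much the reward is affected, the contact-cost term from the Ant-v2 step() code can be re-expressed in plain Python. This is a minimal sketch, not gym's implementation: the weight 0.5 * 1e-3 and the clip to [-1, 1] come from the snippet in this comment, while the 84-entry size of cfrc_ext (14 bodies x 6) is an assumption about the Ant model.

```python
# Sketch: Ant-v2 contact-cost term, re-expressed without NumPy.
CONTACT_COST_WEIGHT = 0.5 * 1e-3   # same weight as in the step() code
ANT_CFRC_EXT_SIZE = 14 * 6         # assumption: 14 bodies x 6 components


def contact_cost(cfrc_ext):
    """0.5e-3 * sum(clip(f, -1, 1) ** 2) over the flattened cfrc_ext values."""
    return CONTACT_COST_WEIGHT * sum(max(-1.0, min(1.0, f)) ** 2 for f in cfrc_ext)


# If MuJoCo never fills in cfrc_ext, every entry is 0 and the term vanishes.
cost_when_zeroed = contact_cost([0.0] * ANT_CFRC_EXT_SIZE)
# Clipping bounds each squared entry by 1, so this is the largest possible cost.
max_cost = contact_cost([1.0] * ANT_CFRC_EXT_SIZE)
```

Because each clipped entry contributes at most 1, the term lies between 0 and 84 * 0.5e-3 = 0.042 per step, against a survive reward of 1.0. With zeroed forces it is always 0, so every step's reward is biased upward by whatever the true contact cost would have been.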

DanielTakeshi · 2020-11-10T21:23:38Z

I agree with @anyboby; I'm not sure this should be closed.

johnnylin110 · 2021-01-05T11:55:51Z

Thanks @anyboby, your comment really helped.
I'd also like to ask: is your second method (modifying the code) exactly equivalent to your other solution, downgrading mujoco-py to 1.5x?
I used the second method in some experiments and want to make sure it is exactly the same, because I'm comparing against results from other papers.
Thanks!

anyboby · 2021-01-05T18:57:56Z

@johnnylin110
Generally, overriding the environment is not equivalent to downgrading MuJoCo or mujoco-py, since you're not undoing any of the other changes between those versions. I also can't speak for the dynamics in the MuJoCo backend: I don't know, for example, whether the dynamics solvers behave exactly the same across versions, or whether computing contact forces through a mujoco-py call is slower than an internal call, and so on.

As for the reward functions: given equal states, this code produced the same rewards as mujoco150 + mjpy 200 for me. But again, there is no guarantee; for absolute certainty you would have to use the same versions as the paper you are comparing against.
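One way to check this on your own setup is to record per-step rewards from the patched environment and from a reference install, driven by the same seed and action sequence, and find the first step where they disagree. The sketch below is a hypothetical helper (first_mismatch is not part of gym), and the tolerances are arbitrary defaults you may need to loosen.

```python
import math


def first_mismatch(rewards_a, rewards_b, rel_tol=1e-9, abs_tol=1e-9):
    """Return the index of the first step where two reward traces differ, or None."""
    if len(rewards_a) != len(rewards_b):
        raise ValueError("reward traces must have the same length")
    for step, (a, b) in enumerate(zip(rewards_a, rewards_b)):
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            return step
    return None
```

As noted above, identical actions do not guarantee identical states across MuJoCo versions, so a mismatch only tells you where to look. It does not say by itself whether the dynamics or the reward computation diverged.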

johnnylin110 · 2021-01-06T03:07:56Z

@anyboby
Thanks for your reply !
I will take this into consideration
Much appreciated!

christopherhesse closed this as completed Aug 9, 2019

JamesKCS mentioned this issue Feb 4, 2022

Contact forces are still zero in Ant-v2 #2593

Closed
