[Question] Why does env.step in the HalfCheetah environment return 5 values if one of them is always False? #134

cijerezg · 2022-11-16T03:20:39Z

Question

The following code is copied directly from this repo's HalfCheetah v4 environment:

def step(self, action):
    x_position_before = self.data.qpos[0]
    self.do_simulation(action, self.frame_skip)
    x_position_after = self.data.qpos[0]
    x_velocity = (x_position_after - x_position_before) / self.dt

    ctrl_cost = self.control_cost(action)

    forward_reward = self._forward_reward_weight * x_velocity

    observation = self._get_obs()
    reward = forward_reward - ctrl_cost
    terminated = False
    info = {
        "x_position": x_position_after,
        "x_velocity": x_velocity,
        "reward_run": forward_reward,
        "reward_ctrl": -ctrl_cost,
    }

    if self.render_mode == "human":
        self.render()
    return observation, reward, terminated, False, info

That function returns observation, reward, terminated, and info, which all make sense, but there is also a False in the fourth position that I don't understand. What is its purpose? What am I missing?


axb2035 · 2022-11-16T09:21:04Z

In the change to Gym v0.26, the values returned from step() became observation, reward, terminated, truncated, info.

So the False is the truncated value. This implies that the episode can never be truncated, only terminated.

Check out the docs for more information: https://gymnasium.farama.org/api/env/#gymnasium.Env.step
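To make the five-value return concrete, here is a minimal sketch of how calling code might unpack it. `AlwaysRunningEnv` is a hypothetical stand-in written for this illustration (it is not the HalfCheetah class and does not use MuJoCo); it only mimics the fact that `step()` hard-codes both flags to False.

```python
class AlwaysRunningEnv:
    """Hypothetical toy environment: like the step() quoted above,
    it never sets terminated or truncated on its own."""

    def reset(self):
        self.t = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        observation = float(self.t)
        reward = 1.0
        terminated = False  # no terminal state in this toy task
        truncated = False   # no internal time limit either
        info = {"t": self.t}
        return observation, reward, terminated, truncated, info


env = AlwaysRunningEnv()
observation, info = env.reset()

# The fourth value is `truncated`; the fifth is `info`.
observation, reward, terminated, truncated, info = env.step(action=0)

# An episode is over as soon as either flag is set.
episode_over = terminated or truncated
print(terminated, truncated, episode_over)
```

Keeping the two flags separate lets the caller tell "the task reached a terminal state" apart from "the episode was cut off from outside", even when a particular environment never sets one of them itself.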

pseudo-rnd-thoughts · 2022-11-16T12:10:06Z

What @axb2035 wrote is mostly correct: the environment doesn't truncate internally, i.e. it never stops an episode because of a time limit.
However, during gym.make, a TimeLimit wrapper is applied to the environment, and that wrapper enforces a step limit and truncates the episode when the limit is reached. For HalfCheetah, as for most MuJoCo environments, this time limit is 1000 steps.
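The wrapper behaviour described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the library's actual wrapper code: `SimpleTimeLimit` and `NeverEndingEnv` are hypothetical names invented here, and the real wrapper handles more cases.

```python
class NeverEndingEnv:
    """Hypothetical inner environment that always reports
    terminated=False and truncated=False."""

    def reset(self):
        return 0.0, {}

    def step(self, action):
        return 0.0, 0.0, False, False, {}


class SimpleTimeLimit:
    """Hypothetical, simplified time-limit wrapper: it overrides
    `truncated` once a fixed number of steps has elapsed."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self._elapsed_steps = 0

    def reset(self):
        self._elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self.max_episode_steps:
            truncated = True  # the wrapper, not the inner env, ends the episode
        return obs, reward, terminated, truncated, info


env = SimpleTimeLimit(NeverEndingEnv(), max_episode_steps=1000)
env.reset()

steps = 0
terminated = truncated = False
while not (terminated or truncated):
    _, _, terminated, truncated, _ = env.step(action=None)
    steps += 1

print(steps, terminated, truncated)
```

In this sketch the inner environment's hard-coded False is exactly what lets the wrapper be the single place where truncation is decided.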

cijerezg added the question label Nov 16, 2022

cijerezg closed this as completed Nov 16, 2022
