[Question] Why does env.step in the HalfCheetah environment return 5 values if one of them is always False? #134

Closed
cijerezg opened this issue Nov 16, 2022 · 2 comments
Labels
question Further information is requested

Comments

@cijerezg
Question

This code is copied directly from this repo's HalfCheetah v4 environment, specifically its step method:

def step(self, action):
    x_position_before = self.data.qpos[0]
    self.do_simulation(action, self.frame_skip)
    x_position_after = self.data.qpos[0]
    x_velocity = (x_position_after - x_position_before) / self.dt

    ctrl_cost = self.control_cost(action)

    forward_reward = self._forward_reward_weight * x_velocity

    observation = self._get_obs()
    reward = forward_reward - ctrl_cost
    terminated = False
    info = {
        "x_position": x_position_after,
        "x_velocity": x_velocity,
        "reward_run": forward_reward,
        "reward_ctrl": -ctrl_cost,
    }

    if self.render_mode == "human":
        self.render()
    return observation, reward, terminated, False, info

That function returns observation, reward, terminated, and info, which all make sense, but it also returns a literal False that I don't understand. What is its purpose? What am I missing?

@cijerezg cijerezg added the question Further information is requested label Nov 16, 2022
@axb2035
Contributor

axb2035 commented Nov 16, 2022

With the change to v0.26, the values returned from step() became observation, reward, terminated, truncated, info.

So the False is the truncated value. This implies that the episode can never be truncated, only terminated.

Check out the docs for more information: https://gymnasium.farama.org/api/env/#gymnasium.Env.step
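
As a rough illustration of how calling code usually consumes the five values, here is a minimal sketch. ToyEnv and rollout are hypothetical names made up for this example, not Gymnasium classes or functions; the sketch only mirrors the return shape described above and uses no library calls.

```python
class ToyEnv:
    """Hypothetical stand-in that mirrors the five-value step() return shape."""

    def __init__(self, goal=3):
        self.goal = goal
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position, {}

    def step(self, action):
        self.position += action
        reward = float(action)
        # terminated: the task itself reached an end state.
        terminated = self.position >= self.goal
        # truncated: the episode was cut short for an outside reason
        # (e.g. a time limit). This toy env never does that itself.
        truncated = False
        info = {"position": self.position}
        return self.position, reward, terminated, truncated, info


def rollout(env, action=1, max_iters=100):
    """Step until the episode ends for either reason (or max_iters is hit)."""
    obs, info = env.reset()
    total_reward = 0.0
    steps = 0
    terminated = truncated = False
    while not (terminated or truncated) and steps < max_iters:
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
    return steps, total_reward, terminated, truncated
```

The loop stops on either flag, but keeping the two separate lets a learning algorithm tell a real end state apart from an episode that was merely cut off.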

@pseudo-rnd-thoughts
Member

What @axb2035 wrote is mostly correct, in that the environment doesn't truncate internally, i.e. it never stops an episode because of a time limit.
However, gym.make applies a time limit wrapper to the environment, and that wrapper can enforce a time limit and truncate the episode. For all mujoco environments this time limit is 1000 steps.
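
To make the wrapper idea concrete, here is a minimal sketch of how a time limit wrapper could turn the inner environment's constant False into True once a step budget is used up. NeverEndingEnv, StepLimit and run_episode are hypothetical names for this example; this is not Gymnasium's actual wrapper code, only an illustration of the behaviour described above.

```python
class NeverEndingEnv:
    """Hypothetical env that, like the step() quoted in the question,
    always returns terminated=False and truncated=False."""

    def reset(self):
        return 0.0, {}

    def step(self, action):
        return 0.0, 1.0, False, False, {}


class StepLimit:
    """Sketch of a time limit wrapper (an assumption, not the real one)."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed_steps = 0

    def reset(self):
        self.elapsed_steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.elapsed_steps += 1
        # Override the inner env's False once the step budget is exhausted.
        if self.elapsed_steps >= self.max_episode_steps:
            truncated = True
        return obs, reward, terminated, truncated, info


def run_episode(env):
    """Step until either flag is set; return the step count and both flags."""
    env.reset()
    steps = 0
    while True:
        _, _, terminated, truncated, _ = env.step(None)
        steps += 1
        if terminated or truncated:
            return steps, terminated, truncated
```

With a limit of 1000, run_episode(StepLimit(NeverEndingEnv(), max_episode_steps=1000)) ends after 1000 steps with terminated still False and truncated set to True, which is why the fifth slot in the return value matters even though the environment hard-codes it.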


3 participants