Why remove the first two joints' positions in Swimmer ? #837

Closed
quanvuong opened this issue Jan 26, 2018 · 5 comments

@quanvuong
Contributor

In the Swimmer environment, there are 5 joints. However, the step function removes the positions of the first two joints (the x and y position of the whole body) from the state before returning it.

I was wondering why these two scalars are removed from the state. Thanks!

@suryabhupa

MuJoCo makes a distinction between the state and the observation: the simulator maintains and updates the full system state, but the policy sees only part of that information. In the case of Swimmer, the first two joint positions are removed. If you check out HalfCheetah (and any other MuJoCo environment), you'll notice that they also prune the full state down to an observation.
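For reference, the pruning described here can be sketched without a simulator. This is a minimal illustration, assuming a Swimmer-like layout in which the generalized positions `qpos` hold 5 values (the first two being the body's x and y) and the velocities `qvel` hold 5 values. The function `get_obs` and the sample numbers are hypothetical and are not taken from the Gym source.

```python
from typing import List, Sequence


def get_obs(qpos: Sequence[float], qvel: Sequence[float]) -> List[float]:
    """Build an observation by dropping the first two position entries."""
    # Assumption: qpos[0] and qpos[1] are the body's global x, y position.
    return list(qpos[2:]) + list(qvel)


# Hypothetical full state: 5 positions and 5 velocities.
qpos = [1.5, -0.3, 0.1, 0.2, -0.2]
qvel = [0.4, 0.0, 0.05, -0.1, 0.1]

obs = get_obs(qpos, qvel)

# Full state size vs. observation size.
print(len(qpos) + len(qvel), len(obs))
```

The simulator still tracks all 10 values; only the 8 that remain after slicing are handed to the policy.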

@quanvuong
Contributor Author

quanvuong commented Jan 27, 2018 via email

@quanvuong
Contributor Author

In other words, if it is reasonable to expect a real-life robot to have access to the full state, why prune the state unless there is an explicit guarantee that the pruning does not impose unreasonable constraints on the policy (i.e. there are no unexpected higher-order effects)?

@JulianoLagana

My guess is that including states like the x, y position of the body would make it significantly harder to use neural networks for the policy function. These variables would go from small ranges at the start of the optimization (the agent does not yet know how to swim, so it barely moves) to huge ranges after a good policy has been learned.
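To make the scale argument concrete, here is a rough sketch of how the span of the x-coordinate over one episode could change as the agent gets faster. Everything in it is an assumption chosen for illustration: the helper `x_range`, the constant forward speeds, the 1000-step episode, and the 0.04 s timestep are not measurements from Swimmer.

```python
def x_range(forward_speed: float, steps: int = 1000, dt: float = 0.04) -> float:
    """Span of the x-coordinate over one episode at a constant forward speed."""
    x = 0.0
    xs = [x]
    for _ in range(steps):
        x += forward_speed * dt  # simple Euler integration of position
        xs.append(x)
    return max(xs) - min(xs)


# Hypothetical speeds: an untrained agent that barely moves vs. a trained one.
early = x_range(0.01)
late = x_range(1.0)

print(f"early span: {early:.2f}, late span: {late:.2f}")
```

Under these assumptions the input range for x grows by about two orders of magnitude during training, while joint angles stay bounded by their joint limits, so a network fed the raw x value would see its input distribution shift as the policy improves.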

@jkterry1
Collaborator

PR #2762 is about to be merged, introducing v4 MuJoCo environments that use new bindings and a dramatically newer version of the engine. If this issue persists with the v4 environments, please create a new issue for it.
