Why remove the first two joints' positions in Swimmer ? #837
Comments
MuJoCo environments make a distinction between the state and the observation: the simulator maintains and updates the full system state, but the policy sees only part of that information. In the case of Swimmer, the positions of the first two joints are removed. If you check out HalfCheetah (and any other MuJoCo environment), you'll notice that they also prune the full state down to an observation. |
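To make the state/observation split concrete, here is a minimal NumPy sketch of this kind of pruning. It is an illustration, not the environment's actual source: the function name `prune_observation`, the `n_hidden` parameter, and the example numbers are hypothetical, and the layout (5 positions, 5 velocities, with the body's x, y in the first two position slots) is assumed from the issue description.

```python
import numpy as np


def prune_observation(qpos, qvel, n_hidden=2):
    """Build a policy observation from the full simulator state.

    The first `n_hidden` position entries (assumed here to be the body's
    x, y) are dropped; all velocity entries are kept.
    """
    qpos = np.asarray(qpos, dtype=float)
    qvel = np.asarray(qvel, dtype=float)
    return np.concatenate([qpos.ravel()[n_hidden:], qvel.ravel()])


# Hypothetical full state: 5 positions (x, y, heading, two joint angles)
# and 5 matching velocities.
full_qpos = np.array([1.5, -0.3, 0.1, 0.4, -0.2])
full_qvel = np.array([0.8, 0.0, 0.05, -0.1, 0.2])

obs = prune_observation(full_qpos, full_qvel)
print(obs.shape)  # (8,)
```

Under these assumptions the policy still receives the x, y velocities, so it can tell how fast the body is moving even though it cannot tell where the body is.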
Yeah, that's exactly my question: how can we ensure that the policy's
performance is not negatively affected by the pruned state? It's not
unreasonable for a real robot to have access to the positions of the first
two joints in this environment.
My guess is that since the goal is to make the snake swim in the positive
direction, its x, y positions are not important information for deciding on
actions.
(In reply to Surya Bhupatiraju's comment of Sat, Jan 27, 2018, 6:09 AM.)
--
Quan
|
In other words, if it is reasonable to expect a real-life robot to have access to the full state, why prune the state unless there is an explicit guarantee that the pruning does not impose an unreasonable constraint on the policy (i.e. that there are no unexpected higher-order effects)? |
My guess is that including states like the x, y position of the body would make it significantly harder to use neural networks for the policy function. These variables would span a small range at the start of the optimization (the agent does not yet know how to swim, so it doesn't move much) and a huge range after a good policy has been learnt. |
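The range argument above can be sketched numerically. This is a toy illustration only: the helper `rollout_x`, the two speeds, the step count, and the time step `dt` are all made-up values rather than measurements from Swimmer, and the body is assumed to move at a constant forward speed.

```python
import numpy as np


def rollout_x(speed, n_steps, dt=0.04):
    """Absolute x position after each step for a body moving at constant speed."""
    return np.cumsum(np.full(n_steps, speed * dt))


# Hypothetical speeds: an untrained agent barely moves, a trained one swims steadily.
x_untrained = rollout_x(speed=0.01, n_steps=1000)
x_trained = rollout_x(speed=1.0, n_steps=1000)

range_untrained = x_untrained.max() - x_untrained.min()
range_trained = x_trained.max() - x_trained.min()

# The spread of the absolute position scales with how well the agent swims,
# while the per-step velocity in each rollout stays constant.
print(range_untrained, range_trained)
```

In this toy setup the spread of x seen by the network grows in proportion to the agent's speed, so an input scale that fits early training no longer fits once the policy improves. Velocities do not accumulate in this way.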
PR #2762 is about to be merged, introducing V4 MuJoCo environments using new bindings and a dramatically newer version of the engine. If this issue still persists with the V4 ones, please create a new issue for it. |
In the Swimmer environment, there are 5 joints. However, the `step` function removes the positions of the first two joints (x, y position of the whole body) from the state before returning the state. I was wondering why these two scalars are removed from the state? Thanks!