[Bug Report] [MuJoCo] Reacher And Pusher reward is calculated prior to transition
#821
Closed
1 task done
Labels
bug
Something isn't working
Comments
Is it possible to change this in v5 before we make the release?
Quick analysis: The performance is very similar (though slightly more consistent for the fixed reward case)
Amazing, could you make a PR to change the implementation to this?
This was referenced Dec 9, 2023
Describe the bug
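In a normal RL environment's step, the transition is applied before the reward is calculated, which means that the reward is generated as a function of the current (post-transition) state and the current action.
But in Pusher & Reacher's step, the reward is calculated prior to the transition, which means that the reward is generated as a function of the previous state and the current action.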
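The difference in ordering can be illustrated with a minimal sketch. This is not the Gymnasium code: `ToyEnv`, its fields, and both step methods are hypothetical names for a 1-D point that moves by `action`, and the reward is only the negative distance to a target (the real environments also include a control cost that depends on the action).

```python
from dataclasses import dataclass


@dataclass
class ToyEnv:
    """Hypothetical 1-D point environment, used only for illustration."""

    position: float = 0.0
    target: float = 1.0

    def _reward(self) -> float:
        # Negative distance to the target, evaluated on whatever state is current.
        return -abs(self.target - self.position)

    def _transition(self, action: float) -> None:
        # Stand-in for advancing the simulation by one step.
        self.position += action

    def step_reward_after_transition(self, action: float) -> float:
        # Usual ordering: transition first, then reward on the new state.
        self._transition(action)
        return self._reward()

    def step_reward_before_transition(self, action: float) -> float:
        # Ordering described in this issue: reward on the old state, then transition.
        reward = self._reward()
        self._transition(action)
        return reward
```

With the same action that moves the point exactly onto the target, the first ordering rewards the agent for the state it just reached, while the second still reports the distance from the state it started in.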
Learning impact analysis
TODO at some point (will likely do it after 2023)
Proposed solution
As I believe the current v5 MuJoCo environments to be done as is, and the environments are easily solvable as is anyway, we should fix this in a future release (v6?).
Code example
Gymnasium/gymnasium/envs/mujoco/reacher_v5.py
Lines 199 to 207 in 14def07
Lines 199 and 200 are in the opposite order.
Additional context
This has been an issue since reacher was introduced in the initial commit. This issue was first reported in 2018, but never addressed.
Checklist