About the TRPO results on the CartPole environment #95
Comments
+1
I got reasonably good results by changing some of the hyperparameters; see my previous reply in this issue.
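I copied the code from the TRPO chapter directly and ran it, but on the CartPole environment my results differ significantly from those shown in the textbook: with 500 sampled episodes per iteration, the return almost never reaches 200. When I increased the number of episodes to 1000, the results improved somewhat, but they were still clearly short of the ones shown. I got similar results on two different computers. What might be causing this?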

