TRPO results on the CartPole environment #95

Open
24krab opened this issue Oct 29, 2024 · 4 comments

@24krab

24krab commented Oct 29, 2024

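I copied the code from the TRPO chapter directly and ran it, but on the CartPole environment the results differ markedly from those shown in the textbook. With 500 sampled sequences per iteration, the return almost never reaches 200. Increasing the number of sequences to 1000 improves things somewhat, but there is still a clear gap from the results shown. I got similar results on two different computers. What could be causing this?

[Image: CartPole, 2 rounds, smoothed]
[Image: 1000 sequences, CartPole, smoothed]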

@itera-del

+1

@YanikZ35

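You could try increasing the number of iterations in the conjugate gradient loop:

    def conjugate_gradient(self, grad, states, old_action_dists):
        # Solve the linear system with the conjugate gradient method
        x = torch.zeros_like(grad)
        r = grad.clone()
        p = grad.clone()
        rdotr = torch.dot(r, r)
        # Main conjugate gradient loop. 10 steps may be too few and the
        # solution is not very accurate; raising the cap to 100 gave a
        # noticeable performance improvement.
        for i in range(100):
            Hp = self.hessian_matrix_vector_product(states, old_action_dists, p)
            alpha = rdotr / torch.dot(p, Hp)
            x += alpha * p
            r -= alpha * Hp
            new_rdotr = torch.dot(r, r)
            if new_rdotr < 1e-10:
                break
            beta = new_rdotr / rdotr
            p = r + beta * p
            rdotr = new_rdotr
        return x

In my tests, 10 iterations only bring the residual down to the order of a few hundredths, nowhere near the 1e-10 threshold. After I changed the cap to 100 iterations, the reward could reach 200. My guess is that too few iterations make the computed solution inaccurate. Comparison plots are attached below.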
Image
Image
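To make the effect of the iteration cap concrete, here is a minimal standalone sketch of the same conjugate gradient recurrence in plain Python. It is not the repository's code: the diagonal test matrix, `toy_matvec`, and the all-ones right-hand side are hypothetical stand-ins for the Hessian-vector product used in the method above, chosen only so the example runs without PyTorch or an environment.

```python
def conjugate_gradient(matvec, b, max_iters, tol=1e-10):
    """Approximately solve A x = b for a symmetric positive-definite A.

    `matvec(v)` must return A v; A itself is never formed.
    Returns (x, squared residual norm, iterations used).
    """
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the initial guess x = 0
    p = list(b)  # first search direction
    rdotr = sum(ri * ri for ri in r)
    iters = 0
    for _ in range(max_iters):
        Ap = matvec(p)
        alpha = rdotr / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        new_rdotr = sum(ri * ri for ri in r)
        iters += 1
        beta = new_rdotr / rdotr
        rdotr = new_rdotr
        if rdotr < tol:  # same stopping threshold as in the thread
            break
        p = [ri + beta * pi for ri, pi in zip(r, p)]
    return x, rdotr, iters


# Hypothetical toy system: a 50x50 diagonal matrix whose eigenvalues are
# spread evenly between 1 and 100, with a right-hand side of all ones.
eigenvalues = [1.0 + 99.0 * i / 49 for i in range(50)]


def toy_matvec(v):
    return [lam * vi for lam, vi in zip(eigenvalues, v)]


b = [1.0] * 50
for cap in (10, 100):
    _, residual, used = conjugate_gradient(toy_matvec, b, max_iters=cap)
    print(f"cap={cap:3d}  iterations used={used:3d}  squared residual={residual:.3e}")
```

On this toy system the cap of 10 is exhausted before the 1e-10 threshold is met, while the cap of 100 lets the loop stop early once the threshold is reached. How many iterations the actual TRPO problem needs depends on the conditioning of its Hessian, which this sketch does not model.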

@24krab
Author

24krab commented Feb 27, 2025

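Thanks for sharing! I tried raising the number of conjugate gradient iterations to 100 as you suggested, but the results were still unsatisfactory, and training became noticeably slower. I then tried setting the KL divergence constraint to 0.001 (i.e. kl_constraint = 0.001) together with 50 conjugate gradient iterations, and now I get fairly good results, as shown in the figures.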

Image

Image
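For intuition on why `kl_constraint` matters, the sketch below uses the standard TRPO step-size bound: with search direction x and Hessian H of the KL divergence, the largest step coefficient that keeps the quadratic KL estimate within the constraint δ is sqrt(2δ / (xᵀHx)). Whether the chapter's code computes its step exactly this way is an assumption here, and `max_step_coef` and the value of `xHx` are hypothetical names and numbers used only for illustration.

```python
import math


def max_step_coef(kl_constraint, xHx):
    """Largest coefficient c such that 0.5 * c**2 * xHx <= kl_constraint.

    The small constant guards against division by zero.
    """
    return math.sqrt(2.0 * kl_constraint / (xHx + 1e-8))


xHx = 0.5  # hypothetical value of x^T H x for one update
for delta in (0.0005, 0.001, 0.002):
    print(f"kl_constraint={delta}  max step coefficient={max_step_coef(delta, xHx):.4f}")
```

Because the coefficient grows with the square root of the constraint, quadrupling `kl_constraint` doubles the largest allowed step, so this parameter directly controls how aggressive each policy update is.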

@24krab
Author

24krab commented Feb 27, 2025

> +1

I got fairly good results by changing some of the hyperparameters; see my previous reply in this issue.
