
Chapter 14 DDPG code about policy gradient #59

Open
GuilinF opened this issue Oct 14, 2019 · 1 comment

Comments

GuilinF commented Oct 14, 2019

Sir, thanks for your book and code; they are very nice. In the Chapter 14 DDPG code, I can't understand the actor's policy gradient update. This is the relevant part of your code:
act_opt.zero_grad()
cur_actions_v = act_net(states_v)
actor_loss_v = -crt_net(states_v, cur_actions_v)
actor_loss_v = actor_loss_v.mean()
actor_loss_v.backward()
act_opt.step()
tb_tracker.track("loss_actor", actor_loss_v, frame_idx)

However, I can't see how this corresponds to the formula in the DDPG paper, which writes the actor update as the product of two gradients.
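For reference, the sampled policy gradient from the DDPG paper (Lillicrap et al., 2015) is:

\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{i} \nabla_{a} Q(s, a \mid \theta^{Q})\big|_{s=s_i,\, a=\mu(s_i)} \, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s_i}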

Thank you for your patience

josiahls commented Sep 27, 2022

I honestly feel like the code from the book is wrong. It does train, but I think it is incorrect, and slower than it should be.

I say this because the above loss will go up to 10, 20, 30+, which isn't great. The main reason is that the loss from the book is the literal Q-value, not the gradient/derivative. I don't think you want to be working with literal Q-values as losses, since they can be much larger or smaller than 1.

If you work with the gradient, it will likely stay at a reasonable magnitude regardless of whether Q is 1 or 1000.

Once again, this code does work, but I think it would train faster and more stably using the gradients instead.

Edit: Thinking about this more, the reason this doesn't match the equation is that PyTorch's autograd computes the gradients for you. It's not a particularly satisfying answer; I still find the idea of actor_loss_v steadily growing larger concerning, and I suspect it might be slowing training.
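To make the autograd point concrete, here is a minimal sketch with toy linear networks (the names and shapes are made up for illustration, not taken from the book). It checks that backpropagating the book's loss -Q(s, mu(s)) through the actor gives exactly the paper's product of gradients, grad_a Q evaluated at a = mu(s) chained with grad_theta mu(s):

import torch
import torch.nn as nn

torch.manual_seed(0)
obs_dim, act_dim = 3, 2
actor = nn.Linear(obs_dim, act_dim)        # stands in for mu(s | theta^mu)
critic = nn.Linear(obs_dim + act_dim, 1)   # stands in for Q(s, a | theta^Q)
states = torch.randn(5, obs_dim)

# (1) Book style: let autograd differentiate -Q(s, mu(s)) end to end.
loss = -critic(torch.cat([states, actor(states)], dim=1)).mean()
loss.backward()
auto_grad = actor.weight.grad.clone()

# (2) Paper style: compute the two gradients separately and chain them by hand.
actor.zero_grad()
actions = actor(states)
a = actions.detach().requires_grad_(True)
q_mean = critic(torch.cat([states, a], dim=1)).mean()
dq_da = torch.autograd.grad(q_mean, a)[0]  # grad_a Q at a = mu(s)
actions.backward(-dq_da)                   # push it back through grad_theta mu(s)
manual_grad = actor.weight.grad.clone()

print(torch.allclose(auto_grad, manual_grad, atol=1e-6))  # True

So the one-line loss and the paper's product of gradients are the same parameter update; autograd just applies the chain rule for you. The remaining difference is only cosmetic: the number logged as "loss_actor" is the negated Q-value itself, which is why it can grow large even while the gradients stay well behaved.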
