Sir, thanks for your book and code, it is very nice. In the Chapter 14 DDPG code, I can't understand the actor policy gradient update. This is the relevant part of your code:
act_opt.zero_grad()
cur_actions_v = act_net(states_v)                  # actions proposed by the current actor
actor_loss_v = -crt_net(states_v, cur_actions_v)   # negated critic value, -Q(s, mu(s))
actor_loss_v = actor_loss_v.mean()
actor_loss_v.backward()                            # backprop through the critic into the actor
act_opt.step()
tb_tracker.track("loss_actor", actor_loss_v, frame_idx)
but I can't match this to the formula in the DDPG paper. The paper writes the actor update as the product of two gradients: the gradient of the critic with respect to the action, multiplied by the gradient of the actor with respect to its parameters.
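For reference, as I understand it the sampled policy gradient in the paper (Lillicrap et al., 2015) is written as:

$$
\nabla_{\theta^\mu} J \approx \frac{1}{N}\sum_i \nabla_a Q(s, a \mid \theta^Q)\Big|_{s=s_i,\, a=\mu(s_i)} \, \nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\Big|_{s_i}
$$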
Thank you for your patience
I honestly feel like the code from the book is wrong. It does train, but I think it is incorrect and slower than it should be.
I say this because the above loss climbs to 10, 20, 30+, which isn't great. The main reason is that the loss from the book is the literal Q-value, not a gradient/derivative. I don't think you want to work with literal Q-values as losses, since they can be much larger or smaller than 1.
If you work with the gradient directly, it will likely stay at a reasonable magnitude regardless of whether Q is 1 or 1000.
Once again, this code does work, but I think it would train faster and more stably using the gradients instead.
Edit: Thinking about this more, the reason this doesn't match the equation is that PyTorch's autograd computes the gradients for you: backpropagating -Q(s, mu(s)).mean() through the critic and into the actor applies the chain rule, which is exactly the product of the two gradients written in the paper. It's not a particularly satisfying answer, and I still find the steady growth of actor_loss_v concerning, since it might be slowing training.
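Here is a minimal sketch (my own, not from the book or the paper) that checks this equivalence numerically. The tiny actor/critic networks and sizes are hypothetical stand-ins for the book's act_net and crt_net:

import torch
import torch.nn as nn

torch.manual_seed(0)
obs_dim, act_dim, batch = 3, 2, 5

# Hypothetical stand-ins for the book's act_net and crt_net
actor = nn.Sequential(nn.Linear(obs_dim, 16), nn.ReLU(),
                      nn.Linear(16, act_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 16), nn.ReLU(),
                       nn.Linear(16, 1))

states = torch.randn(batch, obs_dim)

# (1) Book's form: one backward pass of -Q(s, mu(s)).mean() through the critic into the actor
actions = actor(states)
loss = -critic(torch.cat([states, actions], dim=1)).mean()
grads_book = torch.autograd.grad(loss, list(actor.parameters()))

# (2) Paper's form: compute grad_a Q first, then multiply by grad_theta mu
#     via a vector-Jacobian product through the actor
actions = actor(states)                                   # rebuild the actor graph
a_leaf = actions.detach().requires_grad_(True)            # actions as a leaf input to the critic
q_mean = critic(torch.cat([states, a_leaf], dim=1)).mean()
dq_da = torch.autograd.grad(q_mean, a_leaf)[0]            # grad_a Q(s, a), averaged over the batch
grads_paper = torch.autograd.grad(actions, list(actor.parameters()),
                                  grad_outputs=-dq_da)    # chain rule: -grad_a Q * grad_theta mu

for g_book, g_paper in zip(grads_book, grads_paper):
    assert torch.allclose(g_book, g_paper, atol=1e-6)
print("book-style loss and paper-style product of gradients give the same actor gradients")

So the two forms produce identical parameter gradients; the difference is only whether the chain rule is written out explicitly or left to autograd.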