I changed the line adv_v = vals_ref_v - value_v.detach() to adv_v = vals_ref_v - value_v.squeeze(-1).detach(), and convergence seems to be much faster. In the A2C algorithm the advantage is Q(a, s) - V(s), so it is only logical that Q(a, s) and V(s) should have the same shape when they are subtracted.
The call to detach() is important here as we don't want to propagate the PG into our value approximation head.
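The shape mismatch behind this change can be reproduced in isolation. This is a minimal sketch, assuming vals_ref_v has shape [batch] and value_v (the value head output) has shape [batch, 1]. The tensors below are made up for illustration and are not taken from the training code.

```python
import torch

# Hypothetical batch of 4 transitions (values are illustrative assumptions).
vals_ref_v = torch.tensor([1.0, 2.0, 3.0, 4.0])  # shape [4]
value_v = torch.tensor([[0.5], [1.5], [2.5], [3.5]], requires_grad=True)  # shape [4, 1]

# Original line: [4] - [4, 1] broadcasts to a [4, 4] matrix,
# pairing every reference value with every state value.
adv_broadcast = vals_ref_v - value_v.detach()

# Changed line: squeeze(-1) drops the trailing dimension first,
# so the subtraction is element-wise and yields one advantage per transition.
adv_matched = vals_ref_v - value_v.squeeze(-1).detach()

print(adv_broadcast.shape)
print(adv_matched.shape)

# detach() cuts the graph, so no gradient flows back into the value head.
print(adv_matched.requires_grad)
```

If the shapes in the real code match these assumptions, the unsqueezed version silently produces a batch-by-batch matrix of advantages rather than raising an error, which would explain the slower convergence.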
https://ai.stackexchange.com/questions/21172/advantage-computed-the-wrong-way