This repository has been archived by the owner on Jun 13, 2024. It is now read-only.
Hello, in the SAC paper "Soft Actor-Critic Algorithms and Applications", the loss for alpha is computed as:
J(alpha) = E[-alpha * (log(pi) + H)]
However, in your implementation, the alpha loss is instead computed as (line 109 of "trainer.py"):
J(alpha) = E[-log(alpha) * (log(pi) + H)]
I am curious why the loss is calculated this way. I have looked at a couple of PyTorch-based SAC implementations on GitHub, and they all calculate the loss this way. But the TensorFlow-based SAC implementations calculate J(alpha) in the same way as the SAC paper (https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py). The TensorFlow implementations still compute the gradient with respect to log(alpha), but when calculating the loss J(alpha) they use exp(log(alpha)) (which is alpha) instead of log(alpha).
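For what it's worth, the two variants can be compared analytically (a hypothetical illustration, not code from either repository): writing c = log(pi) + H, the PyTorch-style loss is -log(alpha) * c and the paper's loss is -exp(log(alpha)) * c, so their gradients with respect to log(alpha) are -c and -alpha * c respectively. Since alpha = exp(log(alpha)) > 0, the gradients always share the same sign and differ only in magnitude:

```python
import math

def grad_variant_log(log_alpha, c):
    """Gradient of -log_alpha * c with respect to log_alpha."""
    return -c

def grad_variant_exp(log_alpha, c):
    """Gradient of -exp(log_alpha) * c with respect to log_alpha."""
    return -math.exp(log_alpha) * c

log_alpha = 0.5
for c in (-1.3, 0.0, 2.7):
    g1 = grad_variant_log(log_alpha, c)
    g2 = grad_variant_exp(log_alpha, c)
    # The two gradients differ only by the positive factor
    # alpha = exp(log_alpha), so they never point in opposite directions.
    assert g1 * g2 >= 0.0
    print(f"c={c:+.1f}  grad(log-variant)={g1:+.3f}  grad(exp-variant)={g2:+.3f}")
```

So both variants push log(alpha) in the same direction at every step; the log(alpha) version just rescales the effective learning rate by 1/alpha, which may explain why both are seen in practice.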