This repository has been archived by the owner on Jun 13, 2024. It is now read-only.

Calculation of alpha loss in SAC is different from the original paper #28

Open
CloudyDory opened this issue Jun 15, 2022 · 0 comments

CloudyDory commented Jun 15, 2022

Hello, in the SAC paper "Soft Actor-Critic Algorithms and Applications" the calculation of the loss of alpha is:

J(alpha) = E[-alpha * (log(pi) + H)]

However, in your implementation, the calculation of the loss of alpha is instead (line 109 of "trainer.py"):

J(alpha) = E[-log(alpha) * (log(pi) + H)]

I am curious why the loss is calculated this way. I searched GitHub for several PyTorch-based SAC implementations, and they all calculate the loss this way. The TensorFlow-based SAC implementations, however, calculate J(alpha) the same way as the SAC paper (https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py). The TensorFlow implementations still take the gradient with respect to log(alpha), but when computing the loss J(alpha) they use exp(log(alpha)) (which is alpha) instead of log(alpha).
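
To make the comparison concrete, here is a minimal plain-Python sketch of the two losses and their gradients with respect to log(alpha). It is not taken from "trainer.py" or from softlearning: the function names, the sample log-probabilities, and the target entropy value are all made up for illustration, and it assumes that log(alpha) is the parameter being optimized in both cases. Under that assumption, the two gradients have the same sign and differ only by a factor of alpha.

```python
import math

def alpha_loss_paper(log_alpha, log_pi_batch, target_entropy):
    """Paper form: J(alpha) = E[-alpha * (log(pi) + H)], with alpha = exp(log_alpha)."""
    alpha = math.exp(log_alpha)
    terms = [-alpha * (lp + target_entropy) for lp in log_pi_batch]
    return sum(terms) / len(terms)

def alpha_loss_log_variant(log_alpha, log_pi_batch, target_entropy):
    """Variant form: J(alpha) = E[-log(alpha) * (log(pi) + H)]."""
    terms = [-log_alpha * (lp + target_entropy) for lp in log_pi_batch]
    return sum(terms) / len(terms)

def grad_wrt_log_alpha(loss_fn, log_alpha, log_pi_batch, target_entropy, eps=1e-6):
    """Central finite-difference estimate of d(loss) / d(log_alpha)."""
    upper = loss_fn(log_alpha + eps, log_pi_batch, target_entropy)
    lower = loss_fn(log_alpha - eps, log_pi_batch, target_entropy)
    return (upper - lower) / (2 * eps)

# Hypothetical values, chosen only for illustration.
log_pi_batch = [-1.2, -0.7, -2.1]   # log(pi) for a small batch of sampled actions
target_entropy = -1.0               # H
alpha = 0.2
log_alpha = math.log(alpha)

g_paper = grad_wrt_log_alpha(alpha_loss_paper, log_alpha, log_pi_batch, target_entropy)
g_log = grad_wrt_log_alpha(alpha_loss_log_variant, log_alpha, log_pi_batch, target_entropy)

# Analytically: d/d(log_alpha) of the paper form is -alpha * E[log(pi) + H],
# and of the variant form is -E[log(pi) + H], so g_paper == alpha * g_log.
print(g_paper, g_log, g_paper / g_log)
```

If this sketch matches what the implementations do, the two forms push log(alpha) in the same direction at every step, and the variant form effectively rescales the step size by 1/alpha. It does not show that the two forms are equivalent under an adaptive optimizer, nor does it explain why the PyTorch implementations chose the variant.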
