This repository has been archived by the owner on Jun 13, 2024. It is now read-only.
Hello, in the SAC paper "Soft Actor-Critic Algorithms and Applications", the loss for alpha is computed as:
J(alpha) = E[-alpha * (log(pi) + H)]
However, in your implementation, the alpha loss is instead computed as (line 109 of "trainer.py"):
J(alpha) = E[-log(alpha) * (log(pi) + H)]
I am curious why the loss is calculated this way. I have looked at a couple of PyTorch-based SAC implementations on GitHub, and they all calculate the loss this way. But the TensorFlow-based SAC implementations calculate J(alpha) in the same way as the SAC paper (https://github.com/rail-berkeley/softlearning/blob/master/softlearning/algorithms/sac.py). The TensorFlow implementations still compute the gradient with respect to log(alpha), but when calculating the loss J(alpha) they use exp(log(alpha)) (which is alpha) instead of log(alpha).
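For what it's worth, the two variants can be compared analytically (a hypothetical illustration, not code from either repository): writing c = log(pi) + H, the PyTorch-style loss is -log(alpha) * c and the paper's loss is -exp(log(alpha)) * c, so their gradients with respect to log(alpha) are -c and -alpha * c respectively. Since alpha = exp(log(alpha)) > 0, the gradients always share the same sign and differ only in magnitude:

```python
import math

def grad_variant_log(log_alpha, c):
    """Gradient of -log_alpha * c with respect to log_alpha."""
    return -c

def grad_variant_exp(log_alpha, c):
    """Gradient of -exp(log_alpha) * c with respect to log_alpha."""
    return -math.exp(log_alpha) * c

log_alpha = 0.5
for c in (-1.3, 0.0, 2.7):
    g1 = grad_variant_log(log_alpha, c)
    g2 = grad_variant_exp(log_alpha, c)
    # The two gradients differ only by the positive factor
    # alpha = exp(log_alpha), so they never point in opposite directions.
    assert g1 * g2 >= 0.0
    print(f"c={c:+.1f}  grad(log-variant)={g1:+.3f}  grad(exp-variant)={g2:+.3f}")
```

So both variants push log(alpha) in the same direction at every step; the log(alpha) version just rescales the effective learning rate by 1/alpha, which may explain why both are seen in practice.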