
SAC's loss explodes on Hopper-v3 #332

Closed
tongzhoumu opened this issue Mar 31, 2021 · 9 comments · Fixed by #333


tongzhoumu commented Mar 31, 2021

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

Problems

I am trying to reproduce the results mentioned in the mujoco benchmark. The command I used is ./run_experiments.sh Hopper-v3, which runs the SAC algorithm. The losses exploded after several epochs. I have repeated the experiment 6 times and tried both fixed alpha and automatic alpha tuning, but all runs eventually exploded.

Other info

  • Version: I tried 6426a39 and e605bde; both of them have this issue
  • Operating system: Ubuntu 18.04
  • Environment: 0.4.0 1.8.1+cu102 3.7.10 (default, Feb 26 2021, 18:47:35) [GCC 7.3.0] Linux

[Screenshots: training curves showing the exploding losses]

@Trinkle23897 (Collaborator)

@ChenDRAG could you please take a look?

Trinkle23897 added the bug label Mar 31, 2021
@ChenDRAG (Collaborator)

Looking into that @Trinkle23897 @tongzhoumu

ChenDRAG self-assigned this Mar 31, 2021
ChenDRAG (Collaborator) commented Mar 31, 2021

@tongzhoumu I have run 4 experiments on both e605bde (older version, 0.4.0) and 6426a39 (latest), and found that the older version of tianshou works quite well. There does seem to be a problem with SAC in 6426a39.
[Screenshot: older version results]

You mentioned that you failed to reproduce the benchmark even on e605bde. Is it possible that you forgot to switch the installed tianshou library to the older version? In other words, your code under tianshou/examples/mujoco is the older version, but the library you use is the latest one?

Suggestion: try

cd tianshou
git checkout e605bde
pip install -e .

and then reproduce benchmark.

Meanwhile, I will try to fix the bug with SAC in the latest version as soon as possible.

tongzhoumu (Author) commented Mar 31, 2021

@ChenDRAG Yes, you are right. I forgot to reinstall tianshou after switching to the old commit. Now I can reproduce your results with the e605bde code.
I am looking forward to your fix for the latest version. Thank you so much!

@Trinkle23897 (Collaborator)

#313 introduces this bug

ChenDRAG (Collaborator) commented Apr 1, 2021

Solved with #333. It turns out that tanh mapping is not supported in policies where the action is used as an input to the critic, because a raw action in the range (-inf, inf) will cause instability in training. If you really need to use tanh for mapping, you have to apply it yourself inside forward().
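For illustration, here is a minimal sketch of what applying tanh inside forward() looks like (hypothetical names, not the actual code from #333): the squashing happens in the policy's forward pass, so everything downstream only ever sees bounded actions.

import torch

def actor_forward_sketch(gaussian_head, obs):
    # Hypothetical sketch: the network outputs Gaussian parameters, we sample
    # a raw (unbounded) action with the reparameterization trick, and squash
    # it with tanh inside forward() so the critic and the replay buffer only
    # ever see actions in (-1, 1).
    mu, sigma = gaussian_head(obs)
    dist = torch.distributions.Normal(mu, sigma)
    raw_act = dist.rsample()            # unbounded, roughly (-inf, inf)
    squashed_act = torch.tanh(raw_act)  # bounded, (-1, 1)
    return squashed_act, raw_act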

tongzhoumu (Author) commented Apr 1, 2021

Solved with #333. It turns out that tanh mapping is not supported in policies where the action is used as an input to the critic, because a raw action in the range (-inf, inf) will cause instability in training. If you really need to use tanh for mapping, you have to apply it yourself inside forward().

Hi, I have not entirely understood your fix. After tanh, the action should be in [-1, 1], so why is it incorrect?

BTW, I took a look at your solution, and it seems you just removed if self.action_bound_method == "tanh" and self.action_space is not None:. What is the reason for this?

I am looking forward to your reply. Thanks!

ChenDRAG (Collaborator) commented Apr 2, 2021

BTW, I took a look at your solution, and it seems you just removed if self.action_bound_method == "tanh" and self.action_space is not None:. What is the reason for this?

The key point is not removing if self.action_bound_method == "tanh" and self.action_space is not None:; the key point is making batch.act = squashed_act.

See, in tianshou there is a function called map_action that maps an action in the range (-inf, inf) to the range (action_space.low, action_space.high). You can think of map_action as a kind of gym environment wrapper. It is part of the environment, like a black box, because the remapped action is not stored in the buffer.

For instance, in MuJoCo Humanoid the action space is (-0.4, 0.4), but the output of your neural network is in (-inf, inf). Using map_action, you don't have to care about this.
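To make this concrete, here is a rough sketch of what map_action conceptually does (my own illustration, not Tianshou's exact implementation): squash the unbounded output and rescale it to the environment's bounds.

import numpy as np

def map_action_sketch(raw_act, low, high):
    # Conceptual sketch: squash the unbounded network output into (-1, 1),
    # then rescale linearly to the environment's (low, high) bounds.
    squashed = np.tanh(raw_act)
    return low + (high - low) * (squashed + 1.0) / 2.0

# For Humanoid-like bounds (-0.4, 0.4):
# map_action_sketch(np.array([100.0, -3.0]), -0.4, 0.4)  ->  approx. [0.4, -0.398]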

This is particularly useful in on-policy algorithms, because the action is usually sampled from a Gaussian distribution (which ranges from -inf to inf in theory). However, in SAC the action is used as an input to a neural network, so an action in (-inf, inf) is no longer allowed: it will cause gradient explosion if the action is too large. In this case tanh cannot be part of the black box and has to be applied manually in forward.
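Putting it together, here is a self-contained sketch of why the critic needs the squashed action (hypothetical dimensions and modules, not Tianshou's classes):

import torch
import torch.nn as nn

# Sketch of the SAC data flow after the fix: the critic Q(s, a) takes the
# action as part of its input, so the action stored as batch.act must be the
# tanh-squashed one, not the raw Gaussian sample.
obs_dim, act_dim = 11, 3   # Hopper-v3-like dimensions
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))

obs = torch.randn(4, obs_dim)
raw_act = torch.randn(4, act_dim) * 100.0   # unbounded Gaussian sample
squashed_act = torch.tanh(raw_act)          # what should be stored as batch.act

q_value = critic(torch.cat([obs, squashed_act], dim=-1))  # stable: inputs in (-1, 1)
# Feeding raw_act here instead lets arbitrarily large values into the
# Q-network, which is the instability described above.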

tongzhoumu (Author) commented Apr 2, 2021


After carefully inspecting the code, I noticed that the key difference is act=squashed_action. Thanks so much for your detailed explanation!
