introduce recurrent sac-discrete #1
Conversation
Hi, I ran this command:
Hi, yes, I did observe that Markovian SAC-Discrete is unstable on CartPole across seeds. You may try disabling auto-tuning of alpha and grid-searching over fixed alpha values (a sketch of what that might look like is below). I did not have much insight into it.
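A minimal sketch of a fixed-alpha setup, for illustration only; the flag and variable names here are assumptions and may not match this repository's actual configuration keys:

```python
import torch

# Hypothetical flags for illustration; the repository's actual config keys may differ.
automatic_entropy_tuning = False  # disable auto-tuning of the temperature
fixed_alpha = 0.1                 # grid-search over values such as 0.01, 0.05, 0.1, 0.3

if automatic_entropy_tuning:
    # Learned temperature: optimize log_alpha toward a target entropy.
    log_alpha = torch.zeros(1, requires_grad=True)
    alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)
    alpha = log_alpha.exp()  # recomputed after each alpha update in practice
else:
    # Fixed temperature: a constant used directly in the actor and critic losses.
    alpha = torch.tensor(fixed_alpha)
```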
It seems that your task has sparse rewards. I guess the entropy is high in the early learning stage, and through training it decreases to a threshold where the agent can exploit its "optimal" behavior to receive some positive rewards.
Yeah, that's right. However, while the agent does experience rewards during the 0-10k period, the policy gradient doesn't seem to be large. The policy during evaluation also didn't change much, often not achieving any success at all, even though the reward is not so sparse that getting a positive one is hard.
@hai-h-nguyen Did you ever find a solution to these issues? I am experiencing somewhat similar behaviour.
Hi @RobertMcCarthy97, it might help to start alpha at a smaller value, such as 0.01, rather than the starting value in the code (1.0). That way the agent explores less initially.
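A minimal sketch of that change, assuming the temperature is parameterized as alpha = exp(log_alpha) as in common SAC implementations (variable names are illustrative and may not match this repository):

```python
import math
import torch

# log_alpha = 0.0 corresponds to the default alpha = 1.0.
# Starting from log(0.01) instead gives alpha = 0.01, so the entropy bonus,
# and hence the amount of initial exploration, is much smaller.
initial_alpha = 0.01
log_alpha = torch.tensor(math.log(initial_alpha), requires_grad=True)
alpha_optimizer = torch.optim.Adam([log_alpha], lr=3e-4)

alpha = log_alpha.exp()  # used in the actor/critic losses; still auto-tuned from here on
```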
This PR introduces a recurrent SAC-discrete algorithm for POMDPs with discrete action spaces.
The code is heavily based on the open-sourced SAC-discrete implementation https://github.com/ku2482/sac-discrete.pytorch/blob/master/sacd/agent/sacd.py and the SAC-discrete paper https://arxiv.org/abs/1910.07207.
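As background, a minimal sketch of what a recurrent discrete actor can look like, i.e. a GRU that encodes the observation history before a categorical policy head. This is a generic illustration under those assumptions, not the exact architecture used in this PR:

```python
import torch
import torch.nn as nn

class RecurrentCategoricalActor(nn.Module):
    """Categorical policy conditioned on the observation history via a GRU."""

    def __init__(self, obs_dim: int, num_actions: int, hidden_size: int = 128):
        super().__init__()
        self.encoder = nn.GRU(obs_dim, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_actions)

    def forward(self, obs_seq: torch.Tensor, hidden: torch.Tensor = None):
        # obs_seq: (batch, seq_len, obs_dim); hidden carries the summary of past
        # observations (the agent's internal state) across steps or chunks.
        features, hidden = self.encoder(obs_seq, hidden)
        logits = self.head(features)  # (batch, seq_len, num_actions)
        dist = torch.distributions.Categorical(logits=logits)
        return dist, hidden
```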
We provide two sanity checks on classic gym discrete control environments: CartPole-v0 and LunarLander-v2. The PR lists commands for running both the Markovian and recurrent SAC-discrete algorithms, where target_entropy sets the ratio of the target entropy: ratio * log(|A|).

Results (learning-curve plots are attached in the original PR): on CartPole-v0, Markovian SAC-discrete is sensitive to target_entropy but can solve the task with the max return of 200, while recurrent SAC-discrete can solve the task with max return 200 within 10 episodes. On LunarLander-v2, Markovian SAC-discrete can solve the task with return over 200, and recurrent SAC-discrete can nearly solve the task at one target_entropy value.
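For reference, a minimal sketch of how a target-entropy ratio is typically turned into an actual target entropy and an alpha loss in SAC-discrete implementations such as the one this PR builds on (function and variable names are illustrative; the exact code in this PR may differ):

```python
import math
import torch

def make_target_entropy(num_actions: int, ratio: float) -> float:
    # The maximum entropy of a discrete policy over |A| actions is log(|A|),
    # so the target entropy is ratio * log(|A|) with ratio in (0, 1].
    return ratio * math.log(num_actions)

def alpha_loss(log_alpha: torch.Tensor,
               action_probs: torch.Tensor,      # (batch, |A|) policy probabilities
               log_action_probs: torch.Tensor,  # (batch, |A|) matching log-probabilities
               target_entropy: float) -> torch.Tensor:
    # Policy entropy per state: -sum_a pi(a|s) * log pi(a|s).
    entropy = -(action_probs * log_action_probs).sum(dim=-1)
    # Gradient descent on this loss raises alpha when entropy < target
    # and lowers alpha when entropy > target.
    return (log_alpha * (entropy - target_entropy).detach()).mean()
```

For example, with |A| = 2 (CartPole) and a ratio of 0.7 (an arbitrary illustrative value), the target entropy is 0.7 * ln(2) ≈ 0.49 nats.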