Pixel observation with recurrent SAC-Discrete #2

Closed
wants to merge 32 commits

Commits (32)
72d6462
introduce recurrent sac-discrete
twni2016 Mar 1, 2022
2a07e8c
add readme
twni2016 Mar 1, 2022
1f7940d
black format
twni2016 Mar 1, 2022
385be91
fix potential bug in sac-discrete
twni2016 Mar 1, 2022
43bb3bb
discount 0.99 is important to sacd in cartpole; introduce lunarlander
twni2016 Mar 1, 2022
27ae0fc
minor
twni2016 Mar 1, 2022
06ded95
introduce pixel obs POMDP env as sanity check
twni2016 Mar 2, 2022
a35b22e
black
twni2016 Mar 2, 2022
9eb2d79
update config for catch-40
twni2016 Mar 3, 2022
1197e3c
fix error in env reward, make `o` and 0.25 default
twni2016 Mar 4, 2022
47ea711
MINOR
twni2016 Mar 8, 2022
d2b0f39
support tuning `entropy_alpha`
twni2016 Mar 8, 2022
89e195c
merge
twni2016 Mar 8, 2022
5d9b3ff
Merge branch 'master' into sac-discrete
twni2016 Mar 8, 2022
07e0976
Merge remote-tracking branch 'origin/sac-discrete' into pixel-obs
twni2016 Mar 8, 2022
1369994
fix error
twni2016 Mar 9, 2022
f568031
Merge remote-tracking branch 'origin/main' into pixel-obs
twni2016 Mar 12, 2022
470f4eb
Merge remote-tracking branch 'origin/main' into pixel-obs
twni2016 Mar 12, 2022
e20c1fb
Merge remote-tracking branch 'origin/main' into pixel-obs
twni2016 Mar 12, 2022
7dab076
fix minor bug
twni2016 Mar 13, 2022
e9b05d6
add key2door env
twni2016 Mar 19, 2022
e265f4a
Merge branch 'main' of https://github.com/twni2016/pomdp-baselines in…
twni2016 Mar 19, 2022
fbd306c
refactor the gym wrapper
twni2016 Mar 20, 2022
8bb2ffb
runnable
twni2016 Mar 20, 2022
11dc70a
fix metric bug and reformat
twni2016 Mar 20, 2022
e0a7231
update env.yml
twni2016 Mar 20, 2022
5a427e4
update plot script
twni2016 Apr 8, 2022
4b966a0
minor
twni2016 Apr 9, 2022
946fcae
add key2door low/high variance
twni2016 Apr 21, 2022
d19252b
fix discrepancy in `max_frames` in keytodoor
twni2016 Apr 21, 2022
d529d15
add eval scripts
twni2016 Apr 30, 2022
b8196e7
move to a separate dir
twni2016 Apr 30, 2022
2 changes: 2 additions & 0 deletions .gitignore
@@ -15,3 +15,5 @@ scripts/tmp_configs/

# singularity
*.sif

third_party/
55 changes: 55 additions & 0 deletions configs/credit_assign/catch/rnn.yml
@@ -0,0 +1,55 @@
seed: 73
cuda: 0 # use_gpu
# RAM: ~10G
env:
env_type: pomdp
env_name: Catch-40-v0

num_eval_tasks: 20 # num of eval episodes

train:
# 20000*(7*n) = 5M steps
num_iters: 20000 # number of meta-training iterations
num_init_rollouts_pool: 5 # before training
num_rollouts_per_iter: 1

num_updates_per_iter: 0.25 # 1.0

# buffer params
buffer_type: seq_efficient
buffer_size: 1e6
batch_size: 32 # to tune based on sampled_seq_len
sampled_seq_len: -1 # -1 is all, or positive integer
sample_weight_baseline: 0.0

eval:
eval_stochastic: false # also eval stochastic policy
log_interval: 50 # num of iters
save_interval: -1
log_tensorboard: true

policy:
separate: True
arch: lstm # [lstm, gru]
algo: sacd # only support sac-discrete

action_embedding_size: 0 # no need for catch
state_embedding_size: 0 # use image encoder instead
image_encoder:
from_flattened: True

reward_embedding_size: 0
rnn_hidden_size: 128

dqn_layers: [128, 128]
policy_layers: [128, 128]
lr: 0.0003
gamma: 0.99
tau: 0.005

# sacd alpha
entropy_alpha: 0.1
automatic_entropy_tuning: False
target_entropy: None # the ratio: target_entropy = ratio * log(|A|)
alpha_lr: 0.0003

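For reference, the `target_entropy` comment above follows the SAC-Discrete convention of scaling the maximum policy entropy, log|A|, by a ratio. Below is a minimal PyTorch sketch of that computation and the usual temperature update; the names (`num_actions`, `ratio`, `log_alpha`) are illustrative assumptions, not this repo's actual identifiers:

```python
import math
import torch

# Illustrative values, not taken from the configs above.
num_actions = 3   # size of a small discrete action space
ratio = 0.98      # the "ratio" the config comment refers to

# target_entropy = ratio * log(|A|), per the config comment.
target_entropy = ratio * math.log(num_actions)

# With automatic_entropy_tuning, alpha is typically learned via a
# log-parameterization so that it stays positive.
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)  # cf. alpha_lr above

def alpha_loss(action_probs: torch.Tensor) -> torch.Tensor:
    # Entropy of the current discrete policy, averaged over the batch.
    log_probs = action_probs.clamp_min(1e-8).log()
    entropy = -(action_probs * log_probs).sum(dim=-1).mean()
    # Detached target gap: alpha rises when entropy falls below target.
    return log_alpha * (entropy - target_entropy).detach()

# One update step on a dummy batch of action distributions:
probs = torch.softmax(torch.randn(32, num_actions), dim=-1)
alpha_optim.zero_grad()
alpha_loss(probs).backward()
alpha_optim.step()
```

With `automatic_entropy_tuning: False`, as in these configs, this machinery is skipped and the fixed `entropy_alpha: 0.1` is used instead.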
56 changes: 56 additions & 0 deletions configs/credit_assign/keytodoor/HighVar/rnn.yml
@@ -0,0 +1,56 @@
seed: 73
cuda: 0 # use_gpu
# RAM: ~10G
env:
env_type: pomdp
env_name: KeytoDoor-HighVar-v0 # KeytoDoor-HighVar5-v0

num_eval_tasks: 20 # num of eval episodes

train:
# 200000*60 = 12M steps
num_iters: 200000 # number of meta-training iterations
num_init_rollouts_pool: 5 # before training
num_rollouts_per_iter: 1

num_updates_per_iter: 0.25 # 1.0

# buffer params
buffer_type: seq_efficient
buffer_size: 1e6
batch_size: 32 # to tune based on sampled_seq_len
sampled_seq_len: -1 # -1 is all, or positive integer
sample_weight_baseline: 0.0

eval:
eval_stochastic: false # also eval stochastic policy
log_interval: 50 # num of iters
save_interval: -1
log_tensorboard: true

policy:
separate: True
arch: lstm # [lstm, gru]
algo: sacd # only support sac-discrete

action_embedding_size: 0 # not needed for this env
state_embedding_size: 0 # use image encoder instead
image_encoder:
from_flattened: True
normalize_pixel: True

reward_embedding_size: 0
rnn_hidden_size: 128

dqn_layers: [128, 128]
policy_layers: [128, 128]
lr: 0.0003
gamma: 0.99
tau: 0.005

# sacd alpha
entropy_alpha: 0.1
automatic_entropy_tuning: False
target_entropy: None # the ratio: target_entropy = ratio * log(|A|)
alpha_lr: 0.0003

56 changes: 56 additions & 0 deletions configs/credit_assign/keytodoor/LowVar/rnn.yml
@@ -0,0 +1,56 @@
seed: 73
cuda: 0 # use_gpu
# RAM: ~10G
env:
env_type: pomdp
env_name: KeytoDoor-LowVar-v0 # KeytoDoor-LowVar5-v0

num_eval_tasks: 20 # num of eval episodes

train:
# 150000*60 = 9M steps
num_iters: 150000 # number of meta-training iterations
num_init_rollouts_pool: 5 # before training
num_rollouts_per_iter: 1

num_updates_per_iter: 0.25 # 1.0

# buffer params
buffer_type: seq_efficient
buffer_size: 1e6
batch_size: 32 # to tune based on sampled_seq_len
sampled_seq_len: -1 # -1 is all, or positive integer
sample_weight_baseline: 0.0

eval:
eval_stochastic: false # also eval stochastic policy
log_interval: 50 # num of iters
save_interval: -1
log_tensorboard: true

policy:
separate: True
arch: lstm # [lstm, gru]
algo: sacd # only support sac-discrete

action_embedding_size: 0 # not needed for this env
state_embedding_size: 0 # use image encoder instead
image_encoder:
from_flattened: True
normalize_pixel: True

reward_embedding_size: 0
rnn_hidden_size: 128

dqn_layers: [128, 128]
policy_layers: [128, 128]
lr: 0.0003
gamma: 0.99
tau: 0.005

# sacd alpha
entropy_alpha: 0.1
automatic_entropy_tuning: False
target_entropy: None # the ratio: target_entropy = ratio * log(|A|)
alpha_lr: 0.0003

56 changes: 56 additions & 0 deletions configs/credit_assign/keytodoor/SR/rnn.yml
@@ -0,0 +1,56 @@
seed: 73
cuda: 0 # use_gpu
# RAM: ~10G
env:
env_type: pomdp
env_name: KeytoDoor-SR-v0

num_eval_tasks: 20 # num of eval episodes

train:
# 100000*90 = 9M steps
num_iters: 100000 # number of meta-training iterations
num_init_rollouts_pool: 5 # before training
num_rollouts_per_iter: 1

num_updates_per_iter: 0.25 # 1.0

# buffer params
buffer_type: seq_efficient
buffer_size: 1e6
batch_size: 32 # to tune based on sampled_seq_len
sampled_seq_len: -1 # -1 is all, or positive integer
sample_weight_baseline: 0.0

eval:
eval_stochastic: false # also eval stochastic policy
log_interval: 50 # num of iters
save_interval: -1
log_tensorboard: true

policy:
separate: True
arch: lstm # [lstm, gru]
algo: sacd # only support sac-discrete

action_embedding_size: 0 # not needed for this env
state_embedding_size: 0 # use image encoder instead
image_encoder:
from_flattened: True
normalize_pixel: True

reward_embedding_size: 0
rnn_hidden_size: 128

dqn_layers: [128, 128]
policy_layers: [128, 128]
lr: 0.0003
gamma: 0.99
tau: 0.005

# sacd alpha
entropy_alpha: 0.1
automatic_entropy_tuning: False
target_entropy: None # the ratio: target_entropy = ratio * log(|A|)
alpha_lr: 0.0003

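All four configs set a fractional `num_updates_per_iter: 0.25`. A short sketch of one plausible reading, under the assumption that gradient updates are scheduled in proportion to environment steps collected per iteration; only `yaml.safe_load` is a real API here, while `env_steps_this_iter` and `num_updates` are hypothetical names, not this repo's interface:

```python
import yaml

# Load one of the configs added in this PR.
with open("configs/credit_assign/keytodoor/SR/rnn.yml") as f:
    cfg = yaml.safe_load(f)

update_ratio = cfg["train"]["num_updates_per_iter"]  # 0.25 in these configs

def num_updates(env_steps_this_iter: int, ratio: float) -> int:
    # A fractional ratio means fewer gradient updates than env steps:
    # 0.25 -> roughly one update per four collected steps.
    return int(ratio * env_steps_this_iter)

# e.g., per the "100000*90 = 9M steps" comment, one iteration of this
# env collects ~90 steps, giving ~22 updates at ratio 0.25.
print(num_updates(90, update_ratio))
```

The `# 1.0` alternative noted in the config comments would correspond to roughly one update per collected step.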
11 changes: 9 additions & 2 deletions environments.yml
@@ -125,7 +125,6 @@ dependencies:
- pcre=8.44=he6710b0_0
- pexpect=4.8.0=pyhd3eb1b0_3
- pickleshare=0.7.5=pyhd3eb1b0_1003
- pip=20.3.3=py38h06a4308_0
- prometheus_client=0.9.0=pyhd3eb1b0_0
- prompt-toolkit=3.0.8=py_0
- ptyprocess=0.7.0=pyhd3eb1b0_2
@@ -149,7 +148,6 @@
- scipy=1.6.0=py38h91f5cce_0
- seaborn=0.11.1=pyhd3eb1b0_0
- send2trash=1.5.0=pyhd3eb1b0_1
- setuptools=52.0.0=py38h06a4308_0
- sip=4.19.13=py38he6710b0_0
- six=1.15.0=py38h06a4308_0
- sqlite=3.33.0=h62c20be_0
@@ -222,9 +220,11 @@ dependencies:
- importlib-resources==5.4.0
- ipdb==0.13.4
- jsonpickle==0.9.6
- keras==2.8.0
- keras-nightly==2.5.0.dev2021032900
- keras-preprocessing==1.1.2
- labmaze==1.0.3
- libclang==13.0.0
- lockfile==0.12.2
- lxml==4.6.2
- markdown==3.3.3
@@ -241,13 +241,16 @@
- pathspec==0.9.0
- patsy==0.5.2
- pillow==7.2.0
- pip==22.0.4
- platformdirs==2.4.0
- pot==0.8.1.0
- protobuf==3.19.4
- psutil==5.8.0
- py-cpuinfo==8.0.0
- pyasn1==0.4.8
- pyasn1-modules==0.2.8
- pybullet==3.1.0
- pycolab==1.2
- pyglet==1.5.0
- pyopengl==3.1.5
- pywavelets==1.1.1
@@ -259,14 +262,18 @@
- sacred==0.7.4
- sacremoses==0.0.45
- scikit-image==0.18.1
- setuptools==60.10.0
- statsmodels==0.13.2
- tables==3.6.1
- tabulate==0.8.9
- tensorboard==2.8.0
- tensorboard-data-server==0.6.1
- tensorboard-plugin-wit==1.8.0
- tensorboardx==1.8
- tensorflow==2.8.0
- tensorflow-io-gcs-filesystem==0.24.0
- termcolor==1.1.0
- tf-estimator-nightly==2.8.0.dev2021122109
- tifffile==2021.2.26
- tokenizers==0.10.2
- tomli==1.2.1