Traditionally, we learn a policy and determine an action at every time-step. However, in many cases it is also viable to simply repeat an action for multiple time-steps rather than computing a new action each time. This repeat factor is usually tuned manually and kept constant. We hypothesize that keeping it constant may not be ideal, as an environment can contain scenarios that need fine-grained control as well as scenarios where larger-step control is sufficient. For example, in LunarLander we may need fine-grained control when close to the ground and attempting to land, whereas high above the surface a large action-repeat may be feasible.
In this work, we learn a policy that outputs an action as well as the number of time-steps for which this action should be repeated. This gives the policy the ability to exercise both large-step and fine-step control. We also hypothesize that learning to repeat an action may lead to better sample efficiency. Our work uses TD3 as the core learning algorithm and extends it to maintain a Q-value for each action-repeat; we therefore call it Variable-TD3.
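To make this concrete, here is a minimal sketch of the per-repeat Q-value idea, assuming PyTorch (this is not the repository's actual implementation): the critic outputs one Q-value for every candidate repeat length, and at interaction time the agent holds its action for the repeat length with the highest Q-value. `VariableCritic`, `select_repeat`, and `REPEAT_CHOICES` are illustrative names, not names from this codebase.

```python
import torch
import torch.nn as nn

REPEAT_CHOICES = [1, 2, 4, 8]  # hypothetical candidate action-repeat lengths

class VariableCritic(nn.Module):
    """TD3-style critic extended to output one Q-value per action-repeat."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, len(REPEAT_CHOICES)),  # Q(s, a, k) for each k
        )

    def forward(self, state, action):
        # -> (batch, len(REPEAT_CHOICES)): one Q-value per repeat length
        return self.net(torch.cat([state, action], dim=-1))

def select_repeat(critic, state, action):
    """Pick the repeat length with the highest Q-value; the chosen action
    is then held in the environment for that many time-steps."""
    with torch.no_grad():
        q = critic(state, action)
    return REPEAT_CHOICES[q.argmax(dim=-1).item()]
```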
Status:
- Work has been halted.
- Code is provided as-is; no major updates are expected.
- This work is similar to the paper "Learning to Repeat: Fine Grained Action Repetition for Deep Reinforcement Learning".
Installation:
1. Install conda.
2. For classic tasks, do the following:
```
conda env create -f env.yml   # creates env with name "vtd3"
```
3. Optional: for gym mujoco environments (requires mjpro150 and a mujoco license):
```
conda activate vtd3           # activates the conda env created in (2)
pip install 'gym[mujoco]'
```
4. Optional: for dm_control, create a separate conda env with mujoco 2.0 (requires mujoco200 and a mujoco license):
```
conda env create -f env_mj2.yml   # creates env with name "vtd3_mj2"
```
Having trouble during installation? Please refer here.
Usage:
```
$ conda activate <env_name>
```
Train:
```
$ python main.py --case classic_control --env Pendulum-v0 --opr train
```
Test:
```
$ python main.py --case classic_control --env Pendulum-v0 --opr test
```
| Required Arguments | Description |
| --- | --- |
| `--case {classic_control,box2d,mujoco,dm_control}` | Used for switching between different domains (and configs) |
| `--env` | Name of the environment |
| `--opr {train,test}` | Selects the operation to be performed |

Environments corresponding to each case:
- classic_control: {Pendulum-v0, MountainCarContinuous-v0}
- box2d: {LunarLanderContinuous-v2, BipedalWalker-v3, BipedalWalkerHardcore-v3}
- mujoco: (refer here)
- dm_control: (refer here)
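For example, to train on one of the box2d environments listed above:
```
$ python main.py --case box2d --env LunarLanderContinuous-v2 --opr train
```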
Visualize Results:
```
tensorboard --logdir=./results
```
Summarize plots in Plotly:
```
$ cd scripts
$ python summary_graphs.py --logdir=../results/classic_control --opr extract_summary
$ python summary_graphs.py --logdir=../results/classic_control --opr plot
```