Feature/save agents #185

cpnota · 2020-11-20T16:50:10Z

It's here! This PR adds the ability to save/load agents, addressing #161.

There are a few keys to the design. First of all, rather than having agent.act(state) and agent.eval(state), agents were split into a training-mode Agent and a TestAgent. Both types of agents can be instantiated by a Preset:

agent = preset.agent()
test_agent = preset.test_agent()

The Preset is a serializable object containing the hyperparameters and all necessary torch models. The TestAgent inherits a copy of the model trained by the Agent, allowing TestAgents from different points in training to be stored.
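To make the snapshot behaviour concrete, here is a minimal sketch in plain Python. Every class name in it (ToyModel, ToyAgent, ToyTestAgent, ToyPreset) is a hypothetical stand-in rather than the library's actual implementation, and a list of floats takes the place of a torch model. It assumes the training agent shares the preset's model while each test agent receives a deep copy.

```python
import copy


class ToyModel:
    """Hypothetical stand-in for a torch model: holds mutable weights."""

    def __init__(self, weights):
        self.weights = list(weights)


class ToyAgent:
    """Training-mode agent: updates the model it is given."""

    def __init__(self, model):
        self.model = model

    def train_step(self):
        # Pretend "training" just shifts every weight by one.
        self.model.weights = [w + 1.0 for w in self.model.weights]


class ToyTestAgent:
    """Test-mode agent: only reads from its own copy of the model."""

    def __init__(self, model):
        self.model = model

    def act(self):
        return sum(self.model.weights)


class ToyPreset:
    """Holds hyperparameters and the model; builds both kinds of agent."""

    def __init__(self, hyperparameters, model):
        self.hyperparameters = hyperparameters
        self.model = model

    def agent(self):
        # Assumption: the training agent shares the preset's model.
        return ToyAgent(self.model)

    def test_agent(self):
        # Deep copy, so later training does not change this snapshot.
        return ToyTestAgent(copy.deepcopy(self.model))


preset = ToyPreset({"lr": 1e-3}, ToyModel([0.0, 0.0]))
agent = preset.agent()

agent.train_step()
early = preset.test_agent()  # snapshot after one training step

agent.train_step()
late = preset.test_agent()   # snapshot after two training steps

print(early.act(), late.act())  # prints "2.0 4.0"
```

Because each test agent owns its copy, the early snapshot keeps reporting the weights from the point at which it was created, even though training continued afterwards.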

The second major key to the design is that the Preset, rather than the Agent itself, is saved:

preset.save(filename)
preset = torch.load(filename)

This is important because the underlying Agent objects are often difficult to serialize, and even if they can be serialized they can take up an excessive amount of storage (for example, a standard 1 million frame Atari replay buffer is ~7 GB).
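The ~7 GB figure can be reproduced with a quick back-of-the-envelope calculation. The frame shape and dtype below are assumptions not stated in this PR: 84x84 grayscale frames at one byte per pixel, with each frame stored exactly once.

```python
# Assumptions (not from the PR): 84x84 grayscale frames, 1 byte per pixel,
# and each frame stored once in the buffer.
FRAME_HEIGHT = 84
FRAME_WIDTH = 84
BYTES_PER_PIXEL = 1
NUM_FRAMES = 1_000_000

bytes_per_frame = FRAME_HEIGHT * FRAME_WIDTH * BYTES_PER_PIXEL
total_bytes = bytes_per_frame * NUM_FRAMES
total_gb = total_bytes / 1e9  # decimal gigabytes

print(f"{bytes_per_frame} bytes per frame, {total_gb:.2f} GB total")
# prints "7056 bytes per frame, 7.06 GB total"
```

A buffer that stores frames as floats, or that duplicates frames across stacked observations, would be several times larger under the same assumptions.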

One thing to note is that while this design supports creating a training-mode Agent with a previously trained network, it does not support a full "resume" of training: for example, scheduler states will be reset and replay buffers will be cleared. Full resume functionality introduces many difficulties that interfere with the design of the library. However, we may implement a partial solution in the future.

Example usage can be found below:

# construct the preset
preset = builder().hyperparameters(lr=1e-3).env(some_env).build()

# run agent in train mode
agent = preset.agent()
for i in range(episodes):
    run_episode(agent, some_env)

# run agent in test mode
test_agent = preset.test_agent()
for i in range(test_episodes):
    run_episode(test_agent, some_env)

# save the model for later
preset.save('dqn.pt')

# load model from disk and watch
preset = torch.load('dqn.pt')
test_agent = preset.test_agent()
for i in range(test_episodes):
    run_episode(test_agent, some_env, render=True)

cpnota added 30 commits November 11, 2020 12:41

initial implementation for dqn

26b81ca

update watch code

f24f638

simply usage of dqn builder

3d2877a

add a2c preset

7d0df7e

make approximation optimizer optional

a8755b0

update a2c

c489b98

add preset unit test

ec6d974

add c51 atari preset

23b9069

add train_steps parameter

7d4eb64

add train_step to a2c

b6ff8e0

add train_steps to c51

3515428

change render command line usage

12d18e4

add hyperparameter parser

f213915

add DDQNAtariPreset

84ff465

update ppo

ff149f5

rainbow preset

08f140d

add vac atari preset

3529b63

vpg preset

129c200

add vqn and vsarsa presets

6540520

update integration tests

69687e2

make parallel env experiment test with single env agent

31a89d0

tweak function signature for Preset

2d980ab

try to get docstrings working

37af692

get documentation working properly

4f00bbf

re-add model constructor to a2c preset

687c0a5

separate keyword args

be24222

update all docstrings and re-add model constructors

730725d

start converting cc presets

f6ac16b

update c51 cc preset

0c9e96d

update ddqn classic control preset

c6819de

cpnota added 21 commits November 19, 2020 11:40

update dqn cc preset

de2978f

add classic control preset test

616f633

ppo cc preset

2319314

add rainbow cc preset

37f6b07

add VAC cc preset

3068e5c

add vpg cc preset

dcfe6ad

add vqn cc preset

d540488

add vsarsa cc preset

b3c1ad4

export presets

60931fc

add ddpg preset

44119a2

ppo

9fbe43b

add sac

792c544

fix continuous preset integration tests

c145080

update classic control integration tests

d2f6af2

fix single env experiment test

e99f80b

fix policy tests

88263b4

run autopep8

3cd9df2

deep copy everything

e00ea40

run autopep on integration tests

bea07cb

fix linting

9c83754

update watch scripts

0f5eb7d

cpnota merged commit 8f65a70 into develop Nov 23, 2020

cpnota deleted the feature/save-agents branch November 23, 2020 18:09
