Questions about the imitation learning #248

Closed
2 of 8 tasks
Tortes opened this issue Nov 18, 2020 · 5 comments · Fixed by #263
Labels
question Further information is requested

Comments

Tortes commented Nov 18, 2020

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

I am currently training a new network with no pre-trained weights, but I run into an action saturation problem (the agent only ever takes one action). I therefore plan to train the network on supervised data generated by another optimization algorithm. How can I feed this supervised data into tianshou, or do I have to write a separate training script? Thanks for any help!
By the way, I currently use PPO with the onpolicy trainer. Is there any example for #188, in case it could solve my problem?

@Trinkle23897 (Collaborator)

You can refer to the imitation learning scripts provided in test/discrete/test_a2c_with_il.py and test/continuous/test_sac_with_il.py. Another way is to format your data as a ReplayBuffer (a ReplayBuffer can be stored and loaded with pickle.dump and pickle.load) and use it as the initial data; this way you can collect data offline and train your agent with any data you want.
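For concreteness, here is a minimal sketch of the second approach (an illustrative example, not code from the repository): it assumes tianshou ~0.3, where ReplayBuffer.add accepts the keyword arguments shown (the signature changed in later releases), and expert_transitions is a made-up placeholder for data produced by your own optimization algorithm.

```python
import pickle

import numpy as np
from tianshou.data import ReplayBuffer

# Placeholder expert data so the snippet runs; replace with transitions
# generated by your own optimization algorithm.
expert_transitions = [
    (np.random.randn(3), np.random.randn(1), 0.0, False, np.random.randn(3))
    for _ in range(1000)
]

# Store the expert transitions in a ReplayBuffer.
buf = ReplayBuffer(size=1000)
for obs, act, rew, done, obs_next in expert_transitions:
    buf.add(obs=obs, act=act, rew=rew, done=done, obs_next=obs_next)

# Persist the buffer to disk ...
with open("expert_buffer.pkl", "wb") as f:
    pickle.dump(buf, f)

# ... and load it back later (e.g. in a separate training script) as the initial data.
with open("expert_buffer.pkl", "rb") as f:
    expert_buf = pickle.load(f)
```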

Trinkle23897 added the question (Further information is requested) label on Nov 18, 2020
zhujl1991 (Contributor) commented Dec 14, 2020

You can refer to the imitation learning scripts provided in test/discrete/test_a2c_with_il.py and test/continuous/test_sac_with_il.py. Another way is to format your data as a ReplayBuffer (a ReplayBuffer can be stored and loaded with pickle.dump and pickle.load) and use it as the initial data; this way you can collect data offline and train your agent with any data you want.

If I understand correctly, test/discrete/test_a2c_with_il.py and test/continuous/test_sac_with_il.py actually use the data collected by A2C/SAC to train the imitation learner, right? How can I disable the online collection logic so that only offline-collected data is used to train the imitation model?

I think it is confusing to train offline algorithms with the onpolicy/offpolicy trainers. IMO, there should be a separate offline trainer.

@Trinkle23897 (Collaborator)

You can use a buffer of already collected data, together with pickle.dump and pickle.load.

@zhujl1991 (Contributor)

pickle.load

But during training it still has to interact with the environment via the collector, which is unnecessary for offline algorithms, right?
Do you think it makes sense to have a separate offline trainer for all offline algorithms, e.g. imitation learning and BCQ?

Trinkle23897 (Collaborator) commented Dec 15, 2020

That's a good point. I didn't implement this kind of trainer at first because DAgger is known to outperform plain behavior cloning (BC).
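Until such an offline trainer lands (this issue is marked as fixed by #263), one hedged workaround is to skip the collector entirely and update an ImitationPolicy straight from a pre-filled buffer. The sketch below assumes tianshou ~0.3, where ImitationPolicy(model, optim, mode=...) and policy.update(sample_size, buffer) are available; MLPActor, the dimensions, and expert_buffer.pkl are made-up placeholders.

```python
import pickle

import torch
import torch.nn as nn
from tianshou.policy import ImitationPolicy


class MLPActor(nn.Module):
    """Tiny deterministic actor following tianshou's (output, state) convention."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, act_dim)
        )

    def forward(self, obs, state=None, info={}):
        obs = torch.as_tensor(obs, dtype=torch.float32)
        return self.net(obs), state


# Load the pickled expert buffer (see the earlier snippet).
with open("expert_buffer.pkl", "rb") as f:
    expert_buf = pickle.load(f)

actor = MLPActor(obs_dim=3, act_dim=1)
optim = torch.optim.Adam(actor.parameters(), lr=1e-3)
policy = ImitationPolicy(actor, optim, mode="continuous")

# Pure offline loop: each update samples a batch from the buffer and takes
# one behavior-cloning gradient step; no environment or collector involved.
for step in range(1000):
    losses = policy.update(64, expert_buf)
```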
