Questions about the imitation learning #248

Closed
2 of 8 tasks
Tortes opened this issue Nov 18, 2020 · 5 comments · Fixed by #263
Labels
question Further information is requested

Comments

Tortes commented Nov 18, 2020

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, sys
    print(tianshou.__version__, torch.__version__, sys.version, sys.platform)

I am currently training a new network with no pre-trained weights, but I run into an action saturation problem (the agent only ever takes one action). I therefore plan to train the network on supervised data generated by another optimization algorithm. How can I feed this supervised data into tianshou, or do I have to write a separate training script? Thanks for any help!
By the way, I currently use PPO with the onpolicy trainer. Is there any example for #188, in case it could solve my problem?

@Trinkle23897 (Collaborator)

You can refer to the imitation learning scripts provided in test/discrete/test_a2c_with_il.py and test/continuous/test_sac_with_il.py. Another way is to format your data as a ReplayBuffer (a ReplayBuffer can be stored and loaded with pickle.dump and pickle.load) and use it as the initial data; this way you can collect data offline and train your agent with any data you want.
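For concreteness, here is a minimal sketch of the second approach (an illustrative example, not code from the repository): it assumes tianshou ~0.3, where ReplayBuffer.add accepts the keyword arguments shown (the signature changed in later releases), and expert_transitions is a made-up placeholder for data produced by your own optimization algorithm.

```python
import pickle

import numpy as np
from tianshou.data import ReplayBuffer

# Placeholder expert data so the snippet runs; replace with transitions
# generated by your own optimization algorithm.
expert_transitions = [
    (np.random.randn(3), np.random.randn(1), 0.0, False, np.random.randn(3))
    for _ in range(1000)
]

# Store the expert transitions in a ReplayBuffer.
buf = ReplayBuffer(size=1000)
for obs, act, rew, done, obs_next in expert_transitions:
    buf.add(obs=obs, act=act, rew=rew, done=done, obs_next=obs_next)

# Persist the buffer to disk ...
with open("expert_buffer.pkl", "wb") as f:
    pickle.dump(buf, f)

# ... and load it back later (e.g. in a separate training script) as the initial data.
with open("expert_buffer.pkl", "rb") as f:
    expert_buf = pickle.load(f)
```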

Trinkle23897 added the question (Further information is requested) label on Nov 18, 2020
zhujl1991 (Contributor) commented Dec 14, 2020

You can refer to the imitation learning scripts provided in test/discrete/test_a2c_with_il.py and test/continuous/test_sac_with_il.py. Another way is to format your data as a ReplayBuffer (a ReplayBuffer can be stored and loaded with pickle.dump and pickle.load) and use it as the initial data; this way you can collect data offline and train your agent with any data you want.

If I understand correctly, test/discrete/test_a2c_with_il.py and test/continuous/test_sac_with_il.py actually use the data collected by A2C/SAC to train the imitation learner, right? How can I disable the online collection logic so that only offline-collected data is used to train the imitation model?

I think it is confusing to train offline algorithms with the onpolicy/offpolicy trainers. IMO, there should be a separate offline trainer.

@Trinkle23897 (Collaborator)

You can use a buffer of already collected data, together with pickle.dump and pickle.load.

@zhujl1991 (Contributor)

pickle.load

But during training it still has to interact with the environment via the collector, which is unnecessary for offline algorithms, right?
Do you think it makes sense to have a separate offline trainer for all offline algorithms, e.g. imitation learning and BCQ?

Trinkle23897 (Collaborator) commented Dec 15, 2020

That's a good point. I didn't implement this kind of trainer at first because DAgger is known to outperform plain behavior cloning (BC).
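Until such an offline trainer lands (this issue is marked as fixed by #263), one hedged workaround is to skip the collector entirely and update an ImitationPolicy straight from a pre-filled buffer. The sketch below assumes tianshou ~0.3, where ImitationPolicy(model, optim, mode=...) and policy.update(sample_size, buffer) are available; MLPActor, the dimensions, and expert_buffer.pkl are made-up placeholders.

```python
import pickle

import torch
import torch.nn as nn
from tianshou.policy import ImitationPolicy


class MLPActor(nn.Module):
    """Tiny deterministic actor following tianshou's (output, state) convention."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, act_dim)
        )

    def forward(self, obs, state=None, info={}):
        obs = torch.as_tensor(obs, dtype=torch.float32)
        return self.net(obs), state


# Load the pickled expert buffer (see the earlier snippet).
with open("expert_buffer.pkl", "rb") as f:
    expert_buf = pickle.load(f)

actor = MLPActor(obs_dim=3, act_dim=1)
optim = torch.optim.Adam(actor.parameters(), lr=1e-3)
policy = ImitationPolicy(actor, optim, mode="continuous")

# Pure offline loop: each update samples a batch from the buffer and takes
# one behavior-cloning gradient step; no environment or collector involved.
for step in range(1000):
    losses = policy.update(64, expert_buf)
```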
