Questions about imitation learning #248
You can refer to the imitation learning script provided in the repository.
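For illustration, here is a minimal sketch of what a behavior-cloning setup with Tianshou's `ImitationPolicy` might look like. The network, `obs_dim`, and `n_actions` are hypothetical, and the constructor keyword (`mode='discrete'` in older releases) has changed across Tianshou versions, so treat this as a sketch rather than the official recipe:

```python
import torch
import torch.nn as nn
from tianshou.policy import ImitationPolicy

# Hypothetical problem size, for illustration only.
obs_dim, n_actions = 4, 2

class MLP(nn.Module):
    """Simple model following Tianshou's convention of returning (logits, state)."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs, state=None, info={}):
        obs = torch.as_tensor(obs, dtype=torch.float32)
        return self.net(obs), state

model = MLP(obs_dim, n_actions)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
# `mode='discrete'` is the keyword used by older Tianshou releases;
# newer versions may expect different constructor arguments.
policy = ImitationPolicy(model, optim, mode='discrete')
```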
If I understand correctly, it seems confusing to train offline algorithms with the onpolicy/offpolicy trainer. IMO, there should be a separate offline trainer.
You can use the buffer with the collected data, together with …
But during training it still has to interact with the env via a collector, which is unnecessary for offline algorithms, right?
That's a good point. At first I didn't implement this kind of trainer because we always know DAgger is better than BC.
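As a rough illustration of what such an offline-style loop could look like (no collector, no environment interaction), assuming the `policy` from the sketch above and a pre-filled `expert_buffer`; `policy.update(sample_size, buffer)` exists in more recent Tianshou versions, while older ones would sample from the buffer and call `policy.learn` directly:

```python
# Minimal offline-style training loop: sample only from the pre-filled
# buffer, never touch the environment.
num_updates, batch_size = 1000, 64
for step in range(num_updates):
    # In recent Tianshou versions this samples `batch_size` transitions from
    # `expert_buffer` and performs one gradient step. In older versions use
    # `batch, indice = expert_buffer.sample(batch_size)` followed by
    # `policy.learn(batch)` instead.
    result = policy.update(batch_size, expert_buffer)
    if step % 100 == 0:
        print(step, result)
```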
Now I am training a new network with no pre-trained weights, and I am running into an action saturation problem (the agent only takes one action). Therefore, I plan to train the network with supervised data (generated by another optimization algorithm). How can I feed the supervised data into Tianshou, or do I have to write another training script? Thanks for any help!
By the way, I currently use PPO with the onpolicy trainer. Are there any examples related to #188, and could it solve my problem?
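If the supervised data is just (state, action) pairs from the optimization algorithm, one possible way to hand it to Tianshou is to write it into a `ReplayBuffer` and then train from that buffer as in the loop above. This is only a sketch: the file names are placeholders, and the `ReplayBuffer.add` signature differs between Tianshou versions (per-transition keyword arguments in 0.2/0.3, a `Batch` argument in later releases), so adapt it to the installed version:

```python
import numpy as np
from tianshou.data import ReplayBuffer

# Hypothetical arrays produced by the optimization algorithm:
# expert_obs[i] is a state, expert_act[i] the action chosen in that state.
expert_obs = np.load('expert_obs.npy')  # shape (N, obs_dim) -- placeholder path
expert_act = np.load('expert_act.npy')  # shape (N,)         -- placeholder path

expert_buffer = ReplayBuffer(size=len(expert_obs))
for i in range(len(expert_obs)):
    # Behavior cloning only needs (obs, act); reward, done, and obs_next are
    # filled with dummies here.
    expert_buffer.add(
        obs=expert_obs[i],
        act=expert_act[i],
        rew=0.0,
        done=(i == len(expert_obs) - 1),
        obs_next=expert_obs[min(i + 1, len(expert_obs) - 1)],
    )
```

With the buffer filled this way, the imitation policy can be trained purely from the stored data, without routing the expert trajectories through a collector.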