AWAC doesn't profit from offline data #166
Comments
I ran into the same problem. In my case, I checked my results in "pretrain_q.csv" and found that the offline training procedure did not seem to actually happen. I'm looking closely into the source code, and I think the default hyperparameters may need to be altered.
Just wondering, is the general issue that after pretraining the average returns go to zero during the training phase? Or that the model learns nothing during pretraining (i.e. returns are always near 0 during the pretraining phase)?
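To make the two cases above concrete, here is a minimal sketch of how one might tell them apart from logged evaluation returns. Everything in it is an assumption for illustration: the function name `diagnose_returns`, the thresholds `tol` and `drop_frac`, and the idea that the returns are available as two plain lists. None of this is part of the repo's code or its logging format.

```python
from statistics import mean


def diagnose_returns(pretrain_returns, online_returns, tol=0.05, drop_frac=0.5, k=3):
    """Classify a run from average returns logged in each phase (hypothetical helper).

    tol       -- returns within tol * (largest |return| seen) count as "near zero"
    drop_frac -- relative drop at the offline-to-online switch that counts as a dip
    k         -- number of evaluations averaged on each side of the switch
    """
    pretrain_returns = list(pretrain_returns)
    online_returns = list(online_returns)

    # Scale for the "near zero" test; fall back to 1.0 if every return is 0.
    scale = max(abs(r) for r in pretrain_returns + online_returns) or 1.0

    pre_end = mean(pretrain_returns[-k:])      # end of the pretraining phase
    online_start = mean(online_returns[:k])    # start of the online phase

    if abs(pre_end) <= tol * scale:
        # Second case: nothing was learned during pretraining.
        return "flat_pretraining"
    if pre_end - online_start > drop_frac * abs(pre_end):
        # First case: returns collapse once online training starts.
        return "dip_after_pretraining"
    return "no_dip"
```

For example, `diagnose_returns([0, 0, 0, 0], [0, 100, 200])` falls into the "flat pretraining" case, while a run whose returns are high at the end of pretraining and collapse at the start of online training falls into the "dip" case.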
Hi @Winston-Gu, my question is not directly related to the problem discussed here, and I am sorry for that. I'm trying to reproduce the AWAC results and am stuck on creating figures like the ones shown in the AWAC paper. It looks like you may have been able to create similar figures. Could you please help me with that?
Hi,
@anair13, thanks for making the code available. Since you seem to answer AWAC questions frequently, I am mentioning you directly.
In the AWAC paper, the main benefit is that there is no "dip" in performance when switching from offline training to online training. But when I run it on the MuJoCo Gym environments, it doesn't benefit from pretraining on the offline dataset.
I ran the code in the repo
examples/awac/mujoco/awac1.py
with all default settings, and pretraining on offline data doesn't seem to help in these experiments. I found this link in the issues (https://drive.google.com/file/d/1Qy5SYIGNwdeTHAGNjbRfuP5pSiRw8JzJ/view); in that file, too, the learning process doesn't seem to profit much from the offline learning. Do I have to change any hyperparameters? It would be really great if I could reproduce the paper's results.
Looking forward to your reply.
Best.