AWAC doesn't profit from offline data #166

im-Kitsch · 2022-07-14T03:50:34Z

Hi,

@anair13, thanks for making the code available. Since you seem to answer AWAC questions frequently, I'm tagging you directly.

In the AWAC paper, the main benefit is that there is no "dip" in performance when switching from offline training to online training. But when I run it on the MuJoCo Gym environments, it doesn't benefit from pre-training on the offline dataset.

HalfCheetah: it learns nothing; the episode returns are almost always below zero.
Ant: it reaches nearly expert performance after switching from offline to online, but it has a huge dip to nearly zero.
Walker2d: it also has a dip.

I ran the code in the repo, examples/awac/mujoco/awac1.py, with all default settings, and pretraining on offline data doesn't seem to help in these experiments. I found this link in the issues (https://drive.google.com/file/d/1Qy5SYIGNwdeTHAGNjbRfuP5pSiRw8JzJ/view); in that file, the learning process also doesn't seem to profit much from the offline learning.

Do I have to change any hyperparameters? It would be really nice if I could reproduce the paper's results.

Looking forward to your reply.

Best.

Winston-Gu · 2022-07-14T14:38:52Z

I ran into the same problem. In my case, I checked my results in "pretrain_q.csv", and it seems the offline training procedure didn't actually happen. I'm looking closely into the source code, and I think the default hyperparameters may need to be altered.

Winston-Gu · 2022-07-14T16:18:25Z

This is my result for HalfCheetah, as you noted, "it learned nothing".

While the result shown in the paper looks like this:

I noticed that when creating the HalfCheetah-v2 environment, gym raised a warning indicating that HalfCheetah-v2 is outdated. Is there any possibility that some changes in the environment caused this problem?

Roberto09 · 2022-07-19T17:36:38Z

Just wondering, is the general issue that after pretraining the average returns go to zero during the training phase? Or that the model learns nothing during pretraining (i.e. returns are always near 0 during the pretraining phase)?
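
One rough way to tell these two cases apart is to compare the returns just before and just after the switch to online training. The sketch below assumes you already have a list of per-epoch average returns and know the epoch index where online training starts. The function name, `switch_index`, and `window` are illustrative assumptions, not rlkit names.

```python
from statistics import mean


def describe_transition(returns, switch_index, window=5):
    """Compare returns around the offline-to-online switch.

    returns      -- list of per-epoch average returns
    switch_index -- index of the first online epoch
    window       -- number of epochs to look at on each side
    """
    before = returns[max(0, switch_index - window):switch_index]
    after = returns[switch_index:switch_index + window]
    if not before or not after:
        raise ValueError("need at least one value on each side of switch_index")
    pre_mean = mean(before)   # performance at the end of pretraining
    post_min = min(after)     # worst performance right after the switch
    return {"pre_mean": pre_mean, "post_min": post_min, "dip": pre_mean - post_min}


# Made-up numbers for illustration only.
print(describe_transition([100, 100, 100, 10, 50, 90], switch_index=3, window=3))
```

A `pre_mean` near zero would suggest nothing was learned during pretraining, while a high `pre_mean` with a large `dip` would suggest the policy degraded once online training began.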

linhlpv · 2023-04-24T06:43:53Z

Hi @Winston-Gu, my question is not directly related to the problem discussed here, and I'm sorry for that. I'm trying to reproduce the AWAC results and am stuck on creating figures like the ones shown in the AWAC paper. It looks like you were able to create similar figures. Could you please help me with that?
Thank you so much, and have a nice day.
