
Question about the effect of diffusion-model timesteps across different D4RL datasets #12

Open
thu-yao-01-luo opened this issue Sep 13, 2023 · 1 comment


@thu-yao-01-luo

[screenshot: the paper's description of the effect of diffusion timesteps]
The above is the description of the effect of diffusion timesteps in your Diffusion Q-Learning method. However, I have several concerns:

  • On closer inspection, in halfcheetah and walker2d, n=2 has the best performance by a small margin, so a clearer view of the performance around convergence should be shown here.
  • The experiment is run on the medium-expert D4RL datasets. However, as we know, the medium-expert dataset includes the expert data, so it is natural that more diffusion steps do BC better; the plot only shows that different timesteps converge to roughly the same performance, which does not demonstrate an advantage for higher diffusion steps. But shouldn't "higher steps, higher performance" be the core story of this method?
  • I ran your experiment on the medium datasets with different timesteps, T = 1, 4, 8, 16, but the results seem very poor (screenshots below; see the sketch after them for exactly what I mean by timesteps). Have you run similar experiments, and what do your results look like? Thank you!
    [screenshots: results on the medium datasets for T = 1, 4, 8, 16]
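
For concreteness, here is a minimal sketch of the DDPM-style reverse sampling loop I mean by "timesteps" T. All names below are mine for illustration and are not taken from your repository:

```python
import torch

def sample_action(policy_net, state, T, betas):
    """DDPM-style reverse process: start from Gaussian noise and denoise
    for T steps, conditioning on the state. Illustrative sketch only."""
    alphas = 1.0 - betas                           # betas: (T,) noise schedule tensor
    alpha_bars = torch.cumprod(alphas, dim=0)
    batch = state.shape[0]
    a = torch.randn(batch, policy_net.action_dim)  # a_T ~ N(0, I); action_dim assumed
    for t in reversed(range(T)):
        ts = torch.full((batch,), t, dtype=torch.long)
        eps = policy_net(a, state, ts)             # network predicts the added noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (a - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(a) if t > 0 else torch.zeros_like(a)
        a = mean + torch.sqrt(betas[t]) * noise    # using sigma_t = sqrt(beta_t)
    return a.clamp(-1.0, 1.0)                      # MuJoCo actions are bounded
```

Each sampled action costs T forward passes through the network, so T controls both the fidelity of the action distribution and the compute per sample.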
@Zhendong-Wang
Owner

Hi there, thanks for your interest in our paper. "Higher steps, higher performance" is not the core story of this method. More steps help more with BC, but offline RL is not BC: there is a trade-off between BC and policy optimization. More steps also bring a larger computational cost in our case, so we use T=5 in all cases. I remember trying higher step counts, such as T=50 and T=100, but they did not help much in improving performance on the medium-expert datasets.
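
To make the trade-off concrete: the policy objective couples a diffusion BC term with Q maximization. Roughly, as a schematic sketch with illustrative names (not the exact code in this repository):

```python
import torch
import torch.nn.functional as F

def policy_loss(policy_net, q_net, states, actions, betas, eta, T):
    """Schematic Diffusion-QL policy objective: L = L_bc + eta * L_q.
    Names are illustrative; the actual implementation differs in details."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    batch = states.shape[0]

    # Diffusion BC term: standard DDPM noise-prediction loss on dataset
    # actions, at a uniformly sampled diffusion step t.
    t = torch.randint(0, T, (batch,))
    noise = torch.randn_like(actions)
    ab = alpha_bars[t].unsqueeze(-1)                        # (batch, 1)
    noisy_actions = ab.sqrt() * actions + (1.0 - ab).sqrt() * noise
    bc_loss = F.mse_loss(policy_net(noisy_actions, states, t), noise)

    # Q term: push actions sampled from the current diffusion policy toward
    # high value; gradients flow back through the whole reverse chain.
    sampled = sample_action(policy_net, states, T, betas)   # e.g. the sketch above
    q_loss = -q_net(states, sampled).mean()

    # Larger T tightens the BC fit but makes the sampled-action graph T
    # network calls deep, so both terms get costlier to optimize.
    return bc_loss + eta * q_loss
```

In the paper the Q term's weight is further normalized by the Q scale; the point here is that raising T mainly strengthens the BC side while multiplying the cost of every policy sample, which is why we settled on T=5.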
