The above describes the effect of the diffusion model's timesteps in your Diffusion Q-Learning method. However, I have several concerns.
On closer inspection, in HalfCheetah and Walker2d, N=2 has the best performance by a small margin, and I think a clearer result of the performance around the convergence stage should be shown here.
The experiment is run on the medium-expert environments of the D4RL benchmark. However, as we know, the medium-expert dataset includes the expert dataset, so it is natural that more diffusion steps do BC better. It has also been shown that the results for different timesteps converge to roughly the same performance, which does not demonstrate an advantage for higher diffusion steps. But shouldn't "higher steps, higher performance" be the core story of this method?
I ran your experiment on the medium dataset with different timesteps, T=1, 4, 8, 16. However, the results seem very poor. I wonder whether you have run similar experiments and what your results look like. Thank you!
Hi there, thanks for your interest in our paper. "Higher steps, higher performance" is not the core story of this method. Higher steps help more with BC, but offline RL is not BC: there is a trade-off between BC and policy optimization. Higher steps also bring larger computational cost in our case, so we use T=5 in all cases. I remember trying higher steps, such as T=50 and T=100, but that didn't help much with performance on the medium-expert datasets.
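To make the trade-off concrete: the Diffusion-QL objective is a diffusion BC loss plus a Q-maximization term, and T enters both, improving the BC fit but also multiplying the cost of sampling actions through the T-step reverse process. Below is a minimal sketch of this structure; the `diffusion_policy.loss`/`diffusion_policy.sample` helpers and the `eta` balancing coefficient are assumptions for illustration, not the repository's actual API.

```python
import torch

def policy_loss(diffusion_policy, critic, states, actions, eta=1.0):
    # Diffusion (BC) term: denoising loss at a random timestep; larger T
    # lets the policy fit the behavior distribution more closely.
    bc_loss = diffusion_policy.loss(actions, states)

    # Q term: actions are sampled through the full T-step reverse process,
    # so the sampling cost also grows with T.
    new_actions = diffusion_policy.sample(states)
    q_values = critic(states, new_actions)

    # Normalize by the Q scale so eta sets the BC/policy-improvement
    # balance independently of the reward magnitude (as in the paper).
    q_loss = -q_values.mean() / q_values.abs().mean().detach()

    return bc_loss + eta * q_loss
```

Pushing T up mainly strengthens `bc_loss`, while final performance depends on the balance between the two terms, which is why a larger T does not automatically yield a higher score.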