questions about stage 1 training #6
Comments
I have the same need and hope to obtain the code for the first stage of training.
I have reimplemented the stage 1 training. If you are still interested in it, please contact me.
Hi, have you successfully trained stage 1? I've reimplemented it, but when I train the model the noise-estimation loss always gets stuck around 1.0. Thanks for any insights.
Try changing the fut_traj size to (b, 1, 2) for training, then save the model and continue training with size (b, T, 2) and batch size 250. The loss will get stuck around 0.12, which is close to the 0.06 of the pretrained model.
Hi, do you mean that reimplementing stage 1 (the pre-trained model) only requires changing the shape? @woyoudian2gou
Yes, if you follow the steps I described above, you will get a model that is close to the pre-trained one. Time step T and batch size are both factors that affect training.
@woyoudian2gou Thank you for your reply. So it will change the config (cfg), and I should also change the related parameters in ./trainer/train_led_trajectory_augment_input.py, right? Looking forward to your reply! Or could you share your related code via Google Drive or another cloud service? Thanks in advance!
No, you should change the shape of fut_traj, as in Loss_NE(past_traj, fut_traj[:, 0, :].unsqueeze(1), traj_mask). Once you have trained this model, change the batch size to 250 and use the original shape of fut_traj to continue training.
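For reference, the shape change being described is just slicing out the first future time step. A minimal PyTorch sketch (tensor sizes here are illustrative, not the repo's actual configuration):

```python
import torch

b, T = 10, 20                    # batch of 10 samples, T future time steps (illustrative)
fut_traj = torch.randn(b, T, 2)  # (b, T, 2): x/y coordinates for each future step

# Phase 1: train on only the first future step, shape (b, 1, 2)
fut_first = fut_traj[:, 0, :].unsqueeze(1)
print(fut_first.shape)           # torch.Size([10, 1, 2])

# Phase 2: resume training with the original full-length shape (b, T, 2)
print(fut_traj.shape)            # torch.Size([10, 20, 2])
```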
@woyoudian2gou Hello, what does Loss_NE() mean? Where can I find this code? Thanks.
Hello, I am confused about how to reimplement the stage 1 training. Could you please leave me some contact information?
Hi, thank you very much for sharing your unique training method. I find it very interesting! However, I have a few questions for clarification:
Thank you very much for your response! It resolved my issue and was an immense help. Your suggestion to adjust the batch size from 10 to 250, processing 250×11 agents at a time, is indeed a sensible configuration for a diffusion model. However, my hardware might not support running such a large volume of data at once, so I will try a slightly smaller batch size. Once again, I appreciate your reply! @kkk00714
I hope you can successfully replicate the stage 1 training process. The inspiration for changing the size of fut_traj from (b, T, 2) to (b, 1, 2) came from my observation that the estimated noise for the same sample was almost exactly the same at all 30 time steps. I hope this helps with your subsequent adjustments.
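A quick way to check this observation on your own model (variable names here are hypothetical): take a noise prediction of shape (b, T, 2) and measure its spread across the time dimension. If the per-sample standard deviation over T is near zero, the prediction is effectively constant in time:

```python
import torch

b, T = 10, 30
# Stand-in for a model's predicted noise; constructed here to be constant over T
pred_noise = torch.randn(b, 1, 2).expand(b, T, 2)

# Std over the time dimension: ~0 everywhere means the prediction
# does not vary across the T future time steps
spread = pred_noise.std(dim=1)      # shape (b, 2)
print(spread.abs().max().item())    # ~0.0 for a time-constant prediction
```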
Thank you for your kind words and positive outlook! It truly is an intriguing finding, and your ability to implement it effectively showcases your talent. This discovery may not be coincidental at all; it's possible that this approach could be universally applied across diffusion models to yield even better performing ones. Once again, I appreciate your response and insight—it's invaluable for further advancements. |
Hello, I would like to know how you implemented the first stage of denoising training. Did you use the LED module in the first stage of training? Thank you very much! |
As the paper describes, the LED module is not used in the first training stage. You just need to use the loss_ne function that comes with the author's code to train on fut_traj after changing its shape as I described earlier.
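Pieced together from this thread, a rough sketch of the two-phase stage 1 recipe might look like the following. Everything here is schematic: noise_estimation_loss is a stand-in for the repo's loss_ne, and the model, optimizer, and data loaders are placeholders, not the authors' actual code.

```python
import torch

def noise_estimation_loss(model, past_traj, fut_traj, traj_mask):
    """Schematic DDPM-style noise-estimation loss (stand-in for the repo's loss_ne)."""
    noise = torch.randn_like(fut_traj)                 # target noise epsilon
    t = torch.randint(0, 100, (fut_traj.size(0),))     # random diffusion steps
    pred = model(past_traj, fut_traj, t, traj_mask)    # model predicts epsilon
    return ((pred - noise) ** 2).mean()

def train_stage1(model, loader_phase1, loader_phase2, epochs=100):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Phase 1: fut_traj reduced to its first time step, shape (b, 1, 2)
    for _ in range(epochs):
        for past_traj, fut_traj, traj_mask in loader_phase1:
            loss = noise_estimation_loss(
                model, past_traj, fut_traj[:, 0, :].unsqueeze(1), traj_mask)
            opt.zero_grad(); loss.backward(); opt.step()

    # Phase 2: batch size 250, full-length fut_traj of shape (b, T, 2)
    for _ in range(epochs):
        for past_traj, fut_traj, traj_mask in loader_phase2:
            loss = noise_estimation_loss(model, past_traj, fut_traj, traj_mask)
            opt.zero_grad(); loss.backward(); opt.step()
```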
Hi,
Thank you for sharing your code. As mentioned in your paper, there are two stages of training: the first for the denoising diffusion model, and the second for the leapfrog initializer. The repo seems to provide the code for stage 2 training, which loads the pretrained checkpoint of the denoising diffusion model directly. Could you also provide the code for stage 1 training? Do you use the leapfrog initializer in the first stage? If so, what initial values of the estimated mean, variance, and sample prediction did you use? Thanks!
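For context, "loads the pretrained checkpoint directly" is the usual PyTorch save/load pattern; a minimal sketch, with a tiny placeholder model and a hypothetical checkpoint file name instead of the repo's actual ones:

```python
import torch

# Tiny placeholder for the stage 1 denoiser (the real model comes from the repo)
model = torch.nn.Linear(2, 2)

# Stage 1 would end by saving the trained weights...
torch.save(model.state_dict(), "stage1_denoiser.pt")

# ...and stage 2 would start by loading them before training the initializer
reloaded = torch.nn.Linear(2, 2)
reloaded.load_state_dict(torch.load("stage1_denoiser.pt"))
```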