There is no checkpoint output, #5

Open
zhangbaijin opened this issue Dec 1, 2022 · 9 comments

Comments

@zhangbaijin

Thanks for your contribution. After training I get nothing: no checkpoint is written. Why?

`----------------------------
| grad_norm | 0.00858 |
| lg_loss_scale | 20.9 |
| loss | 0.00124 |
| loss_q0 | 0.00374 |
| loss_q1 | 0.000934 |
| loss_q2 | 0.00063 |
| loss_q3 | 0.000106 |
| mse | 0.00124 |
| mse_q0 | 0.00374 |
| mse_q1 | 0.000934 |
| mse_q2 | 0.00063 |
| mse_q3 | 0.000106 |
| param_norm | 240 |
| samples | 4e+05 |
| step | 5e+04 |

saving model 0...
saving model 0.9999...`

@dillfrescott

Take a look at how I'm training mine. I'm getting a model output with this configuration:

!LOGDIR="OUTPUT/sinddpm-yourimage-day-commitseq" python image_train.py --data_dir /content/upscaled2.png --lr 5e-4 --diffusion_steps 1000 --image_size 256 \
                                   --noise_schedule linear --num_channels 64 --num_head_channels 16 --channel_mult "1,2,4" \
                                   --attention_resolutions "2" --num_res_blocks 1 --resblock_updown False --use_fp16 True \
                                   --use_scale_shift_norm True --use_checkpoint True --batch_size 16

Hope this helps!

@zhangbaijin
Author

When I run it, I get `bash: !LOGDIR="OUTPUT/sinddpm: event not found`. Is the command right?
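A likely explanation, offered as an assumption since the thread never states the fix: the leading `!` is the Jupyter/Colab prefix for running a shell command from a notebook cell. In an interactive bash terminal, `!` starts history expansion instead, which is what produces `event not found`. Below is a minimal sketch of the same command for a plain bash shell with the `!` dropped. The function name `run_training` and the `PYTHON` override are hypothetical, added only so the command can be previewed; the flags and paths are copied from the comment above.

```shell
# Sketch for a plain bash terminal, assuming it is run from the directory
# that contains image_train.py. Only the leading "!" (notebook syntax) is removed.
# run_training and PYTHON are hypothetical names used for illustration.
run_training() {
  LOGDIR="OUTPUT/sinddpm-yourimage-day-commitseq" "${PYTHON:-python}" image_train.py \
    --data_dir /content/upscaled2.png --lr 5e-4 --diffusion_steps 1000 --image_size 256 \
    --noise_schedule linear --num_channels 64 --num_head_channels 16 --channel_mult "1,2,4" \
    --attention_resolutions "2" --num_res_blocks 1 --resblock_updown False --use_fp16 True \
    --use_scale_shift_norm True --use_checkpoint True --batch_size 16
}
```

Calling `run_training` starts the training. `PYTHON=echo run_training` only prints the script name and flags, which is a cheap way to check that nothing was lost before launching a long run. `/content/upscaled2.png` is the Colab path from the original command; replace it with your own image path.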

@zhangbaijin
Author

It has been solved, thanks a lot.

@dillfrescott

No problem

@cwy08090014

@zhangbaijin May I ask how this problem was solved? I have run into the same problem. Thanks.

@KevinWang676

Hi @dillfrescott, I used the same command as yours, and my input image is 256x256, but I got this error: `RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 54 but got size 53 for tensor number 1 in the list.` Could you help me resolve this issue? Thank you in advance!

@KevinWang676

Hi @dillfrescott, I resolved the problem just now by adding \ after each line. But I wonder if you know how many steps the training process takes in total. Thank you.
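A small sketch of why the trailing `\` matters. Without it, the shell treats each line as a separate command, so flags on later lines never reach the script. It is an assumption, not something confirmed in this thread, that the dropped flags are what led to the tensor size mismatch. `print_args` is a hypothetical helper that only reports what it received.

```shell
# print_args is a hypothetical helper: it reports how many arguments it got.
print_args() { echo "$# args: $*"; }

# With a trailing backslash, both lines are joined into one command.
print_args --image_size 256 \
           --batch_size 16
# prints: 4 args: --image_size 256 --batch_size 16

# Without the backslash, only the first line's flags are passed;
# the next line would be run as its own (invalid) command.
print_args --image_size 256
# prints: 2 args: --image_size 256
```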

@dillfrescott

@KevinWang676 It's been a while since I've used this project. I'm glad you were able to fix it, but I don't remember how many steps the training process takes in total.

@Shaohua987

> It has been solved, thanks a lot.

Hello, may I ask how many steps you trained for before the training was complete?
