There is no checkpoint output, #5

zhangbaijin · 2022-12-01T14:11:16Z

Thanks for your contribution. After training I get nothing: no checkpoint is written. Why?

`----------------------------
| grad_norm | 0.00858 |
| lg_loss_scale | 20.9 |
| loss | 0.00124 |
| loss_q0 | 0.00374 |
| loss_q1 | 0.000934 |
| loss_q2 | 0.00063 |
| loss_q3 | 0.000106 |
| mse | 0.00124 |
| mse_q0 | 0.00374 |
| mse_q1 | 0.000934 |
| mse_q2 | 0.00063 |
| mse_q3 | 0.000106 |
| param_norm | 240 |
| samples | 4e+05 |
| step | 5e+04 |

saving model 0...
saving model 0.9999...`

dillfrescott · 2022-12-03T03:20:40Z

Take a look at how I'm training mine, I'm getting a model output with this configuration...

!LOGDIR="OUTPUT/sinddpm-yourimage-day-commitseq" python image_train.py --data_dir /content/upscaled2.png --lr 5e-4 --diffusion_steps 1000 --image_size 256 \
                                   --noise_schedule linear --num_channels 64 --num_head_channels 16 --channel_mult "1,2,4" \
                                   --attention_resolutions "2" --num_res_blocks 1 --resblock_updown False --use_fp16 True \
                                   --use_scale_shift_norm True --use_checkpoint True --batch_size 16

Hope this helps!

zhangbaijin · 2022-12-03T06:39:11Z

Running it gives: bash: !LOGDIR="OUTPUT/sinddpm: event not found. Is the command right?
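
For anyone hitting the same `event not found` message: the leading `!` in the command above is notebook (Colab/Jupyter) syntax for running a shell command. In an interactive bash session, `!` triggers history expansion, which is what produces this error, so in a plain terminal the same command can be run without the `!`. Below is a minimal Python sketch of an equivalent launcher. `build_training_command` and `run_training` are hypothetical helper names, the flags are copied from the command above, and the sketch assumes `image_train.py` is in the current directory and reads `LOGDIR` from the environment.

```python
import os
import subprocess
import sys


def build_training_command(data_dir, logdir):
    """Return (argv, env) mirroring the notebook command, without the leading '!'."""
    # LOGDIR is passed through the environment, as in `LOGDIR="..." python image_train.py ...`.
    env = dict(os.environ, LOGDIR=logdir)
    argv = [
        sys.executable, "image_train.py",
        "--data_dir", data_dir,
        "--lr", "5e-4",
        "--diffusion_steps", "1000",
        "--image_size", "256",
        "--noise_schedule", "linear",
        "--num_channels", "64",
        "--num_head_channels", "16",
        "--channel_mult", "1,2,4",
        "--attention_resolutions", "2",
        "--num_res_blocks", "1",
        "--resblock_updown", "False",
        "--use_fp16", "True",
        "--use_scale_shift_norm", "True",
        "--use_checkpoint", "True",
        "--batch_size", "16",
    ]
    return argv, env


def run_training(data_dir, logdir):
    """Launch training; raises CalledProcessError if the script exits with an error."""
    argv, env = build_training_command(data_dir, logdir)
    subprocess.run(argv, env=env, check=True)
```

Calling `run_training("/content/upscaled2.png", "OUTPUT/sinddpm-yourimage-day-commitseq")` would start the same run. Passing the arguments as a list also avoids the multi-line `\` continuation entirely.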

zhangbaijin · 2022-12-03T06:41:35Z

It has been solved, thanks a lot.

dillfrescott · 2022-12-04T00:21:15Z

No problem

cwy08090014 · 2023-01-14T08:11:05Z

@zhangbaijin May I ask how this problem was solved? I have run into the same problem. Thanks.
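
Since the log above prints `saving model 0...`, one thing worth checking is whether the checkpoints were written to a directory other than the one expected. The following is a minimal sketch, assuming checkpoints are saved as `.pt` files (typical for PyTorch projects, not confirmed in this thread). `find_checkpoints` is a hypothetical helper, and the directories to search are your choice.

```python
from pathlib import Path


def find_checkpoints(roots, suffix=".pt"):
    """Return files ending in `suffix` under the given roots, newest first."""
    found = {}
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # skip directories that do not exist
        for path in root.rglob(f"*{suffix}"):
            if path.is_file():
                # Key by resolved path so overlapping roots do not yield duplicates.
                found[path.resolve()] = path
    return sorted(found.values(), key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, `find_checkpoints(["OUTPUT", "."])` lists any `.pt` files under the `OUTPUT` directory used in the command above or under the current directory.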

KevinWang676 · 2023-06-18T18:44:42Z

Hi @dillfrescott, I used the same command as yours, and my input image is 256x256. But I got this error: `RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 54 but got size 53 for tensor number 1 in the list.` Could you help me resolve this issue? Thank you in advance!

KevinWang676 · 2023-06-18T19:44:26Z

Hi @dillfrescott, I just resolved the problem by adding `\` at the end of each line of the command. Do you know how many steps the training process takes in total? Thank you.

dillfrescott · 2023-06-20T01:34:58Z

@KevinWang676 It's been a while since I've used this project. I'm glad you were able to fix it, but I do not remember how many steps the training process takes in total.

Shaohua987 · 2023-11-23T08:45:31Z

> It has been solved, thanks a lot

Hello, may I ask how many steps you trained for before training completed?
