Training details #11

Closed
Imalne opened this issue Sep 3, 2022 · 6 comments

Comments

@Imalne

Imalne commented Sep 3, 2022

"The model pretraining stage takes about 3 days on 2 GeForce RTX 3090 GPUs" in paper. However I use the provided settings and it takes about 17 days to train on 2 GeForce RTX 3090s. Besides, the training results of PSNR, SSIM, and Lpips are far different from the provided pretrained-weights. I want to confirm the total training iter, batch size, and learning rate in configuration of https://github.com/chaofengc/FeMaSR/blob/main/options/train_FeMaSR_HQ_pretrain_stage.yml are accurate settings to reproduce the paper's results?

Besides, I got a "CUDA Out of Memory" error when I tried to enlarge the batch size on each GPU according to the suggestion in issue #9.

@chaofengc
Owner

The training config is the same as the provided config file, except that the number of training iterations was set to 2000k so that a suitable stopping point could be selected. In practice, 200k iterations is enough, as clarified in #9.

Please provide more details about your results; otherwise I am not able to help find the problem.

As for the batch size, I would use a larger batch size for training if I had better GPUs.

@Mayongrui

Hi Chaofeng, I met a problem similar to the one reported by Imalne.

I used the default settings to pre-train the HRP. However, the numeric metrics were even lower than the x4 SR results reported in the paper: only about 23 dB PSNR and 0.6 SSIM on the DIV2K validation set. The reconstructed images also showed a color shift toward yellow.

I rechecked the code and the paper, and found that the implementation may differ slightly from the paper. As the paper claims, the pretraining objective is Eq. 4: only an L1 loss is adopted to force the reconstructed image to approximate the ground truth. The released code, however, adopts L1, perceptual, and GAN losses for supervision.
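For reference, here is a rough sketch of the two objectives as I understand them (the module names and loss weights below are placeholders for illustration, not the repo's actual settings):

```python
import torch.nn.functional as F

def pretrain_loss_paper(recon, gt):
    # Eq. 4 as written in the paper: pixel-wise L1 reconstruction loss only.
    return F.l1_loss(recon, gt)

def pretrain_loss_code(recon, gt, perceptual_fn, discriminator, w_percep=1.0, w_gan=0.1):
    # What the released code appears to do: L1 + perceptual + adversarial terms.
    # perceptual_fn and discriminator stand in for the actual modules; the
    # weights w_percep and w_gan are illustrative, not the config values.
    l_pix = F.l1_loss(recon, gt)
    l_percep = perceptual_fn(recon, gt)
    l_gan = F.softplus(-discriminator(recon)).mean()  # non-saturating generator loss
    return l_pix + w_percep * l_percep + w_gan * l_gan
```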

Is this what degraded the pretraining results? Or do you have any other suggestions or guidance?

@chaofengc
Owner

chaofengc commented Sep 23, 2022

Please show your results to help find the problem. The perceptual and GAN losses are essential to train the HRP; the paper omitted them for simplicity (as described after Eq. 4 in the paper).

@Mayongrui

Please find my results at the following link: https://drive.google.com/file/d/1c4uOc1vxlVQS5ZCSzOaEraXxUcjxVA10/view?usp=sharing

After training the network for 250k iterations with the default settings, I obtained these results by setting save_img to true. The validation set was the DIV2K validation set (100 images). As you can see, there are severe color-shift issues in most of the images, and the validation metrics are as follows:
PSNR: 19.1810
SSIM: 0.5650
LPIPS: 0.2974

These are far lower even than the x4 SR results reported in the paper.
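In case it helps reproduce the numbers, a minimal sketch of how such metrics can be computed over the saved validation images (using the lpips and scikit-image packages and placeholder directories, not the repo's own validation code):

```python
import glob
import numpy as np
import torch
import lpips
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')  # LPIPS with the AlexNet backbone

def to_lpips_tensor(img):
    # HWC uint8 -> 1xCxHxW float in [-1, 1], the range lpips expects
    return (torch.from_numpy(img).permute(2, 0, 1).float() / 127.5 - 1.0).unsqueeze(0)

psnr_list, ssim_list, lpips_list = [], [], []
# 'sr_results/' and 'DIV2K_valid_HR/' are placeholder directories
for sr_path, gt_path in zip(sorted(glob.glob('sr_results/*.png')),
                            sorted(glob.glob('DIV2K_valid_HR/*.png'))):
    sr = np.array(Image.open(sr_path).convert('RGB'))
    gt = np.array(Image.open(gt_path).convert('RGB'))
    psnr_list.append(peak_signal_noise_ratio(gt, sr, data_range=255))
    ssim_list.append(structural_similarity(gt, sr, channel_axis=2, data_range=255))
    with torch.no_grad():
        lpips_list.append(lpips_fn(to_lpips_tensor(sr), to_lpips_tensor(gt)).item())

print(f'PSNR {np.mean(psnr_list):.4f}  SSIM {np.mean(ssim_list):.4f}  LPIPS {np.mean(lpips_list):.4f}')
```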

@Mayongrui

The training logs, weights, and code can be found here: https://drive.google.com/drive/folders/1WP8PZTVmAeof7LKbEsbtPlFwVhNPuvgc?usp=sharing

@chaofengc
Owner

After checking the code and retraining the model, I found the problem is a missing initialization in the VectorQuantizer class. Training works fine after fixing it. You can find example training logs on wandb: https://wandb.ai/chaofeng/FeMaSR?workspace=user-chaofeng. Note that I did not finish the training (the example log goes up to 70k iterations) due to limited resources; longer training should give better results.

self.embedding.weight.data.uniform_(-1.0 / self.n_e, 1.0 / self.n_e)
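For context, a minimal sketch of where this line sits in a VQGAN-style VectorQuantizer (the argument names are the common ones; the class in this repo has additional arguments and logic):

```python
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, n_e, e_dim, beta=0.25):
        super().__init__()
        self.n_e = n_e        # number of codebook entries
        self.e_dim = e_dim    # dimension of each code vector
        self.beta = beta      # commitment loss weight
        self.embedding = nn.Embedding(self.n_e, self.e_dim)
        # The fix: without this line the codebook keeps nn.Embedding's default
        # N(0, 1) initialization instead of a small uniform range.
        self.embedding.weight.data.uniform_(-1.0 / self.n_e, 1.0 / self.n_e)
```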
