Training details #11

Closed
Imalne opened this issue Sep 3, 2022 · 6 comments

Comments

@Imalne

Imalne commented Sep 3, 2022

"The model pretraining stage takes about 3 days on 2 GeForce RTX 3090 GPUs" in paper. However I use the provided settings and it takes about 17 days to train on 2 GeForce RTX 3090s. Besides, the training results of PSNR, SSIM, and Lpips are far different from the provided pretrained-weights. I want to confirm the total training iter, batch size, and learning rate in configuration of https://github.com/chaofengc/FeMaSR/blob/main/options/train_FeMaSR_HQ_pretrain_stage.yml are accurate settings to reproduce the paper's results?

Besides, I got a "CUDA Out of Memory" error when I tried to enlarge the batch size on each GPU according to the suggestion in issue #9.

@chaofengc
Owner

The training config is the same as the provided config file, except that the number of training iterations was set to 2000k so that a suitable stopping point could be selected. In practice, 200k iterations is enough, as clarified in #9.

Please provide more details about your results; otherwise I am not able to help find the problem.

As for the batch size, I would use a larger batch size for training if I had better GPUs.

@Mayongrui

Hi Chaofeng, I met a problem similar to the one reported by Imalne.

I used the default settings to pre-train the HRP. However, the numeric metrics were even lower than the x4 SR results reported in the paper: only about 23 dB PSNR and 0.6 SSIM on the DIV2K validation set. The reconstructed images also showed a color shift toward yellow.

I rechecked the code and the paper, and found that the implementation may differ slightly from the paper. As the paper claims, the pretraining objective is Eq. 4: only an L1 loss is adopted to force the reconstructed image to approximate the ground truth. The released code, however, adopts L1, perceptual, and GAN losses for supervision.
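For reference, here is a rough sketch of the two objectives as I understand them (the module names and loss weights below are placeholders for illustration, not the repo's actual settings):

```python
import torch.nn.functional as F

def pretrain_loss_paper(recon, gt):
    # Eq. 4 as written in the paper: pixel-wise L1 reconstruction loss only.
    return F.l1_loss(recon, gt)

def pretrain_loss_code(recon, gt, perceptual_fn, discriminator, w_percep=1.0, w_gan=0.1):
    # What the released code appears to do: L1 + perceptual + adversarial terms.
    # perceptual_fn and discriminator stand in for the actual modules; the
    # weights w_percep and w_gan are illustrative, not the config values.
    l_pix = F.l1_loss(recon, gt)
    l_percep = perceptual_fn(recon, gt)
    l_gan = F.softplus(-discriminator(recon)).mean()  # non-saturating generator loss
    return l_pix + w_percep * l_percep + w_gan * l_gan
```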

Is this what degraded the pretraining results? Or do you have any other suggestions or guidance?

@chaofengc
Owner

chaofengc commented Sep 23, 2022

Please show your results to help find the problem. The perceptual and GAN losses are essential to train the HRP; the paper omitted them for simplicity (as described after Eq. 4 in the paper).

@Mayongrui

Please find my results at the following link: https://drive.google.com/file/d/1c4uOc1vxlVQS5ZCSzOaEraXxUcjxVA10/view?usp=sharing

After training the network for 250k iterations with the default settings, I obtained these results by setting save_img to true. The validation set was the DIV2K validation set (100 images). As you can see, there are severe color-shift issues in most of the images, and the validation metrics are as follows:
PSNR: 19.1810
SSIM: 0.5650
LPIPS: 0.2974

These are far lower even than the x4 SR results reported in the paper.
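In case it helps reproduce the numbers, a minimal sketch of how such metrics can be computed over the saved validation images (using the lpips and scikit-image packages and placeholder directories, not the repo's own validation code):

```python
import glob
import numpy as np
import torch
import lpips
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

lpips_fn = lpips.LPIPS(net='alex')  # LPIPS with the AlexNet backbone

def to_lpips_tensor(img):
    # HWC uint8 -> 1xCxHxW float in [-1, 1], the range lpips expects
    return (torch.from_numpy(img).permute(2, 0, 1).float() / 127.5 - 1.0).unsqueeze(0)

psnr_list, ssim_list, lpips_list = [], [], []
# 'sr_results/' and 'DIV2K_valid_HR/' are placeholder directories
for sr_path, gt_path in zip(sorted(glob.glob('sr_results/*.png')),
                            sorted(glob.glob('DIV2K_valid_HR/*.png'))):
    sr = np.array(Image.open(sr_path).convert('RGB'))
    gt = np.array(Image.open(gt_path).convert('RGB'))
    psnr_list.append(peak_signal_noise_ratio(gt, sr, data_range=255))
    ssim_list.append(structural_similarity(gt, sr, channel_axis=2, data_range=255))
    with torch.no_grad():
        lpips_list.append(lpips_fn(to_lpips_tensor(sr), to_lpips_tensor(gt)).item())

print(f'PSNR {np.mean(psnr_list):.4f}  SSIM {np.mean(ssim_list):.4f}  LPIPS {np.mean(lpips_list):.4f}')
```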

@Mayongrui

The training logs, weights, and code can be found here: https://drive.google.com/drive/folders/1WP8PZTVmAeof7LKbEsbtPlFwVhNPuvgc?usp=sharing

@chaofengc
Owner

After checking the code and retraining the model, I found the problem is a missing initialization in the VectorQuantizer class. Training works fine after fixing it. You can find example training logs on wandb: https://wandb.ai/chaofeng/FeMaSR?workspace=user-chaofeng. Note that I did not finish the training (the example log goes up to 70k iterations) due to limited resources; longer training should give better results.

self.embedding.weight.data.uniform_(-1.0 / self.n_e, 1.0 / self.n_e)
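For context, a minimal sketch of where this line sits in a VQGAN-style VectorQuantizer (the argument names are the common ones; the class in this repo has additional arguments and logic):

```python
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, n_e, e_dim, beta=0.25):
        super().__init__()
        self.n_e = n_e        # number of codebook entries
        self.e_dim = e_dim    # dimension of each code vector
        self.beta = beta      # commitment loss weight
        self.embedding = nn.Embedding(self.n_e, self.e_dim)
        # The fix: without this line the codebook keeps nn.Embedding's default
        # N(0, 1) initialization instead of a small uniform range.
        self.embedding.weight.data.uniform_(-1.0 / self.n_e, 1.0 / self.n_e)
```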
