Training details #11
The training config is the same as the provided config file, except that the training iteration number is set to 2000k in order to select suitable training iterations. In practice, 200k is enough, as clarified in #9. Please provide more details about your results; otherwise I am not able to help find the problem. As for the batch size, I would use a larger batch size for training if I had better GPUs.
Hi Chaofeng, I met a problem similar to the one reported by ImaIne. I used the default setting to pre-train the HRP. However, the numeric metrics were even lower than the x4 SR results reported in the paper: only 23 dB PSNR and 0.6 on the DIV2K validation set. The visualized images were also color-shifted toward yellow. I rechecked the code and the paper, and found that the implementation may differ slightly from the paper. As the paper claims, the training objective during pretraining is described by Eq. 4, where only an L1 loss is adopted to force the reconstructed frame to approximate the ground truth, whereas the released code adopts L1, perceptual, and GAN losses for supervision. Is this the point that degraded the pretraining results? Or do you have any other suggestions or guidance?
Please show your results to help find the problem. The perceptual and GAN losses are essential to train the HRP; the paper omitted them for simplicity (as described after Eq. 4 in the paper).
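For readers trying to reproduce this stage, below is a minimal sketch of such a combined objective (pixel L1 + perceptual + adversarial). The VGG layer choice, the loss weights, and the `disc` discriminator are illustrative assumptions, not the repo's exact configuration; the real weights live in the training yml.

```python
# A hedged sketch of an L1 + perceptual + GAN pretraining objective.
# Not the repo's implementation; layer indices, weights and `disc` are assumptions.
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights


class SimplePerceptualLoss(nn.Module):
    """L1 distance between VGG-19 features up to relu4_4 (illustrative choice;
    ImageNet mean/std normalization is omitted for brevity)."""

    def __init__(self):
        super().__init__()
        self.features = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:27].eval()
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, pred, target):
        return nn.functional.l1_loss(self.features(pred), self.features(target))


def hrp_pretrain_loss(pred, gt, disc, percep_loss,
                      w_pix=1.0, w_percep=1.0, w_gan=0.1):
    """Generator-side loss: pixel L1 + perceptual + non-saturating GAN term.
    `disc` is a hypothetical discriminator; the weights are placeholders."""
    l_pix = nn.functional.l1_loss(pred, gt)
    l_percep = percep_loss(pred, gt)
    l_gan = nn.functional.softplus(-disc(pred)).mean()  # non-saturating generator loss
    return w_pix * l_pix + w_percep * l_percep + w_gan * l_gan
```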
Please find my results at the following link: https://drive.google.com/file/d/1c4uOc1vxlVQS5ZCSzOaEraXxUcjxVA10/view?usp=sharing After training the network for 250k iterations with the default setting, I obtained these results by setting save_img to true. The validation set was the DIV2K validation set, containing 100 images. As you can see, there are severe color-shift issues in most of the images, and the validation metrics are as follows, far lower than even the x4 SR results reported in the paper.
The training logs, weights, and code can be found here: https://drive.google.com/drive/folders/1WP8PZTVmAeof7LKbEsbtPlFwVhNPuvgc?usp=sharing
After checking the code and retraining the model, I found that the problem is the missing initialization at line 33 of FeMaSR/basicsr/archs/femasr_arch.py (commit af0b2b8).
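The exact call expected at that line depends on the commit, but a generic Kaiming-style weight initialization in the basicsr convention might look like the sketch below; it is only illustrative, not the actual fix.

```python
# A hedged sketch of a weight-initialization step; the real call at
# femasr_arch.py line 33 may differ.
import torch.nn as nn


def init_weights(module, scale=0.1):
    """Kaiming init for conv/linear layers, scaled down as is common for SR networks."""
    for m in module.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.kaiming_normal_(m.weight, a=0, mode="fan_in")
            m.weight.data *= scale
            if m.bias is not None:
                nn.init.zeros_(m.bias)


# Example usage (hypothetical module name):
# net = SomeFeMaSRBlock(...)
# init_weights(net)
```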
"The model pretraining stage takes about 3 days on 2 GeForce RTX 3090 GPUs" in paper. However I use the provided settings and it takes about 17 days to train on 2 GeForce RTX 3090s. Besides, the training results of PSNR, SSIM, and Lpips are far different from the provided pretrained-weights. I want to confirm the total training iter, batch size, and learning rate in configuration of https://github.com/chaofengc/FeMaSR/blob/main/options/train_FeMaSR_HQ_pretrain_stage.yml are accurate settings to reproduce the paper's results?
Besides, I got "CUDA Out of Memory" error when I tried to enlarge the batch size on each GPU according to the suggestion in issue #9.