about training #5
A full training run took about 48 hours on a machine with 4 V100 GPUs. I think you could train for about 1-1.5 days to get good results, but not quite the best.
@vb000 I use a machine with 4 3090 GPUs, but one training epoch takes about 2 hours, so a full training run would take about 200 hours. That is much slower than yours. Do you know the reason? In addition, my training set contains 83876 videos. Did your training set have that many videos?
@vb000 In your params.json, the batch size is 8. Since I use a machine with 4 3090 GPUs, should I change the batch size to 32?
The training set had somewhere close to 64k videos. The exact link to the dataset is this. No, we used batch size 8 with 4 GPUs. You could try a batch size of 32 for faster training, at probably a small cost in accuracy.
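Changing the batch size amounts to editing one field in params.json. A minimal sketch of that edit is below; the key name "batch_size" is an assumption, so check the actual field name used in the repo's params.json before applying it.

```python
import json


def set_batch_size(params_path, batch_size):
    """Load a params.json, update the batch size, and write it back.

    The key name "batch_size" is a hypothetical stand-in; the real
    params.json in the repo may use a different field name.
    """
    with open(params_path) as f:
        params = json.load(f)
    params["batch_size"] = batch_size
    with open(params_path, "w") as f:
        json.dump(params, f, indent=2)
    return params
```

For example, `set_batch_size("params.json", 32)` would apply the 32-per-step setting discussed above while leaving all other hyperparameters untouched.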
It's single precision: float32.
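In practice this means input frames are cast to float32 before training. A minimal sketch of that cast, including the common uint8-to-[0, 1] scaling (the scaling convention is an assumption, not confirmed by the authors):

```python
import numpy as np


def to_single_precision(frame):
    """Cast an input frame to float32 (single precision).

    uint8 pixel values are additionally scaled to [0, 1]; whether the
    authors scale this way is an assumption made for illustration.
    """
    arr = np.asarray(frame)
    if arr.dtype == np.uint8:
        return arr.astype(np.float32) / 255.0
    return arr.astype(np.float32)
```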
@vb000
The Vimeo-90k train list has 64612 sequences; please use this link to get the dataset we used.
No, because it only has 7-frame sequences. We used the validation set from the REDS dataset.
100 in the script is the maximum number of epochs. The 80th epoch was the best-performing epoch based on validation metrics.
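The 64612-sequence count above can be sanity-checked by parsing the Vimeo-90k train list, which is a plain text file with one sequence id (e.g. "00001/0001") per line. A small sketch, assuming that standard one-id-per-line format:

```python
def load_train_list(path):
    """Read a Vimeo-90k style train list and return its sequence ids.

    Assumes one sequence id per line (e.g. "00001/0001"); blank lines
    are skipped. For the septuplet train list this should yield the
    64612 sequences mentioned above.
    """
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```

Something like `len(load_train_list("sep_trainlist.txt"))` would then report the number of training sequences actually present on disk.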
@vb000 Does the REDS dataset mean REDS4? REDS4 is a set of 4 1280x720 videos, each containing 100 frames.
Hi, refer to the footnote on page 6 of the following paper for the REDS train and val sets: https://openaccess.thecvf.com/content/CVPR2021/papers/Chan_BasicVSR_The_Search_for_Essential_Components_in_Video_Super-Resolution_and_CVPR_2021_paper.pdf
@vb000 Thank you very much for your reply. I still have questions:
Hi, sorry for the late reply. Responses inline:
Yes.
Both modes work; we trained it using the latter approach: we convert the color LR frame to Lab color space and provide only the L channel to the model.
We trained it on a cluster, where various machines are used based on availability, so we currently do not have access to this data.
I think that might be normal. I might have misquoted the runtimes, as I may have remembered them wrong. One suggestion: you might want to make sure data loading is not the bottleneck.
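The Lab preprocessing described above (color LR frame to Lab, L channel only into the model) can be sketched from scratch with the standard sRGB-to-CIELAB math; in practice a library call such as OpenCV's cv2.cvtColor with COLOR_RGB2LAB or skimage.color.rgb2lab would be used instead, and whether the authors used either library is not stated here.

```python
import numpy as np


def rgb_to_l_channel(rgb):
    """Convert an RGB frame (uint8, HxWx3) to the L channel of CIELAB.

    A from-scratch sketch of the conversion the authors describe;
    it returns only L (in [0, 100]), discarding the a/b chroma
    channels that the model does not see.
    """
    srgb = rgb.astype(np.float32) / 255.0
    # Undo the sRGB gamma to get linear light.
    linear = np.where(srgb <= 0.04045, srgb / 12.92,
                      ((srgb + 0.055) / 1.055) ** 2.4)
    # Relative luminance Y (Rec. 709 / D65 primaries), already
    # normalized so Y lies in [0, 1].
    y = (0.2126 * linear[..., 0] + 0.7152 * linear[..., 1]
         + 0.0722 * linear[..., 2])
    # CIELAB f(t), with the standard linear segment near black.
    eps = (6.0 / 29.0) ** 3
    f = np.where(y > eps, np.cbrt(y),
                 y / (3.0 * (6.0 / 29.0) ** 2) + 4.0 / 29.0)
    return 116.0 * f - 16.0
```

Pure white maps to L = 100 and pure black to L = 0, which is a quick way to check the conversion.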
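One simple way to act on the data-loading suggestion above is to split wall-clock time per step into "waiting for the next batch" versus "running the training step". A minimal sketch; `loader` and `train_step` here are hypothetical stand-ins for the real DataLoader and training step, not names from the repo:

```python
import time


def profile_loader(loader, train_step, num_batches=50):
    """Roughly attribute wall-clock time to data loading vs. compute.

    `loader` is any iterable of batches and `train_step` a callable
    consuming one batch (both hypothetical stand-ins). If load_time
    dominates step_time, data loading is the bottleneck.
    """
    load_time = step_time = 0.0
    it = iter(loader)
    for _ in range(num_batches):
        t0 = time.perf_counter()
        try:
            batch = next(it)          # time spent waiting on data
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)             # time spent in forward/backward
        t2 = time.perf_counter()
        load_time += t1 - t0
        step_time += t2 - t1
    return load_time, step_time
```

If loading dominates, the usual PyTorch-side remedies are more DataLoader workers, pinned memory, and faster decoding of the video frames.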
Hi, how long did one full training run take for you? How much can the number of training epochs be reduced without greatly degrading the results?