Performance of FAN_tiny on ImageNet1K #15
Hi, congratulations on the cool work!
One question about the code: when I trained fan_tiny_12_p16_224 on ImageNet-1K, I got a clean accuracy of 77.454, lower than the reported 79.2. I followed all the hyperparameter settings in the README, except that I trained the model on 4 GPUs, each with a batch size of 200. Will that severely affect the performance? Or is there another possible reason? Thanks!

Comments
Hi, thanks for your interest in the work! Based on experience from previous experiments, the tiny model needs to be trained with a large batch size (e.g. 1024) for 300 epochs. In your case, the network has probably not converged yet. You can sanity-check this by watching the training loss and the validation loss: if they are still decreasing, that supports this point. If so, you can simply increase the number of epochs to compensate for the impact of the smaller batch size. I hope this helps your experiments a little. Regards,
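For anyone who wants to run that sanity check quickly, here is a minimal sketch that reads per-epoch losses from a log file and tests whether they are still trending down. The `output/train/summary.csv` path and the `train_loss`/`eval_loss` column names are assumptions (they match timm-style training logs); adapt them to however you log losses.

```python
# Minimal convergence sanity check, assuming per-epoch losses are logged to a
# CSV (e.g. the summary.csv written by timm-style training scripts). The file
# path and column names below are assumptions; adjust them to your own logs.
import csv

def still_decreasing(values, window=30, tol=1e-3):
    """True if the mean of the last `window` values is lower than the mean
    of the preceding `window` values by more than `tol`."""
    if len(values) < 2 * window:
        return True  # too few epochs to judge; treat as not yet converged
    recent = sum(values[-window:]) / window
    earlier = sum(values[-2 * window:-window]) / window
    return earlier - recent > tol

with open("output/train/summary.csv") as f:
    rows = list(csv.DictReader(f))

train_loss = [float(r["train_loss"]) for r in rows]
eval_loss = [float(r["eval_loss"]) for r in rows]

if still_decreasing(train_loss) or still_decreasing(eval_loss):
    print("Losses are still decreasing: likely under-trained, add epochs.")
else:
    print("Losses have plateaued: the gap is probably not a convergence issue.")
```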
Hi, thanks for the response! I used 4 GPUs with batch_size_per_gpu=200, i.e. a total batch size of 800, which is not far from the 1024 you used, so I don't think that is the problem. I also double-checked whether the model had converged, and the loss barely changed over the last 30 epochs, so I assume it did. I haven't found the reason yet, but I will try training a larger model to see if the problem persists. Thanks!
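If matching the paper's batch size of 1024 exactly ever becomes the concern, one way to reach the same effective batch size without more GPU memory is gradient accumulation. The sketch below uses toy stand-ins for the model, data, and optimizer purely to stay self-contained; the actual training script may already expose an equivalent option.

```python
# Sketch of gradient accumulation: 4 forward/backward passes per optimizer
# step so the effective batch size is accum_steps * per_step_batch.
# The model, data, and hyperparameters are toy stand-ins.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                 # stand-in for the FAN model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(16)]

per_step_batch = 8                       # what fits in memory per step
target_batch = 32                        # effective batch size you want
accum_steps = target_batch // per_step_batch

optimizer.zero_grad()
for i, (images, targets) in enumerate(data):
    loss = criterion(model(images), targets) / accum_steps  # average over steps
    loss.backward()                                          # gradients accumulate
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Note that with batch-norm-free, LayerNorm-based architectures this is close to a true large batch, though optimizer state updates still happen less frequently.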
Hi! I've tried training FAN-S and can reproduce the results in the paper. However, when I train FAN-L, the validation accuracy reaches a peak of ~83.5 around epoch 200 and then falls back to ~82.3 by the end of the 300 epochs. Is this supposed to happen? I trained with batch_size_per_gpu=150 on 8 GPUs. All other configurations follow the ones in the repo. Thanks!
Hi, based on my previous experience, this typically indicates overfitting. You can try increasing the drop-path rate. Regards,
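For reference, a minimal sketch of raising the stochastic-depth (drop-path) rate, assuming the FAN variants are registered with timm and accept a `drop_path_rate` keyword, as most timm ViT-style models do. The repo-local `models` import and the 0.3 value are assumptions; check the model signatures and the training script's drop-path flag in this repo for the exact knob.

```python
# Hedged sketch: bump the drop-path rate to regularize a larger FAN model.
import timm
import models  # noqa: F401  # assumed repo-local module that registers the FAN variants

model = timm.create_model(
    "fan_tiny_12_p16_224",   # same idea applies to the larger FAN variants
    pretrained=False,
    drop_path_rate=0.3,      # try a value above the default to curb overfitting
)
```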
Thanks for the suggestion! I will try that.