Performance of FAN_tiny on ImageNet1K #15
Hi, congratulations on the cool work!
One question about the code: when I trained fan_tiny_12_p16_224 on ImageNet-1K, I got a clean accuracy of 77.454, lower than the reported 79.2. I followed all the hyperparameter settings in the README, except that I trained the model on 4 GPUs, each with a batch size of 200. Will that severely affect the performance? Or is there another possible reason? Thanks!

Comments
Hi, thanks for your interest in the work! Based on experience from previous experiments, the tiny model needs to be trained with a large batch size (e.g. 1024) for 300 epochs. In your case, the network has probably not converged yet. You can sanity-check this by watching the training loss and the validation loss: if they are still decreasing, that supports this point. If so, you can simply increase the number of epochs to compensate for the impact of the smaller batch size. I hope this helps your experiments a little. Regards,
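For anyone who wants to run that sanity check quickly, here is a minimal sketch that reads per-epoch losses from a log file and tests whether they are still trending down. The `output/train/summary.csv` path and the `train_loss`/`eval_loss` column names are assumptions (they match timm-style training logs); adapt them to however you log losses.

```python
# Minimal convergence sanity check, assuming per-epoch losses are logged to a
# CSV (e.g. the summary.csv written by timm-style training scripts). The file
# path and column names below are assumptions; adjust them to your own logs.
import csv

def still_decreasing(values, window=30, tol=1e-3):
    """True if the mean of the last `window` values is lower than the mean
    of the preceding `window` values by more than `tol`."""
    if len(values) < 2 * window:
        return True  # too few epochs to judge; treat as not yet converged
    recent = sum(values[-window:]) / window
    earlier = sum(values[-2 * window:-window]) / window
    return earlier - recent > tol

with open("output/train/summary.csv") as f:
    rows = list(csv.DictReader(f))

train_loss = [float(r["train_loss"]) for r in rows]
eval_loss = [float(r["eval_loss"]) for r in rows]

if still_decreasing(train_loss) or still_decreasing(eval_loss):
    print("Losses are still decreasing: likely under-trained, add epochs.")
else:
    print("Losses have plateaued: the gap is probably not a convergence issue.")
```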
Hi, thanks for the response! I used 4 GPUs with batch_size_per_gpu=200, i.e. a total batch size of 800, which is not far from the 1024 you used, so I don't think that is the problem. I also double-checked whether the model had converged, and the loss barely changed over the last 30 epochs, so I assume it did. I haven't found the reason yet, but I will try training a larger model to see if the problem persists. Thanks!
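If matching the paper's batch size of 1024 exactly ever becomes the concern, one way to reach the same effective batch size without more GPU memory is gradient accumulation. The sketch below uses toy stand-ins for the model, data, and optimizer purely to stay self-contained; the actual training script may already expose an equivalent option.

```python
# Sketch of gradient accumulation: 4 forward/backward passes per optimizer
# step so the effective batch size is accum_steps * per_step_batch.
# The model, data, and hyperparameters are toy stand-ins.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                 # stand-in for the FAN model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(16)]

per_step_batch = 8                       # what fits in memory per step
target_batch = 32                        # effective batch size you want
accum_steps = target_batch // per_step_batch

optimizer.zero_grad()
for i, (images, targets) in enumerate(data):
    loss = criterion(model(images), targets) / accum_steps  # average over steps
    loss.backward()                                          # gradients accumulate
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Note that with batch-norm-free, LayerNorm-based architectures this is close to a true large batch, though optimizer state updates still happen less frequently.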
Hi! I've tried training FAN-S and can reproduce the results in the paper. However, when I train FAN-L, the validation accuracy reaches a peak of ~83.5 around epoch 200 and then falls back to ~82.3 by the end of the 300 epochs. Is this supposed to happen? I trained with batch_size_per_gpu=150 on 8 GPUs. All other configurations follow the ones in the repo. Thanks!
Hi, based on my previous experience, this typically indicates overfitting. You can try increasing the drop-path rate. Regards,
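For reference, a minimal sketch of raising the stochastic-depth (drop-path) rate, assuming the FAN variants are registered with timm and accept a `drop_path_rate` keyword, as most timm ViT-style models do. The repo-local `models` import and the 0.3 value are assumptions; check the model signatures and the training script's drop-path flag in this repo for the exact knob.

```python
# Hedged sketch: bump the drop-path rate to regularize a larger FAN model.
import timm
import models  # noqa: F401  # assumed repo-local module that registers the FAN variants

model = timm.create_model(
    "fan_tiny_12_p16_224",   # same idea applies to the larger FAN variants
    pretrained=False,
    drop_path_rate=0.3,      # try a value above the default to curb overfitting
)
```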
Thanks for the suggestion! I will try that.