Always getting NaN loss when token_label is False. #6
Comments
Hello! Where did you download the token-labels imagenet_efficientnet_l2_sz475_top5 from? Thanks!
Are you training on multiple GPUs? If you managed to get the checkpoint, can you please share it with me? Thanks!
@lqniunjunlper Hello, I also encounter this problem and would like to know how you solved it.
@EasonXiao-888 When token_label is set to False, the loss is always NaN. Later I downloaded the token-label datasets, but training is still unstable (midway through training, the loss suddenly increases).
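A common mitigation for the sudden mid-training loss spikes described above is clipping the global gradient norm before each optimizer step. A minimal PyTorch sketch follows; the toy model, learning rate, and max_norm value are invented for illustration and are not taken from this repository's training script:

```python
import torch
import torch.nn as nn

# Toy model and optimizer, invented only to make the sketch runnable.
model = nn.Linear(8, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-4)

x = torch.randn(16, 8)
target = torch.randint(0, 4, (16,))

loss = nn.functional.cross_entropy(model(x), target)
optimizer.zero_grad()
loss.backward()

# Cap the global gradient norm before stepping; max_norm=1.0 is a common default.
pre_clip_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
print(torch.isfinite(loss).item())
```

Whether this helps depends on where the instability comes from; if the NaN appears in the very first steps with token_label=False, the problem is more likely in the loss function path than in exploding gradients.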
Thank you very much for your reply. I would like to know whether the results reported by Simba in the paper use token_label.
Hello! Do you have the token-labels _imagenet_efficientnet_l2_sz475_top5? Thanks!
```
[03/29 11:44:29] train INFO: Test: [520/521] eta: 0:00:00 loss: 6.7130 (6.8320) acc1: 0.0000 (0.5440) acc5: 0.0000 (2.0020) time: 0.4453 data: 0.0002 max mem: 15878
[03/29 11:44:29] train INFO: Test: Total time: 0:04:02 (0.4651 s / it)
[03/29 11:44:35] train INFO: * Acc@1 0.544 Acc@5 2.002 loss 6.832
[03/29 11:44:37] train INFO: *** Best metric: 0.5440000167846679 (epoch 1)
[03/29 11:44:37] train INFO: Accuracy of the network on the 50000 test images: 0.5%
[03/29 11:44:37] train INFO: Max accuracy: 0.54%
[03/29 11:44:37] train INFO: {'train_lr': 9.999999999999953e-07, 'train_loss': 6.907856970549011, 'test_loss': 6.832010622845959, 'test_acc1': 0.5440000167846679, 'test_acc5': 2.0021250657749174, 'epoch': 1, 'n_parameters': 66249512}
[03/29 11:44:44] train INFO: Epoch: [2] [   0/1251] eta: 2:18:16 lr: 0.000401 loss: 6.9199 (6.9199) time: 6.6321 data: 4.9419 max mem: 15878
[03/29 11:45:01] train INFO: Epoch: [2] [  10/1251] eta: 0:44:41 lr: 0.000401 loss: 6.9639 (6.9749) time: 2.1604 data: 0.4495 max mem: 15878
[03/29 11:45:18] train INFO: Epoch: [2] [  20/1251] eta: 0:39:54 lr: 0.000401 loss: 6.9639 (6.9806) time: 1.7107 data: 0.0003 max mem: 15878
[03/29 11:45:35] train INFO: Epoch: [2] [  30/1251] eta: 0:38:00 lr: 0.000401 loss: 7.0065 (6.9880) time: 1.7071 data: 0.0002 max mem: 15878
[03/29 11:45:49] train INFO: Loss is nan, stopping training
```
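The final "Loss is nan, stopping training" message comes from the finiteness guard common in DeiT-style training loops, which checks the scalar loss every step and aborts once it is no longer finite. A minimal sketch of that check (the function name here is made up for illustration):

```python
import math

def loss_is_finite(loss_value: float) -> bool:
    # Mirrors the per-step check that prints "Loss is nan, stopping training"
    # in DeiT-style training scripts before calling sys.exit(1).
    return math.isfinite(loss_value)

print(loss_is_finite(6.9199))        # a normal loss value from the log → True
print(loss_is_finite(float("nan")))  # the condition that aborted this run → False
```

The guard only detects the divergence; finding its cause usually means checking the loss-function branch taken when token_label is False, since that is the configuration under which the NaN reliably appears.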