
Can't achieve the accuracy in the paper with cifar10 #3

Open
FiresWorker opened this issue Dec 2, 2020 · 37 comments

@FiresWorker

I use kNN classification as a monitor during training. As shown in Figure D.1 of the paper, the accuracy starts at about 60% and finally reaches about 90%. I can't reach this accuracy; I only get a very low accuracy with the parameters mentioned in the paper.

If anyone has achieved the results in the paper, I would very much appreciate you sharing some experimental details.

@PatrickHua
Owner

In appendix section D:
We do not use blur augmentation. The backbone is the CIFAR variant of ResNet-18 [19], followed by a 2-layer projection MLP. The outputs are 2048-d.
I already removed Gaussian blur for image sizes equal to or less than 32 (CIFAR-10). They also seem to have removed one layer of the projection MLP. You could try commenting out the second layer in projection_MLP.

I'm working on it. Also, could you show me the way you use KNN to monitor the training?
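For concreteness, here is one possible shape of that projection-head edit in PyTorch. The class layout and names below are illustrative assumptions, not the repo's exact projection_MLP code:

```python
import torch
import torch.nn as nn

class ProjectionMLP(nn.Module):
    """Illustrative SimSiam-style projection head. The ImageNet setup uses
    3 layers; the CIFAR appendix describes a 2-layer head, which amounts to
    dropping the middle block below."""
    def __init__(self, in_dim=512, hidden_dim=2048, out_dim=2048):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
        )
        # For CIFAR-10, "commenting out the second layer" means removing this:
        # self.layer2 = nn.Sequential(
        #     nn.Linear(hidden_dim, hidden_dim),
        #     nn.BatchNorm1d(hidden_dim),
        #     nn.ReLU(inplace=True),
        # )
        self.layer3 = nn.Sequential(
            nn.Linear(hidden_dim, out_dim),
            nn.BatchNorm1d(out_dim),
        )

    def forward(self, x):
        return self.layer3(self.layer1(x))
```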

@matthiasware

Hi, I reached 72% acc within 100 epochs, starting from 40% after the first epoch. I was using sklearn.neighbors.KNeighborsClassifier, but this does not really scale to larger datasets ;)

@PatrickHua
Owner

72% in 100 epochs? That's almost the same as the paper. I only got 80% accuracy when training for 800 epochs (see configs/cifar10_experiment.sh). What's your training configuration?

@FiresWorker
Author

Thank you for replying to my question.

https://colab.research.google.com/github/facebookresearch/moco/blob/colab-notebook/colab/moco_cifar10_demo.ipynb

I use the kNN classification from this notebook.
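For readers without the notebook at hand, the weighted kNN monitor there works roughly like this NumPy sketch. This is a simplified CPU version under assumed parameter names; the notebook itself operates on GPU tensors and a memory bank of frozen training-set features:

```python
import numpy as np

def knn_predict(feature, bank, labels, num_classes, k=200, t=0.1):
    """Weighted kNN in the style of the MoCo CIFAR-10 demo:
    cosine similarity between L2-normalised features, top-k neighbours,
    votes weighted by exp(similarity / temperature)."""
    feature = feature / np.linalg.norm(feature, axis=1, keepdims=True)
    bank = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sim = feature @ bank.T                        # (n_test, n_bank) cosine sims
    idx = np.argsort(-sim, axis=1)[:, :k]         # indices of top-k neighbours
    w = np.exp(np.take_along_axis(sim, idx, axis=1) / t)  # temperature-softened weights
    votes = np.zeros((feature.shape[0], num_classes))
    for c in range(num_classes):
        votes[:, c] = (w * (labels[idx] == c)).sum(axis=1)
    return votes.argmax(axis=1)
```

The monitor is then just: embed the train and test sets with the frozen backbone, use the train embeddings as `bank`, and report the accuracy of `knn_predict` on the test embeddings each epoch.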

@matthiasware

matthiasware commented Dec 7, 2020

batch_size=512, lr=0.03, backbone=resnet18, optimizer=SGD with cosine decay as in the paper, and 2 layers in the projection head.

What bothers me more is that I cannot get an accuracy higher than ~30% (±5%) on the linear evaluation, whereas in the paper they achieve 91.8%. I am really unsure how they achieved it! Different training setups and multiple runs do not seem to improve this result.

Does anyone have similar issues?

Also, the std is extremely unstable, unlike the results in the paper!


@FiresWorker
Author

I use the Gaussian blur augmentation and changed the projection head.

Running 200 epochs, I get 72% accuracy for kNN classification (up from 30%) and 74% accuracy for linear evaluation (up from 67%). With more epochs the results may improve further.

But in kNN classification, the accuracy first dropped from 30% to 28%, and then increased to 72%.

Maybe you can try a large base learning rate, like 30.0, and no weight decay.

@matthiasware

Thanks, it works!

@codergan

codergan commented Dec 9, 2020

Hi bro, how is your result now? Did you achieve 91%?

@PatrickHua
Owner

I fixed a small problem in the linear evaluation and it eventually gives 85%.

@matthiasware

My results for the following run on CIFAR10 with the parameters from the paper:

  • ACC (train set): 85%
  • ACC (test set): 83%

However, the average std over all channels is unstable! In 2 out of 10 runs it completely collapsed, so I am unsure about their claim that they successfully prevent collapsing!

@Asamisora

Hi, I ran cifar_experiment.sh and got a training loss of about -0.882, but an evaluation accuracy of only ~40% (the evaluation epochs were set to 100). Could you share the evaluation parameters?

@ahmdtaha

Thanks PatrickHua for your implementation.
I followed this issue because I wasn't able to achieve the 90+% performance reported on CIFAR10.
I think I figured out the core reason: the paper does not use the standard ResNet-18 for the CIFAR10 experiment. The paper states that "The backbone is the CIFAR variant of ResNet-18". Accordingly, a plain ResNet with [2, 2, 2, 2] blocks is not enough.

I am currently using this ResNet CIFAR variant [1]. Note that conv1 has 3x3, not 7x7, kernels. I also commented out the maxpool layer. Now my kNN accuracy reaches 89%.
I am training with batch size = 512 on a single GPU, so I use lr = 0.06 because the base lr = 0.03 (scaled linearly with the batch size).

[1] https://github.com/huyvnphan/PyTorch_CIFAR10/blob/master/cifar10_models/resnet.py

@Xiatian-Zhu

Great spot, pal. Could you please clarify which maxpool layer you commented out, and why? Thanks a lot.

@ahmdtaha

ahmdtaha commented Jan 5, 2021

There is a single maxpool layer :)
https://github.com/huyvnphan/PyTorch_CIFAR10/blob/24ac04fe10874b6d36116a83c8d42778df9ad65a/cifar10_models/resnet.py#L130

I commented out the maxpool layer because He et al. [1] state in section 4.2 that "The subsampling is performed by convolutions with a stride of 2".

[1] Deep Residual Learning for Image Recognition

@Xiatian-Zhu

Xiatian-Zhu commented Jan 5, 2021 via email

@ahmdtaha

ahmdtaha commented Jan 5, 2021

I didn't find an official version. I wish FB would share one.
I also tried the ResNet variant you mentioned, but I don't remember why I didn't end up using it -- I made a lot of changes while resolving this issue :)

@Xiatian-Zhu

Xiatian-Zhu commented Jan 5, 2021 via email

@Xiatian-Zhu

Xiatian-Zhu commented Jan 7, 2021 via email

@ahmdtaha

ahmdtaha commented Jan 7, 2021

My implementation is mostly inspired by PatrickHua's, so I felt bad about uploading it to my GitHub. But I guess PatrickHua's repository is already well recognized, so my implementation would not make much difference. I will clean up my version and upload it tomorrow.

@Xiatian-Zhu

Xiatian-Zhu commented Jan 7, 2021 via email

@PatrickHua
Owner

Don't feel bad lol. It's open source so you can do anything with my code! I'm also quite curious about your implementation haha.

@yaoweilee

Hey guys, I achieved 90.6% kNN acc on the CIFAR10 validation set. Basically, I tried everything mentioned above, including the cosine-similarity kNN and changing the model to the ResNet-18 CIFAR variant. In my experiments, the cosine-similarity-based kNN usually performs better than the L2-based kNN, with about a 2% boost. As for the network structure, the exact ResNet-18 CIFAR10 variant mentioned in the paper is too simple, with its 64-d feature output, so I simply did what @ahmdtaha did and it worked well.

Some of my implementation details are as follows:
optimizer: SGD, lr: 0.06, weight decay: 5e-4, momentum: 0.9, batch size: 512, max epochs: 800, warmup epochs: 2
kNN parameters: knn_k: 25, knn_t: 0.1
In addition, I used the cosine learning-rate schedule implemented in SwAV.

Hope it can help, cheers
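The schedule mentioned there (cosine decay with a short linear warmup, as in SwAV) can be sketched as follows; the function name and final_lr=0 are assumptions for illustration:

```python
import math

def lr_at(step, total_steps, warmup_steps, base_lr, final_lr=0.0):
    """Cosine learning-rate schedule with linear warmup: ramp linearly to
    base_lr over warmup_steps, then cosine-decay down to final_lr."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear warmup
    # cosine decay over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return final_lr + 0.5 * (base_lr - final_lr) * (1.0 + math.cos(math.pi * progress))
```

In training code, the per-step value would typically be written into `optimizer.param_groups` before each iteration.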

@Xiatian-Zhu

Xiatian-Zhu commented Jan 8, 2021 via email

@ahmdtaha

ahmdtaha commented Jan 8, 2021

I uploaded my implementation here
It supports DistributedDataParallel. The pretrain_main.py should deliver 89.xx KNN accuracy out of the box.
I will keep an eye on the Github issues in case something is missing.

Thanks again PatrickHua for your implementation.

@Xiatian-Zhu

Xiatian-Zhu commented Jan 8, 2021 via email

@Xiatian-Zhu

Xiatian-Zhu commented Jan 8, 2021 via email

@Xiatian-Zhu

Xiatian-Zhu commented Jan 8, 2021 via email

@ahmdtaha

ahmdtaha commented Jan 8, 2021

SimSiam has two phases (see [1], Figure D.1):

  1. Pretraining a model in a self-supervised manner (no labels)
  2. Training a classifier in a supervised manner while freezing the model's weights (resnet weights)

In the first phase, we evaluate using a kNN classifier (a non-linear classifier). This is already uploaded and should give 89.xx accuracy (Figure D.1, left).
In the second phase, we evaluate the self-supervised representation (the frozen ResNet weights) using a linear classifier. This is not uploaded/cleaned up yet; it should give 91.xx accuracy (Figure D.1, right).

[1] Exploring Simple Siamese Representation Learning
P.S. I think it is better to move this discussion to my repository.
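A minimal sketch of that second phase in PyTorch. The helper name is hypothetical; the large learning rate and zero weight decay follow the suggestion earlier in this thread:

```python
import torch
import torch.nn as nn

def build_linear_eval(backbone, feat_dim=512, num_classes=10):
    """Linear-evaluation setup: freeze the pretrained backbone and train
    only a linear classifier on top of its frozen features."""
    for p in backbone.parameters():
        p.requires_grad = False
    backbone.eval()  # keep BatchNorm running statistics frozen too
    classifier = nn.Linear(feat_dim, num_classes)
    # Large base lr and no weight decay, as suggested above for linear eval
    optimizer = torch.optim.SGD(classifier.parameters(), lr=30.0,
                                momentum=0.9, weight_decay=0)
    return classifier, optimizer
```

The training loop then runs standard cross-entropy over `classifier(backbone(x))`, with gradients flowing only into the linear layer.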

@Xiatian-Zhu

Xiatian-Zhu commented Jan 8, 2021 via email

@PatrickHua
Owner

PatrickHua commented Jan 11, 2021

I changed the backbone to the CIFAR variant ahmdtaha proposed (#3 (comment)). The code now gives 90.8% linear evaluation accuracy out of the box.

@ahmdtaha

Great news @PatrickHua. Please keep this issue open; we still lag by 1%. Maybe someone can figure it out.

@taoyang1122

@PatrickHua @ahmdtaha Hi guys, I reproduced the results on ImageNet (67.8% Top-1 accuracy). You can take a look if interested. Link is here.

@Hzzone

Hzzone commented Apr 10, 2021

How many GPUs do you use for cifar10? I have tried 8 and 4 2080Ti GPUs and found that 4 GPUs performed more stably than 8. I cannot fit the training onto 1 GPU due to limited memory. This suggests that SimSiam and BYOL, which use no negative samples, benefit from BN with a large batch size: even though the authors say SimSiam has no need for a large batch size, one still has to fit a large batch on a single GPU. This is my guess, and it may not be right.

@ahmdtaha

@Hzzone what are your batch size and network architecture? I am not sure why you can't fit the training on a single GPU, especially for cifar10 (32x32).

BTW, when the FB guys talk about a large batch size, they mean 4096.

@Hzzone

Hzzone commented Apr 10, 2021

@ahmdtaha Sorry, my mistake. I used the same resnet18 as yours. I can train on a single 2080Ti GPU, though it runs much slower than with multiple GPUs (2 iter/s vs 7 iter/s). I will run more experiments to find out why SimSiam works less well in my case, with less training stability than BYOL.

@ahmdtaha

@Hzzone That makes more sense. BTW, keep an eye on the learning rate: it depends only on the global batch size, not on the number of GPUs. In my code, I use lr=0.06 with a batch size of 512, regardless of how many GPUs are used.

@Hzzone

Hzzone commented Apr 10, 2021

Thanks for your advice. I set my lr as suggested by the paper, i.e., lr = base_lr (0.03) × batch_size / 256.
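That scaling rule in code, for clarity (trivial, but easy to get wrong when multiple GPUs split the batch):

```python
def scaled_lr(base_lr=0.03, batch_size=512):
    """Linear lr scaling rule from the paper: lr = base_lr * batch_size / 256.
    Note it depends on the global batch size only, not the number of GPUs."""
    return base_lr * batch_size / 256
```

With the defaults above this gives the lr = 0.06 used earlier in the thread.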
