Can't achieve the accuracy in the paper with cifar10 #3
Regarding appendix section D: I'm working on it. Also, could you show me how you use kNN to monitor the training?
Hi, I reached 72% accuracy within 100 epochs, starting from 40% after the first epoch. I was using
72% in 100 epochs? That's almost the same as the paper. I only got 80% accuracy when training for 800 epochs (see configs/cifar10_experiment.sh). What's your training configuration?
Thank you for replying to my question. I use the kNN classification in this code.
batch_size=512, lr=0.03, backbone=resnet18, optimizer=SGD with cosine decay as in the paper, and 2 layers in the projection head. What bothers me more is that I cannot get an accuracy above ~30% (±5%) on the linear evaluation, whereas the paper reports 91.8%. I am really unsure how they achieved it! Different training setups and multiple runs do not seem to improve this result. Does anyone have similar issues? Also, the std is extremely unstable, unlike the results in the paper!
I use the Gaussian blur augmentation and change the projection head. Running 200 epochs, I get 72% accuracy on kNN classification (up from 30%) and 74% on linear evaluation (up from 67%). With more epochs the results may improve further. Note that the kNN accuracy first dropped from 30% to 28% before climbing to 72%. Maybe you can try a large base learning rate, like 30.0, and don't use weight decay.
Thanks, it works!
Hi, how is your result now? Did you achieve 91%?
I fixed a small problem in the linear evaluation and it eventually gives 85%.
My results for the following run on CIFAR10 with the parameters from the paper:
However, the average std over all channels is unstable! In 2 out of 10 runs it completely collapsed! So I am unsure about their claim that they successfully prevent collapse!
Hi, I run
Thanks PatrickHua for your implementation. I followed this issue because I wasn't able to achieve the 90+ performance reported on CIFAR10, and I think I figured out the core reason: the paper does not use the standard ResNet-18 for the CIFAR10 experiment. It states that "The backbone is the CIFAR variant of ResNet-18", so a vanilla ResNet with [2, 2, 2, 2] blocks is not enough. I am currently using this resnet-cifar variant[1]. Note the conv1 has 3x3, not 7x7, kernels. I also commented out the maxpool layer. Now my KNN accuracy reaches 89%. I am training with batch size = 512 on a single GPU, so I use lr=0.06 because the base lr = 0.03. [1] https://github.com/huyvnphan/PyTorch_CIFAR10/blob/master/cifar10_models/resnet.py
Great spot. Could you please clarify which maxpool layer you commented out, and why? Thanks a lot.
There is a single maxpool layer :) https://github.com/huyvnphan/PyTorch_CIFAR10/blob/24ac04fe10874b6d36116a83c8d42778df9ad65a/cifar10_models/resnet.py#L130 I commented it out because He et al.[1] state in section 4.2 that "The subsampling is performed by convolutions with a stride of 2". [1] Deep Residual Learning for Image Recognition
Thanks for the quick response. Good to know the reason. Is there an official implementation for the CIFAR variant of ResNet? I found another one which also looks strong: https://github.com/kuangliu/pytorch-cifar/blob/master/models/resnet.py
I didn't find an official version. I wish FB would share one. I also tried the ResNet variant you mentioned, but I don't remember why I didn't use it eventually -- I made a lot of changes trying to resolve this issue :)
I see. Thanks, I will try your version of the CIFAR ResNet-18 and also one with the maxpool layer, and see what differences show up. If I find any issues or results, I will update this thread :-)
Hi Ahmed, I can only get 38.5% vs your 89% on CIFAR10 using the ResNet code you mentioned (with maxpool applied; the run without maxpool is still going, though I don't think it will change much). Would you mind sharing your code in case I have some other issue? Thanks a lot.
My implementation is mostly inspired by PatrickHua's, so I felt bad about uploading it to my GitHub. But I guess PatrickHua's repository is already well recognized, so my implementation would not make much difference. I will clean up my version and upload it tomorrow.
Appreciated!
Don't feel bad lol. It's open source, so you can do anything with my code! I'm also quite curious about your implementation haha.
Hey guys, I achieved 90.6% KNN accuracy on the CIFAR10 validation set. Basically, I tried everything mentioned above, including the cosine-similarity KNN and changing the model to the ResNet-18 CIFAR variant. According to my experiments, the cosine-similarity-based KNN usually performs about 2% better than the L2-based KNN. As for the network structure, the exact ResNet-18 CIFAR10 variant mentioned in the paper is too simple with its 64-d feature output, so I simply did what @ahmdtaha did and it worked well. Some of my implementation details: optimizer: SGD, lr: 0.06, weight decay: 5e-4, momentum: 0.9, batch size: 512, max epochs: 800, warmup epochs: 2. KNN parameters: knn_k: 25, knn_t: 0.1. In addition, I used the cosine learning rate schedule implemented in SwAV (https://github.com/facebookresearch/swav/blob/master/main_swav.py#L182). Hope it can help, cheers
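The cosine-similarity weighted kNN monitor with `knn_k=25` and `knn_t=0.1` can be sketched as follows. This is a minimal version of the InstDisc-style kNN commonly used as a pretraining monitor; the function name and exact shapes are illustrative, not taken from any repository in this thread.

```python
import torch

def knn_predict(features, feature_bank, bank_labels, num_classes,
                knn_k=25, knn_t=0.1):
    """Weighted kNN over a feature bank.

    `features` is [B, D] and `feature_bank` is [D, N]; both are assumed
    L2-normalized, so the matrix product below is cosine similarity.
    """
    sim = torch.mm(features, feature_bank)                    # [B, N]
    sim_weight, sim_idx = sim.topk(k=knn_k, dim=-1)           # [B, k]
    sim_labels = torch.gather(
        bank_labels.expand(features.size(0), -1), -1, sim_idx)
    sim_weight = (sim_weight / knn_t).exp()                   # temperature
    # One-hot encode neighbour labels, then weight-sum per class.
    one_hot = torch.zeros(features.size(0) * knn_k, num_classes,
                          device=sim_labels.device)
    one_hot.scatter_(-1, sim_labels.view(-1, 1), 1.0)
    scores = torch.sum(
        one_hot.view(features.size(0), -1, num_classes)
        * sim_weight.unsqueeze(-1), dim=1)                    # [B, C]
    return scores.argmax(dim=-1)
```

In practice the feature bank is built by running the frozen backbone over the train set each time the monitor is evaluated.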
Great effort, Yaowei. You guys have done an excellent job reproducing SimSiam. I am still struggling with <40% accuracy :-( All the training parameters I am using are the same as yours, except the warmup epochs, which I set to 10. Clearly you have found a better parameter setting. I am now trying to first get a reasonable result in the authors' setup :-)
I uploaded my implementation here: https://github.com/ahmdtaha/simsiam. It supports DistributedDataParallel, and pretrain_main.py should deliver 89.xx KNN accuracy out of the box. I will keep an eye on the GitHub issues in case something is missing. Thanks again PatrickHua for your implementation.
Thanks Ahmed for sharing this, built on the basis of PatrickHua's code.
I have managed to run the code Ahmed kindly shared. Excited to see the results on CIFAR10 :-) Happy weekend to everyone!
Hi Ahmed, sorry for the many questions. I assume you have not released classification_main.py yet (you said in the README it is not cleaned up yet), right? I see most of the code is there, however.
SimSiam has two phases (see [1], Figure D.1.):
1. Pretraining a model in a self-supervised manner (no labels).
2. Training a classifier in a supervised manner while freezing the model's (ResNet) weights.
In the first phase, we evaluate using a KNN classifier (a non-linear classifier). This is already uploaded and should give 89.xx accuracy (Figure D.1., left). In the second phase, we evaluate the self-supervision task (the frozen ResNet weights) using a linear classifier. This is not uploaded/cleaned yet; it should give 91.xx accuracy (Figure D.1., right). [1] Exploring Simple Siamese Representation Learning. P.S. I think it is better to move this discussion to my repository.
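The second phase described above (frozen backbone plus a trainable linear classifier) can be sketched like this. The helper name and dimensions are illustrative, not taken from either repository.

```python
import torch.nn as nn

def build_linear_eval(backbone, feat_dim, num_classes=10):
    """Freeze a pretrained backbone and attach a trainable linear head.

    `feat_dim` is the backbone's output feature dimension (e.g. 512 for
    a ResNet-18 trunk); only the returned classifier is trained.
    """
    for p in backbone.parameters():
        p.requires_grad = False  # phase 2 trains only the classifier
    backbone.eval()              # keep BN statistics fixed (assumption)
    classifier = nn.Linear(feat_dim, num_classes)
    return backbone, classifier
```

The training loop then optimizes only `classifier.parameters()` with a standard cross-entropy loss on the frozen features.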
Great, all clear. Yes, I meant the second phase. Good to know that the first phase is enough to check the feature quality with kNN. Brilliant. Indeed, I should have asked this in your GitHub repo; sorry for misusing Patrick's space. I'll stop here.
I changed the backbone to the CIFAR variant @ahmdtaha proposed (#3 (comment)). The code now gives 90.8% linear evaluation accuracy out of the box.
Great news @PatrickHua. Please keep this issue open; we still lag by 1%, and maybe someone can figure it out.
@PatrickHua @ahmdtaha Hi guys, I reproduced the results on ImageNet (67.8% top-1 accuracy). You can take a look if interested; the link is here.
How many GPUs do you use for CIFAR10? I have tried 8 and 4 2080Ti GPUs, and found that 4 GPUs trained more stably than 8. I cannot fit the training on 1 GPU due to limited memory. This suggests that SimSiam and BYOL, which use no negative samples, benefit from batch norm over a large batch, even though the paper says SimSiam has no need for a large batch size; in practice the large batch has to fit on a single GPU for the BN statistics to cover it. This is my guess and may not be right.
@Hzzone what are your batch size and network architecture? I am not sure why you can't fit the training on a single GPU, especially for CIFAR10 (32x32). BTW, when the FB folks talk about a large batch size, they mean 4096.
@ahmdtaha I am sorry, my mistake. I used the same ResNet-18 as yours. I can train on a single 2080Ti GPU, though it runs much slower than multi-GPU (2 it/s vs 7 it/s). I will keep experimenting to find out why SimSiam works less well in my case, with less training stability than BYOL.
Thanks for your advice. I set my lr as suggested by the paper, i.e., lr = base_lr (0.03) * bs / 256.
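The linear scaling rule quoted above, combined with the warmup plus cosine decay schedule mentioned earlier in the thread, can be sketched as a small helper. The scaling rule is from the thread; the warmup shape is an assumption matching common SwAV-style schedules, and the function name is illustrative.

```python
import math

def simsiam_lr(step, total_steps, warmup_steps=0,
               base_lr=0.03, batch_size=512):
    """Linearly-scaled peak LR with optional warmup and cosine decay."""
    peak = base_lr * batch_size / 256  # e.g. 0.06 for batch size 512
    if step < warmup_steps:
        # Linear warmup from ~0 up to the peak LR.
        return peak * (step + 1) / warmup_steps
    # Cosine decay from peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Called once per iteration (or per epoch), the returned value is written into the optimizer's parameter groups before each update.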
I use the kNN classification as a monitor during training. As shown in Figure D.1 in the paper, the accuracy is about 60% at the beginning and finally reaches 90%. I can't reproduce this and only reach a very low accuracy with the parameters mentioned in the paper.
If anyone has achieved the results in the paper, thank you very much for sharing some experimental details.