"Fine-tune InceptionV3/ResNet50 on a new set of classes" doesn't work, while VGG16 works (suspect BN) #9214
VGG16 output (works as expected):
|
resnet50 output (does not work as expected):
|
The problem is not happening without
Note that this is repo master, i.e. with 24246ea already merged, see #8616 (comment) |
Just checked with TF 1.5 and Keras master, and the behavior is unchanged. Also identical on CPU (which I didn't check before). @fchollet, it appears to be a serious problem, because AFAIK I follow (simple) docs to the letter. |
I am having the exact same issue. Inception does not work but VGG does fine. InceptionV3 picks the same class every time no matter what the test set is |
same issue when I try ResNet50 in keras |
I have tried different optimizers (Adam, SGD, SGD with momentum, and so on) when training ResNet50, both fine-tuning and freezing all the layers except the FC layer. My training loss decreases, while the validation accuracy increases at first and then, just as you say, stalls at 50%~60%. I have tried randomly sampling the data and used some data augmentation tricks, but those didn't work. |
I am having the same problem with Inception-v3, while VGG19 works. I can also confirm that when I remove
validation accuracy starts improving. Maybe somebody has a suggestion? |
I am having the same problem with ResNet50. I am doing transfer learning and the same dataset/code works for InceptionV3 and DenseNet121, but ResNet seems to always predict one class |
For all of you who are affected by this, please have a look at PR #9965. This problem is caused by the way that the Batch Normalization layer is implemented in Keras. To understand why this happens we need to understand how BN works. When the network is in training mode, the mini-batch statistics of BN are used for training the network; when the network is in inference mode, we use the moving mean/var learned during the training. That's all good. The problem is how the layer behaves when it is frozen. Its side-effects are more profound when we use fine-tuning and Transfer Learning. You see, when frozen and while in training mode, the BN continues to use the mini-batch statistics for scaling the training data. This causes the unfrozen/trainable layers to adapt to the scale of the data. Unfortunately, during inference mode (predictions) the network will switch to the moving mean/var. If the moving mean/var is different from the mini-batch statistics, the data are scaled differently, causing massive discrepancies in accuracy. If you want more info, have a look at the PR. |
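To make the scaling mismatch described above concrete, here is a minimal framework-free sketch. It is not Keras code: the activation values and the stored moving statistics are hypothetical, and only the normalization step of BN is modelled (the learned gamma/beta are left out).

```python
import statistics

EPS = 1e-3  # small stability constant; matches the default epsilon of Keras' BatchNormalization


def normalize(batch, mean, var):
    """Apply only the BN scaling step (no learned gamma/beta)."""
    return [(x - mean) / (var + EPS) ** 0.5 for x in batch]


# Hypothetical activations from the NEW dataset reaching a frozen BN layer.
new_batch = [4.0, 5.0, 6.0, 5.5, 4.5]

# Hypothetical moving statistics stored in the layer from the ORIGINAL training data.
moving_mean, moving_var = 0.0, 1.0

# Training mode: the mini-batch statistics are used.
batch_mean = statistics.fmean(new_batch)
batch_var = statistics.pvariance(new_batch)
train_out = normalize(new_batch, batch_mean, batch_var)

# Inference mode: the stored moving statistics are used instead.
infer_out = normalize(new_batch, moving_mean, moving_var)

print("mean of outputs, training mode :", round(statistics.fmean(train_out), 3))
print("mean of outputs, inference mode:", round(statistics.fmean(infer_out), 3))
```

In this toy setup the layers after the BN would see roughly zero-centred inputs during training and inputs shifted far from zero at prediction time, which is the kind of discrepancy the comment describes.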
@jksmither Sorry for the late response. I just synced my branch with the latest master and provided a patched fork of 2.1.6. Honestly I would like to see this fixed on master, as maintaining a separate fork with the patch is not a viable solution in the long term. I'll probably keep syncing it for as long as we use Keras at work but I can't make any promises. |
Did anyone find a concrete solution to this problem? I am also affected by this problem and I am working on Keras 2.1.6 and TensorFlow 1.7 to train and test my data using InceptionV3 and Resnet50. I am very new to deep learning and any help will be appreciated. |
This is a temporary fix:
So, this command reinstalls Keras with the fixes provided by @datumbox. |
@izharikov @datumbox |
Thanks for the input. I would advise using the fork 2.1.6 instead of the Unfortunately there are no plans for a permanent fix at the moment. My PR #9965 was rejected (you can read the rationale on the link) because it modifies the semantics of |
Wow. Was spinning my wheels for a while with ResNet50 training trying to fine tune until I found these threads. Same problems. So batchnorm in Keras = no fine tuning? Either paste the FC layer on top of trained weights (imagenet) or train from scratch. I'm working with smaller N's so training from scratch with augmentation is not a de facto solution. A shame too because Transfer Learning is looking more attractive lately - see: "Do Better ImageNet Models Transfer Better?" Apart from @datumbox's patch, or moving over to another framework, any other workarounds?
|
Reset the batch norm moving averages and allow them to update to the new dataset - you'll see it transfer. I'm writing a longer update on this matter and will open issues (easy PRs) so people can help contribute to fixing the documentation and the like. |
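As a rough illustration of why letting the moving averages update lets them follow the new dataset, the sketch below applies the exponential-moving-average rule that BN uses for its stored mean. The momentum value is the Keras default (0.99); the starting value and the new-dataset mean are hypothetical, and every mini-batch is assumed to have exactly that mean for simplicity.

```python
MOMENTUM = 0.99  # default momentum of Keras' BatchNormalization


def update_moving(moving, batch_stat, momentum=MOMENTUM):
    """One exponential-moving-average update of a stored BN statistic."""
    return momentum * moving + (1.0 - momentum) * batch_stat


moving_mean = 0.0    # hypothetical value right after a reset
new_data_mean = 5.0  # hypothetical mean of the activations on the new dataset

history = []
for step in range(1, 1001):
    moving_mean = update_moving(moving_mean, new_data_mean)
    if step in (10, 100, 1000):
        history.append((step, moving_mean))

for step, value in history:
    print(f"after {step:4d} updates: moving_mean = {value:.3f}")
```

With a momentum this close to 1, the stored mean needs on the order of hundreds of batches to approach the new statistics, so a short run may still leave it noticeably off.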
Unfreezing the BNs while keeping the subsequent Convolutions frozen can have negative effects on accuracy. I describe this in more detail here. |
Just a simple question: why the value 1024 as units in the last dense layer? |
For the purpose of this Issue, I was following the docs literally for stronger effect to make my point. |
I fine tune a keras pretrained model on my own dataset. I freeze some of the layers in the early stage. I got decent validation accuracy on VGG, but bad validation accuracy on ResNet50. VGG ResNet50 I think this may be caused by BatchNormalization. I once used keras-retinanet https://github.com/fizyr/keras-retinanet to train on my own dataset, which worked very well. So I want to find out the reason. RetinaNet uses ResNet as backbone, and BatchNormalization layers are frozen, see https://github.com/fizyr/keras-retinanet/blob/master/keras_retinanet/models/resnet.py#L98 The ResNet in that project is borrowed from another repo keras-resnet https://github.com/broadinstitute/keras-resnet. In this ResNet implementation, the authors customize the BatchNormalization layer, see
It seems that this operation is what @fchollet recommends in @datumbox's PR. I think it would be better to use keras-resnet https://github.com/broadinstitute/keras-resnet for fine tuning. I have not tried yet. Yesterday I tried fine tuning InceptionV3 on the same dataset, with half of the layers set untrainable. But it is somewhat strange that the validation accuracy is quite good. InceptionV3 The code I use is something like this https://gist.github.com/XupingZHENG/1e20d54a70c8e04912c0b37fa7e7b931 |
I have the same problem with Resnet50. This seems to work. -Set the learning phase to 1 After that I get the correct accuracy.
|
@cesarorosco This means that your network always runs in training mode. Even when you make predictions you use the mini-batch statistics. This is not great, as your predictions will change depending on what images you pass in the batch. |
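The batch dependence mentioned above can be shown with a small framework-free sketch. The numbers are hypothetical and only the BN normalization step is modelled; the point is that the same input is normalized differently depending on its batch companions when mini-batch statistics are used at prediction time.

```python
import statistics

EPS = 1e-3  # small stability constant (hypothetical choice for this sketch)


def bn_training_mode(batch):
    """Normalize a batch with its own mean/variance, as BN does in training mode."""
    mean = statistics.fmean(batch)
    var = statistics.pvariance(batch)
    return [(x - mean) / (var + EPS) ** 0.5 for x in batch]


sample = 2.0
batch_a = [sample, 1.0, 3.0]    # the sample next to similar values
batch_b = [sample, 10.0, 12.0]  # the same sample next to much larger values

out_a = bn_training_mode(batch_a)[0]
out_b = bn_training_mode(batch_b)[0]

print("sample normalized within batch_a:", round(out_a, 3))
print("sample normalized within batch_b:", round(out_b, 3))
```

The first element is identical in both batches, yet its normalized value differs, so any layer downstream of the BN would receive a different input for the same image.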
If I proceed as @cesarorosco mentioned, things get even worse and I stick around a ~0.5 acc on the training and validation sets... It is a bit disappointing to have this kind of problem, especially when this kind of stuff is presented as so simple when using Keras... It is such a great tool, anyway. |
All, Keras version: 2.2.4 from keras.applications.resnet50 import ResNet50 case 1: model converges: case 2: model never ever converges: any ideas ? Edit: I forgot the activation layer softmax in case 2... |
I think I might be affected by a similar issue, except that I am using VGG16 instead. The training goes really well, to be honest a bit unexpectedly well, and it goes beyond 90%. I checked the model accuracy on a testing set and it gives good results and the same thing happens when I check with a confusion matrix. The problem is that, when I try to use the model to predict_classes() on my testing set (the same one that gave good results on accuracy and confusion matrix), the predictions are awfully bad. One class seems to be preferred over the others and I get 0 accuracy in 2 of the 5 classes. I was asked to check this post and I am wondering if anyone could help. Thanks a lot !! |
I am having the same issue as well. Inception v3 is giving low validation accuracy but high training accuracy. What would be the suggestion to fix this? Epoch 1/30 |
Same here too. I use ResNet50 for fine-tuning. I want to predict the results for two classes. In one class the validation and training accuracy is 45% and on the other it's 0%, but I unfreeze the final set of conv layers. |
I tried to run a resnet50 model for classifying colors of clothes. I used imagenet weights and added a global average pooling layer, 2 dense layers with dropout, and a final output layer with sigmoid/softmax. I froze all the layers except the newly added ones, then started training. Training goes well for some time and I managed to get some accuracies, but when I tested my model, the predictions are all wrong. Can anybody help me solve this? |
I tried transfer learning on resnet50 in keras for a two-class classification problem. I only fine-tune the top fully-connected layer, while all other layers are frozen. I encounter the same problem: training accuracy is increasing as expected, but validation accuracy is only between 50-60%. Has anyone found a good solution to overcome this problem in keras, please? |
@xhm1014 Pre-compute the vectors using resnet50, then train model with only the dense layers on the vectors. After training, join the dense layers to the resnet50 layers if you want to save the whole network. |
Could you provide an example of how to do that? I know that this w/a was suggested by the author of keras, but I couldn't find any way to do it. It is really a shame that the official docs don't mention this. Any model other than VGG (without BN) is useless if you want to freeze any of the layers. |
@BraveDistribution Yea it is a bit strange. This is how I do it, (taken from my repo here: https://github.com/microfossil/particle-classification/blob/master/miso/training/model_trainer.py ) Functions to make the head and tail:
Note the two lambda layers to make the head. The pre-trained ResNet50 uses some pre-processing that takes away the channel averages (of ImageNet). In my datasets I use images scaled to the range [0,1] by dividing by 255. So these two layers convert from [0,1] range to correct pre-processing used by the pre-trained network. You can remove them if you are using Now create the head and make the vectors:
Make the tail and train with these vectors:
Then if you want to have a network that takes image as input, join them:
and then you can do
Pre-calculating the vectors like this makes training very fast. You can even do it on a CPU. By the way, my repo is designed for a project where we are enabling non-ML people to train networks. If you are interested see the docs read me here: https://github.com/microfossil/particle-classification-examples and especially the google colab tutorial (and use resnet50_tl as the cnn type) |
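The three steps described above (pre-compute the vectors with the frozen head, train the tail on them, then join the two) can be sketched without any framework. This is not the code from the linked repo: `frozen_backbone` is a hypothetical stand-in for the pre-trained ResNet50, the dataset is a toy one, and the trainable part is a plain logistic regression.

```python
import math
import random

random.seed(0)


def frozen_backbone(x):
    """Hypothetical stand-in for a frozen pre-trained network: a fixed mapping."""
    return [x[0] + x[1], x[0] - x[1]]


# Toy two-class dataset (hypothetical): label 1 when the two inputs sum to > 0.
inputs = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
labels = [1.0 if a + b > 0 else 0.0 for a, b in inputs]

# Step 1: run the frozen part ONCE and cache the resulting vectors.
features = [frozen_backbone(x) for x in inputs]

# Step 2: train only the small trainable part on the cached vectors.
w, b, lr = [0.0, 0.0], 0.0, 0.5


def tail(f):
    """Logistic-regression classifier on top of the cached vectors."""
    z = w[0] * f[0] + w[1] * f[1] + b
    return 1.0 / (1.0 + math.exp(-z))


n = len(features)
for _ in range(200):  # full-batch gradient descent on the log loss
    gw, gb = [0.0, 0.0], 0.0
    for f, y in zip(features, labels):
        err = tail(f) - y
        gw[0] += err * f[0]
        gw[1] += err * f[1]
        gb += err
    w = [w[0] - lr * gw[0] / n, w[1] - lr * gw[1] / n]
    b -= lr * gb / n


# Step 3: join both parts into one model that takes the raw input.
def full_model(x):
    return tail(frozen_backbone(x))


accuracy = sum(
    (full_model(x) > 0.5) == (y == 1.0) for x, y in zip(inputs, labels)
) / n
print("training accuracy of the joined model:", accuracy)
```

Because the frozen part is only ever run in inference mode when the vectors are computed, the trainable part sees the same feature scale during training and prediction, which is why this workaround sidesteps the BN mismatch discussed in this thread.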
I also noticed this while training the EfficientNet model, which includes BatchNormalization. I observed that freezing the BN layers leads to bad accuracy (wrong predictions) in the validation phase, while in the training phase everything looks fine. When un-freezing the BN, the test accuracy recovers. Here is an example notebook and here is some additional information: |
@digital-thinking I think it would be interesting to see the results if a very small batch_size(1/2) is used; that is the specific case in which for using pre-trained backbones(especially for segmentation), freezing the BN layer is recommended. |
@digital-thinking Thanks, can you point out which are the precise code parts you refer to? Do I understand you correctly:
Is this what you are saying in your post? |
Hi @Nestak2 yes this is what I had to do to get reasonable validation metrics. If you train only the top-layer and don't make Batch Normalization layers trainable, you won't get correct results. If the whole model is trainable, there is no problem. I guess it's because the BatchNormalization layer is completely fixed and therefore it does not normalize the batch anymore. |
@digital-thinking Thanks for clarifying! Unfortunately this strategy didn't work out for me - the validation loss and accuracy were bad in both the "warm-up" and the proper training phases. I post the heart part of my code and the training metrics. Can you spot what I am doing wrong? Tnx
This gives me this training metrics output, which shows a decent learning performance, but bad validation:
|
Did anyone find a workaround on the code side rather than having to patch the backend? I also tried @digital-thinking's answer but that somehow doesn't work for me, and the loss comes out to be -inf/nan for every epoch while the accuracy is increasing very slowly. |
The following code works as expected with vgg16 (no BN) but not with resnet50 or inception_v3 (BN). My hypothesis is that it's due to BN. The code follows "Fine-tune InceptionV3 on a new set of classes" from https://keras.io/applications/#usage-examples-for-image-classification-models