"Fine-tune InceptionV3/ResNet50 on a new set of classes" doesn't work, while VGG16 works (suspect BN) #9214

ozabluda · 2018-01-28T04:00:37Z

The following code works as expected with vgg16 (no BN) but not with resnet50 or inception_v3 (BN). My hypothesis is that it's due to BN. The code follows "Fine-tune InceptionV3 on a new set of classes" from https://keras.io/applications/#usage-examples-for-image-classification-models

from keras.preprocessing import image
from keras.applications import resnet50, inception_v3, vgg16
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D, Input
from keras.optimizers import Adam
import numpy as np

batch_size = 50
num_classes = 2

#base_model = resnet50.ResNet50
#base_model = inception_v3.InceptionV3
base_model = vgg16.VGG16

base_model = base_model(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False

model.compile(loss='sparse_categorical_crossentropy',
              optimizer=Adam(lr=0.0001),
              metrics=['acc'])

x_train = np.random.normal(loc=127, scale=127, size=(50, 224,224,3))
y_train = np.array([0,1]*25)
x_train = resnet50.preprocess_input(x_train)

print(model.evaluate(x_train, y_train, batch_size=batch_size, verbose=0))
model.fit(x_train, y_train,
          epochs=100,
          batch_size=batch_size,
          shuffle=False,
          validation_data=(x_train, y_train))

The text was updated successfully, but these errors were encountered:

ozabluda · 2018-01-28T04:10:42Z

VGG16 output (works as expected):

both validation loss and acc before training (from evaluate()) are exactly equal to those in the first training iteration
validation loss/acc correspond to training loss/acc
training loss->0, acc->1, validation loss->0, acc->1.

[1.1005926132202148, 0.5]
Train on 50 samples, validate on 50 samples
Epoch 1/100
50/50 [==============================] - 1s 11ms/step - loss: 1.1006 - acc: 0.5000 - val_loss: 0.7771 - val_acc: 0.5800
Epoch 2/100
50/50 [==============================] - 0s 8ms/step - loss: 0.7771 - acc: 0.5800 - val_loss: 0.8947 - val_acc: 0.4600
Epoch 3/100
50/50 [==============================] - 0s 9ms/step - loss: 0.8947 - acc: 0.4600 - val_loss: 0.9511 - val_acc: 0.4800
Epoch 4/100
50/50 [==============================] - 0s 9ms/step - loss: 0.9511 - acc: 0.4800 - val_loss: 0.8385 - val_acc: 0.4600
Epoch 5/100
50/50 [==============================] - 0s 9ms/step - loss: 0.8385 - acc: 0.4600 - val_loss: 0.7341 - val_acc: 0.5400
Epoch 6/100
50/50 [==============================] - 0s 9ms/step - loss: 0.7341 - acc: 0.5400 - val_loss: 0.7455 - val_acc: 0.5600
Epoch 7/100
50/50 [==============================] - 0s 8ms/step - loss: 0.7455 - acc: 0.5600 - val_loss: 0.7991 - val_acc: 0.6000
Epoch 8/100
50/50 [==============================] - 0s 9ms/step - loss: 0.7991 - acc: 0.6000 - val_loss: 0.7902 - val_acc: 0.6000
Epoch 9/100
50/50 [==============================] - 0s 9ms/step - loss: 0.7902 - acc: 0.6000 - val_loss: 0.7258 - val_acc: 0.5800
Epoch 10/100
50/50 [==============================] - 0s 9ms/step - loss: 0.7258 - acc: 0.5800 - val_loss: 0.6727 - val_acc: 0.6400
[...]
Epoch 98/100
50/50 [==============================] - 0s 9ms/step - loss: 0.2272 - acc: 1.0000 - val_loss: 0.2246 - val_acc: 1.0000
Epoch 99/100
50/50 [==============================] - 0s 9ms/step - loss: 0.2246 - acc: 1.0000 - val_loss: 0.2221 - val_acc: 1.0000
Epoch 100/100
50/50 [==============================] - 0s 9ms/step - loss: 0.2221 - acc: 1.0000 - val_loss: 0.2196 - val_acc: 1.0000

ozabluda · 2018-01-28T04:17:34Z

resnet50 output (does not work as expected):

validation loss before training (from evaluate()) is nowhere near to that in the first training iteration (BN?)
validation loss/acc does not correspond to training loss/acc at all (BN?)
training loss->0, acc->1 very quickly (acc=1.0 starting from epoch 5), validation loss stays huge forever, acc=0.5 (random) forever.

[2.3405368328094482, 0.5]
Train on 50 samples, validate on 50 samples
Epoch 1/100
50/50 [==============================] - 1s 21ms/step - loss: 0.6806 - acc: 0.5400 - val_loss: 1.6767 - val_acc: 0.5000
Epoch 2/100
50/50 [==============================] - 0s 8ms/step - loss: 0.6061 - acc: 0.6400 - val_loss: 1.8632 - val_acc: 0.5000
Epoch 3/100
50/50 [==============================] - 0s 9ms/step - loss: 0.5088 - acc: 0.9000 - val_loss: 2.0533 - val_acc: 0.5000
Epoch 4/100
50/50 [==============================] - 0s 9ms/step - loss: 0.4437 - acc: 0.9200 - val_loss: 1.9083 - val_acc: 0.5000
Epoch 5/100
50/50 [==============================] - 0s 9ms/step - loss: 0.3799 - acc: 1.0000 - val_loss: 1.5847 - val_acc: 0.5000
Epoch 6/100
50/50 [==============================] - 0s 9ms/step - loss: 0.3222 - acc: 1.0000 - val_loss: 1.3209 - val_acc: 0.5000
Epoch 7/100
50/50 [==============================] - 0s 8ms/step - loss: 0.2816 - acc: 1.0000 - val_loss: 1.2207 - val_acc: 0.5000
Epoch 8/100
50/50 [==============================] - 0s 8ms/step - loss: 0.2439 - acc: 1.0000 - val_loss: 1.2348 - val_acc: 0.5000
Epoch 9/100
50/50 [==============================] - 0s 9ms/step - loss: 0.2089 - acc: 1.0000 - val_loss: 1.2679 - val_acc: 0.5000
Epoch 10/100
50/50 [==============================] - 0s 9ms/step - loss: 0.1824 - acc: 1.0000 - val_loss: 1.2359 - val_acc: 0.5000
[...]
Epoch 98/100
50/50 [==============================] - 0s 8ms/step - loss: 0.0032 - acc: 1.0000 - val_loss: 2.2686 - val_acc: 0.5000
Epoch 99/100
50/50 [==============================] - 0s 8ms/step - loss: 0.0032 - acc: 1.0000 - val_loss: 2.2791 - val_acc: 0.5000
Epoch 100/100
50/50 [==============================] - 0s 8ms/step - loss: 0.0031 - acc: 1.0000 - val_loss: 2.2894 - val_acc: 0.5000

ozabluda · 2018-01-28T21:00:59Z

The problem is not happening without

for layer in base_model.layers:
    layer.trainable = False

Note that this is repo master i.e with 24246ea already merged, see #8616 (comment)

ozabluda · 2018-02-13T01:30:58Z

Just checked with TF 1.5 and Keras master, and the behavior is unchanged. Also identical on CPU (which I didn't check before).

@fchollet, it appears to be a serious problem, because AFAIK I follow (simple) docs to the letter.

GougeC · 2018-03-14T00:26:38Z

I am having the exact same issue. Inception does not work but VGG does fine. InceptionV3 picks the same class every time no matter what the test set is

jax-max · 2018-04-02T08:48:32Z

same issue when I try ResNet50 in keras

Zamirquito · 2018-04-06T11:58:52Z

I have tried using different way to optimize(Adam, SGD, SGD with momentum and so on) when I trained the ResNet50, finetune and just freeze all the layers except fc layer, My training loss is decreasing, but val accu is increasing and then just as you say, stopping at 50% ~60%......

I have tried to ramdom sampling the data and used some data augmentation tricks, but those didn't work.

NProdanova · 2018-04-06T12:33:16Z

I am having the same problem with Inception-v3, while VGG19 works. I can also confirm that when I remove

layer.trainable = False

validation accuracy starts imroving.
I posted a question for a workaround on stackoverflow:
https://stackoverflow.com/questions/49689122/keras-inception-v3-fine-tuning-workaraound

Maybe somebody has a suggestion

ciprianfocsaneanu · 2018-04-18T20:47:04Z

I am having the same problem with ResNet50. I am doing transfer learning and the same dataset/code works for InceptionV3 and DenseNet121, but ResNet seems to always predict one class

datumbox · 2018-04-23T21:02:20Z

For all of you who are affected by this, please have a look at PR #9965. This probllem is caused by the way that the Batch Normalization layer is implemented in Keras.

To understand why this happens we need to understand how the BN works. When the network is in training mode, the mini-batch statistics of BN are used for training the network; when the network is in inference mode, we use the moving mean/var learned during the training. That's all good. The problem is how the layer behaves when it is frozen. Its side-effects are more profound when we use fine-tuning and Transfer Learning.

You see, when frozen and while in training mode the BN continues to use the mini-batch statistics for scaling the training data. This causes the unfrozen/trainable layers to adapt to the scale of the data. Unfortunately during inference mode (predictions) the network will switch to the moving mean/var. If the moving mean/var is different that the mini-batch statistics the data are scaled differently causing massive discrepancies on the accuracy. If you want more info, have a look at the PR.

jksmither · 2018-04-28T10:52:33Z

@ozabluda I'm facing the same problem on Keras 2.1.6 and TensorFlow 1.7. Did you find a solution?

@datumbox I tried installing your branch and it seems to work well. Unfortunately it is not synced with the latest version of Keras. Any plans to merge it with 2.1.6? If not I can do it.

datumbox · 2018-05-01T20:10:36Z

@jksmither Sorry for the late response. I just synced my branch with the latest master and provided a patched fork of 2.1.6. Honestly I would like to see this fixed on master as maintaining a separate fork with the patch is not a viable solution on the long term. I'll probably keep syncing it for as long as we use Keras at work but I can't make any promises.

shazamkash · 2018-06-05T10:09:48Z

Did anyone find a concrete solution to this problem? I am also affected by this problem and I am working on Keras 2.1.6 and TensorFlow 1.7 to train and test my data using InceptionV3 and Resnet50.

I am very new to deep learning and any help will be appreciated.

izharikov · 2018-06-05T13:30:03Z

@shazamkash

This is temp fix:

pip install -U --force-reinstall --no-dependencies git+https://github.com/datumbox/keras@bugfix/trainable_bn

So, this command reinstall keras with fixes, provided by @datumbox.
This works for me (Inception started train normally).

shazamkash · 2018-06-06T07:02:44Z

@izharikov @datumbox
Thank you so much for providing this information and the patch. Although are we expecting a permanent fix anytime soon?
Also has anyone faced this problem using just the TensorFlow? Any experience with that?

datumbox · 2018-06-06T12:00:33Z

Thanks for the input. I would advise using the fork 2.1.6 instead of the trainable_bn branch. This is because the fork is synced with the latest stable version of keras, while the trainable_bn even though it's more fresh it's not based on a finalized release.

Unfortunately there are no plans for a permanent fix at the moment. My PR #9965 was rejected (you can read the rational on the link) because it modifies the semantics of trainable. It's not the first time that the BatchNormalization layer forces us to update the semantics of trainable (see version 2.1.3) but it can take a while until such a change gets enough momentum. So maybe on the future if enough people complain about it, it will reopen. Until then I'll do my best to maintain the fork for those of you who are brave enough to mess with custom implementations.

drsxr · 2018-06-13T11:08:24Z

Wow. Was spinning my wheels for a while with ResNet50 training trying to fine tune until I found these threads. Same problems. So batchnorm in Keras = no fine tuning? Either paste the FC layer on top of trained weights (imagenet) or train from scratch. I'm working with smaller N's so training from scratch with augmentation is not a defacto solution. A shame too because Transfer Learning is looking more attractive lately - see : "Do Better ImageNet Models Transfer Better?"

Apart from @datumbox's patch, or moving over to another framework, any other workarounds?
@fchollet suggested this in the main discussion thread on Vasils' PR (#9965):

But even if you don't, you can still do your style of fine-tuning in this case:

set learning phase to 0
load model
retrieve features you want to train on
set learning phase to 1
add new layers on top
optionally load weights from initial model layers to corresponding new layers
train

joeyearsley · 2018-06-18T00:14:36Z

for layer in base_model.layers:
        if hasattr(layer, 'moving_mean') and hasattr(layer, 'moving_variance'):
            layer.trainable = True
            K.eval(K.update(layer.moving_mean, K.zeros_like(layer.moving_mean)))
            K.eval(K.update(layer.moving_variance, K.zeros_like(layer.moving_variance)))
        else:
            layer.trainable = False

Reset the batch norm moving averages and allow them to update to the new dataset - you'll see it transfer.

I'm writing a longer update on this matter and will open issues (easy PRs) so people can help contribute to fixing the documentation and the like.

datumbox · 2018-06-18T19:26:17Z

Unfreezing the BNs while keeping the subsequent Convolutions frozen can have negative effects on accuracy. I describe this in more detail here.

AndreGuerra123 · 2018-08-18T18:38:00Z

Just a simple question: why the value 1024 as units in the last dense layer?
x = Dense(1024, activation='relu')(x)

ozabluda · 2018-08-19T18:45:09Z

x = Dense(1024, activation='relu')(x) is because

The code follows "Fine-tune InceptionV3 on a new set of classes" from https://keras.io/applications/#usage-examples-for-image-classification-models

For the purpose of this Issue, I was following the docs literally for stronger effect to make my point.

xpngzhng · 2018-08-20T05:46:42Z

I fine tune keras pretrained model on my own dataset. I freeze some the layers in the early stage. I got decent validation accuracy on VGG, but bad validation accuracy on ResNet50.

VGG
epoch 1: train_acc 0.546514682730133, val_acc 0.6607583973804312
epoch16: train_acc 0.9279250402631126, val_acc 0.7440402661072892
There is overfitting.

ResNet50
epoch 1: train_acc 0.7301661501087283, val_acc 0.04389513340162522
Then I terminate the training.

I think this may be caused by BatchNormalization.

I once used keras-retinanet https://github.com/fizyr/keras-retinanet to train on my own dataset, which worked very well. So I want to find out the reason. RetinaNet uses ResNet as backbone, and BatchNormalization layers are frozen, see https://github.com/fizyr/keras-retinanet/blob/master/keras_retinanet/models/resnet.py#L98

The ResNet in that project is borrowed from another repo keras-resnet https://github.com/broadinstitute/keras-resnet. In this ResNet implementation, the authors customize the BatchNormalize layer, see
https://github.com/broadinstitute/keras-resnet/blob/master/keras_resnet/layers/_batch_normalization.py

    def call(self, *args, **kwargs):
        # return super.call, but set training
        return super(BatchNormalization, self).call(training=(not self.freeze), *args, **kwargs)

It seems that this operation is what @fchollet recommends in @datumbox 's PR
#9965 (comment)

I think it would be better to use keras-resnet https://github.com/broadinstitute/keras-resnet for fine tuning. I have not tried yet.

Yesterday I tried fine tuning InceptionV3 on the same dataset, with half of the layers set untrainable. But it is somewhat strange that the validation accuracy is quite well.

InceptionV3
epoch 1: train_acc 0.7295633685318796, val_acc 0.7712759384936045
epoch 10: train_acc 0.9254250409199613, val_acc 0.7986968871106656

The code I use is something like this https://gist.github.com/XupingZHENG/1e20d54a70c8e04912c0b37fa7e7b931

cesarorosco · 2018-09-18T18:00:32Z

I have the same problem with Resnet50.

This seems to work.

-Set the learning phase to 1
-In every batch normalization layer set Training=False

After that I get the correct accuracy.

K.set_learning_phase(1)

base_model = ResNet50(weights='imagenet', include_top=False)

x = base_model.output

x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(2, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)

for layer in base_model.layers:
    layer.trainable = False
    
    if layer.name.startswith('bn'):
        layer.call(layer.input, training=False)

datumbox · 2018-09-28T10:52:55Z

@cesarorosco This means that you network runs always on Training mode. Even when you make predictions you use the mini-batch statistics. This is not great as your predictions will change depending on what images you pass on the batch.

lpenet · 2018-12-02T23:43:18Z

If I proceed as @cesarorosco mentionned, things go even worse and I stick around a ~0.5 acc on the learning and validation sets...

It is a bit disappointing to have this kind of problem, especially when this kind of stuff is presented as too simple when using Keras...

It is such a great tool, anyway.

chabir · 2019-01-08T21:45:50Z

All,
I don't know if this is completely related but I have been refering to this issue for several days now. I used the patch mentioned by @datumbox and I observed the following:

Keras version: 2.2.4

from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D

case 1: model converges:
base_model = ResNet50(input_shape=(img_size, img_size, 3), weights=None, classes=5004)

case 2: model never ever converges:
res50 = ResNet50(input_shape=(img_size, img_size, 3), weights=None, include_top=False)
x = GlobalAveragePooling2D()(res50.output)
output = Dense(5004)(x)
base_model = Model(inputs=[res50.input], outputs=[output])

any ideas ?

Edit: I forgot the activation layer softmax in case 2...

Wazaki-Ou · 2019-02-20T05:02:35Z

I think I might be affected by a similar issue, except that I am using VGG16 instead. The training goes really well, to be honest a bit unexpectedly well, and it goes beyond 90%. I checked the model accuracy on a testing set and it gives good results and the same thing happens when I check with a confusion matrix. The problem is that, when I try to use the model to predict_classes() on my testing set (the same one that gave good results on accuracy and confusion matrix), the predictions are awfully bad. One class seems to be preferred over the others and I get 0 accuracy in 2 or the 5 classes. I was asked to check this post and I am wondering if anyone could help. Thanks a lot !!

rohan19250 · 2019-02-21T12:48:03Z

I am having the same issue as well.Inception v3 giving low validation accuracy but high training accuracy.What would be the suggestion to fix this?

Epoch 1/30
185/185 [==============================] - 5478s 30s/step - loss: 0.1161 - acc: 0.9493 - val_loss: 2.4898 - val_acc: 0.5678
Epoch 2/30
185/185 [==============================] - 5453s 29s/step - loss: 0.0362 - acc: 0.9861 - val_loss: 1.1530 - val_acc: 0.7678
Epoch 3/30
185/185 [==============================] - 5457s 29s/step - loss: 0.0280 - acc: 0.9902 - val_loss: 5.4614 - val_acc: 0.4506
Epoch 4/30
185/185 [==============================] - 5458s 30s/step - loss: 0.0184 - acc: 0.9934 - val_loss: 5.2297 - val_acc: 0.5117
Epoch 5/30
185/185 [==============================] - 5474s 30s/step - loss: 0.0146 - acc: 0.9954 - val_loss: 4.2587 - val_acc: 0.5586
Epoch 6/30
185/185 [==============================] - 5463s 30s/step - loss: 0.0113 - acc: 0.9965 - val_loss: 4.5049 - val_acc: 0.6019
Epoch 7/30
185/185 [==============================] - 5467s 30s/step - loss: 0.0099 - acc: 0.9972 - val_loss: 6.9422 - val_acc: 0.3551
Epoch 8/30
185/185 [==============================] - 5467s 30s/step - loss: 0.0099 - acc: 0.9969 - val_loss: 5.8211 - val_acc: 0.4901
Epoch 9/30
185/185 [==============================] - 5466s 30s/step - loss: 0.0112 - acc: 0.9965 - val_loss: 5.2108 - val_acc: 0.5518
Epoch 10/30
185/185 [==============================] - 5471s 30s/step - loss: 0.0113 - acc: 0.9964 - val_loss: 6.1660 - val_acc: 0.5092
Epoch 11/30
104/185 [===============>..............] - ETA: 37:44 - loss: 0.0140 - acc: 0.9958

pchris24 · 2019-03-07T18:41:06Z

same issue when I try ResNet50 in keras

Same here too

I use ResNet50 for fine-tuning. I want to predict the results for two classes. In one class the validation and trainning accuracy is 45% and on the other it's 0% but I unfreeze the final set of conv layers.

AkshayRoy · 2019-04-02T21:19:59Z

i tried to run resnet50 model for classifying colors of clothes, i used image net weights and added a globalavg pool layer, 2 dense and dropouts and final output layer with sigmoid/softmax. i frooze all the layers except the newly added ones then started training. Training goes well for some time and i managed to get some accuracies but when i tested my model, the predicitions are all wrong. can anybody help me solve this?

xhm1014 · 2019-08-12T20:24:35Z

I tried transfer learning on resnet50 in keras for two class classification problem. I only fine-tune the top fully-connected layer, while all other layers are frozen. I encounter the same problem: training accuracy is increasing as expected, but validation accuracy is only between 50-60%.

Have any one had good solutions to overcome this problem in keras, please?

geometrikal · 2019-08-25T05:07:02Z

@xhm1014 Pre-compute the vectors using resnet50, then train model with only the dense layers on the vectors. After training, join the dense layers to the resnet50 layers if you want to save the whole network.

BraveDistribution · 2019-08-25T16:12:44Z

@geometrikal

could you provide any example how to do that? i know that this w/a was suggested by the author of keras, but I couldn't find any way how to do that.

It is really shame that official docs doesn't mention this. Any other model than VGG (without BN) is useless if you want to freeze any of the layers.

geometrikal · 2019-08-25T20:59:50Z

@BraveDistribution Yea it is a bit strange.

This is how I do it, (taken from my repo here: https://github.com/microfossil/particle-classification/blob/master/miso/training/model_trainer.py )

Functions to make the head and tail:

def resnet50_head(input_shape):
    inputs = Input(shape=input_shape)
    x = Lambda(lambda y: tf.reverse(y, axis=[-1]))(inputs)
    x = Lambda(lambda y: y * tf.constant(255.0)
                         - tf.reshape(tf.constant([103.939, 116.779, 128.68]),
                                      [1, 1, 1, 3]))(x)
    x = resnet50.ResNet50(include_top=False,
                          weights='imagenet',
                          pooling='avg')(x)
    model = Model(inputs=inputs, outputs=x)
    model.get_layer('resnet50').trainable = False
    return model


def tl_tail(nb_classes):
    model = Sequential()
    model.add(Dropout(0.05))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.15))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(nb_classes, activation='softmax'))
    return model

Note the two lambda layers to make the head. The pre-trained ResNet50 uses some pre-processing that takes away the channel averages (of ImageNet). In my datasets I use images scaled to the range [0,1] by divding by 255. So these two layers convert from [0,1] range to correct pre-processing used by the pre-trained network. You can remove them if you are using resnet50.preprocessing() to create the images.

Now create the head and make the vectors:

model_head = resnet50_head(input_shape=(224,224,3))
train_vectors = model_head.predict(train_images)
test_vectors = model_head.predict(test_images)

Make the tail and train with these vectors:

model_tail = tl_tail(nb_classes=N)
model_tail.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = model_tail.fit(train_vector,
                                 train_onehots,
                                 validation_data=(test_vector, test_onehots),
                                 epochs=max_epochs,
                                 batch_size=batch_size,
                                 shuffle=True,
                                 verbose=0)

Then if you want to have a network that takes image as input, join them:

outputs = model_tail(model_head.outputs[0])
model = Model(model_head.inputs[0], outputs)

and then you can do

train_preds = model.predict(train_images) etc

Pre-calculating the vectors like this makes training very fast. You can even do it on a CPU.

By the way, my repo is designed for a project where we are enabling non-ML people to train networks. If you are interested see the docs read me here: https://github.com/microfossil/particle-classification-examples and especially the google colab tutorial (and use resnet50_tl as the cnn type)

digital-thinking · 2019-10-06T11:10:51Z

I also noticed this while training the efficientNet model, which includes BatchNormalization. I observed that it seems like freezing the BN layer, leads to bad accuracy (wrong predictions) in validation phase, while in training phase everything looks fine. When un-freezing the BN, the test accuracy recovers.

Here is an example notebook and here is some additional information:

TimbusCalin · 2019-11-20T08:28:55Z

@digital-thinking I think it would be interesting to see the results if a very small batch_size(1/2) is used; that is the specific case in which for using pre-trained backbones(especially for segmentation), freezing the BN layer is recommended.

Nestak2 · 2020-07-01T16:51:33Z

@digital-thinking Thanks, can you point out which are the precise code parts you refer to? Do I understand you correctly:

first we need to train the model a few epochs with the setting

for layer in base_model.layers:
    if isinstance(layer, BatchNormalization):
        layer.trainable = True
    else:
        layer.trainable = False

then we need to run the very same trained model more epochs with the settings

for layer in model.layers:
    layer.trainable = True

Is this what you are saying in your post?

digital-thinking · 2020-07-01T17:13:42Z

Hi @Nestak2 yes this is what I had to do to get reasonable validation metrics. If you train only the top-layer and don't make Batch Normalization layers trainable, you won't get correct results. If the hole model is trainable, there is no problem. I guess it's because the BatchNormalization layer is completely fixed and therefore it does not normalize the batch anymore.

Nestak2 · 2020-07-01T17:55:05Z

@digital-thinking Thanks for clarifying! Unfortunately this strategy didn't work out for me - the validation loss and accuracy were bad in both the "warm-up" and the proper training phases. I post the heart part of my code and the training metrics. Can you spot what I am doing wrong? Tnx

image_input = Input(shape=data_shape)
basemodel = ResNet50(input_tensor=image_input, include_top=True,weights='imagenet')
x = basemodel.get_layer('avg_pool').output

x = Flatten(name='flatten')(x)
x = Dense(512, activation="relu")(x)
x = Dropout(0.5)(x)

out = Dense(num_classes, activation='softmax', name='output_layer')(x)
custom_resnet_model = Model(inputs=image_input,outputs= out)

custom_resnet_model.compile(Adam(lr=0.001),loss='categorical_crossentropy',metrics=['accuracy'])
model = custom_resnet_model

######################
#### do the warm-up training for a the head of the model for a few epochs.
#### the tail of the model is not being trained 
for layer in model.layers:
    if isinstance(layer, BatchNormalization):
        layer.trainable = True
    else:
        layer.trainable = False

H = model.fit_generator(
    generator = train_generator,
    steps_per_epoch=train_generator.n//train_generator.batch_size,
    validation_data=valid_generator,
    validation_steps=valid_generator.n//valid_generator.batch_size,
    # class_weight=class_weights,
    epochs=15
    )
#######################


#######################
#### do the proper training of all the layers
for layer in model.layers:
    layer.trainable = True

print('check of validation ims 2:', valid_generator.filepaths[:5])

H = model.fit_generator(
    generator = train_generator,
    steps_per_epoch=train_generator.n//train_generator.batch_size,
    validation_data=valid_generator,
    validation_steps=valid_generator.n//valid_generator.batch_size,
    # class_weight=class_weights,
    epochs=num_epochs
    )
########################

This gives me this training metrics output, which shows a decent learning performance, but bad validation:

 9/10 [==========================>...] - ETA: 7s - loss: 0.9657 - acc: 0.7240 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 13s 171ms/step - loss: 26.6340 - acc: 0.1757
10/10 [==============================] - 85s 9s/step - loss: 0.9069 - acc: 0.7391 - val_loss: 29.8742 - val_acc: 0.1757
Epoch 2/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.2085 - acc: 0.9446 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 121ms/step - loss: 40.7067 - acc: 0.1081
10/10 [==============================] - 67s 7s/step - loss: 0.2051 - acc: 0.9487 - val_loss: 51.1770 - val_acc: 0.1081
Epoch 3/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.2235 - acc: 0.9464 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 121ms/step - loss: 222.0811 - acc: 0.1757
10/10 [==============================] - 62s 6s/step - loss: 0.2124 - acc: 0.9490 - val_loss: 284.9632 - val_acc: 0.1757
Epoch 4/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.1671 - acc: 0.9554 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 120ms/step - loss: 27.6185 - acc: 0.1892
10/10 [==============================] - 64s 6s/step - loss: 0.1536 - acc: 0.9599 - val_loss: 50.7521 - val_acc: 0.1892
Epoch 5/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.0500 - acc: 0.9844 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 120ms/step - loss: 30.1733 - acc: 0.1757
10/10 [==============================] - 65s 7s/step - loss: 0.0565 - acc: 0.9828 - val_loss: 40.2771 - val_acc: 0.1757
Epoch 6/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.0466 - acc: 0.9853 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 120ms/step - loss: 13.8834 - acc: 0.1757
10/10 [==============================] - 62s 6s/step - loss: 0.0429 - acc: 0.9868 - val_loss: 18.6932 - val_acc: 0.1757
Epoch 7/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.0214 - acc: 0.9946 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 122ms/step - loss: 20.4458 - acc: 0.1757
10/10 [==============================] - 64s 6s/step - loss: 0.0388 - acc: 0.9904 - val_loss: 25.5468 - val_acc: 0.1757
Epoch 8/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.0701 - acc: 0.9821 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 121ms/step - loss: 14.2723 - acc: 0.1892
10/10 [==============================] - 65s 6s/step - loss: 0.0732 - acc: 0.9824 - val_loss: 24.9597 - val_acc: 0.1892
Epoch 9/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.0761 - acc: 0.9768 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 121ms/step - loss: 2.8525 - acc: 0.1622
10/10 [==============================] - 64s 6s/step - loss: 0.0689 - acc: 0.9792 - val_loss: 3.2069 - val_acc: 0.1622
Epoch 10/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.0368 - acc: 0.9861 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 123ms/step - loss: 6.5042 - acc: 0.1622
10/10 [==============================] - 64s 6s/step - loss: 0.0332 - acc: 0.9872 - val_loss: 6.5715 - val_acc: 0.1622
Epoch 11/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.0753 - acc: 0.9826 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 122ms/step - loss: 2.4013 - acc: 0.1622
10/10 [==============================] - 66s 7s/step - loss: 0.1494 - acc: 0.9750 - val_loss: 2.6977 - val_acc: 0.1622
Epoch 12/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.1294 - acc: 0.9653 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 115ms/step - loss: 17.0063 - acc: 0.1622
10/10 [==============================] - 65s 6s/step - loss: 0.1289 - acc: 0.9656 - val_loss: 18.5074 - val_acc: 0.1622
Epoch 13/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.0855 - acc: 0.9816 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 120ms/step - loss: 180.1986 - acc: 0.1622
10/10 [==============================] - 66s 7s/step - loss: 0.0788 - acc: 0.9836 - val_loss: 174.2110 - val_acc: 0.1622
Epoch 14/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.0823 - acc: 0.9792 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 120ms/step - loss: 211.2853 - acc: 0.1622
10/10 [==============================] - 66s 7s/step - loss: 0.0802 - acc: 0.9797 - val_loss: 199.9975 - val_acc: 0.1622
Epoch 15/15
 9/10 [==========================>...] - ETA: 5s - loss: 0.1303 - acc: 0.9821 Epoch 1/15
74/10 [==============================================================================================================================================================================================================================] - 9s 120ms/step - loss: 17.8702 - acc: 0.1622
10/10 [==============================] - 63s 6s/step - loss: 0.1164 - acc: 0.9840 - val_loss: 18.2201 - val_acc: 0.1622
check of validation ims 2: ['./ims_train2/bottle/20200623_202617.jpg', './ims_train2/bottle/20200623_202619.jpg', './ims_train2/bottle/20200623_202624.jpg', './ims_train2/bottle/20200623_202629.jpg', './ims_train2/bottle/20200623_202636.jpg']
Epoch 1/60
 9/10 [==========================>...] - ETA: 6s - loss: 0.1154 - acc: 0.9653 Epoch 1/60
74/10 [==============================================================================================================================================================================================================================] - 9s 116ms/step - loss: 4.5718 - acc: 0.1622
10/10 [==============================] - 74s 7s/step - loss: 0.1057 - acc: 0.9688 - val_loss: 7.2388 - val_acc: 0.1622
Epoch 2/60
 9/10 [==========================>...] - ETA: 5s - loss: 0.1150 - acc: 0.9724 Epoch 1/60
74/10 [==============================================================================================================================================================================================================================] - 9s 121ms/step - loss: 5.0241 - acc: 0.1892
10/10 [==============================] - 66s 7s/step - loss: 0.1047 - acc: 0.9753 - val_loss: 7.6464 - val_acc: 0.1892
Epoch 3/60
 9/10 [==========================>...] - ETA: 5s - loss: 0.4038 - acc: 0.9714 Epoch 1/60
74/10 [==============================================================================================================================================================================================================================] - 9s 122ms/step - loss: 9.9850 - acc: 0.1757
10/10 [==============================] - 64s 6s/step - loss: 0.3883 - acc: 0.9679 - val_loss: 11.0730 - val_acc: 0.1757
Epoch 4/60
 9/10 [==========================>...] - ETA: 5s - loss: 0.3822 - acc: 0.8982 Epoch 1/60
74/10 [==============================================================================================================================================================================================================================] - 9s 125ms/step - loss: 2.7105 - acc: 0.1892
10/10 [==============================] - 64s 6s/step - loss: 0.4216 - acc: 0.8926 - val_loss: 5.3383 - val_acc: 0.1892
Epoch 5/60
 9/10 [==========================>...] - ETA: 5s - loss: 0.2122 - acc: 0.9427 Epoch 1/60
74/10 [==============================================================================================================================================================================================================================] - 9s 125ms/step - loss: 2599176.3548 - acc: 0.1892
10/10 [==============================] - 66s 7s/step - loss: 0.2137 - acc: 0.9438 - val_loss: 4664818.5203 - val_acc: 0.1892
Epoch 6/60
 9/10 [==========================>...] - ETA: 5s - loss: 0.2319 - acc: 0.9761 Epoch 1/60
74/10 [==============================================================================================================================================================================================================================] - 9s 124ms/step - loss: 821284.6876 - acc: 0.1892
10/10 [==============================] - 63s 6s/step - loss: 0.2161 - acc: 0.9770 - val_loss: 1461603.6436 - val_acc: 0.1892

therealansh · 2021-05-21T09:56:56Z

Did anyone find a workaround on the code side rather than having patch in the backend? I also tried @digital-thinking answer but that somehow doesn't work for me and the loss comes out to be -inf/nan for every epoch while the accuracy is increasing very slowly.

ozabluda mentioned this issue Apr 23, 2018

Change BN layer to use moving mean/var if frozen #9965

Closed

jksmither mentioned this issue Apr 28, 2018

Fine-tuning InceptionV3: High accuracy during training, low during predictions #10045

Closed

This was referenced Jun 10, 2018

tf.keras.model_to_estimator doesn't work well in evaluating tensorflow/tensorflow#17950

Closed

keras.layers.BatchNormalization update_ops not added tensorflow/tensorflow#19643

Closed

chaohuang mentioned this issue Jun 28, 2018

How to do transfer learning with InceptionV3/ResNet50 #10554

Closed

gkarampatsakis mentioned this issue Jul 14, 2018

Extremely low accuracy after finetuning InceptionV3 (not overfitting) #10214

Closed

This was referenced Dec 1, 2018

Pretrained Networks google-deepmind/graph_nets#31

Closed

Best way to use pre-trained nets in Sonnet google-deepmind/sonnet#107

Closed

RomainCendre mentioned this issue Dec 10, 2018

Transfer Learning - Single model VS Feature extractor and single Dense #11803

Closed

adityapatadia mentioned this issue Feb 2, 2019

Keras fine tuned VGG16 with good accuracy gives many wrong predictions #11590

Closed

rightnknow mentioned this issue Feb 14, 2019

[Progress Report] Implementation of final decision layer capstone496/SpeechSentiments#14

Open

gabegrand mentioned this issue Jun 13, 2019

Improve interface for warmstarting non-TRAINABLE variables in warm_starting_util tensorflow/tensorflow#29748

Closed

datumbox mentioned this issue Mar 12, 2020

Align BN trainable behaviour with TF 2.0 #13892

Closed

Gareth001 mentioned this issue Jul 23, 2020

TCAV on models with Transfer Learning Gareth001/tcav-talk#1

Open

fchollet closed this as completed Jun 24, 2021

"Fine-tune InceptionV3/ResNet50 on a new set of classes" doesn't work, while VGG16 works (suspect BN) #9214

"Fine-tune InceptionV3/ResNet50 on a new set of classes" doesn't work, while VGG16 works (suspect BN) #9214

Comments

ozabluda commented Jan 28, 2018

ozabluda commented Jan 28, 2018 • edited Loading

ozabluda commented Jan 28, 2018

ozabluda commented Jan 28, 2018 • edited Loading

ozabluda commented Feb 13, 2018

GougeC commented Mar 14, 2018

jax-max commented Apr 2, 2018

Zamirquito commented Apr 6, 2018 • edited Loading

NProdanova commented Apr 6, 2018 • edited Loading

ciprianfocsaneanu commented Apr 18, 2018 • edited Loading

datumbox commented Apr 23, 2018

jksmither commented Apr 28, 2018

datumbox commented May 1, 2018

shazamkash commented Jun 5, 2018

izharikov commented Jun 5, 2018

shazamkash commented Jun 6, 2018

datumbox commented Jun 6, 2018

drsxr commented Jun 13, 2018 • edited Loading

joeyearsley commented Jun 18, 2018 • edited Loading

datumbox commented Jun 18, 2018

AndreGuerra123 commented Aug 18, 2018

ozabluda commented Aug 19, 2018

xpngzhng commented Aug 20, 2018 • edited Loading

cesarorosco commented Sep 18, 2018

datumbox commented Sep 28, 2018

lpenet commented Dec 2, 2018

chabir commented Jan 8, 2019 • edited Loading

Wazaki-Ou commented Feb 20, 2019

rohan19250 commented Feb 21, 2019

pchris24 commented Mar 7, 2019 • edited Loading

AkshayRoy commented Apr 2, 2019 • edited Loading

xhm1014 commented Aug 12, 2019

geometrikal commented Aug 25, 2019

BraveDistribution commented Aug 25, 2019

geometrikal commented Aug 25, 2019

digital-thinking commented Oct 6, 2019 • edited Loading

TimbusCalin commented Nov 20, 2019

Nestak2 commented Jul 1, 2020

digital-thinking commented Jul 1, 2020 • edited Loading

Nestak2 commented Jul 1, 2020

therealansh commented May 21, 2021

ozabluda commented Jan 28, 2018 •

edited

Loading

ozabluda commented Jan 28, 2018 •

edited

Loading

Zamirquito commented Apr 6, 2018 •

edited

Loading

NProdanova commented Apr 6, 2018 •

edited

Loading

ciprianfocsaneanu commented Apr 18, 2018 •

edited

Loading

drsxr commented Jun 13, 2018 •

edited

Loading

joeyearsley commented Jun 18, 2018 •

edited

Loading

xpngzhng commented Aug 20, 2018 •

edited

Loading

chabir commented Jan 8, 2019 •

edited

Loading

pchris24 commented Mar 7, 2019 •

edited

Loading

AkshayRoy commented Apr 2, 2019 •

edited

Loading

digital-thinking commented Oct 6, 2019 •

edited

Loading

digital-thinking commented Jul 1, 2020 •

edited

Loading