8.5 Transfer learning

Slides

Transfer learning is a machine learning technique in which a model developed for one task is reused as the starting point for a model on a second task. A pretrained model is usually trained on a large volume of images, which is why its convolutional layers, and the vector representation they produce, can be reused for other tasks. The dense layers, however, are specific to the original dataset and need to be retrained to make predictions for the new one. In our problem, we want to keep the convolutional layers but train new dense layers.

The following steps create the train and validation data generators for the model:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.xception import preprocess_input

# Build image generator for training (takes a preprocessing function)
train_gen = ImageDataGenerator(preprocessing_function=preprocess_input)

# Load the train dataset into the train generator
train_ds = train_gen.flow_from_directory(directory='path/to/train_imgs_dir', # train images directory
                                         target_size=(150, 150), # resize images to train faster
                                         batch_size=32) # 32 images per batch

# Create image generator for validation
val_gen = ImageDataGenerator(preprocessing_function=preprocess_input)

# Load the validation dataset into the validation generator
val_ds = val_gen.flow_from_directory(directory='path/to/val_imgs_dir', # validation images directory
                                     target_size=(150, 150),
                                     batch_size=32,
                                     shuffle=False) # no shuffling for validation
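
As an optional sanity check (not required for training), we can inspect what the generators produce: class_indices shows the class-to-index mapping inferred from the folder names, and next() returns one batch of images and one-hot targets. The variable names X and y below are just illustrative:

# Check the class labels inferred from the directory structure
print(train_ds.class_indices) # dictionary mapping folder name -> class index

# Unpack one batch of features and targets from the generator
X, y = next(train_ds)
print(X.shape) # (32, 150, 150, 3) - a batch of preprocessed images
print(y.shape) # (32, number_of_classes) - one-hot encoded targets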

The following steps build a model on top of the pretrained base:

from tensorflow import keras
from tensorflow.keras.applications.xception import Xception

# Build the pretrained base model
base_model = Xception(weights='imagenet', # weights pretrained on ImageNet
                      include_top=False, # drop the original dense layers so we can add our own
                      input_shape=(150, 150, 3))

# Freeze the convolutional base so its weights are not updated during training
base_model.trainable = False

# Define the expected image shape as input
inputs = keras.Input(shape=(150, 150, 3))

# Feed the inputs to the base model
base = base_model(inputs, training=False) # training=False because the model contains BatchNormalization layers

# Convert the feature maps into vectors using a pooling layer
vectors = keras.layers.GlobalAveragePooling2D()(base)

# Create a dense layer with 10 outputs, one per class
outputs = keras.layers.Dense(10)(vectors)

# Create the model for training
model = keras.Model(inputs, outputs)
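
Optionally, model.summary() can be used to confirm the architecture: the frozen Xception base, the pooling layer, and the new dense layer. For a 150x150x3 input, the output shapes should look roughly like the comments below (exact layer names may differ):

# Inspect the architecture: frozen Xception base, pooling layer, and new dense layer
model.summary()
# Xception base            -> (None, 5, 5, 2048) feature maps
# GlobalAveragePooling2D   -> (None, 2048) vector per image
# Dense                    -> (None, 10) class scores (logits)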

The following steps instantiate the optimizer and loss function and compile the model:

# Define learning rate
learning_rate = 0.01

# Create optimizer
optimizer = keras.optimizers.Adam(learning_rate=learning_rate)

# Define the loss function
loss = keras.losses.CategoricalCrossentropy(from_logits=True) # from_logits=True keeps the raw output of the dense layer (logits) without applying softmax

# Compile the model
model.compile(optimizer=optimizer,
              loss=loss,
              metrics=['accuracy']) # track accuracy as the evaluation metric
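
To see what from_logits=True means, here is a small standalone sketch: passing raw logits to a loss created with from_logits=True gives the same value as applying softmax first and using the default loss.

import tensorflow as tf
from tensorflow import keras

# The loss applies softmax internally when from_logits=True,
# so these two computations produce the same value (~0.417)
logits = tf.constant([[2.0, 1.0, 0.1]]) # raw dense-layer outputs for one example
target = tf.constant([[1.0, 0.0, 0.0]]) # one-hot encoded true class

loss_from_logits = keras.losses.CategoricalCrossentropy(from_logits=True)(target, logits)
loss_from_probs = keras.losses.CategoricalCrossentropy()(target, tf.nn.softmax(logits))
print(float(loss_from_logits), float(loss_from_probs))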

The model is ready to train once it is defined and compiled:

# Train the model, validate it with validation data, and save the training history
history = model.fit(train_ds, epochs=10, validation_data=val_ds)
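
After training, the recorded history can be used to compare training and validation accuracy across epochs. A minimal sketch, assuming matplotlib is installed:

import matplotlib.pyplot as plt

# Plot training vs validation accuracy recorded by model.fit()
plt.plot(history.history['accuracy'], label='train')
plt.plot(history.history['val_accuracy'], label='validation')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()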

Classes, functions, and attributes:

  • from tensorflow.keras.preprocessing.image import ImageDataGenerator: reads image data from disk and prepares it for training/validation
  • flow_from_directory(): method to read the images directly from a directory
  • next(train_ds): unpacks one batch of features and targets from the generator
  • train_ds.class_indices: attribute with the class-to-index mapping inferred from the directory structure
  • GlobalAveragePooling2D(): takes a 4D tensor as input, averages over the height and width dimensions for each channel, and returns a vector representation of each image
  • CategoricalCrossentropy(): loss function for multi-class classification with one-hot encoded targets
  • model.fit(): method to train the model
  • epochs: number of iterations over all of the training data
  • history.history: dictionary recording the loss and metric values (accuracy in our case) at each epoch

Notes

Add notes from the video (PRs are welcome)

  • convolutional layers convert an image into a vector representation
  • dense layers use vector representations to make predictions
  • using a pretrained neural network
  • imagenet has 1000 different classes
  • a dense layer may be specific to a certain number of classes whereas the vector representation can be applied to another dataset
  • reusing the vector representation from convolutional layers means transferring knowledge and the idea behind transfer learning
  • train faster on smaller size images
  • the batch size
  • base model vs custom model
  • bottom layers vs top layers in keras
  • keras optimizers
  • using the adam optimizer
  • weights, learning rates
  • eta in xgboost
  • model loss
  • categorical cross entropy
  • changing accuracy during several training epochs
  • overfitting
⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.

Navigation