This is a Keras implementation of the fully convolutional network outlined in Shelhamer et al. (2016), which performs semantic image segmentation on the Pascal VOC dataset. My hope is that this document will be readable to people outside of deep learning who, like myself, are looking to learn about fully convolutional networks.
In preparation, I found the following repos invaluable for reference:
- https://github.com/shelhamer/fcn.berkeleyvision.org
- https://github.com/fmahoudeau/FCN-Segmentation-TensorFlow/
- https://github.com/aurora95/Keras-FCN
The goal of semantic segmentation is to identify objects, like cars and dogs, in an image by labelling the corresponding groups of pixels according to their classes. For an introduction, see this article. As an example, below is an image and its labelled pixels.
Image | True label |
---|---|
A fully convolutional network (FCN) is an artificial neural network that performs semantic segmentation. The bottom layers of a FCN are those of a convolutional neural network (CNN), usually taken from a pre-trained network like VGGNet or GoogLeNet. The purpose of these layers is to perform classification on subregions of the image. The top layers of a FCN are transposed convolution/deconvolution layers, which upsample the results of the classification to the resolution of the original image. This gives us a label for each pixel. When upsampling, we can also utilize the intermediate layers of the CNN to improve the accuracy of the segmentation. For an introduction, see this article.
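To make the upsampling step concrete, below is a minimal sketch (assuming TensorFlow 2.x Keras) of a transposed convolution layer whose weights are fixed so that it performs bilinear interpolation. The `bilinear_kernel` helper and the layer configuration are illustrative, not the exact code in models.py.

```python
import numpy as np
import tensorflow as tf

def bilinear_kernel(factor, channels):
    """Transposed-convolution weights that perform bilinear upsampling,
    one independent filter per channel (no cross-channel mixing)."""
    size = 2 * factor - factor % 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    filt = (1 - abs(og[0] - center) / factor) * (1 - abs(og[1] - center) / factor)
    weights = np.zeros((size, size, channels, channels), dtype=np.float32)
    for c in range(channels):
        weights[:, :, c, c] = filt
    return weights

# A 2x upsampling layer for 21-class score maps:
channels, factor = 21, 2
upsample = tf.keras.layers.Conv2DTranspose(
    channels, kernel_size=2 * factor - factor % 2, strides=factor,
    padding="same", use_bias=False)
upsample.build((None, None, None, channels))
upsample.set_weights([bilinear_kernel(factor, channels)])

y = upsample(tf.zeros((1, 16, 16, channels)))  # -> shape (1, 32, 32, 21)
```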
The Pascal VOC project is a dataset containing images whose pixels have been labeled according to 20 classes (excluding the background), which include aeroplanes, cars, and people. We will be performing semantic segmentation according to this dataset.
The number of labelled images in the Pascal VOC dataset is augmented by the Semantic Boundaries Dataset (SBD) from Berkeley, which contains 11,355 labelled images. However, there are 676 labelled images in the original Pascal VOC dataset that are missing from the SBD. We have divided our data as follows:
- Training set: the SBD training set (8,498 images) + last 1,657 images (out of 2,857 total) of the SBD validation set + the 676 non-overlapping images of the Pascal VOC trainval set.
- Validation set: first 1,200 images (out of 2,857 total) of the SBD validation set
In total, we have 10,831 training images and 1,200 validation images.
The filenames of the training images are found in data/train_mat.txt and data/train_png.txt.
The filenames of the validation images are found in data/val_mat.txt.
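As a sanity check, the split can be reassembled from these lists. A hypothetical sketch, assuming `train_mat.txt` lists the SBD-labelled images, `train_png.txt` the 676 extra Pascal VOC images, and one image name per line (the `read_list` helper is illustrative):

```python
# Hypothetical helper: assumes each .txt file lists one image name per line.
def read_list(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

train_ids = read_list("data/train_mat.txt") + read_list("data/train_png.txt")
val_ids = read_list("data/val_mat.txt")
print(len(train_ids), len(val_ids))  # expected: 10831 1200
```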
If you want to duplicate our dataset, download the data/ folder of this repository, which contains the 676 extra images from Pascal VOC, and download the SBD dataset from its website.
After untarring, place the contents of benchmark_RELEASE/dataset/img into data/images_mat/ and the contents of benchmark_RELEASE/dataset/cls into data/labels_mat/.
data.ipynb writes the data into .tfrecords files, since the full dataset cannot be loaded into RAM.
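For illustration, here is a hedged sketch of what the SBD part of such a conversion step might look like; it assumes the SBD .mat labels store the class map under GTcls.Segmentation, and the helper names are hypothetical rather than those in data.ipynb.

```python
import numpy as np
import scipy.io
import tensorflow as tf
from PIL import Image

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def write_example(writer, image_path, label_path):
    image = np.asarray(Image.open(image_path))  # H x W x 3, uint8
    # SBD stores the class map in a MATLAB struct: GTcls.Segmentation.
    label = scipy.io.loadmat(label_path)["GTcls"][0, 0]["Segmentation"]
    example = tf.train.Example(features=tf.train.Features(feature={
        "image": _bytes_feature(image.tobytes()),
        "label": _bytes_feature(label.astype(np.uint8).tobytes()),
        "height": _int64_feature(image.shape[0]),
        "width": _int64_feature(image.shape[1]),
    }))
    writer.write(example.SerializeToString())

# e.g., reusing the hypothetical read_list helper from above:
with tf.io.TFRecordWriter("data/train.tfrecords") as writer:
    for name in read_list("data/train_mat.txt"):
        write_example(writer, f"data/images_mat/{name}.jpg",
                      f"data/labels_mat/{name}.mat")
```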
We followed the steps in the original paper. Our model details can be found in models.py.
The base CNN is VGG16. First, the fully connected layers are converted into convolutional layers. Second, the final layer of VGG16 that predicts 1000 classes is replaced by a layer that predicts the 21 Pascal VOC classes (including the background). Third, these predictions are fed into a deconvolution layer that upsamples 32x to the original resolution via bilinear interpolation. This defines the FCN32 network.
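A minimal sketch of this construction in Keras (assuming TensorFlow 2.x; the layer choices follow the description above but may differ from models.py):

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 21  # 20 Pascal VOC classes + background

def build_fcn32(input_shape=(None, None, 3)):
    # Base CNN: VGG16 without its fully connected head.
    vgg = tf.keras.applications.VGG16(include_top=False, input_shape=input_shape)
    x = vgg.output  # stride-32 feature map (block5_pool)

    # The two fully connected layers of VGG16, "convolutionalized".
    x = layers.Conv2D(4096, 7, padding="same", activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Conv2D(4096, 1, activation="relu")(x)
    x = layers.Dropout(0.5)(x)

    # Score layer: 21 Pascal VOC classes instead of the 1000 ImageNet classes.
    score = layers.Conv2D(NUM_CLASSES, 1)(x)

    # Deconvolution layer: upsample 32x back to the input resolution
    # (initialized to bilinear interpolation in the paper).
    up = layers.Conv2DTranspose(NUM_CLASSES, 64, strides=32,
                                padding="same", use_bias=False)(score)
    return tf.keras.Model(vgg.input, up)
```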
As previously mentioned, we utilize the intermediate layers of the CNN to improve the accuracy of the segmentation.
For the FCN16 network, instead of upsampling 32x we first upsample 2x to get an output whose resolution matches that of the block4_pool layer of VGG16. We predict 21 classes from block4_pool and add these two outputs together. The result is upsampled 16x to the original resolution.
A similar procedure is also done for the FCN8 network, where we additionally include predictions from the block3_pool layer of VGG16.
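Mirroring the FCN32 sketch above, the FCN16 skip connection might look like this (again illustrative, not a copy of models.py; dropout is omitted for brevity):

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 21

def build_fcn16(input_shape=(None, None, 3)):
    vgg = tf.keras.applications.VGG16(include_top=False, input_shape=input_shape)
    pool4 = vgg.get_layer("block4_pool").output        # stride-16 features

    # Convolutionalized classifier on top of block5_pool, as in FCN32.
    x = layers.Conv2D(4096, 7, padding="same", activation="relu")(vgg.output)
    x = layers.Conv2D(4096, 1, activation="relu")(x)
    score32 = layers.Conv2D(NUM_CLASSES, 1)(x)         # stride-32 class scores

    # Upsample 2x so the scores match block4_pool's resolution.
    up2 = layers.Conv2DTranspose(NUM_CLASSES, 4, strides=2,
                                 padding="same", use_bias=False)(score32)

    # Predict 21 classes from block4_pool and add the two outputs together.
    score_pool4 = layers.Conv2D(NUM_CLASSES, 1)(pool4)
    fused = layers.Add()([up2, score_pool4])

    # Upsample the fused stride-16 scores 16x to the original resolution.
    up16 = layers.Conv2DTranspose(NUM_CLASSES, 32, strides=16,
                                  padding="same", use_bias=False)(fused)
    return tf.keras.Model(vgg.input, up16)
```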
The training details can be found in train.ipynb.
We trained each of the FCN32, FCN16, and FCN8 models from scratch for 25 epochs using the Adam optimizer with a fixed learning rate of 1e-4, and L2 regularization with strength 1e-6.
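A sketch of this training configuration (the real loop lives in train.ipynb; `train_dataset` and `val_dataset` are assumed to come from the TFRecords step, and the loss choice here is an assumption that ignores any void-pixel masking):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

# The L2 penalty is attached per layer when the model is built, e.g.:
score = layers.Conv2D(21, 1, kernel_regularizer=regularizers.l2(1e-6))

model = build_fcn32()  # from the sketch above
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # fixed learning rate
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(train_dataset, validation_data=val_dataset, epochs=25)
```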
Below are the predicted labels for the example image shown above, which is in the validation set.
True label | FCN32 prediction | FCN16 prediction | FCN8 prediction |
---|---|---|---|
Pixel accuracy: | 0.968 | 0.980 | 0.972 |
Mean IoU: | 0.774 | 0.854 | 0.787 |
The performance of these models on the validation set is summarized below.
Model | FCN32 | FCN16 | FCN8 |
---|---|---|---|
Pixel accuracy: | 0.896 | 0.899 | 0.898 |
Mean IoU: | 0.568 | 0.575 | 0.576 |
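For reference, here is a minimal NumPy sketch of how these two metrics can be computed from a confusion matrix; the void-pixel convention is an assumption, and the repo's actual evaluation code may differ.

```python
import numpy as np

def pixel_accuracy_and_mean_iou(y_true, y_pred, num_classes=21):
    """Pixel accuracy and mean IoU from flat integer label arrays.
    Assumes void/unlabelled pixels are marked with values >= num_classes."""
    mask = y_true < num_classes                    # drop void pixels
    t, p = y_true[mask], y_pred[mask]
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (t, p), 1)                       # confusion matrix
    pixel_acc = np.diag(cm).sum() / cm.sum()
    union = cm.sum(axis=0) + cm.sum(axis=1) - np.diag(cm)
    iou = np.diag(cm)[union > 0] / union[union > 0]
    return pixel_acc, iou.mean()                   # mean over classes present
```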
At the time of writing, the Pascal VOC website was down, so I could not evaluate on the test set. The weights for the models can be downloaded from this Google Drive.
I am quite happy with the performance of the models given the relatively simple implementation and short training period. Our performance is slightly worse than that of Shelhamer et al. (2016). To get better performance, there are a couple of things that we still need to do:
- Data augmentation, such as random cropping. This seems to be very important, but relatively easy to include.
- Use ensemble methods.
When I have time, I will get to these.