The goals of this project are:
- Create a semantic segmentation model that identifies the pixels belonging to the road in an input image.
- Build an encoder-decoder architecture.
- Use a pre-trained ImageNet model, such as VGG-16 or AlexNet, for transfer learning.
The FCN-8 model does reasonably well without any data augmentation or end-to-end training. Below are some of the test images:
The model is based on the FCN-8 architecture described here and was replicated by following the code the authors provide on GitHub. The encoder part of the model is based on the VGG-16 architecture and is frozen during training; only the decoder layers are trained.
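The shape bookkeeping of the FCN-8 skip connections can be sketched as follows. This is an illustrative NumPy walkthrough, not the project's actual code: the input size is KITTI-like but assumed, and nearest-neighbour upsampling stands in for the learned transposed convolutions.

```python
import numpy as np

# Illustrative FCN-8 skip-connection shape walkthrough (sketch only).
# H, W are an assumed KITTI-like input size; C = 2 classes (road / not road).
H, W, C = 160, 576, 2
pool3 = np.zeros((H // 8,  W // 8,  C))   # VGG pool3 output after 1x1 scoring
pool4 = np.zeros((H // 16, W // 16, C))   # VGG pool4 output after 1x1 scoring
conv7 = np.zeros((H // 32, W // 32, C))   # VGG conv7 output after 1x1 scoring

def up2(x):
    # Nearest-neighbour x2 upsampling; the real model learns a transposed conv here.
    return x.repeat(2, axis=0).repeat(2, axis=1)

fuse4 = up2(conv7) + pool4   # upsample x2, fuse with pool4 skip connection
fuse3 = up2(fuse4) + pool3   # upsample x2 again, fuse with pool3 skip connection
logits = fuse3.repeat(8, axis=0).repeat(8, axis=1)  # final x8 upsample to input size
print(logits.shape)
```

The three fusions (x2, x2, x8) recover the full input resolution, which is why the skip connections must be scored to the same channel count before addition.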
Training the model on the KITTI dataset for 20 epochs took roughly 10-15 minutes on an NVIDIA Titan X (Pascal) GPU with 12 GB of RAM.
The loss went down as follows:

```
Epoch:  0  Loss: 6.3983E-01
Epoch:  1  Loss: 4.3009E-01
Epoch:  2  Loss: 2.9034E-01
Epoch:  3  Loss: 2.3380E-01
Epoch:  4  Loss: 2.1056E-01
Epoch:  5  Loss: 1.9645E-01
Epoch:  6  Loss: 1.8708E-01
Epoch:  7  Loss: 1.7992E-01
Epoch:  8  Loss: 1.7388E-01
Epoch:  9  Loss: 1.6838E-01
Epoch: 10  Loss: 1.6340E-01
Epoch: 11  Loss: 1.5862E-01
Epoch: 12  Loss: 1.5447E-01
Epoch: 13  Loss: 1.5132E-01
Epoch: 14  Loss: 1.4740E-01
Epoch: 15  Loss: 1.4435E-01
Epoch: 16  Loss: 1.4205E-01
Epoch: 17  Loss: 1.3925E-01
Epoch: 18  Loss: 1.3744E-01
Epoch: 19  Loss: 1.3470E-01
```
The code in the `main.py` module trains a model on the data downloaded into the `data` folder and runs the model on the test images. Run the following command to run the project:

```
python main.py
```
`main.py` checks that you are running on a GPU; if your system does not have one, you can use AWS or another cloud computing platform.
Download the Kitti Road dataset from here and extract it into the `data` folder. This will create the folder `data_road` with all the training and test images.
- As the images above show, the model suffers from the "checkerboard artifact" effect of the transposed convolution layers, as illustrated here. Some of the solutions described in the linked publication could be applied to get rid of them.
- Efficient hyperparameter search: the model has a number of hyperparameters, such as the regularization coefficients for the decoder layers, scaling factors for the skip connections, batch size, learning rate, and kernel sizes for the transposed convolution layers. I selected hyperparameters based on prior knowledge and trial and error.
- The kernels of the transposed convolution layers could be initialized more intelligently. I simply used Xavier initialization and it seemed to work fine, but several people have reported improvements from starting with kernels that perform bilinear interpolation.
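The checkerboard artifact mentioned above comes from uneven overlap: when the kernel size is not divisible by the stride, some output pixels receive contributions from more input pixels than their neighbours. A small 1-D coverage count (a sketch, not the project's code) makes this visible:

```python
import numpy as np

def transposed_conv_coverage(n_in, kernel, stride):
    """Count how many input pixels contribute to each output pixel of a
    1-D transposed convolution. Uneven counts cause checkerboard artifacts."""
    n_out = (n_in - 1) * stride + kernel
    coverage = np.zeros(n_out, dtype=int)
    for i in range(n_in):
        coverage[i * stride : i * stride + kernel] += 1
    return coverage

# kernel=3, stride=2: interior coverage alternates 1, 2, 1, 2, ...
print(transposed_conv_coverage(n_in=8, kernel=3, stride=2))
# kernel=4, stride=2: interior coverage is uniform
print(transposed_conv_coverage(n_in=8, kernel=4, stride=2))
```

Choosing the kernel size as a multiple of the stride (e.g. 4 with stride 2), or replacing the transposed convolution with resize-then-convolve, are among the fixes discussed in the linked publication.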
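A simple way to make the hyperparameter search above more systematic is random search over a small grid. The parameter names and ranges below are illustrative, not the values used in this project:

```python
import random

# Hypothetical search space; names and ranges are illustrative only.
space = {
    "learning_rate": [1e-3, 1e-4, 1e-5],
    "batch_size": [4, 8, 16],
    "l2_reg": [1e-2, 1e-3, 1e-4],
    "skip_scale_pool3": [1e-4, 1e-3, 1e-2],
}

def sample_configs(space, n, seed=0):
    """Draw n random hyperparameter configurations from the search space."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n)]

for cfg in sample_configs(space, n=3):
    print(cfg)  # train with cfg and keep the best validation IoU
```

Random search tends to cover high-impact dimensions (like learning rate) better than an exhaustive grid for the same training budget.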