The scope of this project is to develop a pipeline that processes a video stream from a forward-facing camera mounted on the front of a car and outputs an annotated video with the detected vehicles highlighted.
The goals / steps of this project are the following:
- Perform a Histogram of Oriented Gradients (HOG) feature extraction on a labeled training set of images and train a linear SVM classifier;
- Also apply a color transform and append binned color features, as well as histograms of color, to the HOG feature vector.
- Implement a sliding-window technique and use the trained classifier to search for vehicles in images.
- Run the pipeline on a video stream (start with the test_video.mp4 and later implement on full project_video.mp4) and create a heat map of recurring detections frame by frame to reject outliers and follow detected vehicles.
- Estimate a bounding box for each detected vehicle.
This project requires Python 3.5 with the following libraries/dependencies installed:
You will also need to have software installed to run and execute a Jupyter Notebook.
- `vehicle-detection.ipynb` - The main notebook of the project.
- `helper_functions.py` - The script containing required helper functions.
- `Vehicles.py` - The script containing a required Python class.
- `README.md` - The writeup explaining the image processing pipeline and the project.
- `combo-pipeline.ipynb` - The notebook for combined lane line and vehicle detection.
- `helpers_lanes.py` - The script containing required helper functions for lane lines.
- `Line.py` - The script containing a required Python class for lane lines.
Here are links to the labeled data for vehicle and non-vehicle examples to train the classifier. These example images come from a combination of the GTI vehicle image database, the KITTI vision benchmark suite, and examples extracted from the project video itself.
The code for this step is contained in the 3rd code cell of the Jupyter notebook.
I started by reading in all the `vehicle` and `non-vehicle` images. Here is an example of one of each of the `vehicle` and `non-vehicle` classes:
I then explored different color spaces and different `skimage.hog()` parameters (`orientations`, `pixels_per_cell`, and `cells_per_block`). I grabbed random images from each of the two classes and displayed them to get a feel for what the `skimage.hog()` output looks like.
Here is an example using the `YCrCb` color space and HOG parameters of `orientations=9`, `pixels_per_cell=(8, 8)` and `cells_per_block=(2, 2)`:
I tried various combinations of parameters and tested them with a linear SVM classifier. Eventually I used all three channels of the `YCrCb` color space with HOG parameters of `orientations=9`, `pixels_per_cell=(8, 8)` and `cells_per_block=(2, 2)`. This combination usually yields the best test accuracy with the linear SVM model.
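As a minimal sketch of this extraction (the filename `vehicle.png` is a hypothetical stand-in for any 3-channel training image), the per-channel HOG computation with these parameters looks roughly like this:

```python
import cv2
import matplotlib.image as mpimg
import numpy as np
from skimage.feature import hog

# Hypothetical example file; any 64x64 RGB training image works here.
img = mpimg.imread('vehicle.png')
ycrcb = cv2.cvtColor(img.astype(np.float32), cv2.COLOR_RGB2YCrCb)

hog_features = []
for ch in range(3):  # use all three YCrCb channels
    feats, hog_image = hog(ycrcb[:, :, ch],
                           orientations=9,
                           pixels_per_cell=(8, 8),
                           cells_per_block=(2, 2),
                           visualize=True,  # spelled 'visualise' in older skimage
                           feature_vector=True)
    hog_features.append(feats)
hog_features = np.concatenate(hog_features)
```

Setting `visualize=True` also returns `hog_image`, which is what the gradient-visualization figures above were generated from.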
This part of the code is contained in the 4th and 5th code cells of the Jupyter notebook.
I trained a linear SVM using a combination of spatial features, histogram features, and HOG features. First, the input image is converted to the `YCrCb` color space. The spatial features are extracted by resizing the image to `16 x 16` pixels and flattening the 2D image to a 1D vector. The histogram features are extracted by computing a normalized histogram with `32` bins on each channel and concatenating the three channel vectors. The HOG features are extracted using the technique explained in the previous section.
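A sketch of the two color-feature extractors described above (not the exact notebook code; the `bins_range` assumes images scaled to 0 to 1, which is the caveat discussed next):

```python
import cv2
import numpy as np

def bin_spatial(img, size=(16, 16)):
    # Resize to 16x16 and flatten the 2D image into a 1D feature vector.
    return cv2.resize(img, size).ravel()

def color_hist(img, nbins=32, bins_range=(0, 1)):
    # Histogram each channel with 32 bins and concatenate the three vectors.
    # bins_range assumes float images in [0, 1]; see the caveat below.
    hists = [np.histogram(img[:, :, ch], bins=nbins, range=bins_range)[0]
             for ch in range(3)]
    return np.concatenate(hists)
```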
All feature values are normalized to the range 0 to 1 to avoid any single feature becoming dominant. Close attention should be paid here: different image formats (PNG vs. JPEG) and OpenCV's `cvtColor()` function can produce image data with different value ranges.
Here is an example of the final feature vector before scaling. It should always be checked before the next step to make sure the values are distributed in the proper range.
The final constructed feature vector has a length of 6156. The features are then scaled, shuffled, and split into training and testing sets. An off-the-shelf linear Support Vector Machine (SVM) model is used to train on the data. The test accuracy is 0.99.
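A sketch of the scaling and training steps, with placeholder data standing in for the real feature matrix (the 80/20 split and random seed are my assumptions, not necessarily the notebook's values):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Placeholder data; in the notebook, X holds one 6156-dim feature
# vector per training image and y holds 1 (vehicle) / 0 (non-vehicle).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6156))
y = np.concatenate([np.ones(100), np.zeros(100)])

# Fit the scaler on the full feature matrix, then transform.
X_scaler = StandardScaler().fit(X)
scaled_X = X_scaler.transform(X)

# Shuffle and split into training and testing sets (assumed 80/20).
X_train, X_test, y_train, y_test = train_test_split(
    scaled_X, y, test_size=0.2, shuffle=True, random_state=42)

svc = LinearSVC()
svc.fit(X_train, y_train)
print('Test accuracy:', round(svc.score(X_test, y_test), 4))
```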
This section of code is contained in the 6th code cell of the Jupyter notebook.
A HOG sub-sampling window search is used to find matching objects in the image. Different sliding-window scales were tried, and eventually a combination of 64 x 64, 80 x 80, 96 x 96, and 112 x 112 pixel sliding windows was implemented with `scales = [1., 1.25, 1.5, 1.75]`. The combination of differently sized windows ensures that enough bounding boxes are generated for each detected object, which is beneficial for the next step of ruling out false positives. A sketch of the search at one scale follows.
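This sketch assumes a linear classifier `svc` and a fitted `X_scaler` trained on matching features; for brevity it uses HOG features only, whereas the notebook also appends the spatial and histogram features before scaling:

```python
import cv2
import numpy as np
from skimage.feature import hog

def find_cars(img, ystart, ystop, scale, svc, X_scaler,
              orient=9, pix_per_cell=8, cell_per_block=2, window=64):
    # Crop the region of interest, convert to YCrCb, and resize so that a
    # fixed 64x64 window effectively searches at the requested scale.
    region = cv2.cvtColor(img[ystart:ystop, :, :], cv2.COLOR_RGB2YCrCb)
    if scale != 1:
        h, w = region.shape[:2]
        region = cv2.resize(region, (int(w / scale), int(h / scale)))

    # Compute HOG once per channel for the whole region, keeping the
    # block structure so each window can be sub-sampled from it.
    hogs = [hog(region[:, :, ch], orientations=orient,
                pixels_per_cell=(pix_per_cell, pix_per_cell),
                cells_per_block=(cell_per_block, cell_per_block),
                feature_vector=False) for ch in range(3)]

    nxblocks = region.shape[1] // pix_per_cell - cell_per_block + 1
    nyblocks = region.shape[0] // pix_per_cell - cell_per_block + 1
    nblocks_per_window = window // pix_per_cell - cell_per_block + 1
    cells_per_step = 2  # 75% overlap between consecutive windows
    boxes = []
    for xb in range((nxblocks - nblocks_per_window) // cells_per_step + 1):
        for yb in range((nyblocks - nblocks_per_window) // cells_per_step + 1):
            xpos, ypos = xb * cells_per_step, yb * cells_per_step
            # Sub-sample the precomputed HOG blocks for this window.
            feats = np.hstack([h[ypos:ypos + nblocks_per_window,
                                 xpos:xpos + nblocks_per_window].ravel()
                               for h in hogs])
            if svc.predict(X_scaler.transform(feats.reshape(1, -1)))[0] == 1:
                x1 = int(xpos * pix_per_cell * scale)
                y1 = int(ypos * pix_per_cell * scale) + ystart
                size = int(window * scale)
                boxes.append(((x1, y1), (x1 + size, y1 + size)))
    return boxes
```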
Next, a heatmap is created by combining the overlapping boxes and thresholded to rule out false positives. A labeling function is then applied to identify each detected object in the thresholded heatmap.
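A sketch of the heatmap, threshold, and label steps (the helper names follow the common convention from the course material, not necessarily my notebook's exact names):

```python
import cv2
import numpy as np
from scipy.ndimage import label  # scipy.ndimage.measurements.label in older SciPy

def add_heat(heatmap, bbox_list):
    # Add +1 heat for every pixel inside each detected bounding box.
    for ((x1, y1), (x2, y2)) in bbox_list:
        heatmap[y1:y2, x1:x2] += 1
    return heatmap

def apply_threshold(heatmap, threshold):
    # Zero out pixels that were not detected often enough.
    heatmap[heatmap <= threshold] = 0
    return heatmap

def draw_labeled_bboxes(img, labels):
    # Draw one tight box around each labeled (connected) hot region.
    for car_number in range(1, labels[1] + 1):
        ys, xs = (labels[0] == car_number).nonzero()
        cv2.rectangle(img, (int(xs.min()), int(ys.min())),
                      (int(xs.max()), int(ys.max())), (0, 0, 255), 6)
    return img
```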
Here are some example images with a sliding-window scale of 1.5:
Ultimately I searched on four scales using YCrCb 3-channel HOG features plus spatially binned color and histograms of color in the feature vector, which provided a nice result.
Here is a table showing the different `ystart` and `ystop` values for each sliding-window scale.
| Scale | ystart | ystop |
|---|---|---|
| 1.0 | 400 | 496 |
| 1.25 | 400 | 528 |
| 1.5 | 400 | 592 |
| 1.75 | 400 | 656 |
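Putting the table together with the helpers sketched above, the per-frame search looks roughly like this (the heat threshold of 2 is an illustrative assumption):

```python
import numpy as np
from scipy.ndimage import label

scales = [1.0, 1.25, 1.5, 1.75]
ystops = [496, 528, 592, 656]

# Collect boxes from all four window scales (ystart is 400 for every scale).
boxes = []
for scale, ystop in zip(scales, ystops):
    boxes += find_cars(img, 400, ystop, scale, svc, X_scaler)

# Combine overlapping detections, threshold, and label the survivors.
heat = add_heat(np.zeros(img.shape[:2], dtype=np.float32), boxes)
heat = apply_threshold(heat, 2)
draw_img = draw_labeled_bboxes(np.copy(img), label(heat))
```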
Here's a link to my video result.
And a GIF:
To speed up processing, the pipeline performs a whole-image HOG sub-sampling window search every 12 frames and a reduced window search every 6 frames, which only scans the regions of interest where vehicles were previously detected.
The vehicle bounding boxes from the most recent 12 frames are stored in a Python `deque` with a maximum length of 12. The current bounding boxes are computed from a thresholded heatmap of the accumulated bounding boxes over those 12 frames. These steps not only rule out false positives across frames, but also smooth the drawing of the vehicle bounding boxes.
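A sketch of this rolling-window bookkeeping (the class in `Vehicles.py` differs in detail; the threshold of 6 here is an illustrative assumption):

```python
from collections import deque

import numpy as np
from scipy.ndimage import label

class Vehicles:
    def __init__(self, n_frames=12):
        # Bounding boxes from the most recent n_frames frames; appending
        # beyond maxlen automatically drops the oldest frame's boxes.
        self.recent_boxes = deque(maxlen=n_frames)

    def update(self, frame_shape, new_boxes, threshold=6):
        # Accumulate heat over the stored history, threshold it, and
        # return labeled regions for drawing the smoothed boxes.
        self.recent_boxes.append(new_boxes)
        heat = np.zeros(frame_shape[:2], dtype=np.float32)
        for boxes in self.recent_boxes:
            for (x1, y1), (x2, y2) in boxes:
                heat[y1:y2, x1:x2] += 1
        heat[heat <= threshold] = 0
        return label(heat)
```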
Here's a link to my combined video result.
And a GIF:
One issue with using HOG features for the classifier is that many parameters have to be tuned manually through trial and error, and this process is hard to automate. The process also does not transfer to other videos or circumstances, because a new set of parameters would have to be found.
The vehicle data in this project are limited to sedans; they do not include other types of vehicles such as trucks and motorcycles. The same strategy, however, can be applied to a broader selection of training data covering more vehicle types for practical use.
A neural network could be used as a classifier in place of the linear SVM, and I would expect better accuracy from it. But processing speed could be an issue, because a neural network is slow compared to the linear SVM.
Speaking of processing speed, the current pipeline is still much slower than real time. I sped up the process by skipping frames, so a whole-image window search is only performed every 12 frames. This introduces the issue that any new object of interest driving into the frame can only be detected with roughly half a second of delay. This is a tradeoff between processing speed and prompt detection.