Detection and tracking of vehicles on the road in near real-time.
- Color histogram feature extraction
- Spatially binned (subsampled) feature extraction
- HOG feature extraction
- SVM linear classifier trained on a labeled binary class dataset
- Sliding window search with temporal heatmap false-positive reduction
import numpy as np
import cv2
import os
import glob
import matplotlib.pyplot as plt
%matplotlib inline
from skimage.feature import hog
from sklearn.preprocessing import StandardScaler
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import LinearSVC
import time
from moviepy.editor import VideoFileClip
# configure matplotlib for borderless, tick-free image display
plt.rcParams['figure.figsize'] = [14,8]
plt.rcParams['figure.frameon'] = False
plt.rcParams['figure.edgecolor'] = 'none'
plt.rcParams['figure.facecolor'] = 'none'
plt.rcParams['ytick.left'] = False
plt.rcParams['ytick.right'] = False
plt.rcParams['xtick.bottom'] = False
plt.rcParams['xtick.top'] = False
plt.rcParams['ytick.labelleft'] = False
plt.rcParams['ytick.labelright'] = False
plt.rcParams['xtick.labelbottom'] = False
plt.rcParams['xtick.labeltop'] = False
plt.rcParams['axes.grid'] = False
plt.rcParams['image.cmap'] = 'gray'
plt.rcParams['figure.subplot.hspace'] = 0.01
plt.rcParams['figure.subplot.wspace'] = 0.01
plt.rcParams['image.interpolation'] = 'bilinear'
VEHICLE_PATH = './vehicles/**/*.png'
vehicle_paths = glob.glob(VEHICLE_PATH)
vehicle = [cv2.imread(path, cv2.IMREAD_COLOR)[...,::-1] for path in vehicle_paths]  # BGR -> RGB
NOT_VEHICLE_PATH = './non-vehicles/**/*.png'
not_vehicle_paths = glob.glob(NOT_VEHICLE_PATH)
not_vehicle = [cv2.imread(path, cv2.IMREAD_COLOR)[...,::-1] for path in not_vehicle_paths]
print('vehicle sample count:',len(vehicle))
print('vehicle image shape:',vehicle[0].shape)
print('vehicle data type:',vehicle[0].dtype)
vehicle sample count: 8792
vehicle image shape: (64, 64, 3)
vehicle data type: uint8
print('not vehicle sample count:',len(not_vehicle))
print('not vehicle image shape:',not_vehicle[0].shape)
print('not vehicle data type:',not_vehicle[0].dtype)
not vehicle sample count: 8968
not vehicle image shape: (64, 64, 3)
not vehicle data type: uint8
# preview the first 15 samples of each class
all(ax.imshow(im) for im,ax in zip(not_vehicle[:15], plt.subplots(3,5)[1].ravel()))
all(ax.imshow(im) for im,ax in zip(vehicle[:15], plt.subplots(3,5)[1].ravel()))
# per-channel histograms: vehicle (top row) vs non-vehicle (bottom row)
vehicle_arr, not_vehicle_arr = np.asarray(vehicle), np.asarray(not_vehicle)
ax = plt.subplots(2,3)[1]
for i in range(3):
    hist = ax[0][i].hist(vehicle_arr[...,i].flatten(), bins=256)
    hist = ax[1][i].hist(not_vehicle_arr[...,i].flatten(), bins=256)
plt.tight_layout()
ax = plt.subplots(2,3)[1]
for i in range(3):
    hist = ax[0][i].hist(vehicle_arr[...,i].flatten(), bins=32)
    hist = ax[1][i].hist(not_vehicle_arr[...,i].flatten(), bins=32)
plt.tight_layout()
We extract 3 kinds of features:
- HSV color histogram
- Spatially binned (downsampled) pixels
- HOG
We combine both sample classes for preprocessing.
x_data = vehicle+not_vehicle
len(x_data)
17760
We produce a color histogram for each channel, then flatten and concatenate them to produce the feature vector. The HSV color space was selected due to its anecdotal correlation with better accuracy during experimentation.
def color_transform(img):
    # convert RGB input to the HSV color space used for all features
    return cv2.cvtColor(img, cv2.COLOR_RGB2HSV)

def extract_hist(img):
    # 128-bin histogram per channel, concatenated into one 384-length vector
    hist = list(map(lambda i: np.histogram(img[...,i], bins=128, range=(0,256))[0], range(3)))
    return np.concatenate((hist[0], hist[1], hist[2]))
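A quick shape check (illustrative, not part of the original notebook):

print(extract_hist(color_transform(vehicle[0])).shape)  # (384,) = 3 channels * 128 bins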
We produce spatial color bins by downsampling and flattening our input images. This feature map preserves coarse spatial color information and correlations between neighboring pixel values. The (32,32) pixel size was selected empirically.
def extract_spatial(img):
    # downsample to 32x32 and flatten: 32*32*3 = 3072 values
    return cv2.resize(img, (32,32), interpolation=cv2.INTER_LINEAR).ravel()
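Likewise, a quick shape check (illustrative):

print(extract_spatial(color_transform(vehicle[0])).shape)  # (3072,) = 32 * 32 * 3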
We compute a Histogram of Oriented Gradients (HOG) for classifier feature input. The HOG feature map provides a contrast-normalized description of local gradient structure in our classification targets. We've set the parameters as follows: 12 orientation bins, (4,4) pixels per cell, and a (4,4) block size with a (4,4) stride (i.e. one cell per block). Parameter values were set after an extended period of empirical tuning to reach above 99% classifier accuracy.
# cv2.HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins)
hog_desc = cv2.HOGDescriptor((64,64), (4,4), (4,4), (4,4), 12)

def extract_hog(img):
    # per-channel HOG on the HSV image, concatenated into one vector
    hh = hog_desc.compute(img[...,0]).ravel()
    sh = hog_desc.compute(img[...,1]).ravel()
    vh = hog_desc.compute(img[...,2]).ravel()
    return np.concatenate((hh, sh, vh))
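As a sanity check (not in the original): with a 64x64 window, (4,4) blocks on a (4,4) stride give ((64-4)/4 + 1)^2 = 256 block positions per channel, each holding one cell of 12 orientation bins:

print(extract_hog(color_transform(vehicle[0])).shape)  # (9216,) = 3 * 256 * 12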
def combine_features(cf, sf, hf):
    return np.concatenate((cf, sf, hf))

def extract_features(img):
    img = color_transform(img)
    cf, sf, hf = extract_hist(img), extract_spatial(img), extract_hog(img)
    return combine_features(cf, sf, hf)
combined_f = list(map(extract_features, x_data))
len(combined_f)
17760
X = np.vstack(combined_f).astype(np.float64)
X.shape
(17760, 12672)
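The feature-vector width breaks down as follows (a quick check, not in the original):

# 3 channels x 128 histogram bins =  384
# 32 x 32 x 3 spatial bins        = 3072
# 3 channels x 3072 HOG values    = 9216
assert 3*128 + 32*32*3 + 3*3072 == X.shape[1]  # 12672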
We've attached binary labels corresponding to our target classes: 1 for vehicle, 0 for non-vehicle.
y = np.concatenate((np.ones(len(vehicle), dtype=bool), np.zeros(len(not_vehicle), dtype=bool)))
y.shape
(17760,)
X,y = shuffle(X,y)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
The data is normalized per column based on the training features. Thereafter, the same fitted scaler parameters are used to normalize the validation and production input.
X_scaler = StandardScaler().fit(X_train)  # fit scaler to training data only
X_train = X_scaler.transform(X_train)
X_test = X_scaler.transform(X_test)
def normalize(samples):
    return X_scaler.transform(samples)
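As a sanity check (not in the original), the standardized training columns should now have roughly zero mean and unit variance:

print(round(X_train.mean(), 3), round(X_train.std(), 3))  # ~0.0, ~1.0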
We'll use a linear SVM: a support vector machine with a linear kernel, i.e. a maximum-margin linear classifier whose prediction reduces to the sign of a dot product between a learned weight vector and the feature vector.
We've utilized grid hyperparameter search to estimate the optimal classifier kernel and penalty values. We found the RBF kernel to offer greater accuracy, but its prediction time is too long for real-time usage. Therefore, we've chosen the linear classifier with a penalty ('C' value) of 1.
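The RBF comparison itself isn't shown in this notebook; below is a sketch of how such a kernel search might look (the grid values are illustrative assumptions), using sklearn.svm.SVC:

from sklearn.svm import SVC
# NOTE: kernel SVMs are far slower to train and predict on ~12k-dimensional
# features, which is why the linear model was kept for the pipeline
kernel_search = GridSearchCV(SVC(), {'kernel': ['linear', 'rbf'], 'C': [1, 10]}, n_jobs=4)
# kernel_search.fit(X_train, y_train)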
svc = GridSearchCV(LinearSVC(), {'C':[1,3,5,7,10]}, n_jobs=4)
svc.fit(X_train, y_train)
print(' Done Training')
Done Training
svc.best_params_
{'C': 1}
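Since GridSearchCV refits the best estimator by default, the fitted linear SVM reduces to a single dot product per sample. A quick equivalence check (illustrative, not in the original):

w = svc.best_estimator_.coef_.ravel()  # one learned weight per feature
b = svc.best_estimator_.intercept_[0]  # learned bias term
print(np.array_equal((X_test @ w + b) > 0, svc.predict(X_test)))  # True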
print(' Test Accuracy =', round(svc.score(X_test, y_test), 4))
Test Accuracy = 0.9899
t=time.time(); n_predict = 1000; svc.predict(X_test[0:n_predict]); t2 = time.time()
print('',round(t2-t, 5), 'Seconds to predict', n_predict,'labels')
0.01119 Seconds to predict 1000 labels
We've implemented a uniform-dimension sliding window search. The top and bottom of the frame (sky and hood) are discarded as they're outside our region of interest. A temporal prediction heatmap is implemented in the video pipeline section below.
def slide_window(img, size, overlap):
    # 'overlap' sets the step as a fraction of the window size:
    # 1.0 -> adjacent windows, 0.5 -> windows overlap by half
    # y is restricted to the 352:544 band (sky and hood excluded)
    sy, sx = np.mgrid[352:544:int(size*overlap), 0:1280:int(size*overlap)]
    sy, sx = sy.ravel(), sx.ravel()
    return list(map(lambda y,x: (img[y:y+size,x:x+size], (x,y)), sy, sx))
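A quick illustration (not in the original) of the return structure: each entry pairs an image patch with its top-left (x, y) coordinate in the frame:

demo = slide_window(np.zeros((720, 1280, 3), np.uint8), 64, 1.0)
patch, (x0, y0) = demo[0]
print(patch.shape, (x0, y0))  # (64, 64, 3) (0, 352)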
def get_rects(pred, win, size):
    # indices of windows classified as vehicle
    pos = np.argwhere(pred)
    # pull the (x, y) coordinate out of each (patch, (x, y)) entry
    cord = np.array(win, dtype=object)[pos][...,1][...,0]
    rects = [[int(c[0]), int(c[1]), int(c[0]+size), int(c[1]+size)] for c in cord]
    # duplicate the list so cv2.groupRectangles (threshold=1) keeps lone detections
    return cv2.groupRectangles(rects*2, 1, 0.1)[0]
def draw_rects(img, rects):
    for r in rects:
        cv2.rectangle(img, (r[0],r[1]), (r[2],r[3]), (0, 0, 255), 3)
    return img
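vis_detection is called in the cells below but never defined in the notebook; here is a minimal sketch consistent with its call sites, composing get_rects and draw_rects:

def vis_detection(img, pred, windows, size):
    # hypothetical helper (reconstructed): draw grouped boxes for
    # positive predictions onto a copy of the frame
    return draw_rects(img.copy(), get_rects(pred, windows, size))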
img = cv2.imread('./test_images/test1.jpg', cv2.IMREAD_COLOR)[:,:,::-1]
plt.imshow(img)
Small windows (64x64 pixels) reliably provide multiple detections per vehicle. However, they also produce a lot of false positives and carry a significant performance penalty.
small_window_list = slide_window(img, 64, 1.0)
all(ax.imshow(win[0]) for ax,win in zip(plt.subplots(5,20)[1].ravel(),small_window_list))
f_list = list(map(lambda im: extract_features(im[0]), small_window_list))
pred = svc.predict(normalize(f_list))
pred
array([False, False, False, False, False, False, False, False, True,
False, False, False, False, False, False, False, False, False,
False, False, False, True, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, True, True, True, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False, False], dtype=bool)
imout = vis_detection(img, pred, small_window_list,64)
plt.imshow(imout)
Medium windows (128x128 pixels) were found to be unreliable, even when combined with a generous overlap factor.
medium_window_list = slide_window(img, 128, 1.0)
all(ax.imshow(im[0]) for ax,im in zip(plt.subplots(3,10)[1].ravel(),medium_window_list))
True
f_list = list(map(lambda im: extract_features(cv2.resize(im[0],(64,64))), medium_window_list))
pred = svc.predict(normalize(f_list))
pred
array([False, False, False, False, False, False, False, False, False,
True, False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False, False,
False, False, False], dtype=bool)
imout = vis_detection(img, pred, medium_window_list,128)
plt.imshow(imout)
medium_window_list = slide_window(img, 128, 0.5)
f_list = list(map(lambda im: extract_features(cv2.resize(im[0],(64,64))), medium_window_list))
pred = svc.predict(normalize(f_list))
imout = vis_detection(img, pred, medium_window_list,128)
plt.imshow(imout)
Large windows were practically unusable without further dataset augmentation.
large_window_list = slide_window(img, 256, 1.0)
all(ax.imshow(im[0]) for ax,im in zip(plt.subplots(1,5)[1].ravel(),large_window_list))
f_list = list(map(lambda im: extract_features(cv2.resize(im[0],(64,64))), large_window_list))
pred = svc.predict(normalize(f_list))
pred
array([False, False, False, False, False, False, False, False, False, False], dtype=bool)
The main distinction of the video pipeline is the addition of a temporal heatmap, implemented as an array of per-window counters: a counter is incremented whenever its window fires and decremented otherwise, and a prediction is recognized as genuine only once its counter reaches a high value.
def heatmap(pred, heat):
    # increment counters for positive windows, decrement the rest
    heat = np.where(pred, heat+1, heat-1)
    # clamp counters to the range [1, 32]
    heat = np.where(heat<1, 1, heat)
    heat = np.where(heat>32, 32, heat)
    # report only windows whose counter has climbed above the threshold
    pred = np.where(heat>4, True, False)
    return pred, heat
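A quick trace (illustrative) of a single window's counter: starting from 1, a detection must persist for four consecutive frames before its counter exceeds the threshold of 4 and the window is reported:

heat = np.ones(1, dtype=np.uint8)
for frame in range(5):
    reported, heat = heatmap(np.array([True]), heat)
    print(frame, int(heat[0]), bool(reported[0]))
# heat climbs 2, 3, 4, 5, 6 -> 'reported' turns True on the 4th hit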
WINDOW_SIZE = 96

def process(img):
    windows = slide_window(img, WINDOW_SIZE, 0.3)
    features = list(map(lambda im: extract_features(cv2.resize(im[0],(64,64))), windows))
    predictions = svc.predict(normalize(features))
    predictions, process.heat = heatmap(predictions, process.heat)
    rects = get_rects(predictions, windows, WINDOW_SIZE)
    img = draw_rects(img, rects)
    return img

# one counter per window: with step int(96*0.3) = 28 over the 352:544 band,
# the grid is 7 rows x 46 columns = 322 windows
process.heat = np.ones(322, dtype=np.uint8)
video = VideoFileClip("project_video.mp4")#.subclip(27,30)
# video = VideoFileClip("test_video.mp4")
processed_video = video.fl_image(process)
processed_video.write_videofile("output_video.mp4", audio=False, progress_bar=False)
[MoviePy] >>>> Building video output_video.mp4
[MoviePy] Writing video output_video.mp4
[MoviePy] Done.
[MoviePy] >>>> Video ready: output_video.mp4
Link to Output Video
Link to Source Video
- There are a lot of false positives. There's much improvement to be gained from data augmentation and from tweaking the classifier or replacing it altogether.
- The implementation doesn't make use of possible data parallelism, leaving much room for improvement; currently, the pipeline suffers from very low overall throughput. Furthermore, some features are re-computed multiple times.
- No multi-scale, adaptive, randomized, or similar sliding-window techniques were used.