Finding Lane Lines on the Road (using hand-engineered CV methods)
The goals / steps of this project are the following:
- Make a pipeline that finds lane lines on the road
- Reflect on your work in a written report
I created a pipeline that consists of 3 main steps: creating a masking filter, selecting the relevant edges that most likely make up the lane separators (segmentation), and calculating a single combined line for the two closest separators. In this implementation I only focused on the two separators that make up the ego lane.
The pipeline runs per frame; there is no temporal tracking or information carried over between frames (I took this as a challenge: create reasonably stable output on the video without using tracking). The pipeline produces acceptable results on the sample videos, but it is hand-tuned to them and would certainly fail in more complex scenarios. It already fails on some parts of the challenge video.
The following steps are executed for each frame:
- Masking
  - Convert the image to HSV color space.
  - Select yellow-ish pixels based on hue (H).
    This keeps the yellow separators, but it also keeps all other yellow objects and some green vegetation.
  - Select bright areas based on saturation (S) and value (V), regardless of their hue.
    This selects the white separators, plus other bright features of the image. White cars are the most problematic of these.
  - Combine (OR) the two masks together.
  - Select a manually defined region of interest (ROI), which is roughly a triangle from the bottom of the image to the center, and cut the previous mask with it (AND them together).
    This is the most important filtering step right now, as it removes most of the artifacts left by the other operations. However, it is also the reason why most of the detected lanes end prematurely.
- Segmentation of lane separators
  - Convert to grayscale.
  - Blur.
    A bilateral filter is used in the final submission, as it blurs while preserving more of the edges. On the other hand, it is slower than a regular Gaussian blur.
  - Canny edge detection.
  - Hough transform. (A condensed code sketch of the masking and segmentation steps is shown after this list.)
- Calculation of a single line for the two closest separators
  - Calculate line parameters for all line segments.
    Much like in a regular Hough transform, I first calculate the parameters (`m`, `b`) for all lines. This is necessary because OpenCV does not return the line parameters when using the `HoughLinesP` function, which was used in the previous step. Additionally, a third parameter is added, describing whether the line is in the left or right half of the image.
  - Cluster line segments using DBSCAN.
    Clustering groups the segments that make up a whole lane separator while ignoring all other artifacts. The left/right parameter was added in the previous step because in some cases DBSCAN added erroneous short line sections to otherwise correct groups; these sections were usually on the other side of the image (hence their `m` and `b` were very similar). The third parameter helped the algorithm treat them as outliers.
  - Calculate combined line parameters for all clusters.
    Parameters (`m`, `b`) are calculated for each cluster of line segments.
  - Filter clusters based on their inclination.
    Remove those groups where `-0.3 < m < 0.3`, i.e. those that are too close to horizontal.
  - Select the closest cluster on each side from the remaining ones.
    From the remaining line groups, select the one with positive and the one with negative inclination that are closest to the center of the image. These are considered the most likely lane separators.
  - Draw.
    The lines are drawn from the bottom of the image up to the last actual point found by the edge detection. This avoids drawing lanes that are too long, as the two separators could otherwise cross each other.
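In OpenCV, the masking and segmentation steps boil down to something like the sketch below. The threshold values are illustrative placeholders, not the exact values tuned in the notebook.

```python
import cv2
import numpy as np

def mask_and_detect_segments(bgr_frame):
    """Masking + segmentation steps; all thresholds are illustrative."""
    h, w = bgr_frame.shape[:2]
    hsv = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2HSV)

    # Yellow-ish pixels by hue (also keeps other yellow objects).
    yellow = cv2.inRange(hsv, (15, 80, 80), (35, 255, 255))
    # Bright pixels by low saturation and high value, regardless of hue.
    bright = cv2.inRange(hsv, (0, 0, 200), (179, 40, 255))
    mask = cv2.bitwise_or(yellow, bright)

    # Roughly triangular ROI from the bottom corners to the image center.
    roi = np.zeros((h, w), dtype=np.uint8)
    triangle = np.array([[(0, h - 1), (w - 1, h - 1), (w // 2, h // 2)]])
    cv2.fillPoly(roi, triangle, 255)
    mask = cv2.bitwise_and(mask, roi)

    # Grayscale, edge-preserving blur, then Canny on the masked area.
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.bitwise_and(gray, mask)
    blurred = cv2.bilateralFilter(gray, d=9, sigmaColor=75, sigmaSpace=75)
    edges = cv2.Canny(blurred, 50, 150)

    # Probabilistic Hough transform; returns segments as (x1, y1, x2, y2).
    return cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=20,
                           minLineLength=20, maxLineGap=100)
```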
The following image shows the major steps of the pipeline (but not all steps are visualized):
I used Jupyter's `interact` to create an environment where the effect of each parameter can be explored fairly easily. This is still not very efficient, as only a limited number of images can be tested this way, so checking the final videos still proved to be the most effective method of evaluation. A minimal version of this setup is sketched below.
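Here `run_pipeline` and `test_image` are hypothetical stand-ins for the notebook's own pipeline entry point and test frame:

```python
from ipywidgets import interact
import matplotlib.pyplot as plt

# `run_pipeline` and `test_image` are stand-ins for the notebook's own
# pipeline function and test frame.
def tune(canny_low=50, canny_high=150, hough_threshold=20):
    plt.imshow(run_pipeline(test_image, canny_low, canny_high, hough_threshold))
    plt.axis('off')

# Each keyword becomes a slider; moving it re-runs the pipeline inline.
interact(tune, canny_low=(0, 255), canny_high=(0, 255), hough_threshold=(1, 100))
```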
For the third step (calculating a single line for the two closest separators), I could have simply taken the filtered image and applied another Hough transform to it, but I wanted more control and wanted to explore the options for grouping line segments together. DBSCAN might be a more complicated algorithm, but it worked well in this case.
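As a sketch, the clustering step looks roughly like this with scikit-learn's DBSCAN; `eps`, `side_weight`, and the lack of feature scaling are simplifications of the hand-tuned setup:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_segments(segments, img_width, side_weight=10.0):
    """Group Hough segments by (m, b, side); DBSCAN labels outliers as -1."""
    features = []
    for x1, y1, x2, y2 in segments[:, 0]:
        if x1 == x2:
            continue  # vertical segments have no finite slope in (m, b) form
        m = (y2 - y1) / (x2 - x1)
        b = y1 - m * x1
        # Third feature: -1 for the left half, +1 for the right half,
        # scaled so clusters cannot form across the image center.
        side = side_weight * np.sign((x1 + x2) / 2 - img_width / 2)
        features.append((m, b, side))
    # In practice m, b, and side should be scaled to comparable ranges.
    return DBSCAN(eps=15.0, min_samples=2).fit_predict(np.array(features))
```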
There are several major shortcomings to this implementation.
- Sensitive to white cars and other artifacts.
  One of the most important issues is that the pipeline is sensitive to artifacts, especially white cars and other objects close to the road with a strong gradient.
- Aggressive and fixed ROI.
  The previous issue is worked around by setting the upper boundary of the ROI a bit lower than what would be ideal. This makes the resulting videos look good, but of course it is more like cheating: it would break down if the road had a strong inclination, if the pitch of the vehicle changed significantly (e.g. on a bump), when the vehicle gets closer than usual to the leading vehicle, etc.
- Most of the parameters are tuned to accept lanes on straight roads or roads with a large curvature radius.
  This would break down when taking an exit or entering a road with a smaller curvature radius.
- Temporally unstable, no tracking.
  All detections are performed per frame, so the results can be totally different on successive frames.
- Parameters hand-selected for these videos, no validation on a larger set.
Possible improvements include the following:
- Use the (`rho`, `theta`) parametrization for lines everywhere instead of (`m`, `b`).
  I should have used it in the first place; now the code is littered with edge cases handling `m == 0.0` (see the conversion sketch after this list).
- Improve the criteria for selecting the final line segment group.
  The criterion currently used when selecting the best candidates from the possible line groups is very simplistic. It could easily be extended with the total length covered by the line segments, which would favor longer line segments over shorter artifactual groups that happen to be closer to the center of the image.
- Approximate the vanishing point, either from the current or the previous frame.
  The top of the ROI is an important parameter that is currently hard-coded. Ideally it should be the topmost point where a lane separator can still be seen, which is basically the vanishing point of the lanes. This could be estimated separately on each frame, using the already-found bottom part of the lanes, or it could be carried over from the previous frame. The former would require a bigger change, as it would essentially turn the algorithm into a two-pass pipeline.
- Use HSV for edge detection, then combine it with the grayscale results, instead of masking.
  Converting to HSV worked well, but using it only for masking has limited value. We could try to detect edges on the binary image that currently serves as the HSV mask, and combine these with the edges calculated from the grayscale image as a kind of voting.
- Add temporal tracking.
  Obviously, some form of tracking across successive frames would be essential for any real application.
- Get a labelled data set and automatically tune the parameters on it.
  The current parameters were hand-tuned, so they are guaranteed to be non-optimal even on this small test set. Ideally I would pick a data set that contains annotations for the lanes, then automatically optimize the parameters based on that. This would require selecting an appropriate 'goodness function' for the optimizer.
- Performance optimization.
  The current implementation does not run in real time (on my laptop), which would be a basic criterion for a real-world system. There are numerous possible optimizations, from small changes like reducing the number of temporary images to large ones like optimizing for GPUs.
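For the first improvement above, converting a segment's endpoints to the `x*cos(theta) + y*sin(theta) = rho` normal form could look like this (one possible sign convention):

```python
import numpy as np

def segment_to_rho_theta(x1, y1, x2, y2):
    """Normal form of the line through two points; no singularity for
    vertical lines, unlike the slope-intercept (m, b) form."""
    dx, dy = x2 - x1, y2 - y1
    theta = np.arctan2(dx, -dy)  # angle of the line's normal vector
    rho = x1 * np.cos(theta) + y1 * np.sin(theta)
    # Normalize theta into [0, pi), flipping rho's sign accordingly.
    if theta < 0:
        theta, rho = theta + np.pi, -rho
    elif theta >= np.pi:
        theta, rho = theta - np.pi, -rho
    return rho, theta
```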
Thanks for checking this out :)