Lane Detection with Turn Prediction
My Experience, Errors, and Results
Description:
This project describes a lane detection pipeline for vehicles: it highlights the lanes in a video while predicting the direction in which the car is turning, using OpenCV in Python.
As a quick search on the internet will show, there are many different approaches to lane detection that solve this problem. What I will share in this post is how I got started with this project and which methods I tried before selecting a final pipeline.
Why is finding lanes important?
Detecting lane lines is of utmost importance for cars that use a lane departure warning system, which corrects the position of the vehicle with respect to the detected lane. These days almost all vehicles come equipped with such a system, and it needs a robust lane detection algorithm whatever the lighting conditions on the road might be.
How to find lane lines with background noise?
Although the image captured by the camera on the vehicle gives you a pretty decent picture of the lane lines, it also contains a lot of unnecessary information that can be a hurdle for your detection algorithm. Be sure to remove this noise before you start processing.
Define the Region of interest
A region of interest (ROI) is a portion of an image that you want to filter or perform some other operation on.
Lanes form a small portion of the image taken by the camera, and you want to make sure that you are processing only that particular region and not the whole image. To do this, you define your ROI after getting a good enough estimate from the video feed of your camera. Once you have your ROI defined in terms of pixels, you can crop the image to the region that contains the relevant information.
Note: The above image also has distortion correction applied to it. This can be done easily using OpenCV's undistort function; you can read about distortion correction for images here.
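As a minimal sketch, assuming mtx and dist are the camera matrix and distortion coefficients obtained from a prior calibration (for example, with cv2.calibrateCamera on chessboard images):

import cv2

def undistort_frame(frame, mtx, dist):
    # Remaps the image so that straight lines in the world stay straight.
    # mtx: 3x3 camera matrix, dist: distortion coefficients from calibration.
    return cv2.undistort(frame, mtx, dist, None, mtx)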
Now there are many ways you can crop your frame to get the ROI. One popular way is to crop out the sky and keep only the road view; this gives you a rectangular frame, but the problem with this kind of cropping is that the lane lines of neighboring roads might interfere with your current lane data. Another approach is to crop out a triangular region from the frame, which omits the neighboring lanes before processing.
Usually, the final region of interest still depends on the pipeline you want to use, and I would highly recommend experimenting as much as you can before selecting an option, to see which one gives you better results.
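For example, here is a minimal sketch of the triangular ROI mask; the vertex positions are illustrative and depend on your camera placement:

import cv2
import numpy as np

def region_of_interest(frame):
    # Keep only a triangular region that roughly covers our own lane.
    h, w = frame.shape[:2]
    # Triangle: bottom-left, bottom-right, and a point near the horizon.
    vertices = np.array([[(0, h - 1), (w - 1, h - 1), (w // 2, int(h * 0.6))]],
                        dtype=np.int32)
    mask = np.zeros_like(frame)
    color = (255,) * frame.shape[2] if frame.ndim == 3 else 255
    cv2.fillPoly(mask, vertices, color)
    # Black out everything outside the triangle.
    return cv2.bitwise_and(frame, mask)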
Decide whether you want to look at the lane head-on or from a bird's eye view
While many approaches will tell you to directly convert your image to a bird's eye view, so that the parallel lane lines appear parallel in your image, I first tried solving the problem without doing so. I had read about Hough lines and decided to apply them directly to the image frame after pre-processing it with a grayscale threshold and a Canny edge detector. The result, when bitwise-added to the cropped frame, could detect the lane lines, but it was not a good fit because straight line segments cannot follow the curves.
I tried playing around with the parameters before finally deciding to go with the bird's eye view.
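For completeness, here is roughly what that first attempt looked like; the Canny and Hough parameters below are illustrative, not my exact values:

import cv2
import numpy as np

def detect_lines_hough(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)  # Canny edge detector
    # Probabilistic Hough transform returns straight line segments.
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=100)
    line_img = np.zeros_like(frame)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(line_img, (x1, y1), (x2, y2), (0, 0, 255), 3)
    # Bitwise-add the detected segments onto the frame.
    return cv2.bitwise_or(frame, line_img)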
A bird’s-eye view is an elevated view of an object from above, with a perspective as though the observer were a bird, often used in the making of blueprints, floor plans, and maps. It can be an aerial photograph, but also a drawing.
Why bird’s eye?
If you think in two dimensions, a road is just a rectangle with yellow and white lines printed on it. When you see it from a car, the parallel lane lines seem to converge at a point, and hence it is difficult to fit a model of lines to them; but when you look from the top, a rectangle appears as a rectangle and the parallel lines remain parallel. This view is also called a top view in orthographic projection.
The bird's eye view can be generated by taking 4 points on the lane lines in the image frame as source points, choosing (the choice is up to the user) another 4 points that represent 2 parallel lines as destination points, and then finding the transformation matrix between these two sets of points.
The transformation matrix can be generated using the OpenCV function getPerspectiveTransform; this function returns a matrix which can be used with another function, warpPerspective, to transform the frame image.
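Here is a minimal sketch, with hypothetical source and destination points for a 1280x720 frame; in practice the source points come from your own ROI estimate:

import cv2
import numpy as np

# Four points on the lane lines (head-on view) ...
src = np.float32([[580, 460], [700, 460], [1040, 680], [240, 680]])
# ... mapped to four points forming two parallel vertical lines.
dst = np.float32([[300, 0], [980, 0], [980, 720], [300, 720]])

M = cv2.getPerspectiveTransform(src, dst)     # head-on -> bird's eye
Minv = cv2.getPerspectiveTransform(dst, src)  # bird's eye -> head-on

def to_birds_eye(frame):
    h, w = frame.shape[:2]
    return cv2.warpPerspective(frame, M, (w, h))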
Color space: which one should I use?
Once you have the bird's eye view of the lanes, you can continue processing the image by removing everything from the frame except the lane lines, for easier detection. This can be achieved in multiple ways too; the one I will talk about is color space conversion, which takes the initial frame from RGB format and converts it to another color space, in my case HSL.
After that, masks for yellow and white are created, which are then bitwise-ANDed with the image frame to keep only the white and the yellow lane lines in the image. This image is then converted to a single-channel image by converting it from HSL to grayscale.
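A sketch of this step; note that OpenCV names this color space HLS, and the threshold ranges below are assumptions you would tune for your own footage:

import cv2
import numpy as np

def isolate_lane_colors(frame_bgr):
    hls = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HLS)
    # White: high lightness; yellow: hue roughly between 15 and 35.
    white_mask = cv2.inRange(hls, np.array([0, 200, 0]), np.array([180, 255, 255]))
    yellow_mask = cv2.inRange(hls, np.array([15, 50, 100]), np.array([35, 255, 255]))
    mask = cv2.bitwise_or(white_mask, yellow_mask)
    masked = cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)
    # Reduce to a single channel for the pixel search that follows.
    return cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)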
[Image: the HSL-converted frame]
The resultant output of the above process is an image with the lane lines marked in white. After this conversion, detecting the lane lines is merely a matter of finding where the pixel values in the image are non-zero.
Is lane detection complete?
Well, no, because we still have to decide which pixels correspond to the left lane and which to the right one, to be able to correctly predict the lanes.
To differentiate between the left and the right lane pixel value, we use histograms.
An image histogram is a type of histogram that acts as a graphical representation of the tonal distribution in a digital image.
The histogram tells us how the data is distributed within an image. Using the two peak values, one from the left half and one from the right half, we can find the midpoint from which to divide the frame into two smaller regions, one containing the left lane and the other containing the right lane.
Note: we are not cropping the frame, but just defining the regions within a frame
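As a rough sketch of the histogram step, assuming binary_warped is the single-channel bird's-eye image from earlier:

import numpy as np

def lane_base_positions(binary_warped):
    # Sum pixel values column-wise over the lower half of the frame,
    # where the lane lines are closest to the car and clearest.
    histogram = np.sum(binary_warped[binary_warped.shape[0] // 2:, :], axis=0)
    midpoint = histogram.shape[0] // 2
    left_base = np.argmax(histogram[:midpoint])              # left-lane peak
    right_base = np.argmax(histogram[midpoint:]) + midpoint  # right-lane peak
    return left_base, right_base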
Once we have the regions defined, we use the peak value information from each region and divide the frame height into further smaller sections called windows. These windows can be visualized as smaller frames with just a single lane line in them. Within each window, we find the indexes of the pixels which are non-zero, because these define our lanes; to do this we use NumPy's nonzero function.
The same process is continued in a loop for all the windows in a frame, and the pixel indexes are stored in lists for each window. The location of the window shifts to its left or right depending on the mean of the pixel indexes found in it; this is done to keep the lane lines inside the window, since in a steep turn the lanes can drift out of the window area.
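Putting the window search together, here is a rough sketch; the window count, margin, and minimum pixel count below are typical values, not necessarily the ones I used:

import numpy as np

def sliding_window_search(binary_warped, left_base, right_base,
                          n_windows=9, margin=100, minpix=50):
    h = binary_warped.shape[0]
    window_height = h // n_windows
    nonzero_y, nonzero_x = binary_warped.nonzero()  # indexes of lit pixels
    left_current, right_current = left_base, right_base
    left_idx, right_idx = [], []

    for window in range(n_windows):
        # Vertical extent of this window, counted up from the bottom.
        y_low = h - (window + 1) * window_height
        y_high = h - window * window_height
        in_band = (nonzero_y >= y_low) & (nonzero_y < y_high)
        good_left = (in_band & (nonzero_x >= left_current - margin) &
                     (nonzero_x < left_current + margin)).nonzero()[0]
        good_right = (in_band & (nonzero_x >= right_current - margin) &
                      (nonzero_x < right_current + margin)).nonzero()[0]
        left_idx.append(good_left)
        right_idx.append(good_right)
        # Re-center the next window on the mean x of the pixels found,
        # so the window follows the lane through a steep turn.
        if len(good_left) > minpix:
            left_current = int(nonzero_x[good_left].mean())
        if len(good_right) > minpix:
            right_current = int(nonzero_x[good_right].mean())

    left_idx = np.concatenate(left_idx)
    right_idx = np.concatenate(right_idx)
    return (nonzero_x[left_idx], nonzero_y[left_idx],
            nonzero_x[right_idx], nonzero_y[right_idx])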
Fitting a curve over the found pixel values
Once we have all the pixel locations for both lanes in 2 separate lists, we can go on to find a polynomial that fits these pixel locations on a frame. For this, the user has a choice of the degree of polynomial to fit. Selecting 1 is not a logical choice here, because straight lines won't fit the curves, and the console also prints a warning saying that the fit is poor.
Choosing the degree of the polynomial as 2 seems the most logical choice, and the polynomial can be found using NumPy's polyfit function. For a second-degree fit, this function returns 3 values, which are the coefficients of the general second-order equation. These coefficients will be used to predict the turn.
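For example, assuming left_x, left_y, right_x, right_y are the pixel coordinates collected by the window search above:

import numpy as np

# Fit x = A*y**2 + B*y + C for each lane; x is modeled as a function of y
# because the lane lines are near-vertical in the bird's-eye view.
left_fit = np.polyfit(left_y, left_x, 2)    # returns [A, B, C]
right_fit = np.polyfit(right_y, right_x, 2)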
Now, to fill the region formed by the detected lane lines in the video, we use the function fillPoly, which fills the polygon formed by the pixel coordinates given to it. We pass the detected pixel coordinates in the correct order to this function, which results in filling up the area between the lanes with the chosen color.
How to plot this area on the actual frame?
Once you have the area shaded in the bird's eye view, you can transform it back to the head-on view using the inverse of the transformation found earlier. This gives you the shaded region in your camera view; then use the addWeighted function to add this to the original frame image, so that the detected lane region appears translucent on it.
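A sketch of these two steps combined, assuming Minv is the inverse transformation matrix from the perspective step and the fits come from polyfit:

import cv2
import numpy as np

def draw_lane_area(frame, binary_warped, left_fit, right_fit, Minv):
    h = binary_warped.shape[0]
    plot_y = np.linspace(0, h - 1, h)
    left_x = np.polyval(left_fit, plot_y)
    right_x = np.polyval(right_fit, plot_y)

    # Order the boundary points so they trace the outline of the lane area.
    pts_left = np.array([np.transpose(np.vstack([left_x, plot_y]))])
    pts_right = np.array([np.flipud(np.transpose(np.vstack([right_x, plot_y])))])
    pts = np.hstack((pts_left, pts_right)).astype(np.int32)

    overlay = np.zeros_like(frame)
    cv2.fillPoly(overlay, pts, (0, 255, 0))  # fill the area between the lanes
    # Warp the shaded region back to the camera (head-on) view.
    unwarped = cv2.warpPerspective(overlay, Minv,
                                   (frame.shape[1], frame.shape[0]))
    # Translucent blend of the shaded region onto the original frame.
    return cv2.addWeighted(frame, 1.0, unwarped, 0.3, 0.0)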
Turn Prediction
The turn, or the curvature of the road, can be found using the results from the polyfit function. The 2nd-degree polynomial coefficients are put together in the formula given here to find the curvature. The results can then be used to define thresholds for straight, left, and right turns.
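For reference, the standard radius-of-curvature formula for a second-degree fit x = A*y^2 + B*y + C, which I assume is the formula linked above, can be computed like this:

import numpy as np

def radius_of_curvature(fit, y_eval):
    # R = (1 + (2*A*y + B)**2)**1.5 / |2*A|, evaluated at y = y_eval
    # (typically the bottom of the frame, closest to the car).
    A, B = fit[0], fit[1]
    return (1 + (2 * A * y_eval + B) ** 2) ** 1.5 / abs(2 * A)

A large radius can be thresholded as "straight", while the sign of the leading coefficient distinguishes left from right; the exact thresholds are up to you.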
Once we have the turn information, it can be printed on the frame using the putText function.
You can find the code for this project, along with the report, here.
I found the YouTube video "Ross Kippenbrock — Finding Lane Lines for Self-Driving Cars" extremely helpful, and I urge you to watch it, but only after you have tried a few lane detection techniques yourself.
I would love to hear your feedback on this project and its implementation. Leave a comment about what you think, and if you face any difficulties, feel free to reach out.
I will be back with more interesting content; until then, keep reading and supporting. Cheers!