Self-Driving Car (Simulation)
This article is best read in conjunction with the associated Git project.
OVERVIEW
Date: 14th Feb 2024
In this project, my aim is to implement the Convolutional Neural Network (CNN) architecture proposed by NVIDIA for self-driving vehicles, as detailed in the following paper: https://arxiv.org/abs/1604.07316. The core idea is to feed images of the road captured by cameras into the network, allowing it to predict the steering angle needed to keep the vehicle on its trajectory. This implementation specifically targets steering angle prediction, assuming a constant speed for the vehicle. It does not take into account other inputs such as traffic lights, signs, or potential collisions with objects like vehicles or pedestrians. Please use the GitHub link provided above to delve deeper into the implementation, and feel free to try it yourself.
BASICS
The project uses the free Udacity simulator for training and experimenting with the machine learning model. The simulator offers two modes:
- Training
- In training mode, we capture images of the road from three cameras (left, center, and right) along with the corresponding steering angle as we drive around the map. The record button, shown in the simulator screenshot above, is used to start this recording process.
- Autonomous
- In autonomous mode, the simulator exchanges data with a driver script over localhost port 4567. We run our trained model, feeding it the incoming camera images in real time; the model predicts the steering angle, which is relayed back to the simulator for execution.
WORKING
What kind of data do we have?
- IMG folder
- Within this folder, you'll find all the images recorded while navigating the car on the map during training mode. Each image is named based on its respective camera position.
- Tabular data
- Following that, there is a CSV file containing the image locations in the first three columns, corresponding to the three cameras. Additionally, the file contains numerical data such as throttle, speed, and steering angle in subsequent columns.
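As a point of reference, here is a minimal sketch of how this log file might be loaded with pandas. The file name driving_log.csv and the column names are assumptions based on the Udacity simulator's default output, not something taken from the project itself:

```python
import pandas as pd

# Column layout produced by the Udacity simulator's recording mode
# (file name and column names are assumptions, used here for illustration).
columns = ['center', 'left', 'right', 'steering', 'throttle', 'brake', 'speed']
data = pd.read_csv('driving_log.csv', names=columns)

# Keep only the file name of each image, dropping the absolute recording path
for camera in ['center', 'left', 'right']:
    data[camera] = data[camera].apply(lambda p: p.split('/')[-1].strip())

print(data[['steering', 'throttle', 'speed']].describe())
```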
Proposed pipeline
- In the image below, we can observe the pipeline that will be utilized to predict the steering angle.
- The preprocessed image from the camera serves as input to the deep neural network, which subsequently predicts the steering angle of the vehicle.
Simple exploratory data analysis
- Now that we've identified our prediction target, namely the steering angle, we can conduct an exploratory data analysis to gain insights into the distribution of the data points we'll be predicting.
- Since our implementation does not utilize any additional numerical or categorical data, our analysis will focus solely on the steering angle.
- In the histogram below, we observe that data points with a steering angle of zero far outnumber those at other angles, so the imbalance in the steering data is evident.
- This imbalance arises because the car is driven mostly in a straight line while recording, so the steering angle stays close to zero for a large portion of the time.
- To address this, we balance the data by limiting the number of data points in each bin of the histogram, capping each bin at 400 samples. The resulting histogram is displayed below; the distribution now looks much closer to a normal (Gaussian) distribution.
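A minimal sketch of this balancing step, assuming the steering angles live in a pandas DataFrame called data as in the earlier sketch. Only the cap of 400 samples per bin comes from the text; the number of bins and the variable names are illustrative:

```python
import numpy as np

num_bins = 25          # number of histogram bins (illustrative)
samples_per_bin = 400  # cap mentioned in the text

hist, bins = np.histogram(data['steering'], num_bins)

# For each bin, shuffle its samples and mark everything beyond the cap for removal
remove_list = []
for i in range(num_bins):
    in_bin = data[(data['steering'] >= bins[i]) & (data['steering'] <= bins[i + 1])]
    bin_indices = np.random.permutation(in_bin.index.tolist())
    remove_list.extend(bin_indices[samples_per_bin:])

data = data.drop(index=list(set(remove_list)))
print('Samples remaining after balancing:', len(data))
```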
Image augmentation & preprocessing
- We have images from three cameras: left, center, and right. The recorded steering angle corresponds to the center camera, so the label must be corrected when using the left and right images: for the left camera image we add 0.15 to the steering angle, and for the right camera image we subtract 0.15 (a brief sketch of this correction follows this list).
- Now that we have the images and their corresponding steering angles, the data is ready to be split into training and testing sets.
- We apply data augmentation techniques such as rotation, flipping, and panning to enrich the training data (one of these is illustrated in the sketch below).
- The images must undergo preprocessing before training. First, we crop the images to remove unnecessary background such as trees. Next, we apply a Gaussian blur. Finally, we convert the image to YUV format, as expected by the CNN architecture proposed by NVIDIA. An example of an image in YUV format is shown below.
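A minimal sketch of the camera correction and of one of the augmentations (the horizontal flip), assuming OpenCV is used for image handling. The 0.15 offset comes from the text; the function names, the flip probability, and the reuse of the DataFrame row layout from the earlier sketches are illustrative assumptions:

```python
import cv2
import numpy as np

STEERING_CORRECTION = 0.15  # offset applied to the side cameras (from the text)

def load_image_and_angle(row, camera):
    """Load one of the three camera images and correct its steering label."""
    angle = float(row['steering'])
    if camera == 'left':
        angle += STEERING_CORRECTION
    elif camera == 'right':
        angle -= STEERING_CORRECTION
    # Convert to RGB so training and inference see the same channel order
    image = cv2.cvtColor(cv2.imread('IMG/' + row[camera]), cv2.COLOR_BGR2RGB)
    return image, angle

def random_flip(image, angle):
    """Mirror the image horizontally and negate the steering angle."""
    if np.random.rand() < 0.5:
        image = cv2.flip(image, 1)
        angle = -angle
    return image, angle
```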
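And a sketch of the preprocessing described in the last bullet. The crop rows and the 200x66 resize target are assumptions: the resize follows the input dimensions in the NVIDIA paper, and the exact crop region in the project's code may differ:

```python
import cv2

def preprocess(image):
    """Crop the background, blur, convert to YUV, and resize to the network input."""
    image = image[60:135, :, :]                     # crop away background and hood (rows are illustrative)
    image = cv2.GaussianBlur(image, (3, 3), 0)      # light blur to reduce noise
    image = cv2.cvtColor(image, cv2.COLOR_RGB2YUV)  # the NVIDIA architecture expects YUV planes
    image = cv2.resize(image, (200, 66))            # 200x66 input used in the NVIDIA paper
    return image / 255.0                            # normalize pixel values to [0, 1]
```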
Implementing the CNN architecture & training
- Now that the data is prepared, we construct the CNN architecture shown in the image below. This architecture, proposed by NVIDIA, forms the backbone of our model. Note, however, that the code does not use every layer, owing to the time and memory constraints of the free GPU on Google Colab.
- Below is an illustration showcasing the layers involved in the training of our model.
- The model is trained with a batch size of 100, for 10 epochs, with 300 steps per epoch (a sketch of this setup follows this list). The resulting loss curve is shown below.
- The trained model is serialized and downloaded to the local computer.
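For illustration, here is a hedged Keras sketch of a trimmed-down version of the NVIDIA architecture together with the training call using the parameters above. The layer sizes follow the NVIDIA paper, but exactly which layers were dropped in the project's notebook, the optimizer settings, the validation parameters, and the batch_generator helper are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam

def build_model():
    """A reduced variant of NVIDIA's architecture for steering-angle regression."""
    model = Sequential([
        # Convolutional feature extractor with strided convolutions
        Conv2D(24, (5, 5), strides=(2, 2), activation='elu', input_shape=(66, 200, 3)),
        Conv2D(36, (5, 5), strides=(2, 2), activation='elu'),
        Conv2D(48, (5, 5), strides=(2, 2), activation='elu'),
        Conv2D(64, (3, 3), activation='elu'),
        Flatten(),
        # Fully connected head ending in a single steering-angle output
        Dense(100, activation='elu'),
        Dense(50, activation='elu'),
        Dense(10, activation='elu'),
        Dense(1),
    ])
    model.compile(optimizer=Adam(learning_rate=1e-3), loss='mse')
    return model

model = build_model()

# X_train, y_train, X_valid, y_valid come from the train/test split described earlier;
# batch_generator is a hypothetical helper yielding (images, angles) batches.
history = model.fit(
    batch_generator(X_train, y_train, batch_size=100, training=True),
    steps_per_epoch=300,
    epochs=10,
    validation_data=batch_generator(X_valid, y_valid, batch_size=100, training=False),
    validation_steps=200,  # illustrative value
)

model.save('model.h5')  # serialize the trained model for the driver script
```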
Driver snippet to put the trained model to action
- Now that we have the serialized model in .h5 format, we can construct the pipeline: the preprocessed image is fed to the CNN, which predicts the corresponding steering angle (refer to the proposed pipeline section above).
- The preprocessing applied to images at prediction time must mirror the preprocessing used during training.
- We use a Flask / Socket.IO server listening on localhost port 4567 to communicate with the simulator.
- We run the driver script and then open the simulator in the “Autonomous” mode to accept steering values.
- Remember we are only predicting the steering angle. The speed of the car remains constant.
- The following code snippet illustrates the driver script and its utilization of the model for prediction, as well as how it communicates the results to the simulator.
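Below is a minimal sketch of such a driver script, assuming the usual python-socketio + Flask + eventlet setup used with the Udacity simulator. The event names ('telemetry', 'steer'), the payload keys, and the target speed are assumptions based on the simulator's common interface, and preprocess() is the function sketched earlier:

```python
import base64
from io import BytesIO

import eventlet
import numpy as np
import socketio
from flask import Flask
from PIL import Image
from tensorflow.keras.models import load_model

sio = socketio.Server()
app = Flask(__name__)

model = load_model('model.h5')  # serialized model from training
SPEED_LIMIT = 10                # keeps the car at a roughly constant speed (illustrative)

@sio.on('telemetry')
def telemetry(sid, data):
    # Decode the center-camera frame sent by the simulator
    image = Image.open(BytesIO(base64.b64decode(data['image'])))
    image = preprocess(np.asarray(image))  # same preprocessing as training
    steering = float(model.predict(np.array([image]))[0][0])
    # Simple throttle rule so the speed stays near the limit; only steering is learned
    throttle = 1.0 - float(data['speed']) / SPEED_LIMIT
    send_control(steering, throttle)

@sio.on('connect')
def connect(sid, environ):
    send_control(0.0, 0.0)

def send_control(steering, throttle):
    sio.emit('steer', data={'steering_angle': str(steering),
                            'throttle': str(throttle)})

if __name__ == '__main__':
    # Wrap the Flask app with the Socket.IO middleware and listen on port 4567
    app = socketio.WSGIApp(sio, app)
    eventlet.wsgi.server(eventlet.listen(('', 4567)), app)
```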
SUMMARY
- This article is best read alongside the code implementation on Git
- We use only the images and steering angles in this project
- Alternative simulators such as CARLA offer dynamic environments featuring pedestrians, traffic signals, and various traffic signs.
- There are several avenues for enhancing this model, including:
- Hyper-parameter tuning (learning rate, number of epochs, batch size, etc.)
- Data augmentation and other preprocessing techniques
- Trying a different CNN architecture
- Refining the layers in the existing model, for example by adding BatchNormalization layers or replacing pooling operations with strided convolutions; these adjustments aim to improve model performance and efficiency (a small sketch of such a block follows this list).
- Incorporating a dynamic environment would necessitate adopting mixed modeling approaches to accommodate additional numerical and categorical data. This is an aspect I intend to explore in the next iteration of this project.
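To make the refinement suggestion above concrete, here is a small hedged sketch of what such a block could look like in Keras; it is purely illustrative and not part of the current implementation:

```python
from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D

def refined_conv_block(filters):
    """A convolutional block with BatchNormalization and a strided convolution
    in place of pooling; a possible refinement, not the current implementation."""
    return [
        Conv2D(filters, (3, 3), strides=(2, 2), padding='same', use_bias=False),
        BatchNormalization(),  # stabilizes training and allows higher learning rates
        Activation('elu'),
    ]
```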