Self-Driving Car (Simulation)
This article is best read in conjunction with the associated Git project.
OVERVIEW
Date: 14th Feb 2024
In this project, my aim is to implement the Convolutional Neural Network (CNN) architecture proposed by NVIDIA for self-driving vehicles, as detailed in the following paper: https://arxiv.org/abs/1604.07316. The core idea is to feed images of the road captured by cameras into the network, allowing it to predict the steering angle needed to keep the vehicle on its trajectory. This implementation specifically targets steering angle prediction, assuming a constant speed for the vehicle. It does not take into account other inputs such as traffic lights, signs, or potential collisions with objects like vehicles or pedestrians. Please use the GitHub link provided above to delve deeper into the implementation, and feel free to try it yourself.
BASICS
The project uses the free Udacity simulator for training and experimenting with the machine learning model. The simulator offers two modes:
- Training
- In training mode, we capture images of the road from three cameras (left, center, and right) along with the corresponding steering angle as we drive around the map. The record button, shown in the simulator screenshot above, is used to start this recording process.
- Autonomous
- In autonomous mode, the simulator exchanges data with a driver script over localhost port 4567. We run our trained model, feeding it the incoming camera images in real time; the model predicts the steering angle, which is relayed back to the simulator for execution.
WORKING
What kind of data do we have?
- IMG folder
- Within this folder, you'll find all the images recorded while navigating the car on the map during training mode. Each image is named based on its respective camera position.
- Tabular data
- Following that, there is a CSV file containing the image locations in the first three columns, corresponding to the three cameras. Additionally, the file contains numerical data such as throttle, speed, and steering angle in subsequent columns.
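As a point of reference, here is a minimal sketch of how this log file might be loaded with pandas. The file name driving_log.csv and the column names are assumptions based on the Udacity simulator's default output, not something taken from the project itself:

```python
import pandas as pd

# Column layout produced by the Udacity simulator's recording mode
# (file name and column names are assumptions, used here for illustration).
columns = ['center', 'left', 'right', 'steering', 'throttle', 'brake', 'speed']
data = pd.read_csv('driving_log.csv', names=columns)

# Keep only the file name of each image, dropping the absolute recording path
for camera in ['center', 'left', 'right']:
    data[camera] = data[camera].apply(lambda p: p.split('/')[-1].strip())

print(data[['steering', 'throttle', 'speed']].describe())
```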
Proposed pipeline
- In the image below, we can observe the pipeline that will be utilized to predict the steering angle.
- The preprocessed image from the camera serves as input to the deep neural network, which subsequently predicts the steering angle of the vehicle.
Simple exploratory data analysis
- Now that we've identified our prediction target, namely the steering angle, we can conduct an exploratory data analysis to gain insights into the distribution of the data points we'll be predicting.
- Since our implementation does not utilize any additional numerical or categorical data, our analysis will focus solely on the steering angle.
- In the histogram below, we observe that data points with a steering angle of zero far outnumber those at other angles, so the imbalance in the steering data is evident.
- This imbalance arises because the car is driven mostly in a straight line while recording, so the steering angle stays close to zero for a large portion of the time.
- To address this, we balance the data by limiting the number of data points in each bin of the histogram, capping each bin at 400 samples. The resulting histogram is displayed below; the distribution now looks much closer to a normal (Gaussian) distribution.
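A minimal sketch of this balancing step, assuming the steering angles live in a pandas DataFrame called data as in the earlier sketch. Only the cap of 400 samples per bin comes from the text; the number of bins and the variable names are illustrative:

```python
import numpy as np

num_bins = 25          # number of histogram bins (illustrative)
samples_per_bin = 400  # cap mentioned in the text

hist, bins = np.histogram(data['steering'], num_bins)

# For each bin, shuffle its samples and mark everything beyond the cap for removal
remove_list = []
for i in range(num_bins):
    in_bin = data[(data['steering'] >= bins[i]) & (data['steering'] <= bins[i + 1])]
    bin_indices = np.random.permutation(in_bin.index.tolist())
    remove_list.extend(bin_indices[samples_per_bin:])

data = data.drop(index=list(set(remove_list)))
print('Samples remaining after balancing:', len(data))
```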
Image augmentation & preprocessing
- We have images from three cameras: left, center, and right. The recorded steering angle corresponds to the center camera, so the label must be corrected when using the left and right images: for the left camera image we add 0.15 to the steering angle, and for the right camera image we subtract 0.15 (a brief sketch of this correction follows this list).
- Now that we have the images and their corresponding steering angles, the data is ready to be split into training and testing sets.
- We apply data augmentation techniques such as rotation, flipping, and panning to enrich the training data (one of these is illustrated in the sketch below).
- The images must undergo preprocessing before training. First, we crop the images to remove unnecessary background such as trees. Next, we apply a Gaussian blur. Finally, we convert the image to YUV format, as expected by the CNN architecture proposed by NVIDIA. An example of an image in YUV format is shown below.
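A minimal sketch of the camera correction and of one of the augmentations (the horizontal flip), assuming OpenCV is used for image handling. The 0.15 offset comes from the text; the function names, the flip probability, and the reuse of the DataFrame row layout from the earlier sketches are illustrative assumptions:

```python
import cv2
import numpy as np

STEERING_CORRECTION = 0.15  # offset applied to the side cameras (from the text)

def load_image_and_angle(row, camera):
    """Load one of the three camera images and correct its steering label."""
    angle = float(row['steering'])
    if camera == 'left':
        angle += STEERING_CORRECTION
    elif camera == 'right':
        angle -= STEERING_CORRECTION
    # Convert to RGB so training and inference see the same channel order
    image = cv2.cvtColor(cv2.imread('IMG/' + row[camera]), cv2.COLOR_BGR2RGB)
    return image, angle

def random_flip(image, angle):
    """Mirror the image horizontally and negate the steering angle."""
    if np.random.rand() < 0.5:
        image = cv2.flip(image, 1)
        angle = -angle
    return image, angle
```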
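And a sketch of the preprocessing described in the last bullet. The crop rows and the 200x66 resize target are assumptions: the resize follows the input dimensions in the NVIDIA paper, and the exact crop region in the project's code may differ:

```python
import cv2

def preprocess(image):
    """Crop the background, blur, convert to YUV, and resize to the network input."""
    image = image[60:135, :, :]                     # crop away background and hood (rows are illustrative)
    image = cv2.GaussianBlur(image, (3, 3), 0)      # light blur to reduce noise
    image = cv2.cvtColor(image, cv2.COLOR_RGB2YUV)  # the NVIDIA architecture expects YUV planes
    image = cv2.resize(image, (200, 66))            # 200x66 input used in the NVIDIA paper
    return image / 255.0                            # normalize pixel values to [0, 1]
```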
Implementing the CNN architecture & training
- Now that the data is prepared, we construct the CNN architecture shown in the image below. This architecture, proposed by NVIDIA, forms the backbone of our model. Note, however, that the code does not use every layer, owing to the time and memory constraints of the free GPU on Google Colab.
- Below is an illustration showcasing the layers involved in the training of our model.
- The model is trained with a batch size of 100, for 10 epochs, with 300 steps per epoch (a sketch of this setup follows this list). The resulting loss curve is shown below.
- The trained model is serialized and downloaded to the local computer.
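For illustration, here is a hedged Keras sketch of a trimmed-down version of the NVIDIA architecture together with the training call using the parameters above. The layer sizes follow the NVIDIA paper, but exactly which layers were dropped in the project's notebook, the optimizer settings, the validation parameters, and the batch_generator helper are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam

def build_model():
    """A reduced variant of NVIDIA's architecture for steering-angle regression."""
    model = Sequential([
        # Convolutional feature extractor with strided convolutions
        Conv2D(24, (5, 5), strides=(2, 2), activation='elu', input_shape=(66, 200, 3)),
        Conv2D(36, (5, 5), strides=(2, 2), activation='elu'),
        Conv2D(48, (5, 5), strides=(2, 2), activation='elu'),
        Conv2D(64, (3, 3), activation='elu'),
        Flatten(),
        # Fully connected head ending in a single steering-angle output
        Dense(100, activation='elu'),
        Dense(50, activation='elu'),
        Dense(10, activation='elu'),
        Dense(1),
    ])
    model.compile(optimizer=Adam(learning_rate=1e-3), loss='mse')
    return model

model = build_model()

# X_train, y_train, X_valid, y_valid come from the train/test split described earlier;
# batch_generator is a hypothetical helper yielding (images, angles) batches.
history = model.fit(
    batch_generator(X_train, y_train, batch_size=100, training=True),
    steps_per_epoch=300,
    epochs=10,
    validation_data=batch_generator(X_valid, y_valid, batch_size=100, training=False),
    validation_steps=200,  # illustrative value
)

model.save('model.h5')  # serialize the trained model for the driver script
```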
Driver snippet to put the trained model to action
- Now that we have the serialized model in .h5 format, we can construct the pipeline: the preprocessed image is fed to the CNN, which predicts the corresponding steering angle (refer to the proposed pipeline section above).
- The preprocessing applied to images at prediction time must mirror the preprocessing used during training.
- We use a Flask / Socket.IO server listening on localhost port 4567 to communicate with the simulator.
- We run the driver script and then open the simulator in the “Autonomous” mode to accept steering values.
- Remember we are only predicting the steering angle. The speed of the car remains constant.
- The following code snippet illustrates the driver script and its utilization of the model for prediction, as well as how it communicates the results to the simulator.
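Below is a minimal sketch of such a driver script, assuming the usual python-socketio + Flask + eventlet setup used with the Udacity simulator. The event names ('telemetry', 'steer'), the payload keys, and the target speed are assumptions based on the simulator's common interface, and preprocess() is the function sketched earlier:

```python
import base64
from io import BytesIO

import eventlet
import numpy as np
import socketio
from flask import Flask
from PIL import Image
from tensorflow.keras.models import load_model

sio = socketio.Server()
app = Flask(__name__)

model = load_model('model.h5')  # serialized model from training
SPEED_LIMIT = 10                # keeps the car at a roughly constant speed (illustrative)

@sio.on('telemetry')
def telemetry(sid, data):
    # Decode the center-camera frame sent by the simulator
    image = Image.open(BytesIO(base64.b64decode(data['image'])))
    image = preprocess(np.asarray(image))  # same preprocessing as training
    steering = float(model.predict(np.array([image]))[0][0])
    # Simple throttle rule so the speed stays near the limit; only steering is learned
    throttle = 1.0 - float(data['speed']) / SPEED_LIMIT
    send_control(steering, throttle)

@sio.on('connect')
def connect(sid, environ):
    send_control(0.0, 0.0)

def send_control(steering, throttle):
    sio.emit('steer', data={'steering_angle': str(steering),
                            'throttle': str(throttle)})

if __name__ == '__main__':
    # Wrap the Flask app with the Socket.IO middleware and listen on port 4567
    app = socketio.WSGIApp(sio, app)
    eventlet.wsgi.server(eventlet.listen(('', 4567)), app)
```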
SUMMARY
- This article is best read alongside the code implementation on Git
- We use only the images and steering angles in this project
- Alternative simulators such as CARLA offer dynamic environments featuring pedestrians, traffic signals, and various traffic signs.
- There are several avenues for enhancing this model, including:
- Hyper-parameter tuning (learning rate, number of epochs, batch size, etc.)
- Data augmentation and other preprocessing techniques
- Trying a different CNN architecture
- Refining the layers in the existing model, for example by adding BatchNormalization layers or replacing pooling operations with strided convolutions; these adjustments aim to improve model performance and efficiency (a small sketch of such a block follows this list).
- Incorporating a dynamic environment would necessitate adopting mixed modeling approaches to accommodate additional numerical and categorical data. This is an aspect I intend to explore in the next iteration of this project.
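To make the refinement suggestion above concrete, here is a small hedged sketch of what such a block could look like in Keras; it is purely illustrative and not part of the current implementation:

```python
from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D

def refined_conv_block(filters):
    """A convolutional block with BatchNormalization and a strided convolution
    in place of pooling; a possible refinement, not the current implementation."""
    return [
        Conv2D(filters, (3, 3), strides=(2, 2), padding='same', use_bias=False),
        BatchNormalization(),  # stabilizes training and allows higher learning rates
        Activation('elu'),
    ]
```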