Machine Learning to Classify Driving Events Using Mobile Phone Sensors Data

With the introduction of autonomous and self-driving cars, innovative research is needed to ensure safety and reliability on the road. This work introduces a solution for understanding vehicle behaviour based on sensor data, where behaviour is classified according to driving events. Understanding driving events can play a significant role in road safety and in estimating the expense and risks of driving a vehicle. Rather than relying on the distance and time driven, driving events can provide a more accurate measure of vehicle consumption. This measure will become valuable as more ride-sharing applications are introduced to roads around the world. Estimating driving events can also help design road infrastructure that reduces congestion, energy consumption, and pollution. By sharing data from official vehicles and volunteers, crowd sensing can be used to better understand congestion and road safety. This work studies driving events and proposes using machine learning to classify these events into different categories. The data is collected using the embedded motion sensors of a mobile device and is used to train machine learning algorithms to classify the events.

Keywords: mobile development, driving events, machine learning, classification


Introduction
The rapid growth in population is leading to the unprecedented expansion of cities around the world. Such growth necessitates smarter traffic management systems to reduce congestion and its negative impact [1] [2]. One key element of smart cities' management systems is intelligent transportation systems (ITS) [3]. Intelligent transportation systems provide solutions to save lives, energy, and the environment through optimized transportation paths and road condition monitoring. Modern vehicles are equipped with active safety features that respond to road conditions in order to avoid accidents, or at least to minimize the severity of accidents caused by road conditions.
The ability to classify driving events is critical to a smart city infrastructure. It enables ride-sharing and vehicle rental services to better estimate the cost of a trip. Rather than relying on time and distance alone, driving events can be factored into the calculation of a trip's expense. Driving events enable authorities to better monitor their fleets of vehicles and trucks and to determine when drivers need a break or should no longer be driving. Another application of driving event classification is to give rental car companies a better estimate of vehicle consumption. Insurance companies can also benefit from the classification by including driving events in the estimate of vehicle insurance.
To better illustrate the need for driving events, consider the following scenario. Assume two vehicles travel the same distance from one location to another. The first vehicle follows its path at a constant speed, while the second vehicle follows a path where it must speed up and then slow down repeatedly. Based on the tools available today, both vehicles will show the same distance travelled and would therefore be assigned the same trip expense. Introducing driving events as a metric allows for a better estimate of the wear and tear that the second vehicle is subjected to and thus provides a more accurate estimate of the trip expense.
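The scenario above can be made concrete with a small sketch. The two speed traces, the one-second sampling interval, and the 5 m/s threshold are all invented for illustration; they are not measurements from this work.

```python
# Two hypothetical speed traces sampled once per second (values are made up).
DT_S = 1.0  # sampling interval in seconds (assumption)

steady = [20.0] * 60              # constant 20 m/s for 60 s
stop_and_go = [10.0, 30.0] * 30   # alternates between 10 and 30 m/s


def distance_m(speeds, dt=DT_S):
    """Distance travelled, approximated as the sum of speed * time step."""
    return sum(v * dt for v in speeds)


def speed_change_events(speeds, threshold=5.0):
    """Count consecutive samples whose speed differs by more than threshold (m/s)."""
    return sum(1 for a, b in zip(speeds, speeds[1:]) if abs(b - a) > threshold)


print(distance_m(steady), speed_change_events(steady))            # 1200.0 0
print(distance_m(stop_and_go), speed_change_events(stop_and_go))  # 1200.0 59
```

Both traces cover the same distance, yet only the second one accumulates speed-change events, which is the information a distance-only metric discards.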
Driving events can also be used to investigate accidents and collisions. By capturing vehicle behaviour prior to an accident, driving events act as a "black box" of information. Another application of logging driving events is to evaluate and improve driver safety and performance. The information from the vehicle's travel can be analysed to identify dangerous or inefficient behaviour by the operator, such as speeding and hard braking.
The evolution of computational capabilities and sensor integration in communication devices (mobile phones) is enabling the development of smart applications and solutions in the areas of health, agriculture, transportation, and safety. In this work, mobile phones are used to collect sensor data, and machine learning is utilized to classify driving events from the collected data. Several classification algorithms are applied and compared for accuracy. Machine learning is also used to determine the minimum set of features required to successfully classify the events. This research builds on previously published work that relates different sensors' data to driving events [4]. Section 2 reviews the literature and presents related work. Section 3 presents the proposed solution. Section 4 introduces machine learning. Section 5 presents the results. Finally, Section 6 concludes the paper and discusses future work.

Literature Review
Various aspects of driving events have been researched and studied. In [5], vehicle sensor data is used to study longitudinal jerk in order to identify aggressive driving. The work finds that the distribution of the jerk associated with brake pedal operation has a wider range on both the negative and positive sides than the jerk associated with gas pedal operation. The jerk is found to correlate positively with the speed at which the driver pressed the gas pedal. The work concludes that aggressive drivers were associated with significantly higher values of jerk-based metrics. Large negative jerk seems to perform better in identifying aggressive drivers.

iJIM, Vol. 15, No. 02, 2021
Driver distraction and its role in fatal crashes is studied in [6]. The study found that inner cognitive distraction accounted for the greatest proportion of driver distractions. Young drivers have a high probability of being distracted by in-vehicle technology-related devices or objects. Among six subcategories of distractions, older drivers are more likely to be affected by inner cognitive interference.
The work in [7] proposes using the electroencephalogram (EEG) spectrum to monitor drivers' alertness. Using the relationship between changes in driving performance and the EEG spectrum, it might be feasible to estimate driving errors based on multichannel EEG power spectrum estimation and a principal component analysis algorithm. EEG data is also used in [8] to classify driving styles. EEG features extracted from the power spectral density and classification results of the driving data were used to train a Support Vector Machine (SVM) model. To evaluate the performance, leave-one-subject-out cross validation was utilized. Different driving styles were associated with different driving strategies and mental states. The results suggest the feasibility of driving style recognition from EEG patterns.
To assist with lane change, an algorithm for the Advanced Driver Assistant System (ADAS) to help classify the driver's intention was proposed in [9]. Measurements from conventional on-board sensors are augmented using an artificial neural network (ANN) model. The information is fed to a Support Vector Machine (SVM) to detect the driver's intention with high accuracy.
Driving style was studied in [10] using an onboard measurement and communication unit. Data was acquired through the diagnostics port along with accelerometer and GPS data. Experiments were conducted on drivers to show that it is feasible to differentiate driving styles in terms of safety, economy, and comfort. A score was assigned based on eight measurable indicators such as bumping, cornering, and speeding.
A driver detection system was proposed in [11]. The system utilizes gyroscope and magnetometer data and the interplay between electromagnetic field emissions and engine startup vibrations to detect the driver position and events. The system is evaluated experimentally with four participants and three different vehicles using varying vehicle-riding scenarios. The results show that the system can identify the driver with 89.1% average accuracy across the different scenarios.
A driver monitoring system using the driver's mobile phone was proposed in [12]. The system is based on Multi-Task MobileNets (MT-MobileNets). It consists of the MobileNets base and a multi-task classifier. The classifier recognizes facial behaviors related to the driver's status, such as distraction, fatigue, and drowsiness.

Proposed Solution and Experiments
This work proposes utilizing sensor data to classify driving events. The growing processing capabilities of mobile devices make it possible to use a mobile phone equipped with sensors to collect and process data [13] [14].
Drive tests are conducted to collect sensor data during travel. A mobile device with motion sensors is used to log time, sensor data, and GPS location. The drive is repeated several times while trying to maintain the same travel path. The path allows the vehicle to accelerate and decelerate and contains ramp-up and ramp-down segments. The travel path is illustrated in Figure 1.
The collected data includes Acceleration, Gyro Rotation, Yaw, Roll, Pitch, Rotation Rate, Quaternion, Gravity, Magnetic Field, and Orientation. Data is collected in the x, y, and z directions. The software logs over 43 measurements, sampled at a rate of 30 samples per second.
Using GPS locations and knowledge about the drive performed, it is possible to classify the driving event. A total of six events are identified for this experiment: High Speed, Low Speed, Stopping, Ramp Up (entering the highway), Ramp Down (exiting the highway), and U-turn. Figure 2 shows the data from the Gyro Rotation and Motion Pitch sensors and the corresponding event.
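The labelling step can be sketched as a lookup from sample time to event. This is only an illustration: the segment boundaries are invented, and the numeric codes simply follow the order in which the events are listed in the text, which is an assumption rather than the mapping used in this work.

```python
from bisect import bisect_right

# Event names come from the text; the numeric codes follow the listing order,
# which is an assumption and may differ from the paper's own table.
EVENT_CODES = {
    "High Speed": 1, "Low Speed": 2, "Stopping": 3,
    "Ramp Up": 4, "Ramp Down": 5, "U-turn": 6,
}

# Hypothetical segment boundaries: (start time in seconds, event name).
# Each segment lasts until the next one starts.
SEGMENTS = [(0.0, "Low Speed"), (40.0, "Ramp Up"), (55.0, "High Speed"),
            (180.0, "Ramp Down"), (200.0, "Stopping")]


def label_sample(t_seconds, segments=SEGMENTS):
    """Return the numeric event code for a sample logged at t_seconds."""
    starts = [start for start, _ in segments]
    index = bisect_right(starts, t_seconds) - 1
    if index < 0:
        raise ValueError("sample precedes the first labelled segment")
    return EVENT_CODES[segments[index][1]]


SAMPLE_RATE_HZ = 30  # sampling rate stated in the text
timestamps = [i / SAMPLE_RATE_HZ for i in range(0, 6300, 900)]  # one sample every 30 s
labels = [label_sample(t) for t in timestamps]
print(list(zip(timestamps, labels)))
```

In practice the segment boundaries would come from the GPS track and knowledge of the route rather than from a hand-written list.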

Fig. 2. A sample of sensors data and corresponding driving event
A new column containing the classification of the event is added to the data. A numeric value between 1 and 6 is used to represent the events according to Table 1 below:

Machine Learning
Machine learning refers to the ability to train machines (computers) to make decisions or predict future data. Training is required to learn how to map input to output. It enables a system to update and modify its structure. Machine learning is utilized to discover patterns in data. The algorithms used can broadly be classified as supervised learning and unsupervised learning.
Supervised learning utilizes training data, which teaches the algorithm what conclusion it should reach. The training data is pre-generated and should be large enough for the system to produce correct results. It should cover all expected inputs and outputs. Unsupervised learning does not require labelled training data and is used to discover patterns in data, thus providing insight into data whose structure is unknown in advance.
Different classifiers exist in machine learning. These classifiers differ in their performance, i.e., speed, memory usage, and interpretability. Several supervised learning classifiers are studied, and their ability to classify driving events is compared. Below, we briefly describe the classifiers used, namely:
• Decision Trees
• Discriminant Analysis
• Naïve Bayes
• k-nearest neighbour
• Ensembles
• SVM
Support Vector Machines (SVM) are supervised learning models with associated learning algorithms that classify data. The algorithm builds a model that assigns known data into categories with the goal of using training data to predict target values. In SVM, input data is mapped nonlinearly into a higher-dimensional feature space. A separating hyperplane that solves the classification is then generated in the feature space [15].
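The idea of mapping inputs into a higher-dimensional feature space can be illustrated with a toy example. The sketch below does not train an SVM: the feature map, the hyperplane coefficients `W` and `B`, and the data are all chosen by hand for illustration, whereas a real SVM would learn the hyperplane from labelled samples.

```python
def feature_map(x):
    """Map a scalar x into the 2-D feature space (x, x**2)."""
    return (x, x * x)


# Hyperplane w . phi(x) + b = 0 in feature space, chosen by hand for
# illustration; a trained SVM would learn w and b from labelled data.
W = (0.0, 1.0)
B = -1.0


def classify(x):
    """Return +1 or -1 depending on which side of the hyperplane phi(x) falls."""
    phi = feature_map(x)
    score = sum(w_i * p_i for w_i, p_i in zip(W, phi)) + B
    return 1 if score > 0 else -1


# Toy data: label +1 when |x| > 1, else -1. No single threshold on x separates
# these classes, but a straight line in (x, x**2) space does.
samples = [(-2.0, 1), (-0.5, -1), (0.0, -1), (0.5, -1), (2.0, 1)]
print([classify(x) == label for x, label in samples])
```

Every sample lands on the correct side of the hyperplane once it is mapped, even though the classes interleave along the original axis.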
A classification decision tree is based on features contained in the data set [2]. Decision Trees make decisions by generating a tree-like graph. A binary tree splits each branching node based on the values of a column of data. The tree is composed of decision, chance, and end nodes. Classification trees produce a categorical output, while regression trees produce numeric outputs.
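A single binary split, the building block of such a tree, can be sketched as follows. Gini impurity is used as the split criterion here; that is one common choice and an assumption, since the text does not state which criterion is used. The speed values are invented.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 means a pure node)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity.

    Returns (threshold, impurity); samples with value <= threshold go left.
    """
    best = (None, float("inf"))
    ordered = sorted(set(values))
    # Candidate thresholds lie midway between consecutive distinct values.
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Made-up speeds (m/s) with event labels, for illustration only.
speeds = [2.0, 4.0, 5.0, 24.0, 27.0, 30.0]
events = ["Low Speed", "Low Speed", "Low Speed",
          "High Speed", "High Speed", "High Speed"]
print(best_split(speeds, events))  # (14.5, 0.0)
```

A full tree repeats this search recursively on the left and right subsets, over every feature column, until the nodes are pure or a stopping rule applies.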
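Naïve Bayes uses Bayes' theorem and assumes that features are independent. Given a feature vector $x = (x_1, x_2, \dots, x_n)$, Naïve Bayes calculates the probability that these features belong to each class $C_k$, i.e., $P(C_k \mid x_1, x_2, \dots, x_n)$. Using Bayes' theorem and the fact that $P(x)$ is known, $P(C_k \mid x) = P(C_k)\,P(x \mid C_k) / P(x)$. The numerator $P(C_k)\,P(x \mid C_k)$ is equivalent to the joint probability $P(C_k, x_1, x_2, \dots, x_n)$, which by the chain rule and the independence assumption factors as $P(C_k, x_1, x_2, \dots, x_n) = P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)$.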
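A minimal sketch of a Naïve Bayes classifier follows. It assumes each feature is Gaussian within a class, which the text does not specify, and it works with log-probabilities to avoid numerical underflow. The sensor readings are invented.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[label] = (len(members) / len(rows), stats)
    return model


def predict(model, row):
    """Return the class with the largest log prior plus summed log likelihoods."""
    def log_posterior(prior, stats):
        total = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(*model[label]))


# Made-up (pitch, yaw-rate) readings, for illustration only.
rows = [(0.10, 0.0), (0.12, 0.1), (0.11, -0.1),
        (-0.10, 0.0), (-0.12, 0.1), (-0.09, -0.1)]
labels = ["Ramp Up", "Ramp Up", "Ramp Up",
          "Ramp Down", "Ramp Down", "Ramp Down"]
model = fit_gaussian_nb(rows, labels)
print(predict(model, (0.09, 0.05)), predict(model, (-0.11, 0.0)))
```

The evidence term is the same for every class, so it is omitted from the comparison; only the prior and the per-feature likelihoods decide the winner.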
Ensembles combine several models into one predictive model to improve machine learning performance. Ensembles decrease variance (bagging), decrease bias (boosting), and improve predictions (stacking). Ensemble methods are either sequential or parallel. Sequential methods exploit the dependence between base learners; performance is improved by assigning higher weight to previously mislabelled examples. Parallel methods exploit the independence between base learners. Most ensemble methods use a single base learning algorithm to produce the base learners.
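The parallel case can be sketched with a simple majority vote. The three base learners below are hand-written threshold rules on a hypothetical speed reading, used only to show how independent predictions are combined; a real ensemble would train its base learners from data.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often; ties go to the first label seen."""
    return Counter(predictions).most_common(1)[0][0]


# Three deliberately weak, hand-written base learners (hypothetical rules on a
# speed reading in m/s); a real ensemble would train these from data.
def learner_a(speed):
    return "High Speed" if speed > 12 else "Low Speed"


def learner_b(speed):
    return "High Speed" if speed > 15 else "Low Speed"


def learner_c(speed):
    return "High Speed" if speed > 20 else "Low Speed"


BASE_LEARNERS = [learner_a, learner_b, learner_c]


def ensemble_predict(speed):
    """Parallel ensemble: query every base learner independently, then vote."""
    return majority_vote([learner(speed) for learner in BASE_LEARNERS])


print([ensemble_predict(s) for s in (10, 14, 18, 25)])
```

At 14 m/s and 18 m/s the base learners disagree, and the vote settles the outcome in favour of the majority.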
Discriminant Analysis is based on the concept of searching for a linear combination of predictors that best separates two classes. To assess the effectiveness of the discrimination, the Mahalanobis distance between groups is calculated. A distance greater than 3 means that the two group averages differ by more than 3 standard deviations; thus, the probability of misclassification is small.
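For a single predictor, the Mahalanobis distance between two groups reduces to the difference of the group means divided by the pooled standard deviation. The sketch below covers only this univariate case, with invented readings; the multivariate form replaces the pooled variance with a pooled covariance matrix.

```python
import math
from statistics import mean, variance


def mahalanobis_1d(group_a, group_b):
    """Distance between two group means in units of the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return abs(mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Made-up pitch readings for two events, for illustration only.
ramp_up = [0.08, 0.10, 0.12]
ramp_down = [-0.12, -0.10, -0.08]
d = mahalanobis_1d(ramp_up, ramp_down)
print(round(d, 2), d > 3)
```

Here the group means are about ten pooled standard deviations apart, well beyond the threshold of 3 discussed above.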
The K-Nearest Neighbour algorithm is a nonparametric, instance-based learning algorithm. It makes no explicit assumptions about the functional form and does not explicitly learn a model. Instead, it memorizes the training instances, which are subsequently used as knowledge for prediction. The classification is performed by a plurality vote among the K most similar observations. Similarity is defined by a distance metric between data points. The algorithm runs through the data set and computes the distance from the input to each of the training observations. It then estimates the conditional probability for each class, i.e., the fraction of the K nearest training points carrying that class label. Finally, the input is assigned to the class with the largest probability.
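The procedure described above fits in a few lines. This sketch assumes Euclidean distance, since the text only requires some distance metric, and uses invented two-feature training points.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Classify query by plurality vote among its k nearest training rows."""
    # Euclidean distance is assumed; the text only requires some distance metric.
    by_distance = sorted(
        zip(train_rows, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    # votes[c] / k is the estimated conditional probability of class c.
    return votes.most_common(1)[0][0]


# Made-up two-feature training points, for illustration only.
train_rows = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
              (1.0, 1.0), (1.1, 0.9), (0.9, 1.2)]
train_labels = ["Stopping", "Stopping", "Stopping",
                "U-turn", "U-turn", "U-turn"]
print(knn_predict(train_rows, train_labels, (0.15, 0.1)))
print(knn_predict(train_rows, train_labels, (1.0, 1.1)))
```

Because every prediction scans the full training set, memory and run time grow with the amount of stored data, which matters on an embedded device.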

Results
The five round trips conducted produce a large volume of sensor data. The collected data contains over 25000 rows and 30 columns. To reduce storage and processing, irrelevant data is removed from the data set. Only motion and gyro sensor data are kept. A table that contains the sensor data and the event classification label is created to train the machine learning classifiers.

All sensors classification
In order to train the classifier, 70% of the data is selected randomly as the training data set. The table below lists the number of samples taken from each event to train the classifier. A cross-validation partition of the data is created for 5-fold cross validation on the samples. The partition divides the N observations into 5 disjoint subsamples chosen randomly but with equal size.
The algorithms above offer optimization options that improve their accuracy. Table 3 compares the loss for different algorithms with and without optimization. The loss is calculated as the ratio of misclassified labels to total samples. Figure 3 shows that the accuracy of the classification depends on the algorithm used. The results show that all algorithms perform well at classifying driving events. With no optimization, the Decision Tree performs best. With optimization, Ensembles have the best performance, while Discriminant Analysis appears to perform the worst.
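The 70% training split, the 5-fold partition, and the loss defined above can be sketched as follows. The function names, the fixed seed, and the sample count of 1000 are illustrative assumptions and are not taken from this work's implementation.

```python
import random


def train_test_split(n_samples, train_fraction=0.7, seed=0):
    """Randomly split sample indices into training and held-out index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]


def k_fold_partition(indices, k=5, seed=0):
    """Shuffle indices and deal them into k disjoint folds of near-equal size."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def classification_loss(true_labels, predicted_labels):
    """Ratio of misclassified labels to total samples, as defined in the text."""
    wrong = sum(1 for t, p in zip(true_labels, predicted_labels) if t != p)
    return wrong / len(true_labels)


train_idx, test_idx = train_test_split(1000)
folds = k_fold_partition(train_idx)
print(len(train_idx), len(test_idx), [len(f) for f in folds])
print(classification_loss([1, 2, 3, 4], [1, 2, 3, 6]))  # 0.25
```

During cross validation each fold serves once as the validation set while the other four are used for training, and the five losses are averaged. Accuracy is one minus the loss.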
Although the above classification performs well, it uses a large number of features, making it difficult to deploy on an embedded system.

Reduced features
Using Neighbourhood Component Analysis (NCA), it is possible to select a small subset of features that carry the most relevant information for classification and to reduce redundancy among features. NCA is a non-parametric method for selecting features with the objective of maximizing the prediction accuracy of classification algorithms. Figure 4 shows the relative weights of features and the regularized objective function for NCA. For Tree classification, nine features have weights greater than the threshold value (10%), namely: gyroRotationZ, motionYaw, motionRoll, motionPitch, motionRotationRateZ, motionQuaternionX, motionQuaternionY, motionQuaternionW, and motionGravityZ. The Ensemble shares the above features in addition to motionGravityY. Using this reduced set of features to predict the travel event, the corresponding loss for Tree and Ensemble is 0.0155 and 0.0488, respectively, as shown in Figure 5. The results show that even with reduced features, an accurate classification of the event can be achieved.
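Once feature weights are available, the selection step is a simple threshold. The sketch below assumes the 10% threshold is taken relative to the largest weight, which the text does not spell out. The weights are invented and are not the values behind Figure 4, and `exampleLowWeightFeature` is a hypothetical name.

```python
def select_features(weights, relative_threshold=0.10):
    """Keep features whose weight exceeds relative_threshold * the largest weight."""
    cutoff = relative_threshold * max(weights.values())
    return [name for name, weight in weights.items() if weight > cutoff]


# Most feature names appear in the text; the weights are invented for
# illustration and are not the values behind Figure 4.
example_weights = {
    "gyroRotationZ": 1.8,
    "motionYaw": 2.5,
    "motionPitch": 1.1,
    "exampleLowWeightFeature": 0.05,  # hypothetical name, not from the text
    "motionGravityZ": 0.9,
}
print(select_features(example_weights))
```

Only the selected columns would then be passed to the classifier, which reduces the storage and computation needed on an embedded device.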

Conclusion
With the growing road traffic around the world, the introduction of autonomous cars, and advances in automobile technology, new research is needed to take advantage of sensor data to improve efficiency and safety. This work introduces a technique that uses machine learning to classify driving events. It utilizes sensor data available from a mobile device with motion sensors to classify different events. Understanding driving events is essential for estimating the cost of travel for vehicles and for improving their efficiency by planning new routes that might reduce certain events. Our results show that by using machine learning with reduced features, it is possible to classify events with very high accuracy. Using Decision Trees, the accuracy of classification is over 98%. Future work will consider using sensor data to classify driver behaviour in terms of efficiency and safety.