Mobile-Based Driver Sleepiness Detection Using Facial Landmarks and Analysis of EAR Values

Sleepiness during driving is a dangerous problem in every country. Many studies have reported that a sleepy driver endangers both himself and other people, and that victims suffer anything from minor injuries to death. Many studies have sought to improve sleepiness detection methods, but they still suffer from limited accuracy and poor detection performance, so the resulting systems do not work well in real time. Recently, automobile companies have begun manufacturing special equipment to recognize a sleepy driver. However, these technologies are fitted only to certain cars because they are still quite expensive. A comprehensive method is therefore needed that detects driver sleepiness accurately at an affordable price. This study proposes a driver sleepiness detection system implemented on a smartphone. The system identifies closed eyes by extracting facial landmark points and analyzing the computed Eye Aspect Ratio (EAR). It works in real time because it uses a library designed for mobile applications. In the experiments conducted, the proposed method identified sleepy drivers with an accuracy of 92.85%.

Keywords: Driver Sleepiness, Sleepiness Detection, Smartphone, Facial Landmark, Extraction, Real-time, Eye Aspect Ratio.


Introduction
Sleepiness while driving is a dangerous problem faced by all countries. It is an important concern in developed as well as developing countries [1]-[3]. Many studies have shown that sleepiness while driving leads to many casualties [3]-[6]. The victims are not only the drivers themselves but also other people such as pedestrians, cyclists, and others near the highway [7]. Victims suffer minor to severe injuries, and many cases end in death [8].
Lately, the automotive industry has begun to develop special devices aimed at keeping the driver awake while driving. Mercedes-Benz, Volvo, and Bosch have developed devices that recognize decreased vigilance due to fatigue or sleepiness while driving. Despite having various names, these technologies serve the same function [9]-[11].
Panasonic has also developed a similar technology, called Drowsiness-Control Technology, which identifies a drowsy driver and recognizes surrounding conditions using special sensors [12]. However, all of these technologies require specially designed and very expensive instruments, so only certain groups can benefit from them. Meanwhile, most vehicles in use today carry only simple technology. Further development is therefore needed so that ordinary vehicles can use a sleepiness detection system and benefit from it in reducing the traffic accidents caused by sleepy drivers.
To overcome this problem, many studies have been conducted to develop sleepiness detection methods, and one approach uses a webcam connected to a computer. Rahman et al. identify sleepiness based on eye blink analysis. The first step is detecting the facial area using the Viola-Jones method. The eye area is identified using the Haar-like feature algorithm, while blinking is detected by measuring the distance between points on the pupils obtained from the extraction process. To identify sleepy eyes, Rahman et al. evaluate the captured closed-eye frames [13].
Meanwhile, Jacobe et al. use two artificial neural network (ANN) models to detect and predict driver sleepiness over a certain period in a driving simulation [14]. Although both studies gave satisfying results, they have some weaknesses. Rahman et al. must convert the color of each frame, which slows the detection process. Moreover, both studies were tested and run on computers, which is impractical and difficult to implement inside a car.
Besides webcams and computers, some studies use a smartphone to detect sleepy drivers. Mohammad et al. used the Haar Cascade Classifier on a mobile device to recognize the eye area and then observed the sclera to detect closed eyes [10]. When the eyes are open, the sclera region is predominantly white, and when the eyes are closed, the region becomes black. The Haar Cascade Classifier is also used by Guksa and Erkmen to identify the eye area and to detect closed eyes [15]. Although such systems can provide high accuracy, the Haar Cascade Classifier has drawbacks. The driver's face must be aimed directly at the camera. Otherwise, recognition of the eye area degrades and the accuracy of drowsiness detection becomes low. The Haar Cascade Classifier also needs further development to detect eye areas more precisely [16].
Meanwhile, Jabbar et al. proposed a drowsiness detection method that combines the extraction of facial points with Deep Neural Networks [9]. The developed system recognizes a drowsy driver with an accuracy of 81%. Although this method provides good results, neural networks require high-specification computers because of their complex learning process, and obtaining optimal weight values is time-consuming [17], [18]. Therefore, a new and more comprehensive approach is proposed. This method recognizes the eye area accurately using the extraction of facial landmark points. Open and closed eyes are distinguished by analyzing the Eye Aspect Ratio (EAR), which was previously proposed by Soukupova and Cech to identify blinks [19]. Blinks and sleepiness are classified by analyzing the number of frames captured while the eyes are closed. An Android-based face detection library is chosen so that the system can perform detection in real time.
The proposed method makes several contributions: (a) it provides a framework for real-time sleepiness detection that is easily implemented on smartphones; (b) it replaces very expensive technology with an affordable system that is flexible enough to be used in various vehicles and car types and by various drivers; and (c) it increases driving safety by helping to prevent dangerous situations.

Related Work
Sleepiness detection systems continue to mature and diversify. Examples include a webcam connected to a computer [13], Internet of Things (IoT) approaches [18], [20], wearables [21], and smartphones [9], [10], [22], [23]. Most researchers aim for a method that is easy to apply and has a broad impact. In this study, the proposed method uses a smartphone. This choice is reasonable because a smartphone is equipped with various components and accurate sensors, which makes development and deployment more accessible.
The Haar Cascade Classifier is the most common method for identifying the eye area [10], [13], [15]. Some studies instead apply the Histogram of Oriented Gradients (HOG), combined with a Support Vector Machine to detect closed eyes [24]-[26]. Other studies use different methods. Mohammad et al. distinguish open and closed eyes by observing the sclera within a rectangle of 1 cm x 0.5 cm (77 x 39 pixels). The white area is dominant when the eyes are open, whereas it disappears when the eyes are closed. However, this method requires a longer process, because the system must begin with a color conversion step [10].
Meanwhile, Rahman et al. begin by obtaining edge points, from which three points are extracted [13]. The method then calculates the distance between the extracted points using the Pythagorean theorem. A closed eye is identified from the distance between the points extracted on the pupils. To distinguish blinking from drowsiness, both Rahman et al. and Mohammad et al. analyze the number of closed-eye frames during a specified time interval. Analysis of the number of frames is also adopted in this study because it does not require heavy computation.
Besides the Haar Cascade Classifier, there is a method for locating facial areas precisely, known as Facial Landmarks. Facial landmarks can identify the facial area and recognize Regions of Interest (ROI) such as the eyebrows, eyes, nose, mouth, and jawline by extracting up to 68 points around the face [27], [28]. Because of these capabilities, facial landmarks are widely used for face detection, blink detection, and sleepiness identification [19], [29], [30]. The method is also effective for classifying facial emotions [31]. For these reasons, facial landmarks were chosen as the primary method in this research.
To identify a closed eye, the facial landmark points around the eye area are measured. The measurement uses the Euclidean distance between the extracted points, and the resulting measure is known as the Eye Aspect Ratio (EAR). Soukupova and Cech used EAR to recognize blinks using 6 facial landmark points around the eye [19]. In this study, the number of points is reduced, and the points are selected according to their influence on closed-eye detection. Reducing the number of extraction points also speeds up the computation.

Proposed Method
The proposed method is organized into three phases: Eye Recognition, Closed Eye Identification, and Sleepiness Detection. Figure 1 illustrates the flow of the proposed method. Each phase is discussed below.

Eye recognition
The proposed method begins by identifying the facial area and then recognizing the eye area. Both processes use Facial Landmarks, which are points extracted across the face with the help of a face detector [32]. The points initially produced by the extraction are not precisely placed on the ROI, so the Tree Regression method is applied [27].
When the method is applied to video, the face detector extracts the facial landmark points on the initial frame, and face tracking is then used to extract the points in subsequent frames [33]. With this step, the system can extract points not only from static images but also from dynamic images such as real-time video.
The number of points generated on the face depends on the library used. The Dlib library generates 68 points around the facial area, with 6 points in the eye area. In this research, the Face Detection library developed by Google is used to recognize the eye area. Like Dlib, it extracts facial landmarks using a face detector, and it produces 133 points around the face. Of these, 32 points lie in the eye areas, 16 for each eye. In this study, only a subset of these points is used.

To identify the face area in real-time video, the frame format must be adjusted. Figure 2 shows the stages of facial landmark extraction on the smartphone. The face is captured using the Camera API, which can modify camera attributes such as pixel size, focus level, and frames per second (fps). The captured frames are encoded in the YUV NV21 format, chosen because it allows points to be extracted from a frame efficiently. The extraction results are drawn on an output frame and displayed on the smartphone screen. This process is repeated for every captured frame. In this study, the smartphone camera is set to 8 fps to remain efficient while operating in real time.

Sixteen points are generated for each eye on an output frame. However, only 4 points per eye are used to recognize closed eyes in the next process. The selected points are indexed 57, 61, 65, and 69 in the right eye and 73, 77, 81, and 85 in the left eye. Figure 3 displays the selected points on the screen.
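As a rough illustration of the NV21 layout mentioned above, the following Python sketch computes the size of one frame and slices out its luma (Y) plane. It relies only on the standard NV21 layout (a full-resolution Y plane followed by an interleaved V/U plane subsampled by two in each direction). The function names are hypothetical, and the sketch is not the code used in the study, which runs on Android.

```python
def nv21_frame_size(width, height):
    """Bytes in one NV21 frame: Y plane (w*h) plus interleaved V/U plane (w*h/2)."""
    return width * height * 3 // 2


def nv21_luma(frame, width, height):
    """Return the Y (luma) plane of an NV21 frame as a list of rows.

    `frame` is a bytes-like object holding exactly one NV21 frame.
    """
    if len(frame) != nv21_frame_size(width, height):
        raise ValueError("buffer size does not match an NV21 frame of this resolution")
    y_plane = frame[: width * height]  # the luma plane comes first
    return [list(y_plane[r * width:(r + 1) * width]) for r in range(height)]


if __name__ == "__main__":
    # At the 640x480 resolution used in the experiments, one frame is 460800 bytes.
    print(nv21_frame_size(640, 480))
```

Because the Y plane is already a grayscale image, a detector that only needs intensity can read it directly without a separate color conversion, which may be one reason the format is convenient here.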

Closed eye identification
The first task is to measure the size of the open and closed eye from the extraction points in the eye area. For this purpose, the Eye Aspect Ratio (EAR) is applied. This method distinguishes open and closed eyes accurately [19].
Closed eyes are recognized by comparing the EAR with a closed-eye threshold. Soukupova and Cech applied a threshold value of 0.20. If the EAR is greater than 0.20, the system classifies the eye as open. Conversely, if the EAR is less than the threshold, the eye is classified as closed. Note, however, that Soukupova and Cech use 6 points in the eye area. In this study, the number of points is reduced, so the threshold value needs to be analyzed further. This analysis is discussed in the results and analysis section.
In this research, the number of facial landmark points in the eye area is reduced to 4, since only those points strongly affect the measurement of closed eyes. Reducing the number of points also shortens the calculation, which helps the system work in real time. Figure 3(b) displays the 4 selected points in the eye area.
The EAR method is based on the distance between the (x, y) coordinates of two points in an image. Because each point is located at an (x, y) coordinate, this distance is the Euclidean distance. Figure 4 illustrates the distances between the points when the eye is open and when it is closed.
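The EAR formula itself is not reproduced in this text. The sketch below shows the six-point EAR as defined by Soukupova and Cech [19], followed by one plausible four-point variant. The four-point form (vertical distance divided by horizontal distance) and the assignment of points to the left, top, right, and bottom of the eye are assumptions made for illustration, not the formula confirmed by this study.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def ear_six_points(p1, p2, p3, p4, p5, p6):
    """Six-point EAR: p1 and p4 are the eye corners, p2/p3 lie on the
    upper lid, and p6/p5 are their counterparts on the lower lid."""
    return (distance(p2, p6) + distance(p3, p5)) / (2.0 * distance(p1, p4))


def ear_four_points(left, top, right, bottom):
    """Assumed four-point variant: eye height divided by eye width."""
    width = distance(left, right)
    if width == 0:
        raise ValueError("eye corners coincide")
    return distance(top, bottom) / width


if __name__ == "__main__":
    # An open eye 4 units wide and 2 units high, then the same eye nearly closed.
    print(ear_four_points((0, 0), (2, 1), (4, 0), (2, -1)))
    print(ear_four_points((0, 0), (2, 0.2), (4, 0), (2, -0.2)))
```

In both forms the ratio falls toward zero as the eyelids approach each other, while the eye width in the denominator keeps the value roughly independent of how far the face is from the camera.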

Sleepiness detection
When the eyes blink, they take 100-400 ms to close [13]. This is supported by Caffier et al., who reported that closing the eye takes around 200 ms [34]. When a person is sleepy, by contrast, the eyes need more than 500 ms to close. The closing time is measured from the moment the eyes start to close until they are fully closed [4], [35]. Based on those studies, the number of frames spanned by a closed eye can be determined by applying Equation (2).

Fig. 4. An Illustration of Points Estimation: (a) The Distance of Points When An Eye is Opened, and (b) The Distance When The Eye is Closed
$$f = \mathrm{fps} \times t \qquad (2)$$
Here f denotes the number of frames for blinking or sleepy eyes, fps (frames per second) is the frame rate of the smartphone camera, which is set to 8 fps in this research, and t is the time (in seconds) that the eyes need to close fully.
When the eyes blink, they take about 200 ms to close fully, so at 8 fps the number of frames needed is 8 x 0.2 = 1.6, that is, 1-2 frames. In the sleepy condition, the eyes require at least 500 ms to close, so the number of frames needed is 8 x 0.5 = 4 frames.
It can be concluded that if the EAR value is below the threshold for 1-2 captured frames, the event is detected as a blink. If the EAR value stays below the threshold for 4 frames, the event is recognized as sleepiness, and the system turns on the alarm.
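The frame-counting rule above can be sketched in Python as follows. The function names are hypothetical, and the default threshold of 0.24 is the value the study later reports as optimal. The text specifies only 1-2 frames as a blink and 4 frames as sleepiness, so treating any run shorter than 4 frames as a blink, and any run of 4 or more as sleepiness, is an assumption of this sketch.

```python
def frames_for_duration(fps, seconds):
    """Number of frames spanned by `seconds` at the given frame rate (f = fps * t)."""
    return fps * seconds


def closed_runs(ear_values, threshold):
    """Lengths of consecutive runs of frames whose EAR is below the threshold."""
    runs, current = [], 0
    for ear in ear_values:
        if ear < threshold:
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:  # the sequence ended while the eye was still closed
        runs.append(current)
    return runs


def classify_runs(ear_values, threshold=0.24, sleepy_frames=4):
    """Label each closed-eye run as 'sleepy' (long) or 'blink' (short)."""
    return ["sleepy" if n >= sleepy_frames else "blink"
            for n in closed_runs(ear_values, threshold)]


if __name__ == "__main__":
    # One short closure followed by a four-frame closure, sampled at 8 fps.
    ears = [0.30, 0.20, 0.31, 0.10, 0.10, 0.10, 0.10, 0.29]
    print(frames_for_duration(8, 0.5))
    print(classify_runs(ears))
```

A real-time version would keep only the running counter and raise the alarm as soon as it reaches the sleepiness frame count, rather than waiting for the eye to reopen.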

Result and Analysis
The experiment was arranged in two different environments: indoors, and inside a car during simulated drives around a soccer field. The dataset consists of real-time videos collected from 24 people around Malang City, including students, researchers, drivers, and office workers. Ten people were used to analyze the optimal closed-eye threshold, while the other 14 were used to observe driver sleepiness in the driving simulations. Eye size varies across the dataset because of Indonesia's ethnic diversity. The subjects are predominantly of Malay, Javanese, Madurese, and Chinese-Indonesian ethnicity.
The experiment was implemented on an Asus Zenfone 2 ZE551ML with 4 GB of RAM and a 2.3 GHz processor. The smartphone camera is set to a resolution of 640x480 at 8 fps. Both indoors and inside the car, recordings were made from morning until afternoon. The measured illumination was between 50 and 300 lux. The distance from the smartphone to the user's face is set at 40 cm, both indoors and inside the car.
The first step is to analyze the optimal closed-eye threshold, which is done indoors with the smartphone mounted on a gorilla tripod. In the study of Soukupova and Cech [19], the closed-eye threshold was set to 0.20 for all subjects. In this research, however, the threshold is varied over the values 0.20, 0.22, 0.24, 0.26, and 0.28. The aim is to find the optimal threshold across various eye sizes, from narrow to wide.
The dataset for this step consists of 10 different people from four Indonesian ethnic groups. Each person was asked to close their eyes 20 times over a period of 1-2 minutes, which yields 200 closed-eye samples.
When the smartphone measures the EAR of an open eye, the value fluctuates by 0.01 to 0.03. The size of an open eye can therefore be characterized by averaging the EAR while the eyes remain open. In this way, the open-eye EAR can be determined for narrow through wide eyes. The second column of Table 1 shows the mean open-eye EAR for narrow to wide eyes.
The EAR threshold must be less than the mean EAR of the open eye for closed eyes to be identified accurately. For instance, if the mean EAR is 0.27 and the EAR threshold is set at 0.20-0.24, the system identifies closed eyes reliably. However, if the EAR threshold is raised, the accuracy of closed-eye detection decreases, even to 0, as seen in Table 1. Figure 5 illustrates the relationship between the EAR threshold and the mean EAR of an open eye.
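The condition described above can be expressed as a small check. This is only a sketch: the function names and the optional `margin` parameter are hypothetical, and the sample values are invented for illustration rather than taken from Table 1.

```python
from statistics import fmean


def mean_open_ear(open_eye_samples):
    """Average EAR measured over frames in which the eye stays open."""
    return fmean(open_eye_samples)


def threshold_is_usable(threshold, mean_ear, margin=0.0):
    """True when the threshold lies below the mean open-eye EAR.

    `margin` is an assumed safety gap that could be set to the observed
    fluctuation of the open-eye EAR. It is not part of the original method.
    """
    return threshold < mean_ear - margin


if __name__ == "__main__":
    samples = [0.26, 0.28, 0.27, 0.27]  # illustrative open-eye readings
    mean_ear = mean_open_ear(samples)
    for candidate in (0.20, 0.22, 0.24, 0.26, 0.28):
        print(candidate, threshold_is_usable(candidate, mean_ear))
```

Passing this check is a necessary condition rather than a sufficient one: a threshold just under the mean can still be crossed by the normal fluctuation of an open eye, which is consistent with the lower accuracy reported for the larger threshold values.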
In closed-eye detection, the system applies thresholds from 0.20 to 0.28. Rows 1-3 of Table 1 correspond to narrow eyes, indicated by EAR values that are small compared with the others. For these subjects, the system gives its best results with thresholds of 0.20-0.24, with an average of 17-18 true detections out of 20 trials. The accuracy decreases with thresholds of 0.26 and 0.28, which give 5-8 true detections out of 20 trials.

Based on the experiments in Table 1, the optimal closed-eye threshold across the whole dataset, from narrow to wide eyes, is 0.24, since 0.24 is the central threshold between narrow and wide eyes. This choice is supported by the series of experiments conducted. Based on Equation (3), the system identifies closed eyes with an accuracy of 90.50%.

In the second step, the optimal threshold is used to observe driver sleepiness detection with a smartphone in a driving simulation around the soccer field and on the street. The smartphone is mounted on the dashboard in front of the driver using a car phone holder. The holder is designed to be steady and strong, and it is positioned so that it does not block the driver's view while driving. Note that the detection process only works when the system identifies the eye area completely. Figure 6 shows the installation of the smartphone in the car.
In this stage, the dataset consists of 14 subjects, each of whom is asked to drive a car for 3-5 minutes. Each subject is asked to demonstrate a sleepy expression 15 times while driving. The system observes the driver's face in real time and identifies sleepiness by analyzing the closed-eye frames, as explained in Equation (2).
As seen in Table 2, the system recognizes sleepy drivers reliably. For instance, in the third, seventh, and thirteenth rows, the system achieves 13 true detections out of 15 trials. In the 2 missed trials, the system detected only 2-3 closed-eye frames, which is fewer than the 4 frames required. In the first and second rows, the system achieves 14 true detections out of 15 trials. In the fifth and fourteenth rows, the system shows its best result, with 15 true detections out of 15 trials. In the end, the proposed system identifies the sleepy driver with an accuracy of 92.85%, based on Equation (3). Figure 7 demonstrates driver sleepiness detection within a car in real time. The proposed system also recognized sleepy drivers who wore glasses or a scarf, in both good and poor illumination.
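Equation (3) is not reproduced in this text. Assuming it is the usual ratio of true detections to total trials, expressed as a percentage, the reported figures can be checked with the sketch below. The detection counts of 181 and 195 are back-calculated from the reported percentages and the stated numbers of trials (200 and 14 x 15 = 210). They are assumptions, not values read from Tables 1 and 2.

```python
def accuracy(true_detections, total_trials):
    """Percentage of trials in which the event was correctly detected."""
    if total_trials <= 0:
        raise ValueError("total_trials must be positive")
    return 100 * true_detections / total_trials


if __name__ == "__main__":
    # Closed-eye detection: 10 subjects x 20 trials = 200 trials.
    print(accuracy(181, 200))
    # Sleepiness detection: 14 subjects x 15 trials = 210 trials.
    print(accuracy(195, 210))
```

Under these assumed counts the first ratio is exactly 90.50%, and the second is about 92.857%, which matches the reported 92.85% when truncated to two decimal places.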
Some videos collected by the NTHU Computer Vision Lab and used by Jabbar et al. [9] were considered. However, that dataset does not entirely fit the needs of this study, for example because of the use of sunglasses and the head-down posture of sleepy users. Hence, the dataset was collected independently and arranged to resemble the NTHU dataset closely.
Meanwhile, Mohammad et al. [10] used only 2 subjects, a male and a female, whereas Guksa and Erkmen [15] do not clearly state the specifications of their dataset. The dataset employed in this study is therefore more diverse, which makes the final results more convincing. Table 3 shows a comparison with some previous studies. The proposed method provides better accuracy and works in real time. Because it can detect sleepiness in dynamic images (real-time videos), the proposed system was also able to identify sleepy drivers in static images (photos) in some experiments.

In addition, because it works in real time, the proposed system can replace wearable sensor technology [21], which is 3-7 times more expensive than a smartphone. Moreover, the proposed method can substitute for the costly technology being developed by automotive companies [11], [12].

Conclusion
The Facial Landmark method identifies driver sleepiness accurately based on an analysis of EAR (Eye Aspect Ratio) values computed from 4 points extracted around the eye area. The method also adapts to different eye sizes through an analysis of the EAR threshold value. In the experiments conducted, the proposed system worked on a smartphone in real time and detected sleepiness during driving with an accuracy of 92.85%. A smartphone is therefore strongly recommended for sleepiness detection in preference to other devices. The device is also equipped with various sensors and can be combined efficiently with other features. Moreover, many libraries have been created and are available for developing mobile applications. The approach can therefore be an effective and affordable substitute for expensive sleepiness detection technologies.
In future work, the system will be improved to work in various light conditions, especially at night. The system will also be combined with sensors within the smartphone, with an optimized data filtering process, so that it can quickly identify whether the vehicle is stopped or moving.