Implementation of Deep Learning Predictor (LSTM) Algorithm for Human Mobility Prediction

Abstract: The study of human mobility prediction in mobile computing has grown with the availability of large-scale datasets containing location trajectory histories. Previous work has proposed many solutions for increasing the accuracy of human mobility prediction; however, only a few researchers have addressed the use of LSTM networks for human mobility. This study attempted to use classical methodologies by combining LSTM and DBSCAN, because these algorithms can tackle two problems in human mobility: modelling large-scale sequential data and identifying the number of clusters in arbitrarily shaped trajectories. The research method consists of DBSCAN for clustering, the long short-term memory (LSTM) algorithm for modelling and prediction, and Root Mean Square Error (RMSE) for evaluation. As a result, the prediction error (RMSE) reached 3.551 with the LSTM epoch and batch_size parameters set to 100 and 20, respectively.


Introduction
Research on human mobility prediction in mobile computing has increased due to the availability of large-scale datasets containing individuals' location trajectories. The prediction of human mobility concerns estimating the next place that people in a city will visit. Location data from mobile users also show strong temporal regularity: for example, people regularly go to work during the day and go shopping routinely [1]-[4].
A general definition of the human mobility prediction problem is as follows. Let $l_t$ represent a person's location at time $t$ ($1 \le t \le n$). For this person, we have the history of locations visited in order, $L_n = l_1, l_2, l_3, \ldots, l_n$. Based on $L_n = l_1, l_2, l_3, \ldots, l_n$, we want to predict the next location the person visits, $l_{n+1}$, at time $n + 1$ [5].
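One common way to turn this definition into a supervised learning task is to slide a fixed-length window over the visit history, so that each window of recent locations is paired with the location that follows it. The sketch below illustrates that framing only; the function name `make_next_location_pairs`, the window length, and the sample history are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def make_next_location_pairs(
    history: Sequence[int], window: int
) -> List[Tuple[List[int], int]]:
    """Turn a visit history into (recent locations, next location) pairs."""
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    for end in range(window, len(history)):
        # The `window` most recent locations are the input;
        # the location that follows them is the prediction target.
        pairs.append((list(history[end - window:end]), history[end]))
    return pairs


# Hypothetical history of location labels visited in order
# (illustrative values, not taken from the dataset).
history = [0, 3, 3, 1, 0, 3]
pairs = make_next_location_pairs(history, window=3)
for inputs, target in pairs:
    print(inputs, "->", target)
```

A longer window gives the predictor more context per example but yields fewer training pairs from the same history.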
The capability to forecast urban human mobility is one of the research problems in mobile computing. Increasing predictor accuracy can improve the performance of mobile computing applications that process human mobility data [6], [2]. If a mobile application can predict locations well, it can be used to determine where a person in a city will go next.
From a business perspective, individual location histories can be used by taxi companies to recommend that drivers go to areas where demand for taxi services is high [7]. Travel companies (for example, Uber) can relocate their vehicles to serve passengers in a region. Human mobility prediction also helps with many other problems, for example traffic forecasting [8]-[11], tourism [12]-[15], advertising [16], and so forth.
Previous work has proposed many approaches for human mobility, for example data mining [17]-[20], Markov-based predictors [21], compression-based predictors (LZW) [21], and time-series-based predictors (ARIMA) [22]; however, only a few researchers have addressed the use of LSTM networks for human mobility. According to [5], [23], [24], long short-term memory (LSTM) is reported to be the most suitable algorithm for modelling sequential data and large-scale geo-datasets, which are the main characteristics of the GeoLife GPS Trajectories dataset.
The other challenge of human mobility is how to cluster a large-scale trajectory dataset of arbitrary shape. Based on the literature, this can be tackled using Density-Based Spatial Clustering of Applications with Noise (DBSCAN), which can recognize the number of clusters in arbitrarily shaped trajectories without any prior information [25], [26], [27]. Motivated by the above, the objective of this research is to determine how to cluster an arbitrarily shaped, large-scale trajectory dataset using LSTM and DBSCAN.

Literature Review
Research on human mobility dates back to 1885. The Laws of Migration [3], published in the Royal Statistical Society Journal, is regarded as the first modern study of human mobility.
The study of human mobility has changed significantly with the growth of mobile phones. Mobile applications use information from cell towers and the Global Positioning System (GPS) to track a particular position. Every day, many people use their phones in several locations, which provides a large amount of data on human mobility. This turns human mobility data into "big data", which makes it possible to understand the state of human mobility in an urban area and offers a new opportunity to model and predict human movement more accurately [7], [28]-[32].
A number of previous studies have addressed human mobility. Chen et al. [17] studied human mobility prediction based on personal trajectory data using a data mining approach. Vu et al. [18] used a rule-based location prediction method named RLP to predict a person's next location in a location-based services system. Lee et al. studied spatio-temporal pattern mining for predicting a person's future location based on their mobile device log data [19].
iJIM - Vol. 14, No. 18, 2020

Yavas et al. [20] studied human mobility prediction based on a next inter-cell movement approach using mobile user location data. This approach consists of a three-phase algorithm: identification of user mobility patterns, extraction of mobility rules, and mobility prediction. Wang et al. [33] and Heaslip et al. [34] implemented a convolutional neural network (CNN) for modelling human mobility trajectory data. Li et al. [22] proposed an improved time-series-based predictor (ARIMA) for predicting the spatio-temporal variation of people's mobility from a dataset containing the GPS traces of 4,000 taxis. Kai et al. [21] and Song et al. [35] used Markov-based predictors to evaluate the prediction of mobility traces.

Research Methodology

Data collection
The dataset used in our study is the GeoLife GPS Trajectories dataset from a Microsoft Research Asia project, which contains data from 182 people collected from 2007 to 2012. The GPS trajectory data is represented as a sequence of time-stamped points of the form $P = \{p_1, p_2, \ldots, p_n\}$ from an urban area or city. Each trajectory consists of a series of $n$ records, $T = t_1, t_2, \ldots, t_n$. Each record $t$ is a tuple of the form t = <latitude, longitude, altitude, date as number of days, date as string, time as string>.
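As an illustration of the record tuple described above, the following sketch parses one comma-separated trajectory line into a typed record. It assumes the layout of the publicly distributed trajectory files, in which each line carries seven columns with an unused constant in the third position; that extra column, the `GpsRecord` and `parse_record` names, and the sample line are assumptions for illustration and are not stated in this paper. Any file header lines are assumed to be skipped by the caller.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class GpsRecord:
    latitude: float      # decimal degrees
    longitude: float     # decimal degrees
    altitude: float      # as stored in the file
    days: float          # date as a (fractional) number of days
    timestamp: datetime  # built from the date and time strings


def parse_record(line: str) -> GpsRecord:
    """Parse one comma-separated trajectory line into a GpsRecord."""
    fields = line.strip().split(",")
    # Assumption: seven columns, with an unused placeholder in third position.
    lat, lon, _unused, alt, days, date_str, time_str = fields
    stamp = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")
    return GpsRecord(float(lat), float(lon), float(alt), float(days), stamp)


# Illustrative line in the assumed layout.
sample = "39.984702,116.318417,0,492,39744.1201851852,2008-10-23,02:53:04"
record = parse_record(sample)
print(record)
```

Combining the date and time strings into a single `datetime` makes the later time-threshold comparison a plain subtraction.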

Research phase
The flow chart of the research methodology is presented in Figure 2. The methodology is divided into five phases: dataset collection, stay point detection, extraction of regions of interest using DBSCAN, data modelling and prediction using LSTM, and evaluation using Root Mean Square Error (RMSE).

Fig. 2. Research methodology
Each phase in the flow chart above is explained as follows:

Detecting Stay Points
In this phase, we detect stay points from the GPS trajectory of a user over a period of more than five years. A stay point is a geographic location where a person stays for a period of time. Detecting stay points requires two parameters: a time threshold (ϴt) and a distance threshold (ϴd). We used a time threshold of 20 minutes and a distance threshold of 200 meters. If the time interval and the distance between two points satisfy the threshold conditions, the points are merged into one stay point by substituting them with their center point.
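The thresholding step can be sketched as follows. This is a minimal illustration of one widely used stay point detection scheme, not necessarily the exact procedure of this study: starting from an anchor point, it collects consecutive points that stay within the distance threshold, and if they span at least the time threshold it replaces them with their mean coordinates. The haversine distance, the function names, and the synthetic track are assumptions.

```python
import math
from datetime import datetime, timedelta
from typing import List, Tuple

Point = Tuple[float, float, datetime]  # (latitude, longitude, timestamp)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude pairs."""
    radius = 6_371_000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def detect_stay_points(points: List[Point], dist_m: float = 200.0,
                       min_stay: timedelta = timedelta(minutes=20)):
    """Return (mean_lat, mean_lon, arrival, departure) for each stay."""
    stays = []
    i, n = 0, len(points)
    while i < n:
        anchor = points[i]
        # Extend j while points remain within dist_m of the anchor point.
        j = i + 1
        while j < n and haversine_m(anchor[0], anchor[1],
                                    points[j][0], points[j][1]) <= dist_m:
            j += 1
        group = points[i:j]
        if group[-1][2] - group[0][2] >= min_stay:
            # Substitute the whole group with its center (mean coordinates).
            lat = sum(p[0] for p in group) / len(group)
            lon = sum(p[1] for p in group) / len(group)
            stays.append((lat, lon, group[0][2], group[-1][2]))
            i = j  # continue after the stay
        else:
            i += 1
    return stays


# Synthetic track: 30 minutes at one place, then fast movement north.
t0 = datetime(2008, 10, 23, 8, 0, 0)
track = [(39.9, 116.4, t0 + timedelta(minutes=10 * k)) for k in range(4)]
track += [(39.9 + 0.01 * k, 116.4, t0 + timedelta(minutes=30 + k))
          for k in range(1, 4)]
stays = detect_stay_points(track)
print(stays)
```

In the synthetic track only the first four points form a stay, because the later points are each more than 200 meters apart.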

Extracting Regions of Interest
In this stage, we used the DBSCAN algorithm to cluster the stay points into regions of interest, that is, to compute clusters of stay points that are close to each other. The parameters of this algorithm are epsilon (eps) and minimum samples (min_samples). The epsilon parameter is the maximum distance between two points for them to be considered part of the same cluster. In this study, we used 1.5 km as this maximum distance and set min_samples to 1.
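To make the roles of eps and min_samples concrete, the sketch below is a small, dependency-free DBSCAN over latitude/longitude pairs using great-circle distance. It is an illustrative implementation under stated assumptions, not the code used in the study (the paper does not say which DBSCAN implementation was used); the function names and the sample stay points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

LatLon = Tuple[float, float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometers between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def dbscan(points: Sequence[LatLon], eps_km: float = 1.5,
           min_samples: int = 1) -> List[int]:
    """Label each point with a cluster index, or -1 for noise."""
    n = len(points)
    # Each neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if haversine_km(points[i], points[j]) <= eps_km]
        for i in range(n)
    ]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_samples:
            continue  # already assigned, or not a core point
        # Grow a new cluster outward from core point i.
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] != -1:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                # Only core points expand the cluster further.
                frontier.extend(neighbors[j])
        cluster += 1
    return labels


# Hypothetical stay points: two nearby pairs and one isolated point.
stay_points = [(39.900, 116.400), (39.905, 116.402),
               (39.990, 116.300), (39.991, 116.301),
               (40.100, 116.600)]
labels = dbscan(stay_points, eps_km=1.5, min_samples=1)
print(labels)
```

With min_samples set to 1, every point counts as a core point, so no stay point is labeled as noise and an isolated stay point becomes its own region of interest. Raising min_samples to 2 in this example would mark the isolated point as noise instead.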

Modeling & Prediction
The model is built on the training data using a multivariate LSTM forecast model. Once the model is produced, we use it to perform prediction on the testing data.
LSTM is a specific RNN architecture developed for sequential data prediction, which can learn from time-series data over long spans and determine optimal time lags automatically [5]. Recurrent Neural Networks (RNNs) are capable of capturing the temporal and spatial evolution of human movement patterns. However, traditional RNNs fail to capture long temporal dependencies in sequential data because of the vanishing gradient problem [1], [12]. The LSTM network computes a mapping from an input sequence $x = (x_1, \ldots, x_T)$ to an output sequence $y = (y_1, \ldots, y_T)$ by computing the network unit activations iteratively from $t = 1$ to $T$ [37]; for example, the output gate is computed as:

$$o_t = \sigma(W_{ox} x_t + W_{om} m_{t-1} + W_{oc} c_t + b_o) \qquad (4)$$

where $i$ represents the input gate, $f$ the forget gate, and $o$ the output gate. The vectors $c$ and $m$ represent the activation vectors of the cell and of the memory block output. $W$ denotes the weight matrices, whereas $b$ denotes the bias vectors that link the memory block, input layer, and output layer [37], [38].
Moreover, $\odot$ denotes the element-wise product of two vectors, $\phi$ is the network output activation function, $g(\cdot)$ and $h(\cdot)$ denote the cell input and cell output activation functions (centered logistic sigmoid function), and $\sigma(\cdot)$ is the standard logistic sigmoid function [37], [38].
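For reference, the full set of update equations in the commonly used LSTM formulation with peephole connections is written out below. The equation numbered (4) above has the form of the output gate in this set, and the symbols described in the text match it, but the subscript notation for the weight matrices and the exact form of equations (1) to (3), (5), and (6) are assumptions based on that standard formulation rather than text recovered from this paper.

```latex
\begin{align}
i_t &= \sigma(W_{ix} x_t + W_{im} m_{t-1} + W_{ic} c_{t-1} + b_i) \tag{1}\\
f_t &= \sigma(W_{fx} x_t + W_{fm} m_{t-1} + W_{fc} c_{t-1} + b_f) \tag{2}\\
c_t &= f_t \odot c_{t-1} + i_t \odot g(W_{cx} x_t + W_{cm} m_{t-1} + b_c) \tag{3}\\
o_t &= \sigma(W_{ox} x_t + W_{om} m_{t-1} + W_{oc} c_t + b_o) \tag{4}\\
m_t &= o_t \odot h(c_t) \tag{5}\\
y_t &= \phi(W_{ym} m_t + b_y) \tag{6}
\end{align}
```

Note that the input and forget gates look at the previous cell state $c_{t-1}$, while the output gate looks at the updated cell state $c_t$, which is why the third term of equation (4) carries no $t-1$ index.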

Evaluation
Next, we evaluate the performance of the prediction results produced in the previous stage. The error metric used to evaluate the model is Root Mean Square Error (RMSE). RMSE measures the prediction error based on the distance of the data points from the regression line, using the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

where $n$ denotes the sample size, $y$ denotes the observed label, and $\hat{y}$ denotes the predicted label [39].
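A minimal sketch of the RMSE computation, that is, the square root of the mean squared difference between observed and predicted labels, is given below. The function name and the sample values are illustrative only and are not results from this study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted labels."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative labels (not taken from the experiments).
observed = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
score = rmse(observed, predicted)
print(score)
```

Because the errors are squared before averaging, a single large miss raises the score more than several small ones.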
We conducted this study on Ubuntu 14.04 LTS 64-bit on a PC with an Intel® Core™ i7-6500U CPU @ 2.50GHz × 4, 8 GB of DDR2 RAM, and a 160 GB hard disk. The experiments were conducted using the Keras deep learning library in Python, with which we implemented the LSTM deep learning method. We divided the dataset into training and testing data and performed cross-validation. The training data took 75% of the dataset, while the validation data took the rest.

Fig. 3. The format of individual people's data

Figure 3 shows example data for one user. For this study, we used the data of user 001, which has 71 paths and 108607 points. From this data, we plotted all trajectories. Figure 4 shows all trajectories of user 001, where the x-axis is latitude and the y-axis is longitude. Using this data, we then detected the stay points with a time threshold of 20 minutes and a distance threshold of 200 meters. A total of 102 stay points were detected. The result of the stay point detection is shown in Figure 5 and Figure 6.

We then computed the clusters of stay points that are close to each other, each of which is considered a region of interest. We used the DBSCAN clustering algorithm with a maximum distance of 1.5 km and a minimum of 1 sample for points to be considered a cluster. This process extracted 12 clusters from the stay points, giving 12 regions of interest. The clustering result is shown in Figure 7.

In the loss plot, the x-axis is the number of epochs and the y-axis is the loss value.

Fig. 9. The result of train_loss and val_loss
In the final phase, we evaluated the prediction model using the train_loss, val_loss, and RMSE scores shown in Table 1 below. RMSE is a suitable measure of prediction model accuracy and is the main criterion in research concerned with prediction models [40].

Conclusion
This study conducted human mobility prediction using the GeoLife GPS Trajectories dataset. From this dataset, we performed stay point detection, region of interest extraction using DBSCAN, and prediction using a multivariate LSTM forecast model. As a result, the LSTM prediction with the epoch and batch_size parameters set to 100 and 20, respectively, demonstrated a good level of accuracy, with a prediction error (RMSE) of 3.551.

Acknowledgement
This research was fully supported by the Centre of Research, Universitas Mercu Buana, through an internal research grant (named penelitian internal).

Fig. 5. Trajectory of stay points with other points

Fig. 7. The format of individual people's data

After obtaining the clusters (regions of interest), we used the cluster index as the class label of each point and used these labels to build the prediction model with the multivariate LSTM forecast model. The network design of our model is shown in Figure 8.

Table 1. Result of model evaluation