TF-IDF Decision Matrix to Measure Customers’ Satisfaction of Ride Hailing Mobile Application Services: Multi-Criteria Decision-Making Approach

In recent years, the use of ride hailing mobile application services has increased exponentially. Customers’ expectations of these services vary and change dynamically, as the needs of each individual also vary. Customer reviews of mobile applications are honest, voluntary opinions, and they could become essential input for mobile application providers to measure satisfaction. However, turning a large number of reviews into actionable plans could be challenging. This study combines Term Frequency-Inverse Document Frequency (TF-IDF) with the Multiple-Criteria Decision-Making (MCDM) method VIKOR to process 600 reviews into meaningful insight for enhancing ride hailing mobile application services. The four-phase analysis concluded that application ease of use and affordability are the aspects that contribute most to customers’ satisfaction in ride hailing mobile application services.


Introduction
Mobile applications have become a primary tool used on a daily basis in this era of rapid technological advances. The estimated number of unique mobile subscribers globally was about 5 billion by 2020 [1]. In the transportation sector, mobile applications were initially used mostly for navigation and location-based services. More recently, the uses of transportation mobile applications range from education, data collection, travel information, and travel safety to route planning and ride hailing.
Ride hailing could be defined as a service that arranges a one-way trip at short notice. This technology has contributed to the shift of people's lifestyles toward the digital. The increasing demand for ride hailing applications includes taxis for commuting to workplaces, home shopping, and food and goods delivery [2]. The most popular transportation mobile services in the United States (US) and Europe are Uber, Lyft, Sidecar, and Carpool [3]. Meanwhile, in Asia, three ride hailing applications (Didichuxing, Grab, and Go-Jek) have been listed among the top 10 Asian unicorns with the highest value and a remarkable influence on people's lives. These three transportation applications are valued at US$62 billion, US$14.3 billion, and US$10 billion, respectively [4].
The number of users of transportation mobile services grew exponentially in the last five years, but then the COVID-19 outbreak started in late 2019. The pandemic has changed people's lives in almost every sector, including transportation. The enforcement of movement control orders worldwide halted most people's mobility. In the first quarter of 2020, Uber and Lyft ridership in the US dropped by between 70% and 80% [5]. The same trend occurred in Indonesia, although the downturn was not as severe as in the US: even the biggest mobile transportation applications, Grab and Go-Jek, reported a drop of between 14% and 16% by March 2020 [6]. As the pandemic persists to date, the number of ride hailing customers has not rebounded to its pre-pandemic level. However, as lockdown restrictions in many countries are gradually being eased, ridership is picking up, particularly for food delivery services within the same mobile applications [5][6].
As people's habits and lifestyles can no longer be separated from digital platforms, and because market competition is becoming fiercer, customer expectations and satisfaction levels are also higher. People hold certain perceptions of the features or attributes of ride hailing applications and expect more or less of a given service to be available within them. One of the best ways to find out customers' opinions is through app reviews. Since users of mobile services generally write their impressions on their mobile devices rather than on desktop computers, customer reviews in mobile applications tend to be concise and straightforward, because writing on a mobile device is less convenient than using a computer keyboard. Such reviews therefore tend to be clear and explicit [7].
Customer reviews represent the customers' voice, so they are essential for mobile service providers seeking to monitor and improve customer satisfaction. Customer reviews provide insights such as the practical characteristics, strengths, and flaws of particular mobile applications. From the customers' point of view, mobile application reviews help them learn whether a given mobile application can meet their needs, and thus help them choose one application over the others. However, monitoring a large number of customer reviews could be a problem for both mobile application providers and customers themselves. For mobile service providers, turning millions of reviews into actionable plans requires a reliable methodology and considerable effort. Meanwhile, for customers who aim to gather information about the applications, analyzing a large number of reviews may seem too daunting. As a result, people examine only the global ratings when they first download a mobile application, although these ratings are less helpful for understanding customer satisfaction. Ratings typically follow a bimodal distribution, clustering at either the extremely positive or the extremely negative end, so an average rating might not be useful for the purpose of controlling and improving customer satisfaction [7].
This study presents a customer satisfaction measurement based on term frequency-inverse document frequency (TF-IDF), followed by computation of the value rank using a multicriteria decision-making (MCDM) method, namely VIKOR. TF-IDF is used to weight factors from the consumer reviews. The weighting is essential to gain an accurate idea of the salient words from their frequency of appearance in the consumer reviews [8]. Meanwhile, VIKOR is a multicriteria optimization method that produces a compromise ranking and solution from the initial weights [9][10]. Tang et al. [11], Gao et al. [12] and Zhang et al. [13] implemented MCDM and natural language processing (NLP) to mine public interests for each topic. However, the natural language processing in those studies used only term frequency (TF) to weight the factors used to create the MCDM matrix. Since the weight factor is important, we use TF-IDF in the NLP step to improve the accuracy of the weight factor. This study integrates TF-IDF with the MCDM method VIKOR on application reviews to measure customers' satisfaction with ride hailing mobile services prior to and during the COVID-19 pandemic.
MCDM is recognized as a highly reliable methodology for ranking multiple alternatives [9]. While MCDM methods have been successfully applied in economics, energy management, transportation, human resources management and other domains, their integration with TF-IDF is relatively new, particularly in the field of business. The practical implication of the current study is that mobile application providers can gain insight into customers' satisfaction with and expectations of an ideal ride hailing mobile service. Customers, in turn, can save time and effort in understanding other customers' opinions about specific transportation mobile application services.

Methods
Data collection and analysis in this study are organized into four steps. The first is data collection, followed by data pre-processing to prepare the data, and then text mining. Last, a ranking model using VIKOR determines the order of preference. The proposed system is illustrated in Figure 1.

Data collection
The data in this study were obtained from the Google Play Store website, accessed at https://play.google.com/. The interface of the website can be seen in Figure 2. The selected ride hailing mobile applications are Go-Jek, Grab and Didichuxing. The user interfaces of the three selected ride hailing mobile applications can be seen in Figure 3. The online reviews of each mobile application were crawled, and specific variables were extracted: username, date, rating, and comments. An example of the data is given in Table 1. Of the four variables, we decided to use comments and ratings. There are 200 reviews for each application, giving 600 online reviews in total.

Data pre-processing
Before the online review data are processed in text mining, they undergo pre-processing to remove any irrelevant text [14] and to normalize the data [15]. There are four pre-processing steps: 1) tokenization; 2) case transformation; 3) stop word filtration; 4) stemming. Tokenization separates the collection of information into individual words [16]; that is, it cuts the input string into its constituent words. Verma et al. [16] use RapidMiner to process raw text into separate words, and RapidMiner is also used for tokenization in this study. An example of this step can be seen in Table 2. After that, the individual words are transformed into lower case so that no uppercase remains in the data [17]. Case transformation is necessary to avoid confusion between otherwise identical words written in uppercase and lowercase [18]. Mohadab et al. [17] and Gupta et al. [18] applied case transformation before filtering and stemming. Hence, the order of data pre-processing after tokenization is case transformation, filtration and stemming. Examples of case transformation from Table 2 are "ENGLISH" to "english"; "Over" to "over"; "Cities" to "cities"; "ASIA" to "asia".
Stop word filtration aims to remove stop words from the information. Since the online reviews are written in English, we erase English stop words from the reviews; examples are 'a/an', 'the', 'am/is/are', 'in', 'to', 'about', et cetera. In the example in Table 2, the English stop words found include "and" and "in"; both were deleted. The last step of data pre-processing is stemming, which reduces words to their stems by erasing suffixes. Jivani [19] divided stemming algorithms into three types: truncating, statistical, and mixed. In this study, a truncating method was applied to remove plural suffixes and convert the words into singular form. An example of stemming from the above sentence is "cities" into "city". Once the suffix of each word is removed, tagging the words with their part-of-speech type becomes easier. Part-of-speech tags help to distinguish word types, as can be seen in Table 3.
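The four pre-processing steps can be illustrated with a minimal Python sketch. This is not the RapidMiner workflow used in the study: the stop-word list is a small illustrative subset, the plural-stripping rule is a deliberately rough truncating stemmer, and the function names (tokenize, strip_plural, preprocess) are hypothetical.

```python
import re

# Small illustrative stop-word list (an assumption; the study's full list is not given).
STOP_WORDS = {"a", "an", "the", "am", "is", "are", "in", "to", "about", "and"}


def tokenize(text):
    """Split raw text into word tokens (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text)


def strip_plural(word):
    """Very rough truncating stemmer: turn common plural endings into singular."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"      # cities -> city
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]            # drivers -> driver
    return word


def preprocess(text):
    tokens = tokenize(text)                                # 1) tokenization
    tokens = [t.lower() for t in tokens]                   # 2) case transformation
    tokens = [t for t in tokens if t not in STOP_WORDS]    # 3) stop word filtration
    return [strip_plural(t) for t in tokens]               # 4) stemming


# Hypothetical input sentence, not taken from the study's data.
print(preprocess("Drivers in ASIA Cities and the app"))
# ['driver', 'asia', 'city', 'app']
```

A rule this simple will mis-stem some words (for example, singular words that end in "s"), so a real pipeline would more likely rely on an established stemmer or lemmatizer.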

Text mining
The most frequent words that appear in a text can be identified through text mining [20]. The process starts with each word from the online reviews restored to its base form. Then, a part-of-speech tag is assigned (see Table 3). In this study, we classified the stem words into two categories: an attribute dictionary and a sentiment dictionary. The attribute dictionary consists of nouns, while the sentiment dictionary consists of words other than nouns. In Table 3, verbs, adjectives, and adverbs are included in the sentiment dictionary. The words are divided automatically based on their part-of-speech tags using the Stanford taggers [21]. Each category is called a dictionary because it contains many words. From the attribute dictionary, the most frequent attributes are selected as aspects (see Table 2). Each aspect is used as a measurement of customer satisfaction, and the aspects are named $A_1, A_2, A_3, A_4, A_5, A_6, A_7, A_8$. The simultaneous occurrence of attribute and sentiment words is calculated using TF-IDF, which measures how relevant a given word is to a document in a collection of documents [22][23][24]. The formula for TF-IDF is shown in eq. 3, which is constructed from eq. 1 and 2. In this study, term presence is also calculated to count the number of aspect words that appear in each mobile application review.
$$tf_{i,j} = \mathrm{count}(t_i, d_j) \tag{1}$$
$$idf_i = \log \frac{|D|}{|\{d_j : t_i \in d_j\}|} \tag{2}$$
$$tfidf_{i,j} = tf_{i,j} \times idf_i \tag{3}$$
where: $t_i$: i-th word, $d_j$: j-th document, $|\{d_j : t_i \in d_j\}|$: number of documents containing $t_i$, $|D|$: total number of documents.
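As a rough illustration of the TF-IDF weighting described above, the following Python sketch assumes the common form in which the raw term count is multiplied by the logarithm of the total number of documents divided by the number of documents containing the term. The natural logarithm, the function name tf_idf, and the toy token lists are assumptions made for illustration; they are not the study's data or tooling.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: tf-idf} dict per document.

    documents: list of token lists (already pre-processed).
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)  # raw term frequency in this document
        scores.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy reviews, one token list per document.
docs = [
    ["app", "easy", "cost", "cheap"],
    ["app", "driver", "late"],
    ["cost", "cost", "high"],
]
weights = tf_idf(docs)
print(weights[2]["cost"])   # 2 * log(3 / 2)
```

Under this form, a term that occurs in every document receives a weight of zero, which is how TF-IDF discounts words that are frequent everywhere but carry little information.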

VIKOR method
VIKOR is a multicriteria decision-making (MCDM) method that helps to solve a problem by considering multiple criteria [22]. It uses a multicriteria ranking index to compare how close each alternative is to the ideal solution, so that the ideal alternative is obtained. The ranking index is derived by calculating the maximum group utility ($S_i$) and the minimum individual regret ($R_i$) [23]. VIKOR consists of several steps, as follows:
Step 1: Create the decision matrix.
The decision matrix is formed from the TF-IDF value of each aspect, calculated using eq. 1-3. The word's frequency of appearance is also counted for each aspect. The decision matrix is formed as follows:
$$\begin{array}{c|cccc} A_1 & SN_{11} & SN_{12} & \cdots & SN_{1n} \\ A_2 & SN_{21} & SN_{22} & \cdots & SN_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ A_m & SN_{m1} & SN_{m2} & \cdots & SN_{mn} \end{array}$$
where: $A_i$: i-th alternative, $SN_{ij}$: the value of j-th aspect for i-th alternative, $m$: number of alternatives, $n$: number of aspects.
Step 2: Normalize the decision matrix.
The TF-IDF value of each aspect from step 1 is normalized to a number between 0 and 1. The formulation for normalization is shown in eq. 4. The word presence from step 1 is normalized using eq. 5. The weighting scheme is implemented to measure the weighted aspects.
where: $x^*$: normalization result, $x$: TF-IDF value.
Step 3: Determine the ideal negative and ideal positive solutions.
The two referential sequences of positive and negative ideal solutions are obtained using eq. 6 and 7.
$$f_j^* = \max_i f_{i,j} \tag{6}$$
$$f_j^- = \min_i f_{i,j} \tag{7}$$
where: $f_j^*$: positive ideal solution, $f_j^-$: negative ideal solution, $f_{i,j}$: the normalized score of the j-th aspect.
Step 4: Calculate the new weighted decision matrix.
The formula to assign weights for the new decision matrix is written in eq. 8.
where: $w_j$: weight of j-th aspect.
Step 5: Compute the utility measure and the regret measure.
The utility and regret measures are computed as shown in eq. 9 and 10.
where: $S_i$: utility value, $R_i$: regret value.
Step 6: Calculate the VIKOR index.
where: $Q_i$: VIKOR index, $v$: VIKOR index weight value, $S^*$: maximum value of utilities, $S^-$: minimum value of utilities, $R^*$: maximum value of regret, $R^-$: minimum value of regret.
Step 7: List the order of preference.
The alternative with the smallest VIKOR index value is the best solution.
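The steps above can be sketched in Python using the textbook VIKOR formulation. This is an assumption rather than a reproduction of eq. 8-10 and the index formula of this study: the sketch treats every aspect as a benefit criterion (larger is better), scales each distance by the range between the ideal solutions, and rescales S and R so that the smallest value maps to 0, so that a lower index is better. The function name vikor and the example matrix and weights are hypothetical.

```python
def vikor(matrix, weights, v=0.5):
    """Rank alternatives with the textbook VIKOR steps.

    matrix  : rows are alternatives, columns are aspects (benefit criteria,
              i.e. larger is better), already normalized.
    weights : one weight per aspect.
    v       : weight of the group-utility term in the VIKOR index.
    Returns (S, R, Q) lists, one value per alternative.
    """
    n_aspects = len(weights)
    columns = list(zip(*matrix))
    f_best = [max(col) for col in columns]    # positive ideal solution
    f_worst = [min(col) for col in columns]   # negative ideal solution

    S, R = [], []
    for row in matrix:
        # Weighted, range-scaled distance of each aspect to the ideal value.
        dist = [
            weights[j] * (f_best[j] - row[j]) / (f_best[j] - f_worst[j])
            if f_best[j] != f_worst[j] else 0.0
            for j in range(n_aspects)
        ]
        S.append(sum(dist))   # utility measure
        R.append(max(dist))   # regret measure

    def scaled(x, lo, hi):
        return (x - lo) / (hi - lo) if hi != lo else 0.0

    Q = [
        v * scaled(s, min(S), max(S)) + (1 - v) * scaled(r, min(R), max(R))
        for s, r in zip(S, R)
    ]
    return S, R, Q


# Hypothetical normalized scores: 3 alternatives x 2 aspects, equal weights.
scores = [[1.0, 0.5], [0.0, 1.0], [0.5, 0.0]]
S, R, Q = vikor(scores, weights=[0.5, 0.5], v=0.5)
ranking = sorted(range(len(Q)), key=lambda i: Q[i])   # lowest Q first
print(Q, ranking)
```

With these toy numbers the first alternative has the lowest index and is ranked first, which matches the rule in step 7 that the smallest VIKOR index marks the best solution.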

Results and discussion
The first phase of the VIKOR analysis is establishing the decision matrix with TF-IDF values. From the scores calculated using TF-IDF, a matrix of weighting scores for each aspect is presented in Table 4. The last row of the table also gives the word presence, which shows the frequency of each aspect's appearance in the consumer reviews. In the second step, the scores from Table 4 are normalized to between 0 and 1, and a new decision matrix is developed as shown in Table 4. The weight of each aspect is obtained using eq. 4.
In step three, the positive ($f_j^*$) and negative ($f_j^-$) ideal solutions are determined, as shown in Table 5. Each ideal solution is obtained by taking the maximum and minimum of the normalized scores from Table 4. In step four, based on eq. 8, a new decision matrix is arranged to take into account the weight of each aspect. The result is presented in Table 6. After the weighted decision matrix is obtained, step five is computed based on eq. 9 and 10 to calculate the utility measure and the regret measure. From this process, the values of $S_i$, $R_i$ and $Q_i$ are obtained for each aspect, as shown in Table 7. In step six, the Q-value is calculated using 10 different v-values between 0 and 1. The v-value is a weighting factor used to examine the result, and the decision is made by comparing the results. From the values of $R_i$ depicted in Table 8, the maximum value of utility shows the important aspects of ride hailing applications, where $A_1$ (Application) and $A_4$ (Cost) rank at the top among the listed aspects. From this result, it could be inferred that, in order to improve customer satisfaction, each ride hailing mobile application should pay attention to ease of use and fares. Meanwhile, the result of the ranking model is strongly influenced by the v-value as a weighting factor. If the decision maker sets the v-value between v=0 and v=0.7, the ride hailing mobile applications are ranked Go-Jek > Grab > Didichuxing. The rank is sorted based on the Q-values shown in Table 9, where the smallest number means the best solution. However, if the v-value is set to 0.8-1, the rank changes to Go-Jek > Didichuxing > Grab. It is worth noting that in every v-value scenario, Go-Jek is in the top position among the three ride hailing applications. Therefore, Go-Jek could be considered the ride hailing mobile application benchmark, because its rank remains in first position for all 10 possible v-values. The impact of the v-value scenario on the VIKOR rank is known as sensitivity.
Sensitivity analysis is conducted to rank the impact level of the v-value, as shown in Figure 4. Sensitivity analysis determines how different values of an independent variable affect a dependent variable [27]. The ranking of Go-Jek was not altered by the v-value, which means that Go-Jek has higher customer satisfaction in terms of both $S^*$ and $R^-$. In contrast, the ranking of Grab improved when the v-value was decreased. This reveals that Grab has higher customer satisfaction when the focus is on $R^-$, while Didichuxing is the opposite: its ranking improves when the importance of $S^*$ is increased.
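A sensitivity sweep of this kind can be sketched as follows. The sketch recomputes a textbook-style VIKOR index over a grid of v-values and records the resulting order. The S and R values, the alternative names X, Y and Z, the grid of v-values, and the function names are all hypothetical; they are chosen only to show how a ranking can flip as v changes and are not the study's results.

```python
def vikor_index(S, R, v):
    """VIKOR index Q for each alternative, given utility S and regret R."""
    def scaled(x, values):
        lo, hi = min(values), max(values)
        return (x - lo) / (hi - lo) if hi != lo else 0.0
    return [v * scaled(s, S) + (1 - v) * scaled(r, R) for s, r in zip(S, R)]


def rank_by_v(names, S, R, v_values):
    """Return {v: names ordered from best (lowest Q) to worst}."""
    table = {}
    for v in v_values:
        Q = vikor_index(S, R, v)
        order = sorted(range(len(names)), key=lambda i: Q[i])
        table[v] = [names[i] for i in order]
    return table


# Hypothetical S and R values for three unnamed alternatives (not the study's data).
names = ["X", "Y", "Z"]
S = [0.25, 0.5, 0.75]
R = [0.125, 0.5, 0.25]
v_values = [round(0.1 * k, 1) for k in range(11)]   # 0.0, 0.1, ..., 1.0

for v, order in rank_by_v(names, S, R, v_values).items():
    print(v, " > ".join(order))
```

In this toy case X stays first for every v, while Y and Z swap places between v = 0.5 and v = 0.6. This is the same kind of pattern as reported above, where one application keeps the top rank and the other two change order as the v-value increases.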
The study converts text-based customer opinions into a matrix, which is then ranked based on its criteria. The result is understandable, is not time-consuming to obtain, and can provide insight into ride hailing mobile applications. The study uses ride hailing mobile applications as a case study to show the effectiveness of the proposed framework. Similar research by Liang et al. [28] uses hotel booking as a case study to explain the practicability and effectiveness of an integrated MCDM method and model.
Distribution linguistic VIKOR was developed to rank travel hotels based on online reviews [28]. The similarity with the method proposed in this study is that information loss can be avoided by integrating the MCDM-VIKOR method, because all aspects written in the online reviews are used as attributes in the ranking process. Liang et al. [28] use common term frequency (TF), while this study uses TF-IDF to measure a word's relevance to a document in a collection of documents. TF-IDF can identify which words carry more information, as opposed to frequent words that carry less.
The limitations of this research are that it covers only the three most well-known mobile ride hailing application services, that the online reviews of each mobile application are limited to 200, and that only one MCDM method, VIKOR, is used. Examples of other MCDM methods are AHP, TOPSIS, ELECTRE, WSM, and WPM [29]. In addition, each country has different types of ride hailing mobile applications, and the language used in online reviews may vary. Further studies should therefore use larger data sets and analyze online reviews in languages other than English.

Conclusions
This study proposes a multicriteria decision matrix to measure customer satisfaction with ride hailing mobile applications. The MCDM technique used is VIKOR, combined with TF-IDF values as the decision matrix. Based on the results, we can gain insight into the customer opinions expressed in the online review data in this study. Ease of use and fare are the most demanded utilities of ride hailing applications. This means that ride hailing mobile applications should give these aspects high priority to enhance their services and gain customer satisfaction. It is also concluded that Go-Jek ranks first among the ride hailing applications considered. The result shows that the proposed ranking model is effective for measuring customer satisfaction.

Table 1. Customer reviews data

Table 2. Example of input text

Table 5. Decision matrix with TF-IDF value

Table 7. Positive ideal solution

Table 8. Decision matrix with weight

Table 9. Q values