Android Apps Security Assessment using Sentiment Analysis Techniques: Comparative Study

Application security is an important concern, especially today, with the growth of technology and of the number of fraudsters. Determining the security of an application is difficult, particularly because many fraudsters have become skilled at manipulating people and stealing their sensitive data. We therefore aim to spot insecure apps by analyzing user reviews on the Google Play platform and applying sentiment analysis to determine each app's level of security. User reviews reflect users' experiences, as well as their feelings and whether they are satisfied with an application. Unfortunately, not all reviews are genuine, and fake reviews do not reflect users' real feelings, so our work filters the reviews to keep the results accurate. This study is useful both for users who want to install Android apps and for developers interested in improving their apps.


Introduction
Mobile applications are becoming increasingly common in people's daily lives as technology progresses at a rapid pace. As a result, the number of applications in app stores grows daily, which makes it difficult for users to choose an appropriate application on several criteria, the most important of which is its safety. By one estimate, one in every 36 mobile devices has risky apps installed [1]. Determining an application's security is difficult, especially for users without technical knowledge, so we decided to determine an application's security level by analyzing user reviews with sentiment analysis, an approach that studies have shown to be successful in many areas.
Sentiment analysis, or opinion mining, uses computational methods to analyze how people feel and think about a wide range of topics, including products, services, issues, events, and themes. It can therefore be used to track how the public feels about a specific entity, and the resulting knowledge can help to understand, explain, and forecast social processes. In business, sentiment analysis is crucial for planning strategy and for gaining insight into customer opinion of a company's goods and services; in today's customer-focused business culture, knowing your customer is critical. Sentiment analysis draws on psychology, sociology, natural language processing, and machine learning. Data volumes and processing power have recently increased dramatically, enabling more sophisticated forms of analytics, and as a result sentiment analysis using machine learning has become increasingly popular [2].
The rest of this paper is organized as follows. Section 2 discusses related work on spam detection, aspect-based sentiment analysis, and the security evaluation of mobile apps. Section 3 presents the proposed solution. Section 4 discusses the methodology used to develop the proposed system. Finally, Section 5 presents the conclusions.

Related work
This section covers three fields related to this paper. For each field, we outline related studies and existing systems.

Spam detection
Several techniques have been proposed to detect fake reviews more accurately. One of the most effective is feature extraction, and the extracted features fall into two main groups. Review-content features focus on the text of the review, using representations such as bag of words (BOW), word embeddings (WE), and term frequency (TF) [3] [4]. Reviewer features describe the user who posts the review, such as the IP address and the number of reviews the reviewer has posted. Using both groups of features yields better and more accurate spam detection [5] [6]. In terms of approach, most researchers have used supervised classification models, and few have used unsupervised models, which are hard to evaluate because of the lack of a reliable labeled dataset of reviews [6].
Supervised learning methods of spam detection
Supervised learning is a classification approach that trains a model on labeled data to predict the output for new data. It is used to detect and assess attacks in many areas of cybersecurity, with algorithms such as Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF). It can also be used to discover spam reviews, but it requires a labeled dataset in order to detect spam in unseen data [7]. Table 1 compares a number of studies that apply supervised learning to spam detection.
Unsupervised learning methods of spam detection
Supervised learning requires a labeled dataset to train a model, and unsupervised learning has been suggested to overcome this problem. In fact, most research that uses supervised learning is based on pseudo fake reviews rather than real fake reviews [13]. Table 2 summarizes further ideas for spam review detection based on unsupervised learning approaches.
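As a minimal sketch of the supervised, content-based approach described above, the code below trains a multinomial Naive Bayes classifier on bag-of-words counts. The whitespace tokenizer, the Laplace smoothing parameter `alpha`, and the toy reviews and labels are illustrative assumptions; the sketch uses review-content features only and omits reviewer features such as IP addresses.

```python
import math
from collections import Counter

def tokenize(text):
    # Lowercase whitespace tokenization (a simplifying assumption).
    return text.lower().split()

def train_nb(texts, labels, alpha=1.0):
    """Count bag-of-words frequencies per class for multinomial Naive Bayes."""
    word_counts = {c: Counter() for c in set(labels)}
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = set().union(*word_counts.values())
    return {"word_counts": word_counts, "class_counts": Counter(labels),
            "n_docs": len(labels), "vocab_size": len(vocab), "alpha": alpha}

def predict_nb(model, text):
    """Return the class with the highest log posterior for a review."""
    scores = {}
    for c, counts in model["word_counts"].items():
        denom = sum(counts.values()) + model["alpha"] * model["vocab_size"]
        score = math.log(model["class_counts"][c] / model["n_docs"])  # log prior
        for w in tokenize(text):
            # Laplace-smoothed log likelihood of each word given the class.
            score += math.log((counts[w] + model["alpha"]) / denom)
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical labeled reviews.
reviews = [
    "best app ever buy now best best",
    "amazing amazing five stars buy now",
    "crashes when I open the settings page",
    "login fails after the latest update",
]
labels = ["spam", "spam", "genuine", "genuine"]
model = train_nb(reviews, labels)
print(predict_nb(model, "best app buy now"))  # spam
```

A real system would train on a large labeled corpus and would typically combine these text features with reviewer-behavior features.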
[14] Review Skeptic is an automated tool that uses text-related criteria to identify spam hotel reviews. The authors extended it to detect spam in non-hotel reviews by adding reviewer-behavior and time criteria; the extended tool is known as Review Alarm.
[15] The authors proposed a framework that identifies spam reviews using an LSTM autoencoder.
Instead of assigning a class label to each review, the model learns the patterns of genuine reviews from their textual content. If a new review contains representative words that differ from those learned during training, it deviates from the learned patterns, and the system considers it a spam review.
The model uses one-hot encoding to learn the patterns of genuine reviews, then computes each review's reconstruction loss and clusters the reviews into spam or genuine using expectation-maximization (EM).
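The clustering step can be sketched as follows, assuming the LSTM autoencoder has already produced one reconstruction loss per review (the autoencoder is not shown, and the loss values are invented). The code fits a two-component, one-dimensional Gaussian mixture with EM and treats the component with the higher mean loss as spam; the initialization and iteration count are assumptions, not details from [15].

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_spam_probabilities(losses, iterations=100, min_var=1e-6):
    """Fit a 2-component 1-D Gaussian mixture to reconstruction losses with EM and
    return, per review, the probability of belonging to the high-loss (spam) component."""
    n = len(losses)
    mean_all = sum(losses) / n
    var_all = max(sum((x - mean_all) ** 2 for x in losses) / n, min_var)
    means = [min(losses), max(losses)]  # start one component at each extreme
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each loss.
        resp = []
        for x in losses:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in (0, 1)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mixture weights, means and variances.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / nk, min_var)
    spam = 0 if means[0] > means[1] else 1
    return [r[spam] for r in resp]

losses = [0.10, 0.12, 0.09, 0.11, 0.10, 0.90, 0.95]  # hypothetical per-review losses
print([p > 0.5 for p in em_spam_probabilities(losses)])
# [False, False, False, False, False, True, True]
```

Reviews that the autoencoder reconstructs poorly end up in the high-loss component and are flagged as spam.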
[16] The authors exploit the predictive power of vector representations of reviews, reviewers, and products by applying the Doc2vec and Node2vec algorithms to a raw spam review dataset, giving each review a separate vector representation to distinguish fraudulent from genuine reviews.
The Node2vec algorithm uses review metadata to build an underlying reviewer-product network that improves the vector representation of each reviewer and product, while Doc2vec generates document embeddings from the reviews' textual content.
To build a classifier for spam review identification, the outputs of both steps are aggregated and fed into a logistic regression algorithm.
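A minimal sketch of the final classification step, assuming the Doc2vec and Node2vec embeddings have already been computed: the code concatenates a review's text embedding with its reviewer and product embeddings and trains a plain logistic regression by stochastic gradient descent. The toy two- and one-dimensional vectors, learning rate, and epoch count are made-up assumptions; in practice a library implementation of logistic regression would normally be used.

```python
import math

def concat_features(doc_vec, reviewer_vec, product_vec):
    # Aggregate the text, reviewer and product representations into one vector.
    return list(doc_vec) + list(reviewer_vec) + list(product_vec)

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by stochastic gradient descent on the log-loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def spam_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical embeddings: (doc2vec, reviewer node2vec, product node2vec); label 1 = spam.
X = [
    concat_features([0.9, 0.1], [0.8], [0.7]),
    concat_features([0.8, 0.2], [0.9], [0.6]),
    concat_features([0.1, 0.9], [0.1], [0.2]),
    concat_features([0.2, 0.8], [0.2], [0.1]),
]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
print(spam_probability(w, b, concat_features([0.85, 0.15], [0.85], [0.65])))
```

Concatenation is the simplest way to aggregate the representations; it lets the classifier weigh text and network evidence jointly.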
[13] The authors applied the collective classification algorithm MHCC to the heterogeneous users-reviews-IP addresses network to detect spam reviews. MHCC treats unlabeled (unfiltered) reviews as negative data, but because many fake reviews are likely buried in the unlabeled collection, this can confuse the classifier. They therefore transformed MHCC into a Collective Positive-Unlabeled (PU) learning model (CPU).
The CPU model treats unlabeled data as negative only during initialization. Once the initial classifier has been trained, its classification results are evaluated and used to generate reliable positive and negative sets.
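To illustrate the idea of starting from "unlabeled means negative" and then refining, here is a generic two-step PU learning sketch with a nearest-centroid scorer. It is not MHCC or the collective CPU model, which operate over a heterogeneous user-review-IP network; the two-dimensional feature vectors and the number of reliable negatives (`n_reliable`) are hypothetical.

```python
def centroid(points):
    return [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pu_two_step(positives, unlabeled, n_reliable):
    """Label unlabeled reviews (1 = spam, 0 = genuine) from positive and unlabeled data."""
    pos_c = centroid(positives)

    def score(x, neg_c):
        # > 0 means the point is closer to the spam centroid than to the negative one.
        return sq_dist(x, neg_c) - sq_dist(x, pos_c)

    # Initialization: treat all unlabeled reviews as negative.
    neg_c = centroid(unlabeled)
    # Keep only the least spam-like unlabeled reviews as reliable negatives.
    reliable = sorted(unlabeled, key=lambda x: score(x, neg_c))[:n_reliable]
    neg_c = centroid(reliable)
    # Re-classify every unlabeled review against the refined negative centroid.
    return [1 if score(x, neg_c) > 0 else 0 for x in unlabeled]

positives = [[0.9, 0.9], [0.8, 0.95]]  # hypothetical features of known spam reviews
unlabeled = [[0.1, 0.1], [0.15, 0.2], [0.2, 0.1], [0.85, 0.9], [0.9, 0.8]]
print(pu_two_step(positives, unlabeled, n_reliable=3))  # [0, 0, 0, 1, 1]
```

Refining the negative set removes the hidden spam reviews that initially pulled the negative centroid toward the positive class.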
[17] The system aims to achieve high detection accuracy using only a small amount of positively labeled data and a large amount of unlabeled data. The authors also used behavior density as a secondary check on suspected spam reviews to further improve detection accuracy.
By combining PU learning with behavior density, they aim to prevent users from spreading fake reviews in the app store.
[18] SpEagle uses relational data (the review-user-product graph) and metadata (behavioral and textual data) and builds relationships between them, framing spam detection as a network classification task that uncovers both spammers and the products targeted by spam. The system can also work in a semi-supervised fashion by accepting a small set of labeled data; this version is called SpEagle+. A lighter version, SpLite, reduces the computational load by relying on review features rather than user and product features.
Metadata (such as labels, timestamps, and review content) supports the network classification: features extracted from reviews are converted into a spam score that is incorporated into the class priors.

Sentiment analysis
As a text-analysis technique, sentiment analysis aims to discover the emotional polarity of people's opinions (for example, positive or negative) in a document as a whole, or in a paragraph, sentence, or clause. It is now a common social media analysis tool used by companies, marketers, and political analysts [19]. Sentiment analysis has many types, such as fine-grained, emotion detection, aspect-based, and multilingual sentiment analysis. This section reviews the research papers related to aspect-based sentiment analysis, as shown in Table 3.
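As a minimal illustration of aspect-based (rather than document-level) sentiment analysis, the sketch below splits a review into sentences, matches each sentence against aspect keywords, and scores it with a tiny sentiment lexicon. The aspect names, keyword sets, and lexicon are hypothetical, and the method ignores negation and context, so it is only a toy.

```python
import re

# Hypothetical aspect keywords and sentiment lexicon, for illustration only.
ASPECTS = {
    "privacy": {"privacy", "data", "tracking"},
    "payment": {"payment", "charged", "refund"},
}
POSITIVE = {"safe", "secure", "good", "great"}
NEGATIVE = {"stole", "leaked", "scam", "charged", "bad"}

def aspect_sentiment(review):
    """Sum lexicon polarity per aspect over the sentences that mention it."""
    scores = {}
    # Split into sentences so each opinion is tied to the aspect it appears with.
    for sentence in re.split(r"[.!?]+", review.lower()):
        words = set(re.findall(r"[a-z]+", sentence))
        polarity = len(words & POSITIVE) - len(words & NEGATIVE)
        for aspect, keywords in ASPECTS.items():
            if words & keywords:
                scores[aspect] = scores.get(aspect, 0) + polarity
    return scores

print(aspect_sentiment("The app keeps my data safe. But I was charged twice and got no refund!"))
# {'privacy': 1, 'payment': -1}
```

A document-level analyzer would average these opinions into a single, less informative score; the aspect-level view keeps them apart.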

Evaluation of mobile apps security
The open-source nature of Android has helped make it the most popular smartphone operating system. Static analysis, dynamic analysis, and hybrid analysis are all used to check for Android security flaws.
Static analysis cannot catch exploits being used in the wild; dynamic analysis gets around this limitation by inspecting data flows at runtime [3].
Hybrid analysis combines static and dynamic analysis: data from dynamic analysis can be fed into a static analysis program [25]. Table 4 [25] compares static, dynamic, and hybrid analysis methods. These methods rely on the app's functionality, so the app must be installed first. We, on the other hand, aim to assess an app before it is installed, by analyzing user reviews. Prior results suggest that customer reviews, analyzed with machine learning techniques, are useful for understanding customer sentiment. To tell developers exactly what to improve across updates, this information must be extracted from reviews and summarized efficiently [3]. An app's security can suffer greatly when it is updated, yet very few studies have examined the effectiveness of applications in terms of security.
Evaluation of mobile apps based on reviews
This part reviews the research papers on analyzing user reviews for security evaluation. Table 5 gives an overview of previous studies, including the datasets used, the methods applied, and the analysis results.
Table 5. Studies of review-based app evaluation

No. | Dataset details | Method and tools | Results/findings
[26] | 36,464 comments from 3,174 apps | Independent Logistic Regression is used as a baseline | Experiments showed significant improvement over the Independent Logistic Regression baseline.
[27] | First dataset: 6,526 apps; second dataset: 6,257 apps | Crowdsourcing through Two-Coin for client Ranking-SVM | Compared with other cutting-edge approaches, the experiments showed a 6-7% gain in performance.
[28] | 19,413 reviews from 3,174 apps | Multi-class SVM with linear kernel | AUTOREB outperforms the other approaches by a large margin, with 51.36% accuracy.
[4] | 64,789 reviews from 17 mobile apps | VADER sentiment analyzer, Stanford Parser | A user survey indicates that the summarization produced by SRR-Miner is useful and feasible.
[29] | Over 87K apps, 2.9M reviews, and 2.4M reviewers | MLP, DT, RF | According to FairPlay, 75% of the malicious programs were found to be involved in search rank fraud.
[3] | 13 apps, 1,050 security-related reviews, 7,835,322 functionality-related reviews | Naive Bayes classifier | Only 23% of the applications had a reputation larger than 0.5.
[30] | Details of 35 apps | | Average ratings are not a valid ranking system compared with SERS.
[31] | 812,899 user reviews of 200 apps within 10 app categories | | LR achieved the highest accuracy among the algorithms.

The authors of [26] showed that comments with security/privacy issues (CSPI) must first be identified, so that irrelevant comments can be eliminated and the issues related to an app's security or privacy exposed. The paper presents a label system describing the "what" and "when" of an observed CSPI, proposes a CSPI Detection with Comment Expansion (CDCE) approach, and then applies a multi-label supervised learning technique to classify different kinds of CSPI.
In [27], aggregating user comments is treated as a crowdsourcing problem for inferring security risks. User feedback is used to build a new two-stage model that automatically ranks app risks based on latent security labels.
The authors of [28] developed the AUTOREB framework, which uses ML algorithms to automatically assess, from other users' experiences, whether an app exhibits security-related behaviors. It sorts user reviews into four distinct security-related behaviors.
It then uses crowdsourcing to predict app-level security issues.
Instead of using ML techniques, [4] proposes a Security-Related Review Miner (SRR-Miner) for extracting security-related reviews. It first extracts security-related review clauses using a keyword-based technique. It then uses established semantic patterns to extract phrases that express misbehaviors, attributes, and opinions, and summarizes security issues and user sentiment as triples.
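The first, keyword-based stage of such a miner can be sketched as below: a review is split into clauses at punctuation and at the connectors "but" and "and", and only clauses containing a security keyword are kept. The keyword list and the splitting rule are assumptions for illustration; SRR-Miner's actual keywords and semantic patterns are not reproduced here.

```python
import re

# Hypothetical security keyword list (not SRR-Miner's actual list).
SECURITY_KEYWORDS = {"hacked", "password", "permission", "stole", "virus", "malware", "privacy"}

def security_clauses(review):
    """Return the clauses of a review that contain at least one security keyword."""
    # Split at sentence/clause punctuation and at the whole words "but" and "and".
    clauses = re.split(r"[.!?;,]|\bbut\b|\band\b", review.lower())
    return [c.strip() for c in clauses
            if set(re.findall(r"[a-z]+", c)) & SECURITY_KEYWORDS]

print(security_clauses("Nice design, but it asks for SMS permission and my account got hacked."))
# ['it asks for sms permission', 'my account got hacked']
```

Working at clause level keeps unrelated praise ("nice design") out of the security summary that later stages build from these clauses.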
The FairPlay framework [29], on the other hand, organizes the study into four modules to identify malware and the targets of search rank fraud in Google Play: the Co-Review Graph (CoReG), Review Feedback (RF), IRR, and JH modules. Each module generates several features, which are then fed into a classifier for training. In addition, FairPlay uses more general features such as the average rating, the total number of downloads, and the number of reviews.
The authors of [3] provide a framework called CIAA-RepDroid, which computes a fine-grained, security-related reputation based on security-related sentiment analysis and a probabilistic classification model. CIAA-RepDroid breaks reputation down into confidentiality, integrity, authentication, and availability reputations.
The SERS ranking scheme [30] proposes an evidence-based, security-related ranking for grading security claims. The authors use both static analysis and sentiment analysis. Sentiments about confidentiality are tallied, so an app whose reviews express confidence in its confidentiality obtains a high rating, indicating that users trust the app not to divulge any sensitive data.
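The contrast between ranking by average star rating and ranking by security-related sentiment can be illustrated with the hypothetical tally below. The scoring rule (the share of confidentiality-related reviews with positive sentiment) and the review records are assumptions; this is not the SERS scheme itself.

```python
def average_rating(reviews):
    return sum(r["stars"] for r in reviews) / len(reviews)

def confidentiality_score(reviews):
    """Share of confidentiality-related reviews with positive sentiment (0.0 to 1.0)."""
    relevant = [r for r in reviews if r["confidentiality"]]
    if not relevant:
        return 0.0  # no evidence either way (an arbitrary choice for this sketch)
    return sum(1 for r in relevant if r["polarity"] > 0) / len(relevant)

def rank(apps, key):
    # Order app names from highest to lowest score.
    return sorted(apps, key=lambda name: key(apps[name]), reverse=True)

# Hypothetical review records per app.
apps = {
    "AppA": [
        {"stars": 5, "confidentiality": False, "polarity": 1},
        {"stars": 5, "confidentiality": False, "polarity": 1},
        {"stars": 4, "confidentiality": True, "polarity": -1},
    ],
    "AppB": [
        {"stars": 4, "confidentiality": True, "polarity": 1},
        {"stars": 3, "confidentiality": False, "polarity": 1},
        {"stars": 4, "confidentiality": True, "polarity": 1},
    ],
}
print(rank(apps, average_rating))         # ['AppA', 'AppB']
print(rank(apps, confidentiality_score))  # ['AppB', 'AppA']
```

The two orderings disagree, which is the kind of gap between star ratings and security evidence that motivates security-specific ranking.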
The authors of [31] introduced Mobile App Reviews Summarization (MARS), a mechanism for summarizing reviews and extracting privacy concerns. Their mechanism achieves a precision of 94.84%, a recall of 91.30%, and an F-score of 92.79%. The paper treats privacy and security as keywords and determines the trustworthiness of apps by whether or not they pose a threat to privacy.

Proposal
As our dependency on smartphones rises, so does our exposure to security threats. The security level of the apps downloaded to our smartphones must therefore be a priority, because applications represent the largest security and privacy risk to a device and its user's data [32]. Users tend to evaluate an app's security level informally, using risk indicators such as the developer's reputation, the number of downloads, the app rating, and user reviews. However, since these indicators are commonly manipulated and fabricated, users cannot consider them trustworthy or sufficient for trusting a specific app. This is where our work comes in. The proposed framework aims to provide a helpful tool for assessing the risk of Android apps on Google Play by identifying apps' security issues through sentiment analysis of genuine user reviews.

Methodology
The study consists of seven steps:
Step 1: Collect user reviews. After search and discussion, we decided to use the datasets from [32][33][34].
Step 2: Preprocess the dataset, for example by removing irrelevant, redundant, noisy, or unreliable data, to make it suitable and reliable for further analysis.
Step 3: Detect and exclude spam reviews by analyzing a list of behavioral and textual features of the review and the application, using the content, timestamp, and rating associated with each review.
Step 4: Filter the user reviews to keep only security-related reviews, based on a list of keywords from two studies [35][36].
Step 5: Apply sentiment analysis on filtered reviews.
Step 6: Categorize the reviews into security aspects by evaluating the distribution of occurrences of security-related keywords across the aspects.
Step 7: Deliver an assessment of each security aspect of the app and a global assessment for the app as a whole.
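Steps 2 to 7 can be sketched end to end as below, under several assumptions: the aspect keyword sets and sentiment lexicon are small hypothetical stand-ins for the keyword lists of [35][36] and for a real sentiment analyzer, the spam check of Step 3 is a pluggable placeholder (`is_spam`), and each aspect's assessment is simply the mean polarity of its reviews.

```python
import re

# Hypothetical stand-ins for the keyword lists and a sentiment lexicon.
ASPECT_KEYWORDS = {
    "confidentiality": {"privacy", "leak", "leaked", "tracking", "spy"},
    "authentication": {"password", "login", "otp", "hacked"},
    "availability": {"crash", "crashes", "down"},
}
POSITIVE = {"safe", "secure", "trust", "protected"}
NEGATIVE = {"leaked", "hacked", "spy", "scam", "stole", "crashes"}

def preprocess(reviews):
    # Step 2: normalize text and drop empty or duplicate reviews.
    seen, cleaned = set(), []
    for text in reviews:
        norm = " ".join(re.findall(r"[a-z]+", text.lower()))
        if norm and norm not in seen:
            seen.add(norm)
            cleaned.append(norm)
    return cleaned

def polarity(words):
    # Step 5: toy lexicon-based sentiment (+1, 0 or -1).
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return (score > 0) - (score < 0)

def assess(reviews, is_spam=lambda text: False):
    per_aspect = {a: [] for a in ASPECT_KEYWORDS}
    for text in preprocess(reviews):
        if is_spam(text):  # Step 3: plug in a real spam detector here.
            continue
        words = set(text.split())
        # Steps 4 and 6: keep security-related reviews and assign them to aspects.
        for aspect, keywords in ASPECT_KEYWORDS.items():
            if words & keywords:
                per_aspect[aspect].append(polarity(words))
    # Step 7: per-aspect score = mean polarity, plus a global mean over aspects.
    scores = {a: sum(v) / len(v) for a, v in per_aspect.items() if v}
    overall = sum(scores.values()) / len(scores) if scores else None
    return scores, overall

reviews = ["My password was hacked!", "my password was HACKED",
           "Feels safe, my privacy is protected.", "Great game",
           "It crashes and my data leaked"]
print(assess(reviews))
```

Scores range from -1 (uniformly negative) to +1 (uniformly positive); reviews without security keywords, such as "Great game", do not affect the assessment.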

Conclusion
Our study is useful both for users who want to install Android apps and for developers interested in improving their apps. It helps increase users' awareness so that they can avoid suspicious apps, strengthens Google Play's security, and helps reduce the number of attacks. It also provides users with a comprehensive summary of security issues, and gives developers user-generated feedback on the app's vulnerabilities and misbehaviors.