Examining Users’ Concerns while Using Mobile Learning Apps

Abstract
Mobile learning applications (apps) are widely adopted for learning and educational content delivery globally, especially as internet access now happens predominantly on mobile handheld devices. Users give feedback on use, experience, and overall satisfaction via the reviews and ratings provided on the various distribution platforms. The massive information offered through these reviews presents an opportunity to derive valuable insights that can be utilized for multiple purposes by different stakeholders of these learning apps. This large volume of reviews, however, creates significant information overload, and reading through it is time-consuming. We analyze these user reviews using the text mining technique of topic modeling with Latent Dirichlet Allocation (LDA). This technique identifies inherent topics in the reviews and variables of user satisfaction while engaging with the apps. The content analysis reveals the importance of videos (multimedia) and downloads as integral parts of learning apps. The thematic analysis identifies features under three headings: financial, technical, and design. Given the value derived from these integral components and features of learning apps, it is worthwhile for app developers to improve on them to create a rewarding learning experience.


Introduction
Smartphones, a form of handheld device, run mobile apps that users install according to preference; mobile apps are software applications developed for mobile devices. Different categories of mobile apps are downloadable from the respective distribution platforms: the Google Play Store for Android phones and the App Store for iOS devices. Users' feedback on these apps appears as reviews on the different platforms for developers and other interested stakeholders. Hence, user reviews and ratings are essential variables in the development and upgrade of mobile apps. When requests raised through user reviews are addressed satisfactorily, app ratings tend to be correspondingly high [1][2][3]. Online store reviews are free and fast crowd feedback mechanisms that developers can use as a backlog for the development process [2].
Mobile learning is a two-way communication process between educators and students via various learning facilitators and mobile devices, not confined by barriers of time and space [4]. It has been an effective way to support and deliver knowledge to distance learners and part-time students. Mobile learning, characterized by its personalized and interactive approach, provides "learning at your own pace" as it maximizes the use of all forms of handheld (mobile) devices. These apps differ in context and scope; some are multi-disciplinary, while others are specific in their content. MOOCs (Massive Open Online Courses) primarily have web platforms accompanied by mobile app versions, presenting comprehensive coverage for users. For institutions that run a MOOC platform, the expected benefits include extending the institution's reach to a broader audience, building and maintaining the institution's brand, and using MOOCs as a potential source of revenue [5].
User feedback is essential to the continuous development, maintenance, and evolution of software products. User reviews are feedback or requests which can be (i) bugs or issues that need to be fixed [6], (ii) summaries of the user experience with specific features [7], (iii) requests for enhancements [8], and (iv) ideas for new features [6], [9]. Maalej et al. [10], [11] likewise identified four significant types of user reviews: bug reports, feature requests, user experiences, and ratings. User satisfaction can also concern how comfortable users are with the application, how much they enjoyed exploring it, and their overall satisfaction with it [12]. User reviews are significant in the life of an app; developers depend on them to obtain feedback from users, which informs the next stage of the app. These reviews are voluminous: an empirical study [6] reported that mobile apps received approximately 23 reviews per day, while popular apps such as Facebook received on average 4,275 reviews per day. Reviews appear as unstructured text that is difficult to parse and analyze, so developers and analysts face the enormous task of reading large amounts of textual data to understand users. In the opinion of [13], a classification model for user reviews could allow developers to (i) filter relevant information in user reviews, (ii) understand more quickly which software maintenance tasks to apply, and, consequently, (iii) be more responsive to users' requests. Specifically, topic models can be used to cluster sentences, for example, grouping together sentences of an identified category related to the same functionality of a given app.
User satisfaction is an essential construct in information systems as it checks for the positive impact on users. Together with other metrics such as system use, organizational performance, and user decisional performance, user satisfaction measures the effectiveness of information systems [14]. Extensive research has shown that user satisfaction with apps has a positive impact on customer loyalty, intention to continue using the apps, and willingness to pay for the apps and recommend them to others [5], [15][16][17][18]. As highlighted in [19], perceived benefit, which entails perceived enjoyment and perceived usefulness, has a significantly positive relationship with user satisfaction.
The large volume of online reviews results in significant information overload; reading through all the reviews is time-consuming and makes it difficult to draw conclusions and make informed decisions [20]. The reviews and ratings are significant for improving software quality and addressing missing application features that aid user experience. It is challenging to discover underlying topics from many online reviews and rate them against these topics [21]. Hence, app developers struggle to gain even a first-glance knowledge of users' opinions from the massive text available in the app stores. Previous works have covered several areas using different content and text analysis techniques, sentiment analysis, and deriving insights from user reviews for stakeholders. Supplementing these techniques with topic modeling can uncover the features mentioned in user reviews of learning apps and guide software developers in maintenance and evolution. Topic modeling is a type of statistical modeling for finding the abstract "topics" that occur in a collection of documents. Latent Dirichlet Allocation (LDA) is a topic model technique used to classify the text in a document into topics; it builds a topic-per-document model and a words-per-topic model, both modeled as Dirichlet distributions. Topic modeling as a statistical method will be applied to the extracted user reviews to classify the texts.
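As a minimal illustration of how LDA assigns topic distributions to documents, the sketch below fits a two-topic model on a toy corpus. The library choice (scikit-learn), the example reviews, and the parameter values are our own illustrative assumptions, not the setup used in this study:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for app reviews (hypothetical examples)
reviews = [
    "video does not load after the update",
    "cannot download the course video offline",
    "subscription payment failed with my card",
    "charged twice for the premium subscription",
]

# Bag-of-words representation of the corpus
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews)

# Fit a 2-topic LDA model; topics are Dirichlet-distributed over words
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topic = lda.fit_transform(X)  # per-document topic proportions

# Each row is a topic distribution for one review and sums to 1
print(doc_topic.shape)  # (4, 2)
```

In a corpus like this, one topic tends to collect media-related words ("video", "download") and the other payment-related words, which is the clustering behavior exploited in this study.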
Previous research in learning technologies and interventions has investigated the adoption and usage of mobile technologies for learning [22][23][24][25] and analyzed the content of mobile learning applications, especially those for children [23], [26]. Studies have also identified the absence of scientific standards and benchmarks in app design and development [27][28][29]. While extensive research has been carried out and is ongoing, there is insufficient research evidence to confirm the effectiveness and usefulness of these apps, and, to the best of our knowledge, none that analyzes reviews of MOOC apps. Analyzing user reviews will provide actionable insights into the desirable features, benefits, issues, and limitations of these apps. This paper evaluates user reviews of 10 top MOOC apps on Google Play and the App Store. We used topic modeling to gain insights into the desirable features of these apps that influence usage and determine their effectiveness. We analyzed the reviews classified as "negative" based on the user rating. We conclude with recommendations and suggestions for improvement in future updates and designs.

Methodology
In this section, we discuss the approaches adopted in developing the classification model. We extracted the reviews through a Python-based web scraping tool; these reviews became the input data for this study. Preprocessing of the user reviews included stop word removal, lemmatization, stemming, part-of-speech (POS) tagging, and phrase modeling (n-grams). We also transformed the review texts to a standard lower-case form (case folding) and removed all non-text content, e.g., emoticons and symbols. After preprocessing, the texts were put through LDA to identify inherent topics that determine the classification of the subjects in the reviews. We chose LDA as the topic modeling algorithm because the review dataset serves as training data for the document-topic Dirichlet distributions: samples are drawn from the Dirichlet distribution, and the analysis proceeds from there. Having the full dataset available allows an efficient and easy-to-use topic model. LDA, through unsupervised learning, also discovers hidden semantic structures in the body of the review documents, and it comes with its own evaluation metric, the coherence score.
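The preprocessing steps above can be sketched in simplified, library-free form. This sketch covers case folding, non-text removal, stop word removal, and bigram phrase modeling; the study's full pipeline additionally applied lemmatization, stemming, and POS tagging, which typically require NLP libraries such as NLTK or spaCy. The tiny stop-word list here is purely illustrative:

```python
import re

# Minimal stop-word list for illustration; the study used a full list
STOP_WORDS = {"the", "a", "an", "is", "it", "to", "and", "of", "on", "does", "not"}

def preprocess(review: str) -> list[str]:
    """Case folding, non-text removal, tokenization, stop-word removal."""
    review = review.lower()                    # case folding
    review = re.sub(r"[^a-z\s]", " ", review)  # strip emoticons, symbols, digits
    tokens = review.split()
    return [t for t in tokens if t not in STOP_WORDS]

def bigrams(tokens: list[str]) -> list[str]:
    """Simple phrase modeling: join adjacent tokens into bigrams."""
    return ["_".join(pair) for pair in zip(tokens, tokens[1:])]

tokens = preprocess("The video does NOT load!!! :(")
print(tokens)           # ['video', 'load']
print(bigrams(tokens))  # ['video_load']
```

Each cleaned, tokenized review then becomes one document in the corpus fed to the LDA model.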
Topic coherence scores a single topic by measuring the degree of semantic similarity between high-scoring words in the topic. High coherence scores indicate increased coherence (semantic similarity) among the words categorized in a topic [30], [31]. Topics derived in the processing stage are visualized in diagrams and charts. Python modules such as WordCloud and pyLDAvis, alongside the Gensim package, help with the visualizations. WordCloud gives a word-frequency distribution in which the words with the highest frequency in the entire corpus appear in large, bold letters. The pyLDAvis module provides an inter-topic distance map that gives a global topic overview by distributing the topics into four quadrants; these quadrants depict topic closeness or similarity through the words identified under each topic. The word chart is another visualization that shows the top 10 most common words in a bar chart with their frequency of occurrence. Figure 1 shows the methodology workflow discussed above.
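To show the intuition behind coherence scoring, here is a rough, self-contained sketch of a UMass-style measure based on word co-occurrence across documents. This simplified scoring function and its toy data are our own illustration, not the coherence implementation used in the study:

```python
import math
from itertools import combinations

def umass_coherence(topic_words, documents):
    """UMass-style coherence: rewards topic words that co-occur in documents.
    Higher (less negative) scores indicate a more coherent topic."""
    def doc_count(*words):
        return sum(1 for doc in documents if all(w in doc for w in words))
    score = 0.0
    for w1, w2 in combinations(topic_words, 2):
        # +1 smoothing avoids log(0) when a pair never co-occurs
        score += math.log((doc_count(w1, w2) + 1) / doc_count(w1))
    return score

# Toy documents as sets of tokens (hypothetical review fragments)
docs = [
    {"video", "load", "screen"},
    {"video", "download", "load"},
    {"payment", "card", "subscription"},
]

coherent = umass_coherence(["video", "load"], docs)       # words co-occur
incoherent = umass_coherence(["video", "payment"], docs)  # never co-occur
print(coherent > incoherent)  # True
```

In practice, library implementations such as Gensim's CoherenceModel compute such scores over the whole corpus, which is how the optimal topic count was selected in this work.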

Results
The data used in this work were extracted from the Google Play Store and the Apple App Store (Nigeria). A total of 19,709 reviews were extracted from the following apps: Udemy, Coursera, edX, Khan Academy, Pluralsight, LinkedIn Learning, Lynda, and Skillshare. These apps are highly rated (top learning apps) in both stores, and the selected apps also exist as Massive Open Online Courses (MOOCs) platforms. Only English reviews were considered in this study. The preprocessing task resulted in 4,284 reviews for further analysis. A sample of the extracted review features is presented in Table 1, and a screenshot of the results of the preprocessing stage (case folding, stemming, and lemmatization) is presented in Figure 2. Each record in the dataset was labeled using the star ratings given on the apps: star ratings "1" and "2" represent negative reviews, "3" represents neutral or unbiased reviews, and "4" and "5" represent positive reviews. This labeling approach is also adopted by [32][33][34]. By count, there were 14,005 positive, 4,285 negative, and 1,419 neutral reviews. Since our research focuses on identifying the variables of user satisfaction, the negative reviews are of particular importance; they were further analyzed for sentiments and topics to identify user satisfaction variables. We ran the LDA model on the preprocessed negative-review dataset, setting the number of topics to 5 with an optimal coherence score of 0.54. The results of the LDA modeling highlight the keywords that are similar enough to be classified under the same topic. Topic naming was done manually by inferring from the keywords ranked together. The resulting topics are numbered 0 to 4; the numbers before the words indicate the weight those words hold within the topic classification.
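The star-rating labeling scheme described above can be sketched as a simple mapping (the function and variable names are our own, introduced for illustration):

```python
def label_review(star_rating: int) -> str:
    """Map app-store star ratings to sentiment labels as described:
    1-2 -> negative, 3 -> neutral, 4-5 -> positive."""
    if star_rating <= 2:
        return "negative"
    if star_rating == 3:
        return "neutral"
    return "positive"

ratings = [5, 1, 3, 4, 2]
labels = [label_review(r) for r in ratings]
print(labels)  # ['positive', 'negative', 'neutral', 'positive', 'negative']
```

Filtering the dataset for records labeled "negative" yields the subset analyzed with LDA in the remainder of this section.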
Figure 3 gives the results of the LDA topic model showing the keywords that make up each topic and assigned weight within the topic.

Fig. 3. Screenshot of identified topics and keywords
Visualizing a document corpus is valuable for graphically illustrating and generating images from large datasets. The identified topics are visualized: the visualization for Topic 1 is presented in Figure 4, while Figure 5 gives the word cloud visualization of frequent tokens in the overall negative-review dataset.

Discussion
The significance of this work lies in extracting useful information from a large body of text: user reviews submitted on mobile digital distribution platforms, where the applications considered are mobile learning apps. User feedback on the different app distribution platforms contains inherent insights important to app developers and vendors. Analyzing these reviews will provide requirements for targeted and focused app improvement and modification. Learning app users value "videos" while engaging with content. Research has shown the importance of videos (multimedia) as an effective information delivery and educational tool during learning [35][36][37]. Videos allow for easy memory recall and knowledge retention, facilitate ubiquitous learning, have a wider reach and acceptance, and cater to different learners. Our research supports this claim, as the highest-occurring token in the reviews is "video," followed closely by "work" (not functional), "download," and "load." Some of the challenges identified by users include:
• small video size (aspect ratio) – "…The only issue with the app is that video does not go full screen. Unreasonable top big header remains and does not stretch edge to edge"; "…Wish full screen used all the screen space on edge-to-edge screens".
Going by the value derived from this integral component of learning apps, it is worthwhile for app developers to improve on it to create a rewarding learning experience. The topics identified in the reviews after analysis, modeling, and processing show different variables related to user satisfaction as analyzed using LDA. It is crucial to infer from and utilize the insights in the reviews given by current and past users of the apps; the information derived can help retain existing users and attract potential users once the negative issues are fixed.
Our analysis of negative reviews identified desirable features that aid user satisfaction while using mobile learning apps. These features vary along the lines of payments, experience, features, etc., and are consistent with the findings in [34], [38]. The topics are discussed under the following broader headings: financial, technical, and design issues.

Financial issues
The reviews identified under this topic speak to challenges and feedback on payment facilities offered on the apps, such as subscriptions and in-app purchases. Issues range from subscriptions paid for but not reflected, expensive in-app purchases of and payments for specific and introductory courses, card payment problems, exchange rates, and country- or location-specific payment restrictions. Users give feedback when payment is involved, especially in cases where disparities are observed; for every payment made, mobile app users expect to receive full value. Sample comments include:
• "…Charged my account for Premium membership after my trial period without giving any prior notices. I barely used the app and haven't taken any more classes since I first signed up…"
• "…I use this app to make easy and secure payments. But

Technical issues -non-functional features
App users gave feedback on new features, or modifications of existing features, that did not function as expected or were not suitable for their intended purpose. Typically, all new features should be tested and checked for best fit, suitability, and interoperability, with any issues raised and fixed at the point of testing. However, due to time, cost, and other constraints, some new features are not duly checked before being pushed out to the public, resulting in complaints on the distribution platforms. Users request that features on the apps be "fixed" via the reviews submitted; these requests explain how the features would aid or enhance user satisfaction, or which modifications or resets they require. Mobile app stakeholders harvest these fix requests to guide developers and app vendors in their evolution and maintenance practices.
Some other reviews are aspect- or feature-based, highlighting features that make the user experience less exciting or smooth. Here, users state their challenges with specific or general parts of the apps, which hinder full contentment with the apps' behavior. From these reviews, app stakeholders can identify the aspects to work on accordingly. Sample comments include:
• "…Cannot login with password via Android app. Website login works…"
• "I can't open subjects up after an update.. can't download a whole subject after an update…"
• "…The app is not working since last update. It simply opens, but any keys doesn't respond. Plz help me out…"

Design issues -media defects
In their reviews, users also comment on media-related aspects of the mobile learning apps, such as video, audio, resolution, size, and video orientation (landscape or portrait), based on device compatibility and aspect ratio. With this information, app owners can understand and identify the specific media issues users face. While some issues may be a function of the type of device on which users interact with the apps, others may be the responsibility of the app developers and require further attention and fixes. Sample comments include:

Conclusion
This work contributes to the area of information systems that seeks to improve user satisfaction. The results of this study are significant for learning app development and maintenance. End-users widely use mobile apps and give feedback in the form of reviews; hence the need to infer how to utilize this feedback for further improvements.
The developed model can be deployed to other domains with mobile app platforms and a high influx of user reviews, helping to identify and derive substantial insights that can positively influence app maintenance and evolution. Future research can examine the potential relationships between the emotions and tones of review content and the numerical user ratings, and determine the extent of the impact of these variables (design, cost, and technical issues) on continued usage and retention of the apps.

Authors
Senanu R. Okuboyejo is an Assistant Professor of MIS in the Department of Computer and Information Science, Covenant University, Nigeria. Her research interests include Health Informatics, Technology Adoption and Use in LDCs, and Applied Machine Learning. She has publications spanning these interests. Dr. Okuboyejo is a member of the Association of Information Systems (AIS).
Ooreofeoluwa I. Koyejo graduated with first-class bachelor's and master's degrees in Management Information Systems (MIS) from Covenant University. She is an information security practitioner currently working as the information security manager at a fast-growing startup company in Lagos, Nigeria, where she maintains and manages the Information Security Management System (ISO 27001 standard) within the organization.