Performing Usability Evaluation on Multi-Platform Based Application for Efficiency, Effectiveness and Satisfaction Enhancement

Abstract: Usability evaluation is a crucial activity that contributes to a higher standard of user experience. Without proper usability evaluation, the risk of delivering an unsatisfying product increases. This study conducted a usability evaluation of a multi-platform application according to the usability attributes defined in the ISO 9241-11 standard: efficiency, effectiveness, and satisfaction. The evaluation was conducted using the think-aloud protocol on both platforms, web and mobile, with fifty participants. The results showed the suitability of the think-aloud protocol for multi-platform usability evaluation and revealed several usability issues. Therefore, to further enhance users' satisfaction, efficiency, and effectiveness, a set of improvement recommendations was proposed. Insights from this study provide a standard basis for the usability evaluation procedure, which can guide the usability evaluation of multi-platform applications.


Introduction
Nowadays, when almost everything is digitized, technology plays a fundamental role in shaping the way people live. Technology not only advances to ease human life, but it can also promote humanity if correctly embraced. This study aims to perform a usability evaluation of a multi-platform application that runs on the Web and Android platforms in order to enhance its efficiency, effectiveness, and users' satisfaction. A multi-platform application refers to an application that runs on two or more platforms, each of which can send and receive updates [1]. Multi-platform development focuses on optimization by sharing the same code among the versions for the various platforms [2]. Among the benefits of multi-platform development are cost and time savings, while mobile devices are said to be more popular because they are more usable than others [3,4]. Multi-platform applications can span web and mobile, hybrid, or interpreted web applications [5,6,7].
The application was developed to support a pet boarding system in terms of reservation, package management, and reporting. The evaluation was conducted on both platforms, the web and the mobile application, by examining seven main tasks of different scenarios and collecting important attributes such as task completion (the number of people who completed the task), the time taken to complete each task, errors made while performing the task, and satisfaction ratings given by participants. A post-evaluation questionnaire was distributed, consisting of the required demographic information and application usability feedback based on the three main attributes: effectiveness, efficiency, and satisfaction. In this way, the usability level of the application can be identified, and hidden problems obstructing its usability potential can be revealed. Previous research on a pet application [8] developed a mobile application to assist users in finding a temporary pet adopter. That application supports location detection to find the nearest pet medical assistance using a map. It was described as easy to use and allows its users to leave ratings. A study [9] involved the development of a mobile app for a shelter to manage lost and found pets and promote the adoption of stray animals. Another study focusing on pet adoption, combined with a pet product business, was also conducted [10]. The study in [11] involved creating a system to carry out animal-related tasks more efficiently in a temporary animal shelter. However, none of these pet-related studies focused on evaluating the usability of an application implemented on multiple platforms. Therefore, this study aims to identify consumer demands on a multi-platform application in the context of pet boarding under different user categories, and to discover usability issues by validating the effectiveness, efficiency, and user satisfaction level of the application.

Literature Review
The ISO 9241-11 standard describes usability as an attribute of quality in use and as a result of perceived effectiveness, efficiency, and satisfaction [12]. The two most frequent usability characteristics are effectiveness (how reliably users can accomplish their objectives) and efficiency (how economically resources are used to accomplish that purpose) [13]. There is a strong correlation between effectiveness and efficiency: the simpler the interface is, the more resources can be devoted to other tasks, which leads to improved task performance. The study also noted a significant relationship between those two attributes and user satisfaction, the beneficial outcome of pleasant interactions. Assessing usability involves measuring the performance of user interaction with the system, which covers efficiency, effectiveness, and error rate [14]. Since the application is newly designed, the user satisfaction level is vital to how the application is perceived; the first impression can affect a user's decision to put it into regular use. Hence, usability evaluation is an effective way to measure the application's performance and to look for room for improvement. Furthermore, exploring similar studies on how usability assessment can be done helps in choosing a suitable method. In this case, usability testing is considered the most suitable approach, as it is a commonly used technique that has been shown to be effective in various usability evaluation studies and encourages feedback from the end-user.
Across usability evaluation studies, the overall view of usability differs, with each study bringing a perspective of its own. Usability is an attribute that influences the software system's consistency and should satisfy user requirements [15]. It complements user experience: usability focuses on the functional part, while user experience emphasizes the emotions stemming from aspects of the product, as explained in [16]. It considers user satisfaction as a quality indicator and aims to improve user experience. This relates to the work in [17], which used three attributes considered vital as a measure of how well a user can use software to complete a set of operations successfully: efficiency, effectiveness, and user satisfaction. Besides, a mobile application's usability challenges include limited connectivity in certain areas, small screen size, small storage capacity, and unconventional input mechanisms. Furthermore, [18] stated that usability should be approached from multiple points of view in order to become sensitized to elements that may affect system usage. On the other hand, usability is characterized as how successfully user concerns are addressed [19]. Hence, instead of adopting a standardized usability model, many studies include attributes considered important for the application being developed. For example, in the study in [20], usability is vital to increase application accuracy and reduce reaction time for several user activities. Overall, the various definitions of usability stated above indirectly describe the advantages it brings. To summarize, implementing usability evaluation has many benefits, such as reducing training effort and improving product quality and users' satisfaction.
In terms of the usability attributes to pay attention to, various usability evaluation studies implemented standardized usability models, as presented in Table 1. Other usability evaluation studies implemented only attributes vital in the application-specific context without following any specific usability model. There are also usability evaluation studies that combined selected usability models with attributes vital to the dedicated application.

Table 1
1. ISO 9241-11: The International Organization for Standardization came up with the ISO 9241-11 model, which consists of 3 attributes: a performance measure of completing tasks on time, efficiency in completing the task successfully, and satisfaction as users' acceptability.
2. Jakob Nielsen model: Consists of five attributes: learnability (easy to learn), efficiency (easy to finish a task or navigate), memorability (easy to remember or reestablish), errors (a low rate of errors), and satisfaction (pleasant for the user to use).
3. Factors in the journal article include compliance with features specified as ISO software operability, efficiency or latency, ease of learning as in understandable, memorability as easy to remember, correctness in dealing with system failure, and esthetics as attractive. [19]
4. Errors, Understandability, Accessibility: In addition to effectiveness, efficiency, and satisfaction, these additional factors were used to assess mobile applications and the visually impaired. [24]
5. Adjustability, Inability, Reliability and Satisfaction: Factors include adjustability, which refers to the degree of user acceptability of the platform used to learn; inability, as in a mobile learning platform whose concept pleased the users; reliability, which in a mobile learning platform means that the platform should not behave in unexpected ways; and satisfaction, as in the learner's satisfaction and the learner's assurance to perform tasks with any external activities via their mobile device. [21]
6. Effectiveness, Efficiency and Learnability: The researchers decided to exclude satisfaction as an attribute and absorbed learnability to ensure a more parsimonious framework. The learnability attribute refers to how convenient it is for users to learn a system. [25]
7. Effectiveness, Learnability, Efficiency, Memorability, Errors, Cognitive Load, Satisfaction, and Timeliness: A combination of the PACMAD usability model developed by Harrison and a single attribute deemed vital in mobile learning, timeliness, which refers to whether students can receive teachers' messages quickly. PACMAD attributes such as learnability, efficiency, memorability, errors, and satisfaction are absorbed from Jakob Nielsen's model, where all of them serve the same purpose. The cognitive load attribute relates to the number of perceptual abilities needed by the program when being utilized. [20]

Usability assessment
Research on usability evaluation shows that there is more than one way to conduct the assessment. Usability evaluation can be conducted using various methods such as hallway usability testing, a cheap method in which a group of random people is selected to do a set of tasks. Users interact with the system while thinking aloud. It is a commonly used technique that has supported numerous usability evaluation studies [16,26]. Expert review involves usability experts using a set of guidelines to measure essential usability criteria, while an automated expert review is a similar procedure carried out in an automated way [16]. Another assessment method is inspection, where users act as observers while experts do the evaluation and testing. Inspection can be done by heuristic evaluation, which applies usability principles and is cheap, or by a cognitive walkthrough, which is based on user interface evaluation by experts who consider end-user opinions [22,23]. Cognitive walkthrough is similar to heuristic evaluation in that experts perform it, but it does not necessarily require a whole system; a system prototype is adequate. A pluralistic walkthrough [27,28,29], in contrast, is a method in which a team investigates the system's paper prototype with a focus group of users to discuss system interface issues [26]. However, [16] highlighted non-face-to-face assessment by suggesting remote usability testing. This method separates the test conductor and the user and is categorized as synchronous, for example using remote application sharing software, or asynchronous, where data is collected by logging the user's activities. Furthermore, different data gathering techniques can be used to extract the outcomes of the evaluation, such as test monitoring, direct recording, or the think-aloud method, which requires the user to talk aloud while using the system or performing a particular task, as explained in [23].
On top of that, there are other methods of conducting usability testing, such as Metrics for Usability Standards in Computing (MUSiC), which was introduced in 1998 [30], the Software Usability Measurement Inventory (SUMI), and the Diagnostic Recorder for Usability Measurement (DRUM) [31], which was introduced in 1992.

Methodology
The usability evaluation was conducted on both platforms, web and mobile, measuring the efficiency, effectiveness, and satisfaction attributes outlined in ISO 9241-11. The usability assessment framework for a multi-platform application used in this study is presented in Figure 1.

Data collection
The methodology involved recruiting the test participants, preparing a test plan, and finalizing the test scenarios according to the set of tasks for each platform. The necessary technical preparation was made, which included device and site setup. Data was collected using a pre-testing questionnaire, a post-testing questionnaire, observation, and the think-aloud protocol with fifty participants. The pre-testing questionnaire was used to collect demographic data, whilst the post-testing questionnaire comprised 15 questions to identify the participants' satisfaction level. The questionnaire findings were analyzed objectively and interpreted as a mean satisfaction rating, while the outcomes of the think-aloud protocol were evaluated and categorized into the appropriate usability category as recommended in [12], in order to determine efficiency and effectiveness and provide relevant recommendations.

Usability evaluation session
The session was carried out at a pet boarder's premises with a total of 50 participants. The participants were randomly selected from the pet boarder's existing customers and non-customers, with ages ranging from 18 to 64. An initial demographic survey was conducted before proceeding with the evaluation tasks. Participants meeting the following criteria were selected: a pet boarder who owns and offers a boarding service, regular visitors to the pet store, customers of the pet boarder, and those who own pets. A set of tasks for different scenarios was identified for the evaluation. There were two types of scenarios, representing the pet boarder role (web-based) and the customer role (mobile-based). A total of 8 task scenarios were prepared for the pet boarder role and 7 for the customer role. All participants acted in different roles and accessed the application on both platforms with the assistance of two appointed facilitators. All evaluations used the researchers' laptops and mobile devices. Participants were allocated 4 minutes to accomplish each task scenario. The task scenarios for each role are portrayed in Table 2.

Table 2 (task number, pet boarder scenario on the web, customer scenario on mobile)

1
Pet boarder (web) and customer (mobile): The user clicks the signup button and is redirected to the registration page. The user inputs the username, name, email address, system role, password, and password confirmation, checks the terms and policy agreement checkbox, and finally clicks the Register button.

2
Pet boarder (web) and customer (mobile): The user inputs the username and password and clicks the sign in button.

3
Pet boarder (web): The user clicks the create announcement menu section, inputs the message, and clicks the create button.
Customer (mobile): The user clicks the announcement menu section and views the list of announcements.

4
Pet boarder (web): The user clicks the create notification menu section, inputs the message, and clicks the create button.
Customer (mobile): The user clicks the notification menu section, and all notifications posted are listed.

5
Pet boarder (web): The user clicks the create package menu section, inputs the package name, slots available, price, and reward points, uploads multiple pictures, and adds one addon service.
Customer (mobile): The user clicks the package menu section, where all packages are listed, chooses one package, clicks the book now button, inputs the pet name, selects the start date and end date, selects an available addon service checkbox, and clicks the create button.

6
Pet boarder (web): The user clicks the booking menu, where all booking requests made by all customers are listed, selects one booking request and clicks the update button, views the booking status information, selects approved from the dropdown list, and clicks the update button.
Customer (mobile): The user clicks the booking request menu section, and all booking requests made by the user are listed.

7
Pet boarder (web): The user clicks the report card menu, where all booking requests made by each customer are listed, chooses one booking request and clicks the create report button, inputs notes, uploads media files by clicking the add media icon or choose file, selects a picture, and clicks the insert button.
Customer (mobile): The user clicks the report card menu section, where all submitted report cards are displayed; the user can view the pet name and report date, click the view notes button, and click the view media files button.

8
Pet boarder (web): The user clicks the user menu, where all users are listed, selects one user, clicks the update button, scrolls down to the user status information, selects approved from the dropdown list, and clicks the update button.
Customer (mobile): N/A
For collecting numerical data, a survey was conducted as a second evaluation. Two sets of questionnaires were distributed, covering the respondents' background information and the end-users' review of the application. The usability questionnaire included the following items: 1) I think this application is easy to use, 2) It is easy to find the information needed, 3) I think this application has a pleasant interface, 4) The font utilized by the application is pleasant, 5) This application can speed up task completion, 6) This application provides error messages that inform how to solve the problems, 7) I find the steps to execute tasks in this application is simple, 8) This application provides clear and descriptive information, 9) I feel that it is easy to navigate through the pages, 10) This application has no broken link/menu/page, 11) The application provides all the functions that I expected it to have, 12) This application needs to be improved, 13) I have the intention to use this application shortly, 14) I will recommend this application to others, and 15) Overall, I am satisfied with this application.

Results and Discussion
The study was carried out by issuing questionnaires to the 50 participants after each respondent had done the set of tasks required for the usability evaluation. The purpose of choosing participants from different backgrounds is to have a broader perspective of the study.

Fig. 2. Distribution of respondents by profession
As shown in Figure 2, respondents were categorized according to their occupation status: private sector employee, government servant, self-employed, retiree, and student. Self-employed respondents accounted for 14% and retirees for 6%. The largest group was those who worked in the private sector, which may reflect the ability to own pets, since pet ownership requires cost and maintenance.
All participants (100%) completed tasks 1 to 4 on both platforms, representing sign up, sign in, and create/view announcement. Only 41 out of 50 users (82.0%) completed task 5 on the web (create a package by admin), with seven errors, while 90.0% completed the view/apply package task on mobile, with four errors. No issues were reported for task 6 on mobile (viewing the customer's booking requests), but three errors were identified on the web application in the approve booking request scenario, which had a 94% completion rate. Task 7, which focuses on updating the report card, encountered four errors and a 92% completion rate on the web, while the mobile application had a 100% completion rate with zero errors. The final task, approving user registration, is only available on the web; it triggered five errors and had an 86.0% completion rate. This is because the user's name did not appear in the list even after several page reloads.
Task completion time: Task completion time is a measure of the efficiency of the app. Table 3 displays the time taken by each user per task compared with the 4 minutes allowed for task completion. More time was spent on the web-based application than on the mobile application, owing to the more user-friendly look of the mobile application. Not all tasks could be completed within the allocated time: tasks 3, 5, 6, and 7 for the web-based application and task 5 for the mobile application. The highest completion time was spent on task 5, at 4.7 minutes, reflecting that the application lacks efficiency.
Success rate of task completion: Task completion, or success rate, is a measure of an application's effectiveness in use. Table 3 shows that only tasks 1 to 4 achieved a 100% completion rate on both platforms, while the rest scored 94% (47 participants) or below. Therefore, this application is not sufficiently effective. Age might be a factor in this result, as 3 participants were retirees aged above 55. Elderly participants took a long time to get familiar with the interface and understand the steps even though they are smartphone users. Table 4 summarizes details of participants' errors and feedback, classified according to the ISO 9241-11 usability attributes: efficiency, effectiveness, and satisfaction. A set of recommendations was proposed to promote the application's enhancement, leading to higher efficiency and effectiveness levels.
Participant feedback recorded in Table 4 includes:
• "There is no problem, but I cannot differentiate between what to post an announcement or notification."
Task 5 (Web)
• "I am moderately satisfied."
Task 5 (Mobile)
• "The interface is too technical," "simple."
• "The information provided on the menu section and each package is unclear and written poorly."
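The completion rates and the time allowance reported in this section come down to simple arithmetic, which the following minimal sketch illustrates. It is not the authors' analysis script: the `TaskOutcome` record layout and the function names are hypothetical, and only the figures stated in the text (50 participants, a 4-minute allowance, 41 and 47 completions, 4.7 minutes on task 5) are taken from the study.

```python
from dataclasses import dataclass

PARTICIPANTS = 50        # sample size reported in the study
TIME_LIMIT_MIN = 4.0     # minutes allocated per task scenario


@dataclass(frozen=True)
class TaskOutcome:
    """One task on one platform (hypothetical record layout)."""
    task: int
    platform: str
    completed: int   # participants who finished the task
    errors: int      # errors observed for the task


def completion_rate(outcome: TaskOutcome, participants: int = PARTICIPANTS) -> float:
    """Effectiveness: percentage of participants who completed the task."""
    return 100.0 * outcome.completed / participants


def over_time_limit(minutes: float, limit: float = TIME_LIMIT_MIN) -> bool:
    """Efficiency check: True when a task time exceeds the allowance."""
    return minutes > limit


# Counts stated in the text: 41 of 50 on task 5 (web), 47 of 50 on task 6 (web).
task5_web = TaskOutcome(task=5, platform="web", completed=41, errors=7)
task6_web = TaskOutcome(task=6, platform="web", completed=47, errors=3)

if __name__ == "__main__":
    for outcome in (task5_web, task6_web):
        print(outcome.task, outcome.platform, f"{completion_rate(outcome):.1f}%")
    # The 4.7-minute time reported for task 5 exceeds the 4-minute allowance.
    print(over_time_limit(4.7))
```

Keeping the completed count rather than the percentage in each record makes it easy to cross-check a reported rate against the sample size, as with the 94% figure that the text equates to 47 participants.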
Satisfaction: Besides evaluating the applications using the prepared task list, participants were required to complete a usability questionnaire using a five-point Likert scale, ranging from 1 (strongly disagree) to 5 (strongly agree). The questions evaluated the participants' satisfaction with the tasks performed earlier. Table 5 summarizes the findings.

Fig. 3. Overall Mean Analysis
Overall, most of the participants were satisfied with the application, with an overall mean score of 3.68. The agreement figure counts the agree and strongly agree responses. 70% of participants agreed that they were satisfied using the application on both platforms. The highest score was for ease of use, with 76% agreement, followed by the application's ability to speed up participants' tasks, with 74%, while the lowest score was for page navigability, with 40%. Due to several crucial errors during the evaluation, half of the respondents thought this application needs modification. Suggestions for modification have been identified in Table 4 to increase the users' satisfaction by further adopting the formal requirements technique [34]. The use of the think-aloud protocol was shown to help obtain a comprehensive understanding of usability issues [35] related to the interactions between the user and the multi-platform application, where 12 crucial errors from 8 assigned tasks were successfully identified.
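The two summary figures used for satisfaction, the mean score of an item and the percentage of participants who chose agree or strongly agree, can be computed as in the sketch below. This is an illustrative assumption of how such figures are typically derived from a five-point Likert scale, not the study's own procedure; the function names and the sample responses are hypothetical and are not data from the study.

```python
from statistics import mean
from typing import Sequence


def validate(responses: Sequence[int]) -> None:
    """Reject empty input and values outside the five-point scale."""
    if not responses:
        raise ValueError("at least one response is required")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("responses must be integers from 1 to 5")


def item_mean(responses: Sequence[int]) -> float:
    """Mean score of one questionnaire item."""
    validate(responses)
    return float(mean(responses))


def agreement_percentage(responses: Sequence[int]) -> float:
    """Share of responses that are agree (4) or strongly agree (5)."""
    validate(responses)
    agreed = sum(1 for r in responses if r >= 4)
    return 100.0 * agreed / len(responses)


# Hypothetical responses for a single item, for illustration only.
sample = [5, 4, 4, 3, 2]

if __name__ == "__main__":
    print(f"mean = {item_mean(sample):.2f}")
    print(f"agreement = {agreement_percentage(sample):.0f}%")
```

Reporting both figures is useful because a mean near the scale midpoint can hide a split between agreeing and disagreeing participants, which the agreement percentage makes visible.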

Conclusion
This study has explored the potential use of the think-aloud protocol in determining the usability of a multi-platform application in order to optimize its efficiency, effectiveness, and satisfaction. The proposed approach offers new insights in the usability evaluation context, particularly for mobile and multi-platform applications. The evaluation involved both platforms, web and mobile, and the users were required to evaluate the application on both platforms using a think-aloud protocol. This study measured the success rate of task completion, task completion time, task errors, and participants' feedback, together with a satisfaction assessment through a questionnaire, with the involvement of 50 participants. The results indicated that a 100% task completion rate was achieved only for tasks 1 to 4 on both platforms, and additionally for task 7 on the mobile application. Errors were discovered in the remaining tasks, and comments from participants were recorded. Therefore, this application seems to lack efficiency and effectiveness. However, satisfaction scored well, with an overall mean of 3.68. To further increase satisfaction, efficiency, and effectiveness, suggestions for improvement were made accordingly. It is believed that fixing the errors highlighted by participants will likely increase the satisfaction, efficiency, and effectiveness of the application on both platforms. This study's limitations are a biased sample that focuses on a specific pet boarder and the fact that user satisfaction based on human needs behavior was not recorded. Due to a severe need for useful tools [36], future studies are expected to produce comparable outcomes after specific improvements are made to the user interface based on the results of this study [37,38] and to present the model requirements for the reliable modeling of actual user behavior to extract usage patterns.

Authors
Nik Azlina Nik Ahmad is a lecturer at the Department of Software Engineering in Universiti Kuala Lumpur. She has experience working as a system analyst for a library system development. Her research interests lie in software testing, usability, user experience, requirements engineering, software maintenance, and mobile UX design. She is certified with Certified Professional for Requirements Engineering