A New Vision for Evaluating the Quality of E-Commerce Websites

Abstract: Quality has been established as a key factor in ensuring the success of E-commerce in attracting and retaining customers. To help in this, numerous software metrics and website quality models have been developed, with a correspondingly large literature. We provide a critical review of this literature and the state of the art. Most of the wide range of E-commerce website evaluation models emphasize the web applications of the system, using techniques such as feature inspection and collecting data about end users' opinions through questionnaires. However, this is in conflict with two fundamental pragmatic aspects of current websites. First, web technologies evolve extremely fast, enabling sophisticated tools to be deployed and complex interactions to take place. Second, the life cycle of a website is also extremely fast: maintenance of a website is performed at a rate that is higher than that of other software products because of market pressure and lack of distribution barriers.


I. INTRODUCTION
Over 45 scholarly models of website quality have appeared in the last 10 years. A small sample of those studies was tested on over 436,000 data points from 16,000 respondents. This indicates that the application and use of scholarly models of website quality is a very well-established discipline. However, many of these models have numerous factors and sub-factors, as well as unusually large measurement instruments that demand extra time for data collection and data analysis in each measurement phase, which makes them economically prohibitive to apply. Also, many of these models have not proven very robust, and exhibit low levels of reliability and validity. In this paper, we recommend a holistic model for E-commerce website evaluation, using Bayesian Belief Networks, as an alternative to the single-issue models used at present. This model differs from questionnaire-based survey approaches in that it uses a process aiming to limit the subjectivity and frequent errors found in such surveys, and provides a flexible way to define the quality of E-commerce websites, as users perceive it, in a short period of time.

II. THE IMPORTANCE OF QUALITY ON E-COMMERCE
The most experienced and successful E-commerce companies are beginning to realize that the key determinants of success or failure are not merely web presence or low price but the delivery of a high-quality website. Recent research shows that price and promotion are no longer the main draws for customers deciding on a purchase. More sophisticated online customers would rather pay a higher price to a provider with high-quality service [1].
According to reference [2], when consumers make a purchase across borders, they are concerned about whether they will receive quality services from a "foreign" E-commerce website. It concludes that attention to quality is of paramount importance for E-commerce success.
Quality has been established as a key factor in ensuring the success of E-commerce in attracting and retaining customers [3]. To this end, it is necessary to define what constitutes a high-quality E-commerce website and a methodology for evaluating the quality of E-commerce websites [4].

III. OVERVIEW OF WEBSITE EVALUATION
The common issues found in the literature relating to website evaluation are quality (e.g. [5], [6], [7]), Web design (e.g. [8], [9]), and usability (e.g. [10], [11], [12], [13]). Researchers have adopted the Web quality concept from the quality of product or service (e.g. [6], [14]). For example, reference [5] adopted Kano's Model of Quality as a theoretical framework to evaluate the quality of websites. This model separated product and service quality into three levels according to customer expectations: expected, normal, and exciting. The entry level, "expected", refers to the minimum level of qualities, properties, or attributes that must exist for the system to function. These expectations are also known as the dissatisfiers because by themselves they are unable to fully satisfy a customer. The next higher level, "normal" identifies the "wants" or the satisfiers because they are the ones that customers will specify as though from a list. They can either satisfy or dissatisfy the customer depending on their presence or absence. The highest level, "exciting", as described by Kano, identifies the "wow" level qualities, properties, or attributes. These are also known as the "delighters" or "exciters" because they go well beyond anything the customer might imagine and ask for. These researchers believe that quality in a product or service is not what the provider or seller put into it, but what the client or customer receives from it. Thus, a website should try to satisfy its customers' needs in order to ensure repeat visits from them, and gain their loyalty.
In regard to Web design, reference [15] provided an Objects/Actions Interface (OAI) model for website design. This encourages designers of a website to focus on analyzing the relationship between the task and the Web interface. Reference [12] looked at problems in Web design from the perspective of network analysis. They suggested that care must be taken when designing the homepage, which is the entrance to the website. The authors of reference [16] suggested five major categories that should be considered when designing a website for a business: page loading, business content, navigation efficiency, security, and marketing/consumer focus. They argued that page loading is the most important factor in website design. Reference [11] suggested shifting the focus of Web design evaluation from individual pages to aggregated collections based upon Web directories, domains, and the entire site.
Undertaking a usability study usually needs high consumer or user involvement, and sometimes the study needs to be conducted in an experimental environment. The author of [17] and [18] provided guidelines and criteria to evaluate the usability of website design and suggested that every design project, including website development, should be subjected to usability testing and other validation methods. Reference [19] also suggested that Web pages should be designed for usability and understanding. However, a website with good usability cannot guarantee users' preference [20].

IV. MEASURING AND ANALYZING E-COMMERCE QUALITY
The measurement of quality in information technologies has long been an issue of concern and has received a great deal of attention from academic researchers ([21] and [22]).
The authors of reference [23] established a simple classification for information systems, being either E-type or S-type. An S-type system is one that is completely and totally defined, and is required to be correct with respect to a mathematically defined specification. An E-type system, on the other hand, resolves to expectations of the system. An E-type system is correct when it satisfies the user expectations.
A classification of information systems in terms of their quality indicators divided quality into three perspectives: product, process and service [24]. Various studies related to the three perspectives have produced a number of measures for evaluating information systems such as E-commerce websites. These include system usage [25], information value [26] and user satisfaction [27].
The diversity of these various measures was initially a cause for concern, so [27] attempted to synthesize them into a unified model. The Model of "Information Services Success" in [27] has been regarded by many authors as a major contribution [10] and has been the focus of several studies (e.g. [28]). Reference [29] proposed a modification of this model to include a "Service Quality" component. This modification was endorsed by [30] together with other modifications integrated to the updated Information Services Success Model [30].
Some researchers have highlighted the problem of inadequate measures for assessing the benefits of investments in Information Technology [10]. Measuring the quality of information systems is considerably difficult, as is the search for appropriate metrics. Notwithstanding the difficulty in developing measures reported in the literature, there is still a need for an indicator of the success of a company's E-commerce website. One possible indicator is user satisfaction. Various sources have argued that measuring the satisfaction of users is useful as a surrogate indicator of information system quality. The use of user satisfaction for measuring quality is discussed in the next section.

V. USER SATISFACTION AS AN EFFECTIVE MEASURE
User satisfaction gradually became a measure of software quality during the 1950s, 1960s, and 1970s ([31]; [32]; [33]). User satisfaction is defined as "the sum of one's feelings or attitudes toward a variety of factors affecting that situation," e.g., computer use and adoption by end users [27].
Most studies until 1980 focused on the end user's satisfaction toward software developers; but one study squarely focused on the end user's satisfaction with the software itself [34]. Reference [35] produced one of the first studies to address a variety of software attributes such as software accuracy, timeliness, precision, reliability, currency, and flexibility.
Studies throughout the 1980s addressed user satisfaction with both designers and software ([36]; [27]). The late 1980s marked a turning point with studies focusing entirely on user satisfaction with the software itself and attributes such as content, ease of use, and timeliness of the software [37].
A study of user satisfaction at IBM was based on reliability, capability, usability, installability, maintainability, performance, and documentation factors. Throughout the 1990s, IBM used a family of user satisfaction models called UPRIMD, UPRIMDA, CUPRIMDA, and CUPRIMDSO, which referred variously to factors of capability, usability, performance, reliability, installability, maintainability, documentation, availability, service, and overall satisfaction [38].
User satisfaction, now commonly referred to as customer satisfaction, is no doubt related to earlier measures of software attributes, usability or user friendliness of software, and more recently, web quality. In E-commerce, interaction with the end user is conducted through web-based applications, including both server-side and client-side applications, commonly referred to as a website. All communication between user and system is realized through the interface, so it is self-evident that the quality of an E-commerce system is directly related to the quality of the user interaction experience [39].
Research efforts by [40] have directly tied the assessment of an E-commerce website to customer satisfaction. A survey carried out in [40] on 35 E-commerce companies in the United States identified three prominent methods for assessing quality. All three were actually assessments of the satisfaction of the customer.
The three major assessment methods are text comments, categorized rating and overall rating. Text comments allow customers to write, in 500 to 1000 characters, where and why they did their shopping. Categorized rating is achieved with a questionnaire that asks online shoppers to rate a number of quality determinants using a scale of 1 to N, where N is the best rating. The overall satisfaction rating uses an ordinal rating system with a scale of 1 to N, where N is the best rating. User satisfaction is a combination of experience and perception [41]. It has been shown that several factors can positively or negatively influence a user's experience and perception of a website [3].

VI. EVALUATING E-COMMERCE WEBSITES: A REVIEW OF EVALUATION CRITERIA
Website quality models, appearing in the late 1990s and following the user satisfaction movement, emerged as important measures of software quality [42]. One of the first models of website quality identified background, image size, sound file display, and celebrity endorsement as important factors of software quality [43]. The web assessment method, or WAM, quickly followed with quality factors of external bundling, generic services, customer-specific services, and emotional experience [44]. In what promised to be the most prominent web quality model, attitude toward the site had quality factors of informativeness and entertainment [45]. The next major model was the e-satisfaction model with its five factors of convenience, product offerings, product information, website design, and financial security. The website quality model, or WebQual, for business school portals was based on factors of ease of use, experience, information, and communication and integration. An adaptation of the service quality or ServQual model, WebQual 2.0 measured quality factors such as tangibles, reliability, responsiveness, assurance, and empathy [46].
Although some researchers have tried to provide ways of evaluating E-commerce websites specifically (e.g. [47]), the selection of evaluation criteria still requires more theoretical justification. A selection of evaluation criteria is shown in Table 1; each has its own strengths and weaknesses. Studies on E-commerce website quality also focus on more specific quality characteristics, such as issues that warrant successful transactions [48], maximize perceived trustworthiness [49], or ensure E-commerce website reliability [50].
Although all the above factors affect the quality of E-commerce websites and are prerequisites for their success, they are not the only ones that relate to E-commerce website quality. Reference [57] concluded, after a review of the literature, that there is no fully integrated approach. From these previous studies, it can be inferred that a global approach combining all factors affecting quality, such as the one discussed in this paper, is required.

A Critique of Current Approaches to Evaluating E-Commerce Websites
Early definitions of software quality included fitness for use, conformance to requirements, or degree to which software satisfied its specified requirements. These classical definitions of software quality imply one must gather customer requirements, develop a software product, and then determine how many quality requirements have been satisfied. Since the 1960s, increasingly sophisticated views of software quality have emerged: software size, software errors, software attributes, software defect models, software complexity, software reliability, user satisfaction, and website quality, to name a few. One of the earliest approaches for measuring software quality was the practice of quantifying and assessing attributes or characteristics of computer programs. Software attributes are traits, characteristics, features, or other properties of software products. Early studies attempted to enumerate, qualify, and quantify all of the attributes of software products. One such study [58] identified the following attributes: correctness, efficiency, flexibility, integrity, interoperability, maintainability, portability, reliability, reusability, testability, and usability.
Throughout the 1970s and 1980s the practice of measuring software attributes waned in favor of statistical models of software quality and reliability, which estimated defects and mean time to failure. However, during the 1990s, the practice of measuring software attributes began to take a foothold once again in the form of user satisfaction and website quality models. User satisfaction models were used to measure end user attitudes towards software products. One such model [53] measured user attitudes about the following attributes of software quality: usability, design, information, trust, and empathy.
Models of user satisfaction were eventually overtaken by models of website quality by the end of the 1990s. Basic website quality is defined as a "customer's judgment about the website's overall excellence or superiority, which is an attitude that comes from a comparison of expectations and perceived performance". Within the context of E-commerce, website quality refers to "the extent to which a website facilitates efficient and effective shopping, purchasing, and delivery of products and services".
Most of the tools that have been developed for the assessment of E-commerce websites emphasize the web applications of the system and are based on surveys [10]. This process provides significant results but demands extra time for data collection and data analysis in each measurement phase.
The work presented in this paper differs from questionnaire-based surveys in that it uses a process aiming to limit the subjectivity and frequent errors found in such surveys, and provides a flexible way to define the quality of E-commerce websites, as users perceive it, in a short period of time.

VII. PREDICTING E-COMMERCE QUALITY
Given that the establishment of an E-commerce website is mainly a software development effort, there are several standards that apply in governing the quality of such development. According to reference [59], there seems to be an almost overwhelming abundance of quality standards, which leads to a high level of cynicism and skepticism surrounding them and, eventually, to their lack of use. Website developers need to use standards and best practices to ensure that websites are functional, accessible and interoperable. However, many websites fail to achieve such goals, and no standard can directly predict the quality that a website under development is going to achieve.
The software behind any E-commerce website is, in essence, the virtual organization and business operation of that site. It is thus reasonable to conclude that the quality and evaluation methods of E-commerce systems will always be dependent on the quality of the applications they contain and their ability to meet end-user requirements.
Past approaches concerning the quality of E-commerce websites emphasized usability standards, using techniques such as feature inspection methods and collecting data about end users' opinions through questionnaires. These methods provide important feedback, and their results are useful background for future work. However, they do not contribute directly to a dynamic model that enables forecasting [60].
In this paper, a model is proposed in which the attributes are of a dynamic character. The results derived from the application of the proposed model are used to predict E-commerce website quality and to direct the development of a website so as to increase the quality measures, producing a site that gives an E-commerce experience with high service quality and user satisfaction. Furthermore, the results derived from its application are used for the model's constant improvement, thus contributing to its continuous evolution and upgrading.

VIII. MOTIVATION FOR APPLYING BAYESIAN BELIEF NETWORKS APPROACH
Having a metric for quality makes matters easier for a business, as it can then measure whether quality is being attained. The authors of reference [28] define quality as "a relative value that is meaningful only when compared to postulated values that are defined by the user or by standards organizations." Several researchers such as [61] and [58] have since proposed holistic quality models incorporating a wide array of measures, in order to define a quality system. According to reference [59], holistic models such as these often require substantial infrastructure in order to capture and analyze the data gathered. Consequently, many companies look for easier alternatives, such as a single measure of quality, as opposed to process-driven quality.
Reference [62] describes a Bayesian Belief Network (BBN) as a model that defines various events, the dependencies between them, and the conditional probabilities involved in those dependencies. The mathematical model on which Bayesian Belief Networks are based is the theorem developed by the mathematician and theologian, Thomas Bayes. The BBN is a special category of graphic models where nodes represent variables and the directed arrows represent the relations between them. Therefore, a BBN is a graphical network that describes the relations of probabilities between the variables [63]. This information can then be used to calculate the probabilities of various possible causes being the actual cause of an event.
A Bayesian network is used to model a domain containing uncertainty in some manner. The technology with which a system handles uncertain information is a crucial component of its overall performance. The technologies for modeling uncertainty include Bayesian probability, Dempster-Shafer theory, Fuzzy Logic, and Certainty Factor. Bayesian probability uses probability theory to manage uncertainty by explicitly representing the conditional dependencies between the different knowledge components. It offers a language and calculus for reasoning about beliefs in the presence of uncertainty. Prior probabilities are updated after new events are observed, producing posterior probabilities. By repeating this process, the implications of multiple sources of evidence can be calculated in a consistent way, and the uncertainties are treated explicitly to reach an objective conclusion. A Bayesian Belief Network provides an intuitive graphical visualization of the knowledge, including the interactions among the various sources of uncertainty.
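The repeated updating of priors into posteriors can be illustrated with a minimal sketch. The hypothesis names, the two observations, and all probability values below are hypothetical and are not taken from the model in this paper; the sketch also assumes that the observations are conditionally independent given the hypothesis.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(H | e) given a prior P(H) and a likelihood P(e | H)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical prior belief about the quality of a site.
prior = {"high": 0.5, "low": 0.5}

# Hypothetical likelihoods P(observation | quality) for two observations.
fast_page_load = {"high": 0.8, "low": 0.4}
positive_review = {"high": 0.6, "low": 0.2}

# The posterior after the first observation becomes the prior for the second.
after_first = bayes_update(prior, fast_page_load)
after_second = bayes_update(after_first, positive_review)

print(after_first)
print(after_second)
```

Each call multiplies the current belief by the likelihood of the new observation and renormalizes, so the order in which the two observations are processed does not change the final result.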
A framework for assessing the qualities of an E-commerce website is the essence of this paper. The question that now arises is: 'Can a Bayesian Belief Network be applied to anticipate the level of quality of the site and the factors behind that level of quality?' According to reference [64], in applying a Bayesian Belief Network, a single model can be used for both diagnostic and causal reasoning. That is, the same model can be used to reason from effects to causes and from causes to effects. This suggests that a Bayesian Belief Network could be used to systematically predict the qualities of an E-commerce website under development and to determine the reasons for the predicted quality.
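As a rough illustration of using one model in both directions, the sketch below defines a two-node network in which Usability is the single parent of Quality. The two-state simplification and all probability values are assumptions made for the example only.

```python
# Hypothetical two-node network: Usability -> Quality.
p_usability = {"Positive": 0.6, "Negative": 0.4}
p_quality_given_usability = {
    "Positive": {"Yes": 0.9, "No": 0.1},
    "Negative": {"Yes": 0.2, "No": 0.8},
}

def causal(usability_state):
    """Reason from cause to effect: P(Quality | Usability = usability_state)."""
    return dict(p_quality_given_usability[usability_state])

def diagnostic(quality_state):
    """Reason from effect to cause: P(Usability | Quality = quality_state)."""
    joint = {
        u: p_usability[u] * p_quality_given_usability[u][quality_state]
        for u in p_usability
    }
    total = sum(joint.values())
    return {u: p / total for u, p in joint.items()}

print(causal("Positive"))   # predicted quality for a usable site
print(diagnostic("No"))     # likely usability state behind a poor-quality verdict
```

The same two tables answer both questions: the causal query reads a row of the conditional table directly, while the diagnostic query inverts it with Bayes' theorem.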

IX. A PROTOTYPE BBN MODEL FOR E-COMMERCE WEBSITE
While there is insufficient space here to fully describe the development and execution of a BBN model, we have developed a prototype BBN to show the potential of BBNs and illustrate their useful properties. With this model, we should be able to show how assessments might be made.
The philosophy underlying the BBN model is the creation of a dynamic network that concentrates and exploits the knowledge gained from the analysis of data gathered during previous research and that can also use its own results for future estimations. A graphical presentation of the network is illustrated in Figure 1.
The model uses nodes to represent the quality factors, characteristics and sub-characteristics of E-commerce websites. Each node is characterized by a set of possible states, called evidence, and is connected to its parent nodes by directed arrows. In Figure 1, the node 'Quality' represents the E-commerce website quality as a whole and is characterized by three possible states (evidence): 'Yes', 'Perhaps', and 'No'. The parent nodes of 'Quality' are the nodes 'Conceptual Reliability', 'Usability', and 'Representative Reliability'. These quality factors are characterized by three possible states: 'Positive', 'Neutral', and 'Negative'. Each quality factor node is connected to the corresponding E-commerce website quality characteristics, based on our previous research ([65] and [13]). Finally, each of these quality characteristics is connected to a number of child nodes comprising the quality sub-characteristics of E-commerce websites.
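A partial sketch of this structure as a data structure is given below. It lists only the node relationships named in the text of this paper (the full network is in Figure 1), and the dictionary layout and helper function are illustrative assumptions rather than the representation used by the tool.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each node maps to its parent nodes (the nodes with arrows pointing into it).
# Only relationships mentioned in the text are listed; this is not the full model.
parents = {
    "Quality": {"Conceptual Reliability", "Usability", "Representative Reliability"},
    "Usability": {"Navigability", "Efficiency"},
    "Conceptual Reliability": {"Accuracy", "Security"},
}

# States per node type, as described in the text.
QUALITY_STATES = ("Yes", "Perhaps", "No")
FACTOR_STATES = ("Positive", "Neutral", "Negative")

def ancestors(node):
    """Collect every node that can influence `node` through directed arrows."""
    found = set()
    stack = list(parents.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, ()))
    return found

# A BBN must be acyclic; static_order() raises graphlib.CycleError otherwise.
# Parents always appear before their children in the resulting order.
order = list(TopologicalSorter(parents).static_order())

print(order)
print(sorted(ancestors("Quality")))
```

A topological order of this kind is the order in which probabilities can be propagated from the observable nodes towards the target node.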
The tool computes the component availability based on the information found in the Node Probability Table. This is a table for each node that holds all the possible combinations of this availability number, which can be interpreted as the initial value for the component. These availabilities can be updated as additional evidence is gathered so that the tool can re-compute the overall component availability based on new data.
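To give a sense of what a Node Probability Table holds, the following sketch builds one for the 'Quality' node from its three parents. The uniform starting values and the single expert-supplied row are hypothetical; the function names are introduced here only for illustration.

```python
from itertools import product

FACTOR_STATES = ("Positive", "Neutral", "Negative")
QUALITY_STATES = ("Yes", "Perhaps", "No")
QUALITY_PARENTS = ("Conceptual Reliability", "Usability", "Representative Reliability")

def uniform_table(parent_names, parent_states, child_states):
    """Build a table with one uniform row per combination of parent states."""
    row = {state: 1.0 / len(child_states) for state in child_states}
    combos = product(parent_states, repeat=len(parent_names))
    return {combo: dict(row) for combo in combos}

def is_valid(table, tolerance=1e-9):
    """Every row is a probability distribution, so it must sum to 1."""
    return all(abs(sum(row.values()) - 1.0) <= tolerance for row in table.values())

quality_table = uniform_table(QUALITY_PARENTS, FACTOR_STATES, QUALITY_STATES)

# Replace one row with a hypothetical expert judgement.
quality_table[("Positive", "Positive", "Positive")] = {
    "Yes": 0.90, "Perhaps": 0.08, "No": 0.02,
}

print(len(quality_table))       # number of parent-state combinations
print(is_valid(quality_table))
```

With three parents of three states each, the table has 27 rows, which shows how quickly the elicitation effort grows as parents are added to a node.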
In the Hugin tool, the Node List is used to enter evidence and retrieve beliefs. By doing this, the model helps in the forward-looking assessment of the probability that a website will be considered to be a quality site with a considerable level of confidence, and it traces backwards, looking for causes for the quality level the website is currently at.
Further, the Hugin tool provides two types of propagation: Sum and Max. Sum normal propagation is the most commonly used method; it updates all probabilities, distribution functions, and expected utilities of the discrete chance nodes according to the entered evidence. In Max normal propagation, the tool searches for the states in the network that belong to the most probable configuration of all nodes in the network. If a state of a node belongs to the most probable configuration, it is given the value 100. Every other state is given the probability of the most probable configuration in which it is found, relative to that of the overall most probable configuration. When this propagation was run on the default settings of the network, the result was 100% for the "Perhaps" state of Quality (Figure 2).
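The difference between the two propagation types can be sketched by brute-force enumeration on a tiny network. This is not how the Hugin tool computes its results internally; the two-node network and its probabilities are hypothetical and serve only to show what the two kinds of output mean.

```python
from itertools import product

# Hypothetical two-node network: Usability -> Quality.
p_usability = {"Positive": 0.6, "Negative": 0.4}
p_quality_given_usability = {
    "Positive": {"Yes": 0.6, "Perhaps": 0.3, "No": 0.1},
    "Negative": {"Yes": 0.1, "Perhaps": 0.3, "No": 0.6},
}
QUALITY_STATES = ("Yes", "Perhaps", "No")

# Joint probability of every full configuration (usability, quality).
joint = {
    (u, q): p_usability[u] * p_quality_given_usability[u][q]
    for u, q in product(p_usability, QUALITY_STATES)
}

def sum_propagation(position, states):
    """Marginal probability of each state: sum over all configurations."""
    return {
        s: sum(p for cfg, p in joint.items() if cfg[position] == s)
        for s in states
    }

def max_propagation(position, states):
    """Best configuration containing each state, scaled so the overall best is 100."""
    best = max(joint.values())
    return {
        s: 100.0 * max(p for cfg, p in joint.items() if cfg[position] == s) / best
        for s in states
    }

quality_sum = sum_propagation(1, QUALITY_STATES)
quality_max = max_propagation(1, QUALITY_STATES)

print(quality_sum)
print(quality_max)
```

In this example 'Perhaps' and 'No' have the same marginal probability under Sum propagation but different values under Max propagation, because Max propagation looks only at the single best configuration that contains each state.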

X. BBN MODEL USES
It is important to realize that any model is a simplification of reality. Therefore, the output of a BBN is also a simplification of reality. The BBN model was designed to produce output that helps developers determine the quality of a website and the factors that caused the website to reach its current state. The expected benefit and application of the BBN model is to support the decision-making process regarding the next development steps appropriate for the website.
The output of a BBN consists of prior probabilities for each state in each variable (quality factor). The idea is that a user enters probabilities for some of the variables, for instance P(Content Adequacy)=1.0. This information is then used together with the quantitative specification of the network to re-calculate all the other probabilities. Furthermore, probabilities other than 1.0 can be entered, so the user is able to enter information that is uncertain. Though the output of the network in itself is quantitative, the user can use this output to make qualitative statements based on the quantitative output.
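The difference between entering certain and uncertain information can be sketched as follows. The sketch treats Content Adequacy as a direct parent of Quality and applies Jeffrey's rule to uncertain input; both choices, the two-state simplification, and all numbers are assumptions for illustration, and the tool may handle uncertain input differently (for example, as likelihood evidence).

```python
# Hypothetical conditional table: P(Quality | Content Adequacy).
p_quality_given_content = {
    "Positive": {"Yes": 0.8, "No": 0.2},
    "Negative": {"Yes": 0.3, "No": 0.7},
}
QUALITY_STATES = ("Yes", "No")

def enter_evidence(belief_in_content):
    """Recompute P(Quality) from the entered belief about Content Adequacy.

    Uses Jeffrey's rule: P'(Q) = sum over s of belief(s) * P(Q | Content = s).
    """
    return {
        q: sum(
            belief_in_content[s] * p_quality_given_content[s][q]
            for s in belief_in_content
        )
        for q in QUALITY_STATES
    }

# Certain information: P(Content Adequacy = Positive) = 1.0.
hard = enter_evidence({"Positive": 1.0, "Negative": 0.0})

# Uncertain information: the user is only 70% sure the content is adequate.
soft = enter_evidence({"Positive": 0.7, "Negative": 0.3})

print(hard)
print(soft)
```

With certain input the result is simply the matching row of the table; with uncertain input the rows are blended in proportion to the entered belief.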
Sometimes the output of a BBN contradicts what is expected from the given input. Contradicting output can always be traced back to either errors in the BBN, lack of input for the BBN, unrealistic input, confusion about terminology in the BBN or a mistake by the user.
In other cases the BBN will give neutral output i.e. the probabilities for each state in a certain variable are more or less equal. The cause is attributed to the lack of information in the BBN to favor any of the states or that the variable has no incoming arrows.
If the output is correct, the structure of the BBN can be used to find a proper argument for the probabilities of the variables. If, for instance, the BBN gives "unsatisfactory" quality for the website due to usability issues, the predecessors of Usability in the BBN, and their predecessors, can be examined to find out why the usability is at a low level. This analysis may also suggest solutions for problems. For example, a low level of Usability can be traced back to Navigability and Efficiency; any solution for the low level of Usability will have to address the low levels of Navigability and Efficiency.
Though the ways in which a BBN can be used are unlimited, four types of usage strategies for the BBN model have been identified:

A. Quality attributes prediction
In this type of use, as much information as possible is collected and entered into the BBN model, which can then calculate all the variables that have not been entered. This can give an impression of the quality level and reveal problems, if any, in the website. For example, given the information in Table 2 about some observable nodes, the user can enter this information into the BBN model to predict the quality level and reveal problems, if any, in the website.
According to the BBN model, the results forecast, with 100% probability, that the site would lack Conceptual Reliability, together with a 99.5% probability of a high level of Usability and an "unsatisfactory" Quality for the website.

B. Diagnostic Use
One of the possible uses of the BBN model is as a diagnostic tool. When using the BBN model in this way, the user is trying to find possible causes for problems. For example, Figure 4 shows that Conceptual Reliability is at a low level. Using the BBN model as a diagnostic tool, the user can find that Accuracy and Security are the causes of this problem: moving Accuracy and Security to a positive state promotes Conceptual Reliability to a positive state, as highlighted in Figure 4.

C. Impact Analysis
Another way to use the BBN model is to evaluate the consequences of future changes in the observable nodes on the intermediate nodes as well as on the target node (Quality). To do so, the potential future states of the observable nodes are entered. The BBN model then calculates the states of the intermediate nodes and of the target node (Quality) that are likely to result from such changes in the observable nodes. In Figure 5, the user can investigate what will happen if the state of Involved Capacity changes from Neutral to Positive.
According to the BBN model, the results provide a prediction with 100% probability of high level of Conceptual Reliability, high level of Usability, and a satisfactory Quality level.

D. Quality Attribute Fulfillment
The BBN model can be used to give ratings and prioritized rankings of features, which can be used to determine development priorities before coding begins. This can be done by entering beliefs about intermediate nodes into the BBN model. The probabilities for all the observable nodes are then calculated. If, for instance, the E-commerce website has to be highly usable, the BBN model will probably give a high probability for Navigability, Maintainability, Efficiency, and User-Friendliness, as shown in Figure 6. This information can help in cases where features that were under serious consideration are ranked near the bottom of the priority list; these can be removed from consideration, thereby saving valuable development resources. Based on these probabilities, the design team should give these quality factors more serious consideration during the development of an E-commerce website in order to produce a highly usable E-commerce site.
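A minimal sketch of how such a ranking might be derived is shown below. It enters the belief that Usability is Positive and computes the posterior probability of each sub-characteristic by enumeration. The weights, the shared prior, the two-state simplification, and the weighted-sum form of the conditional table are all hypothetical and are not taken from the prototype model.

```python
from itertools import product

# Hypothetical weights: how strongly each sub-characteristic drives Usability.
weights = {
    "Navigability": 0.4,
    "Efficiency": 0.3,
    "User-Friendliness": 0.2,
    "Maintainability": 0.1,
}
PRIOR_POSITIVE = 0.5  # assumed prior that any sub-characteristic is Positive

def p_usability_positive(config):
    """P(Usability = Positive | parents): sum of the weights of the Positive parents."""
    return sum(weights[name] for name, positive in config.items() if positive)

def posterior_given_usable():
    """P(sub-characteristic = Positive | Usability = Positive) for every parent."""
    names = list(weights)
    evidence = 0.0
    joint_positive = dict.fromkeys(names, 0.0)
    for values in product((True, False), repeat=len(names)):
        config = dict(zip(names, values))
        prior = 1.0
        for positive in values:
            prior *= PRIOR_POSITIVE if positive else 1.0 - PRIOR_POSITIVE
        p = prior * p_usability_positive(config)
        evidence += p
        for name in names:
            if config[name]:
                joint_positive[name] += p
    return {name: joint_positive[name] / evidence for name in names}

posterior = posterior_given_usable()
ranking = sorted(posterior.items(), key=lambda item: item[1], reverse=True)

for name, probability in ranking:
    print(f"{name}: {probability:.2f}")
```

Sub-characteristics near the top of the ranking are those that the required usability level depends on most under the assumed weights, and so are candidates for the highest development priority.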
The four usage strategies can be used in combination with each other. A quality attribute prediction usage of the BBN model can, for instance, reveal problems (making it a diagnostic usage). This may be the starting point to do impact analysis for solutions for the detected problems. Alternatively, if there are a lot of problems, the quality attribute fulfillment strategy may be used to see how much the ideal quality level deviates from the actual level.

XI. CONCLUSIONS
Much of the published empirical work in the E-commerce website evaluation area is well in advance of the unfounded rhetoric sadly typical of much of what passes for software engineering research. However, every discipline must learn as much, if not more, from its failures as from its successes. In this spirit, we have reviewed the literature critically with a view to better understanding past failures and outlining possible avenues for future success. Our critical review of the state of the art of models for E-commerce website evaluation has shown that most of the tools that have been developed for the assessment of E-commerce websites emphasize the web applications of the system and are based on surveys. This process provides significant results but demands extra time for data collection and data analysis in each measurement phase.
In this paper, we recommend a holistic model for E-commerce website evaluation, using Bayesian Belief Networks, as an alternative to the single-issue models used at present. This model differs from questionnaire-based survey approaches in that it uses a process aiming to limit the subjectivity and frequent errors found in such surveys, and provides a flexible way to define the quality of E-commerce websites, as users perceive it, in a short period of time.