A New Analytical Approach on Continuous Improvement Evaluation: A National Presentation Case

This paper demonstrates a new approach to analyzing evaluation data in order to find a team’s (sometimes called a circle’s) improvement ability and the characteristics of evaluators, using data from the national Quality Control Circle (QCC) presentations in Taiwan. Most evaluations of QCC presentations use the itemized rating method or the duplex-pole evaluation method. The former has the negative effect that higher and lower ratings cancel each other out, while the latter cannot distinguish the ability type of a team. This study takes the average of the highest rating, “A”, as an index of superiority. Based on these indices and the K-means method, similar ability groups and evaluator groups are generated, respectively. According to a team’s improvement ability level, which is determined by the similar group to which it belongs, a proper regenerated input can be applied to sustain continuous improvement activity. Also, knowing the characteristics of evaluators, proper solutions can be applied to improve the credibility of the evaluation. This paper also shows that some of the evaluation data can provide valuable information through further study and thus increase “evaluation use”.


Introduction
In the era of globalization and fast changing economies, Continuous Improvement (CI) is thought to play an important role in maintaining a company's competitiveness. The ultimate goal of CI is to create an environment conducive to learning and growth through company-wide involvement in gradual improvement on process performance and innovation. CI generally takes account of the activities performed under the names of Statistical Quality Control (SQC), Quality Control Circle (QCC), Quality Improvement Team (QIT), Project Management (PM), 6 Sigma, etc.

The problem with the continuity of CI activity
A company's overall capability can be improved through persistent CI activity. A CI activity will be sustained provided it produces good results. However, maintaining the original performance is difficult, not to mention creating further effective results. Imai (1986) described the problem as, "All systems are destined to deteriorate once they have been established unless continuing efforts are made to maintain it and then to improve on it." In other words, performance may diminish even upon repetition of the original activity.
In general, there are three key components in a CI program: theme/problem, model and tool, and promotion (Wu & Chen, 2006). These form the three bases of a pyramid structure. With solid bases and strong linkages among them, a company's capability in improvement and innovation can be built. Unfortunately, most companies focus on promotion as a way to boost employees' involvement, or on improving a few factors such as training, communication, or management involvement and commitment (Coronado and Antony, 2002). Nevertheless, increasing the participation rate or improving a few factors in the absence of a solid structure or overall capability cannot guarantee good results. Wu and Chen (2005) analyzed the performance of the QCC program run by Uni-President Enterprises Corporation¹. They found that the correlation coefficient between participation rates and monetary benefits was very low.
This indicates that an increase in the participation rate cannot assure the increase in financial benefit. On the other hand, this could also imply that the participation rate in the company has reached a level which cannot be improved anymore. In this case, the financial benefit can be generated only if the company's capability is upgraded.
The other possible reason why a CI activity cannot be sustained is that there is no complete roadmap for moving forward. A company may not have a clear idea about what its next step should be and how to get there. Senge (1990) proposed the concept of the learning organization. However, a learning organization cannot be established in one day. Bessant et al. (2001), by observing many companies' behaviors, summarized five evolutionary CI levels: from no improvement activity (Level 0) to trying out the idea (Level 1), structural and systematic CI (Level 2), strategic CI (Level 3), autonomous CI (Level 4), and ultimately becoming a learning organization (Level 5).
¹ Uni-President Enterprises Corporation is the oldest and biggest company in the food industry in Taiwan. It has run QCC programs for many years with good results.

Possible solution for CI continuity
Although the evolutionary CI model can be treated as a roadmap for companies to move toward a learning organization, it is not easy for management to implement it so as to upgrade a firm's CI level. Wu and Chen (2006) proposed an integrated [...] Based on the progress of the associated improvement cases, Wu and Chen (2006) classified improvement ability into four levels: Level 1, the case cannot be completed; Level 2, the case is completed but not selected for presentation; Level 3, the case is selected but is classified in the general class, that is, possessing basic improvement ability; Level 4, the case is selected and classified in the process class, that is, possessing the ability to solve the root cause or to innovate. For example, moving from Level 3 to Level 4, a company requires the basic ability of solving problems, while from Level 4 to Level 5, a company should possess the process and innovation ability. Only if equipped with a certain ability can a company move up to a higher improvement level.

The objectives of this study
For a company to upgrade its capability, it needs to know where it stands, that is, what ability it currently possesses. This information can be found from the evaluation results of its improvement cases. In general, an improvement case in a CI activity will go through several evaluations including self-assessment, field evaluation (site-examination), and presentation evaluation. It may also be selected to participate in a national presentation by the company.
One of the objectives of this paper is to introduce an analytical method which, by analyzing the national evaluation data, is able to find a participating team's improvement ability. Based on the result, a company can apply a workable regenerating input to its system, so that its improvement activity can be sustained.

hal-00257116, version 1 -18 Feb 2008
Undoubtedly, the characteristics and qualifications of evaluators affect the fairness of the evaluation results. Thus, using the same data and analytical method, this paper also analyzes the evaluators' characteristics. From the results, the shortcomings of the evaluation system and the weaknesses of the evaluators may be found. Solutions can be applied to make the evaluation more fair and trustworthy.
Moreover, one of the core subjects in evaluation over the past 10 years has been evaluation use (Patton, 1997; Weiss, 1998; Henry & Mark, 2003; Ginsburg & Rheit, 2003). The evaluation data used in this study is of little further use to the sponsor once the presentation activity is over. This study demonstrates that some of the evaluation materials can provide valuable information through further study, which increases the use of the evaluation.

Evaluation of the national presentation in Taiwan
CI has a long history in Taiwan. QCC, SQC, CWQC (Company-wide Quality Control), etc. were developed at the core of CI. For example, Philips (Taiwan) won the Japanese Deming award in 1991 because it achieved excellent performance through SQC activities. The Association of Pioneer Quality Control Research ("the Association") has sponsored the QCC national presentation in Taiwan for many years. The rank of the teams was determined based on the evaluation results. Once the ranks were finalized, the evaluation data had served its purpose.
The evaluation form includes 16 items (Table I). These are developed from the steps, or categories, of the QC-STORY model; every step contains 2 to 4 items. There are 5 grades for each item: A, B, C, D, and E. The meaning of each grade and the score associated with it are shown in Table II. In the first five years, the Association adopted an itemized rating method, that is, it decided the placing of each team based on its total score, generated by summing up the individual scores of all items. This system caused two complications: 1) participants whose scores were slightly behind others doubted the evaluation results; 2) the results could hardly be objective because each evaluator has his or her own subjective view. For example, the evaluations of the same event by a technician and by a manager can differ. The differences may cancel each other out, and thus the strength of a team cannot be recognized. Also, according to Ieta et al. (2004), concatenated sets of grades from scales not belonging to the same categories may bring about errors of rank and absurd averaging.
To cope with this problem, the Association then adopted the duplex-pole evaluation method (Tsong et al., 1983). First, the standard of different ranks was set up. This standard, instead of scores, only considers the percentage of each level of grade that a team obtained from evaluators (Table III). The concept for the duplex-pole method is that the teams that have been given many "D"s and "E"s will not be included in the excellent or outstanding group, and the teams that have been given adequate "A"s and "B"s will be included.
The procedure of the duplex-pole evaluation is as follows. While listening to a presentation, each evaluator judges whether a certain item is outstanding. If so, the evaluator marks "V" in column A of the evaluation form for that item. If the evaluator feels that an item is valuable or better than in other teams' presentations, "V" is marked in column B; if an item is incomplete or relatively poor, "V" is marked in column D; if an item is relatively inferior, "V" is marked in column E. Items left blank, which are considered barely fair, are marked with "V" in column C. The number of marks at each grade level is then summed up and compared with the standard set by the Association to determine the final rank of each team.
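The tallying step of the duplex-pole procedure can be illustrated with a minimal Python sketch. It counts the share of each grade among the marks a team received and checks it against a standard. The function names, the sample marks, and the thresholds min_top and max_bottom are hypothetical placeholders for illustration only; the actual standard is the one given in Table III.

```python
from collections import Counter

GRADES = "ABCDE"

def grade_shares(marks):
    """Return the share of each grade among all marks a team received.

    `marks` is a flat list of grade letters, one per (evaluator, item) pair.
    """
    counts = Counter(marks)
    total = len(marks)
    return {g: counts.get(g, 0) / total for g in GRADES}

def meets_standard(shares, min_top, max_bottom):
    """Duplex-pole style check: enough A/B marks and few enough D/E marks.

    min_top and max_bottom are hypothetical thresholds, not Table III values.
    """
    top = shares["A"] + shares["B"]
    bottom = shares["D"] + shares["E"]
    return top >= min_top and bottom <= max_bottom

marks = list("AABBBCCCCD")  # made-up marks for one team
shares = grade_shares(marks)
print(shares)
print(meets_standard(shares, min_top=0.4, max_bottom=0.2))
```

Because only the shares of the grades enter the check, a small difference in total score between two teams does not change their rank, which is the point of the method.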

The superiority index: the average of "A"s
Basically, the participating teams are good teams since they were selected previously. However, people want to know their outstanding fields that peers can learn from. If the presentation results can only tell their scores or ranks, people still cannot know their ability types. For example, if one team gets 5 "A"s and 5 "C"s, while the other gets 10 "B"s, they may get the same total score or same rank. Yet, we know the first team has 5 outstanding items and the latter has none.
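The example of the two teams can be checked with a few lines of Python. The score scale below (A=5 down to E=1) is an assumption made only for this illustration, since the actual scores are those of Table II; the helper names are likewise hypothetical.

```python
# Hypothetical score scale; the actual values are those of Table II.
SCORE = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

def total_score(grades):
    """Itemized rating: sum the scores of all grades a team received."""
    return sum(SCORE[g] for g in grades)

def count_a(grades):
    """Number of items on which the team was rated outstanding."""
    return sum(1 for g in grades if g == "A")

team_1 = list("AAAAACCCCC")  # 5 "A"s and 5 "C"s
team_2 = list("BBBBBBBBBB")  # 10 "B"s

print(total_score(team_1), total_score(team_2))  # 40 40
print(count_a(team_1), count_a(team_2))          # 5 0
```

Under this assumed scale the totals are identical, while the count of "A"s separates the two teams.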
According to the Association's concept, an evaluator gives "A" to the team that impressed him or her most; obtaining an "A" therefore indicates that the team is outstanding in that item. Since one of the goals of this paper is to find a team's improvement level and ability type from the evaluation outcomes, this paper proposes a new method which takes the average of "A"s as an index of superiority for an item. For example, if 70% of the evaluators give "A" to a team on Item #1, then this team gets 0.7 as its superiority index on Item #1. These indices will be used for cluster analysis to generate similar groups (or clusters).
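A minimal Python sketch of the superiority index follows. It assumes that a team's data is a list of grade lists, one list per evaluator and one grade per item; this layout and the function name are illustrative assumptions, not the format of the Association's records.

```python
def superiority_index(grades_by_evaluator):
    """Share of evaluators who gave "A" on each item, for one team.

    grades_by_evaluator: list of per-evaluator grade lists (one grade per item).
    """
    n_evaluators = len(grades_by_evaluator)
    n_items = len(grades_by_evaluator[0])
    return [
        sum(1 for grades in grades_by_evaluator if grades[i] == "A") / n_evaluators
        for i in range(n_items)
    ]

# Made-up data: 10 evaluators, 3 items; 7 evaluators give "A" on the first item.
team = [["A", "B", "C"]] * 7 + [["B", "A", "C"]] * 3
print(superiority_index(team))  # [0.7, 0.3, 0.0]
```

Each team is thus described by one index per item (16 in this study), and these vectors are the input to the cluster analysis.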

Cluster analyses
The Euclidean nearest-centroid distance method, or K-means method, was used for clustering. This is the most popular clustering tool used in scientific and industrial applications (Berkhin, 2002, p. 39). The main advantages of this method are that it is easily understood and implemented, and that its foundation in the analysis of variance is firm. Its shortcomings are that there is no concrete method for choosing the number of clusters (K), the computed local optimum may be quite different from the global one, and the process is sensitive to outliers (Peck, 2005; Berkhin, 2002). The procedure of the K-means method is briefly described as follows (Chang, 1993):
(1) Arbitrarily divide the data into K groups (in this study, K=10 is predetermined) and calculate the centroid, or weighted average, of each group.
(2) Calculate the distance from each data point to the 10 centroids, and relocate each data point to the group whose centroid is closest to it. For example, with two groups A and B, let d1 and d2 be the distances from a data point C(x,y) to the centroids of A and B, respectively. If d1 < d2, then C(x,y) is classified into group A; if d1 > d2, it is classified into group B.
(3) Recalculate the centroid of each group. (The Centroid Method in SPSS was used to calculate centroids in this study.)
(4) Repeat steps (2) and (3) until no data point needs to be relocated.
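The four steps can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the SPSS routine used in the study: the initial split is a simple round-robin assignment, ties go to the lower-numbered group, an empty group keeps its previous centroid, and the function names and sample points are made up.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def k_means(points, k, max_iter=100):
    # Step (1): arbitrary initial split into k groups (round-robin here).
    labels = [i % k for i in range(len(points))]
    centroids = [None] * k
    for _ in range(max_iter):
        # Steps (1)/(3): compute the centroid of each non-empty group;
        # an empty group keeps its previous centroid.
        for g in range(k):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:
                centroids[g] = centroid(members)
        # Step (2): relocate each point to the group with the closest centroid.
        new_labels = [
            min(
                (g for g in range(k) if centroids[g] is not None),
                key=lambda g: math.dist(p, centroids[g]),
            )
            for p in points
        ]
        # Step (4): stop when no point needs to be relocated.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Made-up two-dimensional data; the study uses 16-dimensional index vectors.
points = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 0.8), (0.1, 0.2)]
labels, centroids = k_means(points, k=2)
print(labels)  # [0, 0, 1, 1, 0]
```

Since the result depends on the initial split, a different starting assignment can converge to a different local optimum, which is the shortcoming noted above.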
To ensure that the classification is proper, the variances of data within each group were calculated and the F-value of the variances between 2 groups was derived. All F-values are significant. This confirms that the classification is appropriate. Finally, radar charts were drawn based on the cluster mean in each group. Radar charts provide information about the ability type of each cluster. A total of 249 evaluators were involved. These evaluators include QC experts and CI activity promoters.
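One possible reading of the F-value check is a one-way analysis of variance on a single item, comparing the between-group variance with the within-group variance. The sketch below computes that F statistic in plain Python; whether the study used exactly this statistic is an assumption, and the function name and index values are made up for illustration. Judging significance additionally requires the F distribution with the stated degrees of freedom, which is not computed here.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of numeric values."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    k = len(groups)
    n = len(all_values)
    # Between-group sum of squares, with k - 1 degrees of freedom.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares, with n - k degrees of freedom.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up superiority indices on one item for the teams of two clusters.
cluster_a = [0.1, 0.2, 0.3]
cluster_b = [0.7, 0.8, 0.9]
print(anova_f([cluster_a, cluster_b]))
```

A large F value indicates that the clusters differ far more between groups than within groups on that item, which supports the classification.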

Reliability analysis
First, an internal reliability analysis was applied to each team. The results (Table IV) show that Cronbach's α is higher than 0.7 for 163 teams (86%) and below 0.35 for only 9 teams (4.74%), 8 of which were from the same session. According to Guilford (1965), data are not reliable if Cronbach's α is below 0.35, while Wu (1985) suggested 0.3 as the critical value. Thus, the internal consistency of our data should be acceptable.
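For reference, Cronbach's α for k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores). The Python sketch below computes it for one team, assuming that each row holds one evaluator's numeric scores on the items; the row layout, the function name, and the sample scores are assumptions for illustration, since the paper does not state how the data was arranged.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of rows, each row holding one score per item."""
    k = len(rows[0])
    # Sample variance of each item across rows.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the row totals.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Made-up scores from 4 evaluators on 3 items.
scores = [[5, 4, 5], [3, 3, 4], [4, 4, 4], [2, 3, 3]]
print(round(cronbach_alpha(scores), 3))  # 0.9
```

A value above 0.7, as found for 86% of the teams, is the range reported above as acceptable.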

Teams' improvement ability
From the data, the superiority index of every item in each team was calculated.
Ten clusters were identified by the K-means method. The cluster mean of each item is listed in Table V. Radar charts were drawn for each cluster using the cluster means (Table VI). Judging from the percentage of excellent teams, the result of this study appears consistent with that of the duplex-pole method adopted by the Association. That is, many teams at the higher ability level, such as the types in the process class, were ranked as excellent teams in the national presentation.

A team of a company can find its ability feature from the group to which it belongs. Government may also get a picture of the ability of companies within the nation.

Table VI Ability types and radar charts
General class: A group that generally does not have an outstanding performance in each item but could possibly be good in a certain area (total 161 teams). Three types are found according to their outstanding items.
The common type (Cluster #1): Indicates that there is no outstanding characteristic in any item (0%).
Process class: A group that has above-average performance in each item but is excellent in a few (total 22 teams). Three types are found according to their outstanding items.
The SQC type (Cluster #5): Indicates excellence in the application of analyzing tools (80%).
The performance confirmation type (Cluster #6): Indicates excellence in identifying and maintaining the effectiveness of the performance (100%).
Extraordinary class: Indicates that one particular item is outstanding, while others are not so good (total 7 teams). This class cannot become the prototype for learning. There are 4 types in this class (Clusters #7 to #10; radar charts are omitted) (14%).
Note: The percentage in parentheses is the percentage of the participants of this ability type who won the excellence award in the national presentation.

In order for an evaluation to generate useful results, the evaluators must be qualified. If no one in a class receives "A"s, does this imply that there is no outstanding student in this class, or did the evaluator/teacher set the standard too high?
The answer could depend on the characteristics of the evaluators.
Using the same evaluation data and analytical methods, the characteristics of the 249 evaluators in the national presentation were analyzed. First, the average of "A"s for each item and each evaluator is generated by dividing the number of "A"s given by an evaluator on an item by the number of teams that evaluator evaluated. This average is called the grading index, which is the evaluator-side counterpart of the superiority index for ability.
Thus, 1 indicates that this particular evaluator gave "A" on this item to all teams he/she evaluated; 0 indicates that no "A" was given by this particular evaluator on this item; 0.1 indicates that one out of ten teams received "A"s from this particular evaluator, and so on. Next, based on these grading indices, the same cluster analysis was applied to identify 10 clusters. The cluster mean and their associated radar charts are shown in Table VII.
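The grading index can be sketched in Python as below. The sketch assumes the raw data is available as records of (evaluator, team, grades), with one grade letter per item; the record layout, the function name, and the sample records are hypothetical.

```python
from collections import defaultdict

def grading_indices(records, n_items):
    """Per-evaluator share of teams that received "A" on each item.

    records: iterable of (evaluator, team, grades), one grade letter per item.
    """
    a_counts = defaultdict(lambda: [0] * n_items)
    n_teams = defaultdict(int)
    for evaluator, _team, grades in records:
        n_teams[evaluator] += 1
        for i, grade in enumerate(grades):
            if grade == "A":
                a_counts[evaluator][i] += 1
    return {
        ev: [count / n_teams[ev] for count in a_counts[ev]]
        for ev in n_teams
    }

# Made-up records: two evaluators, two teams, two items.
records = [
    ("E1", "T1", "AB"),
    ("E1", "T2", "CB"),
    ("E2", "T1", "BB"),
    ("E2", "T2", "BC"),
]
print(grading_indices(records, n_items=2))
# {'E1': [0.5, 0.0], 'E2': [0.0, 0.0]}
```

Each evaluator is then represented by one index per item, and the same clustering procedure used for the teams can be applied to these vectors.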
Since the first four groups cover 232 evaluators (93%), only these four groups are discussed. Table VII shows that for Group #1, except for Item #16, the cluster means are relatively low. The average of 16 items is 0.087. This indicated that the evaluators in this group seldom gave an "A". The cluster means in Group #3 are slightly higher than those in Group #1, except for Item #1, which is extremely high.
Without Item #1, the average values of these two groups are quite close (0.088 vs. 0.112). The cluster means of Group #2 are consistently higher than their counterparts in Group #1. The cluster means of Group #4 are also much higher than their counterparts in Group #1, and Item #1 is particularly high.
For individual items, there are a few findings: the gap between the highest (0.816) and lowest (0.074) cluster means of Item #1 is considerable; the cluster means of Item #2 are consistently low among groups; the cluster means are high for Items #7 through #10.

Team's improvement ability
Four conclusions were summarized from the results: 1) the ratio of the process class is relatively low; 2) presentation type is shown in each class, even in the extraordinary class; 3) there is no technology-excellence type in the process class; 4) the theme value was not shown.
Further examination can be done on the above four conclusions. For example, why does the theme value not show? Was the item description not clear enough? Item #2 is a good example since its cluster mean is low in most groups. The description "Is the substitution proper?" (Note that this is a direct translation from Chinese which is different from the description shown in Table I) seems vague. The real meaning of this item is "Does the theme properly indicate the problem?" This could be the reason why evaluators could not give a high score.
Next, why is there no technology-excellence type in the process class? Either the innovation ability of the teams is not sufficient, or the problem lies in the quality of the evaluators. A proper solution, whether finding a suitable regenerating input or improving the evaluation system, cannot be applied unless the true reason is determined. Table VII shows that Groups #1, #2, and #3 account for 86% of all evaluators. Yet, those evaluators rarely gave "A"s, especially Group #1, which seldom gave an "A" on any item. Group #3 has a high grading index (0.775) on Item #1, which tells us that most of the evaluators in this group appreciated the themes presented. The two evaluators in Group #9 gave every item a high grade. Group #1 and Group #9 show completely opposite results.

Characteristics of evaluators
Item #1 and #2 are the only two items that are related to theme. The cluster mean of Item #1 has mixed high and low values. This is evidence that the evaluators' perspectives are diverse and the itemized rating method is not proper to identify a team's outstanding ability. The cluster means of Item #2 are very low among all groups. A reasonable explanation of this finding is that the description is vague, as mentioned in the previous section.
Most of the cluster means of Items #7 through #10 are high, which indicates that most evaluators appreciated the teams' performance on "countermeasure." Of course, performance is closely related to the degree of difficulty of the theme/problem. Since the ratio of technology-excellence type is low, it may reasonably be inferred that the selected themes/problems are not difficult.
Factoring out personal preferences such as personal relationships among evaluators and participants, extreme conservatism (the one who is hard to please, giving unreasonably low scores), or extreme optimism (the one who is easy to please, giving high scores indiscriminately), the difference between the above two disparate evaluators (Group #1 and Group #9) tells us that either they lost the ability to distinguish the superiority of teams or that the description of the items evaluated is too vague to be understood. To cope with the first difficulty, perhaps the obvious thing to do is to re-educate the evaluator with a purpose toward certification. For the second problem, a redesigning of the evaluation method and items can help.

Conclusion
In the past, people have taken different approaches toward CI activities. Some focused on problems, some focused on models and tools, and some focused on promotions. Everyone thinks that his or her approach is the most effective one. As Deming expressed, what is best for a sub-system may not be the best for the entire company (Latzko and Saunders 1996). Focusing on one area worked well in the past because the evolution of CI was still at the lower level. However, when it reaches a higher level, the ability has to be upgraded in order to sustain the CI activity. This study applied a new analytical method on the evaluation data from the national presentation in Taiwan. The results provide two kinds of information. One is the improvement levels of companies and their ability types. Individual companies can realize their strengths and weaknesses and thus inject the proper regenerated input for future activity. The other is the characteristics and quality of evaluators. The sponsor can modify the system accordingly to make the presentation more fair and reliable.
This introduced method can be applied in various fields, especially in public issues, such as education, healthcare, social welfare, etc., so that resources can be used effectively. The other contribution of this paper is that it increases the use of evaluation data.