New Weighted Clustering Approach to Map and Prioritize Learning Knowledge Objects Towards Learning Approaches

Abstract: To provide a unique learning experience with a high learning impact, diverse course content should be incorporated in e-Learning. A Learning Management System (LMS), a tool in e-Learning, manages and delivers content to users. Learning Objects (LO), the course content, are the fundamental units of a Learning Management System. Knowledge Objects (KO) of a Knowledge Management System can also be a viable resource in technology-supported learning. A learning scenario for a given learner has to be identified, and the course content (LO) has to match the learner's skills. Data mining techniques can be used to find similar objects, and the K-Mean clustering technique can be used to produce more consistent clusters. The clusters can hold strongly related Learning Knowledge Objects with similar concepts. A new algorithm, a weighted cosine distance that gives real-valued distances between instances and thereby modifies the structure of the feature space, is used for prioritising objects in clusters. These objects can be further mapped to the learning approaches of the users. An experiment is conducted using Learning and Knowledge Objects to understand the effectiveness of the weighted measure, so that a personalized, holistic learning environment can be provided to the learners.

I. INTRODUCTION
e-Learning helps to stabilize and improve the performance of learning by capturing and presenting the subject content in synchronous or asynchronous form. The landscape of learning technology consists of repositories, digital content, adaptive tutors, personalization, course management and collaboration. The standard learning content focuses on content packaging, metadata, accessibility, resource lists, sequencing and sharable resources. The learning system standard involves tools, learner profiles, competencies, accessibility, digital repositories, learning design, and learning object discovery and delivery.
A Learning Management System (LMS) is a widely used tool in the e-Learning environment, and it needs to focus on content delivery in a new context. The environment has to improve learning by providing additional resources to LOs. These resources can be captured from the Knowledge Base (KB) of the Knowledge Management System (KMS). Knowledge Objects (KO) of a KMS can be a viable resource in e-Learning [25]. Various theories and models have been proposed for the delivery of LOs [6] and for converging LOs and KOs [32] [34] [35] [36] [38] [40] [47]. These converged objects are called Learning Knowledge Objects (LKO). Many techniques are used for the delivery of these LKOs [5] [29] [36] [37] [46]. This research work uses clusters of LKOs and maps them to learning approaches.
The clustering hypothesis states that closely associated objects tend to be relevant to the same query request. Similar objects are placed within a cluster, leading to faster retrieval of objects and effective presentation of information. The K-Mean technique is a partition-based clustering method. It is a vector quantization approach, and it uses pairwise Euclidean distances between points. The Euclidean distance is closely tied to the cosine, or scalar product. The cosine measures similarity of orientation rather than magnitude, yet magnitude is also an important element when considering similarity. To find similar LOs & KOs, the cosine measure, which gives a similarity value between two objects, and the Euclidean measure, which reflects the magnitude of the objects, can be used together. These measures capture different aspects of the similarity between two entities.
This research formulates weights for the LOs & KOs as a selection criterion for identifying suitable LKOs for different learning approaches. The proposed work involves creating the most appropriate structure to deliver the objects through clusters and arranging the clusters in a manner that aids the findability of the objects. It also provides a new framework for narrowing down an appropriate set of "objects" for different learners, thereby providing faster retrieval and an improved learning experience. This paper is organized as follows: Literature Review, Methodology, Experimental Set up, Results and Analysis.

II. LITERATURE REVIEW
The theoretical review is based on Learning Content, Design, Development and Delivery, as shown in Fig. 1.

A. Learning Content (Resource)
The LMS is an application for administering, managing and delivering learning content within an organization [15]. Learning Objects are small, independent pieces of information that should be reusable [3] [44] [22] [27] [43].
A Learning Object refers to any digital educational resource. Instead of providing all the material for an entire course, a particular topic or a lesson can be delivered during technology-supported learning. Broadly speaking, Learning Resources usually refer to documents or collections, whereas Learning Objects are components of a document or collection. An LO comprises assets such as images, text, video and web pages.
A KMS is based on Knowledge Management (KM) principles and supports the knowledge processes and practices of KM, such as creation, acquisition and sharing [7] [31]. Knowledge management is a critical factor in the success of any organization. KM objects can also be used in online educational systems [30] [33], especially in the higher education system [18] [19] [25].
The tacit knowledge in a knowledge conversion process can be considered as the content of KOs. According to Merrill [25], a Knowledge Object is a record of information that serves as a building block for a KMS. Horton [18] [19] says, "A KO is a chunk of electronic content that may be accessed individually, and that can carry out a single goal", and it should also be reusable.

B. Learning Content (Design & Development)
Learning is an ever-evolving process. To improve the overall effectiveness of the teaching-learning process, Bloom has given a clear outline of the learning objectives [2], learning styles [12] [24] and learning approaches [11] [12] [23] of a learner. These approaches can be broadly categorized as follows:
• Surface/basic learners aim to reproduce the study material in a test or exam rather than actually understand it.
• Strategic learners intend to obtain high grades, and they organise their time and distribute their effort to greatest effect.
• Deep learners aim to understand the material. Higher education learners can belong to this category.
Regardless of students' preferences, the goal of learning is to make the users understand the concepts and successfully apply them.

C. Feature weights
Data pre-processing is an important step in data mining that improves the quality of the data. It includes cleaning, normalization, transformation and selection. Feature weighting, a data pre-processing technique, is used to approximate the influence of each individual feature in a set. A relevant attribute has a higher weight value, whereas irrelevant features are given a weight closer to zero. The need for attribute weight setting and its advantages are discussed in [10] [28]. In mining a dataset, not all features may be relevant or correct, and clustering can benefit from using a selected subset of the features. Many research works on feature weighting have been proposed [4].

D. Similarity Measure
Many clustering methods use a distance measure to determine the similarity or dissimilarity between any pair of objects. Different distance measures are available for binary, ordinal, categorical and continuous data types. A similarity (proximity) measure compares two vectors a and b; Cosine, Jaccard and Dice are some of the proximity measures. Cosine similarity scores the similarity of objects in a given data set. The measure takes two arguments, $\vec{A}$ and $\vec{B}$, which are vector representations of the content of objects A and B, and returns a similarity score that, for non-negative content vectors, lies between 0 and 1, indicating that the two are completely dissimilar (0) or completely similar (1). Refer to (1).
$$\cos(\vec{A}, \vec{B}) = \frac{\vec{A} \cdot \vec{B}}{\lVert \vec{A} \rVert \, \lVert \vec{B} \rVert} \qquad (1)$$
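As an illustration of the cosine measure, the following minimal sketch computes it for small content vectors using only the Python standard library. The function name `cosine_similarity`, the zero-vector convention and the term counts are assumptions made for this example and are not taken from the experiment.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for an empty content vector
    return dot / (norm_a * norm_b)

# Hypothetical term counts over the same four-term vocabulary.
vec_a = [2, 1, 0, 1]
vec_b = [4, 2, 0, 2]  # same direction as vec_a, twice the magnitude
vec_c = [0, 0, 3, 0]  # shares no terms with vec_a

sim_ab = cosine_similarity(vec_a, vec_b)  # close to 1: same orientation
sim_ac = cosine_similarity(vec_a, vec_c)  # 0: no common terms
```

The pair `vec_a`, `vec_b` shows why the cosine ignores magnitude: doubling every count leaves the score unchanged.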

E. Learning Content (Delivery)
A good clustering method produces high-quality clusters with high intra-class similarity and high inter-class dissimilarity [17]. Clusters are formed based on the "distance" between points. The two major classes of distance measures are Euclidean and non-Euclidean distance measures (Jaccard, Cosine). Clustering methods can be broadly classified as Partition, Hierarchy, Density-Based, Grid-Based and Model-Based techniques. K-Mean is a simple, partition-based approach [17] [41].
The K-Mean clustering algorithm is as follows:
1. Determine the value of 'K'.
2. Select 'K' random objects from the data set as the initial points (centers) of the clusters.
3. Compare each object with the center points and calculate the distance.
4. Assign each object to its closest center point.
5. Calculate the mean of each cluster and update the center point with the new mean.
6. Repeat from step 3 until no object is reassigned to a different cluster.
The distance is calculated using (2):
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (2)$$

F. Evaluating Measures -Silhouette index
'K' in the K-Mean algorithm determines the number of clusters, so choosing it is an important criterion. The elbow method, cross validation and the silhouette index are some of the methods used to determine the value of 'K'. The silhouette coefficient compares the average distance to elements in the same cluster with the average distance to elements in other clusters [9].
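• Cohesion a(x): average distance of x to all other vectors in the same cluster.
• Separation b(x): the smallest average distance of x to the vectors of any other cluster.
The silhouette s(x) is given in (3):
$$s(x) = \frac{b(x) - a(x)}{\max\{a(x), b(x)\}} \qquad (3)$$
The value of s(x) lies in [-1, +1]. Objects with a high silhouette value are considered well clustered, and objects with a low value may be outliers. This index works well with K-Mean clustering and is also used to determine the optimal number of clusters.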
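The silhouette computation can be sketched as follows, taking cohesion a(x) as defined above and separation b(x) as the smallest average distance from x to the members of any other cluster, which is the usual definition. This is a minimal illustration using only the standard library; the function names, the handling of singleton clusters and the sample points are assumptions, and at least two clusters are required.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def silhouette_values(points, labels):
    """Return s(x) for every point, given one cluster label per point."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)  # assumed convention for singleton clusters
            continue
        # Cohesion: average distance to the other members of the same cluster.
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # Separation: smallest average distance to the members of another cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

# Hypothetical two-dimensional points forming two obvious groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_values(points, [0, 0, 1, 1])
bad = silhouette_values(points, [0, 1, 0, 1])
mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
```

Averaging s(x) over all objects gives the average silhouette that is compared across candidate values of 'K'; here the sensible labelling scores far higher than the mixed one.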

G. Evaluating Measures -DB Index
Cluster validity assessment is the process of evaluating the results of a clustering. Finding an optimal clustering is an important criterion. External, Internal and Relative are the three different criteria used to evaluate clustering algorithms.
Internal and External criteria use statistical methods, while Relative criteria compare different clustering schemes using validity indices. The validity indices include the Davies-Bouldin Index, the Root Mean Square Standard Deviation (RMSSTD) Index and the Dunn Index [16]. The Davies-Bouldin index is calculated by the formula given in (4):
$$DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{\sigma_i + \sigma_j}{d(c_i, c_j)} \qquad (4)$$
where k is the number of clusters, c_i is the centroid of cluster i, \sigma_i is the average distance of the objects in cluster i to c_i, and d(c_i, c_j) is the distance between the centroids of clusters i and j.
Algorithms that produce clusters with low intra-cluster distances (high intra-cluster similarity) and high inter-cluster distances (low inter-cluster similarity) will have a low Davies-Bouldin index. This index is used as the evaluation measure for the clusters produced by the K-Mean technique. The methodology used is discussed in the next section.
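A minimal sketch of the Davies-Bouldin computation is given below, assuming the standard definition: the scatter of a cluster is the average Euclidean distance of its members to the centroid, and each cluster contributes its worst ratio of summed scatter to centroid distance. The function names and the sample points are illustrative assumptions, and only the standard library is used.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(axis) / len(points) for axis in zip(*points)]

def davies_bouldin(points, labels):
    """Average, over clusters, of the worst (scatter_i + scatter_j) / d(c_i, c_j)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    centers = {k: centroid(v) for k, v in clusters.items()}
    scatter = {
        k: sum(euclidean(p, centers[k]) for p in v) / len(v)
        for k, v in clusters.items()
    }
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)

# Hypothetical points: a compact, well-separated labelling versus a mixed one.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
db_good = davies_bouldin(points, [0, 0, 1, 1])
db_bad = davies_bouldin(points, [0, 1, 0, 1])
```

The compact labelling yields the smaller index, which matches the rule that the clustering with the lowest Davies-Bouldin value is preferred.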

III. METHODOLOGY
The proposed methodology is shown in the flowchart in Fig. 2.

A. Need for weighted clustering & choice of weights
In weighted clustering, every element is associated with a real-valued weight representing its mass or importance. Algorithms such as K-Mean and K-Median always respond to weights, whereas algorithms such as single-linkage and complete-linkage ignore them. The clustering changes depending on the underlying weights [1]. A few advantages of adding weights are as follows:
• The accessibility of different objects may vary in importance.
• The weighted approach can prioritize certain objects.
• Since the objects can be distributed, the weighted approach can enable quick access and delivery.
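One way to see why K-Mean responds to weights is the center update: with per-object weights, the mean becomes a weighted mean and heavier objects pull the center towards themselves. The sketch below illustrates this generic idea only; the function name, points and weights are assumptions and do not reproduce the procedure of this work.

```python
def weighted_centroid(points, weights):
    """Weighted mean of the points: sum(w_i * x_i) / sum(w_i) per coordinate."""
    total = sum(weights)
    dims = len(points[0])
    return [
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dims)
    ]

# Two hypothetical objects on a line.
points = [(0.0, 0.0), (4.0, 0.0)]
unweighted = weighted_centroid(points, [1.0, 1.0])  # plain mean: [2.0, 0.0]
weighted = weighted_centroid(points, [3.0, 1.0])    # pulled towards the first object: [1.0, 0.0]
```

Because the center moves, the objects assigned to each cluster in the next iteration can change, which is how different weights lead to different clusterings.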

B. Feature selection of weights
The RapidMiner tool is used to check the relevance (weight) of the attributes. The tool converts the attribute weights into a ranking structure: the largest weight is given rank 1 and the smallest weight rank 'p', where p is the number of attributes considered. The attributes Title, Topic, SubTopic, Content and Category are considered for weight selection. The output of the tool is shown in Fig. 4, which indicates that the attribute "content" has a higher weight than the other attributes. Cosine similarity is applied to the attribute "content" of the LKOs and the weights are added to it.
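The conversion of weights into ranks can be illustrated with a short sketch. The weight values below are hypothetical placeholders chosen only so that "Content" ranks first; they are not the values reported by the tool, and the code is not the tool's own implementation.

```python
# Hypothetical attribute weights (illustrative values only).
attribute_weights = {
    "Title": 0.35,
    "Topic": 0.20,
    "SubTopic": 0.15,
    "Content": 1.00,
    "Category": 0.05,
}

# Sort attribute names from largest to smallest weight, then number them 1..p.
ordered = sorted(attribute_weights, key=attribute_weights.get, reverse=True)
ranks = {name: position for position, name in enumerate(ordered, start=1)}
```

With p = 5 attributes, the attribute with the largest weight receives rank 1 and the one with the smallest weight receives rank 5.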

C. Experimental Approach
As discussed in the previous section, the proximity (similarity) between two objects is calculated using the cosine similarity measure, which in general returns values in the interval [-1, 1]. The pairwise similarity of the contents is stored in a 124 x 124 matrix, as shown in Fig. 5.
Weights are added to this cosine similarity matrix. A Python program is used to generate the new weighted matrix. The input variables and the steps followed are described in Fig. 6.
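The exact weighting steps are those of Fig. 6 and are not reproduced here. Purely as an illustration, the sketch below assumes one possible scheme in which each pairwise similarity is scaled by the weights of the two objects' types (LO or KO). The function name, the multiplicative scheme and the 3 x 3 sample matrix are all assumptions; only the weights LO = 0.7 and KO = 0.3 correspond to one of the settings used in the experiments.

```python
def weighted_similarity_matrix(similarity, object_types, type_weights):
    """Scale similarity[i][j] by the type weights of objects i and j (assumed scheme)."""
    n = len(similarity)
    weights = [type_weights[t] for t in object_types]
    return [
        [similarity[i][j] * weights[i] * weights[j] for j in range(n)]
        for i in range(n)
    ]

# Hypothetical cosine similarities for three objects: two LOs and one KO.
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
object_types = ["LO", "LO", "KO"]
type_weights = {"LO": 0.7, "KO": 0.3}  # setting that favours LOs

weighted = weighted_similarity_matrix(similarity, object_types, type_weights)
```

Under this assumed scheme the matrix stays symmetric, and pairs of the favoured type keep a larger share of their similarity than mixed or unfavoured pairs.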

D. Choice of 'K': Silhouette coefficient
Objects with a high silhouette value are considered well clustered, so the silhouette can be used to determine the optimal number of clusters in 'K'-Mean. This measure is suitable for estimating the best partition of the clusters (refer to (3)). The average silhouette values for K = 3, 4, 5 and 6 are 0.028, 0.034, 0.017 and 0.031 respectively. 'K' = 4 has the highest silhouette value and is therefore considered an appropriate value for 'K' in the K-Mean algorithm. The RapidMiner tool was used for creating the clusters. The clusters obtained for the three matrices are discussed in Fig. 6.
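The clusters in this work were created with RapidMiner. For readers who want to follow the procedure in code, the sketch below implements the six K-Mean steps of Section II-E with the standard library and then picks 'K' from the average silhouette values reported above. The function names, the seed, the iteration cap and the sample points are assumptions made for the example.

```python
import math
import random

def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_mean(points, k, seed=0, max_iter=100):
    """Plain K-Mean: returns one cluster label per point and the final centers."""
    rng = random.Random(seed)
    # Steps 1-2: K is given; pick K random objects as the initial centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Steps 3-4: assign every object to its closest center.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centers[c]))
            for p in points
        ]
        # Step 6: stop when no object changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 5: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels, centers

# Hypothetical points forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = k_mean(points, k=2)

# Average silhouette values reported in the text for K = 3..6.
reported_silhouette = {3: 0.028, 4: 0.034, 5: 0.017, 6: 0.031}
best_k = max(reported_silhouette, key=reported_silhouette.get)  # 4
```

In practice the loop would be run once per candidate 'K' and the average silhouette of each result compared, as done with the reported values here.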

A. Clusters & LKOs
Fig. 7 represents the output of the various clusters for the three matrices:
• No weights
• Greater weights for LOs
• Greater weights for KOs
The graph shows the following findings:
• The number of LOs in cluster0 increases when greater weights are given to LOs.
• The number of KOs in cluster2 increases when greater weights are given to KOs.

B. Learning Index & Knowledge Index (Output & Graph)
A Learning Index and a Knowledge Index are calculated using the metrics given in (5) and (6).
The average LI & KI of the four clusters (cluster0, cluster1, cluster2, cluster3) for the three experiments and the corresponding bar charts are shown in Fig. 8a and Fig. 8b.
1. Learning Index (LI) is given as: (5)
2. Knowledge Index (KI) is given as: (6)
The output shows that the Learning Index of two clusters has increased when the weights are set to LO = 0.7 and KO = 0.3 (Fig. 8). The Learning Index of cluster0 is 41.8889 and that of cluster3 is 13.76.
The Knowledge Index of three clusters has increased when the weights are set to LO = 0.3 and KO = 0.7 (Fig. 9). The objects are packed within a few clusters; the weighted approach thereby prioritizes certain objects and enables quick access and delivery.

C. Evaluation of the clusters
The three cluster models and the clusters, along with the DB index, are shown in Fig. 10. The clustering algorithm that produces a collection of clusters with the smallest Davies-Bouldin index is considered the best algorithm based on this criterion. According to this index, the clusters with weights have the smallest values (0.30 and 0.32) compared to the clusters without weights (0.35) (Fig. 10). [...] from Cluster2, whose scale is 10 (Fig. 11). The normalised (Min-Max Normalisation) values of the Learning Index and Knowledge Index for each metric on a scale of 1-10 are shown in Table 1.
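Min-Max Normalisation onto a 1-10 scale can be sketched as below. The function name, the handling of a constant input and the sample values are assumptions for illustration; the values are not the indices of Table 1.

```python
def min_max_scale(values, new_min=1.0, new_max=10.0):
    """Linearly map values so that min(values) -> new_min and max(values) -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # assumed choice when all values are equal
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

# Hypothetical index values for three clusters.
raw = [2.0, 5.0, 11.0]
scaled = min_max_scale(raw)  # [1.0, 4.0, 10.0]
```

The smallest index maps to 1 and the largest to 10, so clusters can be compared on a common scale regardless of the raw range of each index.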

Figure 11. Mapping of clusters with learners (Surface Learners, Deep Learners, Strategic Learners)

VII. CONCLUSION
The subject matter from an expert, along with well-defined content on a particular topic, can give a new dimension to the learning design and bring credibility to the concepts. The material that a learner receives on a topic during learning can be blended with appropriate KOs, so that a knowledge-enriched learning environment can be provided to the users. In this research work, clustering based on a weighted cosine distance was adopted. This method applies weights to the attribute "content" during the clustering process; it makes full use of the characteristics of the data distribution and increases the accuracy of the clustering results. The experimental result shows that the technique is capable of producing a partition that is as good as or better than the best individual clustering.
The new algorithm improves the findability of objects within the clusters, and the quality of the clusters is confirmed using the DB index. The implications and findings of the research will help deliver objects better and more appropriately to different learners. Varying the weights may give different clusters. Analysis can also be carried out with other clustering algorithms that respond to weights. The weights can also be defined and added based on learners' feedback. The work can be further studied using cluster ensemble techniques.