Design of a Smart MOOC Trust Model: Towards a Dynamic Peer Recommendation to Foster Collaboration and Learner's Engagement

Abstract: Recent evolutions in the Internet of Things (IoT) and Social IoT (SIoT) are facilitating collaboration as well as social interactions between entities in various environments, especially Smart Learning Ecosystems (SLEs). However, in these contexts, trust issues become more intense: learners feel suspicious and avoid collaborating with their peers, which leads to their demotivation and disengagement. Hence, a Trust Management System (TMS) has become crucial to promote qualified collaboration and stimulate learners' engagement. In the literature, several trust models have been proposed in various domains, but few address trust issues in SLEs, especially in MOOCs. Moreover, these models exclusively rank the best nodes and fail to detect the untrustworthy ones. Therefore, in this paper, we propose a Machine Learning-based trust evaluation model that considers social and dynamic trust parameters to quantify entities' behaviors. It can distinguish trustworthy from untrustworthy behaviors in MOOCs in order to recommend benign peers while blocking malicious ones, and it will serve to build a dynamic trust-based peer recommendation in a future phase. Our model prevents learners from wasting their time in unprofitable interactions, protects them from malicious actions, and boosts their engagement. A simulation experiment using real-world SIoT datasets gives encouraging results that show the performance of our trust model.


Introduction
In the most recent decade, due to the prominent evolution of ICT and the advent of the Internet of Things (IoT) paradigm, the physical and virtual worlds will increasingly be distanced from each other [1]. Cyberspace becomes a part of real space while constituting a pervasive space called ubiquitous computing [2]. Portable computing devices like smartphones, tablets, and wearables have become an integral part of our daily lives. In addition, various researchers have explored the possibilities of incorporating the concept of social networks in the IoT ecosystem. This integration has led to a new paradigm, the SIoT, which represents a suitable platform for better interactions between people and things [3], [4]. Figure 1 shows the general evolution of connected things. This impressive progress of ICT has strongly affected several areas and sectors, including the education field. Education has been greatly reshaped in recent decades by the integration of IoT technologies. We refer to a new concept, "Smart Education" or "Smart Learning", which describes learning in an intelligent era and enables learners to learn at any place and any time by using smart devices to gain knowledge, acquire skills, and connect with their peers [5]. Indeed, the rapidly expanding possibilities of ICT in education have enabled the emergence of novel collaborative systems like MOOCs. These ecosystems revolutionize traditional education methods and attract attention in academic and industrial areas. They represent the best-known category of Smart Learning [6]. However, in these contexts, characterized by a large number of participants, intensive interactions, heterogeneous communications, and various devices, learner engagement and completion are problematic [5], [7].
Thus, trust issues arise from the search for a trustworthy peer that can provide the desired service. This situation leads to learner demotivation and disengagement. Trust models could be adopted successfully in this context to help learners by selecting the most appropriate peer to overcome their learning difficulties and maintain their motivation. In general, trust has been widely used in diverse areas to improve the quality of social networking by fighting malicious peers, selecting appropriate partners or service providers, and enhancing the decision-making process. The definition of trust that we derive for our research, in the context of a pervasive world and ubiquitous computing (IoT and the SIoT paradigm), is: "a qualitative or quantitative property of a trustee, evaluated by a Trustor as a measurable belief, subjectively or objectively, for a given task, in a specific context, for a specific period" [6]. Trust management, in turn, is a "mechanism used to ensure trust in various types of systems, his role consists of computing a trust score, which will help nodes to decide on invoking or not, services provided by other nodes" [8]. Trust is a relationship involving at least two entities: a "Trustor" entity and a "Trustee" entity [9]. The former is the entity that is supposed to initiate an interaction with another entity, while the latter is the second entity, which provides the necessary information (knowledge, content, service) to the Trustor at its request [6]. Moreover, trust has several characteristics and properties: it is asymmetric, transitive, propagative, and very dynamic [10], [11]. Ultimately, the trust evaluation process is dynamic in our research context. It involves the Trustor, the Trustee, and the underlying context. The Smart Learning Environment (SLE) network comprises users (learners) and devices owned by users. Rules are set by the owner (learner) to create relationships and to provide services to, or obtain services from, other objects.
Figure 3 describes the idea of our Smart Learning context.

Fig. 3. Illustration of Smart Learning context
To the best of our knowledge, our work is the first that addresses trust evaluation issues among entities (learners and devices) in pervasive learning environments, particularly MOOCs. Some works have addressed trust in MOOCs by focusing on trust in platforms and MOOC providers [12], [13]. In contrast, our research handles social trust and is interested in trust among learners to ensure efficient collaboration. In addition, our work is the first that suggests a dynamic Peer Recommender Framework based on the proposed MOOC trust model; due to space constraints, this framework will be presented in future work. In this work, we propose a new trust model based on new trust features derived from OSN and SIoT ecosystems, since MOOCs resemble these contexts and share characteristics such as openness, mobility, dynamicity, a massive number of participants, and heterogeneity of components.
In general, the main contribution of this research to the existing literature is that it produces results related to the concept of trust and to trust models for learners in MOOCs, an area in which there is currently limited research [12]. The other scientific contributions are summarized as follows:
─ Analyzing recent works on trust evaluation in OSN and SIoT ecosystems, with a focus on ML-based trust models, considering that trust models in these contexts are very advanced and research on them is evolving notably.
─ Designing a smart trust evaluation based on classification algorithms to predict the trustworthiness of each partner in future transactions.
─ Providing a trust model that will be the basis for dynamic peer recommendation. It is flexible and can be used in different application scenarios such as ubiquitous systems and large-scale collaborative systems.
The remainder of the paper is organized as follows. Section 2 reviews and analyzes recent works in the literature on OSN and SIoT trust evaluation based on Machine Learning. Section 3 introduces and explains the proposed trust evaluation model. Section 4 describes the methodology and material adopted in the simulation setup, then compares and discusses the results. Finally, Section 5 concludes this paper and discusses future work.

Literature review
In the literature, several trust models have been proposed. To choose the most appropriate Machine Learning algorithm for handling trust evaluation concerns, we examined some of the ML-based OSN and SIoT trust evaluation models suggested over the last few years. In our previous works [14], [15], [16], we already examined the relevant OSN and SIoT trust models that use traditional methods like weighted sum, fuzzy logic, and Bayesian belief [17], [18]. In this section, we therefore review relevant and recent trust management schemes based on ML approaches.

Comparative study
In [19], researchers proposed a trust model between users on Facebook. Features are extracted from user interaction information and profile information. KNN, SVM, and MLP are used to predict trust levels, and MLP provides the highest accuracy rate. In [20], the authors treated trust evaluation as a classification problem based on the SVM technique. The work of [21] presented an MLP-based trust model built on the node's Packet Delivery Ratio (PDR) and set a threshold to distinguish between nodes. In [22], the authors used the trust values calculated by a traditional method, together with some additional information, as training features. They employed an LR method to classify nodes. The results showed that ML-based trust evaluation has higher accuracy than other traditional methods. In [23], eight ML methods were tested. Results showed that the performance of trust evaluation using LR and Neural Network (NN) was the best. The paper of [6] proposed a trust assessment model for IoT services. The authors used unsupervised ML (k-means) and supervised ML (SVM classifier) to combine six trust factors and classify nodes as trustworthy or untrustworthy. In [24], the authors present a trust model that uses SVM to aggregate trust features and compute trust among entities. In [25], researchers utilized ML techniques instead of traditional methods to classify vehicles as trustworthy or untrustworthy. They used a real IoT dataset to run ML classifiers, specifically SVM and KNN. Researchers in [8] proposed a trust model based on attributes derived from the description of the principal trust-related attacks cited in the literature. Their trust model can detect malicious nodes and isolate them for a resilient network. Recently, the same researchers proposed in [26] an MLP-based TMS able to detect malicious nodes and the types of attacks they have made.
Table 1 presents the comparison of previous works according to four criteria. The comparative study highlights that most of the analyzed works handle the trust assessment issue as a classification problem. This analysis led us to choose suitable ML models to elaborate our trust model.

Trust classifier selection
The ML models selected consist of SVM, KNN, LR, and MLP. We briefly describe each of them in Table 2.

Table 2. Description of the Machine Learning models

Model | Description
SVM | Involves the idea of a "margin" that separates two data classes [27].
KNN | Proposed by Cover and Hart [28], based on the principle that instances of a dataset usually exist near other instances with similar properties. Simple with high accuracy [19], [25].
MLP | Based on the use of Artificial Neural Networks (ANN). Most used for numerical data [25], [30]. Composed of several perceptrons, which are simple algorithms that perform binary classification [31].

Design of smart trust model

TMS life cycle
In this section, we first present the fundamental components of a TMS commonly known in the literature [32], [33], [34]. A TMS is composed of five phases: Gathering Information, Trust Calculation, Trust Decision, Trust Update, and Reward and Punish. In this paper, we focus on the first three phases, which are the basis of the proposed trust model. We aim to develop the last two in future work, especially the Reward and Punish phase, which is related to the Peer Recommender Framework.
─ Gathering Information: The TMS gathers information from all the nodes of the system. It comprises two functions:
- Trust Composition: the extraction of the trust parameters essential to creating a trust value. These features can represent the Quality of Service (QoS) that an entity provides, or the social behavior of an entity and its social relationships with other entities in the system (Social Trust) [32].
- Trust Formation: the building of the trust value from single or multiple parameters. The majority of TMSs consider multiple parameters [32].
─ Trust Calculation: After trust information is gathered, trust values are computed. This includes two major phases:
- Trust Aggregation: its objective is to arrive at a final, overall trust value that can be binary (trustworthy/untrustworthy) or numerical, for ranking the trustees. The best-known technique is the weighted mean [16], [17]. Recently, ML algorithms have been applied to overcome the shortcomings of the latter [35].
- Trust Propagation: it deals with how the trust information propagates through the network. There are two kinds. In a Centralized scheme, a unique, central entity is in charge of gathering, calculating, storing, and propagating trust information around the network. In a Decentralized scheme, information gathering and trust calculation are performed by all entities of the system.
─ Trust Decision: This step allows the Trustor to decide whether or not to trust the Trustee. There are two types of TMS:
- Policy-based TMS: based on storing and sharing policies and credentials.
- Reputation-based TMS: based on the evaluation of the trust of a service provider by the service requester or other entities.
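As an illustration of the Trust Aggregation and Trust Decision phases described above, the following minimal sketch computes a weighted-mean trust score and turns it into a binary decision. The feature names (`qos`, `social`), the weights, and the 0.5 decision threshold are hypothetical values chosen for the example; they are not taken from the cited works.

```python
from typing import Mapping


def weighted_mean_trust(features: Mapping[str, float],
                        weights: Mapping[str, float]) -> float:
    """Aggregate per-feature trust values into one overall score."""
    total_weight = sum(weights[name] for name in features)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(features[name] * weights[name] for name in features)
    return weighted_sum / total_weight


def trust_decision(score: float, threshold: float = 0.5) -> str:
    """Binary trust decision; the threshold is an assumed example value."""
    return "trustworthy" if score > threshold else "untrustworthy"


if __name__ == "__main__":
    # Hypothetical observations about one trustee, each in [0, 1].
    observed = {"qos": 0.9, "social": 0.6}
    weights = {"qos": 0.7, "social": 0.3}
    score = weighted_mean_trust(observed, weights)
    print(round(score, 2), trust_decision(score))
```

The weights must be fixed by hand in this scheme, which is one reason ML-based aggregation can be attractive when good weights are not known in advance.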

Key phases of the proposed trust model
Our trust model steps are based on the TMS components already explained.
Step 1: Preprocessing the raw SIoT dataset. The preprocessing phase includes dealing with inappropriate values to convert the data into a form more suitable for the selected ML algorithms. Finding an appropriate dataset is, however, a challenging task. MOOCs contain personal data about learners, data related to a course, and data about learners' interactions with learning resources. This kind of data is insufficient to classify learners as trustworthy or untrustworthy. There is therefore a need for external information, especially social information related to learners' social behavior, such as their relationships, preferences, and interests, to ensure a better assessment of trust. The learner is a human being, defined by different characteristics including interests, preferences, and social context, which are important factors in their choices and decision making [36]. For these reasons, we used a raw SIoT dataset derived from MobiClique, a Mobile Social Network (MSN) used during the SIGCOMM 2009 conference in Spain [37]. This MSN makes dynamic data available. It helps users to explore and join various interest groups and to create new ones at any time. Likewise, it enables users to meet and find new friends. Hence, the list of interests and the friendship graph change dynamically. This dynamic data is desirable in trust evaluation because trust is very dynamic.
The raw dataset is divided into several Comma-Separated Values (CSV) files. Table 3 lists each file used in this research with a brief description of its contents. For more information on these files, visit the CRAWDAD platform (https://crawdad.org/thlab/sigcomm2009/20120715/index.html).

CSV file | Content
Participants | Includes a basic social profile: home city, country, and affiliation.
Transmission | Message transmission logs. Data is transmitted between two devices using the Bluetooth RFCOMM protocol.
Reception | List of messages received.
Step 2: Dynamic trust feature engineering. In general, the raw data cannot be used directly. Feature engineering is a crucial step since it strongly impacts the performance of Machine Learning and consequently the decision-making process [38]. Formally, the problem is to design a set of features that are extracted and used to build a binary classification model from a given training set, such that the model takes the features X as input and predicts the class label y of a learner as output. The label of each training sample i is denoted by $y^{(i)} \in \{\text{untrustworthy}, \text{trustworthy}\}$. In our context, a device is untrustworthy because its owner (learner) is untrustworthy. In the following, we present the eight trust features extracted from the dataset and give the formulas of the three calculated trust parameters. These trust attributes are inspired by the works of [39], [40] and described in Table 4. We used MATLAB R2018a to merge the different CSV files and to compute all trust features.
Packet Delivery Ratio (PDR) or Direct Trust Value (DTV). It is related to the current direct trust observation. It is the ratio of the number of packets successfully forwarded to the total number of packets at any given time:
$$\mathrm{PDR} = \frac{N_{\text{forwarded}}}{N_{\text{total}}} \quad (1)$$
where $N_{\text{forwarded}}$ is the number of packets successfully forwarded and $N_{\text{total}}$ is the total number of packets. In the literature, the PDR is considered the primary parameter for calculating direct trust in a trustee and a key criterion for designing trust models and identifying malicious behaviors [6].
Mutuality or Mutual Friends (MF). If two users have mutual friends, these friends can close the trust gap between them. This feature is computed as the ratio of the friends common to a Trustor and a Trustee to the total number of friends of the two:
$$\mathrm{MF} = \frac{|F_i \cap F_j|}{|F_i \cup F_j|} \quad (2)$$
where $F_i$ and $F_j$ represent the sets of friends of the Trustor and the Trustee respectively, and $|\cdot|$ denotes the cardinality of a set, that is, the number of elements in the set.
Common Interest Groups (CIG). Two nodes with a high degree of community interest have more chances of interacting with and trusting each other, which can result in better network performance. This feature is the ratio of the common interest groups to the total number of interest groups in which the Trustor and the Trustee are involved:
$$\mathrm{CIG} = \frac{|C_i \cap C_j|}{|C_i \cup C_j|} \quad (3)$$
where $C_i$ denotes the communities of the Trustor and $C_j$ the communities of the Trustee.
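The three computed features described above (PDR, MF, and CIG) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the "total number" of friends or groups is read here as the size of the union of the two sets, and returning 0.0 for empty inputs is an assumption made to avoid division by zero.

```python
from typing import AbstractSet, Hashable


def packet_delivery_ratio(forwarded: int, total: int) -> float:
    """Ratio of successfully forwarded packets to all packets."""
    if total == 0:
        return 0.0  # assumption: no traffic observed means no direct trust
    return forwarded / total


def overlap_ratio(a: AbstractSet[Hashable], b: AbstractSet[Hashable]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets share nothing
    return len(a & b) / len(union)


def mutual_friends(trustor_friends: AbstractSet[Hashable],
                   trustee_friends: AbstractSet[Hashable]) -> float:
    """MF: common friends over all friends of the two users."""
    return overlap_ratio(trustor_friends, trustee_friends)


def common_interest_groups(trustor_groups: AbstractSet[Hashable],
                           trustee_groups: AbstractSet[Hashable]) -> float:
    """CIG: common interest groups over all groups of the two users."""
    return overlap_ratio(trustor_groups, trustee_groups)


if __name__ == "__main__":
    print(packet_delivery_ratio(8, 10))
    print(mutual_friends({"a", "b", "c"}, {"b", "c", "d"}))
    print(common_interest_groups({1, 2}, {3}))
```

All three values fall in [0, 1], which makes them directly comparable with a single threshold during labeling.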
The dataset in Figure 5 shows a representative example of trust attributes and samples captured over the simulation scenario.

Fig. 5. Representative samples of our dataset
We note that the small number of trust features used in this paper is an advantage, since it leads to faster computation.
Data Labelling. In our model, we label the data with two different labels, namely trustworthy and untrustworthy. Using k-means, the dataset is simply divided into two clusters, 1 and 0, arbitrarily. In our case, we have continuous trust values that are converted to binary values by comparing them to a threshold that can be adapted to meet different requirements [29]. We therefore used a conditional function with a fixed threshold to decide whether an entity is trustworthy or not. The threshold is 0.5 for our study. Hence, a node is considered trustworthy if its PDR value is greater than 0.5 and its MF value or CIG value is greater than 0.5. If the PDR value is below 0.5 and the MF or CIG values are below the threshold, the node is untrustworthy. Figure 6 shows the dataset after data labeling.
Fig. 6. Samples of our dataset after data labeling
In our case, the sizes of the two classes differ: of 5776 samples, 4089 belong to class 1 and 1687 to class 0. We then chose a subset of 1678 nodes from these samples by simple random sampling, and shuffled the dataset so that the same label values do not appear consecutively, to ensure a reliable ground truth. In addition, to avoid overfitting and to obtain the maximum accuracy of the learning algorithm, 80% of the samples were used for training, whereas 20% were used to evaluate the accuracy of the proposed model.
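The threshold-based labeling and the 80/20 split described above can be sketched as follows. The function names and the random seed are hypothetical. The text leaves some mixed cases open (for example, a high PDR with both MF and CIG below the threshold); this sketch assumes that any node failing the trustworthy condition is labeled untrustworthy.

```python
import random
from typing import List, Sequence, Tuple

THRESHOLD = 0.5  # value reported for this study


def label_node(pdr: float, mf: float, cig: float,
               threshold: float = THRESHOLD) -> int:
    """Return 1 (trustworthy) or 0 (untrustworthy).

    Trustworthy: PDR above the threshold AND (MF or CIG above it).
    Assumption: every other combination is treated as untrustworthy.
    """
    return int(pdr > threshold and (mf > threshold or cig > threshold))


def shuffle_and_split(samples: Sequence, train_fraction: float = 0.8,
                      seed: int = 0) -> Tuple[List, List]:
    """Shuffle the samples, then split them into training and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical (PDR, MF, CIG) triples.
    nodes = [(0.9, 0.7, 0.1), (0.9, 0.2, 0.1), (0.3, 0.9, 0.9)]
    labeled = [(node, label_node(*node)) for node in nodes]
    train, test = shuffle_and_split(labeled)
    print(labeled)
    print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, which helps when comparing several classifiers on the same partition.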
Classifier hyperparameter optimization. It is well known that ML models cannot achieve their best performance without optimization techniques [29].
In the case of SVM, we tested the Linear Kernel (LK) and the Radial Basis Function Kernel (RBFK) [27]. The former shows a high accuracy of 0.9978 and the RBFK gives 0.9940. The choice of appropriate parameters is a crucial step for achieving reasonable results [41]. The settings of these parameters are based on a so-called "grid search" [27]. The goal is to identify two parameters, "C" and "gamma", that are conventionally used to avoid overfitting [41]. For that, we used part of the training samples for cross-validation to find the best parameter set, which improved the results obtained with the trained model. For KNN, the appropriate value of K can be determined experimentally. We reach the optimal value of K by running 10-fold cross-validation on our dataset over a generated list of odd numbers ranging from 1 to 10. In our case, the optimal number of neighbors is K = 3, with an accuracy of 0.9673. Concerning the MLP model, Table 5 reports the set of parameters used and Figure 7 depicts the trust evaluation based on MLP:

Fig. 7. Trust evaluation based on MLP model
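The selection of K for KNN by 10-fold cross-validation over odd candidates, as described in the hyperparameter discussion above, can be illustrated with the dependency-free sketch below. It is not the code used in the experiments: the tiny KNN classifier, the strided fold assignment, the tie-breaking rule (prefer the smaller K), and the toy data are all assumptions, and a library implementation would normally be preferred in practice.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Sequence[float]


def knn_predict(train_X: List[Point], train_y: List[int],
                query: Point, k: int) -> int:
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X: List[Point], y: List[int], k: int,
                       n_folds: int = 10) -> float:
    """Mean accuracy over n_folds strided folds (every n-th sample)."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i]
                       for i in test_idx)
    return correct / n


def select_k(X: List[Point], y: List[int],
             candidates: Iterable[int] = range(1, 10, 2),
             n_folds: int = 10) -> Tuple[int, Dict[int, float]]:
    """Return the candidate K with the best cross-validated accuracy."""
    scores = {k: cross_val_accuracy(X, y, k, n_folds) for k in candidates}
    # Ties are broken in favor of the smaller K (an assumption).
    best_k = max(scores, key=lambda k: (scores[k], -k))
    return best_k, scores


if __name__ == "__main__":
    # Toy data: two well-separated clusters of (PDR, MF)-like points.
    X = [(0.01 * i, 0.01 * i) for i in range(20)]
    X += [(1.0 + 0.01 * i, 1.0 + 0.01 * i) for i in range(20)]
    y = [0] * 20 + [1] * 20
    best_k, scores = select_k(X, y)
    print(best_k, scores)
```

Odd values of K avoid tied votes in a two-class problem, which is why only 1, 3, 5, 7, and 9 are tried.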
Step 3: Training and classification using Machine Learning classifiers. The four ML models, namely SVM, LR, KNN, and MLP, are tested to select the most appropriate one.
Step 4: Performance Evaluation of the proposed model. In this step, we applied evaluation measures commonly used for trust prediction and classification issues that are reported and explained in detail in the subsequent section. Finally, a generic structure of the proposed MOOC trust model is shown in Figure 8. The following section outlines the simulation environment and gives details related to experiment outcomes.

Material and methods
The subsequent section describes the experimental tools used. It gives information about the metrics used for evaluating the results. Finally, it provides a comparative analysis of the obtained results.

Experiment tools
The following experiments were all run on a personal computer configured as a win13 system, with an Intel(R) Core(TM) i5-3427U, 8 GB of RAM, and a 64-bit operating system. Data preprocessing and the training of the ML models were performed in Google Colab, a hosted Python notebook environment. It allows writing and running Python scripts in an internet browser with zero configuration required, free access to GPUs, and easy sharing. The fact that Google Colab is based on Python makes the proposed model easy to integrate into MOOCs.

Performance analysis
The performance metrics used are Accuracy, Precision, Recall, Receiver Operating Characteristic (ROC), and Area Under the Curve (AUC). These metrics are explained in more detail below, with their formulas:
─ Accuracy: This is the most well-known and general measure for evaluating the performance of classifiers. It is the ratio of the number of correct predictions to the total number of input samples:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (4)$$
where TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.
─ Precision: It reports the ratio of the correctly classified positive instances to all the instances classified as positive:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (5)$$
─ Recall or True Positive Rate (TPR): It provides important insight into classification performance relative to the number of positive instances that were missed. It is calculated as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (6)$$
─ ROC and AUC: These reflect the classifier's ability to predict both positive and negative classes. The ROC is a probability curve and the AUC represents the measure of class separability; it indicates how well the probabilities of the positive class are separated from those of the negative class.
We note that both Precision and Recall reflect the strength of the classifiers in predicting trust correctly, since they are calculated from the true positive/negative and false positive/negative values, as shown in Equations (5) and (6). Positive and negative represent trustworthy and untrustworthy entities respectively. Next, we present their interpretations in our real-life context, which is a MOOC.
TP: the number of examples correctly classified as trustworthy. Correctly classifying learners can help to improve the quality of collaboration among them in the SLE because interactions will occur between reliable elements.
FP: the number of untrustworthy learners that are incorrectly labeled as trustworthy. Co-learners that are supposed to be blocked for a learner are instead displayed and recommended to them. Undesirable actions of untrustworthy co-learners can cause the learner to drop out.
TN: the number of instances with distrust relationships that are correctly predicted as distrust. It is crucial in our context to block untrustworthy learners and ensure MOOC network performance.
FN: the number of trustworthy entities that are incorrectly classified as untrustworthy. This can result in poor-quality services and prevent learners from collaborating with participants they truly trust.
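To make the metrics above concrete, the following sketch derives the confusion counts from true and predicted labels and computes Accuracy, Precision, and Recall from them. It also computes AUC through its pairwise interpretation, that is, the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The function names and the toy labels are hypothetical, and returning 0.0 when a denominator is zero is an assumption.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) with 1 = trustworthy, 0 = untrustworthy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 1, 1]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    print(tp, tn, fp, fn)
    print(accuracy(tp, tn, fp, fn), precision(tp, fp), recall(tp, fn))
    print(auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
```

In the MOOC setting, a low Precision means untrustworthy peers are being recommended, while a low Recall means trustworthy peers are being blocked.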

Results comparison and discussion
This section outlines the comparison of the classification results obtained by the four classifiers. To demonstrate the effectiveness and the performance of the four classification methods, we generate a confusion matrix, shown in Table 6. Table 6 shows that the proposed model obtains good accuracy scores for each classifier. Additionally, Figure 9 and Table 7 demonstrate that the SVM (LK) achieves the highest accuracy, 99.78%, and thus proves to be the most efficient. LR and MLP reach 99.70% and 99.40% respectively. Finally, the accuracy of 3NN is 96.73%, which is the highest error rate among the classifiers.
Generally, all classifiers give good AUC values ranging from 0.95 to 0.99, demonstrating that the tested classifiers are good at distinguishing between the untrustworthy and trustworthy classes (0 and 1 respectively). However, SVM (LK) gives the highest values in the other evaluation metrics. An overview of the results shows that our proposed approach achieves encouraging outcomes. We can observe that our selected trust features yield good classification results in all the evaluation metrics (Accuracy, Precision, Recall, and AUC), surpassing 99% for the best-performing classifiers, which indicates that the proposed trust features perform well in recognizing the untrustworthy entities and identifying the trustworthy ones.
Two reasons explain the improvement brought by our proposed trust evaluation method. Firstly, the trust feature extraction approach helps to find the optimal set of features for the ML-based trust evaluation process. Secondly, datasets that comprise dynamic and social data are essential for better trust evaluation performance. Learners in MOOCs use various tools outside the MOOC [42]. In particular, they prefer mobile devices like smartphones and tablets, and MSN apps have become an integral part of their lives [43].
It is therefore suitable to incorporate into MOOCs an adapted MSN that becomes a part of the MOOC design and that learners use inside the platform. This application will enable learners to collaborate, share content, and create interest communities, and it will also make available the social and contextual data that are key to trust computation.

Conclusion and future works
In this paper, we designed an intelligent TMS based on ML techniques for MOOC ecosystems that can dynamically assess trust among learners, allowing not only their classification but also the prediction of their future behaviors. We conducted experiments on a real-world dataset from the MobiClique MSN, which provides data from real mobile users and devices. Our research can be expanded in two directions. First, we intend to build a dynamic Peer Recommender Framework based on the proposed MOOC trust model and to implement it in a MOOC platform to recommend trustworthy learning peers and block untrustworthy ones. Second, we intend to propose a design of an MSN adapted to MOOC platforms to boost social interaction and make social data and trust information available, in order to guarantee an efficient trust evaluation. Our aim is to maximize MOOC network performance, boost collaboration sustainability, and ensure a better learning experience.