Web-Based Learning Under Tacit Mining of Various Data Sources

Nowadays, many platforms provide open educational resources to learners, who must browse and explore many suggested contents to better assimilate their courses. To facilitate the selection of these resources, the present paper proposes an intelligent tutoring system that automatically accesses teaching contents available on the web and offers them to learners as additional information sources. In doing so, the authors highlight the description logic approach and its knowledge representation strength, which supports the modularization of, inference over, and querying of knowledge expressed in the Web Ontology Language (OWL). They also enhance the traditional tutoring system architecture using ontologies and description logic so that it can access various data sources on the web. Finally, this article concludes that the combination of machine learning with the semantic web has provided a supportive study environment and enhanced schooling conditions within open and distance learning.


Introduction
Every educational system aims to reduce the costs associated with the education process and to increase the learners' level of assimilation. Several approaches and methods have therefore been proposed to reach this objective. At the same time, embracing openness principles in e-learning has allowed people to freely access various data sources on the web.
Along with the rise of the openness principle, radical changes have occurred in many domains, especially those related to intellectual and literary production. Teaching and learning have not remained untouched by this phenomenon. Many concepts have emerged that aim to democratize knowledge and give everyone free access to information sources, among them Open Pedagogy, Open Educational Resources (OER), and Open Educational Practices (OEP).
OERs are recognized as educational materials available to the public under an open license that allows free access to any user [1]. The OER movement was initiated by the well-known OpenCourseWare (OCW) project launched by the Massachusetts Institute of Technology [2]. OERs are increasingly integrated into e-learning platforms as complementary resources for teaching and learning. Free access and customization through open licenses make OERs a next-generation educational tool, and a rising number of e-learning platforms are embracing them to provide both teachers and learners with free access to quality materials. Perhaps the best-known non-profit platform in this area is "OER Commons". This platform allows the free sharing and exploration of available educational content, which learners can adopt as a complementary information source to improve their comprehension. To benefit from the available content, however, the user must specify the subject, education level, and standard.
Recently, logic-based access to web data has been considered a new paradigm in which structured knowledge, in the form of an ontology, is used to improve information exploitation when the data are queried. Query answering, which computes the answers to a query formulated in terms of an ontology, is a vital service offered by description logics. Because many platforms offer such free educational services and most learners do not master web-based information search techniques, prospecting for educational content has become a tedious task for many people. To solve this problem, the work described in this paper proposes a hybrid Intelligent Tutoring System (ITS) that exploits existing OERs from various sources on the web. The proposed solution benefits greatly from description logics (DLs) and ontologies for indexing and searching information from various sources, and it provides a powerful means of learning under favorable conditions. The authors chose DL-Lite because of its efficiency for query answering. This paper begins by presenting the literature review in Section 2. Section 3 highlights ontologies and DLs and their effectiveness in knowledge representation. Section 4 presents our model for an ITS accessing a variety of data sources, and Section 5 explains its operation in detail. The evaluation of the proposed model is presented in Section 6. Finally, we conclude the research described in the present paper.

Background
ITSs lie at the crossroads of educational, psychological, and Artificial Intelligence (AI) research interests. They are attempts to produce educational systems that can simulate a human teacher. The adopted knowledge representation directly affects the effectiveness of such systems, so the representation method must be chosen with care.
Formally, exploiting knowledge on a computer does not aim to blindly manipulate information on the machine, but to enable a productive exchange between the system and its users. Hence, the system must have access not only to the terms used by human beings but also to the semantics associated with them, so that effective communication is possible. In addition, the progress of AI in problem solving and knowledge representation has added a certain amount of intelligence to educational systems.
Within these developments, many surveys have recently been performed in the domain of ITS development. A recent survey examined the effectiveness of these educational systems through a meta-analysis [3] and found that the nature of the control treatments and the adequacy of program implementation directly influence the performance of the developed ITSs. Other authors present a systematic review of existing systems, reporting their critical information and the extent to which their different features are prevalent [4]. Another study went further than the previous ones by adopting a synthetic review of evaluation characteristics, applications, and methods [5].
In concrete terms, several systems have been developed to simulate human behaviour in the teaching field, each closely tied to the subject to be taught. For example, the researchers in [6] proposed an ITS for teaching computer science engineering that uses AI methods to teach many learners at a distance simultaneously. Others have proposed a new ITS approach based on adaptive workflows and serious games [7]. To address the limited reusability of tutoring systems and their dependency on the subject matter to be taught, numerous authoring tools have been developed to enable the development of generic ITSs, such as EDUCA [8] and GRAT [9].
Researchers have recently attempted to improve ITS quality by incorporating ontologies and exploiting their robustness and high level of knowledge representation. In [10], the authors studied how semantic web technologies can make machine learning models explainable and suggested further research directions; they showed that linking machine learning and semantic web technologies offers interesting opportunities for model explainability. A review of current research using ontologies to personalize recommender systems in e-learning was also conducted [11]; its focus was to understand the different research works on e-learning recommender systems that do not use ontologies and those on hybrid recommendation. Personalizing the educational process according to learners' preferences is the subject of [12], whose authors proposed an ontology-based ITS that represents the learner model in e-learning. The proposed ontology schema consists of the learner's academic and personal information, and the learner features included in it were derived from an experimental study on a sample of learners. Along the same lines, to counter the knowledge acquisition bottleneck, which prevents the widespread use of knowledge-based technologies, a meta-knowledge engineering architecture has been developed [13]. This framework distributes the knowledge engineering load over the various stages of reusable knowledge engineering methods and over the various knowledge engineers who can reuse them. Another work [14] used an ontology-based knowledge representation structure to develop an ITS capable of teaching object-oriented paradigm courses; its key feature is the reusability of the developed ontologies.
Against this background, the improvement of the learning cycle is the target of the work in [15], one of the few works in this context in which the authors went further by proposing the design and construction of two ontologies: the first represents the knowledge covered in the course, and the second is a generic ontology for modelling contributory features. A different approach, based on web data mining techniques, was developed in [16]; its main characteristic is that it promotes individualized services and improves the teaching quality of modern distance education.
Given the related works studied in this section, there is still no consensus on the content of ontologies, the methods used to build them, or the models and languages used to represent them, so many problems remain unsolved. Furthermore, despite the large body of research on intelligent educational systems, none has been found that exploits the multitude of OER sources to improve the quality of the education process. To exploit the various data sources available on the web, the research described in this paper adopts the description logic approach for modelling intelligent tutoring systems. The following section explains how the DL approach works in the context of knowledge representation. The authors' contributions in this work are summarized as follows: 1. a review of the literature on ITSs accessing a variety of data sources; 2. a description of knowledge in DL-Lite via the three-level architecture of Ontology-Based Data Access (OBDA); 3. a new approach to tutoring systems built on DL-Lite knowledge bases, which has not been addressed in the previous scientific literature.

DL-Lite approach and knowledge representation
DLs were introduced by Brachman [17]. They form a family of formal languages intended to represent the knowledge of a given domain and to reason over it by deriving new knowledge. The purpose of these logics in AI is to build languages that represent domain information in a principled way and make this representation accessible to reasoning [18].
As their name suggests, DLs describe the terms used to define a domain, and they are equipped with formal, logic-based semantics that can be given by a translation into predicate logic, i.e., first-order logic.

Knowledge representation in DLs
Description logics use only unary and binary predicates, called concepts and roles, respectively. Concepts represent sets of individuals, and roles represent binary relationships between individuals. Concepts and roles can be either defined or primitive: a concept (or role) is defined if it is built from constructors, and primitive if it is atomic and serves as a basis for constructing defined concepts (or roles). A DL knowledge base is made up of two levels:
1. A terminological level (TBox), dedicated to the description of concepts and roles.
2. An assertional level (ABox), which describes facts about individuals.
A DL knowledge base is denoted by $K = \langle T, A \rangle$, where $T$ is the terminological level (TBox), which contains generic domain knowledge, and $A$ is the assertional level (ABox), which contains assertions about individuals.
Several lightweight fragments of DL have been developed, such as the DL-Lite family [19], which the authors use in this work because it is well suited to OBDA settings.
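To make the split between $T$ and $A$ concrete, the following is a minimal Python sketch, not the authors' implementation, of a knowledge base restricted to atomic concept inclusions. It derives new concept assertions by forward chaining, which illustrates how reasoning derives new knowledge from the two levels. The class name `KnowledgeBase` and the vocabulary (`VideoLesson`, `EducationalMaterial`, `Resource`) are hypothetical, and OBDA systems typically avoid this kind of materialization by rewriting queries instead.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Toy DL knowledge base K = <T, A> restricted to atomic concept inclusions."""

    # TBox: pairs (sub, sup) read as the axiom sub ⊑ sup.
    tbox: set[tuple[str, str]] = field(default_factory=set)
    # ABox: concept assertions (Concept, individual), e.g. ("VideoLesson", "v1").
    concept_assertions: set[tuple[str, str]] = field(default_factory=set)
    # ABox: role assertions (Role, individual1, individual2); unused by derive().
    role_assertions: set[tuple[str, str, str]] = field(default_factory=set)

    def derive(self) -> set[tuple[str, str]]:
        """Forward-chain the inclusions until no new concept assertion can be added."""
        known = set(self.concept_assertions)
        changed = True
        while changed:
            changed = False
            for sub, sup in self.tbox:
                # Iterate over a snapshot so that adding facts is safe.
                for concept, individual in list(known):
                    if concept == sub and (sup, individual) not in known:
                        known.add((sup, individual))
                        changed = True
        return known


# Hypothetical vocabulary: VideoLesson ⊑ EducationalMaterial ⊑ Resource.
kb = KnowledgeBase(
    tbox={("VideoLesson", "EducationalMaterial"), ("EducationalMaterial", "Resource")},
    concept_assertions={("VideoLesson", "v1")},
)
print(sorted(kb.derive()))
# [('EducationalMaterial', 'v1'), ('Resource', 'v1'), ('VideoLesson', 'v1')]
```

The single ABox fact `VideoLesson(v1)` yields two derived facts, one per inclusion in the chain.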

The family of DL-Lite
The W3C proposed the OWL 2 profiles to provide significant benefits in particular application scenarios. In recent years, DL-Lite (a family of lightweight description logics) has been specially designed for applications that handle huge volumes of data, such as Semantic Web applications, where query answering is the most important reasoning task. DL-Lite guarantees efficient computational complexity of reasoning by relying on relational database techniques [20].
DL-Lite_core is the basic kernel of all DL-Lite logics, while DL-Lite_F and DL-Lite_R belong to the family underlying the OWL 2 QL profile. In this paper, the term "DL-Lite family" refers to these three fragments. A DL-Lite knowledge representation is built from sets of atomic concepts $N_C$, atomic roles $N_R$, and facts (individuals) $N_I$.
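To describe complex concepts and roles, three constructors are used: unqualified existential quantification $\exists$, negation $\neg$, and the inverse ${}^{-}$. This article follows the description of DL-Lite used in [20], where $A \in N_C$ denotes an atomic concept and $P \in N_R$ an atomic role:

$$
\begin{aligned}
\mathit{Role} &::= P \mid P^{-} \\
\mathit{Concept} &::= A \mid \exists \mathit{Role} \\
\mathit{CConcept} &::= \mathit{Concept} \mid \neg \mathit{Concept} \\
\mathit{CRole} &::= \mathit{Role} \mid \neg \mathit{Role}
\end{aligned}
$$

Here, $P^{-}$ is the inverse of the atomic role $P$, $\mathit{CConcept}$ is a complex concept, and $\mathit{CRole}$ is a complex role.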
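Formally, a knowledge base represented in DL-Lite is a pair $KB = \langle \mathit{TBox}, \mathit{ABox} \rangle$. The TBox is a set of inclusion axioms of the form $\mathit{Concept}_1 \sqsubseteq \mathit{Concept}_2$ or $\mathit{Concept}_1 \sqsubseteq \neg \mathit{Concept}_2$. The ABox is a set of facts of the form $\mathit{Concept}(f)$ and $\mathit{Role}(f_1, f_2)$, where $f, f_1, f_2 \in N_I$.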
DL-Lite_core is extended with functionality assertions on roles (or inverse roles) of the form $(\mathit{funct}\ \mathit{Role})$; this extension is called DL-Lite_F. Similarly, DL-Lite_core is extended with role inclusion axioms of the form $\mathit{Role}_1 \sqsubseteq \mathit{Role}_2$; this extension is called DL-Lite_R.
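The two extensions can be illustrated on an ABox with a minimal Python sketch, assuming role assertions are stored as `(Role, subject, object)` tuples. `saturate_roles` applies role inclusions $\mathit{Role}_1 \sqsubseteq \mathit{Role}_2$ (DL-Lite_R), and `functionality_violations` reports subjects that break a $(\mathit{funct}\ \mathit{Role})$ assertion (DL-Lite_F). Both function names and the roles `teaches`, `involvedIn`, and `hasLevel` are hypothetical, and the sketch treats the two extensions separately, as the text does.

```python
from collections import defaultdict

RoleAssertion = tuple[str, str, str]  # (Role, subject, object)


def saturate_roles(assertions: set[RoleAssertion],
                   role_inclusions: set[tuple[str, str]]) -> set[RoleAssertion]:
    """DL-Lite_R: apply role inclusions Role1 ⊑ Role2 to the ABox until a fixpoint."""
    facts = set(assertions)
    changed = True
    while changed:
        changed = False
        for sub, sup in role_inclusions:
            for role, a, b in list(facts):
                if role == sub and (sup, a, b) not in facts:
                    facts.add((sup, a, b))
                    changed = True
    return facts


def functionality_violations(assertions: set[RoleAssertion],
                             functional_roles: set[str]) -> set[tuple[str, str]]:
    """DL-Lite_F: report (Role, subject) pairs with two or more distinct fillers.

    DL-Lite adopts the unique name assumption, so two distinct individual
    names denote two distinct objects and therefore violate (funct Role).
    """
    fillers: dict[tuple[str, str], set[str]] = defaultdict(set)
    for role, a, b in assertions:
        if role in functional_roles:
            fillers[(role, a)].add(b)
    return {key for key, objects in fillers.items() if len(objects) > 1}


# Hypothetical axioms: teaches ⊑ involvedIn and (funct hasLevel).
print(sorted(saturate_roles({("teaches", "t1", "c1")}, {("teaches", "involvedIn")})))
# [('involvedIn', 't1', 'c1'), ('teaches', 't1', 'c1')]
print(functionality_violations({("hasLevel", "learner1", "beginner"),
                                ("hasLevel", "learner1", "advanced")}, {"hasLevel"}))
# {('hasLevel', 'learner1')}
```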
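The semantics of a DL-Lite knowledge base is expressed in terms of interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$, where $\Delta^{\mathcal{I}}$ is a non-empty domain.
The interpretation function $\cdot^{\mathcal{I}}$ maps each fact (individual) $f \in N_I$ to an element $f^{\mathcal{I}} \in \Delta^{\mathcal{I}}$, each atomic concept $A$ to a subset $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$, and each atomic role $P$ to a binary relation $P^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$. It is extended to complex concepts and roles as follows:

$$
\begin{aligned}
(P^{-})^{\mathcal{I}} &= \{(o_2, o_1) \mid (o_1, o_2) \in P^{\mathcal{I}}\} \\
(\exists \mathit{Role})^{\mathcal{I}} &= \{o \mid \exists o'.\ (o, o') \in \mathit{Role}^{\mathcal{I}}\} \\
(\neg \mathit{Concept})^{\mathcal{I}} &= \Delta^{\mathcal{I}} \setminus \mathit{Concept}^{\mathcal{I}} \\
(\neg \mathit{Role})^{\mathcal{I}} &= (\Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}) \setminus \mathit{Role}^{\mathcal{I}}
\end{aligned}
$$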
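The semantic definitions can be checked mechanically on a small finite interpretation. The following Python sketch is an illustration under stated assumptions, not part of the proposed system: extensions are plain Python sets, each function mirrors one line of the definitions, and `satisfies` uses the standard condition that $\mathcal{I}$ satisfies an inclusion $X \sqsubseteq Y$ exactly when $X^{\mathcal{I}} \subseteq Y^{\mathcal{I}}$. The domain elements and the names `Learner`, `Course`, and `enrolledIn` are hypothetical.

```python
from itertools import product

Pair = tuple[str, str]


def inverse(role_ext: set[Pair]) -> set[Pair]:
    """(P⁻)^I = {(o2, o1) | (o1, o2) ∈ P^I}."""
    return {(o2, o1) for (o1, o2) in role_ext}


def exists(role_ext: set[Pair]) -> set[str]:
    """(∃Role)^I = {o | there is some o' with (o, o') ∈ Role^I}."""
    return {o for (o, _) in role_ext}


def not_concept(domain: set[str], concept_ext: set[str]) -> set[str]:
    """(¬Concept)^I = Δ^I minus Concept^I."""
    return domain - concept_ext


def not_role(domain: set[str], role_ext: set[Pair]) -> set[Pair]:
    """(¬Role)^I = (Δ^I × Δ^I) minus Role^I."""
    return set(product(domain, repeat=2)) - role_ext


def satisfies(lhs_ext: set, rhs_ext: set) -> bool:
    """I satisfies X ⊑ Y exactly when X^I ⊆ Y^I."""
    return lhs_ext <= rhs_ext


# Hypothetical interpretation over a three-element domain.
domain = {"alice", "c1", "c2"}
learner = {"alice"}
course = {"c1", "c2"}
enrolled_in = {("alice", "c1")}

print(satisfies(exists(enrolled_in), learner))             # ∃enrolledIn ⊑ Learner: True
print(satisfies(exists(inverse(enrolled_in)), course))     # ∃enrolledIn⁻ ⊑ Course: True
print(satisfies(course, not_concept(domain, learner)))     # Course ⊑ ¬Learner: True
```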

Three-level architecture for data access
The main objective of the modelling effort in ontology-based data access, which began in the 2000s, is the creation of intelligent structures for processing data from database sources. To expose data from a relational database as RDF (Resource Description Framework) graphs, the central idea is to provide declarative mapping specifications that relate the data to the axioms of a domain ontology. In a materialized setting, these mappings are used to generate RDF triples. In the virtual setting, the triples are not materialized but remain virtual, and queries are rewritten and then executed over the sources in a second stage. Because the ontology language excludes features such as recursion and property chains, the typical technique used is query rewriting [20]. The main idea of OBDA (Ontology-Based Data Access) is to give users access to knowledge from multiple databases via a three-tier architecture consisting of the ontology, the sources, and the mapping, where the ontology is a formal conceptualization of the domain of interest. The system thus provides a semantic end-to-end link between users and data sources, allowing users to directly query data spread over various distributed sources using the ontology's familiar vocabulary. User queries over the ontology are expressed in SPARQL (SPARQL Protocol and RDF Query Language) and translated, via the mapping level, into SQL (Structured Query Language) queries over the underlying relational datasets. In an OBDA system, query answering amounts to transforming the query Qu into a query Qu1 expressed in terms of the data, such that for any possible data the answers to Qu are exactly the answers to Qu1.
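The rewriting of Qu into Qu1 can be sketched in Python for the simplest case, an atomic query $\mathit{Concept}(x)$. This is a minimal sketch under strong assumptions, not the algorithm used by the authors: the TBox contains only positive concept inclusions between basic concepts ($A$, $\exists P$, $\exists P^{-}$), and role inclusions and general conjunctive queries, which complete algorithms such as PerfectRef handle, are left out. The tuple encoding and the vocabulary (`EducationalMaterial`, `LectureNotes`, `Slides`, `Course`, `hasMaterial`) are hypothetical.

```python
BasicConcept = tuple[str, str]  # ("atomic", A), ("exists", P) for ∃P, ("exists_inv", P) for ∃P⁻
Inclusion = tuple[BasicConcept, BasicConcept]  # (lhs, rhs) read as lhs ⊑ rhs


def rewrite_atomic_query(target: BasicConcept, inclusions: set[Inclusion]) -> set[BasicConcept]:
    """Collect every basic concept B such that B ⊑* target follows from the inclusions.

    For a consistent KB, evaluating the union of these atoms over the ABox alone
    returns the same answers as evaluating target(x) over the TBox plus the ABox.
    """
    result = {target}
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for lhs, rhs in inclusions:
            if rhs == current and lhs not in result:
                result.add(lhs)
                frontier.append(lhs)
    return result


def as_atoms(concepts: set[BasicConcept]) -> list[str]:
    """Render each basic concept as a query atom over the data (x is the answer variable)."""
    rendered = []
    for kind, name in concepts:
        if kind == "atomic":
            rendered.append(f"{name}(x)")
        elif kind == "exists":
            rendered.append(f"{name}(x, _)")
        else:  # "exists_inv": x is the second argument of the role
            rendered.append(f"{name}(_, x)")
    return sorted(rendered)


INCLUSIONS: set[Inclusion] = {
    (("atomic", "Slides"), ("atomic", "LectureNotes")),
    (("atomic", "LectureNotes"), ("atomic", "EducationalMaterial")),
    (("exists_inv", "hasMaterial"), ("atomic", "EducationalMaterial")),
    (("atomic", "Course"), ("exists", "hasMaterial")),
}
print(as_atoms(rewrite_atomic_query(("atomic", "EducationalMaterial"), INCLUSIONS)))
# ['EducationalMaterial(x)', 'LectureNotes(x)', 'Slides(x)', 'hasMaterial(_, x)']
```

The result is a union of atoms, so the TBox is compiled into the query and the data can stay in the relational sources; `Course ⊑ ∃hasMaterial` is correctly ignored because it concerns the first, not the second, argument of `hasMaterial`.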

ITS model accessing a variety of data sources
The proposed model refines the classic ITS model, which consists of four main components: the domain model, the pedagogical model, the learner model, and the communication module. The proposed model adds a prospecting module, as shown in Figure 2. This module provides automatic access to various data sources on the web, whose content the pedagogical model then presents to the learner as suggestions for additional information related to the lesson topic. Following the principles of this environment, users formulate their questions in terms of the domain rather than in terms of the structures of the data sources. The approach also does not require all data sources to be integrated at once: additional data sources or components can be phased in after even a rough skeleton of the domain model has been built, as they become available or as needed, thus amortizing the cost of integration. In addition, the mappings of the data sources provide a common basis for documenting all the data within the organization, with clear advantages for the management of the information system.
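The incremental integration described above can be pictured with a small Python sketch. It is purely illustrative and assumes, hypothetically, that each data source is exposed as a search function from a topic to resource titles; the class names `ProspectingModule` and `PedagogicalModel` echo the components of the model, but their methods are invented here. Sources are registered one at a time, and the pedagogical model attaches whatever the prospector currently finds to the lesson.

```python
from dataclasses import dataclass, field
from typing import Callable

SearchFn = Callable[[str], list[str]]  # hypothetical: topic -> resource titles


@dataclass
class ProspectingModule:
    """Hypothetical prospecting module whose data sources are phased in one by one."""

    sources: dict[str, SearchFn] = field(default_factory=dict)

    def register(self, name: str, search: SearchFn) -> None:
        self.sources[name] = search

    def prospect(self, topic: str) -> list[tuple[str, str]]:
        # Ask every registered source and tag each suggestion with its origin.
        return [(name, title) for name, search in self.sources.items() for title in search(topic)]


@dataclass
class PedagogicalModel:
    prospector: ProspectingModule

    def present(self, topic: str, lesson: str) -> dict:
        # The lesson comes from the domain model; suggestions come from the prospector.
        return {"lesson": lesson, "suggestions": self.prospector.prospect(topic)}


prospector = ProspectingModule()
tutor = PedagogicalModel(prospector)
prospector.register("local_repository", lambda topic: ["SQL joins notes"] if topic == "sql" else [])
print(tutor.present("sql", "Lesson 3: joins"))
# {'lesson': 'Lesson 3: joins', 'suggestions': [('local_repository', 'SQL joins notes')]}
```

Registering a further source later changes the suggestions without touching the pedagogical model, which is the "phased in" property the text describes.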

Domain Model
The domain model has two components:
Curriculum Engine. As its name indicates, this engine provides the pedagogical model with the subject matter to be taught to the learner, according to a teaching plan and method adapted to the learner's profile.
Regular Courses and Exams. This component includes all the educational content developed within the tutoring system, such as courses, exams, and pedagogical activities. These contents are designed to be adaptable to the learner's profile.

Pedagogical Model
This model is considered the most important element of any e-learning system, as it takes charge of the teaching task. It includes a pedagogical mediator and a pedagogical knowledge base.
Pedagogical mediator. This component is the key actor that coordinates the activity of all the components of the system to make e-learning a success.
Knowledge base. It contains a range of pedagogical methods and teaching aids that assist the tutor in managing the educational process:
1. Teaching methods: the diversity of teaching methods provided by the tutoring system increases the students' capacity to assimilate the presented courses.
2. Course planning: divides the course into a set of independent, observable, and measurable teaching units.
3. Teaching aids: all the tools and media provided by the system to facilitate the presentation of the subject matter, the most common being images, videos, charts, flashcards, and objects.

Learner model
The learner model includes two constituents: a session manager and learner profiles. The session manager monitors and controls every working session and protects the learner's confidential information by preventing and detecting malicious activity. The learner profiles form a database containing information about each learner, which is used for planning and supporting instruction.

Prospecting module
As described above, thanks to the three-level OBDA architecture, this module allows learners to access information from various data sources. In particular, the prospecting module enables learners to explicitly query data scattered over multiple distributed sources using SPARQL queries over the ontology.
Note that the module is structured around domain-specific patterns: the logical schema it uses acts as a domain-specific framework added on top of the SQL databases.
Ontology Layer. In this architecture, the ontology layer follows a declarative approach to information integration and, more broadly, to information governance. The ontology gives a systematic, high-level description of both the static and dynamic aspects of the organization's domain knowledge. By making the domain representation explicit, it makes the acquired knowledge reusable, which is not achieved when the global schema is merely a unified description of the underlying data sources.
Mapping Layer. The mapping layer links the ontology layer with the source layer. Besides making the information system operational, the mappings are a valuable metadata tool when data is spread over numerous pieces that are often difficult to access and seldom conform to uniform standards.
Source Layer. The data source layer consists of the existing data sources of the organization.

Communication module
The communication module manages the interactions between the learner's devices and the system. It makes the system flexible to use for achieving the desired objectives of the educational process.

Implementation
Software prototyping encompasses all the processes involved in the proper functioning of an element within its environment. It answers the question of how to move from the design phase to operational software that conforms to the requirements defined in the technical specifications. To reach this aim, it is necessary to identify the appropriate programs and tools and then to turn the model into a usable product with the chosen tools. The proposed tutoring system has been developed as a mobile application so that its users can study anywhere and anytime. To this end, the authors used the Java and XML languages in the Eclipse IDE together with the Android SDK and the ADT plug-in. They also used the AVD (Android Virtual Device) emulator to run and test the application on Windows. SQLite was adopted for database creation and manipulation, and the ontologies representing the domain knowledge were edited with the Protégé editor.
Using the chosen tools, a prototype called "Prospec-T" was developed and used by a group of higher education learners during the last academic year. Because of the ongoing global health crisis, the mobile application was the most appropriate alternative to classroom study, and it yielded very satisfactory results in terms of learners' academic success.
In this paper, the authors use lightweight ontology languages based on description logics to represent the Tacit Mining Course (TMC). The main advantage of this representation is that query answering can be performed efficiently over the different course sources, including when they contain conflicting information. Typically, a course is composed of one or more educational materials representing knowledge, which developers of e-learning platforms can model as an ontology using OWL. This logical course description makes it possible to handle relationships and inference rules across the different sources of educational material.
As seen above, OBDA is a new paradigm that uses description logics to organize access to various types of information; it relies on additional declarative specifications, namely a set of mappings that specify how ontology terms are translated into queries over a relational database [21]. The authors illustrate the usefulness of the three-level OBDA architecture for developing an open course system with deep interoperability among different databases of educational material and with attention to additional dimensions of data quality. Figure 3 presents the three-level architecture of the ontology-based web course (the prospecting module). The connections between the lower levels of the TMC ontology and the databases are realized by OBDA specifications over the different data sources: web site courses, local courses, historical courses, and cloud courses. The mapping level also includes dedicated service data models to manage courses imported or received from external sources.
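The mapping level can be sketched with Python's standard `sqlite3` module. In this minimal sketch, which assumes invented table schemas and sample rows rather than the authors' actual databases, each of the four course sources named above is one table, and each mapping is an SQL query that populates a hypothetical ontology concept `CourseMaterial` with `(title, topic)` rows. Unfolding replaces the concept by the UNION of its mapping queries, so an ontology-level request becomes a single SQL query over all sources. A full OBDA system would first rewrite the query with the TBox and then unfold each resulting atom.

```python
import sqlite3

# Source layer (hypothetical schemas): one table per course source named in the text.
SOURCE_TABLES = ("web_site_course", "local_course", "historical_course", "cloud_course")

# Mapping layer (hypothetical): each SQL query populates the ontology concept
# CourseMaterial with (title, topic) rows taken from one source.
MAPPINGS = {"CourseMaterial": [f"SELECT title, topic FROM {t}" for t in SOURCE_TABLES]}


def unfold(concept: str) -> str:
    """Unfold an ontology concept into one SQL query: the UNION of its mappings."""
    return " UNION ".join(MAPPINGS[concept])


def materials_about(conn: sqlite3.Connection, topic: str) -> list[str]:
    """Answer the ontology-level request 'CourseMaterial about topic' over all sources."""
    sql = f"SELECT title FROM ({unfold('CourseMaterial')}) WHERE topic = ? ORDER BY title"
    return [row[0] for row in conn.execute(sql, (topic,))]


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with invented sample rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    for table in SOURCE_TABLES:
        conn.execute(f"CREATE TABLE {table} (title TEXT, topic TEXT)")
    conn.execute("INSERT INTO web_site_course VALUES ('Relational algebra video', 'databases')")
    conn.executemany(
        "INSERT INTO local_course VALUES (?, ?)",
        [("SQL joins notes", "databases"), ("Sorting notes", "algorithms")],
    )
    conn.execute("INSERT INTO cloud_course VALUES ('SQL joins notes', 'databases')")
    return conn


print(materials_about(demo_connection(), "databases"))
# ['Relational algebra video', 'SQL joins notes']
```

Because UNION removes duplicates, the material stored both locally and in the cloud is returned once; the learner never sees which tables were consulted.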

Experiment
To determine whether the proposed system improves learning conditions, the authors conducted an experimental study on a sample of higher education learners. They first established their research hypotheses, then specified the characteristics of the study in terms of the targeted sample and the associated evaluation criteria. Finally, they discussed the obtained results to clarify the benefits of the proposed system.

Hypotheses
The present experiment is based on two main research hypotheses concerning the quality of the educational process provided by the proposed system, reinforced by a third one to ensure the credibility of the results. The main hypotheses concern the improvement of learning conditions in terms of increased learning gains and reduced study time, while the additional hypothesis aims at ensuring the homogeneity of the studied groups.
1. Hypothesis of increased learning gains: the proposed educational system is assumed to improve learning gains substantially, because it provides its users with additional information, relevant to the subject matter being taught, from a variety of web-based data sources. Learning gains measure the learners' assimilation of the teaching material before and after adopting the proposed system, which determines the system's impact on the quality of the educational process. To compute the percentage of learning gains, a pre-test and a post-test are administered in each group of the course. This ratio shows whether learning gains increase or decrease, as well as the general level of improvement throughout the learning process.
2. Hypothesis of reduced learning time: the second hypothesis used to measure the performance of the proposed system is a decrease in learning time, i.e., the time that distance learners need to achieve a pedagogical objective. The average learning time of both participating groups must therefore be calculated to infer the reduction rate.
3. Homogeneity hypothesis: the two groups studied (experimental and control) are assumed to be homogeneous, i.e., composed of a comparable mix of learner types with respect to being a catalyst for change, computer literacy, intelligence, and hard work. Validating this hypothesis requires balanced pre-test results for both groups.
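The two quantities used by the main hypotheses can be computed as in the following Python sketch. The text does not give the exact formulas, so the sketch assumes a relative gain, (post - pre) / pre × 100, for each learner, and a relative change of the experimental group's (EG) mean learning time with respect to the control group's (CG) mean; the function names are hypothetical.

```python
from statistics import mean


def learning_gain_percent(pre_score: float, post_score: float) -> float:
    """Relative gain between pre-test and post-test, in percent (assumed formula)."""
    if pre_score == 0:
        raise ValueError("a relative gain needs a non-zero pre-test score")
    return (post_score - pre_score) / pre_score * 100


def mean_gain_percent(pre_scores: list[float], post_scores: list[float]) -> float:
    """Average of the per-learner relative gains of one group."""
    if len(pre_scores) != len(post_scores):
        raise ValueError("each learner needs both a pre-test and a post-test score")
    return mean(learning_gain_percent(p, q) for p, q in zip(pre_scores, post_scores))


def time_change_percent(control_hours: list[float], experimental_hours: list[float]) -> float:
    """Change of the EG mean learning time relative to the CG mean (negative = reduction)."""
    cg, eg = mean(control_hours), mean(experimental_hours)
    return (eg - cg) / cg * 100


print(round(time_change_percent([30.65], [20.56]), 2))  # -32.92
```

With the rounded group means reported in the Discussion (30.65 hours for the CG and 20.56 hours for the EG), this gives about -32.92%, close to the reported -32.93%; the small difference presumably comes from rounding of the means.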

Method
The authors experimented with a sample of 82 learners in a higher education institution, divided into two groups: the experimental group (EG) of 48 learners and the control group (CG) of 34 learners. They ignored the learners' gender, considering it a non-influential factor in the e-learning process, and focused instead on the psychological and cognitive aspects of the study sample. Table 1 shows the parameters of this study. The authors developed two versions of the ITS: a full version that includes an automated prospector of web-based open resources, and the same tutoring system without the prospecting service. The participating learners were assigned to the two groups according to the version each of them chose: the EG consists of learners who chose the full version with the prospecting service, while the CG consists of learners who chose the version without it.

Results
In this study, learning time is defined as the amount of time the learner spends using the proposed system to practice classroom activities, excluding exam time. The learning gain measures how much the educational system affects the learner's academic progress; it is calculated from the ratio of the scores obtained at the beginning and at the end of the course.
Encouraging results were obtained after a trial period of the proposed educational system, both in reducing learning time and in improving learning gains. To analyse the collected statistical data, the authors used one-way analysis of variance (ANOVA), computing the F-test at a significance level α = 0.05 for the pre-test, post-test, learning time, and learning gain data. Figure 4 shows that the EG members, who benefited from the web-based OER prospecting service, practiced their classroom activities better than the CG members.
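For readers who want to reproduce this kind of test, the following is a minimal pure-Python sketch of the one-way ANOVA F statistic; it is not the authors' analysis script, and the function name is hypothetical. The F value is the ratio of the between-group mean square to the within-group mean square, with k - 1 and n - k degrees of freedom for k groups and n observations.

```python
from statistics import mean


def one_way_anova_f(*groups: list[float]) -> tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    values = [x for g in groups for x in g]
    k, n = len(groups), len(values)
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of each value around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(one_way_anova_f([1, 2, 3], [4, 5, 6]))  # (13.5, 1, 4)
```

The p-value is the upper tail of the F distribution with (df_between, df_within) degrees of freedom, for example via SciPy's `scipy.stats.f.sf(f_stat, df_between, df_within)`, or `scipy.stats.f_oneway` for both steps at once; a difference is significant at α = 0.05 when that p-value is below 0.05.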

Discussion
This survey yielded very positive results regarding the study conditions of the learners who used the web-based open resource prospector. These results can be summarized as follows:
• The pre-test results showed a high degree of convergence between the two groups, which confirms the homogeneity hypothesis (F = 1.001, p = 0.491).
• In the post-test, the EG progressed significantly more than the CG (F = 0.546, p = 0.035), owing to the open resource prospector from which the EG members benefited.
• The substantial improvement of the EG in the post-test resulted in learning gains of 123.45% compared with the CG.
• This learning gain is due to the web search facilities provided by the proposed open resource prospector from which the EG benefited.
• Learning time was reduced by about one-third (-32.93%): the CG recorded an average learning time of 30.65 hours, while the EG recorded an average of 20.56 hours.
• The decrease in learning time results from the fact that the EG members were spared the trouble of searching for information on the web, since the proposed educational tool provides the best possible conditions to reach the targeted pedagogical objective.
• Most of the learners who participated in the survey appreciated the proposed educational system, especially those who used the full version containing the prospector.

Conclusion and future work
Learning processes often rely on additional information sources to understand a given educational content, so the learner needs certain skills to search the web effectively. Meanwhile, applying the openness principle in education has led to the democratization of knowledge and has eased the difficulties of accessing it. To this end, the authors developed an ITS for mobile learning that can automatically explore various OER sources available on the web. It adopts the DL approach to knowledge representation because of its robustness and its completeness in the modularization of distributed systems.
However, some shortcomings in the current implementation of the proposed system need to be addressed as soon as possible. The authors received a few comments from learners concerning the enhancement of application functionalities, such as interface features and the incorporation of advanced interaction mechanisms with the artificial tutor. Another perspective of this work is to enrich the prospector with an adaptation scheme based on the learner profile; this improvement would return search results that match the learner's profile and offer better study conditions. These shortcomings should not diminish the value of the work, which was met with great approval by learners because it gives them a favorable environment for mobile learning anywhere and anytime. The last direction the authors will explore is how to deal with inconsistent information coming from sources with different levels of reliability.