Towards a New Scalable Big Data System Semantic Web Applied on Mobile Learning

In Web 3.0, semantic data gives machines the ability to understand and process data. The Resource Description Framework (RDF) is the lingua franca of the Semantic Web. While Big Data systems address the problem of storing and processing massive data, they still do not provide support for RDF data. In this paper, we present a new Big Data semantic web system composed of a classical Big Data system with a semantic layer. As a proof of concept, we use mobile learning as a case study. The architecture we propose is composed of two main parts: a knowledge server and an adaptation model. The knowledge server allows trainers and business experts to represent their expertise using business rules and an ontology, so that heterogeneous knowledge can be handled. In a mobility environment, the knowledge server also makes it possible to take into account the constraints of the environment and of the user thanks to the RDF exchange format. The adaptation model, based on RDF graphs, relies on combinatorial optimization algorithms whose objective is to propose to the learner a relevant combination of Learning Objects (LOs) based on the learner's contextual constraints. Our solution provides scalability and high data availability through replication. The results of the system evaluation experiments on a large number of servers show the efficiency, scalability, and robustness of our system when the amount of data processed is very large.
Keywords: Semantic Web, MongoDB, Big Data, RDF, Mobile Learning.


Introduction
With the rapid emergence of new mobile technologies and the growing supply of, and demand for, training in a mobile society, a growing body of work seeks to identify new learning platforms that improve and facilitate the learning process, in particular distance learning [1]. The next step in distance learning is the porting of e-Learning to mobile systems, which is called M-Learning (mobile learning) [2,3]. Information retrieval in the field of m-learning can be defined as an activity whose purpose is to locate and deliver learning content to a learner according to the learner's information need and context. Until now, the learning environment was either defined by a pedagogical framework or imposed by the learning content.
Recent years have been marked by the rise of mobile learning, or m-learning, driven by the continued development of new mobile technologies. Learning becomes situated, contextual, and personal. This phenomenon encourages learning methods to evolve and adapt to this new type of learning, and new uses in the field of learning have multiplied. In the context of learning within companies, we seek to develop an M-Learning system whose main objectives are:
• Learning at work regardless of the time, place, delivery device, and technological constraints of the learning process, adapted to the learner's profile.
• Learning without interruption.
In this paper, we propose a scalable and powerful Big Data recommendation system based on Semantic Web technologies. The system is composed of two main layers: the semantic knowledge layer uses an M-Learning domain ontology, and the storage layer uses MongoDB, a document-oriented NoSQL database, for data management. The system is a use case of our RDFMongo [4] solution, a complete system for managing massive Semantic Web data.
This paper is organized as follows. The Introduction presents the notion and emergence of mobile learning technologies and the standard representation of semantic data (RDF). The Related Work section gives a brief state of the art on the use of Semantic Web technologies in the area of M-Learning. The System Architecture section is devoted to the main contribution of our work, with the syntax and architecture of our solution. The Evaluation section assesses the performance of the implemented solution on real datasets and standard benchmarks. Finally, we conclude and give perspectives for future work.

Related Work
The goal of mobile learning research is to build a learning environment in which activities adapt to the learner's mobility situation using new digital technologies. The representation of an abstract metadata model such as Learning Object Metadata (LOM) in a specific format is called a binding. Today there are two bindings of the LOM schema: the XML binding and the RDF binding. The XML binding is easy to implement, but it is insufficient for representing all the elements of LOM, since it cannot express the semantics of these elements. The RDF binding defines a set of RDF constructs that facilitate the introduction of LOM metadata into the web, and it is complemented by RDFS for defining classes, properties, and so on. The advantage of this second binding is that it adds semantics to the elements of LOM; however, it is not expressive enough to define all the constraints of LOM. Consider the "Title" and "Entry" elements of the "General" category, which are mandatory in LOM. With RDF and RDFS, one cannot specify that a property is mandatory or restrict it to a single occurrence for a resource. As a second example, RDF and RDFS cannot express the inverse of a relation: stating that an LO x "has part" an LO y does not allow one to infer that y "is part of" x. This lack of expressiveness leads us to consider a more powerful formalism. To determine which language is the most appropriate for solving the expressiveness problem, we focused on identifying the necessary description logic (DL). Description logics are a family of formalisms for representing knowledge in a structured and formal way. A fundamental characteristic of these languages is that they have a formal semantics. We start from the minimal logic ALC and add to it the constructors necessary to define all the constraints of LOM.
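The inverse-relation limitation discussed above can be illustrated with a minimal Python sketch. It applies the missing rule by hand over a set of triples. The property names hasPart and isPartOf and the identifiers LOx and LOy are illustrative assumptions, and the rule is hard-coded here, whereas a more expressive ontology language would allow it to be declared.

```python
# Illustrative only: property names and identifiers are assumptions.
INVERSE_OF = {"hasPart": "isPartOf", "isPartOf": "hasPart"}


def add_inverse_triples(triples):
    """Return the triples plus the inverse statement of every triple
    whose predicate has a declared inverse."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        inverse = INVERSE_OF.get(predicate)
        if inverse is not None:
            inferred.add((obj, inverse, subject))
    return inferred


graph = {("LOx", "hasPart", "LOy")}
closed_graph = add_inverse_triples(graph)
```

With plain RDF and RDFS, the second statement would have to be asserted explicitly for every pair of learning objects.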
Several research efforts use Big Data technologies for the management of massive RDF data. The work in [5] presents a comparative study of RDF data management systems based on NoSQL databases and the Hadoop Distributed File System (HDFS) [6]. To manage large volumes of RDF data, we proposed RDFMongo [4], a scalable semantic data management system based on MongoDB [7]. This system consists of two layers. The first layer manages the storage of RDF data in MongoDB, using a technique that transforms RDF triples into JSON documents. The second layer manages SPARQL query processing: each SPARQL [8] query entered by the user is transformed into a MongoDB Query Language program. SPARQL queries can be complex because RDF data has a graph structure; for this reason, we developed a system [9] for evaluating complex SPARQL queries using Apache Spark [10]. In [11], the authors transform SPARQL queries into Hive [12] programs.
This work is a use case of a large-scale RDF data management system based on Big Data [13,14] systems such as Hadoop and NoSQL databases. In [15], the authors present a case study of the use of the Semantic Web for automatic price management. Vesin et al. [16] describe the use of Semantic Web technologies to facilitate the use of mobile devices in e-Learning systems. For information retrieval in Mobile Learning, existing work retrieves a lot of irrelevant information; by using the Semantic Web, and ontologies [17] in particular, we try to maximize the relevance of the search results. In [18], the authors aim to reduce the rate of retrieval of irrelevant information through the use of an ontology; executing a mapping program based on this ontology reduces both the processing time and the complexity of the calculations. El-Seoud et al. [19] present the impact of the Semantic Web architecture on the development of e-Learning systems; they derive the role of Semantic Web technologies in the development of learning systems through the semantic data processing mechanisms of e-Learning. Bakhouyi et al. [20] present a new Semantic Web-based solution to improve the interoperability of e-Learning systems using the next generation of SCORM specifications [21]. The principle of this system is based on the use of the Resource Description Framework (RDF). Moreover, RDF is a widely used interchange format for overcoming the interoperability problem: in a mobile-learning system, several components, such as programs and software agents, communicate with each other, and using RDF for this communication helps optimize the exchange and the availability of the data used by these components.
Rya [22] and RDF-3X [23] are two RDF triplestores used for managing large amounts of RDF data; Rya stores RDF data in the cloud. We compare our approach against both systems to measure the efficiency and scalability of our system.

System Architecture
In this section, we present our approach. The storage and processing of RDF data are managed by our RDFMongo [4] system. The massive RDF [24] data is stored in a MongoDB database, and users query this data with the SPARQL query language [25], a Semantic Web standard dedicated to RDF data. RDFMongo transforms the user's SPARQL query into the MongoDB Query Language, so that RDF data and SPARQL queries are managed entirely by MongoDB. Figure 1 shows an overview of RDFMongo, which is the back-end part of our Mobile-Learning system.
Fig. 1. RDF data management using RDFMongo [4]
As already mentioned, we worked mainly with MongoDB, which is one of the most widely used systems at present [26]. MongoDB relies on the JSON [27] format for data representation and manages collections of JSON documents. This format allows great flexibility in structuring data. Collections can store simple documents, similar to the tuples of a relation with atomic values, as well as documents with complex structures that recursively nest other documents or arrays of documents, or that reference documents stored in other collections. The system allows all these constructors to be combined freely, which makes it possible to store data with a very complex structure.
MongoDB offers many features for managing massive data. It can be deployed on a client-server architecture as well as in the cloud. It offers facilities for horizontal scalability and high availability: MongoDB supports data partitioning (sharding) and implements master-slave replication protocols. MongoDB provides a set of operations for inserting, deleting, querying, and updating documents. For querying, it provides selection, projection, aggregation, and sorting operators. Logical operators, arithmetic operators, and operators for manipulating dates and character strings are also available.
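As an illustration of how a SPARQL triple pattern could map to a MongoDB selection, the following minimal Python sketch builds an equality filter document from one pattern. It assumes one document per triple with the hypothetical fields s, p, and o, and an illustrative ex: prefix. The actual document layout and translation rules of RDFMongo are not described in this section, so this is a sketch under stated assumptions rather than the system's implementation.

```python
def pattern_to_filter(subject, predicate, obj):
    """Translate one triple pattern into a MongoDB-style filter document.
    Terms starting with '?' are variables and impose no constraint.
    Field names s/p/o are an assumed layout, not RDFMongo's."""
    query = {}
    for field, term in (("s", subject), ("p", predicate), ("o", obj)):
        if not term.startswith("?"):
            query[field] = term
    return query


def matches(document, query):
    """Equality-only matching, mimicking how such a filter selects documents."""
    return all(document.get(field) == value for field, value in query.items())


documents = [
    {"s": "ex:alice", "p": "ex:enrolledIn", "o": "ex:course1"},
    {"s": "ex:bob", "p": "ex:enrolledIn", "o": "ex:course2"},
    {"s": "ex:alice", "p": "ex:name", "o": "Alice"},
]
query = pattern_to_filter("?student", "ex:enrolledIn", "ex:course1")
selected = [doc for doc in documents if matches(doc, query)]
```

The in-memory matches function stands in for the database; with a real deployment, the same filter document would be passed to a find operation on the collection.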
Having presented the back-end part, we now describe the two layers of our system, which aims to handle M-Learning data semantically.
M-learning is often considered an extension of e-learning. This extension is not only about mobility; it also brings new forms of learning into the learning environment that e-Learning does not allow. The learning context is a crucial aspect of mobile learning. It is therefore necessary to determine, according to the context, which resources to send, in what way, when, on which interface, and so on. The whole learning process must adapt to these contextual changes. However, contextualization in learning is not easy to achieve: the diversity of mobile technologies and the dynamics of mobile environments complicate the contextualization process.
Context management is an iterative process that uses contextual information at the system level, starting from context detection and acquisition. It involves capturing the context data, storing it, and distributing the LOs to the learner according to the stored contextual information. We define the necessary steps in the life cycle of a context-aware system:
• Context data acquisition: capturing all the contextual information that is available.
• Storage: storing the captured data in a form that is meaningful and understandable for the intended use.
• Processing: in our case, selecting LOs from a query and applying an optimization method to refine the selected LOs.
The data and its metadata are stored in RDF format for use by our recommendation ontology. Storage in RDF has several advantages: it enables semantic processing of the content, it serves as an exchange format between software agents and programs, and it is the basic standard for ontologies.
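The processing step above can be sketched in a few lines of Python. The sketch filters LOs by a device constraint and then searches exhaustively for the combination with the highest total relevance that fits the learner's available time. The field names (minutes, relevance, devices), the context keys, and the brute-force search are all assumptions made for illustration; the optimization method actually used by the system is not specified in this section.

```python
from itertools import combinations


def select_learning_objects(learning_objects, context):
    """Keep LOs compatible with the device, then pick the subset with the
    highest total relevance that fits the learner's available time."""
    candidates = [lo for lo in learning_objects
                  if context["device"] in lo["devices"]]
    best, best_score = (), 0
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            minutes = sum(lo["minutes"] for lo in combo)
            score = sum(lo["relevance"] for lo in combo)
            if minutes <= context["available_minutes"] and score > best_score:
                best, best_score = combo, score
    return [lo["id"] for lo in best]


# Hypothetical LOs and context, used only to exercise the sketch.
learning_objects = [
    {"id": "lo1", "minutes": 10, "relevance": 5, "devices": {"phone", "tablet"}},
    {"id": "lo2", "minutes": 20, "relevance": 8, "devices": {"tablet"}},
    {"id": "lo3", "minutes": 15, "relevance": 6, "devices": {"phone"}},
    {"id": "lo4", "minutes": 30, "relevance": 9, "devices": {"phone"}},
]
context = {"device": "phone", "available_minutes": 30}
selected = select_learning_objects(learning_objects, context)
```

Exhaustive search is only practical for a handful of candidates; it is used here because it makes the objective and the constraint explicit.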
For RDF data sharing, we present a new model that aims to preserve consistency through a combination of two techniques: commutativity and dependency relationships. The commutativity mechanism consists of defining a set of operations that commute with each other on a given data structure. It must ensure consistency regardless of the order in which editing operations are received.
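To illustrate what order-independence means for operations on a set of triples, the following Python sketch uses a two-phase set, in which additions and removals are recorded separately and a removed triple stays removed. This is a well-known stand-in technique chosen for brevity and is not claimed to be the model proposed here; the operation names and the ex: identifiers are hypothetical.

```python
from itertools import permutations


def apply_operations(operations):
    """Apply add/remove operations on a set of triples.
    Additions and removals are tracked separately, so the final state
    does not depend on the order in which operations arrive."""
    added, removed = set(), set()
    for kind, triple in operations:
        if kind == "add":
            added.add(triple)
        elif kind == "remove":
            removed.add(triple)
    return added - removed


operations = [
    ("add", ("ex:lo1", "ex:topic", "ex:rdf")),
    ("add", ("ex:lo2", "ex:topic", "ex:sparql")),
    ("remove", ("ex:lo1", "ex:topic", "ex:rdf")),
]
# Every delivery order yields the same final graph.
states = {frozenset(apply_operations(order)) for order in permutations(operations)}
```

The dependency of a removal on its matching addition is handled implicitly here: a removal that arrives first is remembered and takes effect once the addition is received.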
We propose to express the contextual information as RDF data, to record it in the MongoDB database, and to apply reasoning based on inference rules. MongoDB is the NoSQL database used to store the RDF data as JSON documents. To store the RDF triples in MongoDB, we transform them into JSON documents using a JSON-LD (JSON for Linked Data) API [28].
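The triple-to-document transformation can be sketched as follows. This minimal Python example groups triples by subject into JSON-LD-style node objects using only the standard library; it does not call any JSON-LD API, and the grouping strategy, the (value, is_iri) encoding of objects, and the example.org identifiers are assumptions made for illustration.

```python
import json
from collections import defaultdict


def triples_to_documents(triples):
    """Group triples by subject into JSON-LD-style node objects.
    Objects are (value, is_iri) pairs: IRIs become {"@id": ...},
    literals become {"@value": ...}."""
    nodes = defaultdict(dict)
    for subject, predicate, (value, is_iri) in triples:
        key = "@id" if is_iri else "@value"
        nodes[subject].setdefault(predicate, []).append({key: value})
    return [{"@id": subject, **properties} for subject, properties in nodes.items()]


triples = [
    ("http://example.org/lo1", "http://purl.org/dc/terms/title",
     ("Intro to RDF", False)),
    ("http://example.org/lo1", "http://purl.org/dc/terms/hasPart",
     ("http://example.org/lo2", True)),
]
documents = triples_to_documents(triples)
serialized = json.dumps(documents, indent=2)
```

One design point to keep in mind: property IRIs used as field names contain dots, which MongoDB query paths treat as separators, so a real storage layout may need to encode or relocate them.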

Ontology
Ontologies make it possible to standardize the vocabulary and the language of exchange between the various actors, to compare different systems, and to structure knowledge so as to simplify the analysis and synthesis of the knowledge of a domain. Building on ontologies, the development of the Semantic Web opens up new possibilities and challenges for the design of a generation of adaptive systems, making it possible to model users' profiles and contexts. For adapting learning pathways to the learner, ontologies have become an essential solution. They allow the construction of complex knowledge models that can be used to model the users of the system, their context, and the field of application in an intelligible way.
In the semantic knowledge layer of our proposal, and for reasoning purposes, we consider a single ontology, which we call the m-Learning domain ontology. However, we distinguish two subparts of this ontology: (1) the LO model and (2) the context model. In the next section, we begin by describing the model of LO.

Data Replication
In this section, we present a context-guided replication model for mobile RDF data. Replicating Semantic Web data on mobile devices enables applications and services to operate independently of the quality of network connections. Nevertheless, this replication must take into account the limited resources of mobile devices, such as storage capacity. Selective replication is therefore desirable, so that only useful data is made available to users. The selection process is based on contextual information about the user and their environment. Several definitions of context can be found in the literature. In general, context can be defined as any information that can be used to characterize the situation of an entity. Another definition describes context as anything that surrounds a user or device and gives meaning to something. Our replication model helps ensure high data availability. In this model, we propose to express the contextual information as RDF data, to record it in triplestores, and to apply reasoning based on inference rules. Processes that are costly in terms of energy consumption, such as the evaluation of contextual information and the selection of partial graphs, run on the clone in order to offload the mobile device. The mobile device only acquires contextual information and sends it to its clone without any additional processing.
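The selection of a partial graph on the clone can be sketched as follows. This minimal Python example keeps only the triples whose subject is tagged with a topic present in the user's context and caps the result at a storage budget. The ex:topic predicate, the notion of context topics, and the max_triples budget are hypothetical; the actual selection relies on inference rules that are not detailed in this section.

```python
def select_for_replication(triples, context_topics, max_triples):
    """Keep only triples whose subject is tagged with a topic of interest,
    up to the storage budget of the mobile device."""
    relevant_subjects = {
        s for s, p, o in triples if p == "ex:topic" and o in context_topics
    }
    selected = [t for t in triples if t[0] in relevant_subjects]
    return selected[:max_triples]


# Hypothetical graph held by the clone.
graph = [
    ("ex:lo1", "ex:topic", "ex:rdf"),
    ("ex:lo1", "ex:title", "Intro to RDF"),
    ("ex:lo2", "ex:topic", "ex:cooking"),
    ("ex:lo2", "ex:title", "Pasta basics"),
]
replica = select_for_replication(graph, {"ex:rdf"}, max_triples=10)
```

In this arrangement the mobile device would only send its context (here, the set of topics) and receive the resulting partial graph.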

Evaluation
In this section, we first introduce the configuration, test environment, and datasets used to evaluate our system, which combines Semantic Web and Big Data technologies applied to the m-Learning domain. The system has been tested on real datasets, and performance benchmarks comparing relevance and response time have been carried out. To show the efficiency of our system, we compared it with two existing solutions: RDF-3X and Rya. The results illustrated in the figures show the robustness and efficiency of our system regardless of the volume of data.
To evaluate our system, we use the Open University Courses [29] dataset of 54,584,125 triples, the Coursera MOOC dataset of 4,927,697 triples, and the LUBM [30] Benchmark. To show the scalability of our system, our final tests are carried out on the latest Wikidata [31] dataset, which is 69 GB in size and contains 7.2 billion triples.

Evaluation Using the LUBM Benchmark, Open University, and Coursera MOOC
Tables 1, 2, and 3 show the execution times of the LUBM Benchmark queries on the three LUBM Benchmark dataset instances created. A further table presents the loading times obtained for the five datasets: LUBM1000, LUBM2000, LUBM5000, Open University, and Coursera MOOC. Figure 4 presents the results for the three systems, RDF-3X, Rya, and our system; to better evaluate and compare them, we used the five datasets. The data loading results show the efficiency of our system compared to RDF-3X and Rya for all datasets tested, thanks to MongoDB, which loads data quickly compared to other Big Data systems [32]. The accompanying figure graphically illustrates the loading results given in the table.

Evaluation Using the Latest Wikidata
Our final tests use the latest Wikidata dataset, which is more than 69 GB in size and contains 7.2 billion triples. We executed 5 queries, and Figure 7 illustrates their execution times in milliseconds. Queries 1 and 4 take more time than the others; this difference is explained by the joins in these two queries, which require more time to combine the retrieved data. Query 5, a count query, executes quickly since it returns only a number.

Discussion
This M-Learning system recommends courses to learners according to their profiles, thanks to the use of an M-Learning domain ontology and a knowledge server; all data is stored in a single format, the RDF triple. After evaluating our system on datasets that contain large volumes of RDF data, we observed that our approach remains efficient when the amount of RDF data is very large, and that it provides relevant recommendations to learners according to their interests and relationships, through the benefits of RDF graphs and vocabularies such as Friend of a Friend (FOAF). The predicates of RDF triples are properties that express the relationships between the subjects and objects of triples. In our system, characteristics of relations such as reflexivity and transitivity are simple to represent and store using the M-Learning domain ontology, because the data from this ontology is ultimately transformed and stored as RDF triples. The benefits of our approach become most apparent when the system is used by a large number of learners, teachers, and other users, and when the database of courses and user information grows, thanks to the scalable management provided by MongoDB. This document-oriented NoSQL database provides scalability, partition tolerance, and high data availability, and it is among the most widely used NoSQL database management systems. Another strong point of our system is replication: replicating Semantic Web data on mobile devices allows applications and services to work independently of the quality of network connections. Because mobile resources such as storage capacity are limited, replication is selective and driven by contextual information about the user and their environment, so that only useful data is made available to users.
Finally, in this paper, we presented our approach for a recommendation system applied to the field of mobile learning, combining Semantic Web technologies with the MongoDB database management system.
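The relationship-based recommendation mentioned in this discussion can be illustrated with a small Python sketch over RDF-like triples. It follows foaf:knows links transitively from a learner and suggests courses that the reached contacts are enrolled in. Treating foaf:knows as transitive, the ex:enrolledIn predicate, and all identifiers are assumptions made for illustration; this is not the recommendation procedure of the system.

```python
def reachable(triples, start, predicate):
    """All nodes reachable from start by following predicate transitively."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node and p == predicate and o not in seen and o != start:
                seen.add(o)
                frontier.append(o)
    return seen


def recommend(triples, learner):
    """Courses taken by the learner's direct or indirect contacts,
    excluding courses the learner already follows."""
    contacts = reachable(triples, learner, "foaf:knows")
    taken = {o for s, p, o in triples if s == learner and p == "ex:enrolledIn"}
    suggested = {o for s, p, o in triples
                 if s in contacts and p == "ex:enrolledIn"}
    return sorted(suggested - taken)


triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "ex:enrolledIn", "ex:rdf101"),
    ("ex:bob", "ex:enrolledIn", "ex:rdf101"),
    ("ex:bob", "ex:enrolledIn", "ex:sparql201"),
    ("ex:carol", "ex:enrolledIn", "ex:mongo301"),
]
suggestions = recommend(triples, "ex:alice")
```

Because relationships and enrolments are both plain triples, the same traversal works regardless of which vocabulary supplies the relationship predicate.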