Design and Implementation of Anonymized Social Network-based Mobile Game System for Learning Mathematics

Group work and student collaboration during problem-solving sessions are teaching methods that positively affect learning outcomes and socialisation. Finding a way to apply these methods so that they are appropriate for, and interesting to, the new digital generation of students is extremely complex. This paper proposes a model that enables social network collaboration between primary school students within a system for mobile game-based learning of mathematics. It also suggests the supporting technology and proposes a general model that simultaneously enables researchers to access anonymized data, teachers to keep track of student progress, and students to keep track of their own progress relative to other students. A microblogging social network service is integrated into the system in a way that enables sending messages without additional authentication, which makes the system more dynamic. The proposed model enables mining of anonymized data streams originating from both the game and the social network. In this paper, the model is used to analyse the concepts that students most often publish and their correlation with other activities within the system. Social network posts are analysed with the aim of detecting students capable of taking advanced classes, which cover more complex areas than the regular curriculum.


Introduction
The widespread use of online social networks as a primary tool for communication and for forming interest groups among children and younger people facilitates research in this field. Mining data from these platforms gives insights into communicators' reasoning and enables tracking local and global trends across different areas [1]. In general, different social networks cover different areas and thus gather different groups. However, Facebook stands out among the younger population (aged 29 or under). Its popularity is partly due to the fact that groups can be closed and contain posts inaccessible to the general public. The primary format of Facebook content is the blog post. Twitter is another social network that uses a blog-style format, but the main difference between the two is that Twitter messages are public by default rather than private, and are thus accessible not only to a closed group but to everyone. Another difference is that Twitter uses microblogging, i.e. the number of characters in a message is limited to the length of an SMS message. Due to this length restriction, a new shorthand, a kind of sign language, is developing, which aims to express a complex phrase or a sequence of concepts with a few stylised characters [2,3].
The appearance and wide acceptance of smartphones helped these social networks gain popularity by making content accessible at all times and enabling users to engage in communication promptly. Users and the system create enormous amounts of textual and other data, which can be unstructured or semi-structured [4].
Taking into account the particularities of microblogging and its potential in educational processes, this paper introduces models for analysing such data. These models are implemented in a system for learning primary school mathematics that uses video games to enhance intrinsic motivation while learning complex STEM areas (science, technology, engineering, and mathematics) early in formal education. Educating and preparing future students for STEM areas is of particular importance in the Republic of Croatia, since STEM has been recognised as a basis for innovation. However, statistics show that the number of students in these areas is decreasing, and over 40% of students give up their studies at the very beginning [5]. This research aims to detect talented students by analysing social network data, in order to pinpoint students who might be successful in STEM areas.

Related work
The development of new digital technologies and their acceptance in everyday life are changing the way we approach knowledge. As early as 2001, Prensky compared the new generation of students, who are digital natives, with educators, who are digital immigrants, with respect to cognition and habits. Digital natives do and learn multiple things at once and want immediate results. Digital immigrants, on the other hand, are used to a step-by-step approach [6]. Regardless of what they are called, it is evident that young generations grow up and communicate differently from their predecessors [7]. These behavioural and communicational changes between generations have accelerated with the appearance of mobile phones, and in particular with the appearance of the first smartphones running today's dominant platforms (iOS in 2007, Android in 2008). Smartphones have a major impact on the functioning of individuals and of society as a whole. Constant digital connectivity can be regarded as a positive impact, while addiction and physical alienation can be regarded as negative impacts [8]. Besides replacing multiple devices with a single one, smartphones owe their popularity to a huge number of available applications: at the end of 2017, Google Play had 3.66 million applications [9], and Apple's App Store had 2.06 million [10].
In the last ten years (ever since smartphones have existed), numerous research studies have been conducted on mobile learning and social network use among digital natives at universities. Students mostly use smartphone applications such as social networks (Facebook, Twitter, and Instagram) and communication applications (WhatsApp), followed by search engines. Music and entertainment applications are also in the top five [11]. Besides being used primarily for collaboration [12,13], Twitter is positively accepted by students as a tool for building up knowledge when connected to an e-learning system [14].
In addition, Twitter can be used in high schools as a tool for communication between teachers and parents [15], and for enhancing language skills [16].
Primary school students access social networks either under parental supervision or, more often, without their parents' knowledge, by lying about their birth year and thus bypassing personal data protection restrictions. These restrictions on children's personal data are discussed in the Research questions section.
Besides spending time on social networks, new generations have accepted video games as one of their primary forms of entertainment. The market value of the video game industry is often compared to that of the film and music industries [17]. Computer games are played on three platforms: PCs, consoles, and mobile phones, with smartphones dominating the market both by the number of devices sold and by the number of available games. Playing video games can have a positive impact on intrinsic motivation in the learning process, as students generally like to spend their time playing. This is one of the reasons behind introducing video games, or the even wider concept of gamification, into teaching less interesting or harder-to-learn content. The majority of related research studies are conducted in the context of higher education. The most often analysed aspects include behavioural change, improved learning, socialisation, and engagement [18]. However, the factors limiting wider adoption of teaching through video games are the complexity and cost of game development [19].
The implementation of the described teaching methods and their use as platforms for online learning run in parallel with the development of models and methods that enrich data by analysing it and thus contribute to optimising the learning process. By integrating educational data mining and learning analytics into e-learning systems [20], the quality of both teaching and learning can be enhanced [21].
Although they have grown up with digital technologies, new generations have not developed better skills than older generations in all cognitive areas; multitasking is one such example [22]. Most other characteristics of digital natives call for adapting traditional teaching methods to support mobile learning and learning with new technologies. By moving the learning process beyond the physical boundaries of schools, technology enables the digital generation of students to be constantly connected with their fellow students, but also with teachers, and to get immediate feedback. Moreover, experimenting becomes much easier, which positively affects the learning process and learning outcomes [23].

Research questions
Data about the user, including personal data, is usually collected when the user accesses online content and services that require registration prior to use. Most social networks require registration, so various personal data protection regulations apply to them. In order to protect the privacy of children under the age of 13, online services in the United States of America are subject to COPPA (Children's Online Privacy Protection Act) [24]. Those in the European Union are subject to GDPR (General Data Protection Regulation) as of 25 May 2018. GDPR concerns personal data protection in general. It sets the age below which personal data may not be collected and analysed without parental consent at 16. Member states may lower this limit, but not below the age of 13 [25].
Since 2009, Twitter has not imposed any age limit for creating a user account, i.e. profile information does not include a birth date or year. However, this information is required if the person wishes to follow a brand. Twitter then records only whether the requirement is met, i.e. whether the person is over the age of 13, and not the exact birth date, thus avoiding storing private data. In 2013, Snapchat introduced a version for children under the age of 13, named SnapKidz. In that version, posting photographs and videos is disabled, as it would violate provisions related to personal data protection. Profiles on social networks such as Facebook and Instagram can include all sorts of personal data. If the entered age is below the limit of 13, or even 16 in some countries, it is not possible to create a profile.
The above-described limitations can easily be bypassed if the user enters a false birth date or birth year. Therefore, children using social networks need to be supervised by their parents. Digital natives aged 9 to 12 are very frequent users of social networks. Over 50% of children of that age have a profile on one or more social networks, while the percentage rises to 72% for 13-year-olds and 89% for 15-year-olds [26].
Children who create profiles on social networks by providing false birth data can be exposed to various negative influences and to messages inappropriate for their age, such as those promoting alcoholic drinks.
In view of the above, and in order to explore the possibilities of technology-aided collaboration and grouping, the first question is how to integrate social networks appropriately. Appropriate here means that no personal data is collected and that use is straightforward for young primary school students.
The second question concerns the selection of a model that supports anonymized access but still enables participants to recognise each other in communication, or to compete in educational games, without revealing their identities to external stakeholders. At the same time, researchers have to be able to integrate attributes in a way that does not allow unambiguous identification. Based on the student profile, correlations between activities within the e-learning system can then be established for the purposes of further research, adaptive learning, or shortening the process of knowledge acquisition.
The third question concerns the possibilities of optimising teaching material by detecting talented students based on their interaction with the system and their posts on social networks. These students need to be detected in time to adapt the curriculum appropriately, i.e. to make it challenging for them by incorporating additional, more demanding content and exercises, and to monitor their progress efficiently.
With respect to the listed research questions, the following null hypotheses will be examined:
1. There is no difference between students attending regular classes and those attending advanced classes with respect to providing solutions.
2. There is no difference between students attending regular classes and those attending advanced classes with respect to expressing opinion on the game.
3. There is no difference between active microbloggers and users who do not publish posts in terms of their ranking among the top 20 results.
4. There is no difference between active microbloggers and users who do not publish posts in terms of their ranking among the 20 most frequent players.

Methodology
A week before the students started using the mGBL (mobile game-based learning) system developed for this research, their mathematics teachers announced that they would soon gain access to a system that enables learning concepts and solving problems by playing a game. The game not only runs on mobile phones but also allows students to compete and compare their results. The students were asked to create the nicknames they would use for access and to report them to their teachers before the due date. The teachers were asked to make sure that no duplicates existed (although no such cases occurred). The students were given no other incentive to play; their main motivation was mutual competition. The game was available at all times, and no restrictions were imposed on the time spent playing.
The students were divided into two groups. The control group played the basic version of the game, while the experimental group played the enriched version. Besides enriched graphics, the enriched version enables students to win a cup or a medal, and shows customised encouraging and stimulating messages on each game level. The nickname is used to determine and load the right version. The difference in the user interface is shown in Fig. 1.
Students of one class belonged to the control group, and students of the parallel class belonged to the experimental group, i.e. all students within the same class belonged to the same group.
Since primary and secondary school children bypass age restrictions on social networks, there is evidently a huge interest in using them. There is therefore a need for a communication platform similar to social networks, with the difference that no personal data is collected, either through interaction with the system or through creating a profile. The research in this paper is based on the analysis of data originating from a social network modified in this way.

Technologies used in system implementation
In order to make the system as widely available as possible to both students and teachers, the main requirement is that it supports all dominant desktop and mobile platforms. For this reason, several open-source technologies are used. Adaptation to mobile platforms covers rendering in WebGL/canvas, automatic scaling of the interface to full screen, handling clicks and taps, and using sensors. The client side is based on the HTML5 standard and its supported APIs, JavaScript frameworks (Phaser, jQuery), the JSON format, and AJAX for client-server communication. AJAX requests for real-time data fetching are handled on the server side using PHP. The data is stored both in CSV-formatted *.out files and in a database (any database supported by PHP can be used through its connectors).
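The paper does not show the server-side code. Purely as an illustration, the following Python sketch mirrors what a PHP endpoint handling one AJAX request might do: decode the JSON body sent by the client and append it as one row of a CSV-formatted *.out file. The field names, their order and the sample values are hypothetical.

```python
import csv
import io
import json

# Hypothetical column order for one interaction record in a *.out file.
FIELDS = ["nickname", "start_time", "object_id", "duration", "result"]


def append_event(json_body: str, out_file) -> dict:
    """Decode one AJAX JSON payload and append it to out_file as a CSV row."""
    event = json.loads(json_body)
    missing = [f for f in FIELDS if f not in event]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    csv.writer(out_file).writerow([event[f] for f in FIELDS])
    return event


if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for an opened game.out file
    append_event('{"nickname": "mathfox", "start_time": "2018-01-08T17:02:11",'
                 ' "object_id": 7, "duration": 12.5, "result": 1}', buffer)
    print(buffer.getvalue(), end="")
```

When writing to a real file, the csv module documentation recommends opening it with newline="", e.g. open("game.out", "a", newline="").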
Since the described systems generate enormous amounts of stream data exhibiting all three Vs (Volume, Velocity, Variety) [27], HiveClient is used to connect to a database in the Hadoop ecosystem when the number of users increases, which makes the system scalable.

Anonymized m-learning model
For personal data protection, i.e. to ensure privacy during system use, the data is anonymized, which yields the dataset for further analysis. The model of interaction and the relationships between the different areas integrated in the game are shown in Fig. 2. Data created through interaction with the game or the system arrives in streams [28]: different parts of the system continuously send data of different types at irregular time intervals. For example, the data saved to an output file as a result of an interaction with an object includes the start time, object ID, duration, end result, inactivity time, pause time, time of switching to other parts of the system, etc. Nicknames authorise students to use the system. The nickname is authenticated as soon as the student accesses the system; if it is not valid, an "access data not valid" warning is displayed and the user is redirected to a new authentication attempt. Upon successful authentication and authorisation, the login data is kept in local storage. Besides being simple, this approach makes the system more dynamic to use, since the login data needs to be re-entered only when the user changes. The game can therefore start as soon as the user accesses the system. The drawback of this approach becomes apparent on shared devices such as school computers: the system user must be changed every time a different student uses the computer. However, the same problem exists with the classic authentication modes of other web applications, e.g. an e-mail service that remembers login data on the computer.
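The nickname-based login with a cached session can be sketched as follows. This is a hypothetical Python model of the client logic only: a plain dict stands in for the browser's localStorage, and the storage key and class name are assumptions, not taken from the system.

```python
from typing import MutableMapping, Optional, Set

# Hypothetical key under which the nickname is cached
# (window.localStorage in the real client; a plain dict here).
STORAGE_KEY = "mgbl_nickname"


class NicknameAuth:
    def __init__(self, registered: Set[str], storage: MutableMapping[str, str]):
        self.registered = registered  # nicknames reported by the teachers
        self.storage = storage

    def current_user(self) -> Optional[str]:
        """Return the cached nickname if it is still valid, else None."""
        nick = self.storage.get(STORAGE_KEY)
        return nick if nick in self.registered else None

    def login(self, nickname: str) -> bool:
        """Validate a nickname and cache it on success."""
        if nickname in self.registered:
            self.storage[STORAGE_KEY] = nickname
            return True
        return False  # caller shows the "access data not valid" warning

    def switch_user(self) -> None:
        """Forget the cached nickname, e.g. on a shared school computer."""
        self.storage.pop(STORAGE_KEY, None)
```

Because the nickname survives in storage, a returning student on the same device can start playing at once. Calling switch_user corresponds to the manual user change that shared computers require.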
The gathered data is organised into two datasets: 1. Student data, which is available to the teacher, is available to the researcher only after anonymization, i.e. with personal data (e.g. name, surname, parents, address, etc.) removed, as well as any combination of attributes that could potentially identify the student uniquely. Such an anonymized dataset can consist of attributes such as grades in mathematics (prior knowledge) and other subjects, general achievement, sex, age, remedial classes, advanced classes, extra-curricular activities, after-school activities, left- or right-handedness, behaviour, absences (excused and unexcused absences above or below average), type of programme (regular, individualised, special), etc. Numerical attributes such as absences, and any other data, are discretized for as long as a combination of attributes that could uniquely identify a student exists. The teachers assign group labels to differentiate between the control and experimental groups. The mapping between nicknames and full names is available only to the teachers, so that they can follow their students' progress.
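The "discretize until no combination is unique" rule can be illustrated with a minimal sketch. It assumes a k-anonymity-style criterion with k = 2, meaning each combination of the listed attributes must be shared by at least two students. The paper does not name a specific criterion. The attribute names, the age bands and the order of the generalisation steps below are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, object]


def discretize_absences(absences: int, average: float) -> str:
    """Replace an exact number of absences by its position relative to the average."""
    return "above" if absences > average else "at_or_below"


def age_band(age: int) -> str:
    """Hypothetical two-band generalisation for pupils aged 11 to 14."""
    return "11-12" if age <= 12 else "13-14"


def unique_combinations(records: List[Record], attributes: Sequence[str]) -> List[Tuple]:
    """Attribute combinations shared by exactly one student (k = 1)."""
    counts = Counter(tuple(r[a] for a in attributes) for r in records)
    return [combo for combo, n in counts.items() if n == 1]


def anonymize(records, attributes, steps):
    """Apply generalisation steps in order until no student is unique.

    Returns the generalised records and any combinations still unique
    after all steps (these would need further generalisation).
    """
    current = [dict(r) for r in records]  # never modify the teacher's data
    for step in steps:
        if not unique_combinations(current, attributes):
            break
        current = [step(r) for r in current]
    return current, unique_combinations(current, attributes)
```

A step is any function that maps one record to a more general one. For example, `lambda r: {**r, "absences": discretize_absences(r["absences"], avg)}` applies the above/below-average discretization mentioned in the text.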

Integrated model for the analysis of stream and microblogging data
Besides the general model, a model based on a microblogging platform is proposed and implemented within this research. Although it resembles Twitter, it does not collect any personal data, for privacy protection. The model enables posting messages directly from the game. This makes the system more dynamic, as there is no need to sign up for a separate user account.
In order to identify the author of a message, the application reads the nickname from Local Storage and prepends it to the message before sending it. The message format is: #Nickname + message content. Since replies are sent from the game and not from the microblogging interface, the hashtag serves for indexing and chaining (as on Twitter), and in this case for merging the records of the same user.
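Merging records by the hashtag prefix can be sketched as follows. This is a hypothetical Python illustration that assumes a single space separates "#Nickname" from the message text. The sample nicknames are invented.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed layout: '#' + nickname, whitespace, then the message text.
PREFIX = re.compile(r"^#(\S+)\s+(.*)$", re.DOTALL)


def split_post(post: str) -> Tuple[str, str]:
    """Return (nickname, text) for a post in the '#Nickname message' format."""
    match = PREFIX.match(post)
    if match is None:
        raise ValueError(f"post has no #Nickname prefix: {post!r}")
    return match.group(1), match.group(2)


def group_by_author(posts: Iterable[str]) -> Dict[str, List[str]]:
    """Merge all posts of the same user, keeping their original order."""
    threads: Dict[str, List[str]] = defaultdict(list)
    for post in posts:
        nick, text = split_post(post)
        threads[nick].append(text)
    return dict(threads)
```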
Simply put, on the client side the game is just a graphical representation of pixels on the screen. It is not possible to use forms and textbox objects for text input, which would invoke the device-specific keyboard at the operating system level. The problem of sending messages from the game can be solved by creating or using modules such as Slick-UI, which draw a keyboard on the principle of "one key, one sprite" and use JavaScript events of the type onKeyPress = onInputDown/Up. The characters represented by the pressed sprites are then saved to a variable, and an AJAX request is sent to the server. Server-side technologies are used to post the content to the microblogging service and to store it in a file and a database.
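The "one key, one sprite" principle comes down to a text buffer driven by per-sprite input events. In the real client this is JavaScript with Phaser and Slick-UI. The following Python sketch models only the buffering logic. The special key labels ("SPACE", "BACKSPACE") and the class name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VirtualKeyboard:
    """Text buffer fed by on-screen key sprites ("one key, one sprite")."""

    max_length: int = 120  # message content limit used in this research
    chars: List[str] = field(default_factory=list)

    def on_input_down(self, key: str) -> None:
        """Handle a tap or click on the sprite whose label is `key`."""
        if key == "BACKSPACE":
            if self.chars:
                self.chars.pop()
        elif key == "SPACE":
            self._append(" ")
        elif len(key) == 1:  # ordinary character sprite
            self._append(key)
        # any other label (e.g. a hypothetical "SEND" key) is ignored here

    def _append(self, ch: str) -> None:
        if len(self.chars) < self.max_length:
            self.chars.append(ch)

    def value(self) -> str:
        """The text that would be placed in the AJAX request body."""
        return "".join(self.chars)
```

For example, pressing h, i, SPACE, x, BACKSPACE and 5 leaves the buffer holding "hi 5".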
A disadvantage of the presented model, which might become apparent only with long messages, is that the maximum message size is reduced by the length of the nickname (limited to at most 20 characters). Therefore, the maximum message size in this research is 120 instead of 140 characters. This limitation is acceptable, as the average length of a tweet in English is about 34 characters [29].
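The length rule can be made concrete with a short sketch that enforces the two limits stated in the text: a nickname of at most 20 characters and message content of at most 120. The function name is hypothetical. The "#" and the separating space in this format are two further characters that a real client would also have to budget for.

```python
MAX_NICKNAME = 20  # nickname limit stated in the text
MAX_CONTENT = 120  # content limit used in this research (140 - 20)


def compose_post(nickname: str, content: str) -> str:
    """Prefix the message with '#Nickname', enforcing both length limits."""
    if not 1 <= len(nickname) <= MAX_NICKNAME:
        raise ValueError(f"nickname must be 1-{MAX_NICKNAME} characters")
    if len(content) > MAX_CONTENT:
        raise ValueError(f"message content exceeds {MAX_CONTENT} characters")
    return f"#{nickname} {content}"
```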
Uploading photos or videos is disabled for privacy protection reasons, similar to SnapKidz.
The proposed model also enables publishing anonymized posts on Twitter for those who meet the age limits. OAuth is used to authorise messages sent from the game to Twitter. The students do not need to log in, as the application sends messages from a developer account within the system. One advantage of the presented model is that no additional Twitter connectors are needed, and data can be analysed in real time, since the content published on Twitter is also stored in a file on the server. With this approach, Twitter rate limits [30] are not an obstacle.
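The text states only that OAuth is used. Posting to Twitter's v1.1 API from a developer account relied on OAuth 1.0a, whose HMAC-SHA1 request signature is defined in RFC 5849. The sketch below computes only that signature and does not send a request. Generating the nonce and timestamp and building the Authorization header are omitted. All credentials are placeholders. A production system would normally use a maintained OAuth library rather than hand-written signing.

```python
import base64
import hashlib
import hmac
from typing import Dict
from urllib.parse import quote


def pct(value: str) -> str:
    """RFC 3986 percent-encoding as required by OAuth 1.0a (RFC 5849)."""
    return quote(value, safe="")  # leaves only A-Z a-z 0-9 - . _ ~ unencoded


def signature_base_string(method: str, url: str, params: Dict[str, str]) -> str:
    """METHOD & encoded base URL & encoded, sorted parameter string.

    `params` holds both the oauth_* parameters and the request parameters
    (e.g. the post text); `url` must not contain a query string.
    """
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), pct(url), pct(param_string)])


def hmac_sha1_signature(method: str, url: str, params: Dict[str, str],
                        consumer_secret: str, token_secret: str) -> str:
    """Base64-encoded HMAC-SHA1 of the signature base string."""
    key = f"{pct(consumer_secret)}&{pct(token_secret)}".encode()
    base = signature_base_string(method, url, params).encode()
    return base64.b64encode(hmac.new(key, base, hashlib.sha1).digest()).decode()
```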
The proposed model is shown in Fig. 3. It uses workflows on the open-source KNIME platform [31] to analyse the merged anonymized data stream. Data arrives from the microblogging system (File Reader -> 1), but also as a data stream generated through the interaction between the student and the game (File Reader -> 2, which covers any move such as touch/click, drag/drop, interacting object ID, thinking time, etc.). These two datasets are merged by nickname with the static anonymized student attributes (File Reader -> 3) into the extended stream dataset.
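Outside KNIME, the same merge can be sketched as a join on the nickname. The following hypothetical Python stand-in assumes simple input shapes: posts as (nickname, text) pairs, game events as dicts with a "nickname" key, and anonymized attributes keyed by nickname. For brevity, it reduces the two streams to per-student counts instead of keeping every row.

```python
from collections import Counter
from typing import Dict, List, Tuple


def extend_dataset(posts: List[Tuple[str, str]],
                   events: List[Dict[str, object]],
                   attributes: Dict[str, Dict[str, object]]) -> List[Dict[str, object]]:
    """Join the three sources on the nickname (cf. File Reader 1, 2 and 3).

    Every student in the attribute table is kept; students who never posted
    or played get zero counts (a left join on the attribute table).
    """
    post_counts = Counter(nick for nick, _ in posts)
    event_counts = Counter(e["nickname"] for e in events)
    return [
        {"nickname": nick, **attrs,
         "posts": post_counts.get(nick, 0),
         "events": event_counts.get(nick, 0)}
        for nick, attrs in attributes.items()
    ]
```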
Since messages can contain typos or spelling mistakes, pre-processing with the String Manipulation node is performed before concept extraction. All the steps involved in concept extraction are shown in Fig. 3a. In this way, variants such as "rjesenja", "rijesenja" and "RIJESENJA" (Cro. "solutions") are normalised to a single concept. The content can also include abbreviations or sign language [32].
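The normalisation and counting steps can be approximated in a few lines. This Python sketch is an assumption, not the KNIME String Manipulation configuration. It lower-cases text, strips Croatian diacritics through Unicode decomposition, maps a small hypothetical table of known misspellings to one canonical form, and then counts concepts.

```python
import re
import unicodedata
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical spelling variants mapped to one canonical form; a real
# table would be built from the collected posts.
VARIANTS = {"rijesenja": "rjesenja", "rijesenje": "rjesenje"}


def normalise(token: str) -> str:
    """Lower-case, drop diacritics (š -> s, č -> c) and fix known variants."""
    token = token.lower().replace("đ", "dj")  # 'đ' has no Unicode decomposition
    decomposed = unicodedata.normalize("NFKD", token)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return VARIANTS.get(plain, plain)


def top_concepts(posts: Iterable[str], n: int = 5) -> List[Tuple[str, int]]:
    """Count normalised tokens, skipping each post's leading #Nickname."""
    counts: Counter = Counter()
    for post in posts:
        body = post.split(" ", 1)[-1]  # assumes the '#Nickname text' format
        for token in re.findall(r"\w+", body):
            counts[normalise(token)] += 1
    return counts.most_common(n)
```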
The standard processing workflow is modified in the part that reads the files. Since social network data is not fetched through the API, the standard API limits mentioned above do not apply.
The extended anonymized dataset can be analysed with classic data mining methods, as well as with data stream mining methods.

Dataset
The research is conducted on a sample of 104 students attending grades 5 to 8 of a primary school in the Republic of Croatia. The students are between 11 and 14 years old and take classes in mathematics. The study period falls in the academic year 2017/2018 and includes one week before the winter holidays, the three weeks of holidays, and two weeks after the holidays.
A total of 73 students (70.19%) accessed the system. The game was played 44 times on average. Sixteen students played more games than average, 11 of whom played the game over 100 times.
The research puts an emphasis on students who take advanced classes in mathematics, as opposed to those who attend only the regular curriculum. Advanced classes are intended for talented students or for those who wish to build up their knowledge of mathematics. They extend the regular curriculum with additional classes that cover more complex mathematical concepts. Some of the students who take advanced classes later take part in city mathematics competitions, from which they may advance to county and state competitions.

Data analysis and results
The stream data recorded in the file and the database is generated by the interaction between the student and the game. The students create content by posting game-related messages or through mutual communication (Table 1). A semantic analysis of the microblogging stream is conducted based on the model shown in Fig. 3a.
A total of 25 students posted messages (active microbloggers), of whom 17 accessed the system from a mobile platform, 6 from a desktop platform, and 2 used both platforms. The most frequent concepts are shown in Fig. 4.
The presented analysis can also serve as an informal survey of how students accept learning mathematics by playing. A positive opinion of the game is given by 21 students, while 2 students find the game difficult. None of the students dislikes this way of learning.
Relationships and the most frequent concepts can be detected by using TagCloud (JavaScript) for data visualization, as shown in Fig. 4. The visualizations are consistent with Table 1. In the category provide solutions, phrases such as "I found two solutions" (Cro. nasao dva rjesenja) stand out, while "super game" (Cro. super igra), "fun" (Cro. zabavna), "cool", etc. stand out in the category express opinion.
Table 1 (excerpt)
Express opinion | 5 | 18
Cheer others on | 2 | 2
Say hello | 1 | 1

Fig. 4. The most frequent concepts in the microblogging content
The attribute related to the advanced classes is integrated in the dataset by the model shown in Fig. 3b. For the purposes of future research, any other attribute from the anonymized dataset can be integrated in the described way.
A total of 23 of the 104 students (22.12%) take advanced classes in mathematics. Ten of them (43.48%) posted messages, compared with only 18.52% of the students who take only regular classes. Thus, the share of active microbloggers among students attending advanced classes is about 2.35 times that among students who do not attend them. Furthermore, all active microbloggers who attend advanced classes either provide solutions or express opinions on the game. The ratio of messages in the categories provide solutions and express opinion with respect to advanced classes is presented in Fig. 5. Although the content-based classification of messages shows no difference between students who do and do not attend advanced classes, the proportion of students who provide solutions is 4.5 times higher among those who take advanced classes than among those who do not, and the proportion of students who express opinions is 2.5 times higher.
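The percentages in this paragraph follow directly from the reported counts: 23 of 104 students attend advanced classes, 10 of them posted, and the remaining 15 of the 25 active microbloggers come from the 81 students in regular classes only. A quick check:

```python
def share(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


students, advanced = 104, 23
active_microbloggers, active_advanced = 25, 10

regular = students - advanced                             # 81
active_regular = active_microbloggers - active_advanced   # 15

p_advanced_classes = share(advanced, students)  # share attending advanced classes
p_adv = share(active_advanced, advanced)        # posting rate, advanced classes
p_reg = share(active_regular, regular)          # posting rate, regular classes only
ratio = p_adv / p_reg

print(f"{p_advanced_classes:.2f}% {p_adv:.2f}% {p_reg:.2f}% ratio={ratio:.2f}")
```

This prints 22.12% 43.48% 18.52% ratio=2.35.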
The first null hypothesis, that there is no difference between students attending regular classes and those attending advanced classes with respect to providing solutions, is rejected by the chi-square test (chi-square statistic 7.5109, p = 0.006133; the result is significant at p < 0.05).
The second null hypothesis, that there is no difference between students attending regular classes and those attending advanced classes with respect to expressing opinion on the game, cannot be rejected, as there is no statistically significant difference between the two groups.
All students could see the top 20 results list, which includes nicknames and their average scores. Fig. 6 presents the ratio of students providing solutions or expressing opinions in the top 20 results and in the most frequent players list.
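Both lists can be derived from the stored game results. This sketch assumes results arrive as (nickname, score) pairs, one per finished game. The alphabetical tie-breaking rule is a hypothetical choice that the text does not specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Result = Tuple[str, float]  # (nickname, score of one finished game)


def leaderboard(results: Iterable[Result], n: int = 20) -> List[Tuple[str, float]]:
    """Top-n nicknames by average score; ties broken alphabetically."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for nickname, score in results:
        scores[nickname].append(score)
    averages = {nick: sum(s) / len(s) for nick, s in scores.items()}
    return sorted(averages.items(), key=lambda item: (-item[1], item[0]))[:n]


def most_frequent_players(results: Iterable[Result], n: int = 20) -> List[Tuple[str, int]]:
    """Top-n nicknames by number of games played; ties broken alphabetically."""
    counts: Dict[str, int] = defaultdict(int)
    for nickname, _ in results:
        counts[nickname] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]
```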
Among those in the top 20 results or in the most frequent players list, the relative share of students who provide solutions or express opinions is several times higher: active microbloggers are 3.5 times more represented in the top 20 results and 4.5 times more represented among the most frequent players.
The third null hypothesis, that there is no difference between active microbloggers and users who do not publish posts in terms of their ranking among the top 20 results, is rejected by the chi-square test (chi-square statistic 14.3198, p = 0.000154; significant at p < 0.05). The fourth null hypothesis, that there is no difference between active microbloggers and users who do not publish posts in terms of their ranking among the 20 most frequent players, is also rejected by the chi-square test (chi-square statistic 18.9144, p = 0.000014; significant at p < 0.05).
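The reported tests compare two groups on a yes/no outcome, i.e. 2x2 contingency tables with one degree of freedom. The underlying tables are not given in the text. This sketch therefore shows how the uncorrected Pearson statistic is computed and checks only the mapping from the reported statistics to their p-values.

```python
import math
from typing import Sequence


def chi_square_2x2(table: Sequence[Sequence[int]]) -> float:
    """Pearson chi-square statistic for a 2x2 contingency table.

    No Yates continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_df1(chi2: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With one degree of freedom the statistic is the square of a standard
    normal variable Z, so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))
```

Under this mapping, the statistics 7.5109, 14.3198 and 18.9144 give p-values of about 0.0061, 0.00015 and 0.000014. As a common rule of thumb, the chi-square approximation is considered reliable when every expected count is at least 5. Otherwise, Fisher's exact test is often preferred.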
One possible reason why students taking advanced classes are more strongly represented among the most frequent players than in the top 20 results is that they wanted to find other ways of solving the tasks.
Most posts (81.82%) are published after classes, i.e. outside school. Two students attending advanced classes and eight attending only regular classes were not active microbloggers but nevertheless read the posts. They account for 8.70% and 9.88% of their groups, respectively. There is no significant difference between the groups in this respect.

Conclusion and future work
Integrating social networks can not only upgrade an m-learning system but might also optimise the learning process. This paper presents the techniques and methods employed in designing and developing one such system for learning primary school mathematics. Such platforms support networking, so students can collaborate in problem solving, point out difficult content, or emphasise positive aspects, all of which can help adapt the system further to their needs and affinities in future development.
When a service collects personal data, as the majority of social networks do, its use is restricted to those over the age of 13 or even 16. Nevertheless, almost three quarters of thirteen-year-old students use social networks, giving fake birth dates to bypass the restriction.
The proposed anonymized mGBL model enables using social networks as an additional motivating factor in the learning process. Thanks to data anonymization, a whole set of different attributes can be accessed. Combined with the data obtained from interactions between students and the system, these attributes may give rise to new models.
The presented research indicates that the analysis of social network data can help identify talented students or potential candidates for advanced classes.
Besides classic data mining and data stream mining algorithms, in future work we intend to use deep learning classification algorithms to obtain a model that is as accurate as possible, with the aim of helping teachers approach students appropriately and with an adequate curriculum. Such a model could not only help teachers detect the type and needs of their students, but could also help students acquire knowledge and skills.