Reflections on Different Learning Analytics Indicators for Supporting Study Success

Abstract

Common factors related to study success include students' sociodemographic characteristics, cognitive capacity, prior academic performance, and individual attributes, as well as course-related factors such as active learning and attention, and environmental factors related to supportive academic and social embeddedness. A learner's learning journey spans several stages, from commencing study to its completion, and different indicators or variables can be examined at each stage to gauge or predict how successful that journey is likely to be, and whether learners will complete their studies and thereby acquire the intended learning outcomes. The aim of this research is to gain a deeper understanding not only of whether learning analytics can support study success, but of which aspects of a learner's learning journey can benefit from the utilisation of learning analytics. We therefore examined different learning analytics indicators to show which aspect of the learning journey they were successfully supporting. Key indicators include GPA, learning history, and clickstream data. Depending on the type of higher education institution and the mode of education (face-to-face and/or distance), the chosen indicators may differ, as they carry different weight in predicting learning outcomes and study success.


Introduction
Research focusing on learning analytics is still rapidly evolving, with most of the respective implementations located in the UK, the USA, and Australia [1,2]. Although the number of related studies has increased over the last five years, large-scale empirical evidence regarding the effectiveness of learning analytics remains to be seen [3,4]. The field arose originally from the increasing availability of educational data and the observation that a significant proportion of first-year university students do not complete their courses [5]. Benefits arising from learning analytics include the identification of at-risk students [6,7], the possibility of constructing adaptive support for students' learning journeys [8,9], and the provision of additional support for coping with academic requirements and expectations [10,11]. Accordingly, study success is conceptualised as, at the broadest level, the successful completion of a first degree in higher education and, at the narrowest level, the successful completion of individual learning tasks [12]. However, only small-scale empirical evidence regarding the effectiveness of learning analytics for supporting study success has been located, as presented in a recent systematic review [13] as well as in several other review articles [14][15][16][17].
The aim of this research is to gain a deeper understanding not only of whether learning analytics can support study success, but of which aspects of a learner's learning journey can benefit from the utilisation of learning analytics. We therefore examined different learning analytics indicators to show which aspect of the learning journey they were successfully supporting. Following a data profiles approach [18], the following research question guides the present study: Which learning analytics indicators help to determine and support study success in higher education when classified into student, learning, and curriculum profiles?
The remainder of the paper is organised as follows: Section 2 presents a literature review focusing on study success and learning analytics; Section 3 presents the research methodology we undertook for this paper; Section 4 presents the results; Section 5 presents a discussion and recommendations; and finally, Section 6 presents the conclusion and future work.

Literature Review
The success of students at higher education institutions has been a global concern for many years [19]. Even though many academic support programmes have been implemented [20], and research on study success is extensive [21][22][23][24], dropout rates in higher education remain at about 30% across the Organisation for Economic Co-operation and Development member countries [25]. The factors that contribute to student success, and that may influence a student's decision to discontinue higher education, are varied and complex [19,26]. Important factors for dropout that have been consistently found in international studies include choosing the wrong study programme, lack of motivation, personal circumstances, an unsatisfying first-year experience, lack of university support services, and academic unpreparedness [27][28][29][30].
Common factors related to study success include students' sociodemographic factors (e.g., gender, ethnicity, family background), cognitive capacity or prior academic performance (e.g., grade point average [GPA]), and individual attributes (e.g., personal traits, and motivational or psychosocial contextual influences), as well as course-related factors such as active learning and attention, and environmental factors related to supportive academic and social embeddedness [24,[31][32][33]. The possibility of collecting and storing data on the above-mentioned factors and combining them in (near) real-time analysis opens up advanced evidence-based opportunities to support study success through meaningful interventions, an approach referred to as learning analytics [34].
The concept of learning analytics has been used in various contexts and with various focal points, resulting in a lack of clarity and precise definition. For instance, Wong [35] presents several case studies utilising learning analytics for (a) improving student retention, (b) supporting informed decision making, (c) increasing cost-effectiveness, (d) helping to understand learning behaviour, (e) providing personalised assistance, and (f) delivering feedback and interventions. Further, an extensive diversification of the initial learning analytics approaches can be documented [5]. These learning analytics approaches apply various methodologies, such as descriptive, predictive, and prescriptive analytics to offer different insights into learning and teaching [36]. Learning analytics with a specific focus on higher education and their link to study success have been defined as the use, assessment, elicitation and analysis of static and dynamic information about learners and learning contexts, for the near real-time modelling, prediction and optimisation of learning processes, and learning environments, as well as for educational decision-making [8].
From a data management perspective, three distinctive data profiles have been identified [18]: student profile, learning profile, and curriculum profile (see Figure 1).

Fig. 1. Distinctive data profiles for learning analytics applications
The student profile includes static and dynamic indicators. Static indicators include gender, age, education level and history, work experience, current employment status, etc. Dynamic indicators include interest, motivation, response to reactive inventories (e.g., learning strategies, achievement motivation, emotions), computer and social media competencies, enrolments, drop-outs, pass/fail rate, academic performance, etc.
The learning profile includes indicators reflecting the current behaviour and performance within the learning environment (e.g., learning management system). Dynamic indicators include trace data such as time specific information (e.g., time spent on learning environment, time per session, time on task, time on assessment). Other indicators of the learning profile include login frequency, task completion rate, assessment activity, assessment outcome, learning material activity (upload/download), discussion activity, support access, ratings of learning material, assessment, support, effort, etc.
The curriculum profile includes indicators reflecting the expected and required performance defined by the learning designer and course creator. Static indicators include course information such as facilitator, title, level of study, and prerequisites. Individual learning outcomes are defined including information about knowledge type (e.g., content, procedural, causal, meta cognitive), sequencing of materials and assessments, as well as required and expected learning activities.
The available data from all data profiles are analysed using pre-defined analytic models allowing summative, real-time, and predictive comparisons. The results of the comparisons are used for specifically designed interventions which are returned to the corresponding profiles. The (semi-)automated interventions include reports, dashboards, prompts, and scaffolds for different stakeholders (e.g., teachers, students, administrators). Additionally, stakeholders receive customised messages for following up with critical incidents (e.g., students at risk, assessments not passed, satisfaction not acceptable, etc.).
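To make the profile structure concrete, the three data profiles and a simple intervention rule could be sketched as follows. This is a minimal illustration only: all field names, and the completion-rate threshold, are our own illustrative assumptions rather than elements prescribed by the framework in [18].

```python
from dataclasses import dataclass, field

# Illustrative containers for the three data profiles [18];
# field names are assumptions, not taken from any specific system.
@dataclass
class StudentProfile:
    # static indicators
    gender: str = ""
    age: int = 0
    education_history: list = field(default_factory=list)
    # dynamic indicators
    motivation_score: float = 0.0
    enrolments: int = 0

@dataclass
class LearningProfile:
    login_frequency: int = 0
    time_on_task_minutes: float = 0.0
    task_completion_rate: float = 0.0
    discussion_posts: int = 0

@dataclass
class CurriculumProfile:
    course_title: str = ""
    level_of_study: str = ""
    prerequisites: list = field(default_factory=list)
    learning_outcomes: list = field(default_factory=list)

def flag_critical_incident(learning: LearningProfile, threshold: float = 0.5) -> bool:
    """Toy intervention rule: flag a student whose task completion rate
    falls below a threshold (an assumed cut-off, not from the paper)."""
    return learning.task_completion_rate < threshold
```

In a real system, such a flag would trigger one of the (semi-)automated interventions described above, e.g., a customised message to the teacher or student.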
The data profiles described above have been utilised in different learning analytics applications and systems [1]. However, studies on which learning analytics indicators best fit the different purposes of learning success prediction, such as student grades, student engagement, student behaviour, student performance, and course completion, are scarce. Accordingly, the following research investigates international research studies with regard to the effectiveness of learning analytics indicators for determining and supporting higher education students' learning journeys, classifying them based on the three data profiles.

Method
This article presents a secondary analysis of a previously conducted systematic review, which followed the eight steps proposed by Okoli and Schabram [37]. From our previously completed systematic review of studies derived from high-quality academic journals and conference proceedings [2], we formulated a list of 49 studies to inform whether there is empirical evidence that the general use of learning analytics can improve study success. Although 3,163 articles contained the required search term "learning analytics" in combination with "study success", "retention", "dropout prevention", "course completion", and "attrition", only 49 articles fitted our inclusion criteria: a) higher education context, b) published between January 2013 and December 2019, c) written in English, d) containing substantial qualitative or quantitative analyses and findings, and e) peer-reviewed. The findings showed that only a small number of the identified articles reported successful implementations in higher education institutions with a tangible positive increase in study success. In other words, empirical evidence that learning analytics are effective in retaining students and decreasing student dropout remains limited.
The secondary analysis of these articles focusses specifically on the learning analytics indicators utilised for supporting study success. The research team developed a research protocol describing the individual steps of the secondary analysis and validated it in a training session focussing on database handling, reviewing, and note-taking techniques. The full-text analysis of the remaining publications focused on the theoretical rigour of the key publications. The research team used quantitative and qualitative content analysis as well as reflective exchange to extract the findings of the key studies. This synthesis of key publications followed a triangulation approach, as the final studies included both quantitative and qualitative studies [38]. The final step of the secondary analysis was the dissemination of the findings through this paper, which documents the findings, discusses their implications, and acknowledges the limitations.

Results
From the 49 studies, five categories of predictions were formulated: (1) student answers/grades, (2) student social learning behaviour/engagement, (3) at-risk/low-performers, (4) student performance, and (5) course completion. The applied learning analytics indicators for the five categories are presented in the following subsections. The results are summarised in Table 1, which includes indicators of the five categories mapped to the three data profiles (student, learning, curriculum) as described above. The following sections outline the individual studies from which the indicators were drawn, as well as additional information regarding the utilised data analytics methods.

Indicators for predicting the correctness of answers/grades
Two of the 49 studies aimed to predict the exact grades/answers of students' assignments. As the targeted information is very precise, a correspondingly detailed range of information is required.
1. Thompson [39] used transcription, extraction, and analysis of video and audio recordings, utilising the discourse captured in those recordings. Key indicators: video, audio, and digital pen input trace data.
2. Yang, et al. [40] applied statistical analysis (such as means and standard deviations) utilising clickstream data for video. Key indicators: trace data (clickstream) and assessment data.
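The kind of descriptive statistics over video clickstream data applied by Yang, et al. [40] could be sketched as follows; the event log, its column names, and the aggregation choices are our own illustrative assumptions, not details from that study.

```python
import pandas as pd

# Hypothetical clickstream log: one row per video event per student.
events = pd.DataFrame({
    "student_id": ["s1", "s1", "s2", "s2", "s2", "s3"],
    "event":      ["play", "pause", "play", "seek", "play", "play"],
    "video_sec":  [10.0, 45.0, 5.0, 120.0, 130.0, 30.0],
})

# Per-student descriptive statistics (counts, means, standard deviations)
# over the video position of events; such summaries could then serve as
# features for predicting assessment outcomes.
features = events.groupby("student_id")["video_sec"].agg(["count", "mean", "std"])
```

Each row of `features` is one student; combining it with assessment data would yield a training table for grade prediction.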

Indicators for predicting students' social/learning behaviour including engagement
Eight of the 49 studies aimed to capture more generic social learning behaviour, such as engagement/participation and related study patterns. These indicators can also be used for checking and confirming student attendance. Methods such as social network analysis, latent class analysis, descriptive statistics and correlation analysis, data mining techniques, group behaviour analysis, mean-generation tasks, visualisation, and multi-level modelling were popular.
1. Bydzovska and Popelinsky [41] applied social network analysis utilising student datasets including study-related and social behaviour data and data concerning previously passed courses (key indicators being the aforementioned variables).
2. Carroll and White [42] applied latent class analysis utilising datasets on lecture attendance, tutorial attendance, online scheduled access, print access, and online full access to learning materials (key indicators being the five aforementioned variables).
3. Gong, et al. [43] used descriptive statistics and correlation analysis with quantitative self-report (student engagement questionnaire) and quantitative observation measures (number of viewing records and posts to a discussion board). Key indicators: student engagement and achievement.
4. Hu, et al. [44] administered data mining techniques, classification and regression trees, and a system usability survey (self-report questionnaire), utilising a dataset of completed learning activities. Key indicators: login, total reading time, homework delay, and forum activity.
5. Labarthe, et al. [45] conducted group behaviour analysis, i.e., a recommender panel was integrated into the experimental users' interface, which enabled them to manage contacts, send instant messages, or consult profiles. They utilised learning traces as interaction logs and demographic information from questionnaires. Key indicators: attendance, completion, scores, and participation.
6. Lu, et al. [46] applied statistical analysis such as t-tests utilising questionnaires and skills post-tests. Key indicators: programming skills, and the number, length, and quality of discussion.
7. Nam, et al. [47] used a mean-generation task via log data (a total of 1,500 items including free-text responses). Key indicators: familiarity with the learning tasks, previous grades, and level of skill.
8. Nguyen, et al. [48] applied visualisation and multi-level modelling utilising an online VLE. Key indicators: time spent on the VLE, actual workload in hours, study patterns, and performance level.
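As an illustration of the social network analysis applied in this category, the sketch below builds a discussion-forum reply graph and uses degree centrality as a simple engagement indicator. The graph, the student names, and the choice of centrality measure are our own assumptions, not details from the studies above.

```python
import networkx as nx

# Hypothetical reply graph: an edge (a, b) means student a replied to
# a post by student b. Names are invented for illustration.
replies = [("ana", "ben"), ("ana", "cai"), ("ben", "cai"),
           ("cai", "ana"), ("dee", "ana")]
G = nx.DiGraph(replies)

# Degree centrality as a rough engagement indicator: students who
# interact with many peers score higher.
centrality = nx.degree_centrality(G)
most_engaged = max(centrality, key=centrality.get)
```

In practice such centrality scores would be one feature among many (attendance, completion, scores) rather than a standalone measure of engagement.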

Indicators for predicting at-risk/low-performing students
The majority of the studies (N = 20) focused on locating at-risk students. Techniques utilised in combination with various datasets included binary classification, basic and extended pass-fail classifiers, cross-validation techniques, data examination, logistic regression, sequence models, feature vector models, binary classifiers, probabilistic models, chi-squared tests, and machine learning.
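A minimal pass/fail-style classifier with cross-validation, of the kind listed above, could be sketched as follows. The dataset is entirely synthetic: the three feature columns stand in for indicators such as online activity, test performance, and study load, and the risk rule generating the labels is our own assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: columns = [online activity, test performance,
# study load]; all values are invented for illustration.
n = 200
X = rng.normal(size=(n, 3))
# Toy ground truth: low activity and low test scores raise dropout risk.
y = ((-1.5 * X[:, 0] - 1.0 * X[:, 1]
      + rng.normal(scale=0.5, size=n)) > 0).astype(int)

# Binary at-risk classifier evaluated with 5-fold cross-validation.
clf = LogisticRegression()
scores = cross_val_score(clf, X, y, cv=5)
mean_accuracy = scores.mean()
```

Cross-validated accuracy, rather than a single train/test split, is the evaluation style reported by several of the studies in this category.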

Indicators for predicting general performance of students
Thirteen of the 49 studies focused on overall student performance and achievement. Similar methods to those in subsection 4.3 were used in this category.
1. Carter, et al. [69] used statistical analysis and machine learning techniques on programming log data and course grades. Key indicators: students' grades, students' overall assignment average, and students' final grades.
2. Conijn, et al. [70] computed correlation analysis, multi-level analyses with cross-random effects, and multiple linear regressions on datasets of students' online behaviour (from the Moodle LMS). Key indicators: LMS data and assessment data (including in-between grades, final exam grades, and overall course grade).
3. Conijn, et al. [71] used descriptive analyses, Pearson correlational analyses, and multiple linear regression with backward stepwise regression, utilising datasets from a MOOC provided on Coursera. Key indicators: platform logfiles (trace data).
4. Daud, et al. [72] deployed Support Vector Machines, C4.5, classification and regression trees, Bayes networks, and Naïve Bayes techniques. Key indicators: family expenditure, family income, student personal information, and family assets.
5. Elbadrawy, et al. [73] used a multi-regression model utilising student data, course data, and learning activities data. Key indicators: performance-specific features, activity/course-specific features, and Moodle interaction features.
6. Gkontzis, et al. [74] applied regression analysis techniques (random forest, linear regression, neural network, adaboost, sim and kin) utilising each student's interactions extracted from Moodle. Key indicators: gender, logins in module, logins in forum, forum replies, dedication time, main quizzes, MCQ per week, and self-assessment quizzes.
7. Jo, et al. [75] computed a multiple linear regression analysis utilising web-log data. Key indicators: total login frequency, studying time, irregularity of learning interval, interactions with content, peers, and instructor(s), and total number of completed assignments/assessments.
8. Kim, et al. [76] calculated dashboard usage frequency and summations of students' scores, applying t-tests and multiple regression analysis to log data extracted from the LMS, survey data, and final assessment scores. Key indicators: dashboard usage frequency, dashboard satisfaction, and learning achievement.
9. Mitra and Goldstein [77] used cross-validation techniques utilising online survey datasets. Key indicators: demographic factors, academic history and records, work-related factors, course-related factors, and academic self-concept factors.
10. Nespereira, et al. [78] applied risk-detection algorithms with time-series and temporal decomposition techniques, utilising datasets from the Moodle platform (such as course contents, students' personal data, grades, and students' interactions with the platform). Key indicators: number of completed assignments/courses, blog activity (if any), and participation in the viewing of resources, forums, and quizzes.
11. Okubo, et al. [79] deployed a recurrent neural network utilising log data from educational systems. Key indicators: attendance, quizzes completed, report data, course and slide views, utilisation of markers, memos, actions, and word count in forums.
12. Rogers, et al. [80] computed regression analysis utilising variables from online systems (demographic and performance-based). Key indicators: grade, gender, age, current academic load, completed courses, GPA, whether any courses had previously been withdrawn, whether the student is enrolled in any courses next year, counselling activities (if any), and any previous notice given for poor progress.
13. Sarker [81] used categorical principal component analysis and logistic regression techniques on data from a 49-item questionnaire drawn from institutional internal databases. Key indicators: academic background, environmental variables, and psychological test outcomes.
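The multiple linear regression approach recurring in this category (e.g., Jo, et al. [75] with web-log indicators) can be sketched with ordinary least squares on synthetic data. All indicator values and the true coefficients below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic LMS indicators (login frequency, studying hours, completed
# assignments) and final grades; all numbers are invented.
n = 150
logins = rng.poisson(20, n).astype(float)
study_hours = rng.gamma(4.0, 5.0, n)
assignments = rng.integers(0, 11, n).astype(float)
grade = (20 + 0.5 * logins + 0.3 * study_hours + 2.0 * assignments
         + rng.normal(scale=3.0, size=n))

# Multiple linear regression via ordinary least squares:
# design matrix = [intercept, logins, study_hours, assignments].
X = np.column_stack([np.ones(n), logins, study_hours, assignments])
coef, *_ = np.linalg.lstsq(X, grade, rcond=None)
predicted = X @ coef
```

With enough observations, the estimated coefficients recover the assumed relationship between indicators and grades, which is the basis for interpreting which indicators matter most.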

Indicators for predicting course completion
Six of the 49 studies focused on overall student course completion. Similar indicators to those identified in sections 4.3 and 4.4 were utilised.
1. Andersson, et al. [82] used binary logistic regression to examine the trail left by students' activities on a discussion forum in online courses at three different points in time (50%, 75%, and 100% of course completion). Key indicators: number, length, and frequency of posts.
2. Aulck, et al. [83] reported machine learning experiments utilising university databases containing demographic and pre-college entry information, e.g., standardised test scores, high school grades, parents' educational attainment, application zip code, and complete transcript records (these variables also forming the key indicators).
3. Dawson, et al. [84] applied common statistical methods utilising student information systems, LMS interactions, and assessment data. Key indicators: LMS engagement, attendance in class, and academic grades/outcomes.
4. Djulovic and Li [85] used chi-squared tests, information gain, gain ratio, and correlation analysis techniques with enrolment data (such as age, gender, GPA, SAT reading, math, and writing scores, and student term-specific financial balance). Key indicators: pre-enrolment variables, semester-specific variables, and financial aid status.
5. Guerrero-Higueras, et al. [86] deployed the Model Evaluator to investigate different machine learning models. Cross-validation analysis was also deployed, utilising learners' interactions with the version control system (GIT repository). Key indicators: students' activity and students' interactions in online forums, such as comments, days spent, comments per day, additions, deletions, number of issues, and authorship proof.
6. Zimmerman and Johnson [87] computed stepwise logistic regression and confirmatory factor analysis utilising their datasets. Key indicators: expected grade, expected time commitment, and first lesson quiz.
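A chi-squared test of association of the kind used by Djulovic and Li [85] can be illustrated on a small contingency table. The table below (financial aid status versus course completion) and its counts are our own invented example, not data from that study.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = financial aid (yes/no),
# columns = completed course (yes/no). Counts are invented.
table = np.array([[80, 20],   # aid: 80 completed, 20 did not
                  [50, 50]])  # no aid: 50 completed, 50 did not
chi2, p, dof, expected = chi2_contingency(table)

# An indicator is worth keeping if its association with completion
# is statistically significant at the chosen level.
associated = p < 0.05
```

Such tests act as a simple feature-selection step before building a completion predictor from enrolment variables.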

Discussion
Technology-based innovations in education have significantly altered both the scale and resolution of measurements for complex learning processes [88][89][90]. Recent developments in this field have heightened the need for educational data mining, machine learning, and statistics to gain insights from the fine-grained process data generated in technology-rich learning environments [89][90][91]. Several perspectives on educational data and analytics have been identified: (1) the data-driven perspective utilises existing data, mostly stemming from database systems, for informing different stakeholders. While big datasets may be available, the data may originally have been collected for a different purpose and can therefore be biased when utilised for other purposes. In contrast, (2) the data-demand perspective follows a specific analytics purpose and defines the data to be collected. This enables a well-directed analysis of educational data with direct implications for learning and teaching. A third perspective may be a combination of the two approaches.
Consequently, emerging analytics solutions are related to the different data available for analysis, which also depends on the level or type of educational institution (e.g., university vs. distance learning vs. MOOC) [92]. Predictive models need to be trained using tutors' experience or by machine learning algorithms. They are then required to make predictions for current students based on information from the current presentation of the course as well as past information. Owing to tendencies and traditions, many universities would like to be informed whether their students will finish their studies in the designated time. Models are trained on data from previous cohorts and applied to current cohorts [92]. GPA was found to be the feature with the highest predictive power [93]. In university or higher education settings, models are trained to identify the success or failure of the current cohort, often from demographic data only [18]. In addition, institutions may define at-risk students differently, and therefore no standard method of detecting at-risk students currently exists [53,92].
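The train-on-previous-cohorts, apply-to-current-cohort protocol described above can be sketched with synthetic GPA data, GPA being the feature with the highest predictive power [93]. The data-generating rule, the on-time threshold, and the choice of logistic regression are all our own illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic previous cohort: GPA on a 0-4 scale and whether the student
# finished on time; the relationship is invented for illustration.
gpa_prev = rng.uniform(1.0, 4.0, 300)
on_time_prev = (gpa_prev + rng.normal(scale=0.5, size=300) > 2.5).astype(int)

# Train on the previous cohort, then apply to the current one [92].
model = LogisticRegression().fit(gpa_prev.reshape(-1, 1), on_time_prev)
gpa_current = np.array([[1.8], [3.6]])
pred = model.predict(gpa_current)
```

The same protocol generalises to richer feature sets (demographics, LMS activity), but the cohort split between training and application data is the essential point.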
In summary, student study history, such as GPA, and learners' learning progress as evaluated from assessment results are the most successful key indicators. However, for entry-level courses or first-year students, historic academic performance or study history may be unavailable [10,11]. In such cases, quiz results in the first week can be used to obtain this information for the appropriate analysis. In MOOCs, clickstream data appear to be the key indicators, as performance and demographic data may not be available due to limited data collection and privacy issues [92].
Regarding the distribution of indicators mapped onto the three data profiles (student, learning, curriculum) [18], it becomes apparent that indicators related to the curriculum profile are underrepresented. As such curriculum-related indicators may function as benchmarks for formative feedback and (near) real-time scaffolds, as well as for improving the learning design, we suggest further investigating the benefits, usability, and validity of such indicators.

Recommendation of methods
The following five recommendations for specific indicators focus on (1) task related predictions, (2) social, learning or engagement behaviour, (3) low-performing or dropout students, (4) general or overall performance of students, and (5) course completion.
1. For predicting the correctness of answers/grades: indicators such as video and clickstream data are useful. Methods such as transcription, extraction, and analysis of video and audio recordings are helpful.
2. For predicting social learning behaviour: indicators such as study-related and social behaviour, lecture attendance, material and forum activity, and study patterns are useful. Methods such as social network analysis, latent class analysis, descriptive statistics and correlation analysis, data mining techniques, group behaviour analysis, mean-generation tasks, visualisation, and multi-level modelling, mostly utilising existing datasets, are helpful.
3. For predicting at-risk students: indicators such as online activity, academic ability and goals, motivation, interaction with other students, socioeconomic status, test performance, study load, and demographic information are useful. Methods such as binary classification, basic and extended pass-fail classifiers, cross-validation techniques, data examination, logistic regression, sequence models, feature vector models, binary classifiers, probabilistic models, chi-squared tests, and machine learning are helpful.
4. For predicting student performance: indicators such as exam grades, interaction with others, forum activity, completed assignments, dashboard usage frequency, learning achievement, and academic history are useful. Methods such as statistical and correlational analyses, support vector machines, multi-regression models, multiple linear regression, recurrent neural networks, cross-validation techniques, and risk-detection algorithms are helpful.
5. For predicting student course completion: indicators such as frequency of posts, LMS engagement, students' activity, and forum interactions are useful. Methods such as binary logistic regression, machine learning, common statistical analysis, stepwise logistic regression, and confirmatory factor analysis are helpful.

Conclusions and future works
A number of research and implementation directions emerged from our study. These include (1) the standardisation of learning analytics systems so that institutions can adopt them without each needing to implement their own; (2) additional personalised prevention and intervention strategies for different study programmes, fitting the different requirements of various institutions, with the awareness that a standardised system may need to be adjusted; (3) elaborating on (2), individually tailored learning packages optimised for each learner based on their profile (e.g., geo-social demographic background, qualifications, learning journey engagement, website activities, search information); (4) more work on privacy and ethical guidelines; (5) quality assurance of learning analytics systems and related recommendations, including an accreditation body; and (6) rigorous multidisciplinary research focussing on (quasi-)experimental studies and longitudinal designs for producing robust findings regarding the effectiveness of learning analytics for learning and teaching.