Learning Analytics for Blended Learning: A Systematic Review of Theory, Methodology, and Ethical Considerations

Learning Analytics (LA) approaches in Blended Learning (BL) research are becoming an established field. In light of previous critique of LA for not being grounded in theory, the General Data Protection Regulation (GDPR) and a renewed focus on individuals’ integrity, this review aims to explore the use of theories and the methodological and analytic approaches in educational settings, along with surveying ethical and legal considerations. The review also maps and explores the outcomes and discusses the pitfalls and potentials currently seen in the field. Journal articles and conference papers were identified through a systematic search across relevant databases. 70 papers met the inclusion criteria: they applied LA within a BL setting, were peer-reviewed full papers, and were written in English. The results reveal that the use of theoretical and methodological approaches was dispersed. We identified approaches to BL not included in the categories of BL in the existing BL literature and suggest these may be referred to as complex hybrid learning, and we found that ethical considerations and legal requirements have often been overlooked. We highlight critical issues that contribute to raising awareness and informing alignment for future research to ameliorate diffuse applications within the


The emergence of learning analytics
Given the wealth and complexity of learning, the learning sciences have become an interdisciplinary domain that includes cognitive, educational, social and computer science, among others. Ten years ago, learning analytics (LA) emerged as a multidisciplinary research domain with the overarching premise of harnessing the power of data and analytics to advance our understanding of learning and to help improve learning and teaching and optimise learning environments [26]. As such, LA is defined as "the measurement, collection, analysis and reporting of data about learners and their contexts for purposes of understanding and optimizing learning and the environments in which it occurs" [83]. Interest in LA was catalysed by three main factors. First, the rewarding business intelligence and big-data success stories that have contributed to industry growth and added competitive advantage to companies by allowing them to better understand customers and offer better recommendations (ibid.). Second, the availability of immense volumes of digital traces and clickstream data recorded by learning management systems (LMS) and other digital learning environments, such as student information systems, online library platforms and video streaming services. Third, the revolutionary developments in data science methods and in computer hardware, which became more powerful and accessible [15,21,83]. Inspired by industry, LA initially used digital trace data to create predictive models to, for example, forecast dropouts, identify students at risk of failing, or offer visual dashboards. However, criticism has been levelled at these models for failing to account for contextual factors, being atheoretical and being difficult to replicate [25,100]. Research has since extended data collection, analysis methods and approaches to theory in LA research.
Recently, data collection methods have grown in volume and diversity to cover the full breadth of learners' activities, e.g., classroom interactions, physiological indices, proximity data, eye tracking and self-reports, in addition to the commonly used digital traces. Similarly, methods have grown to include sequence and process mining, epistemic network analysis and advanced machine learning algorithms [16,20,62].
Today, it is common that students' educational reality integrates physical and virtual learning environments, often referred to as Blended Learning (BL) [37][38][39], and data collection methods are often broadened to survey both the physical and the virtual spaces. By applying such data collection and theoretically aligned, refined models, researchers hope to capture learners' behaviour where it occurs. The study of LA and BL is a growing field of inquiry [48] that is garnering increasing attention in and outside the LA community. The rationale for undertaking this review is that LA research has been criticised for not being sufficiently grounded in theory, for example for lacking perspectives of learning [57], and that BL research has remained vague and unclear, which is why there has been a call for research to further develop definitions, models and conceptualisations of BL [46]. Moreover, as LA is becoming an established research field of its own, it is interesting to explore the methodological practices applied across LA research. With the fast-paced development of big data analytics on individual trace data and the implementation of the General Data Protection Regulation (GDPR) of the European Union, valid concerns might also be raised with regard to if and how ethical and legal considerations have been applied. Accordingly, a systematic review that identifies theoretical underpinnings, methodological practices, considerations of ethical aspects and legal requirements, and the contributions of LA research is needed to raise awareness and inform alignment for future research.

Learning analytics in blended learning
Blended Learning is a term coined in the 1990s; the related practices have gained substantial influence over the years, and BL is today regarded as the "new normal" [46]. The concept of BL, its operationalisations and definitions are wide and still evolving [1]. Researchers have highlighted that BL is a broad term that may reflect variations in what is blended, the extent and duration of the blend, and models of blended learning. It can denote a systematic approach that includes any part of a system that combines face-to-face (f2f) instruction with instruction mediated by digital technologies [37,46], which commonly includes blended instruction, blended delivery and blended learning environments [37]; spatiotemporal aspects, where learning can be self-paced and individual; and qualitative aspects [56] that reflect thoughtful integration [31]. In short, BL can be viewed as an umbrella term that, without specific descriptions, will not inform the reader which aspects of teaching and learning are being approached.
In addition, LA approaches have been critiqued for being atheoretical [32,57]. In LA and BL research, as in other educational research, the main aim is to support students in succeeding with their education. Such an effort can be demonstrated by including theory that provides guidance on how to understand, operationalise, measure and interpret, for example, students' engagement in learning. Engagement theories may emphasise different aspects, for example agency [5] or cognition [6] related to self-regulated learning (SRL). Student engagement is critical for learning, and from this perspective LA in BL is warranted, as it combines the BL setting with theoretical insights into students' use of digital technologies and their ability to self-regulate and re-engage in the face of difficulty, distraction, frustration, simultaneous social demands, et cetera. However, it has been argued that LMS data may not be suitable for capturing a nuanced understanding of student engagement: engagement is a multi-dimensional construct, and LMS data can still, at best, reflect a one-dimensional aspect of engagement [42]. Moreover, the researchers in [42] could not find any significant correlation between student self-reports of engagement and LMS trace data. Thus, if descriptions of the approaches and applications of BL, of theories from learning perspectives and of how these are operationalised are lacking, and self-reports and trace data do not correlate, the value of the contributions in the field of LA decreases. Previous reviews have surveyed theoretical underpinnings in LA research and concluded that the grounding in (educational) theory is evident but too often meagre or lacking [e.g. 57,96]. For example, [57] concluded that existing research on learning analytics dashboards is rarely grounded in learning theory, cannot be suggested to support metacognition and thus does not inform teachers of effective practices and strategies. LA reviews have also explored methods applied within LA research [e.g. 6] and identified that LA studies use diverse methods, for example visual data analysis and clustering techniques, social network analysis (SNA) and educational data mining. Taken together, however, the existing reviews of LA research have not taken contributions into account. Such an approach is critical: if applications of BL, theories from learning perspectives and their operationalisations are insufficient or lacking, the contributions become unclear.
Today, there is a substantial number of LA reviews. These often specialise in particular areas, for instance game learning analytics [9], visual learning analytics [96], the role of self-regulated learning (SRL) [98] or learning analytics dashboards [57]; address the use of LA in relation to specific methods or approaches, e.g. open learner models [10] or educational data mining [12]; or apply a wider scope that explores national research efforts, policies, infrastructures and competence centres across several European countries [68].
While several reviews highlight similar findings (i.e. a lack of theoretical underpinning and unclear use of methods), there is a risk in transferring and projecting findings across LA research, as such findings might not reflect the broader LA research, which in turn may lead to overgeneralisations. Although there are many published (systematic, scoping and area-specific) reviews of LA in online settings, in order to understand the aims, objectives and contributions of LA research it is beneficial to take a less specific overview and survey commonalities in theoretical underpinnings (including conceptualisations of BL and learning perspectives), methodological approaches, ethical and legal requirements, and contributions.
In addition to theoretical and methodological aspects, a further layer of complexity is added to LA research in a BL environment. LA is in itself a practice of gathering, analysing and sharing large amounts of personal data, which comes with an increased need for ethical considerations and adherence to legal requirements. The ethical, privacy and legal concerns of processing personal data are at the frontier of data processing due to the presence of the GDPR [14]. LA is a subject developed on data-driven approaches to educational innovation and is hence in the spotlight of these concerns. Beyond ethics, the GDPR provides a legal framework for preserving the rights of the data subjects, that is, the students. Learning analytics operates on data about students and their learning environments, of which students' personal data are an integral part. Personal data of students refers to any data that are directly or indirectly connected to an identifiable person, e.g., student names, personal identification numbers, email addresses, photographs and other data that could lead to identifying an individual [29]. Learning and student management systems typically store, retrieve and process such data, driven by different academic and learning purposes [15]. While the absence of ethical considerations [16], [17], privacy issues and the GDPR [18] has previously been critiqued in regard to the adoption of LA, we did not find any existing review that had explored GDPR, ethics and privacy in LA research. Therefore, in this study, we have added a focus on how the reviewed studies consider ethical and legal aspects of using data. Informed by these previous concerns and critiques, we raised the following questions:
1. How is blended learning defined in the reviewed learning analytics research?
2. For which learning focus perspectives are theories used in the reviewed learning analytics research?
3. What approaches of data collection, methods and analysis are evident in the reviewed learning analytics research?
4. How are ethical and legal aspects considered in the reviewed learning analytics research?
5. What are the contributions of the reviewed learning analytics research?

Search strategy and selection procedure
This study examines academic journal articles and conference papers applying Learning Analytics in Blended Learning, retrieved from two databases (see Table 1). A systematic search was conducted using EBSCOhost via the Stockholm University Library (filtered by content providers: Scopus, ERIC, Academic Search Premier, Directory of Open Access Journals, ScienceDirect) for academic journals, and the ACM Digital Library (ACM DL) for conference papers. As detailed below, the systematic search via EBSCOhost followed educational journals by status [13], and the selection employed journal rankings provided by SCIMAGO Institutions Rankings.

Database and journal search strings
Search string | Hits

EBSCOhost database
"learning analytics" + "blended learning" | 79
"learning analytics" + "blended learning" (incl. "within full text of articles") | 282
"learning analytics" + "blended environment" | 0
"learning analytics" + "blended learning environment" | 2
"teaching analytics" + "blended learning" / "blended environment" | 0
"educational data mining" + "blended learning" | 8
"educational data mining" + "blended" | 8
"educational data mining" + "blended environment" | 0

ACM DL Digital Library
"learning analytics" + "blended learning" + "LAK" (Learning Analytics & Knowledge conference) | 21
"educational data mining" + "blended learning" + "LAK" (Learning Analytics & Knowledge conference) | 11
"learning analytics" + "blended learning" | 43
"educational data mining" + "blended learning" | 22

Journal search via EBSCOhost via Stockholm University Library and the Journal of Learning Analytics
Journal of Learning Analytics
"blended learning" | 7
"learning analytics" + "blended learning" | 6
"educational data mining" + "blended learning" | 3
Internet and Higher Education
"learning analytics" + "blended learning" | 14
"educational data mining" + "blended learning" | 8
Educational Technology and Society
"learning analytics" + "blended learning" | 2
"educational data mining" + "blended learning" | 0
Journal of Computer Assisted Learning
"learning analytics" + "blended learning" | 2
"educational data mining" + "blended learning" | 0
British Journal of Educational Technology
"learning analytics" + "blended learning" | 16
"educational data mining" + "blended learning" | 5
Computers in Human Behavior
"learning analytics" + "blended learning" | 26
"educational data mining" + "blended learning" | 17
Computers and Education, Communications in Information Literacy, Learning and Instruction, International Review of Research in Open and Distance Learning, Educational Evaluation and Policy Analysis, International Journal of Mobile and Blended Learning
"learning analytics" + "blended learning" | 0
"educational data mining" + "blended learning" | 0

http://www.i-jai.org

The search combinations used in SCIMAGO were: Social Sciences + E-learning + All regions / countries + Journals + 2017; Social Sciences + Education + All regions / countries + Journals + 2017; Computer Science + Human-Computer Interaction + All regions / countries + Journals + 2017; and Computer Science + Human-Computer Interaction + All regions / countries + Journals + 2017. Inclusion from each of the four search combinations above was determined by relevance of the title, and the choice was limited to the top 10 journals in each search combination. We identified papers from the following six journals: Internet and Higher Education, Journal of Computer Assisted Learning, British Journal of Educational Technology, Computers in Human Behavior, Educational Technology and Society, and the Journal of Learning Analytics. For the EBSCOhost database search, we used the following combinations of keywords: "learning analytics" + "blended learning"; "learning analytics" + "blended environment"; "teaching analytics" + "blended learning"; "teaching analytics" + "blended environment"; "educational data mining" + "blended learning"; and "educational data mining" + "blended environment". We included peer-reviewed academic journals written in English, screened the titles and abstracts of the papers for inclusion, and removed duplicates. We also tried including a "search within full text of articles", but decided not to use this function further, as it returned irrelevant articles in which BL and LA were mentioned only in the reference section. We searched for articles published between January 2013 and July 2020.
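The six keyword combinations listed above are the cross product of three analytics terms and two setting terms. As a minimal sketch (the function name `build_queries` and the `+` joining convention are illustrative assumptions, not the syntax of EBSCOhost or the ACM Digital Library), they could be enumerated as follows:

```python
from itertools import product

# Terms copied from the keyword list in the text above.
ANALYTICS_TERMS = ["learning analytics", "teaching analytics", "educational data mining"]
SETTING_TERMS = ["blended learning", "blended environment"]

def build_queries(analytics_terms, setting_terms):
    """Return one quoted query string per (analytics term, setting term) pair."""
    return [f'"{a}" + "{s}"' for a, s in product(analytics_terms, setting_terms)]

queries = build_queries(ANALYTICS_TERMS, SETTING_TERMS)
for query in queries:
    print(query)
```

Enumerating the pairs programmatically makes it easy to confirm that no combination was skipped when the searches were run by hand.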
Overall, the keyword searches amounted to 304 hits (not including the search within full text of articles). After removing duplicates, 193 journal articles and conference papers remained; 38 hits did not return full texts and 4 hits returned texts in other languages (three in Danish and one in German), although the search criteria aimed at English texts only. We then sifted through the remaining papers and excluded 32 papers that were not directly relevant to LA and BL, and 49 that lacked one of the two focuses (either LA or BL). During close reading, an additional three papers were excluded, as they did not meet the inclusion criteria. Thus, we proceeded to code and later analyse 70 papers.

Data coding and analysis
Following a coding scheme, all articles were read by two authors, who sorted the content into: article data (country, publication year, title), educational context (blended learning interpretation and level), research aims/questions, theoretical underpinnings and definitions of BL, data sources, data collection methods, ethical considerations, analytical methods, and results and contributions. Each author then conducted a deeper analysis of one section of the reviewed articles (1. theoretical underpinnings, 2. data collection, methods and analysis, 3. ethical and legal considerations and 4. contributions). In-depth discussions were held between the authors to discuss approaches and align findings.
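To illustrate what one row of such a coding sheet might look like, the sketch below represents the categories named above as a record. The class name `CodedArticle`, the field names and the example values are hypothetical; the review does not describe the tool or format actually used.

```python
from dataclasses import dataclass, field, fields

@dataclass
class CodedArticle:
    """One row of a coding sheet; field names paraphrase the categories in the text."""
    country: str
    publication_year: int
    title: str
    educational_context: str        # blended learning interpretation and level
    research_aims: str
    theoretical_underpinnings: str
    bl_definition: str
    data_sources: list = field(default_factory=list)
    data_collection_methods: list = field(default_factory=list)
    ethical_considerations: str = ""
    analytical_methods: list = field(default_factory=list)
    results_and_contributions: str = ""

def uncoded_fields(article):
    """Names of categories left empty, e.g. for the second coder to follow up."""
    return [f.name for f in fields(article) if getattr(article, f.name) in ("", [], None)]

# Hypothetical, partially coded article.
example = CodedArticle(
    country="(hypothetical)", publication_year=2018, title="(hypothetical study)",
    educational_context="flipped classroom, higher education",
    research_aims="predict performance", theoretical_underpinnings="SRL",
    bl_definition="", data_sources=["LMS logs"],
)
print(uncoded_fields(example))
```

A fixed record of this kind makes gaps visible per article, which is useful when, as reported below, many studies leave categories such as the BL definition or ethical considerations undescribed.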

Results
The results section details the findings as follows: 4.1 Theoretical underpinnings, 4.2 Data collection, methods and analysis, 4.3 Ethical and legal considerations, and 4.4 Contributions of the reviewed articles.

Theoretical underpinnings
To discern the positioning of the articles in relation to BL, we analysed how blended learning was used throughout the articles, in particular how frequently the authors refer to blended learning, and their definition, description and use of theory. The current BL literature [e.g. 37,39] has identified three common ways in which explorations of blended learning delivery may vary: blended instruction and blended distribution, as identified in [14,15], or blended pedagogies [54]. However, going through these descriptions, we also found that studies could display a combination of blended instruction and blended distribution, i.e. when a section of the course is provided fully f2f, followed by the remainder offered fully online [79], or the reverse, when a course is delivered fully online and then fully f2f [49]. We also identified that BL was used in ways beyond these categories, including a combination of blended learning approaches in which some, but not all, students use the BL component. For example, we identified studies i) that offered optional adoption of the blended component to the students [e.g. 43,80,87]; ii) in which f2f students and online students were taught synchronously in the same classroom [9]; and iii) with reversed distribution, that is, a channel directed exclusively from the student to peers and/or teachers while the teaching (distribution, delivery and pedagogy) remained traditional, for example an e-portfolio accessible in a social network [35], or flipped classrooms in which students responded to distributed (asynchronous) media and instruction in their own time and place [40].

Table 2. Blended learning definitions

Blended learning definition | Articles
"Technology to support face-to-face teaching and to enhance student participation" (Liao & Lu, 2008) | [2]
"Blended learning system as one which combines face-to-face instruction with computer-mediated instruction with the aim of complementing each other" (Graham, 2006; 2009) | [7] [32] [58] [63] [75] [102]
"The range of possibilities presented by combining Internet and digital media with established classroom forms that require the physical co-presence of teacher and students" (Friesen, 2012) | [23]
"B-learning is the form of learning environment where the traditional classroom teaching and face-to-face communication between teacher and students are blended with the computer-mediated interaction" (Bubaš & Kermek, 2004) | [30]
"Blended learning is a combination of traditional face to face learning and online learning. It has the advantages of the both, providing students with unique flexible learning experience and becoming one of the fastest growing trends in educational field" (Thorne, 2003) | [36]
"The thoughtful integration of classroom face-to-face learning experiences with online learning experiences" (Garrison & Kanuka, 2004) | [41] [76]
"Taking the best from self-paced, instructor-led, distance and classroom delivery to achieve flexible, cost-effective training that can reach the widest audience geographically and in terms of learning styles and levels" (Marsh & Drexler, 2001) | [44]
"The integration of thoughtfully selected and complementary face-to-face and online approaches and technologies" (Garrison & Vaughan, 2008) | [60]
"Blended learning is learning that happens in an instructional context which is characterized by a deliberate combination of online and classroom-based interventions to instigate and support learning. Learning happening in purely online or purely classroom-based instructional settings is excluded" (Boelens, Van Laer, De Wever & Elen, 2015) | [94]

Table 2 gives an overview of the definitions of blended learning that were used. While 29% of the articles offered a clear definition, most articles relied on inferences or contextual descriptions, and 18% of the articles neither inferred nor described BL. The articles that offered a definition most commonly cited Graham [37][38][39].

The analysis from a learning focus perspective revealed five themes reflecting the perspective of the research: (i) the flipped classroom, (ii) collaborative learning, (iii) conversational aspects of learning, (iv) engagement and self-regulation operationalised using system trace data and (v) learner profiles and procrastination. Studies that include theories are presented in a condensed and summarised form (the others are not).
1. The flipped classroom: While most studies exploring the flipped classroom addressed student engagement and learning, a few focused on the actual learning situation [19][20][21]. These studies applied a more overarching, abstract level of theory to inform their study and also discussed their findings in the light of theory. However, while SRL was by far the most commonly used theory for exploring flipped classroom design, most studies did not seek to explore the blended learning environment.
2. Collaborative learning: Social Network Analysis was used to visualise online interactions and to identify productive behaviours and correlations with performance [35,41,43,81]. These studies used constructivist and situated learning theories and theories of self-regulation.
3. Conversational aspects of learning: Studies exploring conversational aspects of learning most commonly approached feedback operationalised as online reports, referring to feedback and assessment theories [e.g. 72,90] or deep learning theories [40,51]. Another type of input to learning was explored by [89], who grounded their study in the Dispositional Learning Analytics (DLA) infrastructure and drew on previous publications on assistant conversational agents and theories of cognitive load in microblogging. Building on the Community of Inquiry framework, which prioritises teacher presence and active participation, [79] used trace data to operationalise active participation as the number of messages sent, documents uploaded and chat sessions attended, and also collected data to analyse teacher presence.
4. Engagement and self-regulation operationalised using system trace data: Of all the theories applied, engagement in general and self-regulated learning (SRL) in particular were the most commonly used. In addition to these research approaches, aspects of culture and gender were introduced and explored [86]. While SRL was often operationalised as observable indicators in system logs, motivation was approached through questionnaire measures of self-efficacy, intrinsic value, test anxiety, cognitive strategy and self-regulation. SRL was often operationalised as trace data and combined with other engagement and learning theories [36,52,58,80]. Numerous studies explored relations between trace data, performance and SRL using self-reports [30], some in combination with other theories, for example theories of motivation [20], socio-cultural perspectives [31], and Self-Determination Theory and the Control-Value Theory of achievement emotions [86][87][88].
5. Learner profiles: In studies exploring learner profiles, it was common to inform the approach with other theories, for example course satisfaction and social constructivist theory [18], deep and shallow learning [32], active learning and engagement [35], and procrastination [1,54,66]. In the reviewed studies, student learning strategies were often operationalised as trace data on student interaction with online learning resources [33]. Among these, procrastination was found to be common. Several studies operationalised SRL as procrastination [54,66], for example using LMS data to survey time spent studying and time spent refraining from accessing available data [54]. Procrastination was also explored without relation to SRL, as how long the student waited before accessing LMS materials [1]. Other researchers used questionnaires to survey procrastination and risk taking using the Expectancy-Value Theory, motivation using the Academic Motivational Scale, and help seeking and epistemic emotions, to inform an approach to how different learning strategies relate to preferences for feedback [66].
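Two of the operationalisations described above, active participation as event counts and procrastination as the delay before first accessing LMS materials, can be sketched in a few lines. The log format, event names, timestamps and release time below are hypothetical and are not taken from any of the reviewed studies.

```python
from collections import Counter
from datetime import datetime

# Hypothetical LMS log rows: (student_id, event_type, timestamp).
LOG = [
    ("s1", "message_sent", datetime(2020, 3, 2, 9, 0)),
    ("s1", "document_uploaded", datetime(2020, 3, 2, 9, 30)),
    ("s1", "material_accessed", datetime(2020, 3, 2, 10, 0)),
    ("s2", "material_accessed", datetime(2020, 3, 5, 8, 0)),
    ("s2", "message_sent", datetime(2020, 3, 5, 8, 15)),
    ("s1", "message_sent", datetime(2020, 3, 6, 14, 0)),
]
RELEASED = datetime(2020, 3, 2, 8, 0)  # assumed release time of the course material

PARTICIPATION_EVENTS = {"message_sent", "document_uploaded", "chat_attended"}

def participation_counts(log):
    """Count participation events per student (messages, uploads, chat sessions)."""
    counts = {}
    for student, event, _ in log:
        if event in PARTICIPATION_EVENTS:
            counts.setdefault(student, Counter())[event] += 1
    return counts

def access_delay_hours(log, released):
    """Hours between the release of the material and each student's first access."""
    first_access = {}
    for student, event, timestamp in log:
        if event == "material_accessed":
            first_access[student] = min(timestamp, first_access.get(student, timestamp))
    return {s: (t - released).total_seconds() / 3600 for s, t in first_access.items()}

print(participation_counts(LOG))
print(access_delay_hours(LOG, RELEASED))
```

The sketch also shows the limitation raised earlier in the review: counts and delays of this kind capture one behavioural dimension of engagement and say nothing about what happens in the f2f part of the course.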
In sum: While most reviewed studies approaching a flipped classroom used theories with a focus on students and their engagement and learning, a few focused on the actual learning situation [69,75,76,94] or combined flipped classroom theories with Computer Assisted Language Learning [34]. The latter studies applied a more overarching, abstract level of theory to inform their study and also discussed their findings in the light of theory. Some studies argued that there is a need to develop a specific SRL-LA theory [63]. However, while SRL was by far the most commonly used theory, most studies did not seek to explore the blended learning environment, but seemed to relate their data collection to operationalisations related to a learning perspective, with or without underpinning theories of learning.

Data collection, methods and analysis
All studies included in the review used a digital platform for collecting data. As can be expected, the LMS was the most used platform for data collection (in 56 studies, 89%). Among them, 14 studies (25%) used more than one platform for data collection (for a full overview of studies and data sources, see Appendix A). A single study used a custom LMS, two studies used video streaming software, and one study used a wiki. Digital traces were the most commonly collected data type (90.5%), followed by self-reported surveys (27 studies, 42.9%). Self-reported surveys were used to collect data about students' dispositions such as engagement, motivation and learning styles. Relational and social network data from computer-supported mediated interactions were collected in eight studies (12.7%). Interviews were conducted in five studies, video or observation data were collected in three, multimodal data in two, and transcripts of classroom interactions were reported in one study.
Most of the data collected in the reviewed studies were digital data (see Table 3). Data were collected from the classroom in only six studies, of which two reported on multimodal data and four used video recording and observation of the classroom setting. [45] used multimodal data through a system called SPACLE to record classroom interactions among students and teachers. The interactions recorded included on-task, off-task, talking to class, outside or inactivity data. The system allowed for spatial data about the positions of the users in the class and their activity levels. [85] used classroom observations to report on the teachers' and students' classroom behaviour, although the methods section does not clearly describe what was observed and how it was reported. [81] collected f2f data to measure teaching presence according to the Community of Inquiry framework. Transcripts of audio recordings of the lessons facilitated the thematic content analysis, and real-time classroom observations were also carried out. Performance data such as grades or continuous assessment were collected in most studies (88.9%). While LMS data may be informative, they do not capture the f2f learning environment, the process of learning, or the student-teacher and student-student dynamics. The stark contrast between the amount of data collected from digital resources and from the classroom represents an obvious gap. Most data were gathered using digital traces, dispositional self-reports, relational data and interviews that are disconnected from the classroom, where a significant amount of learning happens.

Most of the studies (98.4%) employed traditional descriptive and frequentist statistics and group comparisons, including correlations, comparison of means and chi-square tests (see Table 4). Visualisation was used in a significant number of studies (77.8%) in the context of explaining results, but not necessarily as a research objective; few studies had visualisation as their research objective. We did find evidence of the development of systems that gather information from different data sources to provide visual analytics to enhance the feedback offered to students [102], but such application of visualisation was rare. Regression analysis was used to predict performance or forecast learning outcomes in 29 studies (46%). The results show that prediction of performance is the main research objective for learning analytics in blended learning: 88.9% of all the studies had performance, prediction or optimisation as the main objective. In 33.3% of the studies, unsupervised classification by means of clustering was used to categorise students according to criteria such as learning strategies, baseline disposition, learning process sequences or self-regulation. Sequence mining appears to be gaining ground in the learning analytics field, with 15.9% of the studies exploring the concept, most of the time coupled with clustering and visualisation. Yet none of the studies in this category researched the impact of these visualisations on teachers or learners. SNA was used in 15.9% of the studies and, similar to the process mining research, none of these articles used visualisation techniques to help students or teachers optimise learning. Qualitative research was performed in nine studies through the analysis of interviews or transcripts. Data mining and pattern recognition were performed in five studies.
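As an illustration of the unsupervised clustering of students mentioned above, the following is a minimal sketch of k-means on two trace-derived features. The feature choice, the data and the deterministic initialisation are assumptions made for the example; the reviewed studies used a variety of algorithms and tools that are not reproduced here.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at the first k points for reproducibility."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its current members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical per-student features: (logins per week, hours of video watched).
students = [(1, 0.5), (2, 1.0), (1, 1.0), (9, 6.0), (10, 7.0), (8, 6.5)]
labels, centroids = kmeans(students, k=2)
print(labels)
print(centroids)
```

With real trace data the features would normally be standardised first, and the number of clusters would need to be justified, ideally with reference to the learning theory that motivates the profiles.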

Ethical and legal considerations
Irrespective of the necessity for considering ethical obligations in the use of student data, the papers did not provide documentations of such responsible use of personal data. Almost all the literature examined in this study, that is 99% of the articles, primarily focuses on LMS data. However, the ethical and legal aspects are very much under-represented in the discussions with only eight articles provided a clear evidence that they do not count on personal data or the data are de-identified. 22 of the 70 reviewed articles (31%) mentioned anonymising students.
Nevertheless, it is important to recall here that hiding the student names from the data set is not enough to guarantee that individuals cannot be identified [40]. For example, the combination of enrolment in a particular course in a specific year, a specific major and so on has a significant probability of forming a perfect attribute set for identifying a specific student. Such cases raise red flags about how legitimate it is to consider anonymisation of data sufficient (ibid.). Although 40% of the articles indicate that they, at some point, considered ethical aspects when collecting data, none of the reviewed studies mentioned what those ethical aspects were or how they mattered in the data collection, processing and outcomes. An important observation here is that at least 24 of the reviewed papers explicitly focus on the collection, analysis and management of individuals' personal data. Although a more profound discussion explicating the legal and ethical procedures for retrieving and processing the sensitive pieces of the data would be expected, a considerable gap in this respect is apparent in the articles. Thirteen articles reported studies from Europe, but only six mention that they considered legal aspects and informed the students before the data collection, or that the data had been anonymised. As nearly all of the studies were conducted prior to the GDPR rules in the EU [29], new and rigorous practices need to be applied in future LA approaches.
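The re-identification risk described above can be illustrated with a minimal sketch in Python. It checks k-anonymity, a common way of expressing how many records share the same combination of quasi-identifiers; this is our illustrative choice and is not a method taken from the reviewed studies or from [40]. The records, field names and values are hypothetical and exist only for the example.

```python
from collections import Counter

# Hypothetical, already "anonymised" records: names are removed, but
# quasi-identifiers (course, enrolment year, major) are still present.
records = [
    {"course": "STAT101", "year": 2017, "major": "Statistics"},
    {"course": "STAT101", "year": 2017, "major": "Statistics"},
    {"course": "STAT101", "year": 2017, "major": "Graphic design"},
    {"course": "STAT101", "year": 2016, "major": "Statistics"},
    {"course": "STAT101", "year": 2016, "major": "Statistics"},
]

# Assumed set of quasi-identifiers for this illustration.
QUASI_IDENTIFIERS = ("course", "year", "major")


def group_sizes(rows, keys):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def k_anonymity(rows, keys):
    """Return k, the size of the smallest group of indistinguishable rows."""
    return min(group_sizes(rows, keys).values())


def unique_rows(rows, keys):
    """Return the attribute combinations that single out exactly one row."""
    return [combo for combo, n in group_sizes(rows, keys).items() if n == 1]


k = k_anonymity(records, QUASI_IDENTIFIERS)
singled_out = unique_rows(records, QUASI_IDENTIFIERS)
print(k, singled_out)
```

In this hypothetical data set k is 1: the only graphic design major enrolled in 2017 is singled out by the attribute combination alone, even though no name is stored. A larger k would indicate that each student is hidden among at least k records with identical attributes.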

Contributions
The contributions of the reviewed studies can be classified into three themes: i) understanding and predicting performance, ii) understanding students' behaviours and profiles, and iii) understanding and improving the learning environments (for an overview, see Table 5).

Understanding and predicting performance:
To predict students' academic performance, predictive models based on random forest, linear and logistic regression, and ensemble modelling provided satisfying results (over 70% accuracy) [2,51,77,80,81,102]. Similarly, a forecast learning outcome model (FLOM) was developed using interactive data to predict at-risk students [67]. However, FLOM achieved lower accuracy than other predictive models. In contrast, visualisations of student data were found helpful for teachers in detecting anomalous situations [97]. Regarding the appropriate time for prediction, studies discovered that two to six weeks of data are sufficient to obtain accurate predictions [51,53]. However, the portability of predictive models across courses remains low [23,32]. Since prediction is entirely dependent on the supplied data, studies identified that LMS variables (e.g., access time), engagement indicators, self-regulated learning variables (e.g., self-efficacy and test anxiety), and collaborative learning variables (e.g., social stability and time spent on task) have reliable predictive power due to their positive correlation with students' achievements [2,30,71,80,90,91,102]. Nevertheless, for some courses (e.g., graphic design) tracking data proved useless because the effects of individual data variables follow different patterns [32]. Reviewed studies also disclosed that social network metrics (e.g., degree, authority and PageRank) could be employed to predict student performance [15,43,81]. However, the representativity of predictive models using these metrics would be limited [81]. Regarding factor identification, four online factors (e.g., activities, video clicks, videos backwards and practice score per week) and three traditional factors (e.g., participation in after-school tutoring, homework and quiz scores) were identified that affect students' performance [53].
Moreover, attendance, time spent in class, sitting position, and groups are essential for collaborative learning, while self-efficacy, positive strategy use, lower anxiety and less use of negative strategies were found important for self-regulated learning [2,72].

Understanding students' behaviours and profiles:
To identify students' learning patterns and behaviours, studies utilised students' participation, resource access, and other LMS data. For instance, based on students' participation two learning behaviours emerged: sensing, where students are more likely to participate in information access, interactive and networked learning activities, and reflective, where students are more predisposed to materials development activities [53]. Similarly, a study identified behaviours before and after midterm exams, for example, out-degree centrality, LMS visits, and time spent before the midterm exam, and discussion views and visit interval regularity after the midterm exam [51]. In the self-regulated learning context, three patterns emerged based on resource access (self-regulator, external sources users, non-self-regulatory), and four behaviours emerged based on LMS data (continuously active, probers, procrastinators and inactive) [12,20]. Likewise, based on resource use, five different learning trajectories were discovered: overall below-average activity, average resource use, higher use of resources, most active students, and least active students [56]. Studies also clustered and profiled students based on their learning behaviours; for example, four clusters (achievers, regular, half-hearted, underachievers) were discovered using students' performance measures [57]. Likewise, using video annotation tool interactions, four profiles were created: minimalists, task-oriented, disenchanted, and intensive [60]. Correspondingly, students' viewing behaviour was used to cluster students as consistent, slide intensive and less intensive [27]. On the other hand, utilising e-tutorial and information systems data, six profiles emerged, which differed in overall activity level and the use of worked-out solutions [62].
In the same vein, based on LMS usage, three clusters were generated (low, acceptable, and good), and students show different patterns of learning behaviour across these clusters [59]. In the self-regulated learning context, based on authenticity, personalisation, learner control, scaffolding, and interaction, three profiles were identified: self-regulating, external regulating and lack of regulation [21]. In addition, a considerable number of studies contributed by identifying the association and effects of different learning behaviours on students' achievements. For instance, self-assessment exercises, regular resource access, active online behaviour, and time management are significantly correlated with student learning outcomes [5,18,23,34,49,52,63,79], while the use of video annotations, metacognitive skills, and motivational strategies are weakly associated with learning achievements [54,55,73]. On the other hand, procrastination behaviour, a low level of participation, and dependency on worked examples could affect students' learning outcomes [54,60,89]. Furthermore, a few studies discovered that students tend to change their learning behaviour throughout the course and that successful and non-successful students can be compared based on their learning patterns [34,49].

Understanding and improving the learning environments:
Reviewed studies discovered that course material access without lapses, LMS access time, active learning days and teachers' monitoring influence learning results [1,7,8,44,45,65], whereas worked-out solutions and engagements create adverse effects on students' achievements [41,66,85]. In the context of feedback provision, personalised feedback has a small to medium positive effect on the learning outcome [71]. In terms of intervention, learning analytics-based interventions improved student academic achievement, with a 10.6% higher score than blended learning without intervention [36].
Concerning course improvement, video viewing patterns, resource utilisation, course item frequencies and order of activities provide enough feedback to enhance classroom teaching and course resources [19,23,34]. Similarly, visualisation-based learning analytics allows teachers to identify which learning design elements should be revised and improved [59].

RQ1 and RQ2:
We raised the questions of how blended learning is defined and how learning theories and perspectives are used in the reviewed learning analytics research.
In line with [46], we conclude that BL seems to have become somewhat of a metaconcept. As has been detailed in the results, blended learning is often not an adoption of one pure type of blended learning, but a combination of blended learning approaches. When BL approaches are combined, there is greater complexity than in a "simple blend", which is why we propose that this can be referred to as complex hybrid learning. For example, we see combinations of blended instruction and blended distribution: i) optional adoption [e.g. 66,80], ii) synchronous teaching of f2f students and online students [6] and iii) reversed distribution [35,40]. Results show that SRL was by far the most commonly used theory. To operationalise SRL, engagement and other perspectives of learning, LMS trace data were used to collate the number of messages sent, documents uploaded and chat sessions attended, as well as data collected to analyse teacher presence [e.g. 1,54,79]. In line with [42], we conclude that operationalisations relying on LMS data risk being superficial and oversimplified interpretations of the underpinning theory. However, some of the reviewed studies also explored relations between trace data, performance and SRL using self-reports [70,87,91]. Results also revealed that certain perspectives of learning were more commonly explored than others: for example, the flipped classroom, collaborative learning and conversational aspects of learning. We thus call for innovative perspectives of learning, for example, complex or multi-modal data gathering, longitudinal studies and mixed methods approaches. In conclusion, the theoretical underpinnings of a research study (including what is meant by BL and the learning perspectives taken) are needed to increase the clarity, quality and validity of the objectives and contributions of that study, and a richer description of the actual blended learning environment approached is needed to enable comparison and transparency.

RQ3:
We also raised the question of what approaches to data collection, methods and analysis are evident in the reviewed learning analytics research. One would expect that learning analytics in a blended learning scenario would account for the fact that the context of BL integrates both modalities (physical and digital). However, most of the reviewed studies have used digital traces, dispositional self-reports, relational data and interviews that do not fully cover the gamut of possible data sources of the classroom, where a significant amount of learning happens. While predictive models have, in many cases, been able to infer future performance, they have fallen short of explaining learning or offering guidance on how to intervene in the classroom. Of course, collecting data in a blended scenario is not easy, and therefore more research is needed that collects contextually relevant data and, more importantly, on how to unobtrusively collect data in the classroom. We also recognise the benefits of visualisation as an intermediate step, although we found that visual analytics were rare [102]. Believing that eyeballing the data, that is, having access to an instant overview, might support the teacher, we propose that more research is needed on the impact of visual insights offered to stakeholders. We found that blended learning studies did not use classroom data to investigate the complexity of both the online and f2f learning setting.
RQ4:
We then explored how ethical and legal aspects are considered in the reviewed learning analytics research. In line with previous critique [e.g. 74; 84], we found that although ethical, privacy and legislative requirements exist, current practices do not always consider these. Results reveal that almost all (99%) of the reviewed studies were conducted prior to the GDPR rules in the EU. Thirteen articles reported studies from Europe, but only six articles mentioned legal aspects, having informed the students prior to data collection, or having considered anonymising the data. This raises critical concerns as, aside from the GDPR legislation, ethical considerations still need to be adhered to. This may also reflect generally slow governmental responses in regulating the consequences of digitalisation.
RQ5:
Lastly, we surveyed the contributions of the reviewed articles. The results revealed that the reviewed articles made several contributions to predicting academic performance, identifying learning behaviours, and improving learning environments. With regard to predicting academic performance, machine learning-based predictive models were shown to be effective but with low portability across courses, whereas visualisation-based methods required teacher assistance [2,77]. Moreover, the effectiveness of data variables for performance prediction depends on course structure; however, social network metrics and variables related to LMS engagement, self-regulated learning and collaborative learning were found to be significantly correlated with academic performance [86,90,102]. In terms of identifying learning behaviours, results show that impactful learning behaviours can be identified from students' participation in online activities and resource access, and that these behaviours are useful for clustering or profiling students based on the behaviours they adopt [56]. With regard to improving learning environments, results show that learning resources that assist students in completing their assignments have positive effects on learning outcomes [1,7].

Conclusion and Future Research
As BL currently seems to be a more general concept, detailed descriptions of the actual learning situation, delivery, blend or hybrid solution are needed alongside clear underpinning theories to position the research or, as proposed, an indication of whether one is approaching a "simple blend" or complex hybrid learning. We argue that, in the current wake of transforming distance learning, we see hybrid solutions that raise awareness of the complexity of multiple blended solutions in parallel which, if not described, could mean just about any kind of learning, delivery or setting. We found that data used in many learning analytics studies served as a proxy for what happens in the classroom. However, when studies do not include manifestations in the real classroom, they fall short of explaining learning or offering guidance on how to intervene in the classroom. More research is needed that accounts for the context of BL and, more importantly, on how to unobtrusively collect relevant data that enables the support of learning where it occurs. In the light of our findings on ethical and legal considerations, we strongly argue that, while there are no established traditions in LA research in terms of legal requirements, new and rigorous practices need to be developed and applied in current and future LA approaches. Ethical consequences might be devastating and the field urgently needs to acknowledge this lack of consideration.