A Systematic Review of Voice-based Intelligent Virtual Agents in EFL Education

Since its debut in the field of education nearly three decades ago, Artificial Intelligence (AI) has been considered a powerful tool to facilitate new paradigms for instructional design and innovative educational practice by means of intelligent tutoring systems, adaptive learning systems, educational chatbots, and teaching robots, to name a few. Recent technological advances in the adjacent areas of natural language processing, machine learning, and computer graphics, which focus primarily on design features that improve agents’ human-like qualities of naturalness and believability as interlocutors, have also amplified new application opportunities for Intelligent Virtual Agents (IVAs) or Animated Pedagogical Agents (APAs) within the area of Intelligent Computer-Assisted Language Learning (ICALL). Although AI-powered IVAs hold the potential to improve the learning process in nearly any knowledge domain and personalize automation in teaching by embodying different roles in the learning environment, strikingly few studies have so far attempted to assess empirically IVAs’ impact on L2 learners’ academic achievement when learning English as a Foreign Language (EFL). The present article addresses this issue via a systematic review of relevant interventionist IVA studies that were conducted in EFL settings and published within the 2015–2020 timeframe, examining IVAs’ key affordances, major barriers to their adoption for language learning purposes, and the CALL research trends currently prevalent on the topic. Pedagogical implications for the effective implementation of IVA technology in L2 contexts are discussed and future research avenues in the area are highlighted.


Introduction
The advent of Artificial Intelligence (AI), as an emerging technology in education, has been acknowledged to hold the potential to revolutionize educational practice and foster a transformation of traditional educational systems via task automation and mechanization [32]. This has been manifested in the gradual proliferation of various applications intended to provide customized learning [33], [46], [21], offer dynamic assessments, and facilitate meaningful interactions in online or blended learning experiences [74]. This is also reflected in Baker et al.'s [4] recent classification of educational AI (AIEd) tools into learner-facing systems, used by learners to learn a subject matter; teacher-facing systems, used by teachers to reduce their workload and obtain insight into learners' progress so as to offer feedback and guidance proactively and promptly; and system-facing systems, which provide information for administrators and managers at the institutional level. Intelligent tutoring systems (ITS), virtual agents, and intelligent virtual reality have been cited as the most commonly available AI-based learning systems currently in use to support personalized and collaborative modes of learning [45]. This trend is also prevalent in ICALL contexts where, until recently, a substantial amount of extant empirical research has been dedicated to investigating the pedagogical expediency of chatbot- [20], [35] and ITS-assisted approaches [13] in foreign language learning, to the exclusion of Intelligent Virtual Agents (IVAs).
The development of IVAs materializes the notion of human-like thinking in technological terms [61], with IVAs performing various tasks, and facing several challenges, in diverse sectors of human life; yet IVAs have often been defined rather inconsistently by experts in computer engineering. Graesser & McNamara [27] describe IVAs as either scripted or intelligently adaptive learning environments with talking heads and facial expressions that speak, point, gesture, and instruct the learner on what to do [37] as experts, mentors [40], tutors [54], and learning companions [24]. In Rickel et al. [59], IVAs act as autonomous agents, immersed in a virtual world, capable of face-to-face interaction, and equipped with a human face and lip motions synchronized with dialog. Humanoid embodiment and IVAs' cognitive capabilities are highlighted by Traum [69] as key traits of IVAs that enable users' engagement in human activities and meaningful interaction. This contradicts Burden and Savin-Baden's [9] most recent definition, where a humanoid body is not considered an essential characteristic of IVAs "as long as they exhibit human-like behaviors (speech, gesture, and movement) and other human characteristics (emotions, empathy, reasoning, planning, motivation) and can be seen on a computer screen, heard through a speaker, or accessed in some other way" [9, p.13]. The all-encompassing nature of this definition suits the purposes of this review as it seeks to address the pedagogical potential of voice-based, general-use IVAs (Amazon Alexa, Siri, Google Assistant) in L2 learning practice. Based on the empirical research evidence available in relevant interventionist IVA studies in EFL settings, this review critically assesses the role of AI in the EFL classroom by exploring IVA integration as an emerging trend within L2 teaching and learning practice, guided by the following research questions:
RQ#1. What are the pedagogical affordances of IVAs in L2 educational settings?
RQ#2. What are the barriers inhibiting the adoption of IVAs for EFL learning?
RQ#3. What are the current research trends followed with respect to IVA use in the EFL context?
The review is organized as follows: Section 2 focuses on the nature of IVA technology as distinguished from related AI-supported learning systems in the field of education, followed by an outline of the study's research methodology in Section 3. Review results are discussed per research question in Section 4 and are further summarized in Section 5 along with implications and future research avenues for EFL practitioners and CALL instructional designers, while concluding remarks appear in the last section.

Types of AI-based virtual agents
When attempting to define IVAs, it is useful to examine the different traits that could be used to differentiate them from other forms of computerized systems that may either show intelligence or are based on some human characteristics. As illustrated by previous research on AIEd technology [5], [34], this becomes a necessity as the boundaries between what is an IVA and what is not are rather fuzzy, and AIEd is often applied indiscriminately to denote a wide spectrum of AI entities such as chatbots, intelligent tutoring systems (ITSs) or teachable agents, embodied conversational agents (ECAs), and tutors. Although similar to IVAs in design and philosophy, these AI-supported systems differ technologically, in terms of the type of intelligence and the humanoid features they possess to effectively sustain dialogue-based CALL, which leads to more positive social interactions and thus allows learners to practice meaningful L2 conversations autonomously [6], as illustrated below.
Chatbots, also known as bots or interactive agents, are computer applications that respond like smart entities when conversed with through text or voice [39] due to embedded Natural Language Processing (NLP) technology [2], offering wide applications in education, health care, and marketing [1] as platform-independent tools that are instantly available to users without requiring installation. When a chatbot is able to communicate with a user, it acts as an advisor and a companion with whom the user can build a long-term relationship [57], as is the case with Wysa, an AI-based commercial smartphone application with which the user can chat by text to reduce stress, without exploiting non-verbal behaviors. In the field of CALL, chatbot technology has been found to be associated in practice with an increase in L2 learning motivation [44], an improvement in overall L2 proficiency [14], as well as minor facilitation of specific language skills such as lexical inferencing [36], while offering ample conversation opportunities for practice [68]. On the other hand, intelligent tutoring systems (ITSs) or teachable agents [48] are computer learning systems intended to detect complex principles of learning and to help learners acquire declarative knowledge and procedural skills based on powerful intelligent algorithms adjusted to learners' educational needs [25]. ITSs work with one student at a time. They track learners' knowledge, skills, idiosyncratic profiles of cognitive and affective attributes, as well as other psychological traits [62], and adaptively respond by utilizing computational mechanisms in artificial intelligence and cognitive science [26]. Although cognitively intelligent, both chatbots and ITSs are clearly differentiated from IVAs by their absence of human-like traits, and as such they are examples of the Artificial Narrow Intelligence (ANI) stage (Figure 1).
Embodied conversational agents (ECAs), defined in 2000 by Justine Cassell [10], replaced the textual interaction of chatbots and the voice-only interaction of intelligent voice assistants with a more natural interaction, combining verbal and non-verbal communication; their presence seems to improve the user's interaction with this AI system. ECAs are, in fact, endowed with a humanoid body, capable of exploiting human-specific communication modalities, such as voice and facial expression, gaze, gestures, head movements, posture, and displacement [7]. They represent a more intuitive interface between the user and the computer system, and their level of complexity can vary greatly depending on the context and applications in which they are deployed. Still mainly present in the world of ASI research, they are often used to validate psychological theories of human behavior by concretizing them through computer models that control and generate the behavior of the virtual agent [66]. Tutors also form a sub-class of embodied agents exhibiting an increased level of complexity; they are used mainly for educational purposes, provided that they have a knowledge base in a certain field as well as the ability to help the learner by providing easy access to information and boosting motivation.
Intelligent Virtual Assistants (IVAs), or intelligent personal assistants (IPAs), can be distinguished from other AI expert systems in that they are restricted to user assistance. They represent autonomous entities capable of perceiving their environment and acting on it to achieve a goal [60] by interacting with humans mainly through a synthetic voice and assisting users to perform generic tasks to improve their daily lives [19]. To understand the user, they use automatic natural language processing to match the user's text with executable commands [67], while many IVAs learn continuously using artificial intelligence (AI) techniques, including machine learning. A virtual assistant (embodied or not) differs from a virtual companion as, contrary to the latter, it is programmed neither to hold long conversations nor to create relationships. However, the virtual assistant can exhibit characteristics of the virtual companion, such as a personality, to better perform its tasks, offer a more convincing experience, and imitate human relationships [56]. The explicit goal of a virtual assistant is to ensure natural and inherently cognitive, linguistic, and collaborative interactions with the user, undertaken in a fluid manner akin to meaningful and mutually reciprocal human communication. Amazon's Alexa, Apple's Siri, and Google Assistant are the three predominant cloud-based, general-use IVAs in the world of artificial intelligence, widely used in mobile and stationary devices alike for swift and convenient two-way communication with users through hands-free control and verbal responses [17]. These assist in completing basic tasks such as consulting the weather forecast, checking the latest news, and setting reminders [71]. Moreover, personal voice assistants' search results are more relevant to the user in terms of language and location [18]. Following the work by Burden & Savin-Baden [9], the main characteristics of IVAs are as presented below [9, p.14]:
• Manifests itself in a visual, auditory, textual, or similar form,
• May have some embodiment within a virtual world,
• May present itself as humanoid in manifestation and behavior,
• Will have a natural language capability,
• Must exhibit a degree of autonomy,
• May have the ability to express, recognize and respond to emotions,
• May exhibit some aspects of a personality,
• May have some ability to reason in a human-like way,
• May exhibit some elements of imagination, and
• May even have a self-narrative, but unlikely to have any indications of sentience.
Giles and Bevacqua [22] add the physical medium to this list, i.e., the agent can "inhabit" a single device or migrate into several devices.
Based on the type of intelligence (cognitive, emotional, and social) displayed, IVAs can further be classified into analytical, human-inspired, and humanized, representing different stages in the evolution of AI-supported IVA technology as suggested by Kaplan & Haenlein [38]. Expert Virtual Assistants belong to first-generation AI systems and have no intelligence beyond the information programmed into them. The program uses an "If-Then" algorithm to complete tasks and can only answer certain questions [38, p. 18]. When interacting with users, they only offer installed alternatives to questions without any additional clarifying information, while emotion recognition and customer interaction through word search are also not feasible options. Compared to Expert Virtual Assistants, Analytical Virtual Assistants use cognitive intelligence. They analyze past experiences to make future decisions and generate a cognitive representation similar to the real world. In turn, Human-Inspired Virtual Assistants differ from Analytical ones in that they also possess emotional intelligence; this greatly affects their decision-making processes. They are mostly human-like characters, acting as attentive listeners and effective interlocutors and displaying life-like behaviors such as speech, locomotion, gestures, and facial expressions (i.e., in line with Burden & Savin-Baden's [9] characteristics outlined above). They express reactions and hold a conversation based on the emotions of a person, and can also simulate human emotions [12]. Humanized Virtual Assistants are still evolving and are considered the most sophisticated, combining cognitive, emotional, and social intelligence in one. In the near future, they are envisaged as the main tool to hold and analyze considerable amounts of past experience that will enhance interaction with people, and probably with other assistants, without human effort [47], while at the same time exhibiting additional features of self-consciousness [50]. In contrast to other AI software applications, the innovative element of IVAs in education hinges on two fundamental aspects that critically affect how they offer robust scaffolding and readily provide additional help to learners with advanced technical voice tools [51], i.e., (i) using underlying deep learning technology to recognize learners' utterances without necessitating self-generated training data [43] and (ii) integrating IVAs in devices (e.g., Google Assistant and Apple's Siri on smartphones, and Microsoft's Cortana on desktop PCs), ensuring ease of access mostly through clicking or giving spoken commands, which can render IVAs daily aides in everyday affairs. To what extent the potential of IVA implementation, as an emerging technology, can be extended and maximized in the educational practice of L2 learning remains to be examined within the realm of this review study.
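The "If-Then" behaviour attributed above to Expert Virtual Assistants can be illustrated with a minimal Python sketch. The rule table, the function name `expert_assistant_reply`, and the fallback message are hypothetical and are not taken from [38] or from any commercial IVA; the sketch only assumes that such a system returns pre-installed answers and cannot clarify, learn, or recognize emotion.

```python
# Hypothetical "installed alternatives": trigger phrase -> canned answer.
RULES = {
    "opening hours": "The library is open from 9 am to 5 pm.",
    "homework": "Homework is due every Friday.",
}

# Returned when no rule matches; the system cannot ask for clarification.
FALLBACK = "Sorry, I cannot answer that question."


def expert_assistant_reply(utterance: str) -> str:
    """Return the first canned answer whose trigger appears in the utterance."""
    text = utterance.lower()
    for trigger, answer in RULES.items():
        if trigger in text:   # IF the trigger phrase is present ...
            return answer     # ... THEN give the installed answer.
    return FALLBACK           # No rule fired, so only the fallback is left.
```

Under these assumptions, `expert_assistant_reply("When is homework due?")` returns the stored homework answer, whereas any request outside the rule table receives the same fallback, which is the limitation that the analytical, human-inspired, and humanized stages are meant to overcome.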

Methodology for the review
This study adopts a systematic approach to reviewing the literature on intelligent virtual agents to obtain comprehensive insights into the state-of-the-art [41].
The scientific method of systematic research is considered to be superior to conventional literature reviews as it enhances consistency, replicability, reliability, and validity [73] and reduces redundancies in the published literature, allowing disparities and trends for future studies to be identified. The review process followed here is conducted within the 'Preferred Reporting Items for Systematic Reviews and Meta-Analyses' (PRISMA) framework and can be divided into three concrete phases [42]:
Phase 1 (Planning) includes journal selection, delineation of inclusion and exclusion criteria for study selection, and definition of the categories utilized in the analysis.
Phase 2 (Conducting the review) involves study selection, data extraction, synthesis, and the coding scheme.
Phase 3 (Reporting the review) consists of result analysis and discussion of main results, tendencies, implications, and conclusions.
To further elucidate the procedure followed, the following stages were deemed necessary to consider:

Search strategy
This study reviewed only relevant work published in English on IVAs within the timeframe spanning from 2015 up until 2022. Scopus, ScienceDirect, Google Scholar, and CrossRef were the primary databases used to serve the purposes of our study and were searched separately. Highly esteemed international peer-reviewed journals in the fields of educational technology and CALL, including Computers & Education, Computers in Human Behavior, Computers & Education: Artificial Intelligence, ReCALL, CALICO, Computer-Assisted Language Learning, JALT CALL Journal, and Language Learning & Technology, were also manually searched. The keywords used in this review that helped the authors determine the scope and nature of virtual assistants in relation to education and training are: 'virtual humans', 'intelligent virtual assistants AND education/training', 'intelligent virtual agents AND education/training', and 'pedagogical agents'.
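As an illustration of how the keywords above might be turned into database query strings, the following Python sketch quotes each phrase and expands 'education/training' into a Boolean OR group. This expansion, the helper name `build_query`, and the quoting convention are assumptions made for illustration; the review does not report the exact query syntax submitted to each database.

```python
# Keywords exactly as listed in the search strategy.
KEYWORDS = [
    "virtual humans",
    "intelligent virtual assistants AND education/training",
    "intelligent virtual agents AND education/training",
    "pedagogical agents",
]


def build_query(keyword: str) -> str:
    """Quote the phrase; expand 'a/b' after AND into '(a OR b)' (assumed reading)."""
    if " AND " not in keyword:
        return f'"{keyword}"'
    phrase, context = keyword.split(" AND ", 1)
    alternatives = " OR ".join(part.strip() for part in context.split("/"))
    return f'"{phrase}" AND ({alternatives})'


# One query string per keyword, ready to paste into a search field.
QUERIES = [build_query(keyword) for keyword in KEYWORDS]
```

Keeping the keyword list separate from the formatting helper would make it straightforward to adapt the same terms to databases that use a different Boolean syntax.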

Selection criteria
To answer the research questions, a set of selection and quality criteria was determined to enable us to identify relevant empirical studies on the topic of IVAs in EFL education. The selection of the reviewed studies considered the following inclusion and exclusion criteria:

Study quality assessment
Following the application of the inclusion and exclusion criteria, a checklist for the evaluation of the quality of the chosen articles from a methodological-design perspective was completed. Thus, emphasis was placed on empirically grounded analytical studies, which are considered to be the most accurate forms of experimental research to support or refute a hypothesis [58].
The educational potential of the reviewed studies was put under the lens based on the following aspects:

Data collection and data analysis
The publications that satisfied the aforementioned inclusion criteria were further categorized after considering past comparable systematic research [29], [31], [55] on the use of AI technology in education. Each one of the reviewed empirical studies served as the unit of analysis, while the coding system applied for data extraction arose inductively and was continuously improved through our interaction with the data [8]. Following the PRISMA principles, the literature search and selection process are presented in Figure 2 [49]. Ten (10) studies were found to be eligible for this review after duplicates were removed, abstracts were examined, and full-text papers were reviewed.
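The selection flow described above (duplicate removal followed by eligibility screening) can be sketched in Python as follows. The `Record` fields, the eligibility rule, and the toy records are hypothetical: they loosely mirror criteria mentioned in this review (English-language, 2015 to 2022, empirical, EFL setting) but do not reproduce the authors' actual criteria table or the records they screened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    title: str
    year: int
    language: str
    empirical: bool
    efl_setting: bool


def deduplicate(records):
    """Keep the first record for each normalized (trimmed, lower-cased) title."""
    seen, unique = set(), []
    for record in records:
        key = record.title.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def is_eligible(record, first_year=2015, last_year=2022):
    """Assumed inclusion rule: English, within the timeframe, empirical, EFL."""
    return (record.language == "English"
            and first_year <= record.year <= last_year
            and record.empirical
            and record.efl_setting)


def screen(records):
    """Return the counts a PRISMA-style flow diagram would report."""
    unique = deduplicate(records)
    included = [record for record in unique if is_eligible(record)]
    return {"identified": len(records),
            "after_duplicates_removed": len(unique),
            "included": len(included)}


# Toy records invented purely to exercise the functions.
TOY_RECORDS = [
    Record("Toy study A", 2019, "English", True, True),
    Record("toy study a ", 2019, "English", True, True),   # duplicate title
    Record("Toy study B", 2013, "English", True, True),    # outside timeframe
    Record("Toy study C", 2020, "English", False, True),   # not empirical
]
```

In this toy run `screen(TOY_RECORDS)` reports four records identified, three after duplicate removal, and one included; the real counts for this review are those given in Figure 2.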

Profile of selected studies
Prior work on the use of IVAs for L2 learning is scant and limited to a few ground-breaking, small-scale studies that have only recently spurred a renewed interest in the topic over the last five years. Table 1 displays a summary of all 10 studies considered in this review. All of the papers reported IVA implementation within a formal instructional EFL context, with none being used in the digital wild. IVA application in the EFL classroom was explored in relation to its impact on different aspects of L2 language learning: L2 speaking and listening skills (7 studies), L2 learners' perceptions of IVA use (2 studies), and willingness to communicate (1 study). Results will be presented in this section to answer each of the initial research questions of this review.

Pedagogical affordances of IVAs in EFL
Underwood [70] is one of the earliest studies on the topic, exploring the use of multiple voice-based IVAs (Alexa, Siri, and Google Assistant) with 11 elementary school EFL learners over a period of nine months, using a teacher-led design research study. Co-design strategies were further employed to encourage children to reflect on their IVA-enhanced L2 learning experiences and help them develop and express their own ideas about what AI language assistants might look like and how they might be used. Key findings of the study revealed that: (i) L2 learners' interactions with IVAs led to more meaningful L2 English exchanges and fun overall, even when miscommunication occurred because a virtual assistant failed to understand a particular command. Instead of giving up, the learners persisted and tended to rephrase their questions in ways more likely to be understood and answered by IVAs; (ii) although L2 learners faced some difficulty in understanding IVA responses due to their fast speech rate, they reported having benefited most from their interactions with IVAs when aural input could also be displayed visually (e.g., when using Siri and Google Assistant on smartphones and smart speakers with built-in displays).
In a follow-up study grounded in the interactionist approach to SLA and conducted within the tertiary education EFL context, Dizon [16] explored the potential of Amazon Alexa Echo Dot to support L2 listening and speaking skills for 37 undergraduate Japanese EFL learners at beginner to intermediate level, in the first and second year of their studies, who took the same elective English course to improve their communication skills through conversation, discussion, and presentation. Learners were initially surveyed about their past experience with smart speakers and responded that they had never used one prior to their participation in the study. In its quasi-experimental design, the experimental group received a 10-week treatment of student-IVA interaction with Alexa, either individually or in pairs. Results showed that the treatment group significantly improved their L2 speaking proficiency, but not their L2 listening comprehension, as there was no significant difference between the control and experimental groups in this respect. This result was attributed either to EFL learners' inability to fully comprehend Alexa's responses, as IVAs fail to modify their output to promote enhanced L2 comprehension, or to learners' focus on speaking practice during interactions, which may have kept them from paying close attention to Alexa's responses. Aligned with Underwood's [70] findings above, EFL learners' views of Alexa for in-class L2 learning were equally very positive, indicating that they not only enjoyed using the IVA for L2 learning but also perceived it to be a practical tool for learning English that could be utilized either for personalized or collaborative study. This finding correlates with Dizon and Tang's [15] mixed-method case design study on IVA use for self-directed, out-of-class language learning, where EFL Japanese learners also perceived Alexa and the Echo Dot smart speaker as a pedagogically effective tool with potential for vocabulary acquisition and meaningful interaction in an L2. Hsu et al.'s [30] experimental study examined the impact of Amazon Alexa Echo Show on the listening and speaking skills of 50 L2 Taiwanese college learners, along with their perceptions of Alexa's use for language learning purposes. The experimental group received seven Alexa sessions, while all participants were asked to take pre- and post-mock TOEIC listening and speaking tests and complete a survey questionnaire. Results replicate Dizon's [16] findings in the experimental group, demonstrating a significant effect of IVA use on L2 learners' speaking but not on their listening ability. The speaking gain was mainly attributed to the opportunities that meaningful learner-IVA interactions gave intermediate-level L2 learners to brainstorm on meaning, receive feedback, notice errors, and modify their language. In relation to L2 learners' perceptions of using IVAs in the EFL class, findings concur with Dizon [16], Dizon and Tang [15], and Underwood [70], with learners highlighting Alexa's usefulness in the development of L2 speaking skills for specific purposes (e.g., presentation skills). The same positive disposition is also reported in Moussalli and Cardoso's [52] small-scale study, where Amazon Echo was considered a user-friendly, enjoyable, and helpful pedagogical tool for language learning, providing opportunities for input exposure and output practice while motivating learners to learn on their own. This is also highlighted in Moussalli and Cardoso's [53] follow-up study, which assessed Alexa's ability to recognize and process non-native accented speech based on the accuracy of the IVA's answers to pre-set questions. Results indicated that L2 learners with differing levels of linguistic proficiency faced no significant intelligibility issues overall in their communication with Alexa, as it could easily understand and accommodate accented speech, effectively detect pronunciation and lexical issues, and promptly provide learners with implicit feedback, prompting them to detect erroneous forms in their production of the target language.
Chen et al. [11] examined the effect of language proficiency on 29 L2 Taiwanese college learners' perceptions of using Google Assistant (GA) when interacting with it for EFL learning purposes. In line with the findings reported by Dizon [16], Moussalli & Cardoso [52], [53], and Underwood [70], analysis of the data revealed L2 learners' overall favorable viewpoints toward the use of GA, which piqued their interest in considering ways to develop their vocabulary and oral skills. The perceived pedagogical utility of GA was heavily influenced by the degree of mutual comprehensibility that L2 learners with different language proficiency levels achieved with GA. Contrary to Moussalli and Cardoso [53], results demonstrated that learners with higher language proficiency tended to benefit more from their GA interactions, as they considered themselves better understood by the IVA than low-level learners, who faced more challenges largely due to mispronouncing particular words. As in Hsu et al. [30], interacting with GA was found to be useful for intermediate and upper-intermediate L2 learners, enabling them to identify their pronunciation errors or mistakes and offering more exposure to authentic conversational exchanges. The same finding emerged in Gonulal's [23] study, where intermediate to upper-intermediate EFL Turkish learners also seemed to have benefitted immensely from their interactions with an IVA in terms of L2 oral fluency and vocabulary acquisition.
GA has also been targeted in two related studies that examined its impact on EFL learners' willingness to communicate (WTC) [65] and on learners' oral proficiency outside the classroom [64]. In line with [53], high-school L2 learners in Taiwan with low WTC in the former study showed more willingness to interact in English during Google-Assistant-language-learning (GALL) activities, were more confident interacting with other learners in English, and asked for help, stating that interaction within the less threatening environment provided by GA helped them develop L2 fluency by lowering their levels of speaking anxiety. The effect of GA, built into the smartphones of 89 Chinese college EFL learners, on out-of-class oral proficiency was also studied by [64] via self-directed interactive activities over a period of one semester. Findings indicated that out-of-class use of GA significantly enhanced EFL learners' oral proficiency in terms of (i) fluency, promoted within GA's anxiety-free and interactive environment, in parallel to [53] and [65]; (ii) content and vocabulary, supplemented by L1 and L2 learning support and multimodal feedback provided via audio, text, and visual aids; (iii) pronunciation, due to the opportunities provided by GA for L2 learners to practice speaking accompanied by multimodal presentation of feedback, akin to the findings of [16], [52], and [53]; and (iv) use of simple grammatical structures. No significant improvement was found for high-proficiency EFL learners' L2 speaking skills due to the simplistic, non-challenging nature of the dialogue content used for practice between learners and the IVA.

Barriers to IVA implementation in EFL
Common barriers that hinder the effective integration of IVAs in the L2 learning process can be classified into three distinct categories based on the relevant reviewed studies:
(a) Communication breakdowns in IVA-learner interactions. L2 learners most often reported mutual comprehensibility issues that hindered communication with the IVA, thus distracting them from the learning process [64] and leading to abandonment [15]. Technological issues such as fast rate of speech [70], [16], the advanced level of vocabulary [16], pronunciation errors [54], [11], mispronunciations and late responses [30], and inaccurate voice recognition [70], [52], [64], which often results in inappropriate search results when learners speak simultaneously, were identified as the most common causes of miscommunication in learner-IVA interactions in EFL settings. Such issues were primarily found to be associated with pronunciation errors, pauses in speech, wrong sentence structure, and stuttering, which prevent the virtual assistant from understanding learners; learners determined to overcome these difficulties resort to repetition, rephrase a command, or pronounce words differently [54], [11].
(b) IVAs' inability to imitate human-human interaction. Although IVAs are presented as a useful, easy-to-use, and convenient AI tool to promote speaking-listening skills in the L2 context, their limited linguistic abilities [65], in terms of simplistic modified output as evidenced in IVAs' mechanical [11] and often irrelevant responses [65], negatively impact L2 learners' opportunity to practice their L2 speaking skills extensively, individually, and on a less threatening basis, given the absence of corrective feedback to enhance their efforts. GA's limited capacity to provide unique humorous responses to users' utterances or commands has also been reported to significantly downgrade EFL learner-IVA interactions, leading to 'humor fatigue' over the IVA's canned responses to joke requests [23].
(c) Technical issues, mainly related to Wi-Fi or connectivity [15], internet speed, and problems with the Automatic Speech Recognition technology embedded in IVA systems [65].

Research trends in IVA use in EFL
The mixed methods research design predominated, being employed in 6 of the 10 L2 studies on IVAs reviewed here (Table 1); these relied on survey questionnaires and semi-structured interviews for data collection and involved samples of between 11 and 122 L2 learners with low-level English language proficiency. The experimental method was also used in two studies [16], [30], involving the administration of pre- and post-intervention L2 speaking and listening tests to examine differences in performance between experimental and control groups before and after the implementation of the IVA intervention in the L2 learning process. Finally, small-scale case studies were undertaken in two instances, involving a small number of participants and deploying either qualitative [70] or mixed data-gathering approaches [52].

Discussion
This review identifies the pedagogical benefits and challenges associated with the introduction of IVA technology in FL learning and teaching practice and maps out current research trends in the field. With respect to our first research question, using IVAs was found to engage L2 learners in meaningful and joyful L2 English language interactions and to improve their L2 listening skills both inside and outside the classroom, thereby incrementally leading to increased autonomous learning. Learners were favorably inclined toward the use of IVAs for self-directed FL learning: IVAs were considered entertaining systems that are easy to set up and use, provide realistic contexts for human-machine interaction, and adapt to learners' language learning needs (e.g., pronunciation issues) in a less threatening environment, with learners exhibiting greater levels of participation, enthusiasm, and confidence, as well as willingness to take risks when engaging in L2 conversational exchanges. With respect to the second research question, challenges to the effective integration of voice-based IVAs in L2 education are mainly linked to the quality of learner-IVA interactions [16]. This was reflected in communication breakdowns attributed mainly to the inadequacies of IVAs' embedded Automatic Speech Recognition technology, as well as to their limited linguistic abilities, which preclude the provision of rich modified output to promote meaningful interaction and, by extension, L2 learning. Key findings for the third research question indicated that the mixed methods research design was the predominant methodology in the field, followed by the experimental design, with studies using small samples of varied linguistic ability in both instructional and informal language learning contexts. However, as uncovered by our analysis, IVA integration in the L2 education field is still relatively under-explored, suggesting the need for an interdisciplinary orientation in future research that will focus on IVA instructional design features and the underlying pedagogy for their deployment in EFL contexts [63]. Such research could be directed toward the following areas:
• the development of new techniques to enhance the design and encoding of IVAs' databases of responses to natural language inputs, as well as the increasing use of automated strategies for the acquisition and construction of such databases using advanced technologies (e.g., Neural Networks), as a way to ensure meaningful and authentic L2 learner-IVA interactions [3]. Humanoid intelligent agents or 'Holographic AIs', heralding future advances in Augmented Reality (AR), have most recently emerged as a possible alternative to disembodied vocal IVAs, with a potential pedagogical expediency that needs to be further substantiated [28].
• the investigation of the extent to which L2 learners' cognitive and affective characteristics and contextual factors can influence active engagement and long-term gains in L2 learning with the aid of IVAs.
• a multidisciplinary, theory-driven exploration of L2 learner-IVA interaction as a reciprocal, effective, and meaningful communication process leading to significant learning gains, based on research in the areas of learning theory, psychology, and instructional communication.
• more longitudinal IVA intervention studies to evaluate their long-term effects on L2 learning-related aspects in terms of technical feasibility and pedagogical expediency, involving focus group discussions to assess whether users' learning needs and expectations are met [72].
Due to the nature of the review, selection, and filtering process, the following limitations can be associated with this systematic review: (i) pertinent empirical research (e.g., book chapters, conference proceedings) is likely to have been missed despite our thorough literature search, although limiting our attention to high-quality publications was intended to ensure that all included studies had been subjected to a rigorous peer review process; (ii) the reviewed studies were limited to those that contained the term 'intelligent virtual agent' as a descriptor in their title, abstract, summary, or keyword list and were written mainly in English, so related research reported in other languages was excluded; (iii) manual searches of certain international peer-reviewed CALL journals may have led to the omission or mistaken rejection of relevant articles; and (iv) only ten empirical IVA studies in L2 learning were identified, which calls for caution when interpreting their results for EFL practice.

Conclusion
The aim of this review was to summarize current L2 interventionist studies on general-purpose voice-based IVAs, addressing their pedagogical affordances and the barriers inhibiting their effective integration in L2 educational practice. Evidence underscores the added pedagogical value of newly emergent IVA technology in the field of EFL learning, promoting authentic interaction in the target language through increases in motivation and perceived novelty. Yet, as education enters the fourth industrial revolution era, substantial research effort needs to be devoted to voice-based IVA instructional design and implementation in FL contexts for the improvement of specific L2 skills, paving the way for eXtended reality (XR) language learning with the introduction of AR-based embodied virtual assistants.

Fig. 1. Evolution of AI technology (Adapted from: Kaplan & Haenlein, 2019)
(a) the instructional system design and research methods employed in the studies to measure the effective implementation of IVAs within EFL contexts in terms of successful language learning outcomes.
(b) their scientific contribution with respect to the technological development and effective integration of AI-powered IVAs in EFL settings.
(c) the impact of IVA use on EFL learners' academic performance, engagement, and motivation when embedded to support different teaching approaches.
iJET - Vol. 18, No. 10, 2023

Fig. 2. The PRISMA process for literature search and selection