An Innovative English Teaching System Based on Computer Aided Technology and Corpus Management

With the development of modern science and technology, more and more computer technologies have been successfully applied to English teaching. Drawing on computer aided technology and big data corpus management, this paper turns the traditional teaching method into an innovative teaching mode that uses a big data corpus as the English learning resource. On this basis, a computer multimedia teaching system was built to match subtitles automatically and recreate contexts vividly. The teaching system achieved excellent results in application verification. The research results can promote the use of computer technology in English teaching.


Introduction
A corpus is a large electronic text collection of a defined size, built according to linguistic principles by randomly sampling naturally occurring running texts or discourse segments. Applying corpora to English teaching gives learners rich context and supports their construction of language knowledge [1][2][3]. Using the rich context of a corpus to help learners actively construct language knowledge is consistent with constructivist teaching design, which stresses the important function of "context" in meaning construction. Corpus research has developed considerably in recent years, but applying powerful English corpora still involves some technical difficulty [4][5][6][7]. As a result, corpora are used mainly in abstract linguistic research, dictionary compilation, and the setting of questions for important examinations, and the majority of teachers lack the necessary understanding of corpora [8]. Consequently, corpora are rarely applied in computer assisted English teaching, and their potential in that field has not been well explored [9][10][11][12]. The educational technology community has also produced few studies on the value of corpora in assisting English teaching.
As computer performance keeps improving and prices keep falling, as English electronic publications and online English text resources grow richer, and as scanners become common in teaching, individual teachers or small teaching teams are now able to build their own corpora for teaching [13][14][15]. Applying corpora to English teaching can help change the traditional teacher-centered teaching mode and support the data driven learning model based on constructivism [16][17][18][19].

Basic Function of Corpus Index Software

KWIC index
The basic analysis methods for a corpus are full text retrieval and word indexing. In most corpus index software the word index takes the form of a KWIC index, that is, a "key word in context" index [20]. Different corpus index software uses different names for the key word; in Concordance 3.2 it is called the Headword. Figures 1 and 2 show the KWIC index (produced with the Make Full Concordance function) obtained from a full text retrieval of the college English intensive reading textbooks (Books 1-6) with Concordance 3.2: Figure 1 lists the key words in alphabetical order and Figure 2 lists them in frequency order.
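The idea of a KWIC index can be illustrated with a minimal Python sketch. It is not the algorithm used by Concordance 3.2; the function name `kwic`, the `width` parameter, and the sample sentences are assumptions made for illustration only.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per occurrence: left context, key word, right context."""
    lines = []
    # Match the key word as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the key words line up in one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Invented sample text, not taken from the textbooks.
sample = "She was able to sing. He is not able to come. They were able to help."
lines = kwic(sample, "able", width=15)
for line in lines:
    print(line)
```

Sorting the resulting lines by key word or by key word frequency would give views comparable in spirit to the alphabetical and frequency orders described above.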

Corpus basic parameters statistics
Concordance 3.2 includes statistical functions for the basic parameters of a corpus. Figure 3 shows the statistics for the basic properties of the college English intensive reading textbooks (Books 1-6) computed with Concordance 3.2. They include types, tokens, type-token ratio, words per sentence, and the other common corpus statistics introduced earlier.
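As a rough illustration of these parameters, the following Python sketch counts tokens and types and derives the type-token ratio and words per sentence. The tokenization and sentence splitting rules are simplifying assumptions and may differ from those of Concordance 3.2; the name `basic_stats` is hypothetical.

```python
import re

def basic_stats(text):
    # Tokens: runs of letters (with optional internal apostrophes), lowercased.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # Types: the distinct word forms among the tokens.
    types = set(tokens)
    # Sentences: naive split on ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    return {
        "tokens": len(tokens),
        "types": len(types),
        "type_token_ratio": len(types) / len(tokens) if tokens else 0.0,
        "words_per_sentence": len(tokens) / len(sentences) if sentences else 0.0,
    }

# Invented sample text: 7 tokens, 5 types, 2 sentences.
stats = basic_stats("The cat sat. The dog sat too.")
print(stats)
```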
The word length distribution function computes the distribution of word lengths in a corpus, which is an important reference for judging the difficulty and language style of the corpus text [21][22][23][24]. Figure 4 shows the word length distribution of the college English intensive reading textbooks (Books 1-6) obtained with this function. As Figure 4 shows, 235364 tokens have a word length of 4, accounting for 18.06% of the total tokens, and 39149 tokens have a word length of 9, accounting for 9% of the total tokens.
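A word length distribution of this kind can be sketched in a few lines of Python. This is only an illustration under the assumption that a token is a run of letters; the function name `word_length_distribution` is hypothetical and the sample sentence is far too short to say anything about difficulty or style.

```python
import re
from collections import Counter

def word_length_distribution(text):
    # Assumption: a token is any run of ASCII letters.
    tokens = re.findall(r"[A-Za-z]+", text)
    counts = Counter(len(t) for t in tokens)
    total = len(tokens)
    # Map each word length to (token count, percentage of all tokens).
    return {n: (c, 100.0 * c / total) for n, c in sorted(counts.items())}

dist = word_length_distribution("Dear Father, now I know why you sent me here")
for length, (count, pct) in dist.items():
    print(f"length {length}: {count} tokens ({pct:.2f}%)")
```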

Collocation statistics of key words
Most corpus retrieval software reports the frequency of the collocates of a key word along with other parameters, and some also calculates the collocation strength between the collocates and the key word [25][26][27]. Figure 5 shows the collocates of the key word able in the college English intensive reading textbooks (Books 1-6), obtained with the collocation function of Concordance 3.2 [28][29][30][31].
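Position-based collocate counting can be sketched as follows. This is a minimal illustration, not the collocation function of Concordance 3.2: it counts raw frequencies only, computes no collocation strength, and ignores sentence boundaries. The function name `collocates`, the `span` parameter, and the sample text are assumptions.

```python
import re
from collections import Counter, defaultdict

def collocates(text, keyword, span=4):
    tokens = re.findall(r"[a-z]+", text.lower())
    table = defaultdict(Counter)  # position label -> Counter of collocates
    for i, tok in enumerate(tokens):
        if tok != keyword:
            continue
        # Count the words up to `span` positions to the left and right.
        for k in range(1, span + 1):
            if i - k >= 0:
                table[f"{k}-left"][tokens[i - k]] += 1
            if i + k < len(tokens):
                table[f"{k}-right"][tokens[i + k]] += 1
    return table

# Invented sample text, not taken from the textbooks.
sample = "She was able to sing. He is not able to come. They were able to help."
table = collocates(sample, "able")
print(table["1-right"].most_common(3))
print(table["1-left"].most_common(3))
```

In this small sample, to fills the 1-right position in all three occurrences, while the 1-left position is split among was, not and were, which hints at why grouping the forms of be through lemmatization makes a pattern easier to see.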
Figure 5 shows that the collocation function reports statistics for the four word positions before and after the key word [32][33][34]. In this example, be at the 1-left position and to at the 1-right position occur far more often than the other collocates [35]. After the various forms of the verb be are lemmatized, the pattern becomes even clearer. This indicates that, in the college English intensive reading textbooks (Books 1-6), the fixed collocation "be able to" is the most frequent of the various uses of the word able and should be emphasized. Such frequency information cannot be obtained by looking up the usage of a word in an English dictionary, yet when the frequency advantage is this distinctive, it is very necessary for the construction of language knowledge and skills.

Source of Corpus

Electronic publications
As electronic publications grow richer, their text has become a convenient and efficient source of corpus for teaching, especially English learning materials, English texts, and encyclopedias in electronic text form [36].
English learning materials usually provide MP3 audio together with the corresponding texts, for example the various published VOA and BBC news English learning CDs and the electronic discs of English salons and English abstracts; text can be extracted from all of them as corpus.
Some specialized English collections provide literary works and academic articles in text form, which can serve as an important source of corpus. For example, English Classics 1000, published by Fudan University Press, offers two search methods, by title and by author, and covers famous literary and academic works of English history, which makes it a valuable source of corpus.
English news is closely tied to current affairs and daily life. It promptly reflects the changes and developments of contemporary English and represents its characteristics, so it is an important source of corpus for English teaching. English news texts can be collected from well-known English news websites and English news learning websites at home and abroad.

Film subtitles
Using film subtitles to build an audio and visual corpus for English learning has been neglected for a long time. Such a corpus can be combined with the related film and television resources and used to improve English reading, translation, and listening ability [37]. There are usually two ways to obtain subtitles [38]: one is to extract the dialogue from the script text, and the other is to extract it from the plug-in subtitle file of the film or television material.
Most film and television materials currently available online use plug-in subtitles. Plug-in subtitle files are mostly srt files, which are plain text, can be opened directly in Notepad, and are well suited as a source for an audio and visual corpus. We take the plug-in subtitles of the second part of the film The Sound of Music as an example to explain the structure of an srt file [39][40].
The example shows that each entry in an srt subtitle file consists of three parts. The first part is the number, which gives the order in which the subtitles appear, such as 1, 2, 3, 4 in the example. The second part is the time span during which the subtitle stays on screen, such as 00:00:15,975 --> 00:00:18,808 in the example, indicating that the subtitle that follows is displayed from 00:00:15,975 to 00:00:18,808. The third part is the subtitle content, such as "Dear Father, now I know why you sent me here" in the example. Subtitle files in srt form can be found on the internet, usually with matching Chinese and English versions. Many good websites offer free subtitle downloads; one comprehensive and well-known site is the shooter website (http://www.shooter.com.cn).
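The three-part structure described above can be parsed with a short Python sketch to extract subtitle text as corpus. This is a minimal illustration, not the tool used in this paper. The names `Cue` and `parse_srt` are hypothetical; in the sample, the timing and text of the first entry are the ones quoted above (pairing them with number 1 is an assumption), and the second entry is invented.

```python
import re
from typing import NamedTuple

class Cue(NamedTuple):
    number: int   # part 1: order of appearance
    start: str    # part 2: time the subtitle appears
    end: str      # part 2: time the subtitle disappears
    text: str     # part 3: subtitle content

TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")

def parse_srt(content):
    cues = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed entries
        m = TIMING.match(lines[1].strip())
        if not lines[0].strip().isdigit() or m is None:
            continue
        text = " ".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0]), m.group(1), m.group(2), text))
    return cues

sample = """1
00:00:15,975 --> 00:00:18,808
Dear Father, now I know why you sent me here

2
00:00:19,000 --> 00:00:21,500
(hypothetical second subtitle line)
"""
cues = parse_srt(sample)
corpus_text = "\n".join(c.text for c in cues)
print(corpus_text)
```

When reading real files, the encoding has to be chosen to match the file; opening with `encoding="utf-8-sig"` is one way to handle a UTF-8 file that starts with a byte order mark, which would otherwise make the first number fail the digit check.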

Scan written text
At present, most teachers have easy access to a scanner. The error rate when converting printed English text into electronic text is very low and entirely acceptable. When teachers have English books, newspapers, or other printed materials at hand that are suitable for the corpus, but the corresponding electronic text cannot be obtained from electronic or network resources because of copyright issues, scanning is the first choice.

Application of Corpus for Teaching
Once a text corpus for teaching has been collected and built with the methods above, a corpus index system combined with some other computer software tools can realize many specific functions. We take Concordance 3.2 as an example to explore some specific modes of applying a corpus to teaching.

Review vocabulary through KWIC index of intensive reading textbooks
The KWIC index of all types in the teaching material, introduced earlier and obtained by indexing the intensive reading textbooks, is a good material and tool for learners to review words. Through the KWIC index, learners can construct and strengthen language knowledge more effectively with the help of the specific context in which each word is used in the teaching material. The KWIC index can be produced in webpage form and uploaded to a learning support website as a tool that helps learners construct vocabulary knowledge.
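Producing a KWIC index in webpage form can be sketched as below. This is only an illustration and does not reproduce the web output of Concordance 3.2; the function name `kwic_to_html`, the table layout, and the sample rows are assumptions.

```python
import html

def kwic_to_html(rows, title="KWIC index"):
    """Render (left context, key word, right context) tuples as an HTML table."""
    body = []
    for left, keyword, right in rows:
        body.append(
            "<tr>"
            f'<td style="text-align:right">{html.escape(left)}</td>'
            f"<td><b>{html.escape(keyword)}</b></td>"
            f"<td>{html.escape(right)}</td>"
            "</tr>"
        )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        "<body><table>\n" + "\n".join(body) + "\n</table></body></html>\n"
    )

# Invented sample rows, not taken from the textbooks.
rows = [("She was", "able", "to sing."), ("He is not", "able", "to come & go.")]
page = kwic_to_html(rows)
print(page)
```

Escaping each field with `html.escape` keeps characters such as & or < in the corpus text from breaking the page.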

Vocabulary learning model based on the learning environment model
The vocabulary learning model built on the constructivist learning environment model of Jonathan stresses learners' active construction of vocabulary knowledge under the guidance of teachers. In the process of vocabulary knowledge construction, rich and authentic example sentences with context for one or several key words are an essential condition. Learners need to explore and reflect on the example sentences and contexts fully and reach their own conclusions. Members of a team can cooperate and discuss freely, and each group can then present its main conclusions as a unit for reference and discussion among the groups. Teachers can offer help and guidance when necessary. Conclusions can also be checked against an English dictionary, although it should be recognized that the vocabulary information in a dictionary is not always comprehensive and adequate. The significance of this learning mode lies in realizing active construction of vocabulary meaning by exploring many real contexts in which a word appears, not in whether the learners' summary is comprehensive and perfect; most discussions are open and do not aim at a standard answer. The following examples illustrate this model. Figure 6 guides students to classify the usages of round using all contexts for round (32 in total) extracted from the college English intensive reading textbooks (Books 1-6). Figure 7 deletes the key word interest and guides students through a fill-in-the-blank exercise using all contexts for interest (10 in total) extracted from the same textbooks with Concordance 3.2.
It should be pointed out that the main purpose of the fill-in-the-blank exercise is reading and inferring from context; whether learners recover the deleted key word is not important. On the one hand, the context provided by the teacher may not be sufficient to recover the key word easily, in which case a result is difficult to reach, but the learners' exploration and discussion of the context within the team already serves the purpose of study and communication. On the other hand, the answer to a blank is not unique and can be open (especially when the context has few co-occurrences); any answer is acceptable as long as it fits the grammar, logic, and context. This vocabulary learning model based on the constructivist learning environment model of Jonathan therefore stands in contrast to traditional indoctrination-style vocabulary teaching. Real corpus material, open problems, and discussion and cooperation among team members help improve learners' interest and enthusiasm. This method also provides a new idea for computer assisted English teaching.
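The fill-in-the-blank material described above, contexts with the key word deleted, can be generated with a short Python sketch. This is an illustration only, not the procedure used with Concordance 3.2; the function name `make_gap_fill` and the sample sentences are assumptions.

```python
import re

def make_gap_fill(sentences, keyword, blank="______"):
    # Replace whole-word occurrences of the key word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    items = []
    for s in sentences:
        gapped, n = pattern.subn(blank, s)
        if n:  # keep only contexts that actually contain the key word
            items.append(gapped)
    return items

# Invented sample contexts, not taken from the textbooks.
contexts = [
    "She showed great interest in music.",
    "The bank pays interest on savings.",
    "He walked home alone.",
]
items = make_gap_fill(contexts, "interest")
for i, item in enumerate(items, 1):
    print(f"{i}. {item}")
```

Because the pattern matches whole words only, inflected or derived forms such as interested are left in place; whether to blank them as well is a teaching decision.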

Assisted teaching combining English film and television material with a film and television corpus
Assisted teaching that combines English film and television material with a film and television corpus can create two vivid contexts for learners. One is the audio and visual context provided by the film and television material; the other is the textual context, which is easy to search and compare and is provided by the subtitle text and the corpus. This is in line with the constructivist teaching design principle of "stressing the important role of context for meaning construction". Film and television material used to assist English teaching needs to meet two conditions: English dialogue and English subtitles (Chinese subtitles are also desirable for reference). Many English films on DVD provide both Chinese and English subtitles. When playing film files in other formats (such as avi or rmvb), MPC (Media Player Classic) with plug-in subtitles can be used to choose between Chinese and English subtitles. Note that a plug-in subtitle file must have the same name as its corresponding film file and be placed in the same folder.
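The naming rule for plug-in subtitles can be checked with a small Python sketch before a class. It only tests for a file with the same base name in the same folder as the film file; the function name `find_plugin_subtitle` and the file name in the example are hypothetical.

```python
from pathlib import Path

def find_plugin_subtitle(video_path, extensions=(".srt",)):
    """Return the subtitle file next to the video that shares its base name, or None."""
    video = Path(video_path)
    for ext in extensions:
        # Same folder, same base name, subtitle extension instead of the video one.
        candidate = video.with_suffix(ext)
        if candidate.is_file():
            return candidate
    return None

# Hypothetical file name; prints None if no matching subtitle file exists.
print(find_plugin_subtitle("The Sound of Music.avi"))
```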

Conclusion
For a long time, the corpus, a powerful tool for foreign language learning and research, has been mainly in the hands of linguists, the question setters of major examinations, and dictionary compilers. Its potential in computer assisted teaching has not been well explored, and it remains little used. The educational technology community actively researches computer assisted teaching topics that apply widely and are not specific to any discipline, while neglecting corpus assisted English teaching, which is of great significance for teaching in a specific discipline. This paper introduces the related knowledge and theory of corpora, discusses the theoretical basis of corpus-based computer assisted English teaching, and makes a preliminary discussion of the classification of teaching corpora and of the methods, prompted by that classification, for applying corpora with different contents in English teaching. It makes a pioneering attempt to extract corpus material from film subtitle files for assisted teaching, and constructs a film corpus library containing over 110 films with English subtitles and over 1.1 million words. It is hoped that this work will serve as a trigger and draw the interest and attention of the educational technology community to corpus assisted English teaching and its great potential.