A Novel Machine Translation Method Based on Stochastic Finite Automata Model for Spoken English

Abstract — Stochastic finite automata have been applied to a variety of fields, and machine translation is one of them. They can learn from data and build models automatically from training sets, and they are well suited to the constrained integration of pairs of sentences for language processing. In this paper, a novel machine translation method based on stochastic finite automata is proposed. The method formalizes rational grammars by means of stochastic finite automata. Given pairs of source and target utterances, the proposed method produces a series of conventional rules from which a stochastic rational grammar is inferred, and the grammar is finally converted into a finite-state automaton. The efficacy and accuracy of the proposed method are evaluated by a large number of English-Chinese and Chinese-English machine translation experiments.


Introduction
In mathematics and computer science, stochastic finite automata have been introduced to solve problems in pattern recognition, language processing, and related areas. Stochastic finite automata have become a useful mechanism for language processing because of their several advantages. A number of algorithms based on stochastic finite-state automata have been proposed and used for grammatical inference [1][2][3][4][5][6].
The most important reason for using stochastic finite machines for language translation is that they can be learned automatically from training datasets. Many regular grammars can be obtained from finite-state sets; some of them are based on formal language theory and can be built by inferring simple grammars that recognize languages [7][8][9].
In this paper, a method that uses stochastic finite automata to learn a rational grammar for language translation is proposed. The method infers a finite-state transducer that performs sentence-level and phrase-level language translation very well.
The rest of the paper is organized as follows. The basic definitions and notations are presented in Section 2. In Section 3, a novel method based on a stochastic finite automata model for spoken English translation is proposed. The experiments and analysis are given in Section 4. Finally, Section 5 is devoted to conclusions.

Stochastic finite automata
A Stochastic Finite Automaton (SFA) can be represented by a tuple A = (Q, Σ, Δ, q0, F, δ, P), where Q represents a finite set of states; q0 ∈ Q represents the initial state of A; and Σ and Δ are the source alphabet and the target alphabet, respectively.
F ⊆ Q represents the set of final states, and δ ⊆ Q × Σ* × Δ* × Q represents the set of transitions, where Σ* represents the set of finite-length strings over Σ, and Δ* is defined in the same way. An example of a transition is (q, s_i, t_i, q′), with q, q′ ∈ Q, s_i ∈ Σ*, and t_i ∈ Δ*. A translation form φ of length I in the SFA can be defined as a sequence of transitions as follows:

φ: (q0, s_1, t_1, q_1)(q_1, s_2, t_2, q_2) … (q_{I-1}, s_I, t_I, q_I),   (1)

where s = s_1 s_2 … s_I, t = t_1 t_2 … t_I, and q_I ∈ F. A pair (s, t) can be called a translation pair if and only if there is a translation form of some length I for (s, t) in the SFA.
d(s, t) represents the set of translation forms associated with the pair (s, t) in the SFA. The function P assigns a probability to each transition and a final-state probability to each state; it is the embodiment of the stochastic component. The language of A can be defined as L(A) = {x ∈ Σ* : Pr(x) > 0}. The probability of a string x is Pr(x) = Pr_{q0}(x), where Pr_q(x), the probability of a string x beginning from the state q, is obtained by multiplying the probabilities of the transitions along the paths that accept x from q. The rational translation T(A) is the set of all translation pairs of the SFA. So T is a rational translation if and only if there is an alphabet Γ, a regular language L ⊆ Γ*, and two morphisms h_Σ: Γ* → Σ* and h_Δ: Γ* → Δ* such that T can be expressed as T = {(h_Σ(w), h_Δ(w)) : w ∈ L}.
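To make the definitions concrete, the following is a minimal Python sketch of such a transducer with toy states, words, and probabilities (an illustration only, not the paper's implementation):

```python
from collections import defaultdict

class SFA:
    """Minimal stochastic finite-state transducer sketch."""
    def __init__(self, q0):
        self.q0 = q0
        self.trans = defaultdict(list)  # state -> [(src, tgt, next_state, prob)]
        self.final = {}                 # state -> final-state probability

    def add_transition(self, q, src, tgt, q_next, p):
        self.trans[q].append((src, tgt, q_next, p))

    def form_probability(self, form):
        """Probability of one translation form: the product of its
        transition probabilities times the final-state probability."""
        p, q = 1.0, self.q0
        for (src, tgt, q_next, tp) in form:
            p *= tp
            q = q_next
        return p * self.final.get(q, 0.0)

# A toy transducer translating "a" -> "x" and then "b" -> "y".
A = SFA(q0=0)
A.add_transition(0, "a", "x", 1, 1.0)
A.add_transition(1, "b", "y", 2, 0.5)
A.final[2] = 1.0
form = [("a", "x", 1, 1.0), ("b", "y", 2, 0.5)]
print(A.form_probability(form))  # 0.5
```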

Statistical machine translation
Statistical machine translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora [10].
According to the probability distribution modeled by the SFA, a string f in the source language (for example, Chinese) can be translated into a string e in the target language (for example, English). So the language translation problem becomes picking the target string that gives the highest probability. Finding the best translation can be defined as follows:

e* = argmax_e Pr(e | f),

where e* denotes the target string that is a translation of a source string f. A stochastic finite-state language transducer can be defined as a tuple T = (Q, Σ, Δ, q0, P, P_f), where the definitions of Q, Σ, Δ, and q0 are the same as those in the stochastic finite automaton; P: Q × Σ* × Δ* × Q → [0, 1] is the transition-probability function and P_f: Q → [0, 1] is the final-state-probability function, which must meet the following requirement for every state q ∈ Q:

P_f(q) + Σ_{(s′, t′, q′)} P(q, s′, t′, q′) = 1.

The transition set of the SFA is the set of tuples in Q × Σ* × Δ* × Q whose probabilities are greater than zero, and the final states are those whose final-state probabilities are nonzero.
The sum of the probabilities of all translation forms of a pair (s, t) in T can be expressed as follows:

Pr_T(s, t) = Σ_{φ ∈ d(s,t)} Pr_T(s, t, φ),

where the probability of a translation form φ = (q0, s_1, t_1, q_1) … (q_{I-1}, s_I, t_I, q_I) can be defined as follows:

Pr_T(s, t, φ) = P_f(q_I) · Π_{i=1}^{I} P(q_{i-1}, s_i, t_i, q_i).   (5)

If d(s, t) = ∅, it means that there is no translation form for (s, t) in the SFA, and the following equation can be obtained:

Pr_T(s, t) = 0.

Through the stochastic finite-state language transducer T, the translation of a source string s can be expressed as follows:

t* = argmax_{t ∈ Δ*} Pr_T(s, t).   (7)

The source and target regular languages can be denoted by L_Σ and L_Δ, respectively.
It is difficult to find an optimal solution for Equation (7); an approximate solution can be achieved on the basis of the following approximation [12]:

Pr_T(s, t) ≈ max_{φ ∈ d(s,t)} Pr_T(s, t, φ).

So the approximate translation can be expressed as:

t* ≈ argmax_{t ∈ Δ*} max_{φ ∈ d(s,t)} Pr_T(s, t, φ).   (11)

Equation (11) can be computed efficiently by solving the following Viterbi-style recurrence over input positions i and states q:

V(0, q) = 1 if q = q0, and V(0, q) = 0 otherwise;
V(i, q) = max_{(q′, s_i, t_i, q) ∈ δ} V(i−1, q′) · P(q′, s_i, t_i, q).

The final approximate translation will be expressed in the following form, which is the concatenation of the target strings along the best path:

t* ≈ t_1 t_2 … t_I.   (15)

An example of translation based on the statistical machine is shown in Figure 1. The corresponding translation with finite-state transducers is shown in Figure 2.
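The max-product recurrence can be sketched in a few lines of Python, under the simplifying assumption that every transition consumes exactly one source word (the data and state names below are hypothetical):

```python
def viterbi_translate(source, q0, transitions, final):
    """Approximate best translation via the max-product recurrence.
    transitions: dict state -> list of (src_word, tgt_word, next_state, prob)
    final: dict state -> final-state probability
    Returns (probability, list of target words)."""
    # V maps each reachable state to (best probability, target words on best path).
    V = {q0: (1.0, [])}
    for word in source:
        nxt = {}
        for q, (p, path) in V.items():
            for (src, tgt, q2, tp) in transitions.get(q, []):
                if src != word:
                    continue
                cand = (p * tp, path + [tgt])
                if q2 not in nxt or cand[0] > nxt[q2][0]:
                    nxt[q2] = cand
        V = nxt
    # Multiply in the final-state probability and keep the best path.
    best = max(((p * final.get(q, 0.0), path) for q, (p, path) in V.items()),
               default=(0.0, []))
    return best

# Toy transducer: "una habitacion" -> "a room" (or, less likely, "a bedroom").
trans = {
    0: [("una", "a", 1, 1.0)],
    1: [("habitacion", "room", 2, 0.7), ("habitacion", "bedroom", 2, 0.3)],
}
prob, target = viterbi_translate(["una", "habitacion"], 0, trans, {2: 1.0})
print(prob, target)  # 0.7 ['a', 'room']
```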

A Machine Translation Method Based on Stochastic Finite Automata
In this paper, a machine translation method based on verb choice bias is put forward; it introduces the preference of a verb for its object into the translation process, which helps the translation system improve the accuracy of selecting object candidates. We use a conditional-probability method based on stochastic finite automata to learn this verb-object preference automatically from the corpus.
We suggest the following steps for learning a stochastic finite automata transducer. First, a finite set of string pairs (s, t) is given; each string pair is transformed into a single string z over an extended alphabet Γ, which yields a set of strings S. Then a stochastic regular grammar G can be inferred from S. Next, the phrase pairs of the target verb phrase, and of the source verb and the target object, are extracted respectively. The grammar rules are then transformed back into pairs of symbol strings. Finally, based on the conditional-probability method, the choice bias and the cross-semantic choice bias of the verbs are trained, and the verb preference is added into the decoding process. The whole scheme is shown in Figure 3. The first step of the transformation process can be modeled by a labeling function L: Σ* × Δ* → Γ*, and the inverse transformation by an inverse labeling function Λ, which consists of a pair of morphisms h_Σ: Γ* → Σ* and h_Δ: Γ* → Δ*. So for a string pair (s, t), Λ(L(s, t)) = (h_Σ(z), h_Δ(z)) = (s, t). Because Λ is typically the inverse of L, h_Σ and h_Δ are determined by L in our method. So the transformation of the corpus is the most crucial step, and a simple inverse labeling mechanism must be designed.

Semantic choice bias based on conditional probability
First, the choice bias of the monolingual meaning is calculated only on the target side. It can be defined as follows: for a verb v in the corpus under a semantic relation r, the strength with which a noun n is selected as an argument of v reflects the selection bias. It can be evaluated by:

S(v, r, n) = f(v, r, n) / f(v),

where f(v) represents the frequency of occurrence of verb v in the corpus, and f(v, r, n) represents the frequency of co-occurrence of verb v and noun n under the semantic relation r.
We can map the semantic relation r onto the verb-object relation. The selection bias of a verb v for an object o can be defined as:

S(v, o) = f(v, o) / f(v),

where f(v) represents the frequency of occurrence of the verb in the corpus, and f(v, o) represents the frequency of co-occurrence of the verb and the object under the verb-object relationship.
The choice-bias eigenvalue in the translation interval (i, j) can be calculated as follows:

h(i, j) = (1/N) Σ_{k=1}^{N} S(v_k, o_k),   (18)

where N represents the number of (verb, object) phrase pairs in the current translation interval, and S(v_k, o_k) represents the choice bias of the monolingual meaning trained through the conditional-probability method.
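Estimating the verb-object selection bias from counts amounts to a conditional relative frequency. A minimal sketch in Python, using hypothetical toy co-occurrence data:

```python
from collections import Counter

def selection_bias(pairs):
    """Estimate S(v, o) = f(v, o) / f(v) from (verb, object) pairs.
    Returns a dict mapping (verb, object) -> conditional probability."""
    verb_counts = Counter(v for v, _ in pairs)
    pair_counts = Counter(pairs)
    return {(v, o): c / verb_counts[v] for (v, o), c in pair_counts.items()}

# Toy verb-object co-occurrences (hypothetical data, not from the corpus).
corpus = [("book", "room"), ("book", "room"), ("book", "flight"),
          ("open", "door")]
bias = selection_bias(corpus)
print(bias[("book", "room")])   # 2/3: "book" prefers "room" over "flight"
print(bias[("open", "door")])   # 1.0
```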

Statistical alignment
Our machine translation method is combined with the alignments between source and target words. We can define a function b: {1, …, |t|} → {0, 1, …, |s|} as the alignment of a string pair (s, t), where b(j) = 0 denotes that position j of string t does not match any position in string s. B(s, t) represents the set of all alignments between string t and string s.
Pr(t, b | s) represents the probability of translating string s into string t with a given alignment b [13].
The alignment between string s and string t can be obtained as follows:

b* = argmax_{b ∈ B(s,t)} Pr(t, b | s).

An example of Chinese-English sentence alignment is given in Figure 4. Each number in parentheses in the example denotes the position in the source string that is matched to the corresponding position of the target string. The graphical representation of the alignment is illustrated in Figure 5.

Fig. 5. Graphical representation of the alignment between a Chinese sentence and an English sentence
Another example, a training sample composed of a Chinese-English sentence pair, is shown in Figure 6. In this example, the English word "double" is matched to the second Chinese word and the English word "room" to the third Chinese word.
Given a source string s, a target string t, and an alignment b, the possible transformation can be defined as follows: each source position i is mapped to an extended word that pairs the source word with the target words aligned to it,

z_i = (s_i, t_{j1} … t_{jk}),  where b(j1) = … = b(jk) = i and j1 < … < jk.   (20)

If the order of the target string is not violated, each word from string t is joined with the corresponding word from string s by the alignment b. Otherwise, the target word is joined with the first word of the source string that does not violate the order of the target string.
When the above steps are finished, successive isolated source words (source words with no target word assigned) can be joined to the first extended word that has a target word assigned. Suppose z is the transformed string obtained from the above step, and (s_i, λ) … (s_{i+k-1}, λ)(s_{i+k}, t′) is a subsequence of z whose first k extended words carry no target words. Then this subsequence is transformed into the single extended word (s_i … s_{i+k}, t′).
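This transformation of an aligned sentence pair into a string of extended words can be sketched as follows. The sketch assumes a monotone alignment in which every target word is aligned to exactly one source position, and joins isolated source words to the next extended word that has target words assigned; the data and joining convention are illustrative simplifications, not the paper's exact procedure:

```python
def transform(source, target, alignment):
    """Build an extended-symbol string z from a sentence pair and an alignment.
    alignment[j] is the 1-based source position of target word j+1 (0 = none).
    Returns a list of (joined source words, joined target words) pairs."""
    # Attach each target word to its aligned source position.
    attached = [[] for _ in source]
    for j, tgt in enumerate(target):
        i = alignment[j]
        if i > 0:
            attached[i - 1].append(tgt)
    # Join isolated source words (no target attached) to the next
    # source word that has target words assigned.
    z, pending = [], []
    for src, tgts in zip(source, attached):
        pending.append(src)
        if tgts:
            z.append(("_".join(pending), "_".join(tgts)))
            pending = []
    if pending:  # trailing isolated words join the last extended word
        if z:
            head, tgts = z.pop()
            z.append((head + "_" + "_".join(pending), tgts))
        else:
            z.append(("_".join(pending), ""))
    return z

# Toy pair: three source words; "double" aligned to word 2, "room" to word 3.
print(transform(["w1", "w2", "w3"], ["double", "room"], [2, 3]))
# [('w1_w2', 'double'), ('w3', 'room')]
```

Here the unaligned source word w1 is merged into the extended word for w2, as described above.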

Stochastic regular grammar inference
In this paper, we use n-grams to infer the regular grammar. The state transitions inferred from the above examples are shown in Figures 8 and 9 [14]. The probabilities of the n-grams can be computed from the counts of strings in the training set. Given the sequence of words w_{i-n+1} … w_{i-1}, the probability of word w_i can be evaluated as follows:

P(w_i | w_{i-n+1} … w_{i-1}) = C(w_{i-n+1} … w_i) / C(w_{i-n+1} … w_{i-1}),   (21)

where C(·) represents the number of occurrences of a string in the training set. The n-gram model can be represented as a stochastic finite automaton: for the n-gram w_{i-n+1} … w_i, there exists a transition from state (w_{i-n+1} … w_{i-1}) to state (w_{i-n+2} … w_i) whose probability is given by Equation (21) [15]. In order to transform the grammar symbols back into a finite-state transducer, the pair of morphisms h_Σ: Γ* → Σ* and h_Δ: Γ* → Δ* is used. For a transition of the inferred grammar of the form (q, z, q′) with z ∈ Γ, the corresponding transition of the stochastic finite-state transducer is (q, h_Σ(z), h_Δ(z), q′), with the same probability.
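For n = 2, the inference step reduces to counting bigrams and normalizing by unigram counts, with one state per word. A minimal sketch with toy training strings (hypothetical data, not the paper's corpus):

```python
from collections import Counter, defaultdict

def bigram_automaton(sentences):
    """Infer a bigram stochastic finite automaton from training strings.
    States are words (plus the start state "<s>"); the transition
    probability from state u on word w is C(u w) / C(u), per Equation (21)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ["<s>"] + sent
        for u, w in zip(words, words[1:]):
            unigrams[u] += 1
            bigrams[(u, w)] += 1
    arcs = defaultdict(dict)  # state -> {next word: transition probability}
    for (u, w), c in bigrams.items():
        arcs[u][w] = c / unigrams[u]
    return arcs

data = [["a", "double", "room"], ["a", "single", "room"]]
A = bigram_automaton(data)
print(A["a"])  # {'double': 0.5, 'single': 0.5}
```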

Experiments and Analysis
The bilingual training corpus is composed of partial subsets of the LDC corpus; in total it contains about 4 million pairs of aligned Chinese and English sentences. Table 1 gives a summary of this corpus [16].
First, we use our proposed method to translate Chinese into English; an example sentence from our experiments is shown in Figure 10.
Results for the Train Set 1 and Train Set 2 corpora from Chinese-English translation are shown in Tables 2 and 3.

Conclusion
In this paper, a method based on stochastic finite automata has been proposed for inferring stochastic regular grammars. Our proposed method, which is trained from source-target sentence pairs, achieves good results in Chinese-English and English-Chinese translation. The effectiveness of the machine translation method has been evaluated with sufficient training data, and the experimental results show that our method works better than traditional machine translation methods.