Parallel AES Encryption Engine for Many-Core Processor Arrays Using Masked S-Box

Abstract: With the ever-increasing growth of data communication, hardware encryption is becoming an indispensable security technology. In this paper, I present an implementation of the Advanced Encryption Standard (AES) encryption and decryption algorithm with a 128-bit key on an FPGA. To protect "data-at-rest" in memory against differential power analysis (DPA) attacks, a high-throughput AES engine with a masked S-Box is proposed. By exploring different granularities of data-level and task-level parallelism, we map two implementations of an AES cipher with online key expansion onto a fine-grained many-core system.


I. INTRODUCTION
With the development of information technology, protecting information through encryption has become very important in day-to-day life. In 2001, the National Institute of Standards and Technology (NIST) replaced the Data Encryption Standard (DES) by selecting the Rijndael algorithm as the Advanced Encryption Standard (AES) [1]. AES is used in many applications, such as secure communication systems, digital video/audio recorders, RFID tags and smart cards. One of the main advantages of the Rijndael algorithm is that it is well suited to both hardware and software implementations.
To serve these applications, numerous high-throughput hardware implementations of AES have been reported, although such designs can be time-consuming and costly to develop. One of the main blocks of AES is the SubBytes transformation [1], which is commonly implemented with an S-Box look-up table stored in memory. Data handled this way is at risk of information leakage in embedded applications. The differential power analysis (DPA) attack [2], which exploits the dependence of a device's power consumption on the data it processes, has become one of the most effective power analysis attacks, so protecting data from DPA is very important. For this reason, a masked S-Box is implemented instead of an S-Box look-up table. The masked S-Box is computed mainly over GF(2⁴): the input values only need to be transformed from GF(2⁸) to GF(2⁴), and the output values transformed back from GF(2⁴) to GF(2⁸), which reduces the hardware resources required.
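Why the S-Box is the part that needs dedicated masking can be checked numerically. Assuming additive (XOR) masking, the most common scheme (the paper does not spell out its masking scheme), a linear step f satisfies f(a ⊕ m) = f(a) ⊕ f(m), so the mask can be corrected afterwards by applying f to the mask alone; the GF(2⁸) inversion inside SubBytes has no such property. A minimal Python sketch; the helper names are illustrative only:

```python
# Sketch: which AES operations let an XOR mask pass straight through?
# Field: GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11B).
# Helper names (xtime, gf_mul, INV, is_xor_linear) are illustrative only.


def xtime(a: int) -> int:
    """Multiply byte a by {02} in GF(2^8) (the core of MixColumns)."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B  # reduce modulo the AES polynomial
    return a


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


# Multiplicative inverse table by brute force; 0 maps to 0 as in the AES S-Box.
INV = [0] * 256
for x in range(1, 256):
    INV[x] = next(y for y in range(1, 256) if gf_mul(x, y) == 1)


def is_xor_linear(f) -> bool:
    """True if f(a ^ m) == f(a) ^ f(m) for every pair of bytes a, m."""
    return all(f(a ^ m) == f(a) ^ f(m) for a in range(256) for m in range(256))


print(is_xor_linear(xtime))             # True: the mask passes through xtime
print(is_xor_linear(lambda v: INV[v]))  # False: inversion needs dedicated masking
```

ShiftRows, MixColumns and AddRoundKey are all XOR-linear in this sense, which is why masking effort concentrates on the inversion inside the S-Box.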
This paper presents two AES implementations with online key expansion on a fine-grained many-core system, designed to achieve high performance and high throughput per unit of chip area.

II. AES ALGORITHM
AES is a key-iterated block cipher that applies several rounds of transformations to the state. It is a symmetric-key algorithm; in this work, a 128-bit key is used to generate the ciphertext. AES processes 128-bit data blocks, and each block is treated as a 4-by-4 array of bytes called the state. The number of rounds, Nr, is determined by the length of the cipher key: Nr = 10 for a 128-bit key (12 and 14 for 192-bit and 256-bit keys, respectively).
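The AES specification (FIPS-197) fills the state column by column, so input byte i lands in row i mod 4, column i div 4. A small Python sketch of that mapping and of the Nr rule; the function names are illustrative, not from the paper:

```python
# Sketch of the AES state layout (column-major, as in FIPS-197) and the
# round count Nr; helper names are illustrative, not from the paper.

ROUNDS = {128: 10, 192: 12, 256: 14}  # Nr indexed by cipher key length in bits


def bytes_to_state(block: bytes) -> list[list[int]]:
    """Arrange a 16-byte block as a 4x4 state: state[r][c] = block[r + 4*c]."""
    if len(block) != 16:
        raise ValueError("AES operates on 128-bit (16-byte) blocks")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state: list[list[int]]) -> bytes:
    """Read the state back out column by column."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


state = bytes_to_state(bytes(range(16)))
for row in state:
    print(row)
# [0, 4, 8, 12]
# [1, 5, 9, 13]
# [2, 6, 10, 14]
# [3, 7, 11, 15]
print(ROUNDS[128])  # 10
```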
Figure 1 shows the basic steps of the AES algorithm with online key expansion. The steps are:
1. SubBytes: a nonlinear byte transformation that replaces each input byte with the corresponding value from the substitution box (S-Box), which is explained in the Masked S-Box section.
2. ShiftRows: each row of the state is cyclically shifted left according to its row number. The first row is not shifted, the second row is shifted by one byte, and so on.
3. MixColumns: each column of the state is treated as a polynomial over GF(2⁸) and multiplied, modulo x⁴+1, by the fixed polynomial a(x) given in equation (1). This step is omitted in the final round.
$$a(x) = \{03\}x^3 + \{01\}x^2 + \{01\}x + \{02\} \qquad (1)$$
4. AddRoundKey: the round key generated by the key expansion is XORed with the state.
All Rights Reserved © 2014 IJARECE
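ShiftRows, MixColumns (multiplication by a(x) = {03}x³ + {01}x² + {01}x + {02} modulo x⁴+1) and AddRoundKey can be sketched in Python on a 4x4 state stored as a list of rows. SubBytes is left out because it is the subject of the S-Box section. The helper names are illustrative, and the MixColumns check uses the widely published test column db 13 53 45 → 8e 4d a1 bc:

```python
# Sketch of three AES round steps on a 4x4 state (list of rows).
# Helper names are illustrative; SubBytes is omitted here.


def xtime(a: int) -> int:
    """Multiply byte a by {02} in GF(2^8), modulo x^8 + x^4 + x^3 + x + 1."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def shift_rows(state):
    """Rotate row r left by r bytes; row 0 is unchanged."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def mix_single_column(col):
    """Multiply one column by {03}x^3 + {01}x^2 + {01}x + {02} modulo x^4 + 1."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,  # {02}a0 + {03}a1 + a2 + a3
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,  # a0 + {02}a1 + {03}a2 + a3
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),  # a0 + a1 + {02}a2 + {03}a3
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),  # {03}a0 + a1 + a2 + {02}a3
    ]


def mix_columns(state):
    """Apply mix_single_column to each of the four columns."""
    cols = [mix_single_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]


def add_round_key(state, round_key):
    """XOR the state with a 4x4 round key, byte by byte."""
    return [[s ^ k for s, k in zip(srow, krow)] for srow, krow in zip(state, round_key)]


print([hex(b) for b in mix_single_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
print(shift_rows([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]))
# [[0, 1, 2, 3], [5, 6, 7, 4], [10, 11, 8, 9], [15, 12, 13, 14]]
```

Multiplication by {03} is written as xtime(a) ^ a, so each column needs only shifts and XORs, which is what makes MixColumns cheap in hardware.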

III. MASKED S-BOX
In the SubBytes transformation, each byte is replaced with a value from the S-Box. Since a byte has only 256 possible values, the S-Box can be implemented as a look-up table, which costs less power and time than computing each value. However, such a table makes the implementation vulnerable to differential power analysis (DPA) attacks [3] [4].
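Because there are only 256 inputs, the whole look-up table can be precomputed once from the algebraic definition of the S-Box: the multiplicative inverse in GF(2⁸) followed by the affine transformation. A Python sketch of that precomputation; the helper names are illustrative, and the affine step is written in its common rotate-and-XOR form with the constant 0x63 from FIPS-197:

```python
# Sketch: precompute the 256-entry AES S-Box look-up table.
# Helper names are illustrative.


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def gf_inv(a: int) -> int:
    """Inverse as a^254 (since a^255 = 1 for a != 0); yields 0 for a = 0."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result


def rotl8(b: int, n: int) -> int:
    """Rotate a byte left by n bits."""
    return ((b << n) | (b >> (8 - n))) & 0xFF


def affine(b: int) -> int:
    """AES affine transformation over GF(2) with constant 0x63."""
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [affine(gf_inv(x)) for x in range(256)]

print(hex(SBOX[0x00]), hex(SBOX[0x01]), hex(SBOX[0x53]))  # 0x63 0x7c 0xed
```

In the table-based design, SBOX is what gets stored in memory and indexed directly by the (secret-dependent) state byte, which is the access pattern DPA exploits.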
To avoid DPA attacks, the S-Box can instead be computed with Galois field arithmetic: take the multiplicative inverse in GF(2⁸) and then apply the affine transformation. However, computing the multiplicative inverse directly in GF(2⁸) is very expensive. The masked S-Box therefore calculates the GF(2⁸) multiplicative inverse using GF(2⁴) arithmetic: the input byte is mapped to two elements of GF(2⁴), the inverse is computed with GF(2⁴) operations, and the two resulting elements are mapped back to GF(2⁸). Figure 2 shows the steps used to compute the masked S-Box.
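The map-invert-map-back idea can be sketched in Python with the composite field GF((2⁴)²). This covers only the unmasked inversion; a masked S-Box would additionally carry a random mask through each GF(2⁴) multiplication and inversion, which is omitted here. The field polynomials are assumptions, since the paper does not give them: GF(2⁴) uses x⁴+x+1, the extension uses y²+y+λ with λ found by search, and the GF(2⁸) → GF((2⁴)²) isomorphism is derived by searching for a root of the AES polynomial, whereas hardware designs hard-code it as an 8x8 binary matrix:

```python
# Sketch: GF(2^8) inversion via the composite field GF((2^4)^2).
# Assumptions (not from the paper): GF(2^4) uses x^4 + x + 1, the extension
# uses y^2 + y + LAM with LAM found by search, and the isomorphism is searched.


def gf16_mul(a: int, b: int) -> int:
    """Multiply two 4-bit values in GF(2^4) modulo x^4 + x + 1."""
    p = 0
    for _ in range(4):
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return p


def gf16_inv(a: int) -> int:
    """Brute-force inverse in GF(2^4); 0 maps to 0."""
    return next((b for b in range(1, 16) if gf16_mul(a, b) == 1), 0)


# y^2 + y + LAM is irreducible over GF(2^4) iff t^2 + t != LAM for every t.
LAM = next(lam for lam in range(1, 16)
           if all((gf16_mul(t, t) ^ t) != lam for t in range(16)))


def comp_mul(a: int, b: int) -> int:
    """Multiply a1*y + a0 by b1*y + b0, packed as (hi << 4) | lo; uses y^2 = y + LAM."""
    a1, a0, b1, b0 = a >> 4, a & 0xF, b >> 4, b & 0xF
    h = gf16_mul(a1, b1)
    hi = h ^ gf16_mul(a1, b0) ^ gf16_mul(a0, b1)
    lo = gf16_mul(a0, b0) ^ gf16_mul(h, LAM)
    return (hi << 4) | lo


def comp_inv(a: int) -> int:
    """(a1*y + a0)^-1 = d*(a1*y + a0 + a1) with d = (a0^2 + a0*a1 + LAM*a1^2)^-1."""
    a1, a0 = a >> 4, a & 0xF
    delta = gf16_mul(a0, a0) ^ gf16_mul(a0, a1) ^ gf16_mul(LAM, gf16_mul(a1, a1))
    d = gf16_inv(delta)  # the only inversion, done in the small field
    return (gf16_mul(a1, d) << 4) | gf16_mul(a0 ^ a1, d)


def comp_pow(a: int, e: int) -> int:
    r = 0x01  # the element 1 (a1 = 0, a0 = 1)
    for _ in range(e):
        r = comp_mul(r, a)
    return r


# BETA: a root of the AES polynomial x^8 + x^4 + x^3 + x + 1 in the composite field.
BETA = next(b for b in range(256)
            if (comp_pow(b, 8) ^ comp_pow(b, 4) ^ comp_pow(b, 3) ^ b ^ 0x01) == 0)
BASIS = [comp_pow(BETA, i) for i in range(8)]  # images of 1, x, ..., x^7


def to_composite(v: int) -> int:
    """GF(2)-linear isomorphism GF(2^8) -> GF((2^4)^2)."""
    r = 0
    for i in range(8):
        if (v >> i) & 1:
            r ^= BASIS[i]
    return r


FROM_COMPOSITE = [0] * 256
for v in range(256):
    FROM_COMPOSITE[to_composite(v)] = v


def gf256_inv(v: int) -> int:
    """Map into the composite field, invert with GF(2^4) arithmetic, map back."""
    return FROM_COMPOSITE[comp_inv(to_composite(v))]


print(hex(gf256_inv(0x53)))  # 0xca, the same as the direct GF(2^8) inverse
```

The only true inversion happens on a 4-bit value, which is where the hardware savings of the composite-field approach come from.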

IV. FINE GRAINED MANY CORE ARCHITECTURE
The performance of a processor core is roughly proportional to the square root of its complexity (Pollack's rule). Doubling the complexity (roughly, the area) of a single core therefore yields only about √2 ≈ 1.4 times the performance. A many-core architecture exploits this: instead of one large, complicated core, many small, simple cores are used, and for workloads with enough parallelism their combined performance exceeds that of a single complex core of the same total area.
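A worked version of this argument, under two simplifying assumptions (a perfectly parallel workload and no inter-core communication cost): let one complex core of complexity C deliver performance P₁ = k√C for some constant k, and split the same complexity budget into N simple cores of complexity C/N each.

$$P_1 = k\sqrt{C}, \qquad P_N = N \cdot k\sqrt{\frac{C}{N}} = k\sqrt{N C} = \sqrt{N}\,P_1$$

For example, replacing one core with N = 4 cores of a quarter of the complexity gives up to √4 = 2 times the aggregate performance in this idealised model.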

V. AES IMPLEMENTATION
In this paper, I present two different AES implementations with online key expansion and measure the throughput of each design.

A. One task one processor (OTOP)
Each step of the AES algorithm is treated as a task, as shown in the dataflow diagram in Figure 3, and each task is mapped onto one processor of the many-core array; hence the name One Task One Processor (OTOP). About 10 cores are required for a single round (iteration), and after the first round is completed, the same cores are reused for the following rounds.
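The throughput cost of reusing the same cores for every round can be estimated with a small cycle-level model in Python. It is a sketch under assumptions the paper does not state: every task takes one time step on its core, each core holds at most one block, and a block leaving the last core either loops back for its next round (with priority over new blocks) or exits after its final round, freeing the first core. `simulate_ring` is a hypothetical name:

```python
# Idealised cycle-level model of a looped pipeline of single-task cores.
# Assumptions (not from the paper): one time step per task, one block per
# core, recirculating blocks have priority over new blocks at core 0.
from collections import deque


def simulate_ring(num_cores: int, passes: int, num_blocks: int) -> int:
    """Return the time steps needed to finish num_blocks blocks, each of which
    must traverse a ring of num_cores single-task cores `passes` times."""
    ring = [None] * num_cores          # ring[i] = (block_id, passes_left) or None
    pending = deque(range(num_blocks))
    finished = 0
    step = 0
    while finished < num_blocks:
        step += 1
        leaving = ring[-1]             # block that left the last core last step
        ring = [None] + ring[:-1]      # every other block advances one core
        if leaving is not None and leaving[1] > 1:
            ring[0] = (leaving[0], leaving[1] - 1)   # loop back for the next round
        if ring[0] is None and pending:
            ring[0] = (pending.popleft(), passes)    # admit a new block
        last = ring[-1]
        if last is not None and last[1] == 1:
            finished += 1              # final round completes on the last core
            ring[-1] = None
    return step


# Looped mapping: 10 cores, each block passes through them Nr = 10 times.
print(simulate_ring(10, 10, 1000))   # 10009
# Same tasks spread over 10 x 10 cores, one pass each (fully unrolled chain).
print(simulate_ring(100, 1, 1000))   # 1099
```

In this idealised model, looping costs roughly a factor of Nr in steady-state throughput (one block every 10 steps instead of one per step), in exchange for ten times fewer cores. The actual designs (about 10 and about 60 cores) are not modelled exactly by this sketch.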

B. Loop unrolled nine times
To enhance the throughput, a second design is implemented, as shown in Figure 4. Here, each round (loop iteration) is executed by a separate set of cores. Unrolling the loop nine times removes the need to feed each block back through the same cores, so multiple data blocks can be processed at the same time in a pipelined fashion. About 60 cores are required to implement this design.

VI. RESULT
I have implemented the proposed design in a hardware description language, synthesized it using Xilinx ISE 14.1 and ported it to a Spartan-6 LX45 FPGA. Table 1 shows the throughput obtained from the two designs. The table shows that the loop-unrolled-nine-times design is much faster than the one task one processor design.
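The gap between the two designs can be put in context with an idealised steady-state estimate (an assumption-laden sketch, not the measured result in Table 1). If each task step takes time t, a looped design whose cores are reused for all N_r rounds can accept at most one new 128-bit block every N_r steps, whereas a design that gives every round its own cores can accept one block per step once its pipeline is full:

$$T_{\text{looped}} \approx \frac{128\ \text{bits}}{N_r\, t}, \qquad T_{\text{unrolled}} \approx \frac{128\ \text{bits}}{t}, \qquad \frac{T_{\text{unrolled}}}{T_{\text{looped}}} \approx N_r = 10$$

This ratio is an upper bound. Load imbalance between cores, inter-core communication and the online key-expansion tasks all reduce the gain in practice.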

VII. CONCLUSION
Securing "data-at-rest" and increasing throughput are important requirements for systems that handle large volumes of data, so modern systems shift data encryption from software platforms to hardware platforms. However, hardware-based encryption still faces the threat of DPA attacks. In this work, an AES engine with a masked S-Box has been proposed to resist DPA attacks with acceptable area on an FPGA. The proposed masked S-Box maps the input values from GF(2⁸) to GF(2⁴) only once at the beginning of the operation and maps the result back from GF(2⁴) to GF(2⁸) once at the end, which reduces area resources by about 20%. Combining the masked S-Box with the loop-unrolled-nine-times design provides protection against DPA attacks while also increasing the throughput.
ACKNOWLEDGMENT
I would like to express my heartfelt gratitude and thanks to my beloved guide, Ms. Neethu Bhaskaran, Assistant Professor, Dept. of Electronics and Communication Engineering, SNGCE Kadayiruppu, under whose guidance I was able to complete the thesis work to the level I had planned, and for her regular reviews and suggestions. It gives me great pleasure to thank her for the conviction she brought to selecting the topic of work, and for the technical and literary guidance she imparted through the different stages of its execution.