Multi-Dimensional Constrained Cloud Computing Task Scheduling Mechanism Based on Genetic Algorithm

Abstract: A new multi-dimensional constrained Cloud Computing task scheduling mechanism based on the Genetic Algorithm (GA) is proposed to reduce task completion time and to improve user satisfaction by reducing expenses. The algorithm improves the fitness function by using a comprehensive efficiency measure that combines shortest completion time and minimum cost. Simulation results show that the algorithm is highly efficient and meets both the performance needs and the economic needs of users.


I. INTRODUCTION
Cloud Computing delivers IT resources as services to users over the high-speed Internet, and users can dynamically acquire the resources they need. As a new computing model, Cloud Computing distributes tasks across a pool of hardware resources provided as IaaS (Infrastructure as a Service), and tasks dynamically obtain computing, storage, networking and software resources on demand. Because of its very large scale, on-demand service, virtualization, high reliability, high availability and low price, Cloud Computing is attracting more and more users.
As an important part of the Cloud Computing environment, task scheduling [1] in essence assigns tasks to resource nodes that are selected, based on the running status of the nodes, to meet user demand and to maximize certain evaluation criteria. However, because the Cloud Computing environment is heterogeneous and dynamic, traditional task scheduling algorithms that simply improve system throughput or shorten task completion time cannot be applied to it efficiently. At the same time, the dynamic demand of users often results in uneven load across the Cloud Computing environment, among other issues. Traditional task scheduling algorithms are therefore not effective at meeting both performance demands and user satisfaction. Many algorithms have been proposed to solve this combinatorial optimization problem, such as the Immune Evolutionary Algorithm [2], GA [3][4], the Euclidean Distance Liked-Load Balance Scheduling Algorithm [5], the Hybrid Particle Swarm Algorithm [6], and time-shared and space-shared allocation algorithms. These algorithms have made good progress. However, a GA that uses shorter task completion time as its only evaluation criterion cannot meet the requirements of overall load balancing and user satisfaction when it is applied to Cloud Computing task scheduling; in particular, it is very difficult for it to balance the demands of performance and economy. This paper puts forward a multi-dimensional constrained task scheduling algorithm based on GA to improve the task accomplished comprehensive benefit (TACB), which consists of performance benefit and economic benefit. Taking both task completion time and the economic needs of users into account improves the efficiency of task scheduling in the Cloud Computing environment.

A. Cloud Computing Task-Resource Model
In this paper, the set $T = \{T_1, T_2, T_3, \ldots, T_n\}$ denotes the user tasks. It consists of n independent tasks, and each task is described by a quintuple:
$T_i = \langle ID, LENGTH, INSIZE, OUTSIZE, PERF \rangle$
where ID is the task identifier, LENGTH is the task length, INSIZE is the input file size of the Map-Reduce operation, OUTSIZE is the output file size of the Map-Reduce operation, and PERF is the TACB value.
$R = \{R_1, R_2, R_3, \ldots, R_m\}$ is the set of resource nodes. It consists of m heterogeneous resources, and each resource node is described by a septet:
$R_i = \langle ID, pesNUMBER, MIPS, RAM, SIZE, BW, COST \rangle$
In this septet, ID is the resource node identifier; pesNUMBER, MIPS, RAM, SIZE and BW are, respectively, the number of CPUs, the CPU processing performance, the memory size, the file system size and the bandwidth of the resource node; and COST is the total expense of the node.
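The two tuples above map naturally onto small record types. The following is a minimal sketch, assuming Python dataclasses; the field names mirror the tuple entries, while the sample values and their units are purely illustrative and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Quintuple <ID, LENGTH, INSIZE, OUTSIZE, PERF> for one user task."""
    id: int
    length: float   # task length
    insize: float   # input file size of the Map-Reduce operation
    outsize: float  # output file size of the Map-Reduce operation
    perf: float     # TACB value


@dataclass(frozen=True)
class Resource:
    """Septet <ID, pesNUMBER, MIPS, RAM, SIZE, BW, COST> for one node."""
    id: int
    pes_number: int  # number of CPUs
    mips: float      # CPU processing performance
    ram: float       # memory size
    size: float      # file system size
    bw: float        # bandwidth
    cost: float      # total expense of the node


# Illustrative values only (hypothetical, units unspecified).
tasks = [Task(id=i, length=1000.0 * (i + 1), insize=300.0, outsize=300.0, perf=0.0)
         for i in range(3)]
resources = [
    Resource(id=0, pes_number=1, mips=500.0, ram=512.0, size=10000.0, bw=1000.0, cost=1.0),
    Resource(id=1, pes_number=2, mips=1000.0, ram=1024.0, size=10000.0, bw=1000.0, cost=3.0),
]
```

Frozen dataclasses are used here so that task and resource descriptions cannot be modified by accident while a schedule is being evaluated.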

B. Multi-dimensional Constraints Description
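The expected time to complete (ETC) matrix records the expected completion time of each task on each resource node:
$$ETC = \begin{bmatrix} et_{11} & et_{12} & \cdots & et_{1m} \\ et_{21} & et_{22} & \cdots & et_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ et_{n1} & et_{n2} & \cdots & et_{nm} \end{bmatrix}$$
In this matrix, $et_{ij}$ is the expected completion time of task i when it is executed on resource node j.
The matrix AL is the resource-task allocation matrix [7], which records how tasks are allocated to resource nodes:
$$AL = \begin{bmatrix} al_{11} & al_{12} & \cdots & al_{1m} \\ al_{21} & al_{22} & \cdots & al_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ al_{n1} & al_{n2} & \cdots & al_{nm} \end{bmatrix}$$
In this matrix, $al_{ij} = 1$ means that task i is assigned to resource node j, and $al_{ij} = 0$ means that it is not.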
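As an illustration of how the two matrices work together, the sketch below builds AL from a task-to-node assignment and sums the matching ETC entries per node. It assumes that tasks on the same node run one after another and that nodes run in parallel, so the overall completion time is the largest per-node total; that reading and the helper names `allocation_matrix` and `node_times` are assumptions, not definitions from the paper.

```python
def allocation_matrix(assignment, m):
    """Build the n x m matrix AL, where assignment[i] is the node of task i."""
    return [[1 if assignment[i] == j else 0 for j in range(m)]
            for i in range(len(assignment))]


def node_times(etc, al):
    """Total expected time per node: sum over tasks of et_ij * al_ij."""
    n, m = len(etc), len(etc[0])
    return [sum(etc[i][j] * al[i][j] for i in range(n)) for j in range(m)]


# Hypothetical ETC matrix for 3 tasks and 2 nodes.
etc = [[4.0, 2.0],
       [6.0, 3.0],
       [8.0, 4.0]]

al = allocation_matrix([0, 1, 1], m=2)  # task 0 -> node 0, tasks 1 and 2 -> node 1
times = node_times(etc, al)
completion_time = max(times)            # assumed: nodes work in parallel
```

With this assignment node 0 carries only task 0 and node 1 carries tasks 1 and 2, so the second node determines the completion time.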
To meet both performance and economy needs, task scheduling uses TACB as its evaluation criterion. Performance benefit refers to the task completion time, and economic benefit uses the task execution expense as its reference. Evaluating with TACB avoids both high costs caused by excessive pursuit of high performance and low performance caused by excessive pursuit of low costs. While maintaining the performance requirements of users, the multi-dimensional constrained GA keeps expenses as low as possible.
The Performance Benefit Evaluation Function (PBEF) is defined by Eq. (1). In it, $et_{ij}$ is an element of the ETC matrix, $al_{ij}$ is the element of the AL matrix at the same position, and k is a reference coefficient. The PBEF reflects that different task assignments lead to different completion times: the shorter the task completion time, the higher the performance benefit. Conversely, an assignment with a long execution time has a low performance benefit.
The economic benefit is based primarily on the cost of task scheduling, and its evaluation function is given by Eq. (2). In summary, the TACB of Cloud Computing consists of performance benefit and economic benefit and is used to evaluate the efficiency of task scheduling. The two benefits are weighted by $\alpha$ and $\beta$, with $\alpha, \beta \in [0,1]$, which represent the user's preference for task completion time and for cost, respectively. $\alpha$ is larger when users care more about completion time; conversely, a larger $\beta$ indicates that lower cost matters more.
III. THE TASK SCHEDULING MECHANISM WITH GA
GA was first proposed in 1975 by Professor John Holland, and its two most notable features are parallelism and global search of the solution space. It is an effective parallel global search algorithm, based on the theory of natural evolution [8] and genetic variation, for finding a globally optimal solution. For the specific problems of the Cloud Computing environment, GA constructs a fitness criterion to evaluate a population made up of multiple solutions (each solution maps to one chromosome), and applies selection, crossover and variation operations over many generations to find the best individual as the optimal solution. Its implementation process is shown in Fig. 1.
The decoding results show that task 0 is assigned to resource node 1, tasks 1 and 3 are assigned to node 2, task 2 is assigned to node 0, and task 4 is assigned to node 3. Once all decoding results are obtained, the value of TACB can be determined from the fitness function.
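The assignment described above can be reproduced with a simple integer encoding. The sketch below assumes that a chromosome is a list in which the gene at position i holds the index of the node that runs task i; the encoding scheme itself is not given in the surviving text, so both this representation and the `decode` helper are assumptions.

```python
def decode(chromosome):
    """Group task indices by the resource node each gene points to."""
    node_to_tasks = {}
    for task, node in enumerate(chromosome):
        node_to_tasks.setdefault(node, []).append(task)
    return node_to_tasks


# Gene i = node of task i, matching the assignment described in the text.
chromosome = [1, 2, 0, 2, 3]
schedule = decode(chromosome)
```

Here `schedule` maps node 2 to tasks 1 and 3, and nodes 0, 1 and 3 to tasks 2, 0 and 4 respectively, which is the assignment stated in the paragraph above.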

B. GA Operation Processes
1) Fitness Function Selection
The fitness function is very important in GA, because it affects both the convergence speed of the algorithm and the quality of the solution. To account for both the performance and the economic needs of users, GA uses TACB as the fitness evaluation criterion in the Cloud Computing environment, with $\alpha, \beta \in [0,1]$, where $\alpha$ and $\beta$ represent the user's preference for task completion time and for expense, respectively. TimePer(i, j) and CostPer(i, j) are obtained from functions (1) and (2), respectively. This paper focuses on the effectiveness of the algorithm compared with existing algorithms, so the simulation assumes no significant demand preference: $\alpha = \beta = 0.5$. Whether different demand preferences affect the efficiency of the algorithm is not considered here.
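To make the role of the two preference weights concrete, the sketch below combines a performance benefit and an economic benefit into one fitness value. It assumes that TACB is a weighted sum of the two benefits, with the weights written as `alpha` and `beta`; the exact combination is not recoverable from the text, so this form and the input values are assumptions. The two benefit values are taken as given inputs rather than computed from functions (1) and (2).

```python
def tacb(time_per, cost_per, alpha=0.5, beta=0.5):
    """Assumed weighted-sum form of TACB; not confirmed by the source.

    time_per: performance benefit (from function (1))
    cost_per: economic benefit (from function (2))
    alpha, beta: user preference weights, each in [0, 1]
    """
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")
    return alpha * time_per + beta * cost_per


# Hypothetical benefit values for one schedule.
balanced = tacb(0.8, 0.4)                           # no preference: 0.5 / 0.5
time_focused = tacb(0.8, 0.4, alpha=0.9, beta=0.1)  # user favors completion time
```

Under this assumed form, a schedule with a high performance benefit scores better for a time-focused user than for a user with no preference.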

2) Selection Operation
The selection operation determines which individuals of the parent population are passed on to the next generation for recombination or crossover; here it uses roulette-wheel selection. Individuals are selected according to the proportion of each chromosome's fitness value in the current population, so chromosomes with larger values are more likely to be chosen for the crossover and mutation operations. The selection probability of individual i is $P_i = TACB_i / \sum_{j=1}^{SCALE} TACB_j$, where the denominator is the sum of the TACB values of all chromosomes and SCALE is the population size. The larger an individual's TACB value, the more likely it is to be selected.
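Roulette-wheel selection can be sketched in a few lines. The example below assumes non-negative fitness values with a positive sum; the function names and the sample fitness values are illustrative only.

```python
import random


def selection_probabilities(fitness):
    """Share of each individual's fitness in the population total."""
    total = sum(fitness)
    if total <= 0:
        raise ValueError("total fitness must be positive")
    return [f / total for f in fitness]


def roulette_select(fitness, rng):
    """Return the index of one individual, chosen proportionally to fitness."""
    pick = rng.random() * sum(fitness)
    running = 0.0
    for index, value in enumerate(fitness):
        running += value
        if pick < running:
            return index
    return len(fitness) - 1  # guard against floating-point round-off


fitness = [0.2, 0.5, 0.3]       # hypothetical TACB values, SCALE = 3
probs = selection_probabilities(fitness)

rng = random.Random(0)          # seeded for reproducibility
counts = [0, 0, 0]
for _ in range(10000):
    counts[roulette_select(fitness, rng)] += 1
```

Over many draws the counts approach the selection probabilities, so the individual with the largest fitness is picked most often without the others being excluded.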

3) Crossover Operation and Variation Operation
The crossover operation selects two chromosomes and forms new chromosomes by exchanging part of their genes according to the crossover probability; it is the main method of generating new individuals. This paper uses single-point crossover, which is simple and effective. The variation operation forms new individuals by changing some genes in a chromosome according to the variation probability; it is the auxiliary method of generating new individuals. It determines the local search ability of GA and maintains the diversity of the population. In GA, the crossover and variation operations cooperate to complete the global and local search of the solution space. The constants $C_{cross}$ and $C_{variation}$ represent the crossover probability and the variation (mutation) probability. A variable r, $r \in [0,1]$, is drawn from a random function to decide whether to execute each operation: if $r > C_{cross}$, the crossover operation is executed, otherwise it is not; if $r > C_{variation}$, the variation operation is executed, otherwise it is not.
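The two operators can be sketched as follows, assuming the integer encoding in which each gene holds a node index and a mutation that reassigns one task to a random node. The gate follows the inequality as written in the text (an operator runs when r exceeds its constant); many GA implementations instead apply an operator when r falls below the probability, so the direction of the test is an assumption worth verifying. All function names are hypothetical.

```python
import random


def single_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length chromosomes at one random cut point."""
    point = rng.randrange(1, len(parent_a))  # cut strictly inside the chromosome
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(chromosome, m, rng):
    """Reassign one randomly chosen task to a randomly chosen node (0..m-1)."""
    child = list(chromosome)
    child[rng.randrange(len(child))] = rng.randrange(m)
    return child


def evolve_pair(parent_a, parent_b, m, c_cross, c_variation, rng):
    """Apply crossover and variation, gated as stated in the text (r > C)."""
    a, b = list(parent_a), list(parent_b)
    if rng.random() > c_cross:
        a, b = single_point_crossover(a, b, rng)
    if rng.random() > c_variation:
        a = mutate(a, m, rng)
    if rng.random() > c_variation:
        b = mutate(b, m, rng)
    return a, b
```

A call such as `evolve_pair([1, 2, 0, 2, 3], [0, 0, 1, 3, 2], m=4, c_cross=0.2, c_variation=0.9, rng=random.Random(7))` returns two children of the same length whose genes are still valid node indices.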

4) Catastrophe Optimal Operation
To further improve the global search ability and avoid being trapped in a local optimal solution during evolution, GA introduces the idea of catastrophe from natural evolution: when the external environment changes tremendously, the vast majority of creatures disappear, while a few species survive and have ample room to evolve. In this paper, GA uses the catastrophe operation to avoid local optimal solutions. It eliminates all currently outstanding individuals at once and gives some individuals far from the current extreme point room to evolve fully. The disaster counter COUNT controls when the catastrophe operation is executed; here COUNT = 300. If no better optimal solution is produced within 300 generations, the catastrophe operation is executed, and it is no longer executed once the optimal solution has remained unchanged after several catastrophes.
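The counter logic and the catastrophe step can be sketched as below. COUNT = 300 comes from the text; how many individuals count as "outstanding" and the choice to replace them with freshly randomized chromosomes are assumptions, as are the names `StagnationCounter` and `catastrophe`. The stopping rule after several unsuccessful catastrophes is left out of this sketch.

```python
import random

COUNT = 300  # generations without improvement before a catastrophe (from the text)


class StagnationCounter:
    """Track generations without improvement of the best fitness."""

    def __init__(self, limit=COUNT):
        self.limit = limit
        self.best = None
        self.stale = 0

    def update(self, best_fitness):
        """Record one generation; return True when a catastrophe is due."""
        if self.best is None or best_fitness > self.best:
            self.best = best_fitness
            self.stale = 0
            return False
        self.stale += 1
        if self.stale >= self.limit:
            self.stale = 0
            return True
        return False


def catastrophe(population, fitness, n_outstanding, m, rng):
    """Replace the n_outstanding fittest chromosomes with random ones (assumed)."""
    order = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    doomed = set(order[:n_outstanding])
    length = len(population[0])
    return [
        [rng.randrange(m) for _ in range(length)] if i in doomed else list(population[i])
        for i in range(len(population))
    ]
```

In a main loop, `update` would be called once per generation with the current best TACB value, and `catastrophe` would be applied to the population whenever it returns True.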

IV. SIMULATION
The task scheduling algorithm was implemented and simulated on the CloudSim [9,10] platform, a Cloud Computing simulation software developed at the University of Melbourne, Australia. Five virtual datacenters were created with 200 physical hosts in the simulated Cloud Computing environment. To reflect specific applications of the Cloud Computing environment, the physical hosts were divided into three price levels according to their performance. C1, C2 and C3 denote the unit prices of each virtual machine, where a unit price is the total price of CPU, memory, storage and bandwidth. The multi-dimensional constrained GA was compared with the time-sharing algorithm, the space-sharing algorithm and the traditional GA that only aims to shorten task completion time. In this simulation, Fig.2

V. SUMMARY
Although the traditional GA, which aims only to shorten task completion time, does shorten completion time effectively when applied to task scheduling in Cloud Computing, it cannot improve user satisfaction with Cloud Computing services. This paper presented a new task scheduling mechanism based on GA with multi-dimensional constraints, which uses TACB, consisting of performance benefit and economic benefit, as the evaluation criterion of GA. Simulation experiments showed that this algorithm can complete Cloud Computing task scheduling efficiently.