Aggregation Tree Based Data Aggregation Algorithm in Wireless Sensor Networks

Abstract: In wireless sensor networks, efficient and effective data aggregation algorithms can prolong the network lifecycle by reducing the communication of redundant data, and can improve the security of the network. Traditional data aggregation algorithms in wireless sensor networks mainly aim to improve energy utilization and ignore security and lifecycle. To obtain a good trade-off between these requirements, we propose a data aggregation algorithm based on constructing a data aggregation tree. After giving a formal description of the problem, we propose a data aggregation tree constructing algorithm. By minimizing the maximal energy consumption of nodes, the algorithm prolongs the network lifecycle. In the data aggregation scheduling algorithm, we carefully select the number of communications to balance low weighted delay against long network lifecycle. Simulation experiments show that the proposed data aggregation algorithm consumes less energy while aggregating data from sensor nodes, and thus prolongs the network lifecycle.


I. INTRODUCTION
Wireless sensor networks, which consist of thousands or even tens of thousands of sensors of different kinds, are important computing platforms. The sensors in a wireless sensor network are deployed randomly in an area to monitor and collect data about the real-world environment [1]. The data collected in the area are usually redundant, since the same data can be collected by multiple sensors. Data transmission consumes about 70% of the energy in a wireless sensor network, so to reduce the communication overhead between sensors and prolong the lifecycle of the underlying wireless sensor network, the data collected from different sensors must be aggregated [2]. Researchers have proposed many data aggregation algorithms for wireless sensor networks. With respect to the objective of aggregation, these algorithms can be classified into data-centered [3], remaining-energy-based [4], optimized [5] and performance-based [6] algorithms. The main ideas of these algorithms are to: decrease the collision probability of data in wireless channels by reducing the amount of transmitted data, and thus improve the efficiency of data collection; check the validity of sensor nodes by comparing data collected from adjacent nodes, and discard the data from disabled nodes to improve the accuracy of the data; and aggregate the data sensed by different kinds of sensors to bridge the gap between high-level user requirements and low-level raw sensed data [7].
Wireless sensor networks originated in military applications and have since been applied to many fields, such as emergency surveillance [8], environment monitoring [9] and target tracking [10], yet the secure aggregation of data is still an active research topic. To assure the security of aggregated sensed data, protection can be applied either to the raw data or to the aggregation process. Data authorization can be used to improve the security of raw data, but it increases the total communication overhead of the network [11]. With secure aggregation algorithms, the aggregation node can reduce the harm caused by data from invalid nodes, but different applications need different aggregation algorithms, and such algorithms cannot guarantee that valid redundant data are acquired. In this paper, we study the problem of data aggregation in wireless sensor networks. To prolong the network lifecycle, we carefully select the number of communications of each node to balance low weighted delay against long network lifecycle, and thus make the whole network work longer.

II. RELATED WORKS
Cluster-based topology management mechanisms [12][13] are commonly used in wireless sensor networks. A head node is selected in each cluster to manage the nodes in that cluster, collect data from the intra-cluster nodes, and transmit data between clusters. The head node of a cluster must therefore know the semantics of the collected data, which makes it impossible to transmit encrypted data between the head node and the base.
ESPDA, proposed by Li et al. [14], is an energy-efficient secure data aggregation protocol. In this protocol, sensor nodes generate pattern codes from the raw data and send them to the head node. The head node classifies the raw data according to the received pattern codes, selects a subset of pattern codes via pattern comparison algorithms, and requires the selected sensors to transmit the encrypted data. SIA, proposed by Przydatek et al. [15], is a secure data aggregation framework for large-scale wireless sensor networks. This framework defines a class of nodes, called aggregators, which aggregate the queried data to reduce the communication cost of the network. In addition, it applies random sampling and interactive verification mechanisms to assure that the data aggregated by the aggregators are a close approximation of the real values.
SecureDAV, proposed by Mahimkar et al. [16], is a secure data aggregation and verification protocol in wireless sensor networks. This protocol assigns keys to the nodes of the same cluster by applying an elliptic curve based secret sharing scheme. Each normal node computes the average value of the intra-cluster data and produces a partial signature; the head node collects the partial signatures of the other intra-cluster nodes, produces a complete signature on the average value and sends it to the base. Finally, the base verifies the signature with the corresponding public key. SRDA, proposed by Sanli et al. [17], is a reference data based secure aggregation protocol. This protocol determines the differences between the raw data and the reference data, and the sensor nodes transmit the differential data rather than the raw data to reduce the communication overhead. Moreover, Sugandhi et al. [18] proposed a secure and energy-saving data aggregation and authorization protocol that addresses the risk of leaking information while aggregating data. Sun et al. [19] proposed a credible behavior based secure data aggregation and routing algorithm by studying the credibility of networks. Wu et al. [20] designed a pattern code based efficient secure data aggregation protocol for wireless sensor networks.
These secure data aggregation algorithms have their specific advantages. However, some of them ignore the resource limits of wireless sensor networks, and their computational complexity is very high, so they consume much energy while aggregating data; others focus only on the security of the data itself and ignore security while data are transmitted between sensors.

III. PROBLEM STATEMENT
Given a sensor network $G = (V, E)$ deployed in an $L \times L$ region, the communication radius of each node is $r$, the single base node (the sink) is at the center of the region, and $E$ is the link set. Given this sensor network, we aim to build a data aggregation tree $T = (V, E')$, where for each node $i \in V$, its parent node is $P_i$ and the link between $i$ and $P_i$ is denoted $e_i$. The aim of data aggregation is to build a scheduling sequence $SCHE = \{E(1), E(2), \ldots, E(t)\}$, where $E(s)$ is the set of links scheduled during time interval $s$. After executing the scheduling sequence, 1) all data collected by the nodes are aggregated at the sink node; and 2) the scheduled links do not collide. In this paper, we adopt the coordinate collision model [10] to define collisions, and assume that every node in the sensor network can both receive and send data, but not at the same time. In addition, we assume that the interference radii of all nodes are equal to $r_I$.

Definition 1. Link collision.
In the coordinate collision model, two communication links collide if and only if the receiving end of one link is within the interference radius of a node of the other link.
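Definition 1 can be illustrated with a minimal Python sketch. It assumes that nodes are 2D points, that a link is a (sender, receiver) pair, and that "within the radius" means the receiver of one link lies within the interference radius of the sender of the other link. The function name `links_collide`, the parameter `r_i`, and the shared-node rule are illustrative assumptions, not part of the original definition.

```python
import math

def links_collide(link_a, link_b, r_i):
    """Return True if two links collide under the assumed reading of Definition 1.

    Each link is a (sender_xy, receiver_xy) pair of 2D coordinate tuples.
    r_i is the common interference radius of all nodes.
    """
    (sa, ra), (sb, rb) = link_a, link_b
    # Assumption: links sharing a node conflict, since a node cannot
    # send and receive at the same time.
    if {sa, ra} & {sb, rb}:
        return True
    # The receiver of one link lies within the interference radius
    # of the sender of the other link.
    return math.dist(sb, ra) <= r_i or math.dist(sa, rb) <= r_i
```

For example, with `r_i = 2`, the links `((0, 0), (1, 0))` and `((10, 0), (11, 0))` are far enough apart to be scheduled together, while `((0, 0), (1, 0))` and `((2, 0), (3, 0))` are not.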
The constructed aggregation tree should satisfy three conditions, on aggregation delay, average weighted delay and network lifecycle.
Definition 2. Aggregation delay.
For one data aggregation process, if the data collected from all nodes arrive at the sink node after $t$ time intervals, then the aggregation delay is $t$. For an aggregation tree of any structure, the aggregation delay satisfies a condition expressed in terms of the number of child nodes of each node $i$ and $h_i$, the number of hops from the root node to node $i$.
Because different regions differ in importance, every node is assigned a weight reflecting its importance. Let the weight of node $i$ be $w_i$. If $w_i > w_j$, then the data collected by node $i$ should arrive at the sink node earlier than the data collected by node $j$. To quantify this weighted fairness, we use the following average weighted delay.
Definition 3. Average weighted delay.
In an aggregation cycle, if the data collected by node $i$ arrive at the sink node after $t_i$ intervals, then the weighted delay of node $i$ is $D_i = w_i \times t_i$, and the average weighted delay of the aggregation cycle is obtained by averaging $D_i$ over all nodes.
Definition 4. Network lifecycle.
For any scheduling algorithm, if the first node runs out of energy after $L$ aggregation periods, then the network lifecycle of the aggregation tree constructing process is $L$.
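The two metrics in Definitions 3 and 4 can be sketched in Python as follows. The sketch assumes that the average weighted delay is the plain arithmetic mean of $D_i = w_i \times t_i$, and that each node has a known initial energy and a fixed, positive energy cost per aggregation period. The function names and parameters are hypothetical and chosen only for illustration.

```python
def average_weighted_delay(weights, arrival_intervals):
    """Mean of D_i = w_i * t_i over all nodes (the plain mean is an assumption)."""
    delays = [w * t for w, t in zip(weights, arrival_intervals)]
    return sum(delays) / len(delays)

def network_lifecycle(initial_energy, energy_per_period):
    """Number of complete aggregation periods before the first node is exhausted.

    Assumes every node spends a fixed, positive amount of energy per period.
    """
    return min(int(e // c) for e, c in zip(initial_energy, energy_per_period))
```

Under these assumptions, the node with the highest per-period cost relative to its initial energy determines the lifecycle, which is why the tree construction tries to balance the load across nodes.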
In aggregation tree based data aggregation, a node has three states: sending, receiving and sleeping. The power consumption in the sleeping state is far smaller than in the other two states, so it is ignored in this paper, and the power consumption of node $i$ in each aggregation period is determined by its sending and receiving operations. From the definitions above we can see that the performance of a data aggregation algorithm mainly depends on the structure of the aggregation tree and the scheduling strategy. An optimal aggregation scheduling process should therefore construct the sequence of scheduled links.

IV. DATA AGGREGATION TREE CONSTRUCTING ALGORITHM
The construction of the aggregation tree is the preparation for the whole data aggregation process, and there are two reasons to build it. On the one hand, there is only one base node in the whole sensor network, so a tree structure suits this condition. On the other hand, a tree structure reduces the communication overhead in the whole network, which is necessary for energy-limited wireless sensor networks. However, the problem of constructing the optimal aggregation tree has been proved to be NP-hard. Current algorithms for constructing aggregation trees are all approximation algorithms, and they mainly aim to prolong the network lifecycle or reduce the aggregation delay. For the tree structure, the degrees of the nodes and the depth of the tree are critical for the network lifecycle and the aggregation delay.
The algorithm proposed by Li et al. [14] can reduce the aggregation delay, but it does not balance the energy consumption of the sensors. Some sensor nodes may therefore exhaust their energy very quickly, which makes the network lifecycle very short. In wireless sensor networks, the energy consumed by CPU operations is much smaller than the energy consumed by data transmission, and the latter depends on the number of sending and receiving operations. To save energy, the number of sending and receiving operations must be limited, so the nodes with the most child nodes become the bottleneck of the whole network. Consequently, to prolong the network lifecycle, the degrees of the nodes in the aggregation tree must be balanced. In this paper, we propose a data aggregation tree constructing algorithm. Its main idea is to adjust the tree structure according to the energy consumed by each node after the initial aggregation tree is constructed. The energy consumed by each node is estimated from its location in the tree. The details are given in Algorithm 1.

Algorithm 1.
1. Taking $v_s$ as the root and $r$ as the radius, construct a breadth-first search tree of $G$;
... If $v_i \in DS$, ...
16. Taking the nodes in $DS$ as non-leaf nodes and the other nodes as leaf nodes, construct a tree;
17. For all non-root nodes $v_i \in V$, let its degree be ...
18. Solve the following optimization problem: ...

In the algorithm, we first construct a breadth-first search tree for $G$. If the level of node $i$ is ...; in lines 9-16, we construct a connected tree; finally, for every non-root node, we select the upper-level node with the fewest child nodes as its parent node, and build the final aggregation tree $DAT = (V, E')$.
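The first and last steps described above (breadth-first layering from the sink, then choosing for each non-root node the upper-level neighbour with the fewest children) can be sketched in Python. This is only an illustration under stated assumptions: the adjacency map is symmetric, node identifiers are sortable, and the intermediate $DS$ steps and the optimization problem of Algorithm 1 are omitted. The function name `build_balanced_tree` is hypothetical.

```python
from collections import deque

def build_balanced_tree(adj, sink):
    """Return {node: parent} for a BFS-layered tree with balanced child counts.

    adj maps each node to the neighbours within communication radius r
    (assumed symmetric). Only nodes reachable from the sink are included.
    """
    # Breadth-first search from the sink assigns a level to each node.
    level = {sink: 0}
    queue = deque([sink])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)

    parent = {}
    children = {u: 0 for u in level}
    # Visit nodes level by level; attach each one to the neighbour in the
    # level above that currently has the fewest children (ties by node id).
    for v in sorted(level, key=lambda n: (level[n], n)):
        if v == sink:
            continue
        candidates = [u for u in adj[v] if level.get(u) == level[v] - 1]
        best = min(candidates, key=lambda u: (children[u], u))
        parent[v] = best
        children[best] += 1
    return parent
```

With a sink `0`, two first-level nodes `1` and `2`, and four second-level nodes that can reach both, this greedy rule gives each first-level node two children instead of letting one of them carry all four.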

V. DATA AGGREGATION SCHEDULING ALGORITHM
The aim of the data aggregation scheduling algorithm is to assign time intervals to nodes. In this paper, we propose a scheduling algorithm based on an approximate maximal weighted independent set. First, we select the set of links that are able to communicate, and construct the link collision matrix from the collision relationships in the data aggregation tree. Then, based on the collision matrix, we determine the communication links in each interval by constructing an approximate maximal weighted independent set.
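One common way to approximate a maximal weighted independent set over a collision matrix is a heaviest-first greedy pass, sketched below in Python. The paper does not state which approximation rule it uses, so the greedy rule here is a stand-in, and the function name and arguments are illustrative assumptions.

```python
def greedy_weighted_independent_set(weights, collide):
    """Pick a collision-free set of links, considering the heaviest links first.

    weights[i] is the weight of link i.
    collide[i][j] is True if links i and j collide (symmetric matrix).
    Returns the indices of the selected links in ascending order.
    """
    chosen = []
    # Sort by descending weight; break ties by link index for determinism.
    for i in sorted(range(len(weights)), key=lambda k: (-weights[k], k)):
        if all(not collide[i][j] for j in chosen):
            chosen.append(i)
    return sorted(chosen)
```

The links returned for one call would form the set scheduled in a single interval; repeating the call on the remaining links yields the following intervals.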
To reduce the energy consumption of nodes, state-of-the-art algorithms schedule each link only once, but they cannot guarantee that data with high priority are scheduled first. If the number of communications of each node is not limited, the energy consumption of the network increases greatly. We therefore need to select the number of communications carefully to balance low weighted delay against long network lifecycle.
Definition 5. Marginal link set.
For the tree $T = (V, E')$, if all child nodes of node $v_i$ have been scheduled after $t-1$ intervals, then the link $e_i$, whose sending end is $v_i$, is a marginal link of $T$ at time $t$. We denote by $FLS_t(T)$ the set of marginal links at time $t$.
Definition 6. Accumulated weight.
For $e_i \in E'$ and interval $t$, the sum of the remaining weight that has not yet been used to transmit data is the accumulated weight of this link, denoted $ACW_t(e_i)$.
Definition 7. Active link set.
In interval $t$, the set of links that can be scheduled, comprising the marginal links and the non-marginal links whose accumulated weight is greater than the threshold, is the active link set at time $t$.
Definition 8. Collision link set.
For $e_i \in E_t$, if $e_i$ collides with $n$ links in $E_t$, then these $n$ links form the collision link set. The sum of all weights in the collision link set is the collision set weight of $e_i$ with respect to $E_t$.
In a scheduling algorithm that limits the number of communications, the marginal links are the objects to schedule, and even high-priority links can be scheduled only after they become marginal links.
In this paper, we take the active links as the objects to schedule: intermediate links whose priorities are higher than the predefined threshold are added to the scheduling queue, which balances weighted delay against energy consumption. The details of the proposed data aggregation scheduling algorithm are given in Algorithm 2.
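The selection of active links (Definitions 5 to 7) can be sketched in Python as follows. This is not Algorithm 2 itself, whose listing is not reproduced here. The sketch assumes that each link is identified by its sending node, that `done` holds the nodes whose data have been fully forwarded, and that `acc_weight` holds the accumulated weight waiting at each node; all of these names are hypothetical.

```python
def active_link_set(parent, done, acc_weight, threshold):
    """Return the sending nodes of the links that may be scheduled now.

    parent     : {node: parent} for every non-root node of the tree
    done       : set of nodes whose data have been fully forwarded
    acc_weight : {node: accumulated weight waiting on its uplink}
    threshold  : non-marginal links need accumulated weight above this value
    """
    # Derive the child lists from the parent map.
    children = {}
    for v, p in parent.items():
        children.setdefault(p, []).append(v)

    active = set()
    for v in parent:
        if v in done:
            continue
        # Marginal link: every child of v has already been scheduled.
        marginal = all(c in done for c in children.get(v, []))
        if marginal or acc_weight.get(v, 0) > threshold:
            active.add(v)
    return active
```

Setting `threshold` to `float("inf")` reduces the active set to the marginal links only, which corresponds to scheduling each link once; lowering it lets heavily weighted intermediate links transmit earlier at the cost of extra transmissions.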
The output $SCHE$ of Algorithm 2 satisfies the following conditions:
• The data collected by all nodes arrive at the sink node;
• If links $e_i, e_j \in E(s)$ ($1 \le s \le t$), then there is no collision between $e_i$ and $e_j$ in $T$.
Based on the above conditions, we can conclude that the proposed algorithm is an effective and secure data aggregation algorithm.

VI. EXPERIMENTS
In the experiments, we denote the proposed algorithm as aggreTree and compare it with two state-of-the-art data aggregation algorithms, DAS and IAS. We randomly deploy a number of sensor nodes in an area of 200 m × 200 m, with the sink node at the center of the area. The parameters of the simulated wireless sensor network are the same as in [18].
First, we compare the aggregation delay, the average weighted delay and the network lifecycle of the three algorithms; the results are shown in Figures 1-3, respectively. In Figure 1, the proposed algorithm has the lowest aggregation delay under different node densities, which means that it takes less time to aggregate data from the sensor nodes. In Figure 2, the proposed algorithm also has the lowest average weighted delay, because adjusting the tree structure gives it a good trade-off between low weighted delay and long network lifecycle. From Figures 1 and 2, we can infer that the proposed algorithm will result in a longer network lifecycle, and this inference is confirmed in Figure 3.
Next, we set the threshold to 0, 40 and $\infty$, and compare the five different results. Figures 4-6 show the comparison of the aggregation delay, the average weighted delay and the network lifecycle. From these figures we can see that, as the threshold increases from 0 to 40 to $\infty$, the aggregation delay decreases and the network lifecycle increases, whereas the average weighted delay first decreases and then increases again. So, to obtain good performance, the threshold must be selected carefully for the performance metric of interest.
Finally, we let $N = 200$, 600 and 1000, and observe the effect of the threshold on the aggregation delay, the average weighted delay and the network lifecycle. The results are shown in Figures 7-9, respectively. From these figures we can see that, as the threshold increases, the aggregation delay decreases and the network lifecycle increases, but the average weighted delay does not change very much. The reason is that the average weighted delay takes the aggregation tree reconstruction into account, so the communication cost is reduced, and thus the average weighted delay has no obvious relationship with the threshold.

VII. CONCLUSION
Data aggregation is a hot topic in wireless sensor network research. When randomly deployed in a region, sensors usually collect redundant data, whose transmission consumes additional energy. For energy-limited sensors, this is a critical problem. Traditional data aggregation algorithms in wireless sensor networks mainly aim to improve energy utilization and ignore security and lifecycle. To obtain a good trade-off between these requirements, we proposed a data aggregation algorithm based on constructing a data aggregation tree. After giving a formal description of the problem, we proposed a data aggregation tree constructing algorithm. By minimizing the maximal energy consumption of nodes, the algorithm prolongs the network lifecycle. In the data aggregation scheduling algorithm, we carefully select the number of communications to balance low weighted delay against long network lifecycle. Simulation experiments show that the proposed data aggregation algorithm consumes less energy while aggregating data from sensor nodes, and thus prolongs the network lifecycle.