A Modified Lempel Ziv Welch compressive data collection in Wireless Sensor Networks

Abstract: Wireless sensor networks now have wide applications in fields such as medicine, industry, and the military. Sensors have limited processing capabilities, and energy is an important constraint because it determines the lifetime of the network. In a large-scale wireless sensor network, nodes have to carry more data the closer they are to the sink. The energy of nodes near the sink may therefore drain more quickly, and an alternate path then has to be chosen, which increases the delay in the network. Compressive data gathering is a technique that reduces the data size and balances the energy in large-scale wireless sensor networks. The existing technique requires more computation and increases the complexity of compressing the data. A modified compressive data gathering protocol is therefore designed. It uses a lossless compression algorithm, the modified Lempel Ziv Welch (MLZW) compression algorithm, which is a simple and fast method for compressing the data. A tree is constructed and parent and child nodes are assigned; each child node carries its compressed data, and each intermediate parent node aggregates and compresses the data until it reaches the sink. The original data is reconstructed at the sink using the Modified Lempel Ziv Welch decompression algorithm.

Hundreds of nodes scattered throughout a field assemble together, establish a routing topology, and transmit data back to a collection point. The application demands for robust, scalable, low-cost, and easy-to-deploy networks are met by a wireless sensor network. If one of the nodes fails, a new topology is selected and the overall network continues to deliver data. If more nodes are placed in the field, they only create more potential routing opportunities. Data aggregation is a type of data gathering technique in wireless sensor networks. With the help of data aggregation, energy consumption is reduced by eliminating redundancy. There is extensive research on the development of new algorithms for data aggregation. The types of data gathering technique are tree-based aggregation, cluster-based aggregation, and chain-based aggregation. Many data aggregation algorithms have been proposed, such as LEACH, TAG, and PEGASIS.
The data collected at child nodes is aggregated at parent nodes in the tree-based approach, at the cluster head in the clustering technique, and at a leader node in chain-based aggregation. Data aggregation techniques are very useful for gathering data, conserving energy, maintaining QoS, and extending the network lifetime.
Tree topology integrates the characteristics of star and bus topologies. In a physical star topology, computers (nodes) are connected to each other through a central hub. In a bus topology, workstation devices are connected by a common cable called the bus. In a tree topology, a number of star networks are connected using a bus. This main cable acts as the main stem of a tree, with the star networks as its branches. It is also called expanded star topology. The Ethernet protocol is commonly used in this type of topology.
Compressive data gathering is an emerging and useful method of collecting data from sensor nodes at the sink while reducing the data size in large-scale wireless sensor networks. The sensor nodes near the sink need to collect a larger amount of data from the nodes that are placed far away. At each level the amount of data gathered becomes larger, so a sensor node may not have enough energy to transmit the data to the sink, and node failure occurs. As a result, an alternate path to the sink must be found, which takes time and increases the delay in the network. Compressive data gathering thus saves energy and reduces the delay.

II. RELATED WORK
In [1], the authors perform in-network aggregation with the objective of reducing delay and saving energy. An interference model is developed by considering the interference constraints on links, and myopic and non-myopic scheduling policies are proposed. This model achieves good performance when the network size is small and is suitable for Bluetooth and FH-CDMA networks. The objective in [2] is to maximise the quality of data at the sink under deadline and energy constraints. An optimisation framework is designed that maximises the data aggregated at the sink within a deadline. An optimal data aggregation policy and a scheduling policy are provided, and a polynomial-time algorithm is proposed that uses local information at each hop. The sink imposes a constraint on how long data can be gathered from the predecessor nodes and transmitted to the sink. The approach maximises the aggregated information and its accuracy. [3] constructs a shortest path tree (SPT) to find the shortest path to the sink. Load-balanced, latency-efficient data aggregation scheduling is proposed, and the latency-minimised and load-balanced assignment (LMLBA) problem addresses parent-children assignment. Compared with the SPT and MSL (Minimum Sleep Latency) algorithms, LMLBA realises a better trade-off between minimum sleep latency and balanced load, and gives effective data aggregation scheduling for duty-cycled WSNs.
A probabilistic model is proposed in [7] in which data communication over a link is successful with a certain probability. The cell-based path scheduling (CPS) algorithm schedules multiple super nodes on multiple paths concurrently. Zone-based path scheduling (ZPS) speeds up continuous data collection by forming a data transmission pipeline. The CDG technique combined with pipelining gives efficient network capacity. Delay-efficient data aggregation scheduling with SINR constraints is presented in [6], in which a graph-based interference model and a reduced routing graph are proposed, and breadth-first search is used on the proposed graph. The compressive scheduling algorithm reduces the delay more than the previous distributed algorithm by merging the links that have been sent to the sink. The approximation ratio of the proposed algorithms is also improved.
The FAST approach in [12] deals with tree construction and scheduling under the protocol interference model. A structure based on connected 3-hop dominating sets (C3DS) is designed, and a distributed collision-free TDMA schedule is used for scheduling. The algorithm outperforms other approaches, provides parallel transmissions, and gives an upper bound on aggregation latency of 12R+Δ−2 timeslots. S. Ji and Z. Cai (2013) proposed distributed data collection in large-scale asynchronous wireless sensor networks under the generalized physical interference model. The objectives are collision avoidance and energy efficiency. The R°-PCR carrier-sensing range is derived to avoid collisions and transmission overhead. A distributed data collection (DDC) algorithm is proposed; R°-PCR is applied to DDC, and distributed data aggregation (DDA) is proposed. The delay and capacity of DDC and DDA are analysed under a Poisson node distribution model. Through R°-PCR, fewer transmissions with more data per transmission are achieved, and the delay is reduced in DDA.
The Agriculture Data Aggregation (GRIDA) scheme in [17] aggregates data by eliminating repeated values in the farming field. This method reduces power consumption, packet delivery rate, and delay. The aggregation node may consume data as it moves up towards the sink. Cognitive Path Planning (CPP) in [18] provides a path planner for path selection. A data collector (DC) moves through the network and compares paths by analysing their smoothness and safety with knowledge of the obstacles. It avoids node failure by selecting the right path around obstacles, but a failure of the data collector leads to latency. In [16], the mobile agent (MA) is treated as an ant, and an ant colony optimization-based dynamic energy efficient mobile agent routing (ADEEMA) algorithm is proposed. A Routing Optimal Degree (ROD) algorithm is proposed to analyse the performance of the chosen route, and a reinitialising rule handles node failures caused by changes in topology. If the mobile agent fails, the entire network breaks down. Modified Multi Itinerary Planning (MIP) in [3] uses multiple mobile agents, each with a set of nodes of interest, so that data reaches the sink quickly; a minimum spanning tree is used for path selection. Compressive sensing theory in [4] compresses the original data into a transformation domain. The resulting weighted sums are received at the sink as projection-based compressive data gathering and can be decoded by solving a convex optimisation problem. The method distributes the transmission load throughout the network. As future work, combining it with scheduling may reduce the latency.
[13] proposes a Minimum Spanning Tree Projection (MSTP), in which MSTP and e-MSTP outperform previous schemes such as non-CS, plain-CS, and hybrid-CS. As future work, an optimal solution could be found by determining the positions of the projected nodes that minimise transmission cost and balance the load.
The objective of [5] is to reduce the computational cost, delay, and energy consumption. In CDG (Compressed Data Gathering), aggregated data is compressed and a weighted encoded sum is sent to the sink. In Forwarding Tree Construction and Scheduling (FTCS), multiple forwarding trees are constructed, and the encoded sum received at a node is referred to as projection-based CDG (PCDG). In the link scheduling algorithm, the links of the forwarding trees are scheduled using TDMA. The result is a reduction relative to the decentralised FTCS. This joint model achieves minimum latency with less transmission overhead, and the energy load is balanced. A modified Huffman coding algorithm in [6] represents the length of the code exponentially. Real data along with delay is given to an ADC and converted to binary, and this binary value is represented as positive and negative integers. It achieves a high compression ratio, but the lookup table is large. An improved LZW algorithm is proposed in [19]. It is a dictionary lookup based algorithm in which repeated substrings are assigned a codeword. A dictionary table is maintained, with the dictionary capacity selected according to the capacity of the nodes. Pre-processing is carried out to convert other document files to text files, because the traditional algorithm transmits only text files. This reduces the amount of storage, improves the compression ratio, and reduces the size of the dictionary. It is suitable for sparse networks with a small number of nodes. The Mobile Agent Path Design Algorithm (MAPDA) generates binary sparse random matrices from the graph. The Orthogonal Matching Pursuit (OMP) algorithm is also proposed for signal recovery and reconstruction. Although it requires less computation than a Gaussian matrix, computational complexity remains, and a better approach for achieving good throughput has to be designed. The compressive sensing theory in [2] is used to find sparse random matrices, and a mobile agent collects those measurements and carries them to the sink. Although it is energy balanced, this type of compression involves complex computations, and the data packet size may increase when such measurements are collected.

III. ARCHITECTURE OF PROPOSED SYSTEM
The architecture in Fig. 3.1 shows that the user sends a request to the base station, or sink node, which disseminates the message by sending a beacon signal to the wireless sensor network. The interested nodes form a tree structure of child and parent nodes by considering the parameters for parent-child assignment. The sensed data is collected locally in the child nodes, and a lossless compression algorithm, MLZW (Modified Lempel-Ziv-Welch), is applied to the collected data. The compressed data is then aggregated at the parent node. Even with compressed data collection, some upper aggregating nodes perform more aggregation operations, which consumes energy and may lead to node failure. An energy-balancing technique is therefore proposed that performs a reconstruction mechanism on energy-lagging coordinating nodes by distributing energy uniformly from the neighbouring non-transmitting nodes. The final aggregated data is sent to the sink node, which decodes the aggregated data and sends it to the user through the internet.
For parent-child assignment, each node compares its Qt value with those of its neighbouring nodes; the node with the better Qt value is set as the parent node, and otherwise it is set as a child node.
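The Qt comparison used for parent-child assignment can be sketched as follows. This is only an illustration under stated assumptions: the text does not define how Qt is computed or what "better" means, so the sketch assumes a larger Qt is better and that each node picks its best-scoring neighbour as parent. The function name `assign_parents` and the sample values are hypothetical.

```python
def assign_parents(qt, neighbours):
    """Choose a parent for every node by comparing Qt with its neighbours.

    Assumption (not from the paper): a larger Qt is "better", and a node
    takes the neighbour with the highest Qt above its own as its parent.
    A node with no better neighbour gets None and acts as a local root.
    """
    parents = {}
    for node, adjacent in neighbours.items():
        # Only neighbours with a strictly better Qt can act as parent.
        better = [n for n in adjacent if qt[n] > qt[node]]
        parents[node] = max(better, key=qt.get) if better else None
    return parents


# Hypothetical Qt values and neighbour lists for four nodes.
qt = {"A": 0.9, "B": 0.6, "C": 0.4, "D": 0.5}
neighbours = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["A", "C"],
}
parents = assign_parents(qt, neighbours)
```

Because a parent always has a strictly larger Qt than its child, following parent links can never form a cycle, so the result is a forest rooted at the locally best nodes.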

COMPRESSION AND DECOMPRESSION
3.3.1 MLZW COMPRESSION
MLZW is a lossless, dictionary-based compression algorithm. It is a simple process that replaces strings of characters with single codes. It starts with a dictionary of single characters and gradually extends the table as the characters of the input string are read. It is well suited to text compression and may use binary or ASCII values for characters. Algorithm 2 shows how the data is compressed using the MLZW algorithm.
Table 2: MLZW Compression Table
The compression procedure is illustrated in Table 2. First, the null character before the string is read and entered in the compression table. The first character of the string, say M, is then read and the code for the S/N is output. The next character, say A, is read, and the combination MA and its code are entered in the extended dictionary table. Whenever the combination MA is repeated, the corresponding codeword is used for further entries. A is then entered in the S column and the next character from the string in the C column, and the process continues in this way until the end of the string.
Unencoded length = 9 symbols × 5 bits/symbol = 45 bits
Encoded length = (6 codes × 5 bits/code) + (1 code × 6 bits/code) = 36 bits
Compression ratio = Encoded length / Unencoded length = 36 / 45 = 0.8    (1)
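The arithmetic of this example (9 symbols at 5 bits each, encoded as six 5-bit codes and one 6-bit code) can be checked with a few lines of Python. The helper name `stream_bits` is hypothetical, and the ratio is taken here as encoded length divided by unencoded length, which is the reading that yields the value 0.8 given in the text.

```python
def stream_bits(groups):
    """Total length in bits of a stream given as (count, bits_per_item) groups."""
    return sum(count * width for count, width in groups)


unencoded = stream_bits([(9, 5)])         # 9 symbols at 5 bits each
encoded = stream_bits([(6, 5), (1, 6)])   # 6 codes at 5 bits, 1 code at 6 bits
ratio = encoded / unencoded               # encoded length over unencoded length
```

The single 6-bit code reflects the usual variable-width behaviour of LZW: once the dictionary grows past what 5 bits can index, later codes need one more bit.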

Algorithm 2: MLZW Compression Algorithm
Input: Sensed data, to which modified Lempel Ziv Welch compression is applied
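Since the body of the algorithm is not reproduced here, the following is a minimal sketch of textbook LZW compression, not the authors' modified variant, whose specific changes are not given in the text. It assumes an initial dictionary of the 256 single-character strings and input characters with code points below 256; the function name `lzw_compress` is illustrative.

```python
def lzw_compress(text):
    """Encode a string as a list of integer codes using textbook LZW."""
    # Assumption: the initial dictionary holds the 256 single-character
    # strings, so every input character must have a code point below 256.
    dictionary = {chr(i): i for i in range(256)}
    current = ""   # longest string matched so far
    codes = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            # Keep extending the match while it is still in the dictionary.
            current = candidate
        else:
            # Emit the code for the longest match, then register the new string.
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)
            current = ch
    if current:
        codes.append(dictionary[current])   # flush the final match
    return codes


codes = lzw_compress("TOBEORNOTTOBEORTOBEORNOT")
# 24 input characters are encoded as 16 codes; codes >= 256 refer to
# strings that were added to the dictionary during encoding.
```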

MLZW DECOMPRESSION
The decoding algorithm works by reading a value from the encoded input and outputting the corresponding string from the initialized dictionary. To rebuild the dictionary in the same way as it was built during encoding, the decoder also obtains the next value from the input and adds to the dictionary the concatenation of the current string and the first character of the string obtained by decoding the next input value. If the next value cannot be decoded, the first character of the string just output is used instead: a value unknown to the decoder must be the one that will be added to the dictionary in this iteration, so its first character must be the same as the first character of the current string being sent to the decoded output. The decoder then proceeds to the next input value, which was already read in as the "next value" in the previous pass, and repeats the process until there is no more input, at which point the final input value is decoded without any further additions to the dictionary. In this way the decoder builds up a dictionary identical to that used by the encoder and uses it to decode subsequent input values. The MLZW decompression table is shown in Table 3.
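The decoding procedure described above can be sketched in Python as follows. As before, this is textbook LZW under the assumption of a 256-entry initial dictionary, not the authors' modified variant; the function name `lzw_decompress` is illustrative. The `elif` branch handles the case where the next value is not yet known to the decoder.

```python
def lzw_decompress(codes):
    """Decode a list of textbook-LZW integer codes back into a string."""
    if not codes:
        return ""
    # Assumption: same 256-entry initial dictionary as the encoder.
    dictionary = {i: chr(i) for i in range(256)}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # Code not yet known: it must be the entry about to be added,
            # so it starts with the first character of the previous string.
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        output.append(entry)
        # Rebuild the encoder's dictionary one step behind the encoder.
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(output)


# The last code (258) is unknown to the decoder when it arrives.
text = lzw_decompress([65, 66, 256, 258])
```

The input `[65, 66, 256, 258]` is what textbook LZW produces for "ABABABA", a short string that exercises the unknown-code case.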

Modified Compressive Data Gathering
Modified Compressive Data Gathering (MCDG) applies the MLZW algorithm at the sensor nodes, and the aggregated data is carried by the intermediate nodes. The total aggregated data is received by the sink. Algorithm 4 shows the step-by-step procedure of MCDG.

Algorithm 4: Modified Compressive Data Gathering
Input: Data sensed from the sensor nodes
Output: Compressed and aggregated data at the sink
Begin
Let N be the number of nodes, and let s_i, ϕ_i and x_i be the sensed data, compressed data and node id of node i, respectively.
Let Z_C be the total aggregated and compressed data received at the sink, Z_D be the decompressed data, and Z be the average of all sensed data from the sensor network.
At node i, s_i is compressed to ϕ_i, and ϕ_i || x_i is formed.
Transmit ϕ_i || x_i to an intermediate node, say j, which aggregates its own compressed information with the compressed information received from node i.
Successive upper nodes aggregate in the same way until the sink is reached, and the total aggregated data at the sink is
Z_C = ϕ_i || x_i + ϕ_j || x_j + ... + ϕ_n || x_n
Z_D = s_i ... s_n
End
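The gathering steps of Algorithm 4 can be sketched as a recursive walk over the aggregation tree. Several things here are assumptions rather than the authors' implementation: Python's standard `zlib` module stands in for MLZW as the lossless compressor, the "+" in Z_C is read as appending tagged payloads to a list, and the names `SensorNode`, `gather` and `decode_at_sink` as well as the sample readings are hypothetical.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SensorNode:
    node_id: int                      # x_i
    reading: bytes                    # s_i
    children: List["SensorNode"] = field(default_factory=list)


def gather(node: SensorNode) -> List[Tuple[bytes, int]]:
    """Return the (phi, x) pairs aggregated at `node` for its whole subtree."""
    # Stand-in compressor: zlib replaces MLZW in this sketch.
    aggregated = [(zlib.compress(node.reading), node.node_id)]
    for child in node.children:
        # A parent appends what each child forwards to its own payload.
        aggregated.extend(gather(child))
    return aggregated


def decode_at_sink(z_c: List[Tuple[bytes, int]]) -> Dict[int, bytes]:
    """Recover Z_D: the original reading of every node, keyed by node id."""
    return {node_id: zlib.decompress(phi) for phi, node_id in z_c}


# Hypothetical tree: node 1 is next to the sink, node 4 is the deepest leaf.
leaf = SensorNode(4, b"24.6,24.6,24.7")
root = SensorNode(1, b"25.0,25.0,25.1", [
    SensorNode(2, b"24.9,24.9,24.9"),
    SensorNode(3, b"24.8,24.8,24.7", [leaf]),
])
z_c = gather(root)            # what the sink receives
z_d = decode_at_sink(z_c)     # lossless reconstruction at the sink
```

Tagging each compressed payload with its node id lets the sink attribute every decompressed reading to its source even though intermediate nodes never decompress anything.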

PERFORMANCE EVALUATION
The proposed work is examined and compared with the existing compression technique. The NS2 simulator is used to evaluate the performance. Graphs are plotted for packet delivery ratio (PDR), energy consumption, and delay. The packet delivery ratio of the proposed work is shown in Figure 3, with simulation time on the X axis and PDR on the Y axis. Figure 4 plots delay on the Y axis against simulation time on the X axis; the delay values of the existing and proposed work are plotted and compared. The delay is reduced by up to 28% compared with the existing CDG. Figure 5 plots energy on the Y axis against simulation time on the X axis; the energy values of the existing and proposed work are plotted and compared. The energy consumption is reduced by 69% compared with the existing CDG.

IV. CONCLUSION
Compressive data gathering in the existing system, which uses compressive sensing theory, suffers from increased complexity in compressing the data. The theory involves matrix multiplications, which increase the microprocessor computations inside the sensors and consume additional energy. The modified compressive data gathering overcomes these drawbacks by simplifying the compression technique, and it maintains the quality of the data by using a lossless compression algorithm, so no data is lost on compression. The experimental results show that the delay and energy consumption are reduced compared with the existing work. In phase II, compression of heterogeneous data is to be designed. An energy reconstruction algorithm can also be proposed to restore energy on nodes whose energy has been drained by a large number of transmissions.