Fountain Codes Based Distributed Storage Algorithms for Large-scale
Wireless Sensor Networks
Abstract
We consider large-scale networks with $n$ nodes, out of which $k$ are in possession of (e.g., have sensed or collected in some other way) $k$ information packets. In scenarios in which network nodes are vulnerable because of, for example, limited energy or a hostile environment, it is desirable to disseminate the acquired information throughout the network so that each of the $n$ nodes stores one (possibly coded) packet and the original $k$ source packets can be recovered later in a computationally simple way from any $(1+\epsilon)k$ nodes for some small $\epsilon > 0$.
We developed two distributed algorithms for solving this problem based on simple random walks and Fountain codes. Unlike all previously developed schemes, our solution is truly distributed, that is, nodes do not know $n$, $k$, or connectivity in the network, except in their own neighborhoods, and they do not maintain any routing tables. In the first algorithm, all the sensors have the knowledge of $n$ and $k$. In the second algorithm, each sensor estimates these parameters through the random walk dissemination. We present analysis of the communication/transmission and encoding/decoding complexity of these two algorithms, and provide extensive simulation results as well.¹
¹ This work was accomplished while S.A.A and Z.K. were spending a summer research internship at Bell Labs & Alcatel-Lucent, Murray Hill, N.J., 2007, and it was submitted as a US patent in [2]. They would like to thank Bell Labs & Alcatel-Lucent staff members for their hospitality.
1 Introduction
Wireless sensor networks consist of small devices (sensors) with limited resources (e.g., low CPU power, small bandwidth, limited battery and memory). They can be deployed to monitor objects, measure temperature, and detect fires and other disaster phenomena. They are often used in isolated, hard-to-reach areas, where human involvement is limited. Consequently, data acquired by sensors may have a short lifetime, and any processing of it within the network should have low complexity and power consumption [18].
We consider a large-scale wireless sensor network with $n$ sensors. Among them, $k$ sensors have collected (sensed) some information. Since sensors are often short-lived because of limited energy or a hostile environment, it is desirable to disseminate the acquired information throughout the network so that each of the $n$ nodes stores one (possibly coded) packet and the original $k$ source packets can be recovered in a computationally simple way from any $(1+\epsilon)k$ nodes for some small $\epsilon > 0$. Here, the sensors do not know the locations of each other, and they do not maintain any routing tables.
Various solutions to the centralized version of this problem have been proposed, and are based on well-known coding schemes such as Fountain codes [6] or MDS codes [16]. To distribute the information from multiple sources throughout the network so that each node stores a coded packet as if obtained by centralized LT (Luby Transform) coding [12], Lin et al. [11] proposed a solution that uses random walks with traps. To achieve the desired code degree distribution, they employed the Metropolis algorithm to specify transition probabilities of the random walks. In this way, the original source packets are encoded by LT codes and the decoding process can be done by querying any $(1+\epsilon)k$ arbitrary sensors. Because of the properties of LT codes, the encoding and decoding complexity are linear and therefore have low energy consumption.
In the methods of [11], the knowledge of the total number of sensors $n$ and sources $k$ is required for calculating the number of random walks that each source needs to initiate and for calculating the probability of trapping at each sensor. Another type of global information, namely, the maximum node degree (i.e., the maximum number of neighbors) in the network, is also required to perform the Metropolis algorithm. However, for a large-scale sensor network, such global information may not be easy to obtain by each individual sensor, especially when there is a possibility of a change in topology. Moreover, the algorithms proposed in [11] assume that each sensor encodes only after receiving enough source packets. This requires each sensor to maintain a large enough temporary memory buffer, which may not be practical in real sensor networks.
In this paper, we propose two new algorithms to solve the distributed storage problem in large-scale sensor networks. We refer to these algorithms as LT-Codes based Distributed Storage-I (LTCDS-I) and LT-Codes based Distributed Storage-II (LTCDS-II). Both algorithms use simple random walks without trapping to disseminate source packets. In contrast to the methods in [11], both algorithms demand little global information and memory at each sensor. In LTCDS-I, only the values of $n$ and $k$ are needed, whereas the maximum node degree, which is more difficult to obtain, is not required. In LTCDS-II, no sensor needs to know any global information (that is, knowing $n$ and $k$ is no longer required). Instead, sensors can obtain good estimates for those parameters by using some properties of random walks. Moreover, in both algorithms, instead of waiting until all the necessary source packets are collected to do encoding, each sensor makes decisions and performs encoding online upon each reception of source packets. This mechanism reduces the memory demand significantly.
The main contributions of this paper are as follows:

We propose two new algorithms (LTCDS-I and LTCDS-II) for distributed storage in large-scale sensor networks, using simple random walks and LT codes. These algorithms are simpler, more robust, and less constrained in comparison to previous solutions.

We present complexity analysis of both algorithms, including transmission, encoding, and decoding complexity.

We evaluate and illustrate the performance of both algorithms by extensive simulation.
This paper is organized as follows. We start with a short survey of the related work in Section 2. In Section 3, we introduce the network model and present Luby Transform (LT) codes. In Section 4, we propose two LT-codes based distributed storage algorithms called LTCDS-I and LTCDS-II. We then present simulation studies and provide performance analysis of the proposed algorithms in Section 5, and conclude in Section 6.
2 Related Work
The work most closely related to the one presented here is [11, 10]. Lin et al. studied the question “how to retrieve historical data that the sensors have gathered even if some sensors are destroyed or disappeared from the network?” They analyzed techniques to increase persistence of sensed data in a random wireless sensor network, and proposed two decentralized algorithms using Fountain codes to guarantee the persistence and reliability of cached data on unreliable sensors. They used random walks to disseminate data from multiple sensors (sources) to the whole network. Based on the knowledge of the total number of sensors $n$ and sources $k$, each source calculates the number of random walks it needs to initiate, and each sensor calculates the number of source packets it needs to trap. In order to achieve some desired packet distribution, the transition probabilities of random walks are specified by the well-known Metropolis algorithm [11].
Dimakis et al. in [4, 6] proposed a decentralized implementation of Fountain codes that uses geographic routing, where every node has to know its location. The motivation for using Fountain codes is their low decoding complexity. Also, one does not know in advance the degrees of the output nodes in this type of codes. The authors proposed a randomized algorithm that constructs Fountain codes over a grid network using only geographical knowledge of nodes and local randomized decisions. Fast random walks are used to disseminate source data to the storage nodes in the network.
Kamara et al. in [9, 8] proposed a novel technique called growth codes to increase data persistence in wireless sensor networks, namely, to increase the amount of information that can be recovered at the sink. Growth coding is a linear technique in which information is encoded in an online distributed way with increasing degree of a storage node. Kamara et al. showed that growth codes can increase the amount of information that can be recovered at any storage node at any time period whenever there is a failure in some other nodes. They did not use robust or soliton distributions, but proposed a new distribution depending on the network condition to determine degrees of the storage nodes. The motivation for their work was as follows: i) The positions and topology of the nodes are not known. ii) They assume a round time of node updates, meaning that as time increases, the degree of a symbol increases. This is the idea behind growth degrees. iii) They provide practical implementations of growth codes and compare their performance with other codes. iv) Decoding is done by querying an arbitrary sink; if the original sensed data has been collected correctly, decoding finishes, and otherwise another sink node is queried.
Lun et al. in [13] proposed two decentralized algorithms to compute the minimum-cost subgraphs for establishing multicast connections using network coding. They also extended their work to the problem of minimum-energy multicast in wireless networks, studied directed point-to-point multicast, and evaluated the case of elastic rate demand.
3 Wireless Sensor Networks and Fountain Codes
In this section, we introduce our network model and provide background on Fountain codes and, in particular, on one important class of Fountain codes: LT (Luby Transform) codes [12].
3.1 Network Model
Our wireless sensor network consists of $n$ nodes that are uniformly distributed at random in a region $\mathcal{A}$. The density of the network is given by
(1) $\lambda = \frac{n}{|\mathcal{A}|}$,
where $|\mathcal{A}|$ is the two-dimensional Lebesgue measure (or area) of $\mathcal{A}$. Each sensor node has an identical communication radius of 1; thus any two nodes can communicate with each other if and only if their distance is less than or equal to 1. This model is known as random geometric graphs [7, 15]. Among these $n$ nodes, there are $k$ source nodes that have information to be disseminated throughout the network for storage. These $k$ nodes are uniformly and independently distributed at random among the $n$ nodes. Usually, the fraction of source nodes, i.e., $k/n$, is not very large.
Note that, although we assume the nodes are uniformly distributed at random in a region, our algorithms and results do not rely on this assumption. In fact, they can be applied for any network topology, for example, regular grids.
We assume that no node has knowledge about the locations of other nodes and no routing table is maintained; consequently, the algorithm proposed in [5] cannot be applied. Moreover, we assume that each node has limited or no knowledge of global information, but knows its neighbors. The limited global information refers to the total numbers of nodes $n$ and sources $k$. Any further global information, for example the maximal number of neighbors in the network, is not available. Hence, the algorithms proposed in [11, 10] are not applicable.
Definition 1.
(Node Degree) Consider a graph $G = (V, E)$, where $V$ and $E$ denote the set of nodes and links, respectively. Given $u, v \in V$, we say $u$ and $v$ are adjacent (or $u$ is adjacent to $v$, and vice versa) if there exists a link between $u$ and $v$, i.e., $(u, v) \in E$. In this case, we also say that $u$ and $v$ are neighbors. Denote by $\mathcal{N}(u)$ the set of neighbors of a node $u$. The number of neighbors of a node $u$ is called the node degree of $u$, and denoted by $d_n(u)$, i.e., $d_n(u) = |\mathcal{N}(u)|$. The mean degree of a graph $G$ is then given by
(2) $\mu = \frac{1}{|V|} \sum_{u \in V} d_n(u)$,
where $|V|$ is the total number of nodes in $G$.
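As an illustration of the network model and of the mean degree in (2), the following minimal sketch builds a random geometric graph on a square region and averages the node degrees. The square side length, the seed, and the function names are assumptions made for this example only.

```python
import math
import random

def random_geometric_graph(n, side, radius=1.0, seed=0):
    """Place n nodes uniformly in [0, side]^2 and link nodes within `radius`."""
    rng = random.Random(seed)
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    nbrs = {u: set() for u in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if math.dist(pts[u], pts[v]) <= radius:
                nbrs[u].add(v)
                nbrs[v].add(u)
    return nbrs

def mean_degree(nbrs):
    """Mean degree as in (2): the average of the node degrees |N(u)|."""
    return sum(len(s) for s in nbrs.values()) / len(nbrs)

if __name__ == "__main__":
    graph = random_geometric_graph(n=200, side=5.0)  # hypothetical sizes
    print(mean_degree(graph))
```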
3.2 Fountain Codes
For $k$ source blocks $\{x_1, \dots, x_k\}$ and a probability distribution $\Omega(d)$ with $1 \le d \le k$, a Fountain code with parameters $(k, \Omega)$ is a potentially limitless stream of output blocks $\{y_1, y_2, \dots\}$. Each output block is obtained by XORing $d$ randomly and independently chosen source blocks, where $d$ is drawn from a specially designed distribution $\Omega(d)$. This is illustrated in Figure 1. Fountain codes are rateless, and one of their main advantages is that the encoding operations can be performed online. The encoding cost is the expected number of operations sufficient for generating an output symbol, and the decoding cost is the expected number of operations sufficient to recover the $k$ input blocks. Another advantage of Fountain codes, as opposed to purely random codes, is that their decoding complexity can be made low by an appropriate choice of $\Omega(d)$, with little sacrifice in performance. The decoding of Fountain codes can be done by message passing.
Definition 2.
(Code Degree) For Fountain codes, the number of source blocks used to generate an encoded output $y$ is called the code degree of $y$, and denoted by $d_c(y)$. By construction, the code degree distribution $\Omega(d)$ is the probability distribution of $d_c(y)$.
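The encoding rule described above can be sketched as follows. This is only an illustration: blocks are modeled as Python integers, the chosen source blocks are taken to be distinct, and the function name and the list layout of the degree distribution are assumptions of this example.

```python
import random

def fountain_encode(blocks, degree_dist, rng):
    """Produce one output block.

    degree_dist[i - 1] is the probability of code degree i.
    Returns the sorted indices of the chosen source blocks and their XOR.
    """
    k = len(blocks)
    d = rng.choices(range(1, k + 1), weights=degree_dist)[0]  # draw the code degree
    chosen = rng.sample(range(k), d)                          # d distinct source blocks
    out = 0
    for i in chosen:
        out ^= blocks[i]
    return sorted(chosen), out
```

Calling the function repeatedly with the same `blocks` yields the potentially limitless stream of output blocks; the returned index list plays the role of the packet header that a decoder would need.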
3.3 LT Codes
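LT (Luby Transform) codes are a special class of Fountain codes which use the Ideal Soliton or Robust Soliton distributions [12]. The Ideal Soliton distribution $\Omega_{is}(d)$ for $k$ source blocks is given by
(3) $\Omega_{is}(i) = \Pr(d = i) = \begin{cases} \frac{1}{k}, & i = 1, \\ \frac{1}{i(i-1)}, & i = 2, 3, \dots, k. \end{cases}$
Let $R = c_0 \ln(k/\delta)\sqrt{k}$, where $c_0$ is a suitable constant and $0 < \delta < 1$. The Robust Soliton distribution for $k$ source blocks is defined as follows. Define
(4) $\tau(i) = \begin{cases} \frac{R}{ik}, & i = 1, \dots, \frac{k}{R} - 1, \\ \frac{R \ln(R/\delta)}{k}, & i = \frac{k}{R}, \\ 0, & i = \frac{k}{R} + 1, \dots, k, \end{cases}$
and let
(5) $\beta = \sum_{i=1}^{k} \left( \tau(i) + \Omega_{is}(i) \right)$.
The Robust Soliton distribution is given by
(6) $\Omega_{rs}(i) = \frac{\tau(i) + \Omega_{is}(i)}{\beta}$, for all $i = 1, 2, \dots, k$.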
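The two degree distributions can be tabulated with a short sketch such as the one below. The default values of `c0` and `delta`, and the rounding of $k/R$ to the nearest integer index, are assumptions of this example and are not prescribed by the text.

```python
import math

def ideal_soliton(k):
    """Return [Pr(d = 1), ..., Pr(d = k)] for the Ideal Soliton distribution (3)."""
    return [1.0 / k] + [1.0 / (i * (i - 1)) for i in range(2, k + 1)]

def robust_soliton(k, c0=0.1, delta=0.5):
    """Return the Robust Soliton distribution (6), built from (3), (4) and (5)."""
    R = c0 * math.log(k / delta) * math.sqrt(k)
    spike = min(max(int(round(k / R)), 1), k)   # integer index standing in for k/R
    tau = [0.0] * k
    for i in range(1, spike):
        tau[i - 1] = R / (i * k)
    tau[spike - 1] = R * math.log(R / delta) / k
    rho = ideal_soliton(k)
    beta = sum(t + r for t, r in zip(tau, rho))  # normalizing constant (5)
    return [(t + r) / beta for t, r in zip(tau, rho)]
```

Both functions return a list whose entry `i - 1` is the probability of degree `i`, so the result can be passed directly as the `weights` argument of `random.choices`.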
The following result provides the performance of the LT codes with Robust Soliton distribution [12, Theorems 12 and 13].
Lemma 3 (Luby [12]).
For LT codes with Robust Soliton distribution, $k$ original source blocks can be recovered from any $k + O(\sqrt{k}\ln^2(k/\delta))$ encoded output blocks with probability $1 - \delta$. Both encoding and decoding complexity is $O(k \ln(k/\delta))$.
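The low decoding cost comes from the message-passing (peeling) decoder: a packet of degree one reveals a source block, which is then XORed out of every other packet that contains it. The sketch below is a deliberately simple, unoptimized version; the packet representation (a set of source indices plus an integer payload) is an assumption of this example.

```python
def peel_decode(k, packets):
    """Peeling decoder.

    packets: iterable of (indices, value), where value is the XOR of the
    source blocks listed in indices. Returns {index: recovered block};
    the result has fewer than k entries if decoding stalls.
    """
    pending = [[set(idx), val] for idx, val in packets]
    recovered = {}
    progress = True
    while progress and len(recovered) < k:
        progress = False
        for pkt in pending:
            # XOR out every source block that is already known.
            for i in [j for j in pkt[0] if j in recovered]:
                pkt[0].discard(i)
                pkt[1] ^= recovered[i]
            # A packet with exactly one unknown source reveals that source.
            if len(pkt[0]) == 1:
                (i,) = pkt[0]
                recovered[i] = pkt[1]
                progress = True
    return recovered
```

This version rescans all packets on every pass, which is quadratic in the worst case; an implementation aiming at the complexity stated in Lemma 3 would keep a queue of degree-one packets instead.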
4 LT-Codes Based Distributed Storage (LTCDS) Algorithms
In this section, we present two LT-Codes based Distributed Storage (LTCDS) algorithms. In both algorithms, the source packets are disseminated throughout the network by a simple random walk. In the first one, called the LTCDS-I algorithm, we assume that each node in the network has limited global information, that is, it knows the total number of sources $k$ and the total number of nodes $n$. Unlike the scheme proposed in [10], our algorithm does not require the nodes to know the maximum degree of the graph, which is much harder to obtain than $n$ and $k$. The second algorithm, called LTCDS-II, is a fully distributed algorithm which does not require nodes to know any global information. The price we pay for this benefit is extra transmissions of the source packets to obtain estimates for $n$ and $k$.
4.1 With Limited Global Information: LTCDS-I
In LTCDS-I, we assume that each node in the network knows the values of $n$ and $k$. We use simple random walks [1, 17] for each source to disseminate its information to the whole network. At each round, each node $u$ that has packets to transmit chooses one node $v$ among its neighbors uniformly and independently at random, and sends the packet to the node $v$. In order to avoid the local-cluster effect (each source packet being trapped most likely by its neighbor nodes), we let each node accept a source packet equiprobably. To achieve this, we also need each source packet to visit each node in the network at least once.
Definition 4.
(Cover Time) Given a graph $G$, let $T_{cover}(u)$ be the expected length of a random walk that starts at node $u$ and visits every node in $G$ at least once. The cover time of $G$ is defined by
(7) $T_{cover}(G) = \max_{u \in G} T_{cover}(u)$.
For a simple random walk on a random geometric graph, the following result bounds the cover time [3].
Lemma 5 (Avin and Ercal [3]).
If a random geometric graph with $n$ nodes is a connected graph with high probability, then
(8) $T_{cover}(G) = \Theta(n \log n)$.
As a result of Lemma 5, we can set a counter for each source packet and increase the counter by one after each forward transmission until the counter reaches some threshold $C_1 n \log n$ to guarantee that the source packet visits each node in the network at least once. The detailed descriptions of the initialization, encoding and storage phases (steps) of the LTCDS-I algorithm are given below:
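To get a feel for the counter threshold, the sketch below measures how many steps a simple random walk needs to cover a graph and compares it with a budget proportional to $n \log n$. It is an illustration under assumptions: a complete graph stands in for the random geometric graph, the logarithm is taken to be natural, and the names `cover_steps` and `step_budget` are hypothetical.

```python
import math
import random

def cover_steps(nbrs, start, rng):
    """Number of steps a simple random walk takes to visit every node once."""
    seen = {start}
    u, steps = start, 0
    while len(seen) < len(nbrs):
        u = rng.choice(nbrs[u])   # move to a uniformly random neighbor
        seen.add(u)
        steps += 1
    return steps

def step_budget(n, c1):
    """Counter threshold c1 * n * ln(n), rounded up (natural log assumed)."""
    return math.ceil(c1 * n * math.log(n))

if __name__ == "__main__":
    n = 50
    complete = {u: [v for v in range(n) if v != u] for u in range(n)}
    rng = random.Random(7)
    runs = [cover_steps(complete, 0, rng) for _ in range(200)]
    print(sum(runs) / len(runs), step_budget(n, 3.0))
```

A budget that is a small multiple of the average observed cover length leaves room for the walks that take longer than average.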

Initialization Phase:

Each node $u$ in the network draws a random number $d_c(u)$ according to the distribution $\Omega_{is}(d)$ given by (3) (or $\Omega_{rs}(d)$ given by (6)). Each source node $s_i$, $i = 1, \dots, k$, generates a header for its source packet $x_{s_i}$ and puts its ID and a counter $c(x_{s_i})$ with initial value zero into the packet header. We set up tokens for initial and update packets. We assume that a token is set to zero for an initial packet and for an update packet.

Each source node $s_i$ sends out its own source packet $x_{s_i}$ to another node $u$, which is chosen uniformly at random among all its neighbors $\mathcal{N}(s_i)$.

The chosen node $u$ accepts this source packet $x_{s_i}$ with probability $d_c(u)/k$ and updates its storage as
(9) $y_u^{+} = y_u^{-} \oplus x_{s_i}$,
where $y_u^{-}$ and $y_u^{+}$ denote the packet that the node $u$ stores before and after the updating, respectively, and $\oplus$ represents the XOR operation. No matter whether the source packet is accepted or not, the node $u$ puts it into its forward queue and sets the counter of $x_{s_i}$ as
(10) $c(x_{s_i}) = 1$.


Encoding Phase:

In each round, when a node $u$ has received at least one source packet before the current round, $u$ forwards the head-of-line (HOL) packet $x$ in its forward queue to one of its neighbors $v$, chosen uniformly at random among all its neighbors $\mathcal{N}(u)$.

Depending on how many times $x$ has visited $v$, the node $v$ makes its decisions:

If it is the first time that $x$ visits $v$, then the node $v$ accepts this source packet with probability $d_c(v)/k$ and updates its storage as
(11) $y_v^{+} = y_v^{-} \oplus x$.
If $x$ has visited $v$ before and $c(x) < C_1 n \log n$, where $C_1$ is a system parameter, then the node $v$ accepts this source packet with probability 0.

No matter whether $x$ is accepted or not, the node $v$ puts it into its forward queue and increases the counter of $x$ by one:
(12) $c(x) = c(x) + 1$.
If $x$ has visited $v$ before and $c(x) \ge C_1 n \log n$, then the node $v$ discards the packet $x$ forever.



Storage Phase:
When a node $u$ has made its decisions for all the source packets $x_{s_1}, x_{s_2}, \dots, x_{s_k}$, i.e., all these packets have visited the node at least once, the node $u$ finishes its encoding process by declaring the current $y_u$ to be its storage packet.
The pseudocode of these steps is given in LTCDS-I Algorithm 1.
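The three phases can be condensed into the following simulation sketch. It is not Algorithm 1 itself but a simplification under stated assumptions: each packet's walk is simulated on its own instead of through per-node forward queues (which affects timing but not what ends up stored), the walk is cut off after $C_1 n \ln n$ forwards with a natural logarithm, packets are integers, and all names are hypothetical.

```python
import math
import random

def ltcds1(nbrs, sources, data, degree_dist, c1=3.0, seed=0):
    """Simulate LTCDS-I on a graph given as {node: [neighbors]}.

    data[s] is the integer payload of source s and degree_dist[i - 1] is
    the probability of code degree i. Returns (d_c, stored, accepted).
    """
    rng = random.Random(seed)
    n, k = len(nbrs), len(sources)
    # Initialization: every node draws its target code degree d_c(u).
    d_c = {u: rng.choices(range(1, k + 1), weights=degree_dist)[0] for u in nbrs}
    stored = {u: 0 for u in nbrs}          # y_u, the XOR of accepted packets
    accepted = {u: set() for u in nbrs}    # IDs of the sources folded into y_u
    limit = int(c1 * n * math.log(n))      # counter threshold for each packet
    for s in sources:
        u, visited = s, set()
        for _ in range(limit):
            u = rng.choice(nbrs[u])        # forward to a uniformly random neighbor
            if u not in visited:           # a node decides only on the first visit
                visited.add(u)
                if rng.random() < d_c[u] / k:
                    stored[u] ^= data[s]
                    accepted[u].add(s)
    return d_c, stored, accepted
```

Keeping `accepted` alongside `stored` mirrors the header information a decoder needs in order to know which sources each storage packet combines.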
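The following theorem establishes the code degree distribution of each storage node induced by the LTCDS-I algorithm.
Theorem 6.
When a sensor network with $n$ nodes and $k$ sources finishes the storage phase of the LTCDS-I algorithm, the code degree distribution of each storage node $u$ is given by
(13) $\Pr(\tilde{d}_c(u) = i) = \sum_{d_c(u)=1}^{k} \binom{k}{i} \left( \frac{d_c(u)}{k} \right)^{i} \left( 1 - \frac{d_c(u)}{k} \right)^{k-i} \Omega(d_c(u))$,
where $d_c(u)$ is given in the initialization phase of the LTCDS-I algorithm from distribution $\Omega(d)$ (i.e., $\Omega_{is}(d)$ or $\Omega_{rs}(d)$), and $\tilde{d}_c(u)$ is the code degree of the node $u$ resulting from the algorithm.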
Proof.
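For each node $u$, $d_c(u)$ is drawn from a distribution $\Omega(d)$ (i.e., $\Omega_{is}(d)$ or $\Omega_{rs}(d)$). Given $d_c(u)$, the node $u$ accepts each source packet with probability $d_c(u)/k$, independently of each other. Thus, the number of source packets that the node $u$ accepts follows a Binomial distribution with parameters $(k, d_c(u)/k)$. Hence,
$\Pr(\tilde{d}_c(u) = i) = \sum_{d_c(u)=1}^{k} \Pr(\tilde{d}_c(u) = i \mid d_c(u)) \, \Omega(d_c(u))$,
and thereafter (13) holds. ∎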
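The mixture in (13) is easy to evaluate numerically, which makes it possible to compare the induced degree distribution with the target one. The sketch below assumes the target distribution is supplied as a list indexed by degree; the function name is hypothetical.

```python
from math import comb

def induced_degree_dist(omega):
    """Evaluate (13) for a target distribution omega, with omega[d - 1] = Pr(d_c = d).

    Returns [Pr(d~ = 0), Pr(d~ = 1), ..., Pr(d~ = k)], where d~ is the
    number of source packets a node actually accepts.
    """
    k = len(omega)
    result = []
    for i in range(k + 1):
        p = sum(
            comb(k, i) * (d / k) ** i * (1 - d / k) ** (k - i) * omega[d - 1]
            for d in range(1, k + 1)
        )
        result.append(p)
    return result
```

The entry at index 0 is the probability that a node accepts no packet at all, a case that the target distribution itself never produces.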
Theorem 6 indicates that the code degree $\tilde{d}_c(u)$ is not the same as $d_c(u)$. In fact, one may achieve the exact desired code degree distribution by letting all the sensors hold the received source packets in their temporary buffer until they collect all $k$ source packets. Then they can randomly choose $d_c(u)$ packets. In this way, the resulting degree distribution is exactly the same as $\Omega_{is}$ or $\Omega_{rs}$. However, this requires that each sensor has enough buffer or memory, which is usually not practical, especially when $k$ is large. Therefore, in LTCDS-I, we assume each sensor has very limited memory and let them make their decision upon each reception.
Fortunately, from Figure 2, we can see that at the high degree end, the resulting code degree distribution obtained by the LTCDS-I algorithm (13) perfectly matches the desired code degree distribution, i.e., either the Ideal Soliton distribution (3) or the Robust Soliton distribution (6). For the resulting degree distribution and the desired degree distributions, the difference only lies at the low degree end, especially at degree 1 and degree 2. In particular, the resulting degree distribution has higher probability at degree 1 and lower probability at degree 2 than the desired degree distributions. The higher probability at degree 1 turns out to compensate for the lower probability at degree 2, so that the resulting degree distribution has very similar encoding and decoding behavior to LT codes using either the Ideal Soliton distribution or the Robust Soliton distribution. In our future study, we will provide theoretical analysis and prove that the degree distribution in (13) is equivalent, but not the same, as the degree distribution used in LT encoding [12]. Therefore, we have the following theorem, which can be proved by the same method as for Lemma 3, see [12].
Theorem 7.
Suppose sensor networks have $n$ nodes and $k$ sources and the LTCDS-I algorithm uses the Robust Soliton distribution $\Omega_{rs}$. Then, when $n$ and $k$ are sufficiently large, the $k$ original source packets can be recovered from any $k + O(\sqrt{k}\ln^2(k/\delta))$ storage nodes with probability $1 - \delta$. The decoding complexity is $O(k \ln(k/\delta))$.
Theorem 7 asserts that when $n$ and $k$ are sufficiently large, the performance of LTCDS-I is similar to LT coding.
Another main performance metric is the transmission cost of the algorithm, which is characterized by the total number of transmissions (the total number of steps of random walks).
Theorem 8.
Denote by $T^{(I)}_{LTCDS}$ the total number of transmissions of the LTCDS-I algorithm, then we have
(14) $T^{(I)}_{LTCDS} = \Theta(k n \log n)$,
where $k$ is the total number of sources, and $n$ is the total number of nodes in the network.
Proof.
We know that each one of the $k$ source packets is stopped and discarded if and only if it has been forwarded $C_1 n \log n$ times, for some constant $C_1$. Then the total number of transmissions of the LTCDS-I algorithm for all $k$ packets is a direct consequence and it is given by (14).∎
4.2 Without Any Global Information: LTCDS-II
In many scenarios, especially when a change in network topology occurs because of, for example, node mobility or node failures, the exact values of $n$ and $k$ may not be available to all nodes. Therefore, designing a fully distributed storage algorithm that does not require any global information is very important and useful. In this subsection, we present such an algorithm based on LT codes, called LTCDS-II. The idea behind this algorithm is to utilize some features of simple random walks to infer individual estimates of $n$ and $k$ at each node.
Definition 9.
(Inter-Visit Time) For a random walk on a graph, the inter-visit time of node $u$, $T_{visit}(u)$, is the amount of time between any two consecutive visits of the random walk to node $u$. This inter-visit time is also called return time.
For a simple random walk on random geometric graphs, the following lemma provides results on the expected inter-visit time of any node. The proof is straightforward by following the standard result on the stationary distribution of a simple random walk on graphs and the mean return time for a Markov chain [1, 17, 14]. For completeness, we provide the proof in Appendix 6.1.
Lemma 10.
For a node $u$ with node degree $d_n(u)$ in a random geometric graph, the mean inter-visit time is given by
(15) $E[T_{visit}(u)] = \frac{\mu n}{d_n(u)}$,
where $\mu$ is the mean degree of the graph given by Equation (2).
From Lemma 10, we can see that if each node $u$ can measure the expected inter-visit time $E[T_{visit}(u)]$, then the total number of nodes $n$ can be estimated by
(16) $\hat{n}'(u) = \frac{d_n(u) \, E[T_{visit}(u)]}{\mu}$.
However, the mean degree $\mu$ is global information and may be hard to obtain. Thus, we make a further approximation and let the estimate of $n$ by the node $u$ be
(17) $\hat{n}(u) = E[T_{visit}(u)]$.
Hence, every node $u$ computes its own estimate of $n$. In our distributed storage algorithms, each source packet follows a simple random walk. Since there are $k$ sources, we have $k$ individual simple random walks in the network. For a particular random walk, the behavior of the return time is characterized by Lemma 10. On the other hand, Lemma 12 below provides results on the inter-visit time among all $k$ random walks, which is called inter-packet time for our algorithm, defined as follows:
Definition 11.
(Inter-Packet Time) For $k$ random walks on a graph, the inter-packet time of node $u$, $T_{packet}(u)$, is the amount of time between any two consecutive visits of those $k$ random walks to node $u$.
For the mean value of inter-packet time, we have the following lemma, for which the proof is given in Appendix 6.2.
Lemma 12.
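For a node $u$ with node degree $d_n(u)$ in a random geometric graph with $k$ simple random walks, the mean inter-packet time is given by
(18) $E[T_{packet}(u)] = \frac{E[T_{visit}(u)]}{k} = \frac{\mu n}{k \, d_n(u)}$,
where $\mu$ is the mean degree of the graph given by (2).
From Lemma 10 and Lemma 12, it is easy to see that for any node $u$, an estimation of $k$ can be obtained by
(19) $\hat{k}(u) = \frac{E[T_{visit}(u)]}{E[T_{packet}(u)]}$.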
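The two estimators can be tried out with a small simulation: several synchronous random walks run on a graph, one node records the visit times of each walk, and the ratio of the two empirical means gives the estimate of the number of sources. This is an illustrative sketch, not the inference phase of the algorithm; the function name, the synchronous stepping, and the use of plain sample averages are assumptions.

```python
import random

def estimate_n_k(nbrs, starts, node, steps, seed=0):
    """Return (n_hat, k_hat) as seen by `node` after `steps` rounds.

    nbrs maps each node to a list of neighbors; starts holds the starting
    node of each walk. Assumes every walk visits `node` at least twice.
    """
    rng = random.Random(seed)
    pos = list(starts)
    visits = [[] for _ in starts]        # visit times of each walk at `node`
    for t in range(1, steps + 1):
        for j in range(len(pos)):
            pos[j] = rng.choice(nbrs[pos[j]])
            if pos[j] == node:
                visits[j].append(t)
    # Mean inter-visit time: gaps between consecutive visits of the same walk.
    gaps = [b - a for v in visits for a, b in zip(v, v[1:])]
    t_visit = sum(gaps) / len(gaps)
    # Mean inter-packet time: gaps between consecutive visits of any walk.
    merged = sorted(t for v in visits for t in v)
    t_packet = (merged[-1] - merged[0]) / (len(merged) - 1)
    return t_visit, t_visit / t_packet
```

On a regular graph, where every node degree equals the mean degree, the first returned value targets the true number of nodes; on irregular graphs it is off by the ratio of the mean degree to the node degree, as the approximation behind (17) implies.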
After obtaining estimates for both $n$ and $k$, we can employ techniques similar to those used in LTCDS-I to do LT coding and storage. The detailed descriptions of the initialization, inference, encoding, and storage phases of the LTCDS-II algorithm are given below:

Initialization Phase:

Each source node $s_i$, $i = 1, \dots, k$, generates a header for its source packet $x_{s_i}$ and puts its ID and a counter $c(x_{s_i})$ with initial value zero into the packet header.

Each source node $s_i$ sends out its own source packet $x_{s_i}$ to one of its neighbors $u$, chosen uniformly at random among all its neighbors $\mathcal{N}(s_i)$.

The node $u$ puts $x_{s_i}$ into its forward queue and sets the counter of $x_{s_i}$ as
(20) $c(x_{s_i}) = 1$.


Inference Phase:

For each node $u$, suppose $x_{s_1}$ is the first source packet that visits $u$, and denote by $t_{s_1}^{(j)}$ the time when $x_{s_1}$ has its $j$-th visit to the node $u$. Meanwhile, each node $u$ also maintains a record of the visiting times of each other source packet $x_{s_i}$ that has visited it. Let $t_{s_i}^{(j)}$ be the time when source packet $x_{s_i}$ has its $j$-th visit to the node $u$. After $x_{s_1}$ has visited the node $u$ $C_2$ times, where $C_2$ is a system parameter which is a positive constant, the node $u$ stops this monitoring and recording procedure. Denote by $k(u)$ the number of source packets that have visited $u$ at least once up to that time.

For each node , let be the number of visits of source packet to the node and let
(21) (22) Then, the average intervisit time for node is given by
(23) Let and , then the interpacket time is given by
(24) Then the node can estimate the total number of nodes in the network and the total number of sources as
(25) and
(26) 
In this phase, the counter of each source packet is incremented by one after each transmission.


Encoding Phase:
When a node $u$ obtains the estimates $\hat{n}(u)$ and $\hat{k}(u)$, it begins the encoding phase, which is the same as the one in the LTCDS-I algorithm except that the code degree $d_c(u)$ is drawn from distribution $\Omega_{is}(d)$ (or $\Omega_{rs}(d)$) with $k$ replaced by $\hat{k}(u)$, and a source packet $x$ is discarded if $c(x) \ge C_3 \hat{n}(u) \log \hat{n}(u)$, where $C_3$ is a system parameter which is a positive constant.

Storage Phase:
When a node $u$ has made its decisions for $\hat{k}(u)$ source packets, it finishes its encoding process and $y_u$ becomes the storage packet of $u$.
The total number of transmissions (the total number of steps of random walks) in the LTCDS-II algorithm has the same order as in LTCDS-I.
Theorem 13.
Denote by $T^{(II)}_{LTCDS}$ the total number of transmissions of the LTCDS-II algorithm, then we have
(27) $T^{(II)}_{LTCDS} = \Theta(k n \log n)$,
where $k$ is the total number of sources, and $n$ is the total number of nodes in the network.
Proof.
In the inference phase of the LTCDS-II algorithm, the total number of transmissions is upper bounded for some constants . That is because each node needs to receive the first visit source packet for times, and by Lemma 10, the mean inter-visit time is .
In the encoding phase, as in the LTCDS-I algorithm, in order to guarantee that each source packet visits all the nodes at least once, the number of steps of the simple random walk is $\Theta(n \log n)$. In other words, each source packet is stopped and discarded if and only if the counter reaches the threshold $C_3 n \log n$ for some system parameter $C_3$. Therefore, we have (27). ∎
4.3 Updating Data
Now, we turn our attention to data updating after all storage nodes have saved their values $y_u$, when a sensor node, say $s_i$, wants to update its value at the appropriate set of storage nodes in the network. The following updating algorithm applies to both LTCDS-I and LTCDS-II. For simplicity, we illustrate the idea with LTCDS-I.
Assume the sensor node prepared a packet with its ID, old data $x_{s_i}$, and new data $x'_{s_i}$, along with a time-to-live parameter initialized to zero. We will also use a simple random walk for data update.
(28) 
If we assume that the storage nodes keep the IDs of the accepted packets, then the problem becomes simple. We just run a random walk and check the ID of the incoming packet. Assume the node $u$ keeps track of all IDs of its accepted packets. Then $u$ accepts the updated message if the ID of the incoming packet is already included in $u$'s ID list. Otherwise, $u$ forwards the packet, incrementing the time-to-live counter. If this counter reaches the threshold value, then the packet will be discarded.
The following steps describe the update scenario:

Preparation Phase:
The node prepares its new packet with the new and old data along with its ID and counter. Also, add an update counter initialized at for the first updated packet. So, we assume that the following steps happen when is set to .
(29) chooses at random a neighbor node , and sends its .

Encoding Phase:
The node $u$ checks whether the packet is an update or a first-time packet. If it is a first-time packet, $u$ will accept, forward, or discard it as shown in LTCDS-I Algorithm 1. If it is an update packet, then the node $u$ will check whether the packet's ID is already included in its accepted list. If yes, then it will update its value as follows.
(30) $y_u^{+} = y_u^{-} \oplus x_{s_i} \oplus x'_{s_i}$.
If no, it will add this update packet into its forward queue, incrementing the counter by one.
(31)
The packet will be discarded once its counter reaches a threshold set by a system parameter. In this case, we need the system parameter to be large enough, so that all old data $x_{s_i}$ will be updated to the new data $x'_{s_i}$.

Storage Phase:
Once all nodes are done updating their values $y_u$, one can run the decoding phase to retrieve the original and updated information.
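The update step at a storage node can be sketched as below. The rule used here, XORing the old data out and the new data in, is an assumption consistent with the XOR storage rule of the encoding phase, and the function and argument names are hypothetical.

```python
def apply_update(stored, accepted_ids, source_id, old, new):
    """Apply an update packet at one storage node.

    stored is the node's current storage packet (an int), accepted_ids the
    set of source IDs it has accepted. Returns the new storage packet, or
    None when the node never accepted source_id and should only forward
    the update packet.
    """
    if source_id not in accepted_ids:
        return None
    # XOR removes the old contribution and adds the new one in a single step.
    return stored ^ old ^ new
```

For example, a node storing `a ^ b` that receives an update replacing `a` by `a2` ends up storing `a2 ^ b`, without ever needing `b` on its own.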
Now, since we run only one simple random walk for each update, if is the number of nodes updating their values, then we have the following result.
Lemma 14.
The total number of transmissions needed for the update process is bounded by .
5 Performance Evaluation
In this section, we study the performance of the proposed LTCDS-I and LTCDS-II algorithms for distributed storage in wireless sensor networks through simulation. The main performance metric we investigate is the successful decoding probability versus the decoding ratio.
Definition 15.
(Decoding Ratio) Decoding ratio $\eta$ is the ratio between the number of queried nodes $h$ and the number of sources $k$, i.e.,
(32) $\eta = \frac{h}{k}$.
Definition 16.
(Successful Decoding Probability) Successful decoding probability $P_s$ is the probability that the $k$ source packets are all recovered from the $h$ querying nodes.
In our simulation, $P_s$ is evaluated as follows. Suppose the network has $n$ nodes and $k$ sources, and we query $h$ nodes. There are $\binom{n}{h}$ ways to choose such $h$ nodes, and we pick one tenth of these choices uniformly at random:
(33) $M = \frac{1}{10} \binom{n}{h} = \frac{n!}{10 \, h! \, (n-h)!}$.
Let $M_s$ be the size of the subset of these $M$ choices of $h$ query nodes from which the $k$ source packets can be recovered. Then, we evaluate the successful decoding probability as
(34) $P_s = \frac{M_s}{M}$.
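The evaluation procedure can be sketched as a Monte Carlo estimate: draw random sets of query nodes and count how often the sources are decodable. The sketch is an illustration under assumptions: a fixed number of random trials replaces the one-tenth sampling of all subsets, decodability is checked by peeling on index sets only, and the names are hypothetical.

```python
import random

def decodable(k, index_sets):
    """Return True if peeling on the given index sets releases all k sources."""
    pending = [set(s) for s in index_sets]
    known = set()
    progress = True
    while progress and len(known) < k:
        progress = False
        for s in pending:
            s -= known              # drop sources that are already recovered
            if len(s) == 1:         # a degree-one packet reveals a new source
                known |= s
                progress = True
    return len(known) == k

def success_probability(accepted, k, h, trials, seed=0):
    """Fraction of random h-node queries from which all k sources decode.

    accepted maps each storage node to the set of source indices that are
    XORed into its storage packet.
    """
    rng = random.Random(seed)
    nodes = list(accepted)
    hits = sum(
        decodable(k, [accepted[u] for u in rng.sample(nodes, h)])
        for _ in range(trials)
    )
    return hits / trials
```

With a decoding ratio $\eta$, the number of queried nodes passed as `h` would be the integer nearest to $\eta k$.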
Figure 3 shows the decoding performance of the LTCDS-I algorithm with the Ideal Soliton distribution for a small number of nodes and sources. The network is deployed in , and the system parameter is set as . From the simulation results we can see that when the decoding ratio is above 2, the successful decoding probability is about . Another observation is that when the total number of nodes increases but the ratio between $k$ and $n$ and the decoding ratio are kept constant, the successful decoding probability increases when and decreases when . This is also confirmed by the results shown in Figure 4. In Figure 4, the network has constant density as and the system parameter .
In Figure 5, we fix the decoding ratio as 1.4 and 1.7, respectively, and fix the ratio between the number of sources and the number of nodes as , i.e., , and change the number of nodes from 500 to 5000. From the results, it can be seen that as $n$ grows, the successful decoding probability increases until it reaches a plateau, which is the successful decoding probability of real LT codes. This confirms that the LTCDS-I algorithm has the same asymptotic performance as LT codes.
To investigate how the system parameter affects the decoding performance of the LTCDS-I algorithm, we fix the decoding ratio and change . The simulation results are shown in Figure 6. For the scenario of 1000 nodes and 100 sources, is set as 1.6, and for the scenario of 500 nodes and 50 sources, is set as 1.8. The code degree distribution is also the Ideal Soliton distribution, and the network is deployed in . It can be seen that when , remains almost constant, which indicates that after steps, almost all source packets have visited each node at least once.
Figure 7 compares the decoding performance of LTCDS-II and LTCDS-I with the Ideal Soliton distribution for a small number of nodes and sources. As in Figure 3, the network is deployed in , and the system parameter is set as . To guarantee that each node obtains accurate estimates of $n$ and $k$, we set . It can be seen that the decoding performance of the LTCDS-II algorithm is slightly worse than that of the LTCDS-I algorithm when the decoding ratio is small, and almost the same when is large. Figure 8 compares the decoding performance of LTCDS-II and LTCDS-I with the Ideal Soliton distribution for a medium number of nodes and sources, where the network has constant density as and the system parameter . We observe a different phenomenon. The decoding performance of the LTCDS-II algorithm is slightly better than that of the LTCDS-I algorithm when the decoding ratio is small, and almost the same when is large. That is because for the simulation in Figure 8, we set which is larger than set for the simulation in Figure 6. The larger value of guarantees that each node has the chance to accept each source packet, which results in a more uniform distribution.
Figures 9 and 10 show histograms of the estimates of the number of nodes and the number of sources at each node for three scenarios: Figure 9 shows the results for 200 nodes and 20 sources, and Figure 10 shows the results for 1000 nodes and 100 sources. In the first two scenarios, we set . The results show that the estimates of the number of sources are more accurate and more concentrated than the estimates of the number of nodes. This is because the estimate of the number of sources depends only on the ratio between the expected intervisit time and the expected interpacket time, which is independent of the mean degree and the node degree. The estimate of the number of nodes, on the other hand, depends on both the mean degree and the node degree. In the LTCDS-II algorithm, however, each node approximates the mean degree by its own node degree, which causes the deviation in the estimates of the number of nodes.
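The ratio-based estimate of the number of sources can be illustrated with a small sketch. It assumes, hypothetically, that a node keeps a time-ordered log of `(time, packet_id)` arrivals and can tell source packets apart by an identifier. The function name and the log format are illustrative only, and this is not necessarily the exact estimator used in the algorithm.

```python
def estimate_num_sources(visits):
    """Estimate the number of sources from one node's arrival log.

    visits: list of (time, packet_id) tuples sorted by time (hypothetical format).
    Returns mean intervisit time divided by mean interpacket time.
    """
    if len(visits) < 2:
        raise ValueError("need at least two recorded arrivals")

    # Interpacket time: gap between consecutive arrivals of any packet.
    times = [t for t, _ in visits]
    packet_gaps = [b - a for a, b in zip(times, times[1:])]
    mean_packet = sum(packet_gaps) / len(packet_gaps)

    # Intervisit time: gap between consecutive arrivals of the same packet.
    last_seen = {}
    visit_gaps = []
    for t, pid in visits:
        if pid in last_seen:
            visit_gaps.append(t - last_seen[pid])
        last_seen[pid] = t

    if not visit_gaps or mean_packet <= 0:
        raise ValueError("log too short to form both averages")

    mean_visit = sum(visit_gaps) / len(visit_gaps)
    return mean_visit / mean_packet


# Synthetic log: 4 packets, each returning every 20 time units, staggered by 5.
log = [(5 * i, i % 4) for i in range(40)]
estimate = estimate_num_sources(log)
```

In the synthetic log every interpacket gap is 5 and every intervisit gap is 20, so the ratio recovers the four packets exactly. Neither average involves node degrees, which matches the observation above that this estimate is the more concentrated of the two.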
To investigate how the system parameter affects the decoding performance of the LTCDS-II algorithm, we fix the decoding ratio and , and vary . The simulation results are shown in Figure 11. They show that when is chosen to be small, the performance of the LTCDS-II algorithm is very poor. This is due to inaccurate estimates of the number of nodes and the number of sources at each node. When is large, for example, when , the performance is almost the same.
6 Conclusion
In this paper, we studied a model for large-scale wireless sensor networks in which the network nodes have low CPU power and limited storage. We proposed two new decentralized algorithms that use Fountain codes and random walks to distribute information sensed by source nodes to storage nodes. These algorithms are simpler, more robust, and less constrained than previous solutions, which require knowledge of the network topology, the maximum degree of a node, or the total number of nodes and the number of sources [4, 6, 9, 10, 11]. We computed the encoding and decoding complexity of these algorithms and simulated their performance with small and large numbers of sources and nodes. We showed that a node can successfully estimate the number of sources and the total number of nodes provided it can compute the intervisit time and the interpacket time.
Our future work will include Raptor-code-based distributed networked storage algorithms for sensor networks. We also plan to provide theoretical results and proofs for the results shown in this paper in a longer version where limited space is not an issue. Our algorithm for estimating the number of nodes and the number of sources is promising, and we plan to investigate other network models in which it can be used to advantage.
Acknowledgments
The authors would like to thank the reviewers for their comments. They would also like to express their gratitude to all Bell Labs & Alcatel-Lucent staff members for their hospitality and kindness.
7 Appendix
7.1 Proof of Lemma 10
7.2 Proof of Lemma 12
Proof.
For a given node and simple random walks, each simple random walk has expected intervisit time . We now view this process from another perspective: we assume there are nodes uniformly distributed in the network and an agent from node follows a simple random walk. Then the expected intervisit time for this agent to visit any particular is the same as . However, the expected intervisit time for any two nodes and is which gives the expected interpacket time.∎
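The relation behind this argument can be written compactly. The notation is an assumption, since the symbols are not legible in this version of the text: $k$ denotes the number of random walks, $T_{\mathrm{visit}}$ the expected intervisit time of a single walk at the given node, and $T_{\mathrm{packet}}$ the expected interpacket time at that node. If each of the $k$ walks returns to the node at long-run rate $1/T_{\mathrm{visit}}$, the combined arrival rate is $k/T_{\mathrm{visit}}$, which suggests:

```latex
\[
  T_{\mathrm{packet}} \;=\; \frac{T_{\mathrm{visit}}}{k},
  \qquad\text{equivalently}\qquad
  k \;=\; \frac{T_{\mathrm{visit}}}{T_{\mathrm{packet}}}.
\]
```

The second form is the ratio of the expected intervisit time to the expected interpacket time referred to in the discussion of the simulation results.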
References
 [1] D. Aldous and J. Fill. Reversible Markov Chains and Random Walks on Graphs. Preprint, available at http://statwww.berkeley.edu/users/aldous/RWG/book.html, 2002.
 [2] S. A. Aly, Z. Kong, and E. Soljanin. Fountain codes based distributed storage algorithms. U.S. patent, Submitted, October, 2007.
 [3] C. Avin and G. Ercal. On the cover time of random geometric graphs. In Proc. 32nd International Colloquium of Automata, Languages and Programming, ICALP’05, Lisboa, Portugal, July, 2005.
 [4] A. G. Dimakis, V. Prabhakaran, and K. Ramchandran. Decentralized erasure codes for distributed networked storage. IEEE/ACM Transactions on Networking (TON), 14(SI):2809–2816, June 2006.
 [5] A. G. Dimakis, V. Prabhakaran, and K. Ramchandran. Ubiquitous access to distributed data in large-scale sensor networks through decentralized erasure codes. In Proc. of 4th IEEE Symposium on Information Processing in Sensor Networks (IPSN ’05), Los Angeles, CA, USA, April, 2005.
 [6] A. G. Dimakis, V. Prabhakaran, and K. Ramchandran. Distributed fountain codes for networked storage. Acoustics, Speech and Signal Processing, ICASSP 2006, May 2006.
 [7] E. N. Gilbert. Random plane networks. J. Soc. Indust. Appl. Math., 9:533–543, 1961.
 [8] A. Kamra, J. Feldman, V. Misra, and D. Rubenstein. Data persistence in sensor networks: Towards optimal encoding for data recovery in partial network failures. In Workshop on Mathematical Performance Modeling and Analysis, June 2005.
 [9] A. Kamra, V. Misra, J. Feldman, and D. Rubenstein. Growth codes: Maximizing sensor network data persistence. In Proc. of the 2006 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM ’06), pages 255–266, Pisa, Italy, 2006.
 [10] Y. Lin, B. Li, and B. Liang. Differentiated data persistence with priority random linear code. In Proc. of 27th International Conference on Distributed Computing Systems (ICDCS’07), Toronto, Canada, June, 2007.
 [11] Y. Lin, B. Liang, and B. Li. Data persistence in large-scale sensor networks with decentralized fountain codes. In Proc. of the 26th IEEE INFOCOM (INFOCOM ’07), Anchorage, Alaska, May 6–12, 2007.
 [12] M. Luby. LT codes. In Proc. 43rd Symposium on Foundations of Computer Science (FOCS 2002), 16–19 November 2002, Vancouver, BC, Canada, 2002.
 [13] D. S. Lun, N. Ratnakar, R. Koetter, M. Medard, E. Ahmed, and H. Lee. Achieving minimum-cost multicast: A decentralized approach based on network coding. In Proc. of the 24th IEEE INFOCOM, volume 3, pages 1607–1617, March 2005.
 [14] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, 1995.
 [15] M. Penrose. Random Geometric Graphs. Oxford University Press, New York, 2003.
 [16] M. Pitkanen, R. Moussa, M. Swany, and T. Niemi. Erasure codes for increasing the availability of grid data storage. In Proc. of the Advanced International Conference on Telecommunications and International Conference on Internet and Web Applications and Services (AICT/ICIW), 2006.
 [17] S. Ross. Stochastic Processes. Wiley, New York, second edition, 1995.
 [18] I. Stojmenovic. Handbook of Sensor Networks: Algorithms and Architectures. Wiley Series on Parallel and Distributed Computing, 2005.