Low Complexity Differentiating Adaptive Erasure Codes in Multimedia Wireless Broadcast
Abstract
Based on the erasure channel FEC model as defined in multimedia wireless broadcast standards, we illustrate how doping mechanisms included in the design of erasure coding and decoding may improve the scalability of the packet throughput, decrease overall latency, and potentially differentiate among classes of multimedia subscribers regardless of their signal quality. We describe decoding mechanisms that allow for linear complexity and give complexity bounds when feedback is available. We show that elaborate coding schemes which include precoding stages are inferior to simple Ideal Soliton based rateless codes combined with the proposed two-phase decoder. The simplicity of this scheme and the availability of tight bounds on latency given preallocated radio resources make it a practical and efficient design solution.
I Introduction
Multimedia Broadcast/Multicast Services (MBMS) [1] is a point-to-multipoint interface specification for existing and upcoming 3GPP cellular networks, designed to provide efficient delivery of broadcast and multicast multimedia content, both within a cell and within the core network. It has been widely recognized that the appropriate application layer forward error correction (AL-FEC) schemes for MBMS are adaptive coding techniques based on punctured [2] or rateless (Fountain) codes [3, 4], as their redundancy can be flexibly adapted to different channel/network conditions. With the proliferation of mobile video traffic, the impact of Fountain codes will grow, and so will the importance of decreasing both their encoding/decoding complexity and their overhead, in order to meet the strict latency constraints of streaming applications.
In this paper we explore the decoding efficiency in terms of the communication cost between the server and the client of the multimedia wireless broadcast, incurred to completely recover all data in linear decoding time. We also address another important design challenge of wireless broadcast streaming, namely, catering to priority subscribers. Certain 3G network subscribers might not claim special bandwidth rights with their mobile providers, but they may be subscribed to a multimedia streaming service with a guaranteed Quality of Service (QoS). Hence, it is natural that these service privileges be accommodated within the application layer of the network, using the application layer FEC. The strength of Fountain codes that matters most in multimedia broadcast, and makes them scalable to a large number of clients (such as in video broadcast of popular sport events, parades, presidential debates and inaugurations), is the statistical equality of encoded symbols, not their differentiation features. We introduce a two-phase decoder that allows for differentiation while preserving the broadcast-friendly features of Fountain codes.
This paper illustrates how the proposed Fountain-based adaptive FEC approach exhibits not only linear decoding time, but also a low reconstruction delay that is controlled by the client, within the framework of his QoS privileges. This mechanism leverages the peeling decoder and streamlines several existing mechanisms, including inactivation [5] and doping [6], as well as minimal feedback. The user may opt for a peeling decoder (i.e., Belief Propagation, BP), which is simple but requires a larger overhead, since we have to make sure that the ripple (the set of one-term equations) never becomes empty, or for a decoder based on Gaussian elimination (GE), which is complex but requires a smaller overhead. The inactivation decoder combines BP and GE, and trades overhead for complexity. Finally, doping guarantees small overhead and linear decoding but requires minimal feedback.
One of the contributions of this work is the observation that our model of the doped peeling decoder [6] can be successfully applied to the peeling decoder with inactivations. Using this model, the performance bounds and their tradeoffs (decoding complexity and doping communication cost) for all decoding options are clearly defined, and the user can control the tradeoffs given the available communication and computation resources. Most importantly, we show that complex solutions with precodes are not necessary, as small-overhead linear-time decoding can be achieved by doping or inactivating a simple Ideal Soliton based code in the second phase of decoding, which is also used for differentiation. We next briefly present the usage model of Fountain codes in MBMS, which motivates our approach, and then introduce the proposed decoding mechanisms and their analysis. Section V compares the cost-based performance of the Ideal Soliton code and a Fountain code whose distribution is defined by the standard [3], using both analytical estimates and simulation results. In Section VI we consider an example of the proposed AL-FEC use case, which shows that priority users can be satisfied in a scalable fashion.
II Rateless Erasure Codes in Multimedia Broadcast
II-A 3GPP Chunked Content Distribution: Delivery and Repair
The utilization of Fountain codes for the application layer FEC in wireless multimedia broadcast has followed the framework proposed by the 3GPP MBMS, where the content is partitioned into source blocks (chunks usually corresponding to video frames), and each source block is further divided into $k$ source symbols of bit length $b$. For simplicity, we will assume that each encoded symbol fits into one packet, i.e., the encoding procedure XORs a subset of the source symbols and encloses the resulting array of bits, termed the encoded symbol, into one packet. If several encoded symbols were put into one packet, then one packet erased by the channel would affect many symbols. Although packaging optimality is a relevant problem, we abstract it here through the erasure parameter associated with the application-level transmission channel.
The symbols XORed in a packet represent one binary equation, where the terms (source symbol indices) are signaled in the packet header. The result of the XORing, delivered as the packet load, is the value of the equation. The number and the identities (indices) of the equation terms are random, although they follow a given probability distribution. The 3GPP MBMS standards propose a time-limited delivery phase in which the server first broadcasts $k$ packets, each carrying one source symbol of the block, i.e., a distinct one-term (singleton) equation, and then a number of parity-check symbols (higher-degree equations). The MBMS framework is seamlessly incorporated into the existing 3GPP architecture, with the exception of some services designed for evolved 3GPP networks only (i.e., 3G Long Term Evolution, or 3G LTE), such as broadcasting in MBMS single frequency networks (MBSFN). We do not consider such services, following an assumption that the entire system will evolve further; hence, our goal is to use the basic MBMS framework only as an abstract platform to demonstrate the usefulness of the proposed approach. For a detailed analysis of the standard-based MBMS AL-FEC, readers are referred to [7].
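As an illustration of the packet structure just described, the following minimal Python sketch forms one encoded symbol (one binary equation). It assumes that each source symbol is modeled as a Python integer whose bits are the symbol bits, and that the degree is drawn from an arbitrary degree pmf supplied by the caller; the function name `encode_symbol` is hypothetical and not part of any MBMS specification.

```python
import random
from functools import reduce

def encode_symbol(source, pmf, rng):
    """Form one encoded symbol: (sorted term indices, XOR of the selected source symbols).

    source: list of k source symbols, each an int holding b bits.
    pmf:    list of length k + 1, pmf[d] = probability of degree d (pmf[0] = 0).
    rng:    a random.Random instance, so that runs are reproducible.
    """
    k = len(source)
    degree = rng.choices(range(k + 1), weights=pmf)[0]   # sample the equation degree
    terms = rng.sample(range(k), degree)                 # distinct, uniformly chosen indices
    value = reduce(lambda acc, i: acc ^ source[i], terms, 0)
    return sorted(terms), value                          # header (indices) and payload

rng = random.Random(1)
source = [0b1010, 0b0110, 0b1111, 0b0001]
header, payload = encode_symbol(source, [0.0, 0.25, 0.5, 0.25, 0.0], rng)
```

The header plays the role of the signaled equation terms and the payload is the value of the equation.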
The basic MBMS includes dedicated channel resources (MBMS radio bearers) used to broadcast multimedia content to multiple wireless receivers. Figure 1 illustrates the basic MBMS architecture, in which BM-SC denotes the Broadcast/Multicast Service Center, a logical entity that controls seamless broadcast from the content servers by coordinating between the 3GPP radio resource allocation controllers and the streaming data users. Apart from the MBMS radio bearers, the radio resources include unicast (interactive) channels from the so-called repair servers to multimedia wireless users.
The scenario in which, during the delivery phase, each wireless broadcast client collects a set of encoded symbols resulting in a solvable system of equations is not very likely. Hence, once the delivery (broadcast) session expires, a unicast-based file repair mechanism is available in the post-delivery phase. Despite the expected uniform distribution of repair sessions, the fact that the server potentially serves many requests may cause a communication bottleneck. For that reason, we believe that the repair mechanism must be accounted for in the design of the coding scheme, assuming a certain deterministic order in serving repair requests, to allow for good QoS control and to mitigate the fact that high-priority users may be handicapped by low SNR. Let us denote by $\tau$ the feedback delay, which quantifies the overhead resources used for communication to and from the repair server. Specifically, to communicate with the repair server, each user has to establish a context switch facilitated by the BM-SC in both the physical layer and the upper protocol layers, which includes allocating a different radio bearer (for unicast) and coordination among many network management instances. In addition, $\tau$ includes the service waiting time at the repair server. Consequently, the priority-based serving order will cause the expected value of $\tau$ to vary according to the privileges of a specific user. As for the context switching delay, we here assume that it is a fairly deterministic but significant part of $\tau$. To assess the implications of the repair system latency, we next describe the tradeoffs in the communication overhead.
II-B Successful Decoding: Overdesigning Communication Overhead vs. Allowing Repairs
For high-quality multimedia delivery, the decoding failure probability must be constrained to zero. Hence, we seek to quantify the cost, in terms of the communication overhead, of an application-level FEC that does not allow for any undecoded symbols. A related performance measure, prominent in the analysis of practical Fountain codes [8], is the overhead-failure curve, describing the failure probability $P_f(\epsilon)$ as a function of the overhead $\epsilon$. Here, $\epsilon = n/k - 1$, where $n$ is the number of collected encoded symbols. Typically, $P_f(\epsilon)$ is a quickly decreasing function. In the case of random Fountain codes, where a random number of uniformly selected source symbols is combined in each encoded symbol, the failure probability is easy to calculate as the probability that this random set of equations is not of full rank, and can be bounded by $2^{-(n-k)}$ [9]. However, random Fountain codes do not satisfy the multimedia latency requirements, as their decoding complexity is high. Linear decoding time may be achieved by optimizing the probability distribution of the number of terms in the equations. This number of terms is often called the output symbol degree. Luby Transform (LT) codes [10] have become popular thanks to such an optimized distribution, the Robust Soliton (RS), which promises linear decoding time. The RS is a design that grew out of the Ideal Soliton (IS) distribution (2), the distribution derived analytically as ideally linear (in terms of average decoding time). To compensate for the variance of the empirical distribution of sampled symbol degrees, which may cause the linear decoder to stall, the RS design moves some probability mass from the higher degrees to degree one. As a result, the empirical decoding time of LT codes is close to linear. However, to achieve an acceptable failure probability, the LT code design requires a sizable overhead.
This motivated the design of another popular rateless code, dubbed Raptor [11], which combines a precode stage with LT encoding to generate output symbols that are decodable with constant overhead. This more complex design is difficult to analyze rigorously; instead, heuristics are used to optimize its performance [8].
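To make the comparison between the IS and RS degree distributions concrete, the sketch below computes both pmfs. The RS construction follows Luby's well-known recipe (IS plus an extra term on low degrees and a spike, renormalized); the parameters `c` and `delta` are Luby's free design parameters, and the default values used here are illustrative assumptions, not the values of any standard.

```python
import math

def ideal_soliton(k):
    """Ideal Soliton pmf: p[1] = 1/k, p[d] = 1/(d(d-1)) for d = 2..k; p[0] = 0."""
    p = [0.0] * (k + 1)
    p[1] = 1.0 / k
    for d in range(2, k + 1):
        p[d] = 1.0 / (d * (d - 1))
    return p

def robust_soliton(k, c=0.1, delta=0.5):
    """Luby's Robust Soliton: IS plus an extra term tau, renormalized.

    c and delta are free design parameters; the defaults are illustrative only
    (they must keep s / delta > 1 so that the spike weight is positive).
    """
    s = c * math.log(k / delta) * math.sqrt(k)
    spike = max(1, min(k, int(k / s)))         # position of the extra spike, about k/s
    tau = [0.0] * (k + 1)
    for d in range(1, spike):
        tau[d] = s / (d * k)                   # extra mass on the low degrees
    tau[spike] = s * math.log(s / delta) / k   # spike that helps cover all symbols
    rho = ideal_soliton(k)
    beta = sum(rho) + sum(tau)                 # normalization constant
    return [(r + t) / beta for r, t in zip(rho, tau)]

def mean_degree(p):
    return sum(d * pd for d, pd in enumerate(p))

k = 1000
is_pmf, rs_pmf = ideal_soliton(k), robust_soliton(k)
```

With these parameters, the RS places noticeably more mass on degree one than the IS, which is the compensation for the variance of the sampled degrees mentioned above; the IS mean degree equals $1/k + H_{k-1}$, where $H_j$ is the $j$th harmonic number.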
The overhead may be decreased if we allow for a repair procedure that identifies and fetches the missing symbols, given a strictly limited overhead in the upfront collected output symbols. The symbols needed to reconstruct the entire source block from the collected equations can be identified through an attempted decoding procedure. The decoding can be iterative, i.e., a message-passing erasure (peeling) decoder, or it can rely on classical algorithms for solving linear systems of equations, such as Gaussian elimination (GE). A system of equations solvable through GE may not be solvable by iterative decoding. Even though the GE-based decoder is optimal, its complexity may be prohibitive. Rateless codes should be designed so that all input symbols can be recovered with high probability using an iterative decoder on a set of equations (collected coded symbols) slightly larger than $k$. We here consider only the iterative decoder, as multimedia latency constraints dictate linear decoding time. Given a peeling decoder (PD), the repair symbols can be determined in a sequential manner [6]: if the decoder stalls, an assisting procedure identifies a symbol capable of repairing (doping) the decoder and immediately requests it from the server.
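The peeling decoder discussed above can be sketched as follows, assuming symbols are Python integers and each collected equation is given as an (indices, value) pair. When the ripple empties before all $k$ symbols are recovered, the function simply returns the undecoded set, which is exactly the point at which doping would be requested. The function name `peel` is hypothetical.

```python
def peel(k, equations):
    """Peeling (BP) decoder over GF(2) with int-valued symbols.

    equations: list of (indices, value) pairs, value = XOR of the indexed symbols.
    Returns (decoded, undecoded): a dict index -> value and the set of indices
    still unknown when the ripple empties (where doping would be requested).
    """
    eqs = [[set(idx), val] for idx, val in equations]
    by_symbol = {i: [] for i in range(k)}          # symbol -> equations containing it
    for e in eqs:
        for i in e[0]:
            by_symbol[i].append(e)
    ripple = [e for e in eqs if len(e[0]) == 1]    # one-term equations
    decoded = {}
    while ripple:
        e = ripple.pop()
        if len(e[0]) != 1:
            continue                               # emptied meanwhile: redundant equation
        (i,) = e[0]
        value = e[1]
        decoded[i] = value
        for f in by_symbol[i]:                     # remove symbol i from its neighbors
            if i in f[0]:
                f[0].discard(i)
                f[1] ^= value
                if len(f[0]) == 1:
                    ripple.append(f)               # newly released output symbol
    return decoded, set(range(k)) - set(decoded)

decoded, left = peel(3, [([0], 3), ([0, 1], 3 ^ 5), ([1, 2], 5 ^ 6)])
```

A practical decoder would process the ripple in the same way; only the data structures (and the handling of a stall) differ.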
II-C Communication Overhead of the Repair Process
We here specify the repair communication overhead in terms of bit-delay equivalents. To uniquely specify the identified source symbol to the repair server, we need $\lceil \log_2 k \rceil$ bits, where $k$ is the number of source symbols in the block. Adding the $b$ bits that the server uses to transfer one vector symbol from the field $\mathrm{GF}(2^b)$ (of cardinality $2^b$), this makes $\lceil \log_2 k \rceil + b$ bits of per-symbol repair cost. As a source block most frequently corresponds to a video frame, we assume that $b \gg \log_2 k$, and note that $\lceil \log_2 k \rceil + b \approx b$ bits. The sequential repair (i.e., every time the peeling decoder stalls) incurs a total per-symbol cost that additionally includes the bit-equivalent feedback round-trip delay $\tau$. In this paper, we propose sequential identification of repair symbols, while avoiding sequential repair (doping). The doping symbols will be considered free variables to be revealed at the end. This postponed-repair design, akin to [5], allows for complete linear decoding, save for a set of symbols that will be either requested from the repair server at the end of the procedure, solved by Gaussian elimination, or handled by a combination of the two. Our stochastic model of the decoding procedure puts a tight bound on the number of symbols that must be repaired, and demonstrates that a simple encoding procedure based on the Ideal Soliton distribution of equation degrees yields a diminishingly small repair overhead.
Let us denote the fraction of undecoded symbols by $\rho$, which is a random variable. With postponed repair, all $\rho k$ repair symbols are requested in a single feedback round trip, instead of one round trip per doping. This lowers the cost with respect to plain doping [6], while still maintaining linear decoding time. We show, both analytically and through simulations, that a simple Fountain code with a well-designed linear decoder results in a small $\rho$. Hence, both the per-symbol and the total communication overhead can be made relatively small.
III Design Preliminaries
In [12], the author shows that the recoverable fraction of input symbols depends on the output degree distribution of the code. The results in [12] are of interest for real-time systems using rateless codes, including multimedia wireless broadcast. Apart from emphasizing the importance of the output degree distribution, they imply that if the erasure rate is above a certain value, given the limited duration of the session, the collected system of equations will not be sufficient for iterative decoding under any distribution. This motivates the extensions to the iterative decoder presented in Section IV, assisted by doping.
In order to establish a tight bound on the communication cost, we focus in this paper on pure LT codes. Moreover, we consider LT codes based on the IS, as it allows for a straightforward analysis of the occasional assistance to the decoding process when it stalls [6]. The availability of this assistance obviates the need for overhead-failure analysis, as we are allowed to get additional symbols on demand, i.e., to keep doping a minimum set of equations until it reaches full rank. In addition, we consider the LT codes used in the standardized Raptor designs, but not in their systematic form. The assumed existence of clients that cannot decode at all was the motivation behind the choice of the systematic structure of the MBMS standardized Fountain (Raptor) codes [3, 4], where some of the encoded symbols are equivalent to the source symbols (singleton equations), and hence, decoding is trivial at the expense of complex encoding. We argue that multimedia clients must have decoding capabilities, or otherwise expect only best-effort service. Besides, the systematic structure compromises the concept of ratelessness in terms of the statistical equality of encoded symbols. Let us point out that the systematic implementation is compelling only for erasure-free channels; otherwise, if the receiver can handle only the systematic symbols, the only eligible rateless code is the repetition code, which is inefficient.
The standardized Raptor proposes two mechanisms that combine iterative decoding with Gaussian elimination [5], so that the complexity remains linear while the collected set is more likely to be sufficient for decoding, at the price of a slight but acceptable complexity increase. When combined with these inactivation mechanisms, the iterative decoder is allowed to continue until all repair symbols are identified, and then sends a single doping request at the end. Upon receiving the requested symbols, the decoder can completely recover the source block in linear time by back-substituting the “doped” symbols. The decoding delay then stems only from the repair latency at the end.
We next present the analytical model of the peeling decoder assisted by doping, before describing our implementation of the inactivation mechanisms based on the two flavors of LT codes (the IS and the standardized Raptor distribution). In the Raptor implementation, we omit the precode stage, given that the inactivation mechanisms in the peeling decoder (PD) play the role of a precode in decreasing the probability of failure. Besides, this decreases the complexity, which is of paramount importance for mobile applications, and simplifies the analysis. Our analytical results and simulations justify this approach, as the repair cost is lower than in the plots presented for standardized Raptor codes (with precodes) in [8] (Figure 3.4, pg. 270).
IV Solution: Enhanced Peeling Decoders
As stated in the Introduction, our model of the doped peeling decoder [6] can be successfully applied to the peeling decoder with inactivations. We next present the adapted model.
IV-A Model of the Basic Peeling Decoder
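Let us have a set of $n$ code symbols that are linear combinations of $k$ unique input symbols, indexed by the set $[k] = \{1, \ldots, k\}$. Let the degrees of the linear combinations be random numbers that follow the distribution $\Omega = (\Omega_1, \ldots, \Omega_k)$ with support $\{1, \ldots, k\}$. We equivalently use $\Omega$ and its generating polynomial $\Omega(x) = \sum_{d=1}^{k} \Omega_d x^d$, where $\Omega_d$ is the probability that a code symbol has degree $d$. Let us denote the graph describing the peeling decoding process at time $t$ by $G_t$ (see Figure 2, depicting the graph for a small example).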
We start with a $k \times n$ decoding matrix $M_0$, where code symbols are described by columns, so that $M_0(i,j) = 1$ iff the $j$th code symbol contains the $i$th input symbol. The number of ones in a column corresponds to the degree of the associated code symbol. Input symbols covered by the code symbols of degree one constitute the ripple. In the first step of the decoding process, one input symbol in the ripple is processed by being removed from all neighboring code symbols in the associated graph $G_0$. If the index of the input symbol is $i$, this effectively removes the $i$th row of the matrix, thus creating the new $(k-1) \times n$ decoding matrix $M_1$. We refer to the code symbols modified by the removal of the processed input symbol as output symbols. Output symbols of degree one may cover additional input symbols and thus modify the ripple. Hence, the output degree distribution changes to $\Omega^{(1)}$.
At each subsequent step of the decoding process, one input symbol in the ripple is processed by being removed from all neighboring output symbols, and all such output symbols that subsequently have exactly one remaining neighbor are released to cover that neighbor. Consequently, after $t$ input symbols have been processed, the output symbol degrees are at most $k - t$, and the resulting output degree distribution is denoted by $\Omega^{(t)}$. Since the encoded symbols are constructed by independently combining random input symbols, we can assume that the input symbols covered by the degree-one symbols are selected uniformly at random from the set of undecoded symbols. Hence, we model the $t$th step of the decoding process by selecting a row uniformly at random from the set of rows in the current decoding matrix $M_{t-1}$ and removing it from the matrix. After $t$ rounds or, equivalently, when there are $m = k - t$ rows in the decoding matrix, the number of nonzero coefficients in a column is denoted by $D_m$. The probability $P_m(d) = \Pr\{D_m = d\}$ that the column is of degree $d$ when its length is $m$ is described iteratively by
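$$P_{m-1}(d) = P_m(d)\,\frac{m-d}{m} + P_m(d+1)\,\frac{d+1}{m} \tag{1}$$
for $0 \le d \le m-1$ and $1 \le m \le k$.
Let the starting distribution of the column degrees (for the decoding matrix $M_0$) be denoted by $\Omega$. By construction, $P_k(d) = \Omega_d$ for $1 \le d \le k$ and $P_k(0) = 0$, which, together with (1), completely defines the dynamics of the decoding process.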
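Because the recursion for the column degree distribution only involves the removal of a uniformly chosen row, it is easy to evaluate exactly. The sketch below applies one step $P_m \to P_{m-1}$ at a time using exact rational arithmetic; starting from the Ideal Soliton, it lets one check numerically the stationarity property derived in Section IV-A1, namely $P_m(d) = \frac{m}{k}\cdot\frac{1}{d(d-1)}$ for $2 \le d \le m$. The function names are hypothetical.

```python
from fractions import Fraction

def evolve(pmf):
    """One step of the column-degree recursion: pmf[d] = P_m(d), d = 0..m; returns P_{m-1}.

    A column of degree d keeps its degree with probability (m - d)/m (the removed
    row is not one of its d rows) and a column of degree d + 1 drops to degree d
    with probability (d + 1)/m.
    """
    m = len(pmf) - 1
    return [pmf[d] * Fraction(m - d, m) + pmf[d + 1] * Fraction(d + 1, m)
            for d in range(m)]

def column_degree_pmf(start, steps):
    """Apply the recursion `steps` times, starting from P_k = start."""
    p = list(start)
    for _ in range(steps):
        p = evolve(p)
    return p

k = 12
ideal = [Fraction(0), Fraction(1, k)] + [Fraction(1, d * (d - 1)) for d in range(2, k + 1)]
p7 = column_degree_pmf(ideal, k - 7)   # P_7, i.e., after 5 rows have been removed
```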
Let $n = k(1+\epsilon)$, where $\epsilon$ is a small positive value. At time $t$, the total number of decoded and doped symbols is $t$, and the number of unreleased output symbols is $n - r(t)$, where $r(t)$, the number of released output symbols, is an increasing function of $t$. The unreleased output symbol degree distribution polynomial at time $t$ is $\Omega^{(t)}(x) = \sum_{d \ge 2} \Omega^{(t)}_d x^d$. Each decoding iteration processes a random degree-one symbol from the ripple. Released output symbols are its coded symbol neighbors whose output degree is two. Releasing output symbols by processing a ripple symbol corresponds to performing, on average, $n - r(t)$ independent Bernoulli experiments, each succeeding if the output symbol has degree two and contains the processed symbol, i.e., with probability $2\Omega^{(t)}_2/(k-t)$. Hence, the number of released symbols (the ripple increment) at any decoding step is modeled by a discrete random variable with a Binomial distribution, which for large $n$ can be approximated by a (truncated) Poisson distribution. In [6] we model the ripple process as a random walk, i.e., a partial sum of shifted Poisson random variables, and analyze the stopping time of this process. Readers interested in the detailed analysis are referred to the Appendix of [6].
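The Binomial-to-Poisson step can be quantified. The sketch below computes the total-variation distance between $\mathrm{Binomial}(N, \lambda/N)$ and $\mathrm{Poisson}(\lambda)$, where $N$ plays the role of the number of Bernoulli experiments, and checks it against Le Cam's classical bound $\mathrm{TV} \le N(\lambda/N)^2 = \lambda^2/N$. All names are illustrative; the pmfs are computed in the log domain to avoid overflow of large binomial coefficients.

```python
import math

def binomial_pmf(r, trials, p):
    # log-domain evaluation avoids overflow of large binomial coefficients
    log_pmf = (math.lgamma(trials + 1) - math.lgamma(r + 1) - math.lgamma(trials - r + 1)
               + r * math.log(p) + (trials - r) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(r, lam):
    return math.exp(-lam + r * math.log(lam) - math.lgamma(r + 1))

def tv_binomial_poisson(trials, lam):
    """Total-variation distance between Binomial(trials, lam/trials) and Poisson(lam)."""
    p = lam / trials
    diff = sum(abs(binomial_pmf(r, trials, p) - poisson_pmf(r, lam)) for r in range(trials + 1))
    # Poisson mass above `trials`, where the Binomial pmf is zero
    tail = max(0.0, 1.0 - sum(poisson_pmf(r, lam) for r in range(trials + 1)))
    return 0.5 * (diff + tail)
```

The distance shrinks as the number of experiments grows, which is why the Poisson model becomes accurate for large $n$.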
IV-A1 The Ideal Soliton Advantage
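Let the starting distribution of the column degrees (for the decoding matrix $M_0$) be the Ideal Soliton, denoted by $\Omega^{\mathrm{IS}}$:
$$\Omega^{\mathrm{IS}}_d = \begin{cases} \dfrac{1}{k}, & d = 1,\\[4pt] \dfrac{1}{d(d-1)}, & d = 2, \ldots, k, \end{cases} \tag{2}$$
and let $P_k(d) = \Omega^{\mathrm{IS}}_d$. After rearranging and canceling the appropriate terms in (1), we obtain, for $2 \le d \le m$,
$$P_m(d) = \frac{m}{k}\cdot\frac{1}{d(d-1)}, \tag{3}$$
which follows by induction on $m = k, k-1, \ldots$: substituting (3) into the right-hand side of (1) gives $\frac{1}{k}\left[\frac{m-d}{d(d-1)} + \frac{1}{d}\right] = \frac{m-1}{k}\cdot\frac{1}{d(d-1)}$.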
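We assume that $n \approx k$, as, by design, we desire the set of upfront delivered symbols to be as small as the set of source symbols. The probability of degree-$d$ symbols among the unreleased output symbols, $P_m(d)/\sum_{j \ge 2} P_m(j) = \frac{m}{m-1}\cdot\frac{1}{d(d-1)}$, can then be approximated by $\frac{1}{d(d-1)}$.
Hence, the probability distribution of the unreleased output node degrees at any time $t$ remains the Ideal Soliton (restricted to degrees $d \ge 2$):
$$\Omega^{(t)}_d \approx \frac{1}{d(d-1)}, \qquad 2 \le d \le k - t. \tag{4}$$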
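This stationary character of IS-based decoding induces the IID (independent identically distributed) nature of the ripple increment $\Delta_t$, as, according to (4), the fraction of degree-two output symbols for the IS-based Fountain code is expected to be $\Omega^{(t)}_2 \approx 1/2$ for any decoding iteration $t$. With approximately $n(k-t)/k$ unreleased output symbols, each released with probability $2\Omega^{(t)}_2/(k-t) \approx 1/(k-t)$, the mean ripple increment is approximately $n/k \approx 1$. Hence,
$$\Pr\{\Delta_t = r\} \approx \frac{e^{-1}}{r!}, \qquad r = 0, 1, 2, \ldots, \tag{5}$$
or, equivalently, $\Delta_t \sim \mathrm{Poi}(1)$, where $\mathrm{Poi}(\lambda)$ denotes the Poisson distribution of intensity $\lambda$. With the ripple increment of IS decoding being an IID Poisson random variable of unit mean, the analysis of the stopping time of the partial sums is straightforward and results in a tight bound on the doping frequency.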
Otherwise, the analytical models for ripple evolution characterizing the decoding of LT codes with a generic distribution are extremely complex. The distribution of the output symbols in the cloud (i.e., the symbols of degree larger than one) can only be characterized through the joint non-stationary distribution of the ripple cardinality and the cloud size at every step [13, 14]. As a result, the stopping time of the ripple is hardly tractable. The stopping time corresponds to the event of an empty ripple, which would mean the failure of the decoding process, if it were not for the possibility of doping.
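The unit-mean Poisson behavior of the IS ripple increment can be checked empirically. The sketch below draws a random IS decoding matrix, removes its rows in uniformly random order (the row-removal model of Section IV-A), and records at each step how many columns drop from degree two to degree one, i.e., the released output symbols. It is a sketch under the stated model assumptions: it ignores whether a released symbol duplicates one already in the ripple, and the names are hypothetical.

```python
import random
import statistics
from itertools import accumulate

def ideal_soliton(k):
    return [0.0, 1.0 / k] + [1.0 / (d * (d - 1)) for d in range(2, k + 1)]

def ripple_increments(k, n, rng):
    """Per-step count of columns whose degree drops from two to one when the k rows
    of a random IS decoding matrix are removed in uniformly random order."""
    cum = list(accumulate(ideal_soliton(k)))
    degrees = rng.choices(range(k + 1), cum_weights=cum, k=n)
    cols = [set(rng.sample(range(k), d)) for d in degrees]
    by_row = [[] for _ in range(k)]
    for j, col in enumerate(cols):
        for i in col:
            by_row[i].append(j)
    increments = []
    for i in rng.sample(range(k), k):        # uniformly random processing order
        released = 0
        for j in by_row[i]:
            if len(cols[j]) == 2:
                released += 1                # column becomes degree one: released
            cols[j].discard(i)
        increments.append(released)
    return increments

inc = ripple_increments(2000, 2000, random.Random(3))
print(statistics.mean(inc), statistics.variance(inc))  # expected to be close to 1 for Poi(1)
```

For a Poisson variable of unit mean, both the sample mean and the sample variance should be close to one, in line with the IID Poisson model of the IS ripple increment.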
IV-B Model of Doping and Inactivation
With doping, we define a sequence of stopping-time random variables, indexed by the doping round. The stopping-time interval, i.e., the number of symbols decoded between consecutive dopings, is termed the inter-doping yield. The inter-doping yield is evaluated using the following recursive expression
(6)  
Here, $\pi(\lambda, x)$ denotes the Poisson pdf of intensity $\lambda$ evaluated at $x$, and $\pi^{*j}(\lambda, x)$ is its $j$-fold convolution evaluated at $x$, which is itself a Poisson pdf of intensity $j\lambda$ evaluated at $x$. In a special case, further simplifying assumptions lead to the approximation that all inter-doping yields are described by a single random variable, whose pdf is given by the following recursive expression, based on (6),
(7)  
where $\pi(\lambda, x)$ denotes the Poisson pdf of intensity $\lambda$ evaluated at $x$. Now, the expected value of the inter-doping yield is
(8) 
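Finally, the doping process is a renewal process (ignoring the final stages of decoding), and thus Wald's equality [15] implies that the expected number of dopings, i.e., the number of additional singletons the decoder needs to obtain to complete the peeling process, is
$$\mathbb{E}[N_d] \approx \frac{k}{\mathbb{E}[Y]}, \tag{9}$$
where $N_d$ denotes the number of dopings and $\mathbb{E}[Y]$ is the expected inter-doping yield given by (8).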
We may use several techniques to select these singletons. The best and most tractable results are obtained with degree-two doping, i.e., choosing the symbols present in the remaining degree-two equations, which makes decoding and doping steps indistinguishable in terms of ripple increments. Fig. 6 depicts the evaluation of (9) for the relevant values of $k$. A recent contribution, based on our model of the ripple process, analyzes several other doping mechanisms and their usage for wireless broadcast [16].
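The ripple random walk with doping can be simulated directly. The following Monte Carlo sketch assumes Poisson(1) ripple increments, as for the IS, and treats each doping as one extra ripple symbol that is then processed like any other (consistent with degree-two doping making decoding and doping indistinguishable in ripple terms). As a simplification, the walk starts with an empty ripple, so the first round also begins with a doping. It records the inter-doping yields; their sum is $k$ by construction, which is the sample counterpart of the relation behind (9). The printed values depend on the seed and are not the results of Fig. 6; all names are hypothetical.

```python
import math
import random

def poisson_sample(lam, rng):
    """Knuth's multiplication method; adequate for small intensities such as lam = 1."""
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def simulate_doping(k, rng, lam=1.0):
    """Ripple random walk over a block of k symbols.

    Processing a ripple symbol adds Poisson(lam) newly released symbols; an empty
    ripple triggers one doping, which injects one symbol into the ripple.
    Returns the inter-doping yields (symbols processed in each doping round).
    """
    ripple, yields = 0, []
    for _ in range(k):
        if ripple == 0:
            yields.append(0)    # stall: a new doping round starts
            ripple = 1          # the doped symbol enters the ripple
        ripple += poisson_sample(lam, rng) - 1   # process one symbol, add released ones
        yields[-1] += 1
    return yields

yields = simulate_doping(10_000, random.Random(5))
print(len(yields), sum(yields) / len(yields))   # number of dopings, mean inter-doping yield
```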
The concept of inactivation in the decoding of rateless codes was introduced in [5]. We distinguish dynamic inactivation (DI) from permanent inactivation (PI). We observe that dynamic inactivation has the stochastic properties of the presented random walk model for doping, as an instance of DI occurs under the same conditions as doping, i.e., when the decoding process stalls.
IV-B1 Dynamic Inactivation
The basic idea behind DI is to designate a source symbol in the decoding matrix as decoded, but of unknown value, whenever an empty ripple occurs. Assigning the unknown value is equivalent to introducing a free variable in the solution of the system of equations. Let us use our matrix model to explain how DI can be implemented to restart the PD for the first time at some decoding step. One way to mark a source symbol as decoded for the remainder of the peeling process is to add an extra (empty) row to the decoding matrix, corresponding to a free variable, and then expand this modified matrix with another column containing ones only at the position of the inactivated symbol and at the added row. The codeword is also extended with a zero symbol at the position corresponding to the added column, which models the equation stating that the inactivated symbol plus the free variable equals zero in GF(2), i.e., that the two are equal. To restart the PD for each subsequent time, we extend the matrix with one additional row and one additional column in the same way. The symbol marked as decoded is chosen in such a way that a new ripple symbol is released, allowing the PD to continue (either of the two circled symbols of the corresponding column in Figure 3). Note that the inactivated symbol is not released in the way it happens with decoding or doping. That is, in the matrix equivalent, its row is not erased entirely. Instead, all unit coefficients in this row (corresponding to the coded symbols in which the source symbol appears) are replaced by zeros, and ones are written in the respective columns of the added row (see the vertical arrows depicting this modification in Figure 3). Hence, from the moment of the first inactivation, the free variables percolate through the columns, making every subsequent release dependent on the values of the free variables. The completion of this modified peeling process results in a decoding matrix with the block structure presented in Figure 3. We permute the columns of the matrix so that the ones in the upper-left block appear on the diagonal, making it an identity matrix describing the source symbols, while the upper-right corner is an all-zero matrix (as in the upper submatrix of Figure 5).
The bottom submatrix describes the free variables, reflecting the dependence of the solution on these unknown values. The values of the dynamically inactivated symbols can be determined by Gaussian elimination. Assuming that the number of DIs is small (by (8), as shown in [6, 17]) and that the matrix is of full rank, the super-quadratic complexity term of this last stage of decoding does not prevail, and the overall decoding complexity is linear. If the matrix is not of full rank, we dope the symbols that have been dynamically inactivated.
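The two stages described above (peeling with dynamic inactivation, then Gaussian elimination on the free variables) can be sketched compactly. Rather than physically appending rows and columns, this sketch keeps, for every equation and every resolved symbol, a bit mask of the free variables it depends on; these masks play the role of the bottom submatrix. On a stall it inactivates a symbol of a degree-two equation, which releases the partner symbol, and falls back to any unresolved symbol otherwise. The linear scans make the sketch quadratic, whereas a practical decoder would use adjacency lists; the function names are hypothetical.

```python
def solve_gf2(rows, n_vars):
    """Gauss-Jordan elimination over GF(2).

    rows: list of (mask, rhs); bit j of mask is the coefficient of free variable j
    and rhs is an int symbol (XOR plays the role of addition). Returns the list of
    variable values, or None if the system has rank < n_vars."""
    pivots = {}                                   # pivot column -> (mask, rhs)
    for mask, rhs in rows:
        for col, (pm, pr) in pivots.items():      # reduce by the existing pivot rows
            if mask >> col & 1:
                mask, rhs = mask ^ pm, rhs ^ pr
        if mask == 0:
            continue                              # redundant equation
        col = mask.bit_length() - 1
        for c, (pm, pr) in list(pivots.items()):  # keep pivot rows fully reduced
            if pm >> col & 1:
                pivots[c] = (pm ^ mask, pr ^ rhs)
        pivots[col] = (mask, rhs)
    if len(pivots) < n_vars:
        return None
    return [pivots[j][1] for j in range(n_vars)]

def inactivation_decode(k, equations):
    """Peeling with dynamic inactivation, then GE on the free variables.

    A resolved symbol is stored as (value, mask): x_i = value XOR (XOR of v_j, j in mask).
    Returns (symbols or None, number of inactivations)."""
    eqs = [[set(idx), val, 0] for idx, val in equations]   # unknowns, value, free-var mask
    sol, n_free = {}, 0

    def substitute(i, value, mask):
        sol[i] = (value, mask)
        for e in eqs:
            if i in e[0]:
                e[0].discard(i)
                e[1] ^= value
                e[2] ^= mask

    while len(sol) < k:
        ripple = next((e for e in eqs if len(e[0]) == 1), None)
        if ripple is not None:
            (i,) = ripple[0]
            substitute(i, ripple[1], ripple[2])
            continue
        # Stall: inactivate a symbol of a degree-two equation (its partner is released).
        deg2 = next((e for e in eqs if len(e[0]) == 2), None)
        i = min(deg2[0]) if deg2 else next(j for j in range(k) if j not in sol)
        substitute(i, 0, 1 << n_free)             # x_i equals the new free variable
        n_free += 1

    # Equations with no unknowns left constrain the free variables: mask . v = value.
    rows = [(e[2], e[1]) for e in eqs if not e[0] and e[2]]
    free = solve_gf2(rows, n_free)
    if free is None:
        return None, n_free                       # rank deficient: dope instead
    symbols = []
    for i in range(k):
        value, mask = sol[i]
        for j in range(n_free):
            if mask >> j & 1:
                value ^= free[j]
        symbols.append(value)
    return symbols, n_free
```

For example, with four symbols and the equations $x_0+x_1$, $x_1+x_2$, $x_2+x_3$, $x_0+x_2+x_3$, plain peeling cannot start, but one inactivation suffices to recover all four symbols.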
IV-B2 Permanent Inactivation
One of the main novelties introduced with the RaptorQ variant of standardized Raptor codes [8] is the use of permanent inactivation (see Figures 4 and 5). We here describe permanent inactivation (PI) and analyze the impact of this technique on decoding linearity and communication overhead when combined with dynamic inactivation (DI). With PI, the degree distribution of the initial matrix is changed. For any column (equation), we select a random number of symbols from the first rows (source symbols), where this number is sampled from a given degree distribution. The remaining (permanently inactivated) rows contribute to the overall degree of the column according to a uniform distribution over a given range. The right-hand side of Figure 5 depicts the sampling process, while the upper matrix in Figure 4 shows the initial matrix. The decoding process is illustrated in the lower matrix of Figure 4, while the left-hand side of Figure 5 shows the end result of decoding with PI and DI (after permutations, and before doping). Note that the structure of the final matrix does not differ from the case without PI if the PD is applied in conditional mode, explained in Subsection IV-B3. The only notable difference is that the identity matrix is smaller and, hence, the bottom part is thicker.
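We keep the number of permanently inactivated rows small, to maintain linear decoding complexity while improving the matrix rank. It is known [18] that random matrices have a better rank profile than sparse matrices (such as the LT generator matrix). The probability that a pure binary random $r \times r$ matrix, with $r$ sufficiently large, has full rank is $\prod_{i=1}^{\infty}\left(1 - 2^{-i}\right) \approx 0.289$. It turns out that $r$ on the order of 10 is sufficiently large. Otherwise, we calculate the probability that a random binary $r \times c$ matrix ($r \le c$) has full row rank according to
$$\Pr\{\mathrm{rank} = r\} = \prod_{i=0}^{r-1}\left(1 - 2^{\,i-c}\right). \tag{10}$$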
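The rank argument can be made concrete with a few lines of Python. The sketch evaluates the classical product formula for the probability that an $r \times c$ matrix with i.i.d. uniform binary entries ($r \le c$) has full row rank, showing how quickly the square case approaches its limit of about 0.289 and that each extra column roughly halves the probability of rank deficiency (the union bound gives $1 - \Pr\{\mathrm{rank} = r\} < 2^{r-c}$, the same form as the random Fountain code bound of Section II-B). The function name is hypothetical.

```python
import math

def prob_full_row_rank(rows, cols):
    """Probability that a uniformly random binary rows x cols matrix (rows <= cols)
    has full row rank: prod_{i=0}^{rows-1} (1 - 2^(i - cols))."""
    return math.prod(1.0 - 2.0 ** (i - cols) for i in range(rows))

# Limit of the square case, prod_{i>=1} (1 - 2^-i), truncated where the terms equal 1.0
square_limit = math.prod(1.0 - 2.0 ** -i for i in range(1, 60))
```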
Our simulations show that sampling the degrees as described above results in an improved rank profile of the upfront-delivered set of equations (see the green curves in the close-up in Figure 7, Section V, depicting the decreased number of uncovered symbols, one of the main manifestations of rank deficiency).
IV-B3 Decoding Modes
If the initial matrix has the form presented in the upper part of Figure 4, we propose to apply the peeling decoder only to the rows that are not permanently inactivated, as if the symbols associated with the permanently inactivated rows were given to us as side information. We refer to such decoding as conditional, implying that it is conditional on the knowledge of the last (permanently inactivated) rows. Let us refer to the part of a column lying in the source-symbol rows as the upper subcolumn, and to the part lying in the permanently inactivated rows as the lower subcolumn. Similarly to decoding with DI only, releasing a degree-one upper subcolumn results in propagating the nonzero coefficients from the lower subcolumn to all the columns containing the released source symbol. The first rows of the bottom submatrix (Fig. 5), one per permanently inactivated symbol, created after the permutation of columns, define a submatrix $A$ with as many rows as there are PIs and $n + D$ columns, where $n$ is the number of upfront delivered symbols and $D$ is the number of DIs.
It is clear that the submatrix $A$ corresponding to the permanently inactivated symbols is a thick random matrix, i.e., it has many more columns than rows, and the DIs only add columns. Our simulations also confirm that $A$ is of full rank with high probability. Hence, the permanently inactivated symbols can be solved by GE of small complexity, given that their number is small. This justifies the conditional decoding method, i.e., treating the permanently inactivated rows as known side information. The lower part of the bottom submatrix, consisting of $D$ rows (one per DI) and hence of dimensions $D \times (n + D)$, where $n$ is the number of upfront delivered symbols, is created by dynamic inactivations, regardless of the existence of permanently inactivated equations. Its rank is discussed in Subsection V-A.
In the unconditional mode of decoding, we run the PD over all matrix rows; hence, the decoding process is plain PD as long as there are no dynamic inactivations. If a dynamic inactivation occurs, the procedure is the same for both modes: a row is added at the bottom of the matrix, and then a column is appended with unit coefficients in the inactivated row and in the added one. According to our simulations, the number of inactivations, and hence the overhead, is much larger for the unconditional mode. This is expected, as the degree distribution of columns with permanent inactivations deviates from the Ideal Soliton.

V Cost Analysis of Doping with Inactivations
We consider two performance measures: communication overhead, in terms of the percentage of dopings (or their absolute number), and decoding complexity. We treat the overhead in the upfront delivered set of symbols as a parameter, since we assume that the broadcast session duration is determined by design, considering broadcast channel statistics (see our use case example in the next section). The quality of the erasure channel to a particular client is random, and so is the required doping overhead. Given that the upfront delivery was not sufficient, we may decide to dope more or less, depending on the cost tradeoff strategy and on how sophisticated our decoding method is. The graphs in Figures 7 and 9 depict the tradeoff between overhead and complexity. For a source block of length the complexity is calculated as
(11) 
where is the number of PIs, is the number of DIs, is the number of uncovered symbols, and is the number of dopings we request, , while is the exponent in the complexity of Gaussian elimination . The complexity cost is normalized per non-doped source symbol (Fig. 9).
Note that an estimate of the complexity may be obtained from the analytical values of the above variables. The lower bound on is given by equations (7) and (8). Evaluation of (8) for results in while for which corresponds to and of (Fig. 6). As grows, the bound for becomes tighter and its value becomes insignificant (Figure 6). Since the model assumes the bound is expected to be looser for smaller as the finite decoding stages where have more impact. It is interesting to observe in Figure 7 that, while for the simulation value of does not match the bound of as soon as the number of inactivations hits the This suggests a possible way of quantifying the impact of the finite in our model, although this problem is outside the scope of this paper.
The estimate of is obtained from the following reasoning. The probability that a source symbol is not a neighbor of an output node is , where is the average degree of an output node. The probability that a source node is not a neighbor of any output node is As the average degree of the output nodes (counting only within the upper subcolumns – with degrees sampled from IS) is the probability of uncovered nodes is hence,
(12) 
For Raptor LT (RLT), the average degree is some constant, independent of For the standardized Raptor distribution [3], simulated here, Even for , this is significantly lower than resulting in which for saturates to approximately For medium to large (which is our range of interest), this is significantly larger than for IS.
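The coverage argument above can be checked numerically. The sketch below uses the exact expectation k(1 − d̄/k)^n, where d̄ is the average encoded-symbol degree and n the number of received symbols; unlike the analysis above, it ignores the restriction to upper subcolumns and the effect of PIs. The Raptor LT degree table is our transcription of the RFC 5053 distribution and should be checked against the RFC before reuse; the 10% reception overhead in the usage loop is an arbitrary illustrative choice, and all function names are hypothetical.

```python
import math

# Cumulative degree thresholds (out of 2**20) and degrees of the LT component
# of the RFC 5053 Raptor code, as transcribed here; verify against the RFC.
RFC5053_THRESHOLDS = [0, 10241, 491582, 712794, 831695, 948446, 1032189, 1048576]
RFC5053_DEGREES = [1, 2, 3, 4, 10, 11, 40]


def raptor_lt_average_degree():
    """Mean degree of the RFC 5053 distribution; it does not depend on k."""
    total = sum(d * (RFC5053_THRESHOLDS[j + 1] - RFC5053_THRESHOLDS[j])
                for j, d in enumerate(RFC5053_DEGREES))
    return total / 2 ** 20


def ideal_soliton_average_degree(k):
    """Mean of the Ideal Soliton: 1/k + H_{k-1}, which grows like ln k."""
    return 1.0 / k + math.fsum(1.0 / (d - 1) for d in range(2, k + 1))


def expected_uncovered(k, n, avg_degree):
    """Expected number of source symbols touched by none of n encoded symbols.

    A degree-d encoded symbol (neighbours drawn without replacement) misses a
    given source symbol with probability 1 - d/k; averaging over d and using
    independence across encoded symbols gives (1 - avg_degree/k) ** n.
    """
    return k * (1.0 - avg_degree / k) ** n


if __name__ == "__main__":
    for k in (100, 1000, 10000):
        n = int(1.1 * k)  # assumed 10% reception overhead, for illustration only
        print(k,
              round(expected_uncovered(k, n, ideal_soliton_average_degree(k)), 3),
              round(expected_uncovered(k, n, raptor_lt_average_degree()), 3))
```

With these inputs the IS mean degree grows roughly like ln k, so the expected number of uncovered symbols stays below one, whereas the constant Raptor LT mean (about 4.63) leaves a number of uncovered symbols that grows roughly linearly with k.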
GE is considered to be of cubic complexity, although there exist methods that exploit the matrix structure to slightly lower the exponent. In our graphs we take the lower bound of for the exponent (which is not tight). Figure 9 illustrates that for the nonlinear complexity term has a visible but still moderate effect on the overall complexity, even if the number of dopings is equal to (red curves with square markers), when this term contributes a value of
V-A Rank Deficiency After Inactivations
In plain terms, the minimum required number of repair symbols corresponds to the number of equations missing for the upper submatrix to be of full rank. This number is always smaller than or equal to the number of dopings requested by our decoder since, for the sake of decoding linearity, the decoder inactivates some of the symbols that could be solved by GE.
For the upper decoding submatrix is a thick matrix even for as it contains rows. When it is of full rank, the number of dopings may be decreased to if we decide to solve the inactivated variables through Gaussian elimination. Note that, for the slight complexity increase in such cases (min dopings curves, in black, Figure 9) is due not only to a higher inside the term in the base of but also to the minimal
To estimate the ranks of the submatrices involved in decoding, we apply the results from [19] and [9]. They state that there exists a threshold on the probability that an element of an independent and identically distributed (IID) binary matrix equals one, above which the rank sufficiency/deficiency of such a random matrix resembles that of a uniformly random binary matrix. Consider such a random matrix of size Let be a constant, Suppose further that is a function decreasing to sufficiently slowly with Then this probability threshold can be expressed as
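The number of equations missing for full rank is simply the number of unknowns minus the rank over GF(2); one doping per missing dimension suffices, because unit vectors can always complete a basis. A minimal Python sketch, assuming each equation is stored as an integer bitmask over the unknowns; the function names are hypothetical.

```python
def gf2_rank(rows):
    """Rank over GF(2); each row is an int bitmask over the unknowns."""
    pivots = {}  # leading bit -> basis row with that leading bit
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = r
                break
            r ^= pivots[lead]  # eliminate the leading bit and keep reducing
    return len(pivots)


def to_mask(indices):
    """Bitmask with bit j set for every unknown j in indices."""
    mask = 0
    for j in set(indices):
        mask |= 1 << j
    return mask


def min_repair_symbols(equations, k):
    """Equations missing for full rank over k unknowns: k - rank."""
    return k - gf2_rank(equations)


if __name__ == "__main__":
    # Four unknowns; the third equation is the XOR of the first two.
    eqs = [to_mask({0, 1}), to_mask({1, 2}), to_mask({0, 2}), to_mask({3})]
    print(min_repair_symbols(eqs, 4))  # -> 1 (e.g. doping symbol 0 suffices)
```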
(13) 
In practice, this rank similarity with the purely random matrix holds provided does not tend to either zero or one too rapidly. Note now that, for IS-based codes, the average degree of the upper subcolumns is approximately Hence, the number of unit elements is and, hence, the probability of ones, under the IID assumption, is For larger , this is sufficiently above the threshold (13) to ensure good rank properties of the upper submatrix. This expectation is confirmed by our simulations, presented in Figure 7, where the minimum number of dopings for IS reaches zero for regardless of the permanent inactivations.
Unlike the IS-based LT codes, the degree distribution of Raptor-based codes does not depend on As grows, the density of unit elements diverges from the threshold (13). From the graph perspective, the number of edges in the decoding graph is increasingly insufficient to cover all source symbols, as demonstrated by the estimate of in the previous subsection. In the Raptor design, this relaxation is expected to be compensated for by the precode. In a design that insists on simple codes and low decoding delay, as here, the uncovered symbols must be recovered through doping. As a baseline, except when the rank of the submatrix is when The importance of the matrix density in terms of the uncovered (must-dope) symbols is illustrated in Figure 7. Note that the red curves in Figure 7 denote the doping-only approach, which dopes both uncovered and inactivated symbols, while the dashed black curves denote doping of only those symbols that cannot be decoded through GE, i.e. the minimal doping. The close-up in the same figure illustrates the inferiority of RLT codes in terms of , and the doping percentage graphs in the same figure reflect this in the total doping overhead. With Raptor, the uncovered symbols form the majority of the dopings, which is why it has a larger overhead despite the fact that the number of DIs is slightly smaller (for small only). This is because the peeling decoder is applied to source symbols only, and is significant. PI has a similar effect on both designs: it practically eliminates the occurrence of uncovered nodes among the last input symbols. However, even for the problem of uncovered symbols with RLT in the upper submatrix becomes more pronounced for larger .
The manner in which “partial” GE is performed after the PD has finished has a strong bearing on both the complexity and the doping-induced (repair) communication cost. Our simulations with IS show that for when and the matrix is singular in fewer than of cases, and for and it happens slightly sooner. If the GE decoding of submatrix D succeeds, both permanently and dynamically inactivated symbols are known without any doping, and which is vanishing for the IS.
Let us take a closer look at the reasons for such a high likelihood that the matrix D is of full rank. The two submatrices of matrix and have different structures. As mentioned, is a random matrix, formed by the permuted lower subcolumns whose degree is sampled from the uniform distribution Given that, additionally, this is a thick matrix, it is nonsingular with very high probability (10). The submatrix is formed by the propagation of left-hand-side (LHS) graph edges belonging to the LHS node connected to the inactivated source symbol (see Figure 3). The LHS nodes in the IS graph have a Poisson degree distribution of mean which remains stationary throughout decoding, as the right-hand-side (RHS) nodes maintain the IS distribution. Hence, the average number of unit coefficients propagated to rows of the submatrix is Under the IID assumption, we may apply the same reasoning as for the upper submatrix, namely that is nonsingular with high probability, based on the threshold (13). The expected density of is confirmed by the simulations. In addition, Figure 6 shows that the number of dopings when and decreased by the result of (10) applied to matches both the lower bound and the simulations.
V-B Performance Comparison: IS vs. Raptor LT
While Figure 7 clearly illustrates the advantage of the IS-based codes in terms of the repair communication cost, Figure 9 may leave the reader under the impression that this advantage is offset by the increased complexity cost with respect to RLT (graph for minimal doping, when ). To illustrate that this is not the case, we introduce Figures 8 and 10, which provide a fair performance comparison between the two code variants. Figure 8 plots the minimal doping curves from both subgraphs in Figure 7 (marked as Raptor LT and IS) against the minimal dopings allowed for the IS variant to achieve exactly the same critical complexity as the Raptor LT variant (black curves in the lower subgraph of Figure 9). Marked as IS cost balanced, these curves show that, when the complexity is the same, the IS doping is still significantly below the minimal RLT amount, and not much higher than the IS minimum value. To further illustrate the IS advantage, Figure 10 plots the RLT complexity cost (from the lower subgraph of Figure 9) against the IS complexity cost when the minimum number of dopings is matched to the RLT level (i.e. by the black curves in the lower subgraph of Figure 7). These curves, marked as IS doping-balanced, are to be compared with the black curves, marked as RLT w/ mindop, where IS again demonstrates better performance.
For good channels (larger values, e.g. 1150) with IS ALFEC, the DIs ensure linearity of decoding of an already solvable system of equations, as the number of must-dope symbols is zero with high probability (see the red pointer in the upper subgraph of Figure 7). This is not the case for the RLT distribution, as indicated by the red pointer in the corner of the lower subgraph of Figure 7.
In conclusion, apart from the tractability of their analytical model, the IS-based codes are superior in terms of both repair costs (communication and complexity), and their simplicity and true ratelessness may be of further use in multimedia broadcast scenarios where cooperative peer-to-peer schemes are allowed.
VI Example Use Case
Let us assume that the density of multimedia subscribers is about which is the current addressable market for mobile TV services. As the density of mobile subscribers in urban areas is typically around 300 users, this results in a moderate estimate of about 50 concurrent multimedia clients. Note that these figures are bound to grow, as the current trends are exponential. If the mobile provider decides not to allocate extra bandwidth to ALFEC, the repair overhead for each user will be equal to the experienced erasure rate. Let us assume that the expected average rate is which is realistic according to [7]. This incurs a total repair overhead per cell of For the repair overhead amounts to packets, incurring a high maximum repair delay.
Now, assume that the proposed approach is used, and the delivery phase duration is designed so that for an average user (i.e. one experiencing erasure rate ) This is equivalent to broadcasting symbols, meaning that the provider accepts the ALFEC communication overhead of However, according to (8) and (9), and as confirmed by the simulation results presented in Figure 7, for the repair overhead is almost zero, with slightly increased but still linear complexity, and certainly lower than with unit per-symbol complexity. Hence, upper bounding the per-user repair overhead by we obtain a total repair overhead per cell of Hence, the total application layer communication overhead (ALFEC plus repair) amounts to as opposed to without ALFEC.
Given the obvious savings on the provider side, let us consider the effect on a particular priority subscriber to multimedia broadcast streaming. Assume that the application layer buffer stores the next block (a video segment) while the current one is being repaired, and the last one is played out. This allows for a repair time of about half a second. Now, if the erasure rate is as estimated, the repair time will be due to the transfer of at most a couple of packets. If the channel is better, no repair delay will be incurred. For e.g. due to mobility and extreme interference, would be approximately which incurs a delay of at most 10 packet transfers. Hence, the buffer size would be sufficient to guarantee steady video quality, without artifacts (i.e. freezing) due to dropped segments. The proposed application of delayed dopings will minimize feedback to a one-time request with a priority-dependent delay Besides the fact that high-priority users would be repaired first, with no waiting time, it is likely that the unicast bearer would offer a stronger physical layer (PHY) FEC, minimizing the loss of repair packets. Note that the bandwidth overhead due to a stronger PHY FEC is negligible given such a low repair overhead.
Hence, under the assumptions of this use case, the proposed ALFEC, based on the low-complexity two-phase decoder of a simple IS code, provides both good quality of experience and scalability to a large number of multimedia subscribers.
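The density argument above can be explored with a small Monte Carlo experiment. This sketch replaces the actual code-generated matrices with IID Bernoulli(p) binary matrices of n = 1.1k equations in k unknowns, taking p = ln(k)/k as a stand-in for the IS density and p = 4.63/k as a stand-in for a constant-mean-degree Raptor LT distribution. Both densities, the aspect ratio and the trial counts are assumptions for illustration, not the simulation setup behind Figure 7; the function names are hypothetical.

```python
import math
import random


def gf2_rank(rows):
    """Rank over GF(2) of rows given as int bitmasks."""
    pivots = {}  # leading bit -> basis row with that leading bit
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = r
                break
            r ^= pivots[lead]
    return len(pivots)


def random_row(k, p, rng):
    """Length-k row with IID Bernoulli(p) entries, assuming 0 < p < 1.

    Gaps between consecutive ones are geometric, so we jump straight to the next one.
    """
    log_q = math.log1p(-p)
    mask, j = 0, -1
    while True:
        j += 1 + int(math.log(1.0 - rng.random()) / log_q)
        if j >= k:
            return mask
        mask |= 1 << j


def mean_deficiency(k, n, p, trials, seed=0):
    """Average k - rank of an n x k IID Bernoulli(p) matrix over GF(2)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += k - gf2_rank([random_row(k, p, rng) for _ in range(n)])
    return total / trials


if __name__ == "__main__":
    for k in (100, 200, 400):
        n = int(1.1 * k)  # assumed 10% reception overhead
        print(k,
              mean_deficiency(k, n, math.log(k) / k, trials=10),  # IS-like density
              mean_deficiency(k, n, 4.63 / k, trials=10))         # RLT-like density
```

Because the matrices are thick, the uniform part of the rank deficiency is negligible here, and what remains is driven mostly by all-zero columns, i.e. the uncovered symbols discussed above.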
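The “partial” GE step can be sketched as Gaussian elimination over GF(2) on the equations that remain for the inactivated symbols. In this hypothetical Python sketch, each equation is a pair (mask, value): the mask selects inactivated symbols (assumed to use only bits 0..m−1) and the value is the XOR of their payloads, modelled as Python integers. A rank-deficient system returns None, which corresponds to the case in which some inactivated symbols must be doped; consistency of redundant equations is not checked.

```python
def solve_gf2(equations, m):
    """Solve for m unknowns over GF(2) by elimination and back-substitution.

    equations: iterable of (mask, value); value is the XOR of the payloads of
    the unknowns selected by mask. Returns a list of m payloads, or None if
    the system does not have full rank (those unknowns would need doping).
    """
    pivots = {}  # leading bit -> (mask, value) with that leading bit
    for mask, value in equations:
        while mask:
            lead = mask.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = (mask, value)
                break
            pmask, pvalue = pivots[lead]
            mask ^= pmask
            value ^= pvalue
    if any(c not in pivots for c in range(m)):
        return None
    x = [0] * m
    # A pivot row with leading bit c only involves unknowns <= c, so solving
    # in increasing order means every other unknown it uses is already known.
    for c in range(m):
        mask, value = pivots[c]
        rest, j = mask & ~(1 << c), 0
        while rest:
            if rest & 1:
                value ^= x[j]
            rest >>= 1
            j += 1
        x[c] = value
    return x


if __name__ == "__main__":
    # Three inactivated symbols with hypothetical payloads 5, 9 and 12.
    eqs = [(0b011, 5 ^ 9), (0b110, 9 ^ 12), (0b100, 12)]
    print(solve_gf2(eqs, 3))  # -> [5, 9, 12]
```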
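For intuition on why a thick random matrix is nonsingular with very high probability, the following sketch evaluates the standard closed form for a uniformly random binary matrix with m columns and m + e rows: the columns are independent, and the i-th column must avoid the 2^i vectors spanned by its predecessors, which gives ∏_{j=e+1}^{m+e}(1 − 2^{−j}) ≥ 1 − 2^{−e}. It covers only the uniform case; according to the results cited above, sparse IID matrices behave similarly only above the threshold (13). The function name is hypothetical.

```python
def prob_full_column_rank(m, e):
    """P(an (m+e) x m uniformly random binary matrix has rank m).

    Column i (0-based) must avoid the span of the previous i independent
    columns, i.e. 2**i of the 2**(m+e) possible vectors.
    """
    p = 1.0
    for j in range(e + 1, m + e + 1):
        p *= 1.0 - 2.0 ** -j
    return p


if __name__ == "__main__":
    for e in (0, 2, 5, 10):
        print(e, prob_full_column_rank(200, e))
```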
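The use-case arithmetic can be written out as a small calculator. All numeric inputs below are hypothetical placeholders except the 50 concurrent clients taken from the text. The broadcast-length rule (an average user with erasure rate ε should receive about k(1 + δ) symbols, so about k(1 + δ)/(1 − ε) are sent) is our assumption about how the delivery phase is dimensioned, and the per-user repair cap stands in for the upper bound mentioned above. Function names are hypothetical.

```python
import math


def repair_overhead_without_alfec(n_users, erasure_rate, k):
    """Without ALFEC, each user requests as many repair packets as it lost."""
    return n_users * erasure_rate * k


def broadcast_length(k, avg_erasure_rate, reception_overhead):
    """Assumed design rule: an average user receives about k*(1+delta) symbols."""
    return math.ceil(k * (1 + reception_overhead) / (1 - avg_erasure_rate))


def total_overhead_with_alfec(k, n_broadcast, n_users, per_user_repair_cap):
    """ALFEC redundancy in the broadcast plus an upper bound on unicast repair."""
    return (n_broadcast - k) + n_users * per_user_repair_cap


if __name__ == "__main__":
    k = 1000      # hypothetical source block length
    eps = 0.1     # hypothetical average erasure rate
    delta = 0.1   # hypothetical reception overhead
    users = 50    # concurrent multimedia clients, from the text
    cap = 2       # hypothetical per-user repair bound (packets)
    n = broadcast_length(k, eps, delta)
    print("without ALFEC:", repair_overhead_without_alfec(users, eps, k))
    print("with ALFEC:", total_overhead_with_alfec(k, n, users, cap))
```

The point of the comparison is structural: without ALFEC the repair traffic scales with the number of users times their losses, whereas with ALFEC most of the redundancy is paid once in the broadcast and only a small, bounded repair term scales with the number of users.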
References
 [1] “Technical specification group services and system aspects; multimedia broadcast/multicast service; protocols and codecs,” 3GPP TS 26.346 V6.1.0, June 2005.
 [2] ETSI EN 302 304 V1.1.1, “Digital Video Broadcasting (DVB): Transmission systems for handheld terminals.”
 [3] M. Luby et al., “Raptor forward error correction scheme for object delivery,” Internet Engineering Task Force, RFC 5053, http://tools.ietf.org/html/rfc5053, Sep 2007.
 [4] M. Luby et al., “RaptorQ forward error correction scheme for object delivery,” Internet Engineering Task Force, Internet-Draft, http://tools.ietf.org/html/draft-ietf-rmt-bb-fec-raptorq-03, Aug 2010.
 [5] A. Shokrollahi, S. Lassen, and R. Karp, “Systems and processes for decoding chain reaction codes through inactivation,” U.S. Patent 6,856,263, Feb 2005.
 [6] S. Kokalj-Filipovic, P. Spasojevic, and E. Soljanin, “Doped fountain coding for minimum delay data collection in circular networks,” IEEE J. Sel. Areas Commun., vol. 27, pp. 673–684, June 2009.
 [7] C. Bouras, N. Kanakis, V. Kokkinos, and A. Papazois, “Application layer forward error correction for multicast streaming over LTE networks,” Int. Journal of Comm. Systems, Wiley InterScience, 2012.
 [8] A. Shokrollahi and M. Luby, “Raptor codes,” Foundations and Trends in Communications and Information Theory, vol. 6, pp. 213–322, 2011.
 [9] V. Kolchin, Random Graphs. Cambridge, 1999.
 [10] M. Luby, “LT codes,” in The 43rd Annual IEEE Symposium on Foundations of Computer Science, Nov. 2002, pp. 271–280.
 [11] M. A. Shokrollahi, “Raptor codes,” IEEE Trans. Inform. Theory, vol. 52, pp. 2551–2567, 2006.
 [12] S. Sanghavi, “Intermediate performance of rateless codes,” in Information Theory Workshop, 2007. ITW ’07, 2007.
 [13] R. Karp et al., “Finite length analysis of LT codes,” in ISIT, June 2004.
 [14] E. Maneva and A. Shokrollahi, “New model for rigorous analysis of LT codes,” in ISIT, July 2006.
 [15] R. Gallager, Discrete Stochastic Processes. Kluwer Academic Publishers, 1995.
 [16] G. Yue, M. Uppal, and X. Wang, “Doped LT decoding with application to wireless broadcast service,” in IEEE ICC’11, Jun 2011.
 [17] S. Kokalj-Filipovic et al., “ARQ with doped fountain decoding,” in IEEE ISSSTA ’08, Aug 2008.
 [18] E. Berlekamp, “The technology of error-correcting codes,” Proc. IEEE, vol. 68, no. 5, pp. 564–593, 1980.
 [19] C. Cooper, “On the distribution of rank of a random matrix over a finite field,” Random Struct. Algorithms, vol. 17, no. 3–4, pp. 197–212, Oct. 2000.