Fountain Codes under Maximum Likelihood Decoding
Abstract
This dissertation focuses on fountain codes under maximum likelihood (ML) decoding. Fountain codes are a class of erasure correcting codes that can generate an endless amount of coded symbols; they were conceived to deliver data files over data networks to a potentially large population of users. First, Luby transform (LT) codes, the first class of practical fountain codes, are considered. Concretely, the focus is on LT codes under inactivation decoding, an efficient ML decoding algorithm that is widely used in practical systems. More precisely, the decoding complexity of LT codes under inactivation decoding is analyzed in terms of the expected number of inactivations. The proposed analysis is based on a dynamic programming approach. This analysis is then extended to provide the probability distribution of the number of inactivations. Additionally, a lower-complexity approximate analysis is introduced, and a code design example illustrates how these analysis techniques can be used to design LT codes. Next, Raptor codes under ML decoding are considered. An upper bound on the probability of decoding failure of q-ary Raptor codes is developed, based on the weight enumerator of the outer code (precode). The bound is shown by means of simulations to be tight, especially in the error floor region. It also shows how Raptor codes can be analyzed similarly to a traditional serial concatenation of (fixed-rate) block codes. Next, a heuristic method is presented that yields an approximate analysis of Raptor codes under inactivation decoding, and an example shows how the results in this thesis can be used to design Raptor codes. Raptor codes are then analyzed in a fixed-rate setting. Concretely, a Raptor code ensemble whose outer code is picked from the linear random ensemble is considered. For this ensemble, the average weight enumerator and its growth rate are provided.
Furthermore, necessary and sufficient conditions for the ensemble to have a minimum distance growing linearly with the block length are presented. As shown by means of simulations, the analyzed ensemble resembles standard Raptor codes. Finally, a new class of fountain codes is introduced, consisting of a parallel concatenation of a block code with a linear random fountain code (LRFC). This scheme is especially interesting when the block code is a maximum distance separable (MDS) code. In this case, the scheme can provide failure probabilities several orders of magnitude lower than those of LRFCs, provided that the erasure probability of the channel is not too high.
Vom Promotionsausschuss der Technischen Universität Hamburg-Harburg \degreetitleDoktor-Ingenieur (Dr.-Ing.) \subjectFountain Codes under Maximum Likelihood Decoding
To my beloved wife, parents and sister.
Acknowledgements.
The journey towards this dissertation was fascinating but also long and challenging. It was only thanks to the technical advice, encouragement and love of those around me that I was able to finish it. First, I would like to express my heartfelt gratitude to my advisor Prof. Gerhard Bauch for his guidance and valuable suggestions. I would also like to thank him explicitly for his support in clearing the bureaucratic hurdles of the PhD. The next thank you goes to Prof. Amin Shokrollahi for agreeing to review the thesis and for the helpful discussion during the PhD defense. I would also like to thank Prof. Christian Schuster for his efficiency in handling the examination process. Words cannot express my gratitude to my mentor at DLR, Gianluigi Liva, for his continuous support and guidance, and for teaching me all I know about channel coding. His passion and dedication to research have been an inspiration to me throughout these years. I am also deeply indebted to Enrico Paolini for his advice and scientific rigour. I would like to extend my gratitude to my colleagues at DLR for creating a wonderful working atmosphere. Especially, I would like to thank Balázs, Federico, Giuliano and Javi for the technical and non-technical discussions. I would also like to thank Sandro Scalise and Simon Plass for supporting my research. Last, but not least, I would like to thank my loving and caring family. I am infinitely thankful to my parents for giving me the most valuable present, a good education. A great thanks goes to my sister, for taking care of me like a mother. I would also like to thank my Italian family for making me feel welcome from the very first moment. I cannot put into words my gratitude to Paola for her love, patience, constant support and for making me feel at home. The very last “thank you” goes to little Francesco for pushing me to finalize this dissertation.
Munich, May 2017.
Contents
 1 Channel Models
 2 Block Codes: Basics and Performance Bounds
 3 Fountain Codes: Basics and Performance Bounds
 4 Notation
 5 Linear Random Fountain Codes
 6 LT codes
 7 Raptor Codes
 8 Inactivation Decoding
 9 Analysis under Random Inactivation
 10 Code Design
 11 Summary
 12 Performance under ML Decoding
 13 Inactivation Decoding Analysis
 14 Code Design
 15 Summary
 16 Raptor Code Ensemble
 17 Distance Spectrum
 18 Positive Distance Region
 19 Experimental Finite Length Results
 20 Summary
 21 Scheme Description
 22 Maximum Distance Separable Precode
 23 Generic Precode in a FixedRate Setting
 24 Numerical Results
 25 Summary
 A Proof of Theorem 1
 B Proof of Theorem 6
 C Proof of Theorem 9
List of Figures
 1 Binary erasure channel model
 2 q-ary erasure channel model
 3 Packet erasure coding diagram
 4 Bipartite graph of an code
 5 Iterative decoding example, step 0.
 6 Iterative decoding example, step 1.
 7 Iterative decoding example, step 2.
 8 Iterative decoding example, step 3.
 9 Iterative decoding example, step 4.
 10 Example of ripple and cloud in the bipartite graph of an code.
 11 Ideal soliton distribution example,
 12 Robust soliton distribution example,
 13 vs. for the ideal and robust soliton distribution,
 14 vs. for a standard and a trivially systematic under decoding.
 15 Systematic code.
 16 Graph representation of a Raptor code.
 17 Number of and redundant symbols in R10 Raptor codes vs.
 18 Precode rate vs. for R10 Raptor codes
 19 Constraint matrix of a R10 Raptor code for and
 20 Systematic Raptor code.
 21 Structure of before inactivation decoding starts.
 22 Structure of after the triangulation process.
 23 Structure of after the zero matrix procedure.
 24 Structure of after .
 25 Structure of after back-substitution.
 26 Triangulation procedure example, .
 27 Triangulation procedure example, .
 28 Triangulation procedure example, .
 29 Triangulation procedure example, .
 30 Triangulation procedure example, .
 31 Before triangulation
 32 Example of decoding graph of an code under inactivation decoding
 33 Connected components of the decoding example.
 34 Average number of inactivations vs. relative overhead for an code with and with degree distribution
 35 Distribution of the number of inactivations for an code with , and degree distribution
 36 Average number of inactivations for an and a
 37 Evolution of the ripple and the cumulative number of inactivations for a with and
 38 Average number of inactivations vs. for and
 39 Probability of decoding failure, vs. for and
 40 Probability of decoding failure vs. for a Raptor with a Hamming outer code
 41 Expected probability of decoding failure vs. for two Raptor code ensembles where the outer code is selected from the (70,64) linear random ensemble constructed over and
 42 Surrogate output degree distribution for a Raptor code with a linear random precode.
 43 Average number of inactivations vs. for a Raptor code with a linear random outer code with degree distribution
 44 Distribution of number of inactivations for a Raptor code with a linear random outer code with and degree distribution
 45 Surrogate output degree distribution for an R10 Raptor code.
 46 Average number of inactivations vs. for a R10 Raptor code with
 48 Comparison of the output degree distribution of R10 Raptor codes, with the output degree distribution obtained through optimization .
 49 Probability of decoding failure vs. for binary Raptor codes with a Hamming outer code and degree distributions and
 50 Number of inactivations vs. for binary Raptor codes with a Hamming outer code and degree distributions and
 51 Fixed-rate Raptor code as a serially concatenated code
 52 Constraint matrix of a Raptor code with a linear random precode, with , and
 53 Growth rate vs. normalized output weight for a linear random code and different ensembles of Raptor codes
 54 Overall rate vs. the normalized typical minimum distance for different ensembles of Raptor codes
 55 Positive growth rate regions of a fixedrate Raptor code ensemble with degree distribution .
 57 Positive growth rate region of for
 58 Typical minimum distance as a function of the block length
 61 Positive growth rate region for the degree distribution
 62 CER vs. for two Raptor codes, one with linear random outer code and the other with the standard outer code of R10 Raptor codes
 63 Novel fountain coding scheme seen as a parallel concatenation of a linear block code and a .
 64 vs. for a concatenated code built using a over
 65 vs. for a concatenated code built using a over
 66 vs. symbols for the concatenation of a code and a over and
 67 vs. for the concatenation of a and over and
 68 vs. for the concatenation of a (63,57) Hamming code with a code in
 69 CER vs. erasure probability for the concatenation of a (63,57) Hamming code with a code in
 70 vs. transmitter overhead in a system with receivers and for different fountain codes.
 71 Number of inactivations vs. for a R10 Raptor code and
 72 Number of inactivations vs. for a R10 Raptor code and
 73 Number of inactivations vs. for a R10 Raptor code and
 74 Number of inactivations vs. for a R10 Raptor code and
 75 Number of inactivations vs. for a R10 Raptor code and
 76 Number of inactivations vs. for a R10 Raptor code and
 77 Number of inactivations vs. for a R10 Raptor code and
List of Tables
Chapter \thechapter Acronyms
 ARQ
 automatic retransmission query
 AWGN
 additive white Gaussian noise
 BCH
 Bose Chaudhuri Hocquenghem
 BEC
 binary erasure channel
 BSC
 binary symmetric channel
 BP
 belief propagation
 CER
 codeword error rate
 COWEF
 conditional output-weight enumerator function
 CRC
 cyclic redundancy check
 FEC
 forward error correction
 GE
 Gaussian elimination
 GRS
 generalized Reed-Solomon
 HDPC
 high-density parity-check
 ID
 inactivation decoding
 i.i.d.
 independent and identically distributed
 IOWEF
 input output-weight enumerator function
 MDS
 maximum distance separable
 ML
 maximum likelihood
 MWI
 maximum weight inactivation
 LDGM
 low-density generator matrix
 LDPC
 low-density parity-check
 LRFC
 linear random fountain code
 LT
 Luby transform
 PMF
 probability mass function
 QEC
 q-ary erasure channel
 RI
 random inactivation
 RS
 Reed-Solomon
 RSD
 robust soliton distribution
 SA
 simulated annealing
 SPC
 single parity-check
 TCP
 Transmission Control Protocol
 WE
 weight enumerator
 WEF
 weight enumerator function
[aa] number of codewords of weight \nomenclature[ac] code ensemble
[ak] number of input symbols \nomenclature[ah] number of intermediate symbols \nomenclature[ahb] binary entropy function \nomenclature[am] number of output symbols collected by the receiver \nomenclature[an] number of output symbols generated by the encoder
[ar]overall rate of a fixedrate Raptor code \nomenclature[aro]outer code rate \nomenclature[ari]inner fixedrate LT code rate \nomenclature[aw]Hamming weight \nomenclature[apf]probability of decoding failure
[ge]relative receiver overhead, \nomenclature[ge]erasure probability of the channel \nomenclature[gd]absolute receiver overhead \nomenclature[gw]normalized output weight \nomenclature[gD]Transmitter overhead \nomenclature[gl]normalized Hamming weight of the intermediate word \nomenclature[go]output degree distribution of an LT code \nomenclature[goa]average output degree of an LT code
[rv]row vector of input (source) symbols \nomenclature[rc]row vector of output symbols \nomenclature[ry]row vector of received output symbols \nomenclature[rG]generator matrix \nomenclature[rH]matrix corresponding to the nonerased positions of
[su]row vector of input (source) symbols \nomenclature[sv]row vector of intermediate symbols \nomenclature[sc]row vector of output symbols \nomenclature[sy]row vector of received output symbols \nomenclature[sG]generator matrix of the inner LT code \nomenclature[sGt]matrix corresponding to the nonerased positions of \nomenclature[sH]parity check matrix of the outer block code (precode) \nomenclature[sM]constraint matrix
[xw] cloud \nomenclature[xn]neighbourhood of a node in a graph \nomenclature[xo]Landau big O notation \nomenclature[xos]Landau small notation \nomenclature[xr]ripple \nomenclature[xp]positive growth rate region of a fixedrate Raptor code \nomenclature[xf]Galois field of order \nomenclature[xkra]Krawtchouk polynomial of degree with parameters and .
[za]degree of output symbol \nomenclature[zab]reduced degree of output symbol \nomenclature[zb]binomial coefficient \nomenclature[zc]smallest integer larger than or equal to x \nomenclature[zf]largest integer smaller than or equal to x \nomenclature[zap]closest integer to x \nomenclature[zrank]rank of matrix
Chapter \thechapter Introduction
I just wondered how things were put together.
Claude E. Shannon
In the early years of communication systems it was not known whether error-free communication over a channel that introduces errors is possible at a rate that is not vanishingly small. It was C. E. Shannon who, in his landmark 1948 paper Shannon1948 (), proved that error-free communication is possible as long as one communicates at a rate lower than the channel capacity. This milestone gave birth to the Information Age in which we live today.
Initially, the research community focused on the communication channels that arise at the physical layer of a communication system. At the physical layer, the noise generated by thermal vibrations of the atoms in conductors can be accurately modeled as additive white Gaussian noise (AWGN), giving rise to the AWGN channel, one of the first models to be studied. Another, simpler model of the physical layer is the binary symmetric channel (BSC), which was also widely studied during the early days of the Information Age. The BSC can be seen as a degraded version of the AWGN channel in which the channel input is constrained to be binary and symmetric and the receiver applies hard-decision detection.
After the publication of Shannon’s work, a vast amount of research was carried out in the field of channel coding. The dominant motivation in the research community was getting closer and closer to Shannon’s capacity with an affordable complexity. In the early decades of channel coding, algebraic codes were the main focus of research. The most prominent fruits of this research were Hamming, Golay, Reed-Muller, BCH and Reed-Solomon (RS) codes hamming:1950 (); golay:1949 (); muller:1954 (); reed:1954 (); hocquenghem:1959 (); bose:1960 (); reed:RS (). Algebraic coding usually aims at finding codes with good distance properties, typically by maximizing the minimum distance of a code. Due to their good distance properties, algebraic codes tend to exhibit a low probability of error under optimal (maximum likelihood) decoding. Their main disadvantage is that, in general, soft decoding tends to be complex, especially for large block lengths.
The first paradigm change in coding was the shift towards probabilistic codes, which aim at improving the average performance of a code under constraints on the encoding and decoding complexity Costello:history (). At this stage, the research community had realized that the structure of codes needed to be tailored to simplify their implementation in practical systems. Convolutional codes, introduced by Elias in Elias55:2noisy (), are generally considered the first class of probabilistic codes Costello:history (). Optimal decoding algorithms for convolutional codes were derived first by Viterbi viterbi1967error () and then by Bahl, Cocke, Jelinek and Raviv bahl1974optimal (). Another important milestone in coding was the introduction of concatenated codes by Forney forney1966concatenated (), a serial cascade of two linear block codes, usually denoted as inner and outer code. The main advantage of concatenated codes is that the inner and outer codes can be short and easy to decode. Hence, it is possible to decode concatenated codes using so-called two-stage decoders (decoding first the inner and then the outer code). This decoder is suboptimal but still shows very good performance. In fact, the serial concatenation of RS and convolutional codes developed by NASA ccsds:bluebookarticle (), inspired by Forney’s concatenated codes, was for many years one of the best performing coding schemes known and was widely used in practice.
The second paradigm change came with turbo codes, introduced in 1993 berrou1996near (). Thanks to iterative soft decoding algorithms, both turbo and low-density parity-check (LDPC) codes were able to approach the Shannon limit on AWGN channels with modest complexity. LDPC codes had been proposed and studied by Gallager in his 1963 doctoral thesis Gallager63 (), but were later largely forgotten because their potential for long block lengths was not recognised. Shortly after the introduction of turbo codes, LDPC codes were rediscovered in MacKayldpc (), where it was observed that their performance was better than that of convolutional and concatenated codes, and similar to that of turbo codes. Nowadays, the majority of practical wireless communication systems use turbo or LDPC codes, since these codes largely close the gap to capacity in most cases.
In the meantime, digital communications have become ubiquitous, and channel coding problems are no longer exclusive to the physical layer of communication systems. In this thesis we deal exclusively with erasure channels, which are generally not typical of the physical layer. The binary erasure channel (BEC) was introduced by Elias in Elias55:2noisy (). In this channel the transmitter sends one bit (either zero or one), and the receiver either receives this bit error-free or receives an erasure. The BEC was originally regarded as a purely theoretical channel. This changed, however, with the emergence of the Internet: it was soon realized that erasure channels are a very good abstraction for the transmission of data over the Internet, where packets get lost due to, for example, buffer overflows at intermediate routers. Erasure channels also find application in wireless and satellite links, where deep fading events can cause the loss of one or several packets.
Reliable communication in data networks can be achieved by using an automatic retransmission query (ARQ) mechanism, in which the receiver requests the retransmission of the information it has not been able to decode successfully. However, ARQ mechanisms present some limitations. The first is that they rely heavily on feedback. The second comes into play in reliable multicasting applications, where a single transmitter wants to send an object (a file) to a set of receivers. In this scenario, different receivers suffer different losses. If the number of receivers is large, the transmitter needs to process a large number of feedback messages and to perform a large number of retransmissions. For such applications, one would like an alternative to ARQ that relies less on feedback and whose efficiency scales better with the number of users.
Probably one of the first works proposing erasure coding as an alternative to ARQ mechanisms is Metzer84:retransmission (), where an algorithm is proposed for the transmission of a file to multiple receivers. Instead of retransmitting lost packets, the transmitter sends redundancy packets until all receivers acknowledge the reception of the file. That work considered Reed-Solomon codes and linear random codes, which become impractical, due to their complexity, for medium-to-large block lengths, i.e., for block lengths exceeding a few thousand.
Tornado codes were proposed for transmission over erasure channels in luby97:PracticalLossRes (); luby2001efficient (). Tornado codes have linear encoding and decoding complexity (under iterative decoding). However, this complexity is proportional to their block length rather than their dimension luby97:PracticalLossRes (). Hence, they are not suitable for low-rate applications such as reliable multicasting, where the transmitter needs to adapt its code rate to the user with the worst channel (highest erasure probability). Another family of codes with good performance over erasure channels are LDPC codes. Several works have considered LDPC codes over erasure channels oswald2002capacity (); miller04:bec (); paolini2012maximum (), and they have proved practical in several scenarios even under maximum likelihood (ML) decoding. For example, in paolini2012maximum () a decoding speed of up to 1.5 Gbps was reported for an LDPC code under ML decoding. However, for a fixed code dimension, the decoding complexity of LDPC codes increases with the block length. Thus, as the erasure rate of the channel increases, one is forced to increase the block length (i.e., decrease the rate), and the decoding complexity grows.
Although solutions based on linear block codes usually outperform ARQ mechanisms in the reliable multicasting setting, they still present some limitations. The first is that the rate, and hence the block length, needs to be fixed a priori. If the chosen rate turns out not to be low enough, some users may be unable to recover the original file. Furthermore, block codes usually need to be carefully designed taking into account the information and block lengths. Thus, if one decides to change these parameters, one usually needs to carry out a new code design.
The concept of a digital fountain was introduced in byers98:fountain () as an ideal solution to the problem of distributing data to a large number of users. A fountain code is basically an erasure code that is able to generate a potentially endless amount of encoded symbols. As such, fountain codes find application in contexts where the channel erasure rate is not known a priori. The first class of practical fountain codes, LT codes, was introduced in luby02:LT (). LT codes admit a sparse graph representation and can be decoded efficiently by means of iterative decoding when the code dimension (or number of input symbols) is large. The main drawback of LT codes is that, in order to have a low probability of unsuccessful decoding, the encoding cost per output symbol and the iterative decoding cost per input symbol have to grow at least logarithmically with the dimension of the code. (The encoding cost is defined as the encoding complexity in terms of operations normalized by the number of output symbols, and the decoding cost as the decoding complexity normalized by the number of input symbols.) Thus, LT codes have a scalability problem: on the one hand, the number of input symbols needs to be very large so that iterative decoding succeeds with high probability; on the other hand, a large number of input symbols increases the encoding and iterative decoding costs.
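As an illustration of the LT encoding operation described above, the following sketch generates one output symbol from a block of source symbols: a degree is sampled from the output degree distribution, that many distinct source symbols are chosen uniformly at random, and their XOR is emitted. The degree distribution used here is a toy placeholder (not Luby's robust soliton distribution), and symbols are modeled as small integers.

```python
import random

def lt_encode_symbol(source, degree_dist, rng=random):
    """Generate one LT output symbol: sample a degree d from the output
    degree distribution, pick d distinct source symbols uniformly at
    random, and return their XOR together with the chosen neighbors."""
    degrees, probs = zip(*degree_dist.items())
    d = rng.choices(degrees, weights=probs)[0]
    neighbors = rng.sample(range(len(source)), d)
    out = 0
    for i in neighbors:
        out ^= source[i]  # XOR = addition over GF(2), symbol-wise
    return neighbors, out

# Toy degree distribution for illustration only (probabilities sum to 1).
dist = {1: 0.1, 2: 0.5, 3: 0.4}
src = [random.getrandbits(8) for _ in range(10)]
nbrs, val = lt_encode_symbol(src, dist)
```

Because the encoder can keep drawing fresh output symbols indefinitely, this already captures the "fountain" property; the cost of each symbol is proportional to its sampled degree, which is where the logarithmic growth of the average degree enters.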
Raptor codes were introduced in shokrollahi2001raptor () and published in shokrollahi04:raptor (); shokrollahi06:raptor () as an evolution of LT codes. They were also independently proposed in maymounkov2002online (), where they are referred to as online codes. Raptor codes consist of a serial concatenation of an outer block code, commonly referred to as the precode, with an inner LT code. The basic idea behind Raptor codes is to relax the LT code design by requiring the recovery of only a fraction of the input symbols, leaving a small fraction unrecovered. This can be achieved with linear complexity, both in encoding and in (iterative) decoding. The outer code is responsible for recovering the remaining fraction of input symbols. If the precode is linear-time encodable, the Raptor code has encoding complexity linear in the number of input symbols, and therefore the overall encoding cost per output symbol is constant with respect to the number of input symbols. If iterative decoding is used and the outer code can be decoded iteratively with complexity linear in the number of input symbols, the decoding complexity is also linear, which results in a constant decoding cost per symbol. Furthermore, in shokrollahi06:raptor () it was shown that Raptor codes under iterative decoding are universally capacity-achieving on the binary erasure channel: a Raptor code can achieve the capacity of all BECs, regardless of the value of the erasure probability. Thus, they can be used for transmission over an erasure channel whose erasure probability is unknown and are still guaranteed to achieve capacity.
Both LT and Raptor codes have been analyzed in depth under the assumption of iterative decoding and very large input blocks (at least on the order of a few tens of thousands of symbols). In practice, however, much smaller input block lengths are often used: decoders sometimes have limited memory resources allocated, the files to be delivered are often small, and a short latency is sometimes desired. This leads to the need for efficient short fountain codes. This is the reason why, for the Raptor codes standardized by 3GPP Multimedia Broadcast Multicast Service (MBMS) and the IETF, it is recommended to use between and input symbols (see MBMS12:raptor () and luby2007rfc () for more details). For these input block lengths, the performance under iterative decoding degrades considerably. In fact, these codes are decoded using an efficient ML decoding algorithm known as inactivation decoding shokrollahi2005systems ().
The focus of this doctoral thesis is on the analysis and design of fountain codes under ML decoding inspired by practical applications. Major parts of the results in this dissertation have been published in lazaro:ITW (); lazaro:scc2015 (); lazaro:Allerton2015 (); lazaro2011concatenation (); lazaro2013parallel (); lazaro:ISIT2015 (); lazaro:JSAC (); lazaro:Globecom2016 (); garrammone2013fragmentation ().
The remainder of this thesis is organized as follows. Chapter Fountain Codes under Maximum Likelihood Decoding provides some preliminaries on erasure channels, block codes and fountain codes. The two main classes of fountain codes, LT and Raptor codes, are introduced in Chapter Fountain Codes under Maximum Likelihood Decoding. In Chapter Fountain Codes under Maximum Likelihood Decoding, LT codes under inactivation decoding are considered. The main contribution of this chapter is an analysis of the decoding complexity of LT codes under inactivation decoding using a dynamic programming approach. Chapter Fountain Codes under Maximum Likelihood Decoding focuses on Raptor codes under inactivation decoding. First, an upper bound on the probability of decoding failure of Raptor codes under ML decoding is presented. Then, a heuristic analysis of inactivation decoding is presented that provides an approximation of the number of inactivations. Chapter Fountain Codes under Maximum Likelihood Decoding contains several results related to the distance spectrum of an ensemble of fixed-rate Raptor codes. In Chapter Fountain Codes under Maximum Likelihood Decoding, a novel fountain coding scheme is presented that consists of a parallel concatenation of a linear block code with a linear random fountain code (LRFC). This scheme is particularly interesting when the outer code is a maximum distance separable (MDS) code. Some concluding remarks are presented in Chapter Fountain Codes under Maximum Likelihood Decoding. Appendix Fountain Codes under Maximum Likelihood Decoding contains a comparison of the performance of the different inactivation techniques used in practice. Finally, Appendix Fountain Codes under Maximum Likelihood Decoding contains some proofs that were omitted from Chapters Fountain Codes under Maximum Likelihood Decoding and Fountain Codes under Maximum Likelihood Decoding.
Chapter \thechapter Background
Everything should be made as simple as possible, but not simpler.
Albert Einstein
In this chapter we briefly introduce the communication channels considered in this thesis. Concretely, we present three different channels: the binary erasure channel (BEC), the q-ary erasure channel (QEC) and the packet erasure channel. We then present some basic concepts related to block codes and fountain codes. Finally, the notation used in the thesis is described.
1 Channel Models
1.1 The Memoryless Binary Erasure Channel
The memoryless binary erasure channel (BEC) Elias55:2noisy () is a communication channel with a binary input alphabet {0, 1} and a ternary output alphabet {0, 1, ?}, as depicted in Figure 1. The symbol “?” denotes an erasure. Let X be the random variable associated with the input of the channel and Y the random variable associated with the output of the channel. Denoting the erasure probability of the channel by ε, the transition probabilities of the channel are
P(Y = x | X = x) = 1 − ε,  P(Y = ? | X = x) = ε,  for x ∈ {0, 1}.
When the symbol “0” or “1” is received there is no uncertainty about the symbol transmitted. However, when the symbol “?” is received, the receiver does not know which symbol was transmitted.
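As a quick sanity check of the model just described, the BEC can be simulated and the empirical erasure frequency compared against the erasure probability. This is an illustrative sketch; the character '?' stands in for the erasure symbol.

```python
import random

def bec(bit, eps, rng):
    """Pass one bit through a binary erasure channel: with probability
    eps the output is the erasure symbol '?', otherwise the bit itself
    is delivered error-free."""
    return '?' if rng.random() < eps else bit

rng = random.Random(42)
eps = 0.3
n = 100_000
outputs = [bec(rng.getrandbits(1), eps, rng) for _ in range(n)]
erasure_rate = outputs.count('?') / n  # should concentrate around eps
```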
1.2 The q-ary Erasure Channel
The q-ary erasure channel (QEC) is a communication channel with a q-ary input alphabet and an output alphabet of cardinality q + 1, as depicted in Figure 2. Again, the symbol “?” denotes an erasure. Let X be the random variable associated with the input of the channel and Y the random variable associated with the output of the channel. Denoting the erasure probability of the channel by ε, the transition probabilities of the channel are
P(Y = x | X = x) = 1 − ε,  P(Y = ? | X = x) = ε,  for each x in the input alphabet.
1.3 The Packet Erasure Channel
The packet erasure channel is a communication channel in which the input is a packet, that is, an array of symbols belonging to some alphabet. Similarly to the BEC and QEC, in the packet erasure channel the input packet is received error-free with probability 1 − ε, and an erasure is received with probability ε, where ε denotes the erasure probability of the channel.
The packet erasure channel can be seen as a set of parallel, fully correlated BECs massey:81 (). Thus, denoting by L the number of bits per packet, the capacity of the packet erasure channel is C = L(1 − ε) bits per channel use, that is, 1 − ε bits per transmitted bit.
Furthermore, all coding methods and performance bounds from the BEC can be applied to the packet erasure channel with slight modifications.
The packet erasure channel is of great practical importance. Consider, for example, a satellite or terrestrial communication link. The data to be transmitted is usually split into packets, and each packet is transmitted using a channel code at the physical layer. At the receiver side, channel decoding is performed at the physical layer in order to correct the errors introduced by the (physical) channel. After channel decoding, some residual errors might still be present. At this stage, error detection is carried out and packets containing errors are marked as erased (discarded). It is easy to see how, under the assumption of perfect error detection, the upper layers can abstract the behavior of the lower layers as a packet erasure channel.
The packet erasure channel can also be used to abstract the behavior of a computer data network such as the Internet. There, packets generally need to be forwarded through different intermediate nodes before reaching their destination, and packet losses can occur due to, for example, a buffer overflow at some intermediate node. Additionally, bit errors can occur during transmission. Protocols usually add a cyclic redundancy check (CRC) to each packet, which is used to detect and discard erroneous packets. All in all, the behavior of the data network can be abstracted by the upper layers as a packet erasure channel between the encoder and the decoder.
Figure 3 shows the block diagram of a typical digital communication system that makes use of erasure coding in a single-link communication. At the upper layers, a packet erasure channel encoder is used which accepts at its input the source packets and generates output packets. Before transmission, each packet is additionally protected by a physical-layer channel code. At the receiver side, channel decoding is performed at the physical layer in order to correct the errors introduced by the (physical-layer) channel. After channel decoding some residual errors might be present. At this stage error detection is carried out, and packets containing errors are marked as erased (discarded). Next, these packets are passed on to the packet erasure channel decoder, which then recovers the original source packets.
Due to the easy mapping of the packet erasure channel to the BEC and QEC, for ease of exposition all the results in this thesis will be stated in the BEC/QEC setting, as the extension to the packet erasure channel is straightforward. This approach is quite widespread in the recent literature on coding for erasure channels.
2 Block Codes: Basics and Performance Bounds
Consider the transmission over the BEC with a binary linear block code . It is possible to show that the block error probability satisfies the following inequality
where is the Singleton bound singleton1964maximum (),
(1) 
In this bound, equality is achieved only if the code is a maximum distance separable (MDS) code, i.e., if the code minimum distance is d_min = n − k + 1.
Berlekamp derived an upper bound on the average block error probability of random binary linear block codes berlekamp:bound ()
(2) 
If we compare (1) and (2) we see that the Berlekamp bound is composed of the Singleton bound plus a correction term.
Let us denote by the best code among all binary linear block codes, where by best we mean the one with the minimum block error probability over a BEC. We have that:
That is, the Singleton and the Berlekamp bounds provide lower and upper bounds to the block error probability of the best binary linear block code with parameters .
The block error probability of a linear block code not only depends on its minimum distance, but also on its weight enumerator, which gives the number of codewords of each Hamming weight. Unfortunately, when dealing with modern (turbo/LDPC) codes, deriving the exact weight enumerator of a code is a very challenging problem berlekamp78:intractability (). For this reason it is convenient to work with code ensembles, since it is usually easier to derive average results for the ensemble.
A code ensemble is a set of codes together with a probability distribution that gives the probability of the occurrence of each of the codes in the ensemble. We will illustrate the concept of code ensemble by means of an example.
Example 1.
The binary linear random ensemble is given by all possible codes obtained by generating at random a parity-check matrix in which each element takes value one with probability 1/2. This ensemble contains codes of rate higher than the design rate, since the rank of the parity-check matrix can be smaller than its number of rows.
Let us consider a binary linear block code ensemble . The ensemble average weight enumerator is defined as
where denotes expectation over all the codes in the ensemble , and is the weight enumerator of code .
Consider a binary linear block code ensemble with average weight enumerator . The average block error probability for codes in the ensemble, , can be upper bounded as CDi2001:Finite ()
(3) 
3 Fountain Codes: Basics and Performance Bounds
Consider a fountain code of dimension k. The fountain encoder receives at its input k input symbols (also called source symbols), out of which it generates output symbols (also called coded symbols). The key property of a fountain code is that the number of output symbols does not need to be fixed a priori. Additional output symbols can be generated on the fly, in an on-demand fashion. For this reason, fountain codes are said to be rateless.
We consider the transmission over an erasure channel with a fountain code with k input symbols. In this setting, the output symbols generated by the fountain encoder are transmitted through an erasure channel where they are erased with probability ε. We denote by m the number of output symbols that are not erased by the channel at a given receiver. We define the absolute (receiver) overhead as δ = m − k.
We also define the relative overhead as the absolute overhead normalized by the number of input symbols, formally δ/k.
Given the fact that fountain codes are rateless (the number of output symbols is not fixed a priori), it is useful to state performance bounds for fountain codes in terms of the absolute receiver overhead. More concretely, we are interested in bounds on the probability of decoding failure as a function of the absolute receiver overhead δ.
A lower bound to the performance of fountain codes is obtained by assuming an ideal fountain code, which allows the receiver to decode successfully whenever k output symbols are received, i.e., whenever δ ≥ 0. The performance of an ideal fountain code is, hence, given by:
Thus, for any given fountain code its decoding failure probability can be lower bounded as
Let us consider a linear random fountain code (LRFC, defined formally in Section 5) over a finite field of order q. In Liva2013 () it was shown that the probability of decoding failure of an LRFC can be upper bounded as
(4) 
Let us now denote by the best code among all q-ary fountain codes with k input symbols, where by best we mean the one with the minimum block error probability over a QEC. We have that:
That is, the performance of an ideal fountain code and the bound in (4) provide lower and upper bounds to the probability of decoding failure of the best q-ary fountain code with k input symbols, when used to transmit over a q-ary erasure channel.
4 Notation
In this section we introduce several definitions which will be used throughout the thesis.
Definition 1 (O-notation).
Let and be two real functions. We write:
if for sufficiently large values of , there exists a constant so that
For example, if a function is , given , we can find a value such that is upper bounded by for sufficiently large . This notation is also known as Landau notation and it is employed to characterize the behaviour of a function when its argument tends to infinity Graham:1994 ().
Another useful asymptotic notation is the small-o notation, whose formal definition is introduced next.
Definition 2 (o-notation).
Let and be two real functions. We write:
if and only if for any constant and sufficiently large
Note that although the definitions of O-notation and o-notation are similar, they are not equivalent. For example, consider f(n) = 2n. We can say that f(n) is O(n), but f(n) is not o(n).
Definition 3 (Exponential equivalence).
Two realvalued positive sequences and are said to be exponentially equivalent CoverThomasBook (), writing , when
(5) 
If and are exponentially equivalent, then
(6) 
Chapter: Linear Random Fountain Codes, LT and Raptor Codes
Within this chapter we present three fountain code constructions that can be found in the literature. First we introduce linear random fountain codes (LRFCs), which are probably the conceptually simplest fountain codes one can think of. We then introduce LT codes, and describe their encoding and decoding procedures. Finally, we introduce Raptor codes, which are arguably the best performing fountain coding scheme known.
5 Linear Random Fountain Codes
For the sake of completeness, let us start by formally defining a Galois field.
Definition 4 (Galois Field).
We denote by GF(q) a Galois field or finite field of order q. A Galois field is a set of q elements on which the addition and multiplication operations fulfil the following properties:

the field is an Abelian group under addition, with identity element denoted by 0.

the nonzero elements form a multiplicative group, with identity element denoted by 1.

multiplication is distributive over addition
A q-ary linear random fountain code (LRFC) is a fountain code that accepts at its input a set of k input (or source) symbols taking values in a finite field of order q. At its output, the LRFC encoder can generate an unlimited amount of output symbols (also known as coded symbols), whose number n can grow indefinitely. The j-th output symbol is generated as:
where the coefficients are picked from the field with uniform probability. If we assume the number of output symbols n to be fixed, LRFC encoding can be seen as a vector-matrix multiplication:
where G is a k × n matrix with elements picked uniformly at random from the field.
Let us now assume that the output symbols produced by the LRFC encoder are transmitted over a ary erasure channel, and let us also assume that out of the output symbols generated by the LRFC encoder, the receiver collects , denoted by . Denoting by the set of indices corresponding to the nonerased symbols, we have
We can now cast the received output symbols as
(7) 
with given by the columns of with indices in .
LRFC decoding is performed by solving the system of equations in (7). Note that the matrix is dense, since its elements are picked uniformly at random. Due to this high density, LRFC decoding is quite complex; hence, LRFCs are only practical for small values of k (at most on the order of a few hundred).
The performance of these codes is remarkably good and follows a relatively simple model. Under ML decoding, the decoding failure probability of a binary LRFC shokrollahi06:raptor (); Medard08:ARQ () can be accurately modeled as 2^{−δ} as a function of the absolute overhead δ. Actually, it can be upper bounded by 2^{−δ} berlekamp:bound (); shokrollahi06:raptor (); Medard08:ARQ ().
In Liva10:fountain (), LRFCs constructed over finite fields of order larger than 2 were analyzed. It was shown that, for an LRFC over a field of order q, the failure probability under ML decoding is bounded as Liva10:fountain ()
(8) 
where both bounds are tight already for small absolute overheads, and become tighter for increasing field order q.
6 LT codes
Luby transform (LT) codes were introduced in luby02:LT () as the first practical implementation of a fountain code. They were originally introduced together with an iterative decoding algorithm that will be explained in detail in Section 6.1.
An LT code accepts at its input a set of k symbols that are commonly referred to as input (or source) symbols. At its output, the LT encoder can generate an unlimited amount of output symbols (also known as coded symbols), whose number can grow indefinitely. A key concept when dealing with LT codes is the degree of an output symbol, or output degree, which is defined as the number of input symbols that were used to generate the output symbol under consideration. An LT code is defined by an output degree distribution Ω, where Ω_d corresponds to the probability that an output symbol of degree d is generated, up to the maximum output degree.
In order to generate one output symbol the LT encoder performs the following steps:

Randomly choose a degree d according to the degree distribution Ω.

Choose d distinct input symbols uniformly at random.

Compute the output symbol as the xor of the selected input symbols.
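The three encoding steps above can be sketched in Python. This is a minimal illustration only, assuming symbols are integers combined by xor; the function name and the (degree, probability) list format are my own choices, not part of any standard implementation:

```python
import random

def lt_encode_symbol(source, degree_dist, rng=random):
    """Generate one LT output symbol from a list of integer source symbols.

    degree_dist: list of (degree, probability) pairs summing to one.
    Returns (value, neighbors): the xor of the chosen input symbols
    and the indices of those symbols.
    """
    # Step 1: sample a degree d according to the degree distribution.
    r, d = rng.random(), degree_dist[-1][0]
    for deg, prob in degree_dist:
        r -= prob
        if r <= 0:
            d = deg
            break
    # Step 2: choose d distinct input symbols uniformly at random.
    neighbors = rng.sample(range(len(source)), d)
    # Step 3: xor the selected input symbols together.
    value = 0
    for i in neighbors:
        value ^= source[i]
    return value, neighbors
```

Each call produces one more output symbol, which is what makes the construction rateless: the encoder can be invoked indefinitely.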
If we assume for a moment that the number of output symbols is fixed, the LT encoding operation can be seen as a vector-matrix multiplication:
where G is a k × n binary matrix (unless otherwise stated, we will always consider binary LT codes) which defines the relation between the input and the output symbols. Element (i, j) of G is set to one only if input symbol i was used to generate output symbol j; otherwise, it is set to zero. From this description it is easy to see that binary LRFCs can be considered a particular type of LT code in which the output degree distribution corresponds to a binomial distribution with parameters k and 1/2.
LT codes admit a bipartite graph representation. In the bipartite graph of an LT code there are two different types of nodes, corresponding to input and output symbols. Let us introduce the notation to refer to the degree of an output symbol . An output symbol of degree will have neighbors in the bipartite graph. We will use the notation to denote the set of neighbours, i.e. the neighbourhood of a node.
The bipartite graph of an LT code is related to its matrix representation, and can be derived from it. We will illustrate this by means of an example. Figure 4 shows the bipartite graph representation of an LT code with input symbols and output symbols. In the figure, input symbols are represented by red circles and output symbols by blue squares. The generator matrix of the LT code represented in the figure corresponds to
An important parameter of an LT code is its average output degree , that is given by
In LT literature, degree distributions are commonly represented in polynomial form. Given a degree distribution , its polynomial representation is given by
This representation can be used to derive moments of the degree distribution (that is, a probability mass function) in a very compact form. For example, the average output degree can be expressed as the first derivative of the polynomial evaluated at 1,
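As a small numerical illustration of this identity: evaluating the polynomial at 1 recovers the total probability mass, and evaluating its derivative at 1 recovers the mean degree. The dictionary-based representation and the example distribution below are hypothetical:

```python
def omega_poly(omega, x):
    """Omega(x) = sum_d Omega_d * x**d, with omega[d] = Omega_d."""
    return sum(p * x ** d for d, p in omega.items())

def omega_prime(omega, x):
    """Omega'(x) = sum_d d * Omega_d * x**(d - 1)."""
    return sum(d * p * x ** (d - 1) for d, p in omega.items())

# A hypothetical degree distribution: Omega_1 = 0.1, Omega_2 = 0.5, Omega_3 = 0.4.
omega = {1: 0.1, 2: 0.5, 3: 0.4}
total = omega_poly(omega, 1.0)    # should be 1 (valid distribution)
avg_degree = omega_prime(omega, 1.0)  # average output degree Omega'(1)
```

For this example the average output degree is 1·0.1 + 2·0.5 + 3·0.4 = 2.3.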
6.1 Iterative Decoding
LT codes were introduced in luby02:LT () together with a suboptimal, low complexity decoding algorithm. Although a more proper name for it would be that of peeling decoder, this decoder is usually referred to as iterative decoder. In this thesis we will use the terms iterative decoding and peeling decoder interchangeably.
Iterative decoding of LT codes is best described using a bipartite graph. Let us assume that the receiver has collected output symbols that we will denote by . We will consider a bipartite graph containing the collected output symbols, , and the input symbols .
Algorithm 1 (Iterative decoding).

Search for an output symbol of degree one.

If such an output symbol exists move to step 2.

If no output symbols of degree one exist, iterative decoding exits and decoding fails.


Output symbol has degree one. Thus, denoting its only neighbour as , the value of is recovered by setting .

Denote the set of neighbours of the just-recovered input symbol. For each of these neighbours:

Update the value of as: , where denotes addition over .

Remove input symbol and all its attached edges from the graph.


If all input symbols have been recovered, decoding is successful and iterative decoding ends. Otherwise, go to step 1.
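Algorithm 1 can be sketched compactly as follows. This is only an illustrative implementation under the assumption that symbols are integers combined by xor; the data layout (one [value, neighbor-set] pair per received output symbol) and the function name are my own choices:

```python
def peel_decode(k, received):
    """Iterative (peeling) LT decoding over GF(2).

    k: number of input symbols.
    received: iterable of (value, neighbor_indices) pairs, one per
    collected output symbol. Returns the list of k recovered input
    symbols, or None if the decoder gets stuck (empty ripple).
    """
    x = [None] * k
    received = [[v, set(nb)] for v, nb in received]
    while any(s is None for s in x):
        # Step 1: search for an output symbol of (reduced) degree one.
        ripple = [sym for sym in received if len(sym[1]) == 1]
        if not ripple:
            return None  # decoding failure: the ripple is empty
        val, nb = ripple[0]
        j = next(iter(nb))
        # Step 2: recover the unique neighbor of the degree-one symbol.
        x[j] = val
        # Step 3: xor the recovered value into every output symbol
        # attached to input symbol j, and remove those edges.
        for sym in received:
            if j in sym[1]:
                sym[0] ^= val
                sym[1].discard(j)
    return x
```

For instance, with source symbols (3, 5, 6) and the three received symbols 3 = x0, 6 = x0 xor x1, 3 = x1 xor x2, the decoder recovers all three inputs; a single degree-two symbol alone makes it fail, returning None.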
In order to illustrate iterative decoding we will provide a small example. Figure 5 shows the bipartite graph before iterative decoding starts. We can see that the number of source symbols is and the number of output symbols collected by the receiver (not erased by the channel) is .
Iterative decoding starts by searching for a degree-one output symbol. In Figure 6 we can see that there is only one output symbol with degree one. Using it, the decoder recovers the corresponding input symbol. Afterwards, the decoder performs the xor (addition over GF(2)) of this symbol with all its neighbors. After doing so, all edges attached to the recovered input symbol are erased.
The second run of iterative decoding is shown in Figure 7. The decoder finds the only degree one output symbol , and uses it to recover . Next, the decoder performs the xor (addition over ) of with its other neighbor, , and erases the edges attached to .
Figure 8 depicts the third iteration. We can see how the only degree one output symbol is used to solve . Then the decoder performs the xor of to its other neighbor, and the edges are removed from the graph.
Finally, the last iteration is shown in Figure 9. Now there are two degree one output symbols, and . In this case we assume the decoder chooses at random to recover the last input symbol .
The following proposition (luby02:LT ()) provides a necessary condition for decoding to be successful with high probability.
Proposition 1.
A necessary condition for decoding to be successful with high probability is that the average output degree grows at least logarithmically in the number of input symbols k.
Proof.
The proof uses the “balls into bins” argument that was presented in luby02:LT (). Let us first assume that k and m are very large, and that at encoding each output symbol chooses its neighbors with replacement (that is, an output symbol is allowed to choose the same neighbor multiple times; this happens with negligible probability for large enough k). Let us consider a randomly chosen input symbol and an output symbol of degree d. The probability that the input symbol is not in the neighborhood of the output symbol corresponds to:
Let us denote by the probability that does not have any edges to the received symbols. This probability corresponds to the probability of not belonging to the union of the neighborhoods of the received output symbols. Under the replacement assumption we have that
If we now let k tend to infinity, we have
where we have made use of the relationship
Let us denote by the expected number of input symbols not covered by any output symbol,
A necessary condition for successful decoding with high probability is that this expected number is vanishingly small. If we relax this condition and simply let it be a small positive number, we have
This leads us to the statement in the proposition. ∎
Note that the condition in Proposition 1 is valid for any decoding algorithm and not only for iterative decoding.
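The with-replacement approximation used in the proof — replacing the exact probability (1 − 1/k)^(m·Ω̄) of an input symbol being uncovered by the exponential exp(−m·Ω̄/k) — can be checked numerically. The helper below is only a sketch with names of my own choosing:

```python
import math

def p_uncovered(k, m, avg_deg):
    """Probability that a given input symbol has no edge to any of the m
    received output symbols, under the with-replacement assumption.

    Returns (exact, approx): the exact expression (1 - 1/k)^(m * avg_deg)
    and its large-k exponential approximation exp(-m * avg_deg / k).
    """
    exact = (1.0 - 1.0 / k) ** (m * avg_deg)
    approx = math.exp(-m * avg_deg / k)
    return exact, approx

# The expected number of uncovered input symbols is k * p_uncovered;
# for it to stay below one, the average degree must grow roughly as ln(k).
```

For k = 1000 received with a small overhead and average degree ln(k), both expressions agree closely and the expected number of uncovered symbols stays below one, consistent with the condition in Proposition 1.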
The performance of LT codes under iterative decoding has been the object of study in several works and is well understood Karp2004 (); maneva2006new (); Maatouk:2012 (); shokrollahi2009theoryraptor (). Iterative decoding of LT codes can be seen as an iterative pruning of the bipartite graph of the LT code. In an instance of decoding in which iterative decoding is successful, initially all input symbols are unresolved (not yet decoded). At every iteration exactly one input symbol is resolved and all edges attached to the resolved input symbol are erased from the graph. Decoding continues until all input symbols are resolved, which is the case after k iterations. Let us consider the iterative decoder at some intermediate step in which some input symbols are yet unresolved and the rest have already been resolved. Following Karp2004 () we shall introduce some definitions that provide insight into the iterative decoding process.
Definition 5 (Reduced degree).
We define the reduced degree of an output symbol as the degree of the output symbol in a reduced bipartite graph in which only unresolved input symbols are present.
Thus, at the initial stage of iterative decoding, when all input symbols are unresolved, the reduced degree of a symbol is equal to its actual degree. However, as iterative decoding progresses, the reduced degree of an output symbol decreases as its neighbors get resolved.
Definition 6 (Output ripple).
We define the output ripple or simply ripple as the set of output symbols of reduced degree 1 and we denote it by .
Definition 7 (Cloud).
We define the cloud as the set of output symbols of reduced degree two or larger and we denote it by .
Figure 10 shows the bipartite graph of an LT code in which input symbols are unresolved. It can be observed how output symbols and belong to the ripple since they have reduced degree one and output symbols and belong to the cloud since their degree is 2 or larger.
It is easy to see that during the iterative decoding process, after every iteration, at least one symbol leaves the ripple (assuming decoding is successful). Moreover, at each iteration some output symbols might leave the cloud and enter the ripple if their reduced degree decreases from two to one. Note also that iterative decoding fails if the ripple becomes empty before k iterations. Thus, if one is able to track the size of the ripple, it is possible to derive the performance of LT codes under iterative decoding. In Karp2004 () a finite-length analysis of LT codes is proposed that models the iterative decoder as a finite state machine, based on a dynamic programming approach. The full proof of the analysis in Karp2004 (), which was published only in abstract form, can be found in shokrollahi2009theoryraptor (). This analysis can be used to derive the error probability of the iterative decoder, and it also allows one to compute the first-order moments of the ripple and the cloud. This analysis was extended in Maatouk:2012 (), where the second moment of the ripple size was analyzed. In maneva2006new () another analysis of LT codes under iterative decoding is proposed that has lower complexity and is based on the assumption that the number of output symbols collected by the receiver follows a Poisson distribution.
6.1.1 Degree Distributions
In this section we present the two best-known degree distributions: the ideal soliton distribution and the robust soliton distribution. Both distributions were designed for iterative decoding.
Ideal Soliton Distribution
The first distribution we will present is known as ideal soliton distribution luby02:LT () and is based on these two design principles:

The expected number of output symbols in the ripple at the start of iterative decoding is one.

The expected number of output symbols leaving the cloud and entering the ripple is one at every iteration.
Thus, the expected ripple size is one during the whole decoding process. The ideal soliton distribution, which we denote by ρ, has the following expression:
ρ(1) = 1/k,  ρ(d) = 1/(d(d − 1)) for d = 2, 3, …, k.   (9)
Note that the distribution varies with the number of input symbols k. The average output degree of the ideal soliton distribution is the harmonic sum H(k) luby02:LT (),
where H(k) = Σ_{i=1}^{k} 1/i is the harmonic sum up to k.
Since the harmonic sum can be approximated as H(k) ≈ ln(k), we can approximate the average output degree of the ideal soliton distribution as ln(k).
For illustration we provide a plot of the ideal soliton distribution for in Figure 11.
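The ideal soliton distribution (ρ(1) = 1/k and ρ(d) = 1/(d(d − 1)) for d = 2, …, k) can be generated exactly with rational arithmetic, which makes its two defining properties easy to verify: it sums to one, and its mean is exactly the harmonic sum H(k). The function name below is my own:

```python
from fractions import Fraction

def ideal_soliton(k):
    """Ideal soliton distribution rho over degrees 1..k:
    rho(1) = 1/k, rho(d) = 1/(d*(d-1)) for d = 2..k."""
    rho = {1: Fraction(1, k)}
    for d in range(2, k + 1):
        rho[d] = Fraction(1, d * (d - 1))
    return rho

rho = ideal_soliton(10)
total = sum(rho.values())                  # exactly 1: a valid distribution
avg = sum(d * p for d, p in rho.items())   # exactly H(k), the harmonic sum
```

The telescoping sum 1/k + Σ_{d=2}^{k} 1/(d(d − 1)) = 1 is what makes the distribution valid, and Σ_d d·ρ(d) telescopes to H(k).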
In practice the ideal soliton distribution does not perform well. The reason behind this poor performance is that its design only takes into account the expected number of symbols entering the ripple. In practice, however, there are statistical variations in the iterative decoding process that make the ideal soliton distribution fail with high probability.
Let us denote the probability of decoding failure by P_F. A lower bound to P_F is the probability that decoding cannot start at all because the ripple is initially empty (no degree-one output symbols among the received ones). Since each received symbol has degree one with probability ρ(1) = 1/k, this probability corresponds to (1 − 1/k)^m.
If we now let k (and m) tend to infinity, keeping the relative receiver overhead ε constant so that m = (1 + ε)k, this expression simplifies to e^{−(1+ε)}.
This implies that the probability of decoding failure is in practice very high, since one usually wants to operate at small ε (the overhead should ideally be small).
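The lower bound discussed above — the probability that none of the m received symbols has degree one, (1 − 1/k)^m, with Ω_1 = 1/k as under the ideal soliton distribution — can be evaluated numerically and compared with its asymptotic limit. The helper name is my own:

```python
import math

def p_no_degree_one(k, m):
    """Probability that none of the m received output symbols has degree
    one, when degree one is drawn with probability 1/k (ideal soliton)."""
    return (1.0 - 1.0 / k) ** m

# With m = (1 + eps) * k, this tends to exp(-(1 + eps)) as k grows:
# even at a 5% relative overhead the decoder fails to start
# roughly a third of the time.
```

This makes concrete why the ideal soliton distribution is unusable on its own: the failure-to-start probability does not vanish with k at fixed relative overhead.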
Robust Soliton Distribution
The robust soliton distribution was introduced in the original LT paper from Luby, luby02:LT (). This distribution is an improvement of the ideal soliton. In fact, the design goal of the robust soliton distribution is ensuring that the expected ripple size is large enough at each point of the decoding with high probability. This ensures that iterative decoding does not get stuck in the middle of the decoding process.
The robust soliton distribution is actually a family of parametric distributions that depend on two parameters, c and δ. Let R = c ln(k/δ) √k. The robust soliton distribution μ is obtained as
μ(d) = (ρ(d) + τ(d)) / β,  d = 1, …, k,   (10)
where τ is given by
τ(d) = R/(dk) for d = 1, …, (k/R) − 1,  τ(k/R) = R ln(R/δ)/k,  τ(d) = 0 for d > k/R,
and β = Σ_d (ρ(d) + τ(d)) is a normalization constant.
Therefore, the robust soliton distribution is obtained as a mixture of the ideal soliton distribution ρ and a correction term τ. The average output degree of this distribution can be upper bounded by a term of order ln(k/δ) luby02:LT ().
For illustration we provide a plot of a robust soliton distribution in Figure 12. We can observe how the probability of degree-one output symbols is increased with respect to the ideal soliton distribution. Moreover, a spike appears in the distribution at d = k/R.
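A sketch of the robust soliton construction following Luby's definition (R = c ln(k/δ) √k, correction term τ with a spike at d ≈ k/R, normalization by β) is given below. The function name and the rounding of k/R to the nearest integer are my own choices, and the sketch assumes parameters for which 1 ≤ round(k/R) ≤ k:

```python
import math

def robust_soliton(k, c, delta):
    """Robust soliton distribution mu(d) = (rho(d) + tau(d)) / beta,
    with R = c * ln(k/delta) * sqrt(k) and a spike at d = round(k/R).
    Assumes 1 <= round(k/R) <= k."""
    R = c * math.log(k / delta) * math.sqrt(k)
    spike = int(round(k / R))
    # Ideal soliton component rho.
    rho = {1: 1.0 / k}
    rho.update({d: 1.0 / (d * (d - 1)) for d in range(2, k + 1)})
    # Correction term tau: R/(d*k) below the spike, a large mass at the spike.
    tau = {d: R / (d * k) for d in range(1, spike)}
    tau[spike] = R * math.log(R / delta) / k
    # Normalize so that the probabilities sum to one.
    beta = sum(rho.values()) + sum(tau.values())
    return {d: (rho[d] + tau.get(d, 0.0)) / beta for d in range(1, k + 1)}
```

For k = 100, c = 0.05, δ = 0.5 one gets R ≈ 2.65, so the spike sits at d = 38; the resulting μ has more degree-one mass than the ideal soliton's 1/k and a visible spike, matching Figure 12.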
In Figure 13 we provide a performance comparison of the ideal and robust soliton distributions. More concretely, we show the probability of decoding failure under iterative decoding vs. the relative receiver overhead. It can be observed that the asymptotic lower bound for the ideal soliton distribution holds, and is actually tight for high overheads. Moreover, we can observe that the probability of decoding failure of the robust soliton distribution is much lower than that of the ideal soliton distribution.
6.2 Maximum Likelihood Decoding
As we saw in Section 6, for fixed the relation between source symbols and output symbols can be expressed by a system of linear equations:
where, we recall, G was the generator matrix of the fixed-rate LT code, that is, under the assumption that the number of output symbols is fixed.
Let us assume that out of the output symbols generated by the LT encoder the receiver collects , that we denote by . Denoting by the set of indices corresponding to the nonerased symbols, we have
The dependence of the received output symbols on the source symbols can be expressed as:
(11) 
with given by the columns of with indices in .
LT decoding consists in finding the solution to the system of linear equations in (11). The solution will be unique only if the matrix has full rank, that is, if its rank is k. If the matrix is rank deficient, the system of equations does not have a unique solution and the receiver is not able to recover all source symbols (in this thesis we focus on problems in which it is necessary to recover all source symbols; therefore, we declare a decoding failure whenever one or several source symbols cannot be recovered).
Iterative decoding is a suboptimal algorithm: it is not always able to find the solution when the matrix has full rank. For example, if the matrix has full rank but does not have any row with Hamming weight one (no degree-one output symbol), iterative decoding is unable to find the solution.
A maximum likelihood (ML) decoding algorithm is an optimal decoding algorithm, in the sense that it always finds the solution to the system of linear equations whenever has full rank. Therefore the performance of any ML decoding algorithm depends only on the rank properties of and, more concretely, on the probability of having full rank. In schotsch:2013 () the performance of LT codes under ML decoding was studied and a lower bound to the probability of decoding failure was derived:
(12) 
The lower bound is very tight already for reception overheads slightly larger than zero.
In practice, different ML decoding algorithms can be used to solve the system of equations, and they all provide the same solution, which is unique when the matrix is full rank. However, different ML decoding algorithms have different decoding complexity, and some algorithms are more suitable than others for practical use.
6.3 Complexity Considerations
So far, the only performance metric we have dealt with is the probability of decoding failure. The other important metric for any coding scheme is its complexity, both in encoding and decoding. Let us define complexity as the total number of operations (xor or symbol copy) needed for encoding/decoding. Since we consider binary LT codes, the only arithmetic operations performed are xors, which correspond to additions over GF(2). Note that decoding also requires copying the content of output symbols into input symbols; for the sake of completeness, we shall also count a symbol copy as one operation. Let us also define the encoding cost as the encoding complexity normalized by the number of output symbols, and the decoding cost as the decoding complexity normalized by the number of input symbols.
6.3.1 Encoding Complexity
Let us first consider the encoding complexity. Generating an output symbol of degree d requires d operations (one symbol copy and d − 1 xors). Thus, given a degree distribution, the encoding cost is given by the average output degree. In Proposition 1 we have shown that a necessary condition for decoding to be successful with high probability is that the average output degree grows logarithmically in k. This implies that the encoding cost needs to grow at least logarithmically in k as well.
6.3.2 Iterative Decoding Complexity
We consider now the complexity of LT iterative decoding. Let us assume a generic degree distribution with a given average output degree that requires a given relative receiver overhead for decoding to be successful with high probability. In the bipartite-graph representation of an LT code, encoding can be thought of as drawing the edges of the graph, where every edge implies performing one operation (xor or symbol copy). Similarly, iterative decoding operates on a bipartite graph containing the received output symbols and the input symbols. During iterative decoding, edges are erased from the graph, with each edge again associated with one operation. At the end of iterative decoding, all edges have been erased from the graph (at the last iteration some edges might still be present, since there may be more than one output symbol in the ripple; we neglect this effect for the sake of simplicity). Thus, the decoding cost under iterative decoding corresponds to the number of edges per input symbol.
In Proposition 1 we have shown that a necessary condition for decoding to be successful with high probability is that the average output degree grows logarithmically in k. This implies that the iterative decoding cost also needs to grow at least logarithmically in k.
6.3.3 Maximum Likelihood Decoding Complexity
Many different ML decoding algorithms exist that can be used to solve a linear system of equations. All ML algorithms lead to the same solution, which in our case is unique when the matrix is full rank. The ML decoding complexity varies depending on which ML decoding algorithm is used.
The best known algorithm is probably Gaussian elimination. This algorithm has a decoding complexity of O(k^3) and is generally not practical for values of k beyond the hundreds. The problem of solving systems of linear equations is a well known problem that appears not only in erasure correction. Several algorithms exist that have a lower (asymptotic) complexity than Gaussian elimination. For example, the Wiedemann algorithm wiedemann1986solving () can be used to solve sparse systems of linear equations at a lower cost. In lamacchia91:solving (), different algorithms are studied to solve large systems of sparse linear equations over finite fields. In this work, the running times of different decoding algorithms are compared for systems of equations arising from integer factorization and the computation of discrete logarithms. The main finding of the paper is that, if the system of equations is sparse, there exists a class of algorithms that in practice requires shorter running times than the Wiedemann algorithm for moderate system sizes. This class of algorithms is usually known as structured or intelligent Gaussian elimination. These algorithms reduce the system of equations to a much smaller one that can be solved using other methods (Gaussian elimination, for example). Let us assume that Gaussian elimination is used to solve the reduced system of equations, and that the intelligent Gaussian elimination algorithm is able to reduce the size of the system of equations from k to αk, with α < 1. Since the complexity of Gaussian elimination is cubic, for large enough k the intelligent Gaussian elimination algorithm reduces complexity at least by a factor α^3. Despite having the same asymptotic complexity, O(k^3), these algorithms have shorter running times than alternatives such as the Wiedemann algorithm (provided that k is large enough and α is not too large).
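As a concrete baseline for the discussion above, plain Gaussian elimination over GF(2) can be sketched as follows, representing each row of the system as a Python integer bit mask so that row addition is a single xor. The function name and data layout are my own; this is an illustration of the O(k^3) baseline, not of the structured-elimination variants:

```python
def gf2_solve(rows, rhs, k):
    """Solve a k-unknown linear system over GF(2) by Gaussian elimination.

    rows: list of m integers, each encoding a length-k binary row bitwise
          (bit j set means unknown j appears in that equation).
    rhs:  list of m symbol values (ints), combined by xor.
    Returns the k recovered symbols, or None if the matrix is rank deficient.
    """
    rows, rhs = list(rows), list(rhs)
    pivots = {}  # column -> index of its pivot row
    for col in range(k):
        # Find a row with a one in this column that is not yet a pivot row.
        pivot = next((i for i, r in enumerate(rows)
                      if (r >> col) & 1 and i not in pivots.values()), None)
        if pivot is None:
            return None  # rank < k: ML decoding failure
        pivots[col] = pivot
        # Eliminate this column from every other row (Gauss-Jordan style).
        for i, r in enumerate(rows):
            if i != pivot and (r >> col) & 1:
                rows[i] ^= rows[pivot]
                rhs[i] ^= rhs[pivot]
    return [rhs[pivots[col]] for col in range(k)]
```

Unlike the peeling decoder, this succeeds whenever the matrix has full rank, which is exactly the ML decoding condition; its cost, however, grows cubically in k.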
6.4 Systematic LT Codes
In practical applications it is desirable that fountain codes be systematic, that is, the first k output symbols should correspond to the input symbols. Thus, if the quality of the transmission channel is good and no erasures occur, the receiver does not need to carry out decoding. A straightforward way of making a fountain code systematic is to simply transmit the k input symbols first and afterwards start transmitting output symbols from the fountain code. We will refer to this construction as a trivially systematic LT code. This construction performs poorly, since the receiver overhead needed to decode successfully increases substantially shokrollahi2003systematic ().
Figure 14 shows the probability of decoding failure under ML decoding for two codes using a robust soliton distribution (RSD): a standard LT code and a trivially systematic LT code, transmitted over a BEC. It can be observed that the trivially systematic code performs much worse than the standard, non-systematic LT code.
The bad performance of trivially systematic LT codes might seem surprising at first. The intuition behind it is the following. Assume that a substantial fraction of the systematic symbols is received; for example, let us assume the decoder has received a fraction $1-\epsilon$ of the systematic symbols and that the remaining fraction $\epsilon$ has been erased. In order to be able to decode, the receiver will need to receive output symbols with neighbors among the yet unrecovered input symbols. Moreover, any output symbol having neighbors only within the received systematic symbols will be useless for decoding. Let us now assume that an output symbol of degree $d$ is received. The probability that all its neighbors are within the received systematic symbols is

$$\prod_{i=0}^{d-1} \frac{(1-\epsilon)k - i}{k - i},$$

where $k$ denotes the number of input symbols.
Under the assumption that $k$ is large, $d \ll k$, and that output symbols choose their neighbors with replacement, a simplified expression for this probability can be obtained. Under these assumptions, the probability that a given neighbor of an output symbol lies within the received systematic symbols equals the received fraction $1-\epsilon$. Hence, the probability that all $d$ neighbors lie within the received systematic symbols is $(1-\epsilon)^d$. Thus, when the fraction of received systematic symbols is close to one and $d$ is not too large, most of the received output symbols will not help at all in decoding. A more detailed analysis of this effect can be found in shokrollahi2009theoryraptor ().
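The effect described above is easy to quantify numerically. The sketch below (function names and parameter values are illustrative) computes the exact probability that a degree-$d$ output symbol is useless, i.e., has all of its neighbors among the received systematic symbols, together with its large-$k$ approximation.

```python
# Probability that a received output symbol of degree d is useless because
# all d of its neighbors fall among the received systematic symbols.
# Exact value (neighbors chosen without replacement) vs. the (1 - eps)^d
# with-replacement approximation. Illustrative sketch.

def p_useless_exact(k, eps, d):
    received = round((1 - eps) * k)    # number of received systematic symbols
    p = 1.0
    for i in range(d):
        p *= (received - i) / (k - i)  # hypergeometric-style product
    return p

def p_useless_approx(eps, d):
    return (1 - eps) ** d              # large-k approximation
```

For instance, with a received fraction of 95% and $d = 3$, both expressions give roughly 0.86: the vast majority of low-degree output symbols carry no new information.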
In practice a different systematic construction is used, which was patented in shokrollahi2003systematic () and is presented next.
Let us recall that (for fixed $n$) LT encoding can be seen as a vector-matrix multiplication:

$$\mathbf{c} = \mathbf{u}\,\mathbf{G},$$

where $\mathbf{u} = (u_1, \ldots, u_k)$ is the row vector of input (source) symbols, $\mathbf{c} = (c_1, \ldots, c_n)$ is the row vector of output symbols, and $\mathbf{G}$ is a $k \times n$ binary matrix which defines the relation between the input and the output symbols (generator matrix). To construct a systematic LT code we start with an LT code with generator matrix of the form

$$\mathbf{G} = \left[ \mathbf{G}_1 \,|\, \mathbf{G}_2 \right],$$

where $\mathbf{G}_1$ is a full-rank $k \times k$ matrix that corresponds to the first $k$ output symbols and $\mathbf{G}_2$ is a $k \times (n-k)$ matrix. First, one needs to compute the inverse of $\mathbf{G}_1$, denoted $\mathbf{G}_1^{-1}$. The next step is computing

$$\tilde{\mathbf{u}} = \mathbf{u}\,\mathbf{G}_1^{-1}.$$

Vector $\tilde{\mathbf{u}}$ is then used as input to the LT encoder. Thus, the output of the LT encoder will be

$$\mathbf{c} = \tilde{\mathbf{u}}\,\mathbf{G} = \mathbf{u}\,\mathbf{G}_1^{-1} \left[ \mathbf{G}_1 \,|\, \mathbf{G}_2 \right] = \left[ \mathbf{u} \,|\, \mathbf{u}\,\mathbf{G}_1^{-1}\mathbf{G}_2 \right],$$

where we used $\mathbf{G}_1^{-1}\mathbf{G}_1 = \mathbf{I}$, with $\mathbf{I}$ the $k \times k$ identity matrix. Hence, the first $k$ output symbols correspond to the input symbols $\mathbf{u}$. For illustration, Figure 15 shows a graph representation of a systematic LT code.
At the decoder side two different scenarios can be considered. In case none of the first $k$ output symbols of our systematic LT code is erased, there is obviously no need to carry out decoding. In case some erasures do occur, decoding can be done in two steps. First, standard LT decoding is carried out to recover $\tilde{\mathbf{u}}$. This consists of solving the system of equations

$$\tilde{\mathbf{c}} = \tilde{\mathbf{u}}\,\tilde{\mathbf{G}},$$

where $\tilde{\mathbf{G}}$ is a matrix composed of the columns of $\mathbf{G}$ associated with the output symbols that were not erased by the channel, and $\tilde{\mathbf{c}}$ contains the received output symbols. This system of equations can be solved in several ways, for example using iterative decoding or inactivation decoding. Finally, the input symbols can be recovered by computing

$$\mathbf{u} = \tilde{\mathbf{u}}\,\mathbf{G}_1.$$

Note that this last step corresponds to LT encoding (since by construction $\mathbf{G}_1$ is sparse, this step is actually less complex than a standard vector-matrix multiplication).
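The encoding side of this construction can be sketched in a few lines over GF(2). All helper functions and the tiny generator matrix used below are illustrative, not part of any standardized code.

```python
# Sketch of the systematic LT construction: invert the k x k block G1,
# precode u into u~ = u G1^{-1}, then LT-encode u~ so that the first k
# output symbols equal u. Pure-Python GF(2) arithmetic on 0/1 lists.

def gf2_matmul(A, B):
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

def gf2_inverse(A):
    """Gauss-Jordan inversion over GF(2); A must be square and full rank."""
    k = len(A)
    M = [row[:] + [int(i == j) for j in range(k)] for i, row in enumerate(A)]
    for c in range(k):
        piv = next(r for r in range(c, k) if M[r][c])
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c and M[r][c]:
                M[r] = [x ^ y for x, y in zip(M[r], M[c])]
    return [row[k:] for row in M]

def systematic_lt_encode(u, G):
    k = len(u)
    G1_inv = gf2_inverse([row[:k] for row in G])  # first k columns form G1
    u_tilde = gf2_matmul([u], G1_inv)             # precoding step
    return gf2_matmul(u_tilde, G)[0]              # ordinary LT encoding of u~
```

By construction the first $k$ entries of the returned vector reproduce the input, which is exactly the systematic property derived above.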
The main advantage of this construction is that its performance in terms of probability of decoding failure is similar to that of nonsystematic LT codes shokrollahi2011raptor (). However, this comes at some cost in decoding complexity, since an additional LT encoding needs to be carried out at the decoder.
7 Raptor Codes
Raptor codes were originally patented in shokrollahi2001raptor () and published in shokrollahi04:raptor (); shokrollahi06:raptor (). They were also independently proposed in maymounkov2002online (), where they are referred to as online codes. Raptor codes are an evolution of LT codes. More concretely, a Raptor code is a serial concatenation of an outer (fixed-rate) block code (usually called the precode) with an inner LT code.
At the input we have a vector of $k$ input (or source) symbols, $\mathbf{u} = (u_1, \ldots, u_k)$. Out of the $k$ input symbols, the outer code generates a vector of $n$ intermediate symbols $\mathbf{v} = (v_1, \ldots, v_n)$, where $n > k$. Denoting by $\mathbf{G}_o$ the employed generator matrix of the outer code, of dimension $k \times n$, the intermediate symbols can be expressed as

$$\mathbf{v} = \mathbf{u}\,\mathbf{G}_o.$$

By definition, $\mathbf{v} \in \mathcal{C}_o$, i.e., the intermediate word is a codeword of the outer code $\mathcal{C}_o$.

The intermediate symbols serve as input to an LT code that can generate an unlimited number of output symbols, $\mathbf{c} = (c_1, \ldots, c_l)$, where $l$ can grow unbounded. Hence, Raptor codes inherit the rateless property of LT codes. For any $l$ the output symbols can be expressed as

$$\mathbf{c} = \mathbf{v}\,\mathbf{G}_{LT}, \qquad (13)$$

where $\mathbf{G}_{LT}$ is the generator matrix of the (fixed-rate) LT code. Hence, $\mathbf{G}_{LT}$ is an $n \times l$ binary matrix, each column of which is associated with an output symbol, as seen in Section 6.
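Raptor encoding therefore amounts to two GF(2) vector-matrix products, precode first and LT code second. A minimal sketch follows; the tiny matrices are purely illustrative stand-ins, not taken from any standardized Raptor code.

```python
# Raptor encoding as the two matrix products v = u G_o (precode) and
# c = v G_LT (inner LT code), over GF(2).

def gf2_matmul(A, B):
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

def raptor_encode(u, G_o, G_LT):
    v = gf2_matmul([u], G_o)       # k input symbols -> n intermediate symbols
    return gf2_matmul(v, G_LT)[0]  # n intermediate symbols -> l output symbols

# Toy example: k = 2, n = 3 (outer code appends one parity symbol), l = 4.
G_o = [[1, 0, 1],
       [0, 1, 1]]
G_LT = [[1, 0, 1, 1],
        [0, 1, 1, 0],
        [0, 0, 1, 1]]
```

With this toy precode every intermediate word has even weight, i.e., it satisfies a single parity check, which is the constraint the decoder will later exploit.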
Figure 16 shows a graph representation of a Raptor code, where the input symbols are represented as green diamond-shaped nodes, the intermediate symbols as red circular nodes, and the output symbols as blue square nodes.
The design principle of Raptor codes can be intuitively explained as follows. In Chapter 6 we saw that a necessary condition for LT codes to be successfully decoded with high probability is that the average output degree is of order $\ln(k)$. This implies an encoding cost of $O(k \ln k)$ and a decoding cost of $O(k \ln k)$ as well (under iterative decoding). The main idea behind Raptor codes is relaxing the requirements on the LT code. Instead of requiring that the LT code recovers all its input symbols, the inner LT code of a Raptor code is only required to recover with high probability a constant fraction of the intermediate symbols. This can be achieved with a constant average output degree. Let us assume that $n$ is large and that the receiver has collected $m$ output symbols. From the proof of Prop. 1, we have that for asymptotically large $n$ the fraction of intermediate symbols with no edges attached (uncovered) will correspond to

$$e^{-\bar{\Omega}\, m / n},$$

where $\bar{\Omega}$ denotes the average output degree.
Let us assume that all the covered intermediate symbols can be recovered by the LT code. The uncovered intermediate symbols can then be treated as erasures by the outer code. If the outer code is an erasure correcting code that can recover with high probability from this fraction of erasures, we will be able to recover all input symbols with high probability.
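The uncovered fraction can be checked by simulation. The sketch below (illustrative parameters, constant output degree $d$ for simplicity) compares a Monte Carlo estimate against the asymptotic exponential expression.

```python
# Monte Carlo check of the fraction of uncovered intermediate symbols.
# With m output symbols of constant degree d choosing neighbors uniformly
# among n intermediate symbols, the uncovered fraction should approach
# exp(-d * m / n) for large n. All parameter values are illustrative.
import math
import random

def uncovered_fraction(n, m, d, seed=1):
    """Fraction of n intermediate symbols touched by no output symbol."""
    rng = random.Random(seed)
    covered = set()
    for _ in range(m):
        covered.update(rng.sample(range(n), d))  # one output symbol's neighbors
    return 1 - len(covered) / n

# Example: n = m = 20000, d = 3 -> predicted uncovered fraction exp(-3).
predicted = math.exp(-3.0)
estimate = uncovered_fraction(20000, 20000, 3)
```

The estimate concentrates tightly around the prediction already for block lengths in the tens of thousands, in line with the asymptotic argument above.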
If the precode is linear-time encodable, then the Raptor code has linear encoding complexity, since the LT code has constant average output degree (i.e., the average output degree does not increase with $k$). Therefore, the overall encoding cost per output symbol is constant with respect to $k$. If the precode also admits a linear-time decoding algorithm (iterative decoding), and the LT code is decoded using iterative decoding, the decoding complexity is also linear. Hence, the decoding cost per symbol is constant. Furthermore, already in the original Raptor code paper shokrollahi06:raptor (), Shokrollahi showed that Raptor codes under iterative decoding are universally capacity-achieving on the binary erasure channel: they achieve the capacity of any BEC regardless of its erasure probability.
7.1 Raptor Decoding
The output symbols generated by the Raptor encoder are transmitted over a BEC, at the output of which each transmitted symbol is either correctly received or erased. Let us denote by $m$ the number of output symbols collected by the receiver of interest, where $m = k + \delta$, $\delta$ being the absolute receiver overhead. We denote by $\tilde{\mathbf{c}} = (\tilde{c}_1, \ldots, \tilde{c}_m)$ the received output symbols. Denoting by $\mathcal{R} = \{r_1, \ldots, r_m\}$ the set of indices corresponding to the nonerased symbols, we have $\tilde{c}_i = c_{r_i}$.

The relation between the received output symbols and the input symbols can be expressed as

$$\tilde{\mathbf{c}} = \mathbf{u}\,\tilde{\mathbf{G}}, \qquad (14)$$

where

$$\tilde{\mathbf{G}} = \mathbf{G}_o\,\tilde{\mathbf{G}}_{LT}, \qquad (15)$$

with $\tilde{\mathbf{G}}_{LT}$ given by the columns of $\mathbf{G}_{LT}$ with indices in $\mathcal{R}$.
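A small self-contained illustration of this column selection and of the product in (15), with toy matrices (the same shapes as in the earlier encoding example; everything below is illustrative):

```python
# Building the "received" generator matrix: keep only the columns of G_LT
# whose indices survived the channel (the set R), then G~ = G_o G~_LT maps
# the input symbols directly to the received output symbols.

def gf2_matmul(A, B):
    return [[sum(a & b for a, b in zip(row, col)) % 2 for col in zip(*B)]
            for row in A]

def select_columns(M, indices):
    return [[row[j] for j in indices] for row in M]

G_o = [[1, 0, 1],
       [0, 1, 1]]
G_LT = [[1, 0, 1, 1],
        [0, 1, 1, 0],
        [0, 0, 1, 1]]
R = [0, 2, 3]                        # the output symbol with index 1 was erased
G_LT_recv = select_columns(G_LT, R)  # G~_LT: columns of G_LT with indices in R
G_recv = gf2_matmul(G_o, G_LT_recv)  # G~ = G_o G~_LT, as in (15)
```

Multiplying the input vector by `G_recv` reproduces exactly the nonerased output symbols, which is the statement of (14).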
Raptor decoding consists of recovering the input symbols $\mathbf{u}$ from the received output symbols $\tilde{\mathbf{c}}$. Although it is possible to perform Raptor decoding by solving the linear system of equations in (14), this is not done in practice for complexity reasons. The decoding algorithms employed in practice, iterative decoding and inactivation decoding, require the system of equations to be sparse in order to show good performance, and the matrix $\tilde{\mathbf{G}}$ is not sparse in general.
In practice, instead of the generator matrix of the Raptor code, another matrix representation is used, usually referred to as the constraint matrix, since it is an alternative representation of the coding constraints of the outer and inner codes. The constraint matrix of a Raptor code is defined as

$$\mathbf{M} = \begin{bmatrix} \mathbf{H}_o \\ \tilde{\mathbf{G}}_{LT}^T \end{bmatrix}, \qquad (16)$$

where $\mathbf{H}_o$ is the parity-check matrix of the outer code (precode), with size $(n-k) \times n$. Thus, $\mathbf{M}$ is an $(n-k+m) \times n$ binary matrix.
By definition, the intermediate word of a Raptor code is a codeword of the precode, $\mathbf{v} \in \mathcal{C}_o$. Hence, one can write

$$\mathbf{H}_o\,\mathbf{v}^T = \mathbf{0}, \qquad (17)$$

where $\mathbf{0}$ is a zero column vector of size $n-k$. Similarly, one can express the vector of received output symbols as

$$\tilde{\mathbf{G}}_{LT}^T\,\mathbf{v}^T = \tilde{\mathbf{c}}^T. \qquad (18)$$

Putting together (17) and (18), we have

$$\mathbf{M}\,\mathbf{v}^T = \begin{bmatrix} \mathbf{0} \\ \tilde{\mathbf{c}}^T \end{bmatrix}. \qquad (19)$$
In practical Raptor decoders, (19) is used for decoding. The main advantage of the constraint matrix is that it preserves the sparsity of the generator matrix of the LT code. Moreover, it also preserves the sparsity of the parity-check matrix of the precode, in case the latter is sparse.
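A toy end-to-end illustration of solving (19): stack the parity-check rows of the precode on top of the received LT equations and run plain GF(2) elimination (standing in for inactivation decoding). All matrices and names below are illustrative.

```python
# Solving the constraint-matrix system (19): M v^T = [0 | c~]^T over GF(2),
# with M the precode parity checks stacked on the transposed received LT
# columns. Plain elimination is used here where a real decoder would use
# inactivation decoding.

def gf2_solve(M, b):
    """Gaussian elimination over GF(2) on 0/1 lists; assumes a unique solution."""
    M = [row[:] + [bi] for row, bi in zip(M, b)]  # augmented matrix
    n = len(M[0]) - 1
    r = 0
    for c in range(n):
        piv = next((i for i in range(r, len(M)) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c]:
                M[i] = [x ^ y for x, y in zip(M[i], M[r])]
        r += 1
    return [M[c][-1] for c in range(n)]

H_o = [[1, 1, 1]]          # single parity-check precode: v0 + v1 + v2 = 0
G_LT_recv_T = [[1, 0, 0],  # transpose of the received LT generator columns
               [1, 1, 1],
               [1, 0, 1]]
c_recv = [1, 0, 1]
M = H_o + G_LT_recv_T           # constraint matrix, as in (16)
rhs = [0] * len(H_o) + c_recv   # zero vector stacked on the received symbols
v = gf2_solve(M, rhs)           # recovered intermediate symbols
```

The recovered vector satisfies both the precode parity check (17) and the received LT equations (18) simultaneously, which is precisely why the stacked system (19) is used.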
The system of equations in (19) can be solved using different techniques, such as iterative decoding, standard Gaussian elimination, or inactivation decoding. Similarly to LT codes, most works on Raptor codes consider large input blocks ($k$ at least in the order of a few tens of thousands of symbols) and iterative decoding. However, in practice smaller blocks are used, usually due to memory limitations at the decoders. For example, the most widespread binary Raptor code, R10 (release 10), supports values of $k$ ranging from $4$ to $8192$ (see Section 7.2). For these input block lengths, the performance of iterative decoding suffers a considerable degradation. Therefore, ML decoding (inactivation decoding) is used instead of iterative decoding.
7.2 R10 Raptor Codes
The state-of-the-art binary Raptor code is the R10 (release 10) Raptor code. This code is systematic and was designed to support a number of input symbols ranging from