Advances in Detection and Error Correction for Coherent Optical Communications: Regular, Irregular, and Spatially Coupled LDPC Code Designs
1 Introduction
Forward error correction (FEC) in optical communications was first demonstrated in 1988 [1]. Since then, coding technology has evolved significantly. This pertains not only to the codes but also to encoder and decoder architectures. Modern high-speed optical communication systems require high-performing FEC engines that support throughputs of 100 GBit/s or multiples thereof, that have low power consumption, that realize net coding gains (NCGs) close to the theoretical limits at a target bit error rate (BER) below $10^{-15}$, and that are preferably adapted to the peculiarities of the optical channel. † L. Schmalen and A. Leven are with Nokia Bell Labs, Lorenzstr. 10, 70435 Stuttgart, Germany. Email: {first.last}@nokia-bell-labs.com † S. ten Brink is with the University of Stuttgart, Institute of Telecommunications, Pfaffenwaldring 47, 70569 Stuttgart, Germany. † This is the version of the following article: “Advances in Detection and Error Correction for Coherent Optical Communications: Regular, Irregular, and Spatially Coupled LDPC Code Designs”, which appeared as Chapter 3 in the book Enabling Technologies for High Spectral-efficiency Coherent Optical Communication Networks edited by X. Zhou and C. Xie, which has been published in final form at DOI:10.1002/9781119078289 (ISBN 9781118714768 (print) and ISBN 9781119078289 (online)). This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.
Forward error correction coding is based on deterministically adding redundant bits to a source information bit sequence. After transmission over a noisy channel, a decoding system tries to exploit the redundant information for fully recovering the source information. Several methods for generating the redundant bit sequence from the source information bits are known. Transmission systems with 100 GBit/s and 400 GBit/s today typically use one of two coding schemes to generate the redundant information: Block-Turbo Codes (BTCs) or Low-Density Parity-Check (LDPC) codes. In coherent systems, so-called soft information is usually readily available and can be used in high-performing systems within a soft-decision decoder architecture. Soft-decision information means that no binary 0/1 decision is made before entering the forward error correction decoder. Instead, the (quantized) samples are used together with their statistics to get improved estimates of the original bit sequence. This chapter will focus on soft-decision decoding of LDPC codes and the evolving spatially coupled LDPC codes.
In coherent optical communications, the signal received after carrier recovery may be affected by different distortions than those that commonly occur in wireless communications. For instance, the signal at the input of the signal space demapper may be affected by phase slips (also called cycle slips [2]), with a probability depending on the nonlinear phase noise introduced by the optical transmission link [3]. The phase slips are not an effect of the physical waveform channel but, rather, an artifact of coarse blind phase recovery algorithms with massive parallelization at the initial digital signal processing (DSP) receiver steps [4]. If such a phase slip is ignored, error propagation will occur at the receiver and all data following the phase slip cannot be properly recovered. Several approaches to mitigate phase slips have been proposed. Of these, the most common is differential coding, which renders a phase slip into a single error event. In order to alleviate the penalty caused by differential coding, iterative decoding between an FEC decoder and a differential decoder can be beneficial [5]. However, this solution leads to an increased receiver complexity, as several executions of a soft-input soft-output differential decoder (usually based on the BCJR algorithm^1, termed after the initial letters of its inventors Bahl, Cocke, Jelinek and Raviv [6]) have to be carried out.
In this chapter, we first show how the use of differential coding and the presence of phase slips in the transmission channel affect the total achievable information rates and capacity of a system. By means of the commonly used Quadrature Phase-Shift Keying (QPSK) modulation, we show that the use of differential coding does not decrease the capacity, i.e., the total amount of reliably conveyable information over the channel remains the same. It is a common misconception that the use of differential coding introduces an unavoidable “differential loss”. This perceived differential loss is rather a consequence of simplified differential detection and decoding at the receiver. Afterwards, we show how capacity-approaching coding schemes based on LDPC and spatially coupled LDPC codes can be constructed by combining iterative demodulation and decoding. For this, we first show how to modify the differential decoder to account for phase slips and then how to use this modified differential decoder to construct good LDPC codes. This construction method can serve as a blueprint to construct good and practical LDPC codes for other applications with iterative detection, such as higher order modulation formats with non-square constellations [7], multidimensional optimized modulation formats [8], turbo equalization to mitigate intersymbol interference (ISI) (e.g., due to nonlinearities) [9, 10] and many more. Finally, we introduce the class of spatially coupled (SC) LDPC codes, which are a specialization of LDPC codes with some outstanding properties and which can be decoded with a very simple windowed decoder. We show that the universal behavior of spatially coupled codes makes them an ideal candidate for iterative differential demodulation/detection and decoding.
This chapter is structured as follows: In Sec. 2 we formally introduce the notation, system model and differential coding. We highlight some pitfalls that one may encounter when phase slips occur on the equivalent channel. We propose a modified differential decoder that is necessary to construct a capacity-approaching system with differential coding. In Sec. 3, we introduce LDPC codes and iterative detection. We highlight several possibilities of realizing the interface between the LDPC decoder and the detector and give design guidelines for finding good degree distributions of the LDPC code. We show that with iterative detection and LDPC codes, the differential loss can be recovered to a great extent. Finally, in Sec. 4, we introduce SC-LDPC codes and show how a very simple construction can be used to realize codes that outperform LDPC codes while having similar decoding complexity.
2 Differential Coding for Optical Communications
In this section, we describe and study the effect of differential coding on coherent optical communication systems and especially on the maximum conveyable information rate (the so-called capacity). We assume a simple, yet accurate channel model based on additive white Gaussian noise (AWGN) and random phase slips. We start by giving a rigorous description of higher-order modulation schemes frequently used in coherent communications and then introduce in Sec. 2.2 the channel model taking into account phase slips, which are due to imperfect phase estimation in the coherent receiver. We will then introduce differential coding and show how the differential decoder has to be modified in order to properly take into account phase slips. We show that differential coding as such does not limit the capacity of a communication system, provided that an adequate receiver is used.
2.1 Higher-Order Modulation Formats
In this section, the interplay of coding and modulation will be discussed in detail. We only take on an I/Q-perspective of digital modulation, representing digital modulation symbols as complex numbers. The sequence of complex numbers (where I denotes the real part and Q the imaginary part) is then used to generate the actual waveform (taking into account pulse shaping and possibly electronic predistortion), i.e., to drive the optical modulators generating the I and Q components. For a thorough overview of coding and modulation in the context of coherent communications, we refer the interested reader to [11, 12].
When talking about digital modulation, especially in the context of coded modulation, we are mostly interested in the mapping function, which is that part of the modulator that assigns (complex) modulation symbols to bit patterns. We introduce in what follows the notation necessary for describing the mapping function. Let $m$ denote the number of bits that are assigned to one complex modulation symbol $y$, and let $\boldsymbol{b} = (b_1, \ldots, b_m) \in \mathbb{F}_2^m$ be a binary $m$-tuple, with $\mathbb{F}_2 = \{0, 1\}$ denoting the field of binary numbers. The one-to-one modulation mapping function $\mathcal{M}: \mathbb{F}_2^m \rightarrow \mathcal{Y}$ maps the tuple $\boldsymbol{b}$ to the (complex) modulation symbol $y = \mathcal{M}(\boldsymbol{b})$, where $y$ is chosen from the set of modulation symbols $\mathcal{Y}$ with $|\mathcal{Y}| = 2^m$. The set $\mathcal{Y}$ is also commonly referred to as constellation. The mapping function is illustrated in Fig. 1. In this chapter, we only consider one-to-one mappings. One such mapping is $\mathcal{M}(\boldsymbol{b}) = y_{[\boldsymbol{b}]_{10}}$, where $[\boldsymbol{b}]_{10}$ denotes the decimal expansion of the binary $m$-digit number $\boldsymbol{b}$.
In the context of differential coding of higher-order modulation formats, it is advantageous if the constellation fulfills certain properties. One such property is the rotational invariance of the constellation.
Definition 1 (Rotational Invariance of Constellation)
We say that a constellation exhibits a $V$-fold rotational invariance if we recover the original constellation after rotating each modulation symbol by an angle $2\pi/V$ in the complex plane. Formally, we say that a constellation exhibits a $V$-fold rotational invariance if (with $\mathrm{j} = \sqrt{-1}$)
$$\left\{ y \cdot e^{\mathrm{j}\frac{2\pi}{V}} : y \in \mathcal{Y} \right\} = \mathcal{Y}.$$
Example 2.1
Consider the two constellations with 8 and 16 points shown in Fig. 2. The rectangular 8QAM (quadrature amplitude modulation) constellation of Fig. 2(a) has a twofold rotational invariance as any rotation of the constellation by $\pi$ leads again to the same constellation. The 16QAM constellation shown in Fig. 2(b) exhibits a fourfold rotational invariance as any rotation of the constellation by $\pi/2$ leads again to the same constellation.
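The rotational invariance of Definition 1 is easy to check numerically. The following sketch (illustrative only; the constellation coordinates and the function name are our own, not the chapter's) finds the largest $V$ for which a given constellation is $V$-fold rotationally invariant:

```python
import numpy as np

def rotational_invariance_order(constellation, tol=1e-9):
    """Largest V such that rotating every point by 2*pi/V reproduces
    the same set of points (V-fold rotational invariance)."""
    pts = np.asarray(constellation, dtype=complex)

    def invariant(V):
        rotated = pts * np.exp(2j * np.pi / V)
        # every rotated point must coincide with some original point
        return all(np.min(np.abs(p - pts)) < tol for p in rotated)

    best = 1
    for V in range(2, len(pts) + 1):
        if invariant(V):
            best = V
    return best

# 16QAM on the integer grid {-3,-1,1,3}^2 -> fourfold invariance
qam16 = np.array([x + 1j * y for x in (-3, -1, 1, 3) for y in (-3, -1, 1, 3)])
print(rotational_invariance_order(qam16))  # 4
```

For the 16QAM constellation this returns 4, matching Example 2.1.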
Before introducing differential coding and modulation, we first describe the channel model including phase slips.
2.2 The Phase Slip Channel Model
In coherent receivers for high-speed optical communications, it is usually not feasible to employ decision-directed blind phase recovery [4], so that usually, feed-forward phase recovery algorithms have to be employed. Feed-forward carrier recovery algorithms exploit the rotational invariance of the constellation to remove the modulation prior to estimating the phase. However, due to the necessary phase unwrapping algorithm in the feed-forward phase estimator, a phenomenon called phase slip occurs^2 (^2 Sometimes, phase slips are also denoted as cycle slips; however, we employ the term phase slip in this chapter.). These are mostly due to coarse blind phase recovery algorithms with massive parallelization including preliminary hard decisions and phase unwrapping at the initial digital signal processing (DSP) receiver steps [4].
Figure 3 displays the phase-slip channel model we employ in the following. The channel input is a complex modulation symbol $y_t$. The first noise contribution is complex-valued AWGN. In the field of coding and in the broad body of literature on forward error correction, the terms $E_s/N_0$ and $E_b/N_0$ are frequently used to characterize AWGN channels. Therein, $E_s$ denotes the energy per modulation symbol^3 (^3 Note that in this chapter we use lower case letters to denote random variables as well as their realizations, unless the meaning is not clear from the context.). The noise $n_t = n_{I,t} + \mathrm{j}\, n_{Q,t}$ (where $n_{I,t}, n_{Q,t} \sim \mathcal{N}(0, \sigma_n^2)$) is characterized by the two-sided noise power spectral density $N_0$, where $\sigma_n^2$ is the variance of both noise components $n_I$ and $n_Q$, i.e., $N_0 = 2\sigma_n^2$. The received symbol in our model is obtained by $r_t = (y_t + n_t) \cdot e^{\mathrm{j}\varphi_t}$, where $\varphi_t$ describes the phase slips. Phase slips and $\varphi_t$ will be discussed in detail below.
Frequently, especially for comparing different coding schemes, $E_b/N_0$ is used instead of $E_s/N_0$. Herein, $E_b$ denotes the energy per information bit whereas $E_s$ denotes the energy per transmit symbol. For example, if a code of rate $R$, corresponding to an overhead of $(1/R - 1) \cdot 100\,\%$, is used, the ratio of code bits versus information bits amounts to $1/R$, i.e., $1/R$ code bits are transmitted for each information bit. Thereof, $m$ code bits are assigned to one modulation symbol $y$. This means that if the modulation symbols are each transmitted with energy $E_s$, the amount of energy conveyed by each information bit amounts to
$$E_b = \frac{E_s}{R \cdot m}.$$
As $E_b/N_0$ is normalized to the information bits of the transmission system, it allows us to immediately evaluate the net coding gain (NCG). The NCG is frequently used to assess the performance of a coding scheme and is defined as the difference (in dB) of required $E_b/N_0$ values between coded and uncoded transmission for a given output BER. Note that the NCG takes into account the coding rate and the number of bits assigned to each modulation symbol, which are included in $E_b/N_0$.
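The $E_b/N_0$ bookkeeping above can be captured in two small helper functions (a sketch; the 20% overhead example values are ours, not from the chapter):

```python
import math

def ebn0_from_esn0(esn0_db, rate, m):
    """Eb = Es / (R * m)  =>  Eb/N0 [dB] = Es/N0 [dB] - 10*log10(R*m)."""
    return esn0_db - 10 * math.log10(rate * m)

def net_coding_gain(ebn0_uncoded_db, ebn0_coded_db):
    """NCG: difference (in dB) of the required Eb/N0 between uncoded
    and coded transmission at the same target output BER."""
    return ebn0_uncoded_db - ebn0_coded_db

# Example: 20% overhead code (R = 1/1.2) with QPSK (m = 2):
print(round(ebn0_from_esn0(6.0, 1 / 1.2, 2), 2))  # 3.78
```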
In optical communications, the optical signal-to-noise ratio (OSNR) is also frequently employed. The OSNR is the signal-to-noise ratio measured in a reference optical bandwidth, where frequently a bandwidth of $B_{\text{ref}} = 12.5$ GHz is used, corresponding to approximately 0.1 nm at a carrier wavelength of 1550 nm. The OSNR relates to $E_s/N_0$ and $E_b/N_0$ as
$$\text{OSNR} = \frac{R_s}{B_{\text{ref}}} \cdot \frac{E_s}{N_0} = \frac{R \cdot m \cdot R_s}{B_{\text{ref}}} \cdot \frac{E_b}{N_0},$$
where $B_{\text{ref}}$ is the previously introduced reference bandwidth, $R_s$ corresponds to the symbol rate of the transmission, $R$ is the aforementioned rate of the code with $0 < R \leq 1$, and $m$ corresponds to the number of bits mapped to each modulation symbol.
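The OSNR conversion can likewise be sketched in code. We assume here the single-polarization convention $\text{OSNR} = (R_s/B_{\text{ref}}) \cdot E_s/N_0$; depending on how noise polarizations are counted, an additional constant factor may enter, so treat the absolute numbers as illustrative:

```python
import math

B_REF = 12.5e9  # reference bandwidth in Hz (~0.1 nm at 1550 nm)

def osnr_db_from_esn0(esn0_db, symbol_rate):
    """OSNR [dB] in B_REF from Es/N0 [dB]: OSNR = (R_s / B_ref) * Es/N0."""
    return esn0_db + 10 * math.log10(symbol_rate / B_REF)

def osnr_db_from_ebn0(ebn0_db, symbol_rate, rate, m):
    """Same, starting from Eb/N0 [dB], using Es = R * m * Eb."""
    return ebn0_db + 10 * math.log10(rate * m * symbol_rate / B_REF)

# 32 GBd QPSK (m = 2), 20% overhead (R = 1/1.2), Es/N0 = 6 dB:
print(round(osnr_db_from_esn0(6.0, 32e9), 2))  # 10.08
```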
Returning to the description of the channel model of Fig. 3, we see that the noisy signal additionally undergoes a potential phase rotation yielding $r_t = (y_t + n_t) \cdot e^{\mathrm{j}\varphi_t}$. If the constellation shows a $V$-fold rotational invariance with $V$ even (which is the case for most of the practically relevant constellations), we introduce the following probabilistic phase slip model
The probability that a phase slip occurs is thus
(1) 
For a given phase slip probability $P_{\text{slip}}$, which may be obtained from measurements [2], and which depends on the nonlinear phase noise introduced by the optical transmission link and the variance of the additive Gaussian noise due to amplification, we obtain the model parameter by solving (1). For the practically most important cases, we get
(2) 
Experimental measurements [13] suggest that the phase slip probability depends on the equivalent bit error rate before the FEC decoder. Such a dependency was also suggested in [3]. We may thus model the phase slip probability empirically as
$$P_{\text{slip}} = \gamma \cdot \text{BER}_{\text{BPSK}}, \qquad (3)$$
where $\gamma$ is the factor between slip rate and pre-FEC bit error rate of the equivalent BPSK channel. Given $\gamma$ and $E_s/N_0$, we can compute $P_{\text{slip}}$ from (3) and subsequently the model parameter from (2) or (1). Using this parameter, we can use a pseudo-random number generator to generate a sequence of phase slips with the probability mass function defined above.
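The last step, generating a slip sequence with a pseudo-random number generator, can be sketched as follows. The pmf below (independent $\pm 2\pi/V$ slips with probability $P_{\text{slip}}/2$ each, accumulating over time) is an assumed simplification for illustration, not the exact model of (1):

```python
import numpy as np

def generate_phase_slips(n_symbols, p_slip, V=4, seed=0):
    """Sequence of multiplicative phase terms e^{j*phi_t}. Assumed model:
    in each symbol interval a slip of +2*pi/V or -2*pi/V occurs with
    probability p_slip/2 each, and slips accumulate over time."""
    rng = np.random.default_rng(seed)
    u = rng.random(n_symbols)
    steps = np.zeros(n_symbols, dtype=int)
    steps[u < p_slip / 2] = 1                     # slip by +2*pi/V
    steps[(u >= p_slip / 2) & (u < p_slip)] = -1  # slip by -2*pi/V
    k = np.cumsum(steps) % V                      # accumulated slip index
    return np.exp(2j * np.pi * k / V)

phases = generate_phase_slips(100_000, 1e-2)
# multiply the noisy symbols y_t + n_t by phases[t] to impose the slips
```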
2.3 Differential Coding and Decoding
Several approaches to mitigate phase slips have been proposed in the literature. Probably the most common one is differential coding, which renders a phase slip into a single error event. In this section, we restrict ourselves for simplicity to constellations with a $V$-fold rotational invariance where $V = 2^v$ with integer $v$, i.e., $V \in \{2, 4, 8, \ldots\}$.
We consider two different cases:
In the first case, we have $|\mathcal{Y}| = V$. To each constellation point, we assign a state $s \in \{0, 1, \ldots, V-1\}$. An example of such a constellation is the widely used QPSK constellation with $V = |\mathcal{Y}| = 4$, which is shown in Fig. 4 together with its state assignment.
In the second case, we have $|\mathcal{Y}| > V$. We restrict ourselves to the practical case with $|\mathcal{Y}| = V \cdot 2^{m'}$, where $m'$ is an integer number. In this case, we employ differential coding as described in [14]: The constellation is divided into $V$ disjoint regions $\mathcal{R}_0, \ldots, \mathcal{R}_{V-1}$ such that these regions are preserved when rotating the constellation by $2\pi/V$. We assign a state label to each disjoint region. The regions are selected such that each region contains exactly $2^{m'}$ constellation points and such that a rotation of the constellation by an angle $k \cdot 2\pi/V$, $k \in \mathbb{Z}$, does neither change the regions nor the assignment of points to a region. For the constellation points within each region we employ a rotationally invariant bit mapping, which means that the bit mapping of points inside a region is not changed by a rotation of the constellation by an angle $k \cdot 2\pi/V$. The popular 16QAM constellation is an example of such a constellation with $V = 4$, $|\mathcal{Y}| = 16$ and $m' = 2$. The state assignment and rotationally invariant mapping are exemplarily discussed in Example 2.2 and shown in Fig. 5.
Example 2.2
We consider the transmission of the popular 16QAM constellation [15]. It can be easily verified that the 16QAM constellation shows a fourfold rotational invariance. As shown in Fig. 5, we label the four quadrants of the complex plane by states $s = 0$, $s = 1$, $s = 2$, and $s = 3$. Inside the first quadrant ($s = 0$), we employ a Gray labeling (also denoted by Gray mapping) to assign the bits $b_3$ and $b_4$ to the four points. The mapping of the bits $b_3$ and $b_4$ in the three remaining quadrants is obtained by applying a rotationally invariant mapping, i.e., by rotating the Gray mapping of the first quadrant by multiples of $\pi/2$. In this case, even by rotating the constellation by multiples of $\pi/2$, the bits $b_3$ and $b_4$ can always be recovered unambiguously.
We employ differential coding with $v = \log_2 V$ bits to encode and reliably transmit the region, i.e., the state. Within each of these regions, exactly $2^{m'}$ constellation points are placed, to which a rotationally invariant bit mapping is assigned. This means that whenever the constellation is rotated by an angle that is a multiple of $2\pi/V$, the bit patterns assigned to constellation points within the region can still be uniquely identified. Note that we restrict ourselves to state–region assignments such that the rotation of a complete region gives another valid region, i.e., for every $i \in \{0, \ldots, V-1\}$ and every $k \in \mathbb{Z}$, there exists an $i' \in \{0, \ldots, V-1\}$, such that
$$e^{\mathrm{j} k \frac{2\pi}{V}} \mathcal{R}_i = \mathcal{R}_{i'}.$$
Note that this restriction does not pose any problems for practical systems, as most of the practically relevant constellations can be described in this form. In what follows, we impose another, slightly more stringent condition on the states. We assume that the states are assigned in what we denote as rotational order. Formally,
Definition 2
We define a sequence of states $s_i$, $i \in \{0, \ldots, V-1\}$, each assigned to a region $\mathcal{R}_i$ of the complex plane, to be in rotational order, if and only if the following condition
$$e^{\mathrm{j}\frac{2\pi}{V}} \mathcal{R}_i = \mathcal{R}_{(i+1) \bmod V} \quad \text{for all } i \in \{0, \ldots, V-1\}$$
is fulfilled.
We can easily verify that the state assignments of the constellations given in Fig. 4 and Fig. 5 are in rotational order. Again, note that the restriction of the states to be in rotational order does not yet impose any major constraint, as we have not yet defined an encoding map. We group the states into the set $\mathcal{S} = \{0, 1, \ldots, V-1\}$.
The main step in differential coding is to impose memory on the modulation. We assume that the transmission starts at time instant $t = 1$. We introduce the differential memory $d_t$ and set $d_0 = 0$. The differential encoder can be considered to be the function
$$d_t = \delta(\boldsymbol{b}_t, d_{t-1}),$$
which takes as input the bits $\boldsymbol{b}_t$ and the differential memory $d_{t-1}$ and generates a new state that is saved in the differential memory $d_t$. This new state selects the symbol to be transmitted (if $|\mathcal{Y}| = V$) or the region from which the symbol is selected using the remaining bits. Note that the differential function $\delta$ is not unique but depends on the assignment of bit patterns to state transitions. Consider the example of the QPSK constellation shown in Fig. 4. We can give two distinct differential encoding maps. The first differential encoding function is the natural differential code. The state transition diagram of the natural differential code is visualized in Fig. 6 and is also given in Tab. 1. The second encoding function, baptized Gray differential code, is given in Tab. 2. Note that all other differential coding maps for the QPSK constellation can be transformed into one of these two forms by elementary transformations of the constellation and the state assignment.
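The two encoding maps can be made concrete in a few lines. The increment tables below follow the natural map and one plausible Gray map; the chapter's authoritative definitions are Tabs. 1 and 2:

```python
# Differential encoding for QPSK (V = 4), states in rotational order.
# The "natural" map adds the decimal value of the two input bits to the
# previous state; the "Gray" variant routes the bits through a Gray code
# first (an illustrative choice; the chapter's Tab. 1/2 define the maps).
NATURAL = {(0, 0): 0, (0, 1): 1, (1, 1): 3, (1, 0): 2}
GRAY    = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}

def diff_encode(bit_pairs, mapping, d0=0):
    """delta: d_t = (d_{t-1} + mapping[b_t]) mod 4."""
    d, states = d0, []
    for b in bit_pairs:
        d = (d + mapping[b]) % 4
        states.append(d)
    return states

def diff_decode(states, mapping, d0=0):
    """Invert the encoder from noiseless state decisions."""
    inv = {v: k for k, v in mapping.items()}
    bits, prev = [], d0
    for s in states:
        bits.append(inv[(s - prev) % 4])
        prev = s
    return bits

msg = [(0, 1), (1, 1), (1, 0), (0, 0)]
assert diff_decode(diff_encode(msg, GRAY), GRAY) == msg
```

Adding a constant state offset from some time instant onward (i.e., a phase slip under differential encoding) corrupts exactly one decoded bit pair, illustrating the single-error-event property mentioned above.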
As the differential code can be understood as a Markov process, we can employ the BCJR algorithm [6] to carry out bit-wise Maximum A Posteriori (MAP) decoding of the differential code. For this, we may represent the differential code using a so-called trellis diagram. The trellis diagram is an “unrolled” version of the state diagram of Fig. 6. Figure 7 shows four segments of a trellis diagram for the natural differential encoding map. Four segments of the trellis diagram of the Gray differential encoding map are given in Fig. 8. The different input bit patterns can be distinguished by different line styles (dashed, dotted, solid and “wavy”).
If phase slips occur on the channel, memory is imposed on the channel as well. If this additional memory is not properly accounted for in the BCJR decoder of the differential code, the performance of the decoder will rapidly decrease, due to the decoder not being properly adapted to the channel model, as has been observed in [16]. We therefore need to extend the trellis to properly take into account the phase slips. One such extension introduces additional states that correspond to the memory of the phase slip channel [17]. We introduce states $(s, p)$ where the second index $p$ tracks the current phase slip state (see Fig. 3), while the first index $s$ is still responsible for describing the differential code. The occurrence of a phase slip leads to a different $p$. For the running example of a differential code for QPSK, we no longer have a trellis diagram (or a state transition diagram) with 4 states and 16 state transitions, but instead a trellis diagram with $4 \cdot 4 = 16$ states and 256 state transitions. One segment of this extended trellis diagram is shown in Fig. 9 for the Gray differential encoding map. In order to distinguish the additional state transitions corresponding to phase slips, we use grey scales. The original trellis is obtained by utilizing only those state transitions that correspond to the slip-free case, which correspond to the black lines. The state transitions corresponding to slips of $\pm\pi/2$ are given by grey lines while the state transitions corresponding to a slip of $\pi$ are given by light grey lines, as these have the lowest probability of occurrence.
As the trellis diagram of Fig. 9 may be challenging to implement, we seek a way to reduce its complexity. By observing that the memory of the phase slip channel collapses with the memory of the differential encoder, we may get a more compact representation of the trellis and only need $V = 4$ states. This is possible as a phase slip does not introduce a new state, but only leads to a different state transition to one of the existing states. The state transitions are given exemplarily for the case of the Gray differential encoder in Tab. 3. This means that we can still use a trellis diagram with 4 states but have to insert additional state transitions taking into account all possible slip values. Figure 10 shows the extended trellis diagram taking into account the possible slips, indicated by the slip value. Again, we use different grey scales to represent the state transitions corresponding to different slip values. The trellis diagram of Fig. 10 is a simplification of the extended trellis diagram with only 4 states (instead of 16) and 64 state transitions (instead of 256). Another approach to take phase slips into account in an extended trellis has been presented in [13].
Table 3: State transitions of the Gray differential encoder including phase slips, grouped by slip value (0, 1, 2, 3).
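To see where the $4 \cdot 4 \cdot 4 = 64$ transition count of the collapsed trellis comes from, we can enumerate the transitions (using the natural map for illustration; Tab. 3 lists the Gray variant):

```python
# Enumerating the transitions of the collapsed 4-state trellis (cf.
# Fig. 10): next state = (state + data increment + slip value) mod 4
# under the assumed natural map.
V = 4
transitions = [(s, inc, d, (s + inc + d) % V)
               for s in range(V)      # current state
               for inc in range(V)    # increment from the two data bits
               for d in range(V)]     # slip value
print(len(transitions))  # 4 states x 4 inputs x 4 slips = 64 transitions
```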
2.4 Maximum a Posteriori Differential Decoding
In what follows, we use the BCJR decoder [6] to carry out bit-wise maximum a posteriori differential decoding. The BCJR decoder makes a decision on the transmitted symbol (equivalent to a state) based on the maximization
$$\hat{s}_t = \operatorname*{arg\,max}_{s \in \mathcal{S}} P(s_t = s \mid r_1, \ldots, r_N).$$
At each time instant $t$, the most probable state is computed given the complete received sequence $r_1, \ldots, r_N$. We will not give a complete derivation of the BCJR algorithm and refer the interested reader to the literature, e.g., [6], [18]. We merely summarize the equations in the Appendix.
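The following sketch implements this forward–backward computation on the collapsed 4-state trellis of Fig. 10 for the natural differential map (an illustrative assumption; the chapter's figures use the Gray map) with a simplified slip pmf. It returns the a posteriori probabilities of the data increments with the slip marginalized out; the `prior_inc` argument is where a priori information from an outer decoder would enter during iterative decoding:

```python
import numpy as np

V = 4
SYMS = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(V)))  # QPSK points

def bcjr_differential(r, sigma, p_slip, prior_inc=None):
    """P(inc_t | r_1..r_N) for natural differential QPSK,
    s_t = (s_{t-1} + inc_t + slip_t) mod 4, on the collapsed trellis.
    Assumed slip pmf: 0 w.p. 1 - p_slip, +-pi/2 w.p. p_slip/2 each."""
    N = len(r)
    pd = np.array([1.0 - p_slip, p_slip / 2, 0.0, p_slip / 2])
    if prior_inc is None:  # a priori info, e.g. from an outer LDPC decoder
        prior_inc = np.full((N, V), 1.0 / V)
    # channel likelihoods p(r_t | s_t = s) for the AWGN part
    L = np.exp(-np.abs(r[:, None] - SYMS[None, :]) ** 2 / (2 * sigma**2))
    # branch metrics gamma[t, s, s'], summed over increments and slips
    gamma = np.zeros((N, V, V))
    for inc in range(V):
        for d in range(V):
            for s in range(V):
                sp = (s + inc + d) % V
                gamma[:, s, sp] += prior_inc[:, inc] * pd[d] * L[:, sp]
    alpha = np.zeros((N + 1, V)); alpha[0, 0] = 1.0  # encoder starts in 0
    beta = np.zeros((N + 1, V)); beta[N] = 1.0
    for t in range(N):                       # forward recursion
        alpha[t + 1] = alpha[t] @ gamma[t]
        alpha[t + 1] /= alpha[t + 1].sum()
    for t in range(N - 1, -1, -1):           # backward recursion
        beta[t] = gamma[t] @ beta[t + 1]
        beta[t] /= beta[t].sum()
    post = np.zeros((N, V))                  # marginalize states and slips
    for inc in range(V):
        for d in range(V):
            for s in range(V):
                sp = (s + inc + d) % V
                post[:, inc] += (alpha[:N, s] * prior_inc[:, inc]
                                 * pd[d] * L[:, sp] * beta[1:, sp])
    return post / post.sum(axis=1, keepdims=True)
```

With uniform priors this reduces to plain differential detection; the gain of the matched trellis materializes once sharpened `prior_inc` values from the outer decoder are fed back, which is exactly the role of the LDPC code in Sec. 3.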
We use the technique of EXtrinsic Information Transfer (EXIT) charts [19] to characterize the behavior of the differential decoder based on the BCJR algorithm. EXIT charts plot the extrinsic output mutual information as a function of the input mutual information and are a tool to characterize single components in iterative decoders. Bit interleavers statistically decouple the respective encoding/decoding components such that a single parameter is sufficient to track their input/output relations. This parameter may be the signal-to-noise ratio at the output of a processing block, or, as is the case for EXIT charts, the mutual information between transmitted bits and the received and processed soft bit log-likelihood ratio (LLR) values. For some channels and some codes, the individual transfer characteristics (or EXIT curves) can be obtained analytically, while for most cases, one has to resort to Monte Carlo simulation for computing the mutual information. EXIT curves can be defined not only for channel encoders/decoders such as convolutional codes or parity-check codes, but also for components of many serially or parallel concatenated detection and decoding schemes: For example, EXIT curves have been used for describing channel interfaces such as mappers/demappers (detectors) for spectrally efficient modulation, or equalizers of multipath channels; even the decoder of an LDPC code can be viewed as a serial concatenation of a variable node decoder and a check node decoder that can both be described by EXIT curves.
The main advantage of the EXIT chart technique is that the individual component processing blocks can be studied and characterized separately using EXIT curves, and that the interaction of two (or more) such processing blocks can be graphically predicted in the EXIT chart without performing a complex simulation of the actual fully-fledged concatenated coding scheme itself. As it turns out, the EXIT curves must not intersect to allow convergence to low bit error rates, and thus, code design reduces to finding good pairs of EXIT curves that match well, or, more constructively as in the case of LDPC codes, to applying curve-fitting algorithms to determine variable and check node degree profiles that match well. A decoding trajectory visualizes the iterative exchange of information between the processing blocks, and shows the progress of the decoding.
While the EXIT chart is exact on the binary erasure channel (BEC) for sufficiently long/infinite sequence lengths, the reduction to single-parameter tracking of the involved distributions is just an approximation for other channels. It has been observed, however, that the predicted and actually simulated decoding trajectories match quite well, demonstrating the usefulness of the method, with many successful code designs performed in practice to date.
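Measuring an EXIT curve by Monte Carlo simulation boils down to estimating mutual information from simulated LLRs. A common sample-mean estimator (a sketch; the Gaussian a priori channel below is the usual modeling assumption, not specific to this chapter) is:

```python
import numpy as np

def mi_from_llrs(bits, llrs):
    """Monte Carlo estimate of I(X; L) in bits for equiprobable bits,
    via I ~= 1 - E[log2(1 + exp(-(1-2x)*L))], where L is the LLR
    log p(y|x=0)/p(y|x=1). Standard estimator for measuring EXIT curves."""
    s = 1.0 - 2.0 * np.asarray(bits, dtype=float)
    return 1.0 - np.mean(np.log2(1.0 + np.exp(-s * np.asarray(llrs))))

# Model the a priori LLRs as consistent Gaussian (the usual EXIT-chart
# a priori channel): L | x ~ N((1-2x) * sigma_a^2 / 2, sigma_a^2).
rng = np.random.default_rng(0)
x = rng.integers(0, 2, 200_000)
for sigma_a in (0.5, 1.0, 2.0, 4.0):
    llr = (1.0 - 2.0 * x) * sigma_a**2 / 2 + sigma_a * rng.standard_normal(x.size)
    print(sigma_a, round(mi_from_llrs(x, llr), 3))
```

Sweeping the a priori quality and measuring the extrinsic LLRs of the detector in the same way yields the curves of Fig. 11.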
Figure 11 shows the EXIT characteristics of the differential decoder for a QPSK constellation and both differential encoding maps. We can clearly see that the characteristic of the detector employing the non-matched trellis diagram has a non-increasing shape, which is an indicator of a mismatched model used within the decoder: the decoder trellis does not leave the possibility open for phase slips to occur, but forces the result to a simply differentially encoded target sequence, which, however, is not the case after the phase slip channel. This non-increasing shape is the reason for the error floor that has been observed in [16]. The decreasing EXIT characteristic means that during iterative decoding, the overall system performance actually decreases, which can lead to a severe error floor. In [20], the authors proposed to employ hybrid turbo differential decoding (HTDD): by a careful execution of the differential decoder only in those iterations where the extrinsic information is low enough, the operating point in the EXIT chart is in the range of an increasing characteristic. This approach allows the authors of [20] to mitigate the detrimental effect of phase slips on iterative differential decoding and to realize codes with relatively low error floors which can be combated using a high-rate outer code.
If we employ the trellis diagram of Fig. 10 incorporating the phase slip model instead of the non-matched trellis diagram, we can see that the EXIT characteristics are monotonically increasing, which is a prerequisite for successful decoding with low error floors. In the next section, we use the EXIT characteristics to compute the information theoretic achievable rates of the differentially encoded system. Further note that for a nonzero phase slip probability (see Sec. 2.2), the extrinsic output mutual information remains strictly below 1, even for perfect a priori information, which may entail an error floor unless the channel code is properly designed.
2.5 Achievable Rates of the Differentially Coded Phase Slip Channel
According to Shannon’s information theory [21, 22], the capacity of a communication channel is the maximum amount of information (usually expressed in terms of bits per channel use) that can be reliably conveyed over the channel. In information theory, the capacity is usually maximized over the input distribution of the channel. In this chapter, we are only interested in the maximum achievable information rate for uniform channel inputs, as we do not wish to impose any constraints on the data sequence. One possibility to achieve a non-uniform channel input is the use of constellation shaping [23, 24], which is however beyond the scope of this chapter. The comparison between the achievable rate of the channel affected by phase slips and the achievable rate of the original AWGN channel shows how much performance may be sacrificed by the presence of phase slips. In order to compute the achievable rates of the differentially encoded channel affected by phase slips, we employ the EXIT chart technique.
By utilizing a slightly modified way of computing EXIT curves of the BCJR decoder, we can also compute the achievable rates of the coded modulation schemes [25]. For this, we make use of the chain rule of mutual information [26, 27] and compute the mutual information of the equivalent bit channel experienced by the channel decoder after differential detection. This can be done by (numerically, simulation-based) computing the EXIT curve of the differential detector using a priori knowledge that is modeled as coming from a BEC, and integrating over such curves. Specifically, EXIT curves like those depicted in Fig. 11 are determined for many different $E_s/N_0$ values (and several different phase slip factors $\gamma$), but now with a priori knowledge based on a BEC model: By integration, we determine the area under these curves [26, 27, 25] and obtain the respective mutual information limits that are plotted into Figs. 12 and 13 at the corresponding $E_s/N_0$ values and phase slip factors $\gamma$, respectively. Note that this mutual information is available to the channel decoder provided that perfect iterative decoding over inner differential detector and outer LDPC decoder is performed. Thus, we still need to design an appropriate LDPC code and iterative decoding scheme to actually approach these promised rates as closely as possible. Indeed, the subsequent sections explain how to construct such codes and coding schemes in more detail. The achievable rate of the non-iterative system with separate differential decoding and channel decoding is obtained from the EXIT curve evaluated without any a priori knowledge.
Figures 12 and 13 show the numerically computed achievable rates for the QPSK constellation without differential coding on an AWGN channel that is not affected by phase slips (dotted lines) and additionally the achievable rates for differentially encoded QPSK for a channel affected by phase slips (solid lines). In Fig. 12 we set $P_{\text{slip}} = 0$ and we observe that the achievable rate of the differential QPSK transmission equals the achievable rate of a conventional coherent QPSK transmission, independent of the differential encoding map. Additionally, we plot the achievable rates for a simplified system that carries out differential decoding (leading to the well-known effect of error doubling) followed by error correction decoding (dashed lines). We see that at the spectral efficiencies typical for coded systems, the simplified system leads to an unavoidable loss in $E_b/N_0$, whose magnitude depends on whether the Gray or the natural differential encoding map is used. This performance difference becomes even more severe if low spectral efficiencies (i.e., high coding overheads) are targeted.
If phase slips occur on the channel, we can observe in Fig. 13 that for high spectral efficiencies (above 1.5 bits/channel use), the loss in information rate due to the phase slips is not severe, unless the phase slip probability becomes large. For example, for a moderate phase slip probability, the capacity loss at high spectral efficiencies remains small. The transmission at very low spectral efficiencies, requiring codes with very large overheads, is however seriously affected by the phase slip channel.
3 LDPC Coded Differential Modulation
In the previous section, we have compared the achievable rates of various systems for an AWGN channel without phase slips and have found that differential coding can be used without entailing a decrease of the communication system's achievable rate. This means that, at least from an information-theoretic perspective, we can employ differential coding to combat phase slips without introducing any decoding penalty. Information theory, however, does not tell us what constructive method we may use to achieve this capacity.
One particularly promising way to approach the capacity with differential coding is the use of coded differential modulation with iterative decoding, as first proposed in [5] with convolutional codes and in [28] with LDPC codes. This scheme extends the bit-interleaved coded modulation (BICM) [29] method to account for differential encoding and employs iterative decoding and detection [30, 31] to improve the overall system performance. The adaptation of this scheme to optical communications has been considered in [32] for the channel not affected by phase slips and in [17, 13, 16, 20] for the channel affected by phase slips. Note that other schemes have been proposed that do not rely on iterative differential decoding, including the slip-resilient code presented in [33, 34] and block differential modulation [35].
Figure 14 shows the general transmitter (top) and iterative receiver (bottom) of the coded differential modulation system with iterative decoding and detection. In this general block diagram, a block FEC encoder takes as input a binary length-$k$ vector of input bits $\mathbf{u} = (u_1, \ldots, u_k)$, with $u_i \in \{0,1\}$, and generates a binary length-$n$ vector of code bits $\mathbf{x}$. Almost all of the popular channel codes that are used in optical communications are such block codes. The amount of redundant bits that are added by the FEC encoder is commonly expressed in terms of the code rate $r$, which is defined as the ratio of the code dimension $k$ to the block length $n$, i.e., $r = k/n$.
In optical communications, often the overhead is used to quantify the amount of redundant information. The overhead of the code and its rate are interrelated by
$$\mathrm{OH} = \frac{n-k}{k} = \frac{1-r}{r} = \frac{1}{r} - 1.$$
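This rate/overhead conversion is easy to get wrong by a factor, so the relation above can be made explicit in a two-line helper (function names are ours):

```python
def rate_to_overhead(r: float) -> float:
    """Overhead OH = (1 - r) / r of a code with rate r = k / n."""
    return (1.0 - r) / r

def overhead_to_rate(oh: float) -> float:
    """Inverse mapping: r = 1 / (1 + OH)."""
    return 1.0 / (1.0 + oh)

# A rate-0.8 code carries 25% overhead; a 20% overhead code has rate 5/6.
print(rate_to_overhead(0.8), overhead_to_rate(0.2))
```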
The block of code bits is interleaved by a permutation $\Pi$ to yield a permuted version $\mathbf{x}^{\Pi}$. Ideally, a random permutation is employed, but sometimes, a structure in the permutation is necessary to facilitate implementation (parallelization) or to improve the error correction capabilities of the code. Note that the permutation is sometimes implicitly included in the FEC encoder and does not need to be explicitly implemented. The interleaved block is differentially encoded (as discussed in Sec. 2.3), yielding a block of $\lceil n/\log_2(M) \rceil$ modulation symbols, where $M$ denotes the size of the modulation alphabet and $\lceil x \rceil$ denotes the smallest integer larger than or equal to $x$.
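As a minimal sketch of the differential encoding step (for QPSK with the natural mapping, representing each symbol by its phase index 0-3; the index representation and function names are our assumptions), the key property is that differential decoding is insensitive to a constant phase rotation of the whole block, which is exactly what provides tolerance against phase slips:

```python
def differential_encode(indices):
    """Natural-mapping differential QPSK: s_k = (x_k + s_{k-1}) mod 4."""
    out, state = [], 0
    for x in indices:
        state = (state + x) % 4
        out.append(state)
    return out

def differential_decode(received):
    """Inverse operation: x_k = (s_k - s_{k-1}) mod 4."""
    out, prev = [], 0
    for s in received:
        out.append((s - prev) % 4)
        prev = s
    return out

x = [0, 3, 1, 2, 2, 0, 1]
s = differential_encode(x)
assert differential_decode(s) == x
# A constant rotation of the whole block (e.g., a phase slip occurring just
# before the block) only corrupts the first recovered symbol:
rotated = [(v + 1) % 4 for v in s]
print(differential_decode(rotated)[1:] == x[1:])
```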
At the receiver, the differential decoder and the FEC decoder iteratively decode the signal, where the output of the FEC decoder is used to yield an improved differential decoding result in a subsequent iteration by sharing so-called extrinsic information between the decoder components. For a thorough description and introduction to the concept of iterative detection and decoding, we refer the interested reader to [36, 18]. In the remainder of this section, we assume that the employed FEC scheme is a low-density parity-check (LDPC) code [37, 18]. We will first give an introduction to LDPC codes and then show how irregular LDPC codes can be designed to be well adapted to differential coding. We do not show explicitly how decoding is performed, as we intend to take on a more code-design-oriented perspective; the equations for performing differential decoding and LDPC decoding are given in the Appendix.
We restrict ourselves in the remainder of this chapter to the case where the number of states of the differential encoder equals the number of modulation symbols, i.e., every state is assigned to exactly one modulation symbol. We will however give hints on how to deal with the more general case in Sec. 3.3.
3.1 Low-Density Parity-Check (LDPC) Codes
Low-density parity-check (LDPC) codes were developed in the 1960s by Gallager in his landmark Ph.D. thesis [37]. These codes were not further investigated for a long time due to the perceived complexity of long codes. With the discovery of turbo codes in 1993 [38] and the sudden interest in iteratively decodable codes, LDPC codes were rediscovered soon afterwards [39, 40]. In the years that followed, numerous publications from various researchers paved the way for a thorough understanding of this class of codes, leading to numerous applications in various communication standards, such as, e.g., WLAN (IEEE 802.11) [41], DVB-S2 [42], and 10G Ethernet (IEEE 802.3) [43]. LDPC codes for soft-decision decoding in optical communications were studied in [44]. Modern high-performance FEC systems are sometimes constructed using a soft-decision LDPC inner code, which reduces the BER to an intermediate level, and a hard-decision outer code, which pushes the system BER to levels below the target [44]. An outer clean-up code is used as most LDPC codes exhibit a phenomenon called error floor: above a certain signal-to-noise ratio (SNR), the BER does not drop rapidly anymore but follows a curve with a small slope. This effect is mainly due to the presence of trapping sets or absorbing sets [45, 46]. The implementation of a coding system with an outer clean-up code requires a thorough understanding of the LDPC code and a properly designed interleaver between the LDPC and outer code to avoid that the errors at the output of the LDPC decoder (which typically occur in clusters) cause uncorrectable blocks after outer decoding. With increasing computing resources, it is now also feasible to evaluate very low target BERs of LDPC codes and to optimize the codes to have very low error floors below the system's target BER [47]. A plethora of LDPC code design methodologies exist, each with its own advantages and disadvantages.
The goal of an LDPC code designer is to find a code that yields high coding gains and which possesses some structure facilitating the implementation of the encoder and decoder. We point the interested reader to numerous articles published on this topic, e.g., [48, 49, 50] and references therein. An introduction to LDPC codes in the context of optical communications is given in [51]. An overview of coding schemes for optical communications is also provided in [12] and the references therein. For a thorough reference to LDPC codes together with an overview of decoding algorithms and construction methods, we refer the interested reader to [18].
An LDPC code is defined by a sparse binary parity-check matrix $\mathbf{H}$ of dimension $m \times n$, where $n$ is the codeword length (in bits) of the code and $m$ denotes the number of parity-check equations defining the code. Usually, the number of information bits equals $k = n - m$, provided that the parity-check matrix has full row rank; if the parity-check matrix is rank-deficient, the number of information bits equals $k = n - \operatorname{rank}(\mathbf{H})$. The overhead of the code is defined as $\mathrm{OH} = (n-k)/k$. A related measure is the rate of the code, which is defined as $r = k/n$. Sparse means that the number of "1"s in $\mathbf{H}$ is small compared to the number of zero entries; practical codes usually have a fraction of "1"s that is several orders of magnitude below 1%. We start by introducing some notation and terminology related to LDPC codes. Each column of the parity-check matrix corresponds to one bit of the FEC frame. The single bits of the code are also often denoted as variables. Similarly, each row of $\mathbf{H}$ corresponds to a parity-check equation and ideally defines a single parity bit (if $\mathbf{H}$ has full rank).
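These size and rate relations can be illustrated on a toy matrix (variable names are ours; a real LDPC parity-check matrix would be far larger and much sparser than this example):

```python
import numpy as np

# Toy parity-check matrix: rows are checks, columns are code bits.
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1]])

m, n = H.shape               # number of checks, codeword length
k = n - m                    # information bits, assuming full row rank
rate = k / n
overhead = (n - k) / k
density = H.sum() / H.size   # fraction of '1's; "low density" when tiny
print(n, m, k, rate, overhead, density)
```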
3.1.1 Regular and Irregular LDPC Codes
LDPC codes are often classified into two categories: regular and irregular LDPC codes. In this chapter, we consider the latter, which also constitutes the more general, broader class of codes. The parity-check matrix of regular codes has the property that the number of "1"s in each column is constant and amounts to $d_v$ (called variable degree) and that the number of "1"s in each row is constant and amounts to $d_c$ (called check degree). Clearly, $d_v < d_c$ has to hold and we furthermore have $n\,d_v = m\,d_c$. Irregular LDPC codes [52] have the property that the number of "1"s in the different columns of $\mathbf{H}$ is not constant. In this chapter, we mainly consider column-irregular codes, which means that only the number of "1"s in the columns is not constant while the number of "1"s in each row remains constant. The irregularity of the parity-check matrix is often characterized by the degree profile of the parity-check matrix [50].
We denote the number of columns of the parity-check matrix $\mathbf{H}$ with $i$ ones by $\Lambda_i$. We say that these columns have degree $i$. Normalizing this value to the total number of bits per codeword yields
$$L_i = \frac{\Lambda_i}{n},$$
which is the fraction of columns with degree $i$, i.e., with $i$ ones (e.g., if $L_3 = \frac{1}{2}$, half the columns of $\mathbf{H}$ have three "1"s).
Similarly, we can define the check degree profile by denoting by $R_j$ the number of rows of $\mathbf{H}$ with exactly $j$ "1"s. The normalized check profile is given by $\tilde{R}_j = R_j/m$, the fraction of rows with $j$ "1"s. We have $\sum_j \tilde{R}_j = 1$. In most of the codes we consider, however, all rows of $\mathbf{H}$ have the same number $d_c$ of "1"s. In that case, we have $\tilde{R}_{d_c} = 1$ and $\tilde{R}_j = 0$ for $j \neq d_c$. Example 3.1 illustrates the degree distribution of such an irregular LDPC code.
Example 3.1
Consider the following LDPC code of length $n = 32$ with parity-check matrix $\mathbf{H}$ of size $8 \times 32$, i.e., of rate $r = 24/32 = 3/4$, corresponding to an overhead of $33\%$. Note that the zeros in $\mathbf{H}$ are not shown for clarity.
The first 8 columns of $\mathbf{H}$ have two "1"s per column, i.e., $\Lambda_2 = 8$. Furthermore, the middle 16 columns each contain three "1"s, i.e., $\Lambda_3 = 16$. Finally, the last 8 columns contain five "1"s, i.e., $\Lambda_5 = 8$. Normalizing leads to
$$L_2 = \frac{8}{32} = \frac{1}{4}, \quad L_3 = \frac{16}{32} = \frac{1}{2}, \quad L_5 = \frac{8}{32} = \frac{1}{4}.$$
Note that $L_2 + L_3 + L_5 = 1$.
The number of "1"s in each row of $\mathbf{H}$ is constant and amounts to $d_c = 13$ (the total number of edges, $2 \cdot 8 + 3 \cdot 16 + 5 \cdot 8 = 104$, divided by the $m = 8$ rows).
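The bookkeeping of Example 3.1 can be reproduced in a few lines; the column-degree counts are taken from the example, and the notation $\Lambda_i$ for the number of degree-$i$ columns follows the text:

```python
from fractions import Fraction

# Column-degree counts of the toy code: 8 columns of degree 2,
# 16 of degree 3, and 8 of degree 5.
Lambda = {2: 8, 3: 16, 5: 8}

n = sum(Lambda.values())                       # 32 variable nodes
L = {i: Fraction(c, n) for i, c in Lambda.items()}
edges = sum(i * c for i, c in Lambda.items())  # total edges in the Tanner graph
print(n, L, edges)
```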
3.1.2 Graph Representation of LDPC Codes
LDPC codes are often represented by a socalled Tanner graph [50]. This graph is an undirected bipartite graph in which the nodes can be partitioned into two disjoint sets and each edge connects a node from the first set to a node from the second set. The Tanner graph allows for an easy description of the decoding algorithm of LDPC codes, which we will not detail here. We will give a summary of the iterative decoding algorithm in the Appendix.
Figure 15 shows the graph representation of the toy code given in Example 3.1. The circular nodes on the bottom of the graph represent the variable nodes, which correspond to the bits in the codeword. As each codeword contains $n = 32$ bits, there are 32 variable nodes $v_i$, $i \in \{1, \ldots, 32\}$. Each variable node $v_i$ has one connection to the transmission channel (arrow from the bottom) and $d_i$ additional connections towards the top, where $d_i$ equals the number of "1"s in the $i$th column of $\mathbf{H}$. For instance, the first 8 variables (those with $d_i = 2$) of the code have 2 connections towards the graph part of the code and an additional connection from the transmission channel. As in Example 3.1, the variable nodes can be divided into three groups, corresponding to the degree of these variables.
The rectangular nodes on the top of the graph are the so-called check nodes. Each check node corresponds to one of the $m$ rows of the parity-check matrix of the code and defines a code constraint. The number of connections of a check node with the graph corresponds to the number of "1"s in the respective row of $\mathbf{H}$. In the above example, every row has $d_c$ "1"s, so that each of the check nodes has exactly $d_c$ connected edges. If $\mathbf{H}$ has a nonzero entry at row $j$ and column $i$, i.e., $H_{j,i} = 1$, then an edge connects variable node $v_i$ to check node $c_j$.
As drawing the graph of the code in this way quickly becomes cumbersome and confusing due to the large number of edges, we resort to a simplified (and rotated) representation shown in Fig. 16. In this figure, we do not draw all the edges, but only the beginning and end of each edge, and assume that the permutation of the edges is managed by an interleaver $\Pi$. The interleaver thus ensures that the connections between the different nodes correspond to those given by the parity-check matrix $\mathbf{H}$.
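The correspondence between $\mathbf{H}$ and the Tanner graph is mechanical: every "1" in the matrix becomes one edge. A minimal sketch with a tiny made-up matrix:

```python
import numpy as np

# An edge (j, i) connects check node c_j and variable node v_i iff H[j, i] = 1.
H = np.array([[1, 1, 1, 0],
              [0, 1, 1, 1]])

edges = [(j, i) for j in range(H.shape[0])
                for i in range(H.shape[1]) if H[j, i]]
var_degrees = H.sum(axis=0)   # '1's per column -> variable node degrees
chk_degrees = H.sum(axis=1)   # '1's per row    -> check node degrees
print(edges)
print(var_degrees.tolist(), chk_degrees.tolist())
```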
3.1.3 Design of Irregular LDPC Codes
The design of irregular LDPC codes consists of finding good degree distributions, i.e., good values $L_i$ and $\tilde{R}_j$ (or, equivalently, $\lambda_i$ and $\rho_j$) such that the rate of the code has the desired value (given by the system designer) and such that the NCG achievable by this code is maximized, i.e., the code is able to successfully recover the bit stream at the lowest possible SNR. A comprehensive body of literature on the design of irregular codes exists (see [18] and references therein) and we only introduce the basics needed to describe the optimization of codes tailored to slip-tolerant differential decoding in Sec. 3.2.
The optimization of irregular LDPC codes requires the use of edge-perspective degree distributions [50].
Definition 3 (Edge-perspective degree distribution)
In the Tanner graph representation of the code, we denote by $\lambda_i$ the fraction of edges that are connected to variable nodes of degree $i$. We have
$$\sum_i \lambda_i = 1. \qquad (4)$$
Similarly, $\rho_j$ denotes the fraction of edges that are connected to check nodes of degree $j$. Again, we have
$$\sum_j \rho_j = 1.$$
Using the technique of EXIT charts [19, 27, 53], good values of $\lambda_i$ and potentially $\rho_j$ may be found that can then be used to design a parity-check matrix fulfilling these degree distributions. We constrain the maximum possible variable node degree to be $d_{v,\max}$ and the maximum possible check node degree to be $d_{c,\max}$.
The inverse relationship between $\lambda_i$ and $L_i$, and between $\rho_j$ and $\tilde{R}_j$, respectively, reads
$$L_i = \frac{\lambda_i / i}{\sum_{i'} \lambda_{i'} / i'}, \qquad \tilde{R}_j = \frac{\rho_j / j}{\sum_{j'} \rho_{j'} / j'}. \qquad (5)$$
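The conversion (5) and its forward counterpart can be sketched as follows (function names are ours); applying them to the node-perspective profile of Example 3.1 and converting back recovers the original distribution exactly:

```python
from fractions import Fraction

def node_to_edge(node_profile):
    """lambda_i = i * L_i / sum_j (j * L_j): the fraction of edges attached
    to degree-i variable nodes."""
    total = sum(i * v for i, v in node_profile.items())
    return {i: i * v / total for i, v in node_profile.items()}

def edge_to_node(edge_profile):
    """Inverse relationship (5): L_i = (lambda_i / i) / sum_j (lambda_j / j)."""
    total = sum(v / i for i, v in edge_profile.items())
    return {i: (v / i) / total for i, v in edge_profile.items()}

# Node-perspective profile of Example 3.1.
L = {2: Fraction(1, 4), 3: Fraction(1, 2), 5: Fraction(1, 4)}
lam = node_to_edge(L)      # lambda_2 = 2/13, lambda_3 = 6/13, lambda_5 = 5/13
print(lam)
assert edge_to_node(lam) == L   # round trip recovers the node profile
```

Exact rational arithmetic (`fractions.Fraction`) avoids any floating-point doubt about whether the distributions sum to one.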
The (iterative) LDPC decoding process may be understood as a process where two decoders pass information between each other. The first decoder is the variable node decoder (VND), which processes each of the variable nodes of the code. The second decoder is the check node decoder (CND), which processes each of the check nodes. Each of these decoders has a certain information transfer (EXIT) characteristic. Before describing the transfer characteristics, we introduce the function $J(\mu)$ that interrelates the mean $\mu$ (and the variance, which amounts to $\sigma^2 = 2\mu$ in the case of symmetric messages; for details, see [19] and [50]) and the mutual information for the Gaussian random variable describing the messages that are exchanged in the iterative decoder, with
$$J(\mu) = 1 - \int_{-\infty}^{\infty} \frac{1}{\sqrt{4\pi\mu}}\, e^{-\frac{(\xi-\mu)^2}{4\mu}} \log_2\!\left(1 + e^{-\xi}\right) \mathrm{d}\xi,$$
which can be conveniently approximated [54] by
$$J(\mu) \approx \left(1 - 2^{-H_1 (2\mu)^{H_2}}\right)^{H_3}$$
with $H_1 = 0.3073$, $H_2 = 0.8935$, and $H_3 = 1.1064$.
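The approximation can be implemented directly; since $J$ is strictly increasing, its inverse is conveniently obtained by bisection. This is a sketch using the constants quoted above (which should be checked against [54]):

```python
H1, H2, H3 = 0.3073, 0.8935, 1.1064

def J(mu: float) -> float:
    """Approximate mutual information of a symmetric Gaussian LLR message
    with mean mu and variance 2 * mu."""
    if mu <= 0.0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * (2.0 * mu) ** H2)) ** H3

def J_inv(I: float, lo: float = 0.0, hi: float = 100.0) -> float:
    """Invert the strictly increasing J by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if J(mid) < I:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(J(0.0), J(1.0), J(4.0), J_inv(J(4.0)))
```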
In the case of LDPC codes and transmission over an AWGN channel, the information transfer characteristics are obtained as [55]
$$I_{E,V} = \sum_i \lambda_i\, J\!\left((i-1)\, J^{-1}(I_{A,V}) + \mu_{\mathrm{ch}}\right) \qquad (6)$$
$$I_{E,C} \approx 1 - \sum_j \rho_j\, J\!\left((j-1)\, J^{-1}(1 - I_{A,C})\right) \qquad (7)$$
where $\mu_{\mathrm{ch}}$ denotes the mean of the channel-related LLRs (for BPSK on the AWGN channel with noise variance $\sigma_n^2$, $\mu_{\mathrm{ch}} = 2/\sigma_n^2$). Equation (6) describes the characteristic of the VND while (7) describes the characteristic of the CND. For codes with regular check node degree $d_c$, (7) can be simplified to
$$I_{E,C} \approx 1 - J\!\left((d_c - 1)\, J^{-1}(1 - I_{A,C})\right).$$
As $I_{A,V} = I_{E,C}$ and $I_{A,C} = I_{E,V}$ hold in the context of iterative decoding, a condition for successful decoding is that
$$I_{E,V}(I_A) > I_{E,C}^{-1}(I_A) \quad \text{for all } I_A \in [0, 1), \qquad (8)$$
where the inverse function of the strictly monotonically increasing function given in (7) can be found using numerical methods. The task of the code designer is to find a degree distribution minimizing the required SNR such that (8) is fulfilled. Usually, the condition (8) is evaluated at discrete values of $I_A$ only, simplifying the implementation.
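Putting (6) and the check-regular form of (7) together, the decoding trajectory can be traced numerically: decoding succeeds (the trajectory reaches mutual information close to 1) exactly when the tunnel described by condition (8) is open. All numbers below (degree distribution, check degree, channel qualities) are illustrative assumptions, and the J-function approximation from the text is reused:

```python
H1, H2, H3 = 0.3073, 0.8935, 1.1064

def J(mu):
    if mu <= 0.0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * (2.0 * mu) ** H2)) ** H3

def J_inv(I, lo=0.0, hi=100.0):
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if J(mid) < I else (lo, mid)
    return 0.5 * (lo + hi)

lam = {2: 0.4, 3: 0.3, 10: 0.3}   # illustrative edge-perspective profile
d_c = 8                            # regular check degree

def I_EV(I_A, mu_ch):              # variable node decoder, eq. (6)
    return sum(l * J((i - 1) * J_inv(I_A) + mu_ch) for i, l in lam.items())

def I_EC(I_A):                     # check node decoder, simplified eq. (7)
    return 1.0 - J((d_c - 1) * J_inv(1.0 - I_A))

def trajectory(mu_ch, iters=200):
    """Alternate VND and CND updates; return the final VND output MI."""
    I_c = 0.0
    for _ in range(iters):
        I_v = I_EV(I_c, mu_ch)
        I_c = I_EC(I_v)
    return I_v

# Strong channel: trajectory converges towards 1; weak channel: it stalls.
print(trajectory(4.0), trajectory(0.5))
```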
Some more conditions usually apply to the degree distributions. One of these is the so-called stability condition [50], which, in the case of an AWGN channel with noise variance $\sigma_n^2$, ensures that
$$\lambda_2 \sum_j \rho_j (j-1) < e^{\frac{1}{2\sigma_n^2}}.$$
3.2 Code Design for Iterative Differential Decoding
As described in Sec. 2.5, the differential decoder based on the BCJR algorithm can be characterized by an EXIT characteristic. Before optimizing the LDPC code towards the interworking with the differential decoder, we first have to define the decoder scheduling, as we are concerned with a twofold iterative decoding loop: decoding iterations are carried out within the LDPC decoder and between the LDPC decoder and the differential decoder. In this chapter, we restrict ourselves to the following scheduling:
a) In a first initial step, the differential decoder is executed and generates initial channel-related information.
b) Using this initial channel-related information, a single LDPC iteration is carried out, i.e., a single execution of the check node and variable node computing processors.
c) Using the accumulated variable node information from the LDPC graph, excluding the intrinsic channel-related information from the initial differential decoding execution (step a)), the differential decoder is executed again, yielding improved channel-related information.
d) With the improved information from step c), another single LDPC iteration is carried out. If the maximum number of allowed iterations is not yet reached, we continue with step c).
e) If the maximum number of iterations is reached, the accumulated variable node information is used to get an a posteriori estimate of each bit.
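The scheduling above can be summarized in a short control-flow sketch. The two component decoders are crude numeric stand-ins (scalars standing for "information quality"), purely to make the schedule concrete; in a real receiver they would be the BCJR differential detector and one LDPC check/variable node update:

```python
def differential_detector(a_priori: float) -> float:
    """Stand-in for the BCJR differential detector: better a priori
    information from the LDPC code yields better channel-related output."""
    return 0.3 + 0.6 * a_priori

def ldpc_iteration(channel_info: float, accumulated: float) -> float:
    """Stand-in for a single check node / variable node update."""
    return min(1.0, 0.5 * accumulated + 0.5 * channel_info + 0.05)

def decode(max_iters: int = 20) -> float:
    channel_info = differential_detector(0.0)        # step a)
    acc = ldpc_iteration(channel_info, 0.0)          # step b)
    for _ in range(max_iters - 1):
        channel_info = differential_detector(acc)    # step c): re-run detector
        acc = ldpc_iteration(channel_info, acc)      # step d): one LDPC iteration
    return acc                                       # step e): final estimate

print(decode())
```

With these stand-ins, the accumulated information follows the recursion acc ← 0.8·acc + 0.2 and approaches 1 as the iterations proceed, mimicking a converging turbo schedule.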
In what follows, we describe in detail how to find good degree distributions for iterative differential decoding. In [56] and [57], conditions for degree distributions were derived and it was analyzed whether it is possible to construct codes that work equally well for differential coding and conventional non-differential transmission. In this work, we solely consider the case of differential coding and we aim at presenting different possibilities of degree distribution optimization, with the goal of identifying the best option for LDPC coded differential modulation with the above-mentioned decoder scheduling.
We only consider column-irregular codes in the remainder of this chapter, i.e., the number of "1"s in each row of the parity-check matrix is constant and amounts to $d_c$. Such a constraint is often imposed as it simplifies the hardware that is needed to implement the check node decoding operation, which is the most difficult operation in the LDPC decoder. The complexity of this operation scales roughly linearly with the check node degree (i.e., the number of "1"s per row), and having a constant degree allows the hardware designer to implement a fixed and optimized check node computation engine. The second constraint that we impose is that we only have three different variable node degrees, namely variable nodes of degree 2, variable nodes of degree 3, and variable nodes of degree $d_{v,\max}$. This is in line with the findings given in [58], which show that optimized degree distributions are often sparse and that only a few different degrees are often sufficient. Having only three different variable node degrees simplifies the hardware implementation, especially the design of the required bit widths in a fixed-point implementation.
Contrary to many degree distribution optimization approaches proposed in the literature [50, 56, 59], we first fix the rate $r$ of the final code, as the rate is usually constrained by the system design parameters (e.g., speed of analog-to-digital and digital-to-analog converters, pulse shape, channel bandwidth, framing overhead, etc.). With fixed rate $r$, we remove one of the dependencies [58] of the degree distribution. We further assume that no nodes of degree 1 are present in the code, i.e., $\lambda_1 = 0$ and thus $L_1 = 0$. As $\sum_i \lambda_i = 1$, we can uniquely determine $\lambda_3$ as
$$\lambda_3 = 1 - \lambda_2 - \lambda_{d_{v,\max}}. \qquad (9)$$
As the rate of the code is given by [50]
$$r = 1 - \frac{\sum_j \rho_j / j}{\sum_i \lambda_i / i}, \qquad (10)$$
we can eliminate another dependency and, by combining (10) with (9), we get
$$\lambda_{d_{v,\max}} = \frac{\frac{1}{1-r}\sum_j \frac{\rho_j}{j} - \frac{1}{3} - \frac{\lambda_2}{6}}{\frac{1}{d_{v,\max}} - \frac{1}{3}}. \qquad (11)$$
For check-regular codes with regular check node degree $d_c$ (i.e., $\rho_{d_c} = 1$), (11) can be simplified to
$$\lambda_{d_{v,\max}} = \frac{\frac{1}{d_c(1-r)} - \frac{1}{3} - \frac{\lambda_2}{6}}{\frac{1}{d_{v,\max}} - \frac{1}{3}}. \qquad (12)$$
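Equations (9), (10), and (12) can be turned into a small design helper: given the fixed rate $r$, the check degree $d_c$, the maximum variable degree, and a chosen $\lambda_2$, the remaining coefficients follow, and substituting them back into (10) must recover $r$. Exact rational arithmetic makes this check crisp; all parameter values below are examples:

```python
from fractions import Fraction as F

def three_degree_profile(r, d_c, d_v_max, lam2):
    """Solve (9) and (12): lambda_3 and lambda_{d_v_max} for a check-regular
    code of rate r with variable degrees {2, 3, d_v_max}."""
    lam_max = ((F(1) / (d_c * (1 - r)) - F(1, 3) - lam2 / 6)
               / (F(1, d_v_max) - F(1, 3)))
    lam3 = 1 - lam2 - lam_max
    return lam3, lam_max

def code_rate(lams, d_c):
    """Rate formula (10) specialized to a regular check degree d_c."""
    return 1 - F(1, d_c) / sum(l / i for i, l in lams.items())

r, d_c, d_v_max, lam2 = F(4, 5), 20, 10, F(1, 4)
lam3, lam_max = three_degree_profile(r, d_c, d_v_max, lam2)
print(lam3, lam_max)                                     # 3/14 and 15/28
print(code_rate({2: lam2, 3: lam3, 10: lam_max}, d_c))   # recovers 4/5
```

Sweeping $\lambda_2$ over the range where all three coefficients stay non-negative then yields the one-parameter family of candidate degree distributions to be screened with the EXIT condition (8).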
This means that $\lambda_3$ and $\lambda_{d_{v,\max}}$ are uniquely determined by $\lambda_2$ once $r$, $d_c$, and $d_{v,\max}$ are fixed. If we only allow $\lambda_2$, $\lambda_3$, and $\lambda_{d_{v,\max}}$ to be nonzero, then the complete degree distribution is uniquely determined by $\lambda_2$ and we have