A Unified Framework for Linear-Programming Based Communication Receivers
This work was supported in part by Science Foundation Ireland (Grant No. 07/SK/I1252b), in part by the Institute of Advanced Studies, University of Bologna (ISAESRF Fellowship), and in part by the Swiss National Science Foundation (Grant No. 113251). The material in this paper was presented in part at the 46th International Allerton Conference on Communication, Control and Computing, Monticello, Illinois, September 2008.
M. F. Flanagan was with DEIS, University of Bologna, Via Venezia 52, 47023 Cesena, Italy, and with Institut für Mathematik, University of Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland. He is now with the School of Electrical, Electronic and Communications Engineering, University College Dublin, Belfield, Dublin 4, Ireland (email: mark.flanagan@ieee.org).
Abstract
It is shown that a large class of communication systems which admit a sum-product algorithm (SPA) based receiver also admit a corresponding linear-programming (LP) based receiver. The two receivers have a relationship defined by the local structure of the underlying graphical model, and are inhibited by the same phenomenon, which we call pseudoconfigurations. This concept is a generalization of the concept of pseudocodewords for linear codes. It is proved that the LP receiver has the ‘maximum likelihood certificate’ property, and that the receiver output is the lowest cost pseudoconfiguration. Equivalence of graph-cover pseudoconfigurations and linear-programming pseudoconfigurations is also proved. A concept of system pseudodistance is defined which generalizes the existing concept of pseudodistance for binary and nonbinary linear codes. It is demonstrated how the LP design technique may be applied to the problem of joint equalization and decoding of coded transmissions over a frequency-selective channel, and a simulation-based analysis of the error events of the resulting LP receiver is also provided. For this particular application, the proposed LP receiver is shown to be competitive with other receivers, and to be capable of outperforming turbo equalization in bit and frame error rate performance.
Linear-programming, factor graphs, sum-product algorithm, decoding, equalization.
I Introduction
The decoding algorithms for some of the best known classes of error-correcting code to date, namely concatenated (“turbo”) codes [1] and low-density parity-check (LDPC) codes [2], have been shown to be instances of a much more general algorithm called the sum-product algorithm (SPA) [3, 4, 5]. This algorithm solves the general problem of marginalizing a product of functions which take values in a semiring. In the communications context, this semiring is the set of nonnegative real numbers, and the maximization of each marginal function minimizes the error rate for each symbol, under the assumption that the system factor graph is a tree [5]. It has been recognized that many diverse situations may allow the use of SPA-based reception [6], including joint iterative equalization and decoding (or turbo equalization) [7], joint iterative equalization and multiuser detection (MUD) [8], and joint source-channel decoding [9].
Recently, a linear-programming (LP) based approach to decoding linear (and especially LDPC) codes was developed for binary [10, 11] and nonbinary coding frameworks [12, 13]. The concept of pseudocodeword proved important in the performance analysis of both LP and SPA based decoders [14, 15, 16]. Also, linear-programming decoders for irregular repeat-accumulate (IRA) codes and turbo codes were described in [17]. While the complexity of LP decoding is conjectured to be higher than for SPA decoding, the LP decoder has many analytical advantages, such as the property that a codeword output by the LP is always the maximum likelihood (ML) codeword, and the equivalence of different pseudocodeword concepts in the LP and SPA domains [11, 13]. For the case of LDPC codes, tight connections were observed between the LP decoding and min-sum decoding frameworks [18].
Recently, some authors have considered use of similar linear-programming techniques in applications beyond coding. An LP-based method for low-complexity joint equalization and decoding of LDPC-coded transmissions over the magnetic recording channel was proposed in [19]. In this work, the problem of ML joint detection, which may be expressed as an integer quadratic program, is converted into a linear programming relaxation of a binary-constrained problem. In the case where there is no coding, it was shown in [19] that for a class of channels designated as proper channels, the LP solution matches the ML solution at all values of signal-to-noise ratio (SNR); however, for some channels which are not proper, the system evinces a frame error rate floor effect. The work of [20] considered an LP decoder which incorporates nonuniform priors into the original decoding polytope of [11], and also application of an LP decoder to transmissions over a channel with memory, namely the nonergodic Polya channel. In both [19] and [20], performance analysis proved difficult for the case where the channel has memory.
In this paper it is shown that the problem of maximizing a product of semiring-valued functions is amenable to an approximate (suboptimal) solution using an LP relaxation, under two conditions: first, that the semiring corresponds to the nonnegative real numbers under real addition and multiplication, and second, that all factor nodes of degree greater than one are indicator functions for a local behavior. Fortunately, these conditions are satisfied by a large number of practical communication receiver design problems. Interestingly, the LP exhibits a “separation effect” in the sense that degree-1 factor nodes in the factor graph contribute the cost function, and the remaining nodes determine the LP constraint set. This distinction is somewhat analogous to the case of SPA-based reception where degree-1 factor nodes contribute initial messages exactly once, and all other nodes update their messages periodically. Our LP receiver generalizes the LP decoders of [11, 13, 17]. A general design methodology emerges, parallel to that of SPA receiver design:

1) Write down the global function for the transmitter-channel combination. This is a function proportional to the probability mass function of the transmitter configuration conditioned on the received observations.

2) Draw the factor graph corresponding to the global function.

3) Read the LP variables and constraints directly from the factor graph.
The proposed framework applies to any system with a finite number of transmitter configurations, and also allows for treatment of “hidden” (latent) state variables (as was done for the SPA receiver case in [3]). It allows for a systematic treatment of variables with known values (e.g. known initial/final channel states or pilot symbols). Incorporation of priors for any subset of transmitter variables is straightforward, and thus the polytopes of [20, Section II] follow as simple special cases of our framework. Our framework also allows derivation of LP decoders for tail-biting trellis (TBT) codes; in this case the relevant pseudoconfigurations correspond to the TBT pseudocodewords as defined in [14]. It is proved that the LP receiver error events, which we characterize as a set of linear-programming pseudoconfigurations, are equivalent to the set of graph-cover pseudoconfigurations, which are linked to error events in the corresponding SPA receiver. Furthermore, we define a general concept of pseudodistance for the transmission system, which generalizes the existing concept of pseudodistance for binary and nonbinary linear codes to the case of the general LP receiver.
In order to illustrate the LP receiver design methodology outlined above, we provide a step-by-step derivation of an LP receiver which performs joint equalization and decoding of coded transmissions over a frequency-selective channel. We then provide a simulation study of a simple case of this receiver, including error rate results and pseudodistance spectra, together with a complete description of error events at low values of pseudodistance. Performance results are also presented for a low-density code transmitted over an intersymbol interference channel. We note that a similar line of work is considered in [27, 28]; the LP presented therein is equivalent to the one we derive in Section VII, except that it does not deal with the case of known states in the trellis. The proposed LP receiver is capable of handling channels which are problematic for competitive LP-based techniques, and is shown to have error rate performance outperforming that of turbo equalization over some channels.
This paper is organized as follows. Section II introduces the general problem to be solved, along with appropriate notations, and Section III develops a general linear program which solves this problem. Section IV introduces an efficient linear program which provides a suboptimal solution, and Section V develops an equivalent program with a lower description complexity. Section VI introduces general concepts of system pseudoconfigurations and pseudodistance. Section VII provides a detailed development of an LP receiver which performs joint equalization and decoding, and Section VIII presents a detailed simulation-based analysis of this receiver for the case of binary-coded transmissions over an intersymbol interference channel.
II Problem Statement and Notations
We begin by introducing some definitions and notation. Suppose that we have variables , , where is a finite set, and the variable lies in the finite set for each . Let (Footnote 1: All vectors in the paper are row vectors; also, the notation denotes a vector whose entries are equal to with respect to some fixed ordering on the elements of .); then is called a configuration, and the Cartesian product is called the configuration space. Suppose now that we wish to find the configuration which maximizes the product of real-valued functions
(1) 
where is a finite set, and for each . We define the optimum configuration to be the configuration which maximizes (1). The function is called the global function [5]. In the communication receiver design context, the global function is taken to be any monotonically increasing function of the probability mass function of some set of transmitter-channel variables (information bits, coded symbols, state variables etc.) conditioned on the received observations. Maximization of the global function therefore corresponds to maximum a posteriori (MAP) reception (Footnote 2: Note that in most cases of practical importance, a single configuration maximizes (1) with probability 1; henceforth, we will assume a unique optimum configuration). As we shall see, the key to solving this optimization problem via a low-complexity LP is that the factors in the factorization (1) each have a small number of arguments, i.e., is small for each .
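As a point of reference for the optimization problem just stated, the following minimal Python sketch finds the optimum configuration of a toy global function by exhaustive search. The three binary variables, the factor values, and the names `ALPHABETS`, `FACTORS`, `global_function` and `optimum_configuration` are all illustrative assumptions and do not come from the paper; the sketch only shows the problem structure (a product of local factors, each with few arguments) that the LP approach exploits.

```python
from itertools import product

# Hypothetical toy system: three binary variables x0, x1, x2.
ALPHABETS = [(0, 1), (0, 1), (0, 1)]

# Each factor is (indices of its arguments, function of those arguments).
# All numerical values below are illustrative assumptions.
FACTORS = [
    ((0,), lambda a: 0.9 if a == 0 else 0.1),   # degree-1 factor on x0
    ((1,), lambda a: 0.4 if a == 0 else 0.6),   # degree-1 factor on x1
    ((2,), lambda a: 0.3 if a == 0 else 0.7),   # degree-1 factor on x2
    # Indicator function of an even-parity local behavior on (x0, x1, x2).
    ((0, 1, 2), lambda a, b, c: 1.0 if (a + b + c) % 2 == 0 else 0.0),
]

def global_function(config):
    """Product of all local factors evaluated at the given configuration."""
    value = 1.0
    for indices, f in FACTORS:
        value *= f(*(config[i] for i in indices))
    return value

def optimum_configuration():
    """Exhaustive search over the configuration space (exponential in general)."""
    return max(product(*ALPHABETS), key=global_function)
```

The search visits every element of the configuration space, which is exactly the cost that a low-complexity LP formulation is meant to avoid.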
The factor graph for the global function and its factorization (1) is a (bipartite) graph defined as follows. There is a variable node for each variable () and a factor node for each factor (). An edge connects variable node to factor node if and only if is an argument of . Note that for any , is the set of for which is an argument of . Also, for any , the set of for which is an argument of is denoted .
Let denote the set of all such that factor node is an indicator function for some local behavior , i.e.,
(2) 
where the indicator function for the logical predicate is defined by
In the communication receiver design application, the set comprises constraints such as parity-check constraints and state-space constraints, and also may account for variables with known values (pilot symbols, known states).
Note that we write any as , i.e., is indexed by . Also we define the global behavior as follows: for any , we have if and only if for every . The configuration is said to be valid if and only if .
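The relationship between local behaviors and the global behavior can be illustrated with a short sketch: a configuration is valid exactly when its restriction to the arguments of every indicator factor lies in that factor's local behavior. The four-variable system and the names `LOCAL_BEHAVIORS`, `is_valid` and `GLOBAL_BEHAVIOR` below are hypothetical and chosen only for illustration.

```python
from itertools import product

ALPHABET = (0, 1)
NUM_VARIABLES = 4

# Hypothetical local behaviors: (variable indices, set of allowed local configurations).
LOCAL_BEHAVIORS = [
    # Even parity on (x0, x1, x2).
    ((0, 1, 2), {t for t in product(ALPHABET, repeat=3) if sum(t) % 2 == 0}),
    # Equality constraint x2 == x3.
    ((2, 3), {(0, 0), (1, 1)}),
]

def is_valid(config):
    """True iff every local restriction of config lies in its local behavior."""
    return all(
        tuple(config[i] for i in indices) in behavior
        for indices, behavior in LOCAL_BEHAVIORS
    )

# The global behavior is the set of all valid configurations.
GLOBAL_BEHAVIOR = [c for c in product(ALPHABET, repeat=NUM_VARIABLES) if is_valid(c)]
```

Here the global behavior contains four configurations, since the parity constraint leaves four choices for the first three variables and the equality constraint then fixes the fourth.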
Next define to be the set of indices of variable nodes which have neighbours not belonging to , i.e.,
We assume that for every , the factor node has degree equal to one. This allows us to define, for each ,
In the communication receiver design context, the set corresponds to the set of observables, i.e., the set of variables for which noisy observations are available, and each represents the probability (density) of the symbol conditioned on the corresponding received observation(s).
So, without loss of generality we may write
(3) 
We assume that the function is positive-valued for each . Also, denoting the Cartesian product , we define the projection
This function simply maps any configuration into the configuration subset consisting only of the observables. Also, we adopt the notation for elements of (i.e., vectors of observables).
We assume that the mapping is injective on , i.e., if and , then . This corresponds to a ‘well-posed’ problem. Note that in the communication receiver design context, observations (or “channel information”) may only be contributed through the nodes for . Therefore, failure of the injectivity property in the communications context would mean that one particular set of channel inputs could correspond to two different transmit information sets, which would reflect badly on system design.
III Maximization of the Global Function by Linear Programming
Using (2) and (3), we may write
For each , , let , i.e., is a real vector of length which acts as an ‘indicator vector’ for the value . Building on this, for we define the indicator vector , which is the concatenation of the individual indicator vectors for each of the elements of . It is easy to see that is an injective function on .
Next, we define the vector according to
where for each , . This allows us to develop the formulation of the optimum configuration as
(4)  
where in the second line we have used the fact that the inner product “sifts” the value out of the vector , and in the third line we have expressed the sum of inner products as a single inner product of the corresponding pair of concatenated vectors. Note that the optimization has reduced to the maximization of an inner product of vectors, where the first vector derives only from observations (or “channel information”) and the second vector derives only from the global behavior (the set of valid configurations). For any vector of the same dimension as , we adopt the notation
The maximization problem (4) may then be recast as a linear program LP1 as shown below.
LP1: Optimum Configuration
Cost Function:
Constraints (Polytope ): The cost function is maximized over the convex hull of all points corresponding to valid configurations: (5)
Receiver Output: (6)
The “polytope of valid configurations” generalizes the “codeword polytope” defined in [11] and [13] in the context of binary and nonbinary linear codes, respectively. The linear program LP1 has constraint complexity exponential in the number of LP variables, rendering it unsuitable for practical application.
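The reduction described in this section can be checked numerically on a toy example. The sketch below assumes that the cost vector collects the logarithms of the degree-1 factor values (an assumption made here for illustration), builds concatenated indicator vectors, and verifies that the inner product “sifts” out the logarithm of the product of factors. Since a linear function on a polytope attains its maximum at a vertex, LP1 can be solved on this toy system by enumerating the valid configurations. The factor values and the names `G`, `VALID`, `COST`, `indicator_vector` and `lp1_by_enumeration` are hypothetical.

```python
import math
from itertools import product

ALPHABET = (0, 1)
# Hypothetical positive degree-1 factors for three binary observables.
G = [{0: 0.9, 1: 0.1}, {0: 0.4, 1: 0.6}, {0: 0.3, 1: 0.7}]
# Hypothetical global behavior: even-parity configurations.
VALID = [c for c in product(ALPHABET, repeat=3) if sum(c) % 2 == 0]

def indicator_vector(config):
    """Concatenation of one indicator vector (length |alphabet|) per variable."""
    return [1.0 if a == value else 0.0 for value in config for a in ALPHABET]

# Assumed cost vector: logarithms of factor values, in the same ordering.
COST = [math.log(G[i][a]) for i in range(len(G)) for a in ALPHABET]

def inner(u, v):
    """Standard inner product of two equal-length real vectors."""
    return sum(x * y for x, y in zip(u, v))

def lp1_by_enumeration():
    """Maximize the linear cost over the vertices of the polytope."""
    return max(VALID, key=lambda c: inner(COST, indicator_vector(c)))
```

Enumeration is only feasible here because the toy behavior has four elements; the number of vertices grows exponentially with system size, which is the motivation for the relaxation that follows.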
IV LP Relaxation
In order to reduce the description complexity of LP1, we introduce auxiliary variables whose constraints, along with those of the elements of the vector defined previously, will form the relaxed LP problem. We introduce auxiliary variables for each , , and we form the vector
Also, we define the vector as an extension of via where for each (recall that ). For we define the indicator vector ; the function is injective on .
The new LP optimizes the cost function over the polytope defined with respect to variables and , as shown in the following.
LP2: Efficient Relaxation
Cost Function:
Constraints (Polytope ): (7) (8) (9)
Receiver Output: (10)
LP2 is a direct generalization of the LP of [11] to the case of arbitrary behavioral constraints. It comprises variables and constraints (Footnote 3: Throughout the paper, when considering LP complexities we will omit constraint complexities due to upper and lower bounds on the LP variables.), where denotes the number of neighbours of which belong to . Note that (7) and (8) imply that we may view as a random vector, and for each the vector may be interpreted as a probability distribution on the local configuration ; (9) then expresses each vector (for each ) as the induced probability distribution on . It may be easily checked that is then the expectation of with respect to this distribution (Footnote 4: If represents a probability distribution on , we denote the expectation of with respect to the distribution as .); this interpretation of the cost function will be useful in our treatment of system pseudodistance in Section VI. A similar probabilistic interpretation was also considered in the context of pseudocodewords of graph-cover decoding in [14] (Footnote 5: Another interpretation of the polytope is that the projection of onto is formed by the intersection of convex hulls corresponding to the local behaviors, i.e., for all .).
If the LP solution is an integral point in (i.e., all of its coordinates are integers), the receiver output is the configuration (we shall prove in the next section that this output is indeed in ). Of course, in the communications context, we are usually only interested in a subset of the configuration symbols, namely the information bits. If the LP solution is not integral, the receiver reports FAILURE.
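The probabilistic reading of the LP2 constraints given above suggests a simple membership test, sketched below in Python. It assumes, following the verbal description of (7) to (9), that each constraint factor carries a nonnegative weight vector over its local behavior that sums to one, and that each variable's vector equals the marginal induced by every neighbouring constraint factor. The toy system and the names `BEHAVIORS`, `in_relaxed_polytope` and `is_integral` are hypothetical; exact rational arithmetic is used so that equalities can be tested directly.

```python
from fractions import Fraction as Fr

ALPHABET = (0, 1)
# Hypothetical constraint factors: name -> (variable indices, local behavior).
BEHAVIORS = {
    "parity": ((0, 1, 2), [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]),
    "equal": ((1, 2), [(0, 0), (1, 1)]),
}

def in_relaxed_polytope(q, p):
    """q[i][a]: weight of value a for variable i; p[name][b]: weight of local config b."""
    for name, (indices, behavior) in BEHAVIORS.items():
        weights = [p[name].get(b, 0) for b in behavior]
        # Nonnegativity and normalization of the local distribution.
        if min(weights) < 0 or sum(weights) != 1:
            return False
        # Each variable's vector must equal the induced marginal.
        for pos, i in enumerate(indices):
            for a in ALPHABET:
                marginal = sum(w for w, b in zip(weights, behavior) if b[pos] == a)
                if marginal != q[i][a]:
                    return False
    return True

def is_integral(q):
    """True iff every coordinate of the variable part is 0 or 1."""
    return all(v in (0, 1) for dist in q.values() for v in dist.values())

half = Fr(1, 2)
# Integral point representing the valid configuration (0, 1, 1).
q_int = {0: {0: 1, 1: 0}, 1: {0: 0, 1: 1}, 2: {0: 0, 1: 1}}
p_int = {"parity": {(0, 1, 1): 1}, "equal": {(1, 1): 1}}
# Non-integral feasible point: an equal mixture of (0, 0, 0) and (0, 1, 1).
q_frac = {0: {0: 1, 1: 0}, 1: {0: half, 1: half}, 2: {0: half, 1: half}}
p_frac = {"parity": {(0, 0, 0): half, (0, 1, 1): half},
          "equal": {(0, 0): half, (1, 1): half}}
```

The non-integral point in this example happens to be a convex combination of two valid configurations, so it is not a vertex of the relaxed polytope; it is included only to exercise the test, and does not illustrate the fractional vertices that can arise on graphs with cycles.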
V Efficient Linear-Programming Relaxation and its Properties
We next define another linear program, and prove that its performance is equivalent to that defined in Section IV. This new program achieves lower description complexity than LP2 by removing unnecessary constraints from the formulation. We remove constraints in two ways: from the variable set, and by defining constraints with respect to an ‘anchor node’.
For each , let be an arbitrary element of , and let (note that for each , , otherwise is not a ‘variable’). For each , , define . Note that this indicator vector is the same as except that the entry corresponding to has been removed. Correspondingly, for each we define the indicator vector . Again, the mapping is injective.
Now, we define the vector similarly to but with entries corresponding to each removed, i.e.,
and we define the vector by
and for each , .
Also, for each , let be an arbitrary element of , i.e., is an arbitrary neighbouring factor node of the nonobservable variable and is referred to as the “anchor node” for that variable node in the factor graph. The LP is then as follows.
LP3: Low-complexity Relaxation
Cost Function:
Constraints (Polytope ): Constraints (7) and (8), together with (11) and (12)
Receiver Output: (13)
The receiver output is equal to the configuration in the case where the solution to LP3 is an integral point in , and reports FAILURE if the solution is not integral. LP3 comprises variables and constraints, where denotes the degree of and denotes the number of neighbours of which belong to .
The following theorem ensures the equivalence of the linear programs LP2 and LP3, and also assures the optimum certificate property, i.e., if the output of either LP is a configuration, then it is the optimum configuration. In the communications context, the optimum corresponds to the maximum likelihood transmit configuration; thus in this case we have the maximum likelihood certificate property.
Theorem V.1
The two linear programs LP2 and LP3 produce the same output (configuration or FAILURE). Also, if either LP output is an integral point in the LP polytope, then it corresponds to the optimum configuration, i.e., .
It is straightforward to show that the mapping
defined by
and with inverse
is a bijection from one polytope to the other (i.e., satisfies the constraints of LP3 for some vector if and only if with satisfies the constraints of LP2 for the same vector ). Also
(14)  
implying that the bijection preserves the cost function up to an additive constant.
Next, we prove that for every configuration , there exists such that . Let , and define
Letting and , it is easy to check that and (and that in fact ). This property ensures that every valid configuration has a “representative” in the polytope, and thus is a candidate for being output by the receiver.
Next, let and let be such that . Suppose that all of the coordinates of are integers. Then, by (7) and (8), for any we must have
for some .
Next we note that for any , , if then (using (9))
(15) 
and thus . Therefore, there exists such that
Therefore, is a valid configuration (). Also we may conclude from (15) that
and therefore . Also, from the definition of the mapping , we have .
Summarizing these results, we conclude that optimizes the cost function over and is integral if and only if optimizes the cost function over and is integral, where and . A graphical illustration of these relationships is shown in Figure 1. Thus both LP receivers output either the optimum configuration or FAILURE, and have the same performance. LP3 has lower descriptive complexity and is suitable for implementation (we shall use it to solve the joint equalization and decoding problem in Section VII); however, for theoretical work LP2 is more suitable (we shall use this LP throughout Section VI).
VI Pseudoconfigurations
In this section, we show a connection between the failure of the LP and SPA receivers based on pseudoconfiguration concepts, and define a general concept of pseudodistance for LP receivers.
VI-A Connecting the failure mechanisms of the LP and SPA receivers
We first define what is meant by a finite cover of a factor graph.
Definition VI.1
Let be a positive integer, and let . Let be the factor graph corresponding to the global function and its factorization given in (1). A cover configuration of degree is a vector where for each . Define as the following function of the cover configuration of degree :
(16) 
where, for each , , is a permutation on the set , and for each , ,
A cover of the factor graph , of degree , is a factor graph for the global function and its factorization (16). In order to distinguish between different factor node labels, we write (16) as
where for each , .
It may be seen that a cover graph of degree is a graph whose vertex set consists of copies of (labeled ) and copies of (labeled ), such that for each , , the copies of and the copies of are connected in a one-to-one fashion determined by the permutation .
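The construction of a cover graph from edge permutations can be sketched as follows. Each original edge between a variable node and a factor node is replaced by one edge per copy, with copy m of the variable joined to the copy of the factor selected by the permutation for that edge. The function names `build_cover` and `node_degrees`, the node labels, and the convention that a missing permutation defaults to the identity are assumptions made for this illustration.

```python
def build_cover(edges, degree, perms):
    """Return the edge list of a cover graph of the given degree.

    edges: list of (variable, factor) pairs of the original factor graph.
    perms: dict mapping an edge to a permutation of range(degree), given as a
           tuple; edges without an entry use the identity permutation.
    """
    cover_edges = []
    for (i, j) in edges:
        perm = perms.get((i, j), tuple(range(degree)))
        if sorted(perm) != list(range(degree)):
            raise ValueError("edge %r: not a permutation" % ((i, j),))
        for m in range(degree):
            # Copy m of variable i is joined to copy perm[m] of factor j.
            cover_edges.append(((i, m), (j, perm[m])))
    return cover_edges

def node_degrees(edge_list):
    """Count how many edges touch each node."""
    degrees = {}
    for u, v in edge_list:
        degrees[u] = degrees.get(u, 0) + 1
        degrees[v] = degrees.get(v, 0) + 1
    return degrees

# Hypothetical factor graph forming a 4-cycle: two variables, two factors.
EDGES = [("x0", "fa"), ("x0", "fb"), ("x1", "fa"), ("x1", "fb")]
# Degree-2 cover in which a single edge uses the swap permutation.
cover = build_cover(EDGES, 2, {("x1", "fb"): (1, 0)})
```

Every node copy in the cover has the same degree as the corresponding original node, which reflects the local indistinguishability that the message-passing argument relies on.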
Definition VI.2
The cover behavior is defined as the set of all cover configurations such that for each , . For any , a graph-cover pseudoconfiguration is defined to be a valid cover configuration (i.e., one which lies in the behavior ).
Definition VI.3
For any graph-cover pseudoconfiguration, the (unscaled) graph-cover pseudoconfiguration vector is defined by
and
for each , . The normalized graph-cover pseudoconfiguration vector is then defined by .
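A minimal sketch of how such a vector may be computed is given below. It assumes that the unscaled entry for a variable and a symbol value counts the number of copies of that variable taking that value in the cover configuration, and that normalization divides by the cover degree; only the variable part of the vector is shown. The function name `pseudoconfiguration_vector` and the example data are hypothetical.

```python
from fractions import Fraction

def pseudoconfiguration_vector(cover_config, alphabets, degree):
    """Return (unscaled, normalized) count vectors for the variable nodes.

    cover_config: dict mapping (variable, copy index) to a symbol value.
    alphabets: dict mapping each variable to its tuple of symbol values.
    """
    unscaled = {
        i: {a: sum(1 for m in range(degree) if cover_config[(i, m)] == a)
            for a in alphabet}
        for i, alphabet in alphabets.items()
    }
    normalized = {
        i: {a: Fraction(n, degree) for a, n in counts.items()}
        for i, counts in unscaled.items()
    }
    return unscaled, normalized

# Hypothetical degree-2 cover configuration of two binary variables.
example_config = {("x0", 0): 0, ("x0", 1): 1, ("x1", 0): 1, ("x1", 1): 1}
example_alphabets = {"x0": (0, 1), "x1": (0, 1)}
unscaled, normalized = pseudoconfiguration_vector(example_config, example_alphabets, 2)
```

In this example the two copies of the first variable disagree, so its normalized entries are both one half, while the second variable yields an integral entry.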
The set of graph-cover pseudocodewords has previously been shown to be responsible, to an approximate degree, for the failure of SPA decoding of binary linear codes (see e.g. [16]). Such arguments generalize in a straightforward manner to the present context; the following provides a brief overview. The SPA receiver passes messages on the edges of the factor graph of the global function; SPA processing begins by passing the message from each degree-1 factor node neighbouring (), and thereafter follows a preset (usually periodic) message-passing schedule. The message-passing algorithm is “local” in that the message passed from any node to any other node is a function only of the messages incoming at from all neighbours of other than . Assume that the SPA receiver running on the original graph yields the optimum configuration ; recall that this is the maximum, over all valid configurations , of the function . Of course, the SPA receiver does not actually seek to maximize this function, but instead seeks to marginalize this function with respect to each relevant local variable , and subsequently choose the value of which maximizes each marginal (see [5] for further details).
Next consider the SPA decoding algorithm operating on a cover graph of of degree , with the same schedule except that we replace message-passing between any pair of original nodes in at any iteration by parallel message-passing between the set of copies of these nodes in the cover of at iteration . Then, since SPA processing on the cover graph begins by passing the (replicated) message from the (replicated) degree-1 factor node neighbouring , (for each ), and the schedule matches at each iteration as described above, a straightforward inductive argument shows that the set of messages passed from nodes to nodes (for fixed and considering all ) in the cover graph at iteration consists of identical copies of the message passed from node to node at iteration . Then the SPA decoder running on the cover graph of degree of must yield the cover configuration with for all (sometimes called a lifting of the configuration [16]). However, if we assume that the SPA receiver returns the maximum, over all valid configurations , of the global function, i.e.,
(17) 
this yields a contradiction whenever there exists a graph-cover pseudoconfiguration with lower cost .
Note that the above reasoning holds under the assumption that the SPA algorithm has the property that it always returns the optimum configuration for the graph on which it operates. This is only an approximation, and it is for this reason that the role of graph-cover pseudoconfigurations in SPA decoding is only an approximate model, whereas for the LP receiver the model is exact. However, the approximation can be quite accurate; SPA decoding failure is exactly characterized by the computation tree pseudocodewords of the system’s factor graph, and the graph-cover pseudocodewords may be taken as an approximation of the computation tree pseudocodewords, since the local neighbourhood of any variable node is identical to some depth in both graphs. For more discussion on these connections in the context of linear codes, see e.g. [16].
(19) 
Definition VI.4
A linear-programming pseudoconfiguration (LP pseudoconfiguration) is a rational point in the polytope of the linear program LP2.
Next, we state the equivalence between the set of LP pseudoconfigurations and the set of graph-cover pseudoconfigurations. The result is summarized in the following theorem.
Theorem VI.1
There exists an LP pseudoconfiguration if and only if there exists a graph-cover pseudoconfiguration with normalized pseudoconfiguration vector .
The proof of Theorem VI.1 follows the lines of the proof of [13, Theorem 7.1]; the details are omitted. Theorem VI.1 shows that the pseudoconfigurations which exactly characterize the performance of the LP receiver are precisely equivalent to the pseudoconfigurations which (due to the argument above) approximately characterize performance of the SPA receiver.
VI-B Pseudodistance
In this section we define the concept of system pseudodistance for communication systems where the set of variables is observed through complex additive white Gaussian noise (AWGN). For each , we have an observation which is formed by passing the symbol through a modulation mapper and adding complex Gaussian noise with variance per real dimension (here ). The mapper operates according to the following rule: for , is mapped to . Note that this includes cases where different symbols may use different mappers, e.g. orthogonal frequency division multiplexing (OFDM) systems with adaptive modulation. Then
(18) 
In what follows, we denote the transmitted and received vectors by and respectively.
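For concreteness, the observation model above can be sketched numerically. The code below assumes that the degree-1 factor for an observable is the standard complex Gaussian likelihood with the stated variance per real dimension, and evaluates its logarithm for every candidate symbol of a mapper. The function name `awgn_log_likelihoods`, the BPSK mapper and the numerical values are illustrative assumptions.

```python
import math

def awgn_log_likelihoods(y, mapper, sigma2):
    """Log-likelihood of observation y for each symbol under complex AWGN.

    y: complex received sample.
    mapper: dict from symbol value to complex modulated point.
    sigma2: noise variance per real dimension.
    """
    return {
        a: -abs(y - s) ** 2 / (2.0 * sigma2) - math.log(2.0 * math.pi * sigma2)
        for a, s in mapper.items()
    }

# Hypothetical BPSK mapper and received sample.
BPSK = {0: 1 + 0j, 1: -1 + 0j}
log_likelihoods = awgn_log_likelihoods(0.8 + 0.1j, BPSK, sigma2=0.5)
```

The normalization term is common to all symbols, so only differences between log-likelihoods affect which configuration an LP cost function built from them would favour.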
Suppose that the actual transmitter configuration is , and let . The LP receiver LP2 favours the pseudoconfiguration over if and only if , i.e., if and only if
Using (18), this condition is easily seen to be equivalent to
Using , this may be rewritten as
where we introduce , , and
In the absence of noise, the modulated signal point in the signal space with dimensions and coordinates and is given by and for all . The squared Euclidean distance from this point to the plane is then given by , where
and
Thus the decision boundary is the same as that induced under ML reception by a signal vector at a Euclidean distance from the transmit signal vector in the signal space; this motivates the following definition.
Definition VI.5
The effective Euclidean distance or system pseudodistance between the configuration and the pseudoconfiguration