Reduced-Dimension Linear Transform Coding of Correlated Signals in Networks
Abstract
A model, called the linear transform network (LTN), is proposed to analyze the compression and estimation of correlated signals transmitted over directed acyclic graphs (DAGs). An LTN is a DAG network with multiple source and receiver nodes. Source nodes transmit subspace projections of random correlated signals by applying reduced-dimension linear transforms. The subspace projections are linearly processed by multiple relays and routed to intended receivers. Each receiver applies a linear estimator to approximate a subset of the sources with minimum mean squared error (MSE) distortion. The model is extended to include noisy networks with power constraints on transmitters. A key task is to compute all local compression matrices and linear estimators in the network to minimize end-to-end distortion. The non-convex problem is solved iteratively within an optimization framework using constrained quadratic programs (QPs). The proposed algorithm recovers as special cases the regular and distributed Karhunen-Loève transforms (KLTs). Cut-set lower bounds on the distortion region of multi-source, multi-receiver networks are given for linear coding based on convex relaxations. Cut-set lower bounds are also given for any coding strategy based on information theory. The distortion region and compression-estimation tradeoffs are illustrated for different communication demands (e.g. multiple unicast) and graph structures.
I Introduction
The compression and estimation of an observed signal via subspace projections is both a classical and current topic in signal processing and communication. While random subspace projections have received considerable attention in the compressed sensing literature [1], subspace projections optimized for minimal distortion are important for many applications. The Karhunen-Loève transform (KLT) and its empirical form, Principal Components Analysis (PCA), are widely studied in computer vision, biology, signal processing, and information theory. Reduced-dimensionality representations are useful for source coding, noise filtering, compression, clustering, and data mining. Specific examples include eigenfaces for face recognition, orthogonal decomposition in transform coding, and sparse PCA for gene analysis [2, 3, 4].
In contemporary applications such as wireless sensor networks (WSNs) and distributed databases, data is available and collected in different locations. In a WSN, sensors are usually constrained by limited power and bandwidth resources. This has motivated existing approaches to take into account correlations across high-dimensional sensor data to reduce transmission requirements (see e.g. [5, 6, 7, 8, 9, 10, 11]). Rather than transmitting raw sensor data to a fusion center to approximate a global signal, sensor nodes carry out local data dimensionality reduction to increase bandwidth and energy efficiency.
In the present paper, we propose a linear transform network (LTN) model to analyze dimensionality reduction for compression-estimation of correlated signals in multi-hop networks. In a centralized setting, given a random source signal $\mathbf{x}$ with zero mean and covariance matrix $\mathbf{\Sigma}_x$, applying the KLT to $\mathbf{x}$ yields uncorrelated components in the eigenvector basis of $\mathbf{\Sigma}_x$. The optimal linear least squares order-$k$ approximation of the source is given by the $k$ components corresponding to the $k$ largest eigenvalues of $\mathbf{\Sigma}_x$. In a network setting, multiple correlated signals are observed by different source nodes. The source nodes transmit low-dimensional subspace projections (approximations of the source) to intended receivers via a relay network. The compression-estimation problem is to optimize the subspace projections computed by all nodes in order to minimize the end-to-end distortion at receiver nodes.
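The centralized order-$k$ KLT approximation described above can be sketched numerically as follows; the covariance and all variable names are illustrative, not the paper's notation.

```python
import numpy as np

# Minimal sketch of the centralized KLT: project onto the eigenvectors
# of the covariance associated with the k largest eigenvalues.
rng = np.random.default_rng(0)
H = rng.standard_normal((5, 5))
Sigma_x = H @ H.T                     # a zero-mean source covariance (PSD)

# Eigendecomposition of the covariance; eigh returns ascending eigenvalues.
eigvals, U = np.linalg.eigh(Sigma_x)
U = U[:, ::-1]                        # reorder so eigenvalues descend

k = 2
L = U[:, :k].T                        # k x 5 reduced-dimension transform
x = rng.standard_normal(5)
x_hat = L.T @ (L @ x)                 # order-k approximation of x
```

Because the rows of `L` are orthonormal eigenvectors, `L @ L.T` is the identity, and increasing `k` can only reduce the approximation error.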
In our model, receivers estimate random vectors based on “one-shot” linear analog-amplitude multi-sensor observations. The restriction to “one-shot”, zero-delay encoding of each vector of source observations separately is interesting due to severe complexity limitations in many applications (e.g. sensor networks). Linear coding depends on first-order and second-order statistics and is robust to uncertainty in the precise probabilistic distribution of the sources. Under the assumption of ideal channels between nodes, our task is to optimize signal subspaces given limited bandwidth in terms of the number of real-valued messages communicated. Our results extend previous work on distributed estimation in this case [5, 6, 7, 8]. For the case of dimensionality reduction with noisy channel communication (see e.g. [6]), the task is to optimize signal subspaces subject to channel noise and power constraints.
For noisy networks, the general communication problem is often referred to as the joint source-channel-network coding problem in the information-theoretic literature and is a famously open problem. Beyond the zero-delay, linear dimensionality reduction considered here, end-to-end performance in networks could be improved by (i) nonlinear strategies and (ii) allowing a longer coding horizon. Partial progress includes nonlinear low-delay mappings for only simple network scenarios [12, 13, 14]. For the case of an infinite coding horizon, separation theorems for decomposing the joint communication problem have been analyzed in [15, 16, 17].
I-A Related Work
Directly related to our work in networks is the distributed KLT problem. Distributed linear transforms were introduced by Gastpar et al. for the compression of jointly Gaussian sources using iterative methods [5, 18]. Simultaneous work by Zhang et al. on multi-sensor data fusion also resulted in iterative procedures [8]. An alternate proof based on innovations for second-order random variables with arbitrary distributions was given in [19]. The problem was extended to non-Gaussian sources, including channel fading and noise effects to model the non-ideal link from sensors to decoder, by Schizas et al. [6]. Roy and Vetterli provide an asymptotic distortion analysis of the distributed KLT in the case when the dimension of the source and observation vectors approaches infinity [20]. Finally, Xiao et al. analyze linear transforms for distributed coherent estimation [7].
Much of the estimation-theoretic literature deals with single-hop networks: each sensor relays information directly to a fusion center. In multi-hop networks, linear operations are performed by successive relays to aggregate, compress, and redistribute correlated signals. The LTN model relates to recent work on routing and network coding (Ahlswede et al. [21]). In pure routing solutions, intermediate nodes either forward or drop packets. The corresponding analogy in the LTN model is to constrain transforms to be essentially identity transforms. However, network coding (over finite fields) has shown that mixing of data at intermediate nodes achieves higher rates in the multicast setting (see [22] regarding the sufficiency of linear codes and [23] for multicast code construction). Similarly, in the LTN model, linear combining of subspace projections (over the real field) at intermediate nodes improves decoding performance. Lastly, the max-flow min-cut theorem of Ford and Fulkerson [24] provides the basis for cut-set lower bounds in networks.
The LTN model is partially related to the formulation of Koetter and Kschischang [25], which models information transmission as the injection of a basis for a vector space into the network, and to subspace codes [26]. If arbitrary data exchange is permitted between network nodes, the compression-estimation problem is related to estimation in graphical models (e.g. decomposable PCA [27] and tree-based transforms (tree-KLT) [28]). Other related work involving signal projections in networks includes joint source-channel communication in sensor networks [29], random projections in a gossip framework [30], and distributed compressed sensing [31].
I-B Summary of Main Results
We cast the network compression-estimation problem as a statistical signal processing and constrained optimization problem. For most networks, the optimization is non-convex. Therefore, our main results are divided into two categories: (i) iterative solutions for linear transform coding over acyclic networks; (ii) cut-set bounds based on convex relaxations and cut-set bounds based on information theory.

- Section V analyzes an iterative optimization method involving constrained quadratic programs for noisy networks with power allocation over subspaces.

- Section VI introduces cut-set lower bounds to benchmark the minimum mean square error (MSE) for linear coding, based on convex relaxations such as a semidefinite program (SDP) relaxation.

- Section VI-F describes cut-set lower bounds for any coding strategy in networks, based on information-theoretic principles of source-channel separation. The lower bounds are plotted for a distributed noisy network.
I-C Notation
Boldface upper case letters denote matrices, boldface lower case letters denote column vectors, and calligraphic upper case letters denote sets. The norm of a vector $\mathbf{x}$ is defined as $\|\mathbf{x}\|_2 \triangleq \sqrt{\mathbf{x}^T\mathbf{x}}$. The weighted norm is $\|\mathbf{x}\|_{\mathbf{W}} \triangleq \sqrt{\mathbf{x}^T\mathbf{W}\mathbf{x}}$, where $\mathbf{W}$ is a positive semidefinite matrix (written $\mathbf{W} \succeq \mathbf{0}$). Let $(\cdot)^T$, $(\cdot)^{-1}$, and $\operatorname{tr}(\cdot)$ denote matrix transpose, inverse, and trace respectively. Let $\otimes$ denote the Kronecker matrix product of two matrices. The matrix $\mathbf{I}$ denotes the identity. For $n \geq m$, the notation $\prod_{i=m}^{n}\mathbf{A}_i \triangleq \mathbf{A}_n\mathbf{A}_{n-1}\cdots\mathbf{A}_m$ denotes the (reverse-ordered) product of matrices. A matrix $\mathbf{A}$ is written in vector form by stacking its columns; i.e. $\operatorname{vec}(\mathbf{A}) \triangleq [\mathbf{a}_1^T\ \mathbf{a}_2^T\ \cdots\ \mathbf{a}_n^T]^T$ where $\mathbf{a}_j$ is the $j$th column of $\mathbf{A}$. For random vectors, $\mathbb{E}[\cdot]$ denotes the expectation, and $\mathbf{\Sigma}_x \triangleq \mathbb{E}[\mathbf{x}\mathbf{x}^T]$ denotes the covariance matrix of the zero-mean random vector $\mathbf{x}$.
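The column-stacking and weighted-norm conventions above can be checked with a small sketch; the numeric values are arbitrary.

```python
import numpy as np

# vec(A): stack the columns of A, i.e. reshape in column-major (Fortran) order.
A = np.array([[0., 1., 2.],
              [3., 4., 5.]])
vec_A = A.reshape(-1, order="F")      # columns stacked: [0, 3, 1, 4, 2, 5]

# Weighted norm ||x||_W = sqrt(x^T W x) for a positive semidefinite W.
x = np.array([1.0, 2.0])
W = np.diag([2.0, 3.0])
weighted_norm = np.sqrt(x @ W @ x)    # sqrt(1*2 + 4*3) = sqrt(14)
```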
II Problem Statement
Fig. 1 serves as an extended example of an LTN graph. The network comprises two sources, two relays, and two receiver nodes.
Definition 1 (Relay Network)
Consider a relay network modeled by a directed acyclic graph (DAG) $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and a set of weights $\mathcal{C}$. The set $\mathcal{V}$ is the vertex/node set, $\mathcal{E}$ is the edge set, and $\mathcal{C} = \{c_{uv} : (u,v) \in \mathcal{E}\}$ is the set of weights. Each edge $(u,v) \in \mathcal{E}$ represents a communication link with integer bandwidth $c_{uv}$ from node $u$ to node $v$. The in-degree and out-degree of a node $v \in \mathcal{V}$ are computed as

$$d_{\mathrm{in}}(v) = \sum_{u\,:\,(u,v) \in \mathcal{E}} c_{uv}, \qquad (1)$$
$$d_{\mathrm{out}}(v) = \sum_{w\,:\,(v,w) \in \mathcal{E}} c_{vw}. \qquad (2)$$
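The bandwidth-weighted degree computations of Eqns. (1)-(2) can be sketched as follows; the edge set and bandwidth values are hypothetical, not those of Fig. 1.

```python
# Hypothetical DAG with integer edge bandwidths c[(u, v)].
c = {(1, 3): 2, (2, 3): 1, (3, 4): 2, (4, 5): 1, (4, 6): 1}

def deg_in(v):
    # Sum of bandwidths over incoming edges (u, v), per Eqn. (1).
    return sum(bw for (u, w), bw in c.items() if w == v)

def deg_out(v):
    # Sum of bandwidths over outgoing edges (v, w), per Eqn. (2).
    return sum(bw for (u, w), bw in c.items() if u == v)
```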
As an example, the graph in Fig. 1 consists of $|\mathcal{V}| = 6$ nodes. Integer bandwidths for each communication link are marked.
Definition 2 (Source and Receiver Nodes)
Given a relay network $\mathcal{G}$, the set of source nodes is defined as $\mathcal{S} \triangleq \{v \in \mathcal{V} : d_{\mathrm{in}}(v) = 0\}$. We assume a labeling of nodes in $\mathcal{V}$ so that $\mathcal{S} = \{1, 2, \ldots, |\mathcal{S}|\}$, i.e. the first $|\mathcal{S}|$ nodes are source nodes. The set of receiver nodes is defined as $\mathcal{T} \triangleq \{v \in \mathcal{V} : d_{\mathrm{out}}(v) = 0\}$.¹ Let $m = |\mathcal{V}|$. We assume a labeling of nodes in $\mathcal{V}$ so that $\mathcal{T} = \{m - |\mathcal{T}| + 1, \ldots, m\}$, i.e. the last $|\mathcal{T}|$ nodes are receiver nodes.

¹For networks of interest in this paper, an arbitrary DAG may be augmented with auxiliary nodes to ensure that source nodes have in-degree zero and receiver nodes have out-degree zero.
In Fig. 1, $\mathcal{S} = \{1, 2\}$ and $\mathcal{T} = \{5, 6\}$.
II-A Source Model
Definition 3 (Basic Source Model)
Given a relay network $\mathcal{G}$ with source/receiver nodes $(\mathcal{S}, \mathcal{T})$, the source nodes observe random signals $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_{|\mathcal{S}|}$, where $\mathbf{x}_i \in \mathbb{R}^{n_i}$. The random vectors are assumed zero-mean with covariances $\mathbf{\Sigma}_{x_i}$ and cross-covariances $\mathbf{\Sigma}_{x_i x_j} \triangleq \mathbb{E}[\mathbf{x}_i\mathbf{x}_j^T]$. Let $n \triangleq \sum_{i=1}^{|\mathcal{S}|} n_i$. The distributed network sources may be grouped into an $n$-dimensional random vector $\mathbf{x}$ with known second-order statistics $\mathbf{\Sigma}_x$,

$$\mathbf{x} \triangleq \big[\mathbf{x}_1^T\ \mathbf{x}_2^T\ \cdots\ \mathbf{x}_{|\mathcal{S}|}^T\big]^T. \qquad (3)$$
More generally, each source node emits independent and identically distributed (i.i.d.) source vectors for a discrete time index; however, in the analysis of zero-delay linear coding, we do not write the time indices explicitly.
Remark 1
A common linear signal-plus-noise model for sensor networks is of the form $\mathbf{x} = \mathbf{H}\mathbf{s} + \mathbf{w}$; however, neither a linear source model nor the specific distribution of $\mathbf{x}$ is assumed here. A priori knowledge of second-order statistics may be obtained during a training phase via sample estimation.
In Fig. 1, the two source nodes observe the corresponding random signals $\mathbf{x}_1$ and $\mathbf{x}_2$ in $\mathbf{x}$.
II-B Communication Model
Definition 4 (Communication Model)
Given a relay network $\mathcal{G}$ with weight set $\mathcal{C}$, each edge $(u,v) \in \mathcal{E}$ represents a communication link of bandwidth $c_{uv}$ from $u$ to $v$. The bandwidth is the dimension of the vector channel. We denote signals exiting node $u$ along edge $(u,v)$ by $\mathbf{y}_{uv} \in \mathbb{R}^{c_{uv}}$ and signals entering node $v$ along edge $(u,v)$ by $\mathbf{z}_{uv} \in \mathbb{R}^{c_{uv}}$. If communication is noiseless, $\mathbf{z}_{uv} = \mathbf{y}_{uv}$. For all relay nodes and receiver nodes $v$, we further define $\mathbf{z}_v$ to be the concatenation of all signals incident to node $v$ along edges $(u,v) \in \mathcal{E}$.

A noisy communication link is modeled as $\mathbf{z}_{uv} = \mathbf{y}_{uv} + \mathbf{n}_{uv}$. The channel noise $\mathbf{n}_{uv}$ is a Gaussian random vector with zero mean and covariance $\mathbf{\Sigma}_{n_{uv}}$. The channel input is power constrained so that $\mathbb{E}\big[\|\mathbf{y}_{uv}\|_2^2\big] \leq P_{uv}$. The power constraints for a network are given by the set $\mathcal{P} \triangleq \{P_{uv} : (u,v) \in \mathcal{E}\}$. The signal-to-noise ratio (SNR) along a link is

$$\mathrm{SNR}_{uv} \triangleq \frac{\mathbb{E}\big[\|\mathbf{y}_{uv}\|_2^2\big]}{\mathbb{E}\big[\|\mathbf{n}_{uv}\|_2^2\big]}. \qquad (4)$$
Fig. 1(b) illustrates the signal flow of an LTN graph.
II-C Linear Encoding over Graph $\mathcal{G}$
Source and relay nodes encode random vector signals by applying reduced-dimension linear transforms.
Definition 5 (Linear Encoding)
Given a relay network $\mathcal{G}$, weight set $\mathcal{C}$, source/receiver nodes $(\mathcal{S}, \mathcal{T})$, sources $\mathbf{x}$, and the communication model of Definition 4, the linear encoding matrices for $\mathcal{G}$ are denoted by the set $\mathcal{L} \triangleq \{\mathbf{L}_{uv} : (u,v) \in \mathcal{E}\}$. Each $\mathbf{L}_{uv}$ represents the linear transform applied by node $u$ in communication with node $v$. For $u \in \mathcal{S}$, transform $\mathbf{L}_{uv}$ is of size $c_{uv} \times n_u$ and represents the encoding $\mathbf{y}_{uv} = \mathbf{L}_{uv}\mathbf{x}_u$. For a relay $u$, transform $\mathbf{L}_{uv}$ is of size $c_{uv} \times d_{\mathrm{in}}(u)$, and $\mathbf{y}_{uv} = \mathbf{L}_{uv}\mathbf{z}_u$. The compression ratio along edge $(u,v)$ is

$$\gamma_{uv} \triangleq \frac{c_{uv}}{n_u}, \quad u \in \mathcal{S}, \qquad (5a)$$
$$\gamma_{uv} \triangleq \frac{c_{uv}}{d_{\mathrm{in}}(u)}, \quad u \in \mathcal{V} \setminus \mathcal{S}. \qquad (5b)$$
In Fig. 1, source nodes 1 and 2 apply linear encoding matrices of the form $\mathbf{L}_{1v}$ and $\mathbf{L}_{2v}$ on their outgoing edges, and the relays apply linear encoding matrices of the form $\mathbf{L}_{3v}$ and $\mathbf{L}_{4v}$. The output signal of a source node $u$ along edge $(u,v)$ is $\mathbf{y}_{uv} = \mathbf{L}_{uv}\mathbf{x}_u$. Similarly, the output signal of a relay $u$ along edge $(u,v)$ is

$$\mathbf{y}_{uv} = \mathbf{L}_{uv}\mathbf{z}_u. \qquad (6)$$
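A relay's encoding step in Eqn. (6) amounts to concatenating its incoming signals and applying one matrix per outgoing edge; a small sketch with arbitrary dimensions and random data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Signals arriving at a relay on two in-edges of bandwidths 2 and 1.
z_a = rng.standard_normal(2)
z_b = rng.standard_normal(1)
z_u = np.concatenate([z_a, z_b])      # concatenated observation, d_in(u) = 3

# The relay compresses its 3-dimensional input onto a bandwidth-2 out-edge.
L_uv = rng.standard_normal((2, 3))    # c_uv x d_in(u) encoding matrix
y_uv = L_uv @ z_u                     # Eqn. (6)
```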
II-D Linear Estimation over $\mathcal{G}$
Definition 6 (Linear Estimation)
Given relay network $\mathcal{G}$, weight set $\mathcal{C}$, source/receiver nodes $(\mathcal{S}, \mathcal{T})$, sources $\mathbf{x}$, and the communication model of Def. 4, the set of linear decoding matrices is denoted $\mathcal{B} \triangleq \{\mathbf{B}_t : t \in \mathcal{T}\}$. Each receiver $t \in \mathcal{T}$ estimates a (zero-mean) random vector $\mathbf{r}_t$ which is correlated with the sources in $\mathbf{x}$. We assume that the second-order statistics $\mathbf{\Sigma}_{r_t}$, $\mathbf{\Sigma}_{r_t x}$ are known. Receiver $t$ applies a linear estimator given by matrix $\mathbf{B}_t$ to estimate $\mathbf{r}_t$ given its observations $\mathbf{z}_t$, and computes $\hat{\mathbf{r}}_t = \mathbf{B}_t\mathbf{z}_t$. The linear least squares estimate (LLSE) of $\mathbf{r}_t$ is the estimate $\hat{\mathbf{r}}_t$ obtained with the MSE-optimal $\mathbf{B}_t$.
In Fig. 1, receiver 5 reconstructs $\hat{\mathbf{r}}_5$ while receiver 6 reconstructs $\hat{\mathbf{r}}_6$. The LLSE signals $\hat{\mathbf{r}}_5$ and $\hat{\mathbf{r}}_6$ are computed as

$$\hat{\mathbf{r}}_5 = \mathbf{B}_5\,\mathbf{z}_5, \qquad (7)$$
$$\hat{\mathbf{r}}_6 = \mathbf{B}_6\,\mathbf{z}_6. \qquad (8)$$
Definition 7 (Distortion Metric)
Let $\mathbf{r}$ and $\hat{\mathbf{r}}$ be two real vectors of the same dimension. The MSE distortion metric is defined as

$$d_{\mathrm{MSE}}(\mathbf{r}, \hat{\mathbf{r}}) \triangleq \mathbb{E}\big[\|\mathbf{r} - \hat{\mathbf{r}}\|_2^2\big]. \qquad (9)$$
II-E Compression-Estimation in Networks
Definition 8 (Linear Transform Network $\mathcal{N}$)
An LTN model $\mathcal{N}$ is a communication network modeled by DAG $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, weight set $\mathcal{C}$, source/receiver nodes $(\mathcal{S}, \mathcal{T})$, sources $\mathbf{x}$, and the sets $\mathcal{L}$ and $\mathcal{B}$ from Definitions 1-6. Second-order source statistics are given by $\mathbf{\Sigma}_x$ (Definition 3). The operational meaning of compression-estimation matrices in $\mathcal{L}$ and $\mathcal{B}$ is in terms of signal flows on $\mathcal{G}$ (Definition 4). The desired reconstruction vectors $\mathbf{r}_t$ have known second-order statistics $\mathbf{\Sigma}_{r_t}$ and $\mathbf{\Sigma}_{r_t x}$. The set $\{\hat{\mathbf{r}}_t : t \in \mathcal{T}\}$ denotes the LLSE estimates formed at receivers (Definition 6). For noisy networks, noise variables $\mathbf{n}_{uv}$ along link $(u,v)$ have known covariances $\mathbf{\Sigma}_{n_{uv}}$. Power constraints are given by the set $\mathcal{P}$ in Definition 4.

Given an LTN graph $\mathcal{N}$, the task is to design a network transform code: the compression-estimation matrices in $\mathcal{L}$ and $\mathcal{B}$ that minimize the end-to-end weighted MSE distortion. Let positive weights $w_t$ represent the relative importance of reconstructing a signal at receiver $t \in \mathcal{T}$. Using indexing term $i = 1, \ldots, |\mathcal{T}|$ for receiver nodes $t_i \in \mathcal{T}$, we concatenate vectors as $\mathbf{r} \triangleq [\mathbf{r}_{t_1}^T\ \cdots\ \mathbf{r}_{t_{|\mathcal{T}|}}^T]^T$ and LLSE estimates as $\hat{\mathbf{r}} \triangleq [\hat{\mathbf{r}}_{t_1}^T\ \cdots\ \hat{\mathbf{r}}_{t_{|\mathcal{T}|}}^T]^T$. The average weighted MSE, written via a weighted norm, is

$$D_{\mathrm{MSE}} \triangleq \mathbb{E}\big[\|\mathbf{r} - \hat{\mathbf{r}}\|_{\mathbf{W}}^2\big], \qquad (10)$$

where $\mathbf{W}$ contains diagonal blocks $w_t\mathbf{I}$.
Remark 2
The distortion $D_{\mathrm{MSE}}$ is a function of the compression matrices in $\mathcal{L}$ and the estimation matrices in $\mathcal{B}$. In most network topologies, the weighted MSE distortion is non-convex over the set of feasible matrices. Even in the particular case of distributed compression [5], the optimal linear transforms are currently not solvable in closed form.
III Linear Signal Processing in Networks
The linear processing and filtering of source signals by an LTN graph is modeled compactly as a linear system with inputs, outputs, and memory elements. At each time step, LTN nodes transmit random signals through edges/channels of the graph.
III-A Linear System
Consider edge $(u,v) \in \mathcal{E}$ as a memory element storing the random vector $\mathbf{z}_{uv} \in \mathbb{R}^{c_{uv}}$. Let $n \triangleq \sum_{i \in \mathcal{S}} n_i$ and $c \triangleq \sum_{(u,v) \in \mathcal{E}} c_{uv}$. The network is modeled as a linear system with the following signals: (i) input sources concatenated as the global source vector $\mathbf{x} \in \mathbb{R}^n$; (ii) input noise variables concatenated as the global noise vector $\mathbf{n} \in \mathbb{R}^c$; (iii) memory elements concatenated as the global state vector $\mathbf{s}[k] \in \mathbb{R}^c$ at time $k$; (iv) output vectors concatenated as $\mathbf{z} \triangleq [\mathbf{z}_{t_1}^T\ \cdots\ \mathbf{z}_{t_{|\mathcal{T}|}}^T]^T$.
III-A1 State-Space Equations
The linear system² is described by the following state-space equations for $k \geq 0$,

$$\mathbf{s}[k+1] = \mathbf{A}\,\mathbf{s}[k] + \mathbf{F}\,\mathbf{x}[k] + \mathbf{F}'\,\mathbf{n}[k], \qquad (11)$$
$$\mathbf{z}_t[k] = \mathbf{C}_t\,\mathbf{s}[k] + \mathbf{D}_t\,\mathbf{x}[k] + \mathbf{D}'_t\,\mathbf{n}[k]. \qquad (12)$$

The matrix $\mathbf{A}$ is the state-evolution matrix common to all receivers, $\mathbf{F}$ is the source-network connectivity matrix, and $\mathbf{F}'$ is the noise-to-network connectivity matrix. The matrices $\mathbf{C}_t$, $\mathbf{D}_t$, and $\mathbf{D}'_t$ represent how each receiver's output is related to the state, source, and noise vectors respectively. For networks considered in this paper, $\mathbf{D}_t = \mathbf{0}$ and $\mathbf{D}'_t = \mathbf{0}$.

²When discussing zero-delay linear coding, the time indices on the vectors $\mathbf{s}$, $\mathbf{x}$, and $\mathbf{n}$ are omitted for greater clarity of presentation.
III-A2 Linear Transfer Function
A standard result in linear system theory yields the transfer function (assuming a unit indeterminate delay operator) for each receiver $t \in \mathcal{T}$,

$$\mathbf{z}_t = \mathbf{G}_t\,\mathbf{x} + \mathbf{G}'_t\,\mathbf{n}, \qquad (13)$$
$$\mathbf{G}_t = \mathbf{C}_t(\mathbf{I} - \mathbf{A})^{-1}\mathbf{F}, \qquad (14)$$

where $\mathbf{G}'_t = \mathbf{C}_t(\mathbf{I} - \mathbf{A})^{-1}\mathbf{F}'$. For acyclic graphs, $\mathbf{A}$ is a nilpotent matrix and $\mathbf{A}^k = \mathbf{0}$ for some finite integer $k$. Using indexing term $i = 1, \ldots, |\mathcal{T}|$, the observation vectors collected by receivers are concatenated as $\mathbf{z} = [\mathbf{z}_{t_1}^T\ \cdots\ \mathbf{z}_{t_{|\mathcal{T}|}}^T]^T$. Let

$$\mathbf{G} \triangleq \big[\mathbf{G}_{t_1}^T\ \mathbf{G}_{t_2}^T\ \cdots\ \mathbf{G}_{t_{|\mathcal{T}|}}^T\big]^T \qquad (15)$$

and let $\mathbf{G}'$ be defined similarly with respect to the matrices $\mathbf{G}'_t$. Then the complete linear transfer function of the network is $\mathbf{z} = \mathbf{G}\mathbf{x} + \mathbf{G}'\mathbf{n}$. Analog processing of signals without error control implies noise propagation; the additive noise is also linearly filtered by the network via $\mathbf{G}'$.
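For a toy two-edge line network (an illustration, not Fig. 1), the nilpotent state-evolution matrix makes the matrix inverse in Eqn. (14) a finite sum; the matrices below are hypothetical.

```python
import numpy as np

# Line network: source -> relay -> receiver, unit-bandwidth edges.
# State s = (signal on edge 1, signal on edge 2); the relay scales by 0.5.
A = np.array([[0.0, 0.0],
              [0.5, 0.0]])            # nilpotent: A @ A = 0
F = np.array([[1.0],
              [0.0]])                 # scalar source enters edge 1
C_t = np.array([[0.0, 1.0]])          # receiver observes edge 2

G_t = C_t @ np.linalg.inv(np.eye(2) - A) @ F   # transfer as in Eqn. (14)
# Since A is nilpotent with A^2 = 0, (I - A)^{-1} = I + A here.
G_neumann = C_t @ (np.eye(2) + A) @ F
```

The receiver's end-to-end transfer is the scalar 0.5, i.e. the relay's gain applied to the source.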
Example 1
III-B Layered Networks
Definition 9 (Layered DAG Network)
A layering of a DAG $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is a partition of $\mathcal{V}$ into disjoint subsets $\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_L$ such that if directed edge $(u,v) \in \mathcal{E}$, where $u \in \mathcal{V}_i$ and $v \in \mathcal{V}_j$, then $i < j$. A DAG layering (non-unique) is polynomial-time computable [32].
Given a layered partition of an LTN graph, source nodes with in-degree zero may be placed in partition $\mathcal{V}_1$. Similarly, receivers with out-degree zero may be placed in partition $\mathcal{V}_L$. The transfer function $\mathbf{G}$ in Eqn. (15) may be factored into a product of matrices,

$$\mathbf{G} = \mathbf{T}_{L-1}\mathbf{T}_{L-2}\cdots\mathbf{T}_1, \qquad (16)$$

where $\mathbf{T}_i$ for $1 \leq i \leq L-1$ is the linear transformation of signals between nodes in partition $\mathcal{V}_i$ and $\mathcal{V}_{i+1}$ (note the reverse ordering of the $\mathbf{T}_i$ with respect to the partitions $\mathcal{V}_i$). If an edge exists between nodes in non-consecutive partitions, an identity transform is inserted to replicate signals between multiple layers. Due to the linearity of transforms, the layered transforms $\mathbf{T}_i$ can be constructed for any layered partition of $\mathcal{G}$. The $\mathbf{T}_i$ are structured matrices comprised of sub-blocks $\mathbf{L}_{uv}$, identity matrices, and/or zero matrices. The block structure is determined by the network topology.
Example 2
For the multiple unicast network of Fig. 1, a valid layered partition of $\mathcal{V}$ is $\mathcal{V}_1 = \{1, 2\}$, $\mathcal{V}_2 = \{3\}$, $\mathcal{V}_3 = \{4\}$, and $\mathcal{V}_4 = \{5, 6\}$. According to the layering, the transfer matrix $\mathbf{G}$ is factored in the product form $\mathbf{G} = \mathbf{T}_3\mathbf{T}_2\mathbf{T}_1$.
IV Optimizing Compression-Estimation Matrices
Our optimization method proceeds iteratively over network layers. To simplify the optimization, we first assume ideal channels (high-SNR communication) for which $\mathbf{z}_{uv} = \mathbf{y}_{uv}$. Then the linear operation of the network is $\mathbf{z} = \mathbf{G}\mathbf{x}$ with $\mathbf{G}' = \mathbf{0}$. Linear transform coding is constrained according to the bandwidth compression ratios $\gamma_{uv}$.
IV-A MSE Distortion at Receivers
According to the linear system equations, Eqns. (11)-(14), each receiver $t \in \mathcal{T}$ receives filtered source observations $\mathbf{z}_t = \mathbf{G}_t\mathbf{x}$. Receiver $t$ applies a linear estimator $\mathbf{B}_t$ to estimate the signal $\mathbf{r}_t$. The MSE cost of estimation is

$$D_t = \mathbb{E}\big[\|\mathbf{r}_t - \mathbf{B}_t\mathbf{G}_t\mathbf{x}\|_2^2\big]. \qquad (17)$$

Setting the matrix derivative with respect to $\mathbf{B}_t$ in Eqn. (17) to zero yields $\mathbf{B}_t\big(\mathbf{G}_t\mathbf{\Sigma}_x\mathbf{G}_t^T\big) = \mathbf{\Sigma}_{r_t x}\mathbf{G}_t^T$. For a fixed transfer function $\mathbf{G}_t$, the optimal LLSE matrix is

$$\mathbf{B}_t^{\mathrm{opt}} = \mathbf{\Sigma}_{r_t x}\mathbf{G}_t^T\big(\mathbf{G}_t\mathbf{\Sigma}_x\mathbf{G}_t^T\big)^{-1}. \qquad (18)$$

If $\mathbf{G}_t\mathbf{\Sigma}_x\mathbf{G}_t^T$ in Eqn. (18) is singular, the inverse may be replaced with a pseudo-inverse operation to compute $\mathbf{B}_t^{\mathrm{opt}}$.
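The LLSE decoder of Eqn. (18) can be checked numerically via the orthogonality principle; this is a sketch with an arbitrary covariance, where the receiver reconstructs the full source so that the cross-covariance equals the source covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
Sigma_x = M @ M.T                     # arbitrary source covariance (PSD)
G_t = rng.standard_normal((2, n))     # fixed network transfer to receiver t

# The receiver reconstructs the full source, so the cross-covariance of the
# target with x equals Sigma_x. Eqn. (18), using a pseudo-inverse:
Sigma_rx = Sigma_x
B_t = Sigma_rx @ G_t.T @ np.linalg.pinv(G_t @ Sigma_x @ G_t.T)

# Orthogonality principle: the estimation error is uncorrelated with z_t,
# i.e. Sigma_rx G^T - B_t (G Sigma_x G^T) = 0.
orth = Sigma_rx @ G_t.T - B_t @ (G_t @ Sigma_x @ G_t.T)
```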
Let $\mathbf{B}$ denote a block-diagonal global matrix containing the individual decoding matrices $\mathbf{B}_t$ on the diagonal. For an LTN graph with encoding transfer function $\mathbf{G}$, we write the linear decoding operation of all receivers as $\hat{\mathbf{r}} = \mathbf{B}\mathbf{z}$ where $\mathbf{z} = \mathbf{G}\mathbf{x}$ are the observations received. The weighted MSE cost in Eqn. (10) for reconstructing signals at all receivers is written as

$$D_{\mathrm{MSE}} = \mathbb{E}\big[\|\mathbf{r} - \mathbf{B}\mathbf{G}\mathbf{x}\|_{\mathbf{W}}^2\big]. \qquad (19)$$

By construction of the weighting matrix $\mathbf{W}$, the MSE in Eqn. (19) is a weighted sum of individual distortions at receivers, i.e. $D_{\mathrm{MSE}} = \sum_{t \in \mathcal{T}} w_t D_t$.
IV-B Computing Encoding Transforms
The optimization of the network transfer function $\mathbf{G}$ is more complex due to block constraints imposed by the network topology on the matrices $\mathbf{T}_i$. In order to solve for a particular linear transform $\mathbf{T}_i$, we assume that all linear transforms $\{\mathbf{T}_j\}_{j \neq i}$ and the receivers' decoding transform $\mathbf{B}$ are fixed. Then the optimal $\mathbf{T}_i$ is the solution to a constrained quadratic program. To derive this, we utilize the following identities in which $\mathbf{t} \triangleq \operatorname{vec}(\mathbf{T})$:

$$\operatorname{vec}(\mathbf{A}\mathbf{T}\mathbf{C}) = (\mathbf{C}^T \otimes \mathbf{A})\operatorname{vec}(\mathbf{T}), \qquad (20)$$
$$\operatorname{tr}(\mathbf{T}^T\mathbf{A}\mathbf{T}\mathbf{C}) = \operatorname{vec}(\mathbf{T})^T(\mathbf{C}^T \otimes \mathbf{A})\operatorname{vec}(\mathbf{T}). \qquad (21)$$
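The identities (20) and (21) are standard vec/Kronecker facts and can be verified numerically on arbitrary small matrices:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
T = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 2))

def vec(M):
    # Stack columns (column-major order), matching the vec(.) convention.
    return M.reshape(-1, order="F")

# Identity (20): vec(A T C) = (C^T kron A) vec(T).
lhs20 = vec(A @ T @ C)
rhs20 = np.kron(C.T, A) @ vec(T)

# Identity (21): tr(T^T A T C) = vec(T)^T (C^T kron A) vec(T).
lhs21 = np.trace(T.T @ A @ T @ C)
rhs21 = vec(T) @ np.kron(C.T, A) @ vec(T)
```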
We write the network's linear transfer function as $\mathbf{G} = \mathbf{J}\,\mathbf{T}_i\,\mathbf{K}$ and define the following matrices

$$\mathbf{J} \triangleq \mathbf{T}_{L-1}\mathbf{T}_{L-2}\cdots\mathbf{T}_{i+1}, \qquad (22)$$
$$\mathbf{K} \triangleq \mathbf{T}_{i-1}\mathbf{T}_{i-2}\cdots\mathbf{T}_1, \qquad (23)$$
$$\mathbf{\Phi} \triangleq \mathbf{B}\,\mathbf{J}. \qquad (24)$$

To write $D_{\mathrm{MSE}}$ in terms of the matrix variable $\mathbf{t}_i \triangleq \operatorname{vec}(\mathbf{T}_i)$, we also define the following,

$$c_0 \triangleq \operatorname{tr}\big(\mathbf{W}\mathbf{\Sigma}_r\big), \qquad (25)$$
$$\mathbf{b} \triangleq \operatorname{vec}\big(\mathbf{\Phi}^T\mathbf{W}\mathbf{\Sigma}_{rx}\mathbf{K}^T\big), \qquad (26)$$
$$\mathbf{Q} \triangleq \big(\mathbf{K}\mathbf{\Sigma}_x\mathbf{K}^T\big) \otimes \big(\mathbf{\Phi}^T\mathbf{W}\mathbf{\Phi}\big), \qquad (27)$$

where $c_0$, $\mathbf{b}$, and $\mathbf{Q}$ are a scalar, vector, and positive semidefinite matrix respectively. The following lemma expresses $D_{\mathrm{MSE}}$ as a function of the unknown matrix variable $\mathbf{t}_i$.
Lemma 1
With all transforms except $\mathbf{T}_i$ fixed, the weighted MSE of Eqn. (19) is the following quadratic function of $\mathbf{t}_i = \operatorname{vec}(\mathbf{T}_i)$,

$$D_{\mathrm{MSE}} = c_0 - 2\,\mathbf{b}^T\mathbf{t}_i + \mathbf{t}_i^T\mathbf{Q}\,\mathbf{t}_i. \qquad (28)$$
IV-C Quadratic Program with Convex Constraints
Due to Lemma 1, the weighted MSE is a quadratic function of $\mathbf{t}_i$ if all other network matrices are fixed. The optimal $\mathbf{T}_i$ must satisfy block constraints determined by the network topology. The block constraints are linear equality constraints of the form $\mathbf{\Theta}\,\mathbf{t}_i = \boldsymbol{\theta}$. For example, if $\mathbf{T}_i$ contains an identity sub-block, this is enforced by setting entries of $\mathbf{t}_i$ to zero and one accordingly, via linear equality constraints.
Theorem 1 (Optimal Encoding)
Let the encoding matrices $\{\mathbf{T}_j\}_{j \neq i}$ and the decoding matrix $\mathbf{B}$ be fixed. Let $\mathbf{t}_i = \operatorname{vec}(\mathbf{T}_i)$. The optimal encoding transform $\mathbf{T}_i$ is given by the following constrained quadratic program (QP) [33, Def. 4.34]

$$\min_{\mathbf{t}_i}\ \mathbf{t}_i^T\mathbf{Q}\,\mathbf{t}_i - 2\,\mathbf{b}^T\mathbf{t}_i \qquad (29)$$
$$\text{s.t.}\ \ \mathbf{\Theta}\,\mathbf{t}_i = \boldsymbol{\theta},$$

where $(\mathbf{\Theta}, \boldsymbol{\theta})$ represent linear equality constraints on the elements of $\mathbf{t}_i$. The solution to the above optimization for $\mathbf{t}_i$ is obtained by solving the corresponding linear system

$$\begin{bmatrix} 2\mathbf{Q} & \mathbf{\Theta}^T \\ \mathbf{\Theta} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{t}_i \\ \boldsymbol{\nu} \end{bmatrix} = \begin{bmatrix} 2\mathbf{b} \\ \boldsymbol{\theta} \end{bmatrix}. \qquad (30)$$

If the constraints determined by the pair $(\mathbf{\Theta}, \boldsymbol{\theta})$ are feasible, the linear system of Eqn. (30) is guaranteed to have either one or infinitely many solutions.
The QP of Eqn. (29) follows from Lemma 1 with additional linear equality constraints placed on $\mathbf{t}_i$. The closed-form solution to the QP is derived using Lagrange dual multipliers for the linear constraints and the Karush-Kuhn-Tucker (KKT) conditions. Let $\Lambda(\mathbf{t}_i, \boldsymbol{\nu})$ represent the Lagrangian formed with dual vector variable $\boldsymbol{\nu}$ for the constraints,

$$\Lambda(\mathbf{t}_i, \boldsymbol{\nu}) = \mathbf{t}_i^T\mathbf{Q}\,\mathbf{t}_i - 2\,\mathbf{b}^T\mathbf{t}_i + \boldsymbol{\nu}^T\big(\mathbf{\Theta}\,\mathbf{t}_i - \boldsymbol{\theta}\big), \qquad (31)$$
$$\nabla_{\mathbf{t}_i}\Lambda = 2\,\mathbf{Q}\,\mathbf{t}_i - 2\,\mathbf{b} + \mathbf{\Theta}^T\boldsymbol{\nu}, \qquad (32)$$
$$\nabla_{\boldsymbol{\nu}}\Lambda = \mathbf{\Theta}\,\mathbf{t}_i - \boldsymbol{\theta}. \qquad (33)$$

Setting $\nabla_{\mathbf{t}_i}\Lambda = \mathbf{0}$ and $\nabla_{\boldsymbol{\nu}}\Lambda = \mathbf{0}$ yields the linear system of Eqn. (30), the solutions to which are the primal vector $\mathbf{t}_i$ and dual vector $\boldsymbol{\nu}$. Since the MSE distortion is bounded below by zero, the linear system has a unique solution if the coefficient matrix in Eqn. (30) is full rank, or infinitely many solutions of equivalent objective value if it is singular.
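The KKT linear system of Eqn. (30) can be assembled and solved directly; the quadratic data and the single equality constraint below are hypothetical, chosen only to exercise the system.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4
M = rng.standard_normal((d, d))
Q = M @ M.T + np.eye(d)               # positive definite quadratic term
b = rng.standard_normal(d)

# One equality constraint pinning the first coordinate: t[0] = 1.
Theta = np.zeros((1, d))
Theta[0, 0] = 1.0
theta = np.array([1.0])

# Assemble and solve the KKT system of Eqn. (30).
KKT = np.block([[2 * Q, Theta.T],
                [Theta, np.zeros((1, 1))]])
sol = np.linalg.solve(KKT, np.concatenate([2 * b, theta]))
t_opt, nu = sol[:d], sol[d:]
```

The solved `t_opt` satisfies the constraint exactly and the stationarity condition of Eqn. (32).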
Remark 3
Beyond linear constraints, several other convex constraints on matrix variables could be applied within the quadratic program. For example, the norm of a vector defined by