Reduced-Dimension Linear Transform Coding of Correlated Signals in Networks

Naveen Goela and Michael Gastpar

This work was supported in part by the National Science Foundation under Grant CCF-0627024 and made with U.S. Government support under and awarded by DoD, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a. The material in this paper was presented in part at the IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, April 2009, and at the IEEE International Symposium on Information Theory, Seoul, South Korea, June 2009.

Copyright © 2012 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

N. Goela and M. Gastpar are with the Department of Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA 94720-1770 USA (e-mail: {ngoela, gastpar}@eecs.berkeley.edu). M. Gastpar is also with the School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland.
Abstract

A model, called the linear transform network (LTN), is proposed to analyze the compression and estimation of correlated signals transmitted over directed acyclic graphs (DAGs). An LTN is a DAG network with multiple source and receiver nodes. Source nodes transmit subspace projections of random correlated signals by applying reduced-dimension linear transforms. The subspace projections are linearly processed by multiple relays and routed to intended receivers. Each receiver applies a linear estimator to approximate a subset of the sources with minimum mean squared error (MSE) distortion. The model is extended to include noisy networks with power constraints on transmitters. A key task is to compute all local compression matrices and linear estimators in the network to minimize end-to-end distortion. The non-convex problem is solved iteratively within an optimization framework using constrained quadratic programs (QPs). The proposed algorithm recovers as special cases the regular and distributed Karhunen-Loève transforms (KLTs). Cut-set lower bounds on the distortion region of multi-source, multi-receiver networks are given for linear coding based on convex relaxations. Cut-set lower bounds are also given for any coding strategy based on information theory. The distortion region and compression-estimation tradeoffs are illustrated for different communication demands (e.g., multiple unicast) and graph structures.

Index Terms—Karhunen-Loève transform (KLT), linear transform network (LTN), quadratic program (QP), cut-set bound.

I Introduction

The compression and estimation of an observed signal via subspace projections is both a classical and a current topic in signal processing and communication. While random subspace projections have received considerable attention in the compressed sensing literature [1], subspace projections optimized for minimal distortion are important for many applications. The Karhunen-Loève transform (KLT) and its empirical form, Principal Components Analysis (PCA), are widely studied in computer vision, biology, signal processing, and information theory. Reduced-dimensionality representations are useful for source coding, noise filtering, compression, clustering, and data mining. Specific examples include eigenfaces for face recognition, orthogonal decomposition in transform coding, and sparse PCA for gene analysis [2, 3, 4].

In contemporary applications such as wireless sensor networks (WSNs) and distributed databases, data is available and collected in different locations. In a WSN, sensors are usually constrained by limited power and bandwidth resources. This has motivated existing approaches to take into account correlations across high-dimensional sensor data to reduce transmission requirements (see e.g. [5, 6, 7, 8, 9, 10, 11]). Rather than transmitting raw sensor data to a fusion center to approximate a global signal, sensor nodes carry out local data dimensionality reduction to increase bandwidth and energy efficiency.

In the present paper, we propose a linear transform network (LTN) model to analyze dimensionality reduction for compression-estimation of correlated signals in multi-hop networks. In a centralized setting, given a zero-mean random source signal $\mathbf{x} \in \mathbb{R}^n$ with covariance matrix $\Sigma_x$, applying the KLT to $\mathbf{x}$ yields uncorrelated components in the eigenvector basis of $\Sigma_x$. The optimal linear least squares $k$-th order approximation of the source is given by the $k$ components corresponding to the $k$ largest eigenvalues of $\Sigma_x$. In a network setting, multiple correlated signals are observed by different source nodes. The source nodes transmit low-dimensional subspace projections (approximations of the source) to intended receivers via a relay network. The compression-estimation problem is to optimize the subspace projections computed by all nodes in order to minimize the end-to-end distortion at receiver nodes.
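To make the centralized KLT baseline concrete, the following minimal sketch computes a reduced-dimension approximation numerically (the variable names and dimensions are our own illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3                              # source dimension, reduced dimension

A = rng.standard_normal((n, n))
Sigma_x = A @ A.T                        # covariance of the zero-mean source

# KLT basis: eigenvectors of Sigma_x, sorted by decreasing eigenvalue
eigvals, U = np.linalg.eigh(Sigma_x)
order = np.argsort(eigvals)[::-1]
U_k = U[:, order[:k]]                    # top-k eigenvectors

x = np.linalg.cholesky(Sigma_x) @ rng.standard_normal(n)
y = U_k.T @ x                            # k-dimensional subspace projection
x_hat = U_k @ y                          # best k-th order linear approximation

# In expectation, the MSE equals the sum of the n-k discarded eigenvalues
print(np.sort(eigvals)[: n - k].sum())
```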

In our model, receivers estimate random vectors based on “one-shot” linear analog-amplitude multisensor observations. The restriction to “one-shot”, zero-delay encoding of each vector of source observations separately is interesting due to severe complexity limitations in many applications (e.g. sensor networks). Linear coding depends on first-order and second-order statistics and is robust to uncertainty in the precise probabilistic distribution of the sources. Under the assumption of ideal channels between nodes, our task is to optimize signal subspaces given limited bandwidth in terms of the number of real-valued messages communicated. Our results extend previous work on distributed estimation in this case [5, 6, 7, 8]. For the case of dimensionality-reduction with noisy channel communication (see e.g. [6]), the task is to optimize signal subspaces subject to channel noise and power constraints.

For noisy networks, the general communication problem is often referred to as the joint source-channel-network coding problem in the information-theoretic literature and is famously open. Beyond the zero-delay, linear dimensionality reduction considered here, end-to-end performance in networks could be improved by (i) non-linear strategies and (ii) allowing a longer coding horizon. Partial progress includes non-linear, low-delay mappings, but only for simple network scenarios [12, 13, 14]. For the case of an infinite coding horizon, separation theorems for decomposing the joint communication problem have been analyzed in [15, 16, 17].

I-A Related Work

Directly related to our work in networks is the distributed KLT problem. Distributed linear transforms were introduced by Gastpar et al. for the compression of jointly Gaussian sources using iterative methods [5, 18]. Simultaneous work by Zhang et al. for multi-sensor data fusion also resulted in iterative procedures [8]. An alternate proof based on innovations for second-order random variables with arbitrary distributions was given in [19]. Schizas et al. [6] extended the problem to non-Gaussian sources, including channel fading and noise effects to model the non-ideal links from sensors to the decoder. Roy and Vetterli provide an asymptotic distortion analysis of the distributed KLT as the dimensions of the source and observation vectors approach infinity [20]. Finally, Xiao et al. analyze linear transforms for distributed coherent estimation [7].

Much of the estimation-theoretic literature deals with single-hop networks; each sensor relays information directly to a fusion center. In multi-hop networks, linear operations are performed by successive relays to aggregate, compress, and redistribute correlated signals. The LTN model relates to recent work on routing and network coding (Ahlswede et al. [21]). In pure routing solutions, intermediate nodes either forward or drop packets. The corresponding analogy in the LTN model is to constrain transforms to be essentially identity transforms. However, network coding (over finite fields) has shown that mixing of data at intermediate nodes achieves higher rates in the multicast setting (see [22] regarding the sufficiency of linear codes and [23] for multicast code construction). Similarly in the LTN model, linear combining of subspace projections (over the real field) at intermediate nodes improves decoding performance. Lastly, the max-flow min-cut theorem of Ford-Fulkerson [24] provides the basis for cut-set lower bounds in networks.

The LTN model is partially related to the formulation of Koetter and Kschischang [25] modeling information transmission as the injection of a basis for a vector space into the network, and to subspace codes [26]. If arbitrary data exchange is permitted between network nodes, the compression-estimation problem is related to estimation in graphical models (e.g. decomposable PCA [27] and tree-based transforms (tree-KLT) [28]). Other related work involving signal projections in networks includes joint source-channel communication in sensor networks [29], random projections in a gossip framework [30], and distributed compressed sensing [31].

I-B Summary of Main Results

We cast the network compression-estimation problem as a statistical signal processing and constrained optimization problem. For most networks, the optimization is non-convex. Therefore, our main results are divided into two categories: (i) Iterative solutions for linear transform coding over acyclic networks; (ii) Cut-set bounds based on convex relaxations and cut-set bounds based on information theory.

  • Section III reviews linear signal processing in networks. Section IV outlines an iterative optimization for compression-estimation matrices in ideal networks under a local convergence criterion.

  • Section V analyzes an iterative optimization method involving constrained quadratic programs for noisy networks with power allocation over subspaces.

  • Section VI introduces cut-set lower bounds to benchmark the minimum mean square error (MSE) for linear coding based on convex relaxations such as a semi-definite program (SDP) relaxation.

  • Section VI-F describes cut-set lower bounds for any coding strategy in networks based on information-theoretic principles of source-channel separation. The lower bounds are plotted for a distributed noisy network.

  • Sections IV-VI provide examples illustrating the tradeoffs between compression and estimation; upper and lower bounds are illustrated for an aggregation (tree) network, butterfly network, and distributed noisy network.

I-C Notation

Boldface upper case letters denote matrices, boldface lower case letters denote column vectors, and calligraphic upper case letters denote sets. The $\ell_2$-norm of a vector $\mathbf{v}$ is defined as $\|\mathbf{v}\|_2 = (\sum_i v_i^2)^{1/2}$. The weighted $\ell_2$-norm is $\|\mathbf{v}\|_W = (\mathbf{v}^T W \mathbf{v})^{1/2}$ where $W$ is a positive semi-definite matrix (written $W \succeq 0$). Let $A^T$, $A^{-1}$, and $\mathrm{tr}(A)$ denote matrix transpose, inverse, and trace respectively. Let $A \otimes B$ denote the Kronecker matrix product of two matrices. The matrix $I$ denotes the identity. For $i \le j$, the notation $\prod_{k=i}^{j} A_k$ denotes the product of matrices $A_j A_{j-1} \cdots A_i$. A matrix $A$ is written in vector form by stacking its columns; i.e. $\mathrm{vec}(A) = [\mathbf{a}_1^T\; \mathbf{a}_2^T\; \cdots\; \mathbf{a}_n^T]^T$ where $\mathbf{a}_i$ is the $i$-th column of $A$. For random vectors, $E[\cdot]$ denotes the expectation, and $\Sigma_x = E[\mathbf{x}\mathbf{x}^T]$ denotes the covariance matrix of the zero-mean random vector $\mathbf{x}$.
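As a quick illustration of these conventions (a hedged sketch with our own names), the snippet below evaluates an $\ell_2$-norm, a weighted $\ell_2$-norm, and a column-stacked vectorization:

```python
import numpy as np

v = np.array([1.0, -2.0, 3.0])
W = np.diag([1.0, 0.5, 2.0])            # positive semi-definite weight matrix

l2_norm = np.sqrt(v @ v)                # ||v||_2
weighted_norm = np.sqrt(v @ W @ v)      # ||v||_W

A = np.arange(6.0).reshape(2, 3)
vec_A = A.flatten(order="F")            # vec(A): stack the columns of A
print(l2_norm, weighted_norm, vec_A)
```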

II Problem Statement

Fig. 1: (a) Linear Transform Network: An LTN model with source nodes $\{1, 2\}$ and receivers $\{5, 6\}$. Source nodes observe vector signals $\mathbf{x}_1$ and $\mathbf{x}_2$. All encoding nodes linearly process received signals using transforms $L_{uv}$. Receivers 5 and 6 compute LLSE estimates $\hat{\mathbf{r}}_5$ and $\hat{\mathbf{r}}_6$ of desired signals $\mathbf{r}_5$ and $\mathbf{r}_6$. (b) Signal Flow Graph: Linear processing of source signals results in signals transmitted along edges of the graph.

Fig. 1 serves as an extended example of an LTN graph. The network is comprised of two sources, two relays, and two receiver nodes.

Definition 1 (Relay Network)

Consider a relay network modeled by a directed acyclic graph (DAG) $G = (\mathcal{V}, \mathcal{E})$ and a set of weights $\mathcal{W}$. The set $\mathcal{V}$ is the vertex/node set, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the edge set, and $\mathcal{W} = \{c_{uv} : (u, v) \in \mathcal{E}\}$ is the set of weights. Each edge $(u, v) \in \mathcal{E}$ represents a communication link with integer bandwidth $c_{uv}$ from node $u$ to node $v$. The in-degree and out-degree of a node $v$ are computed as

$d_{\mathrm{in}}(v) = \sum_{u\,:\,(u,v) \in \mathcal{E}} c_{uv},$   (1)
$d_{\mathrm{out}}(v) = \sum_{u\,:\,(v,u) \in \mathcal{E}} c_{vu}.$   (2)

As an example, the graph in Fig. 1 consists of nodes $\mathcal{V} = \{1, 2, \ldots, 6\}$. Integer bandwidths for each communication link are marked.
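The bandwidth-weighted degrees of Definition 1 are straightforward to compute from a weighted edge list; a small sketch follows (the edge set and bandwidths are an illustrative reading of Fig. 1, not data from the paper):

```python
# Edge bandwidths c_uv for a DAG like Fig. 1 (illustrative values)
edges = {(1, 3): 1, (2, 3): 1, (3, 4): 2, (4, 5): 1, (4, 6): 1}

def deg_in(v):
    """Total bandwidth entering node v, Eqn. (1)."""
    return sum(c for (s, t), c in edges.items() if t == v)

def deg_out(v):
    """Total bandwidth leaving node v, Eqn. (2)."""
    return sum(c for (s, t), c in edges.items() if s == v)

print(deg_in(3), deg_out(3))            # node 3: bandwidth 2 in, 2 out
```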

Definition 2 (Source and Receiver Nodes)

Given a relay network $G$, the set of source nodes is defined as $\mathcal{S} = \{v \in \mathcal{V} : d_{\mathrm{in}}(v) = 0\}$. We assume a labeling of nodes in $\mathcal{V}$ so that $\mathcal{S} = \{1, 2, \ldots, |\mathcal{S}|\}$, i.e. the first $|\mathcal{S}|$ nodes are source nodes. The set of receiver nodes is defined as $\mathcal{T} = \{v \in \mathcal{V} : d_{\mathrm{out}}(v) = 0\}$. (For networks of interest in this paper, an arbitrary DAG may be augmented with auxiliary nodes to ensure that source nodes have in-degree $0$ and receiver nodes have out-degree $0$.) Let $m = |\mathcal{V}|$. We assume a labeling of nodes in $\mathcal{V}$ so that $\mathcal{T} = \{m - |\mathcal{T}| + 1, \ldots, m\}$, i.e. the last $|\mathcal{T}|$ nodes are receiver nodes.

In Fig. 1, $\mathcal{S} = \{1, 2\}$ and $\mathcal{T} = \{5, 6\}$.

II-A Source Model

Definition 3 (Basic Source Model)

Given a relay network $G$ with source/receiver nodes $(\mathcal{S}, \mathcal{T})$, the source nodes observe random signals $\mathbf{x}_i \in \mathbb{R}^{n_i}$, $i \in \mathcal{S}$. The random vectors are assumed zero-mean with covariances $\Sigma_{x_i} = E[\mathbf{x}_i \mathbf{x}_i^T]$ and cross-covariances $\Sigma_{x_i x_j} = E[\mathbf{x}_i \mathbf{x}_j^T]$. Let $n = \sum_{i \in \mathcal{S}} n_i$. The distributed network sources may be grouped into an $n$-dimensional random vector $\mathbf{x} = [\mathbf{x}_1^T\; \mathbf{x}_2^T\; \cdots\; \mathbf{x}_{|\mathcal{S}|}^T]^T$ with known second-order statistics $\Sigma_x = E[\mathbf{x}\mathbf{x}^T]$,

$\Sigma_x = \begin{bmatrix} \Sigma_{x_1} & \Sigma_{x_1 x_2} & \cdots & \Sigma_{x_1 x_{|\mathcal{S}|}} \\ \Sigma_{x_2 x_1} & \Sigma_{x_2} & \cdots & \Sigma_{x_2 x_{|\mathcal{S}|}} \\ \vdots & \vdots & \ddots & \vdots \\ \Sigma_{x_{|\mathcal{S}|} x_1} & \Sigma_{x_{|\mathcal{S}|} x_2} & \cdots & \Sigma_{x_{|\mathcal{S}|}} \end{bmatrix}.$   (3)

More generally, each source node emits independent and identically distributed (i.i.d.) source vectors $\mathbf{x}_i[t]$ for a discrete time index $t$; however, in the analysis of zero-delay linear coding, we do not write the time indices explicitly.

Remark 1

A common linear signal-plus-noise model for sensor networks is of the form $\mathbf{x}_i = H_i \mathbf{s} + \mathbf{n}_i$; however, neither a linear source model nor a specific distribution of $\mathbf{x}$ is assumed here. A priori knowledge of second-order statistics may be obtained during a training phase via sample estimation.

In Fig. 1, two source nodes observe the corresponding random signals $\mathbf{x}_1$ and $\mathbf{x}_2$ in $\mathbf{x}$.
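A short sketch of sampling correlated network sources with a prescribed joint covariance as in Eqn. (3) (the statistics and dimensions are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 2, 2                            # dimensions of x_1 and x_2
A = rng.standard_normal((n1 + n2, n1 + n2))
Sigma_x = A @ A.T                        # known joint second-order statistics

# Draw a zero-mean vector x with covariance Sigma_x and split it per node
x = np.linalg.cholesky(Sigma_x) @ rng.standard_normal(n1 + n2)
x1, x2 = x[:n1], x[n1:]                  # observed at source nodes 1 and 2
```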

II-B Communication Model

Definition 4 (Communication Model)

Given a relay network $G$ with weight-set $\mathcal{W}$, each edge $(u, v) \in \mathcal{E}$ represents a communication link of bandwidth $c_{uv}$ from $u$ to $v$. The bandwidth is the dimension of the vector channel. We denote signals exiting node $u$ along edge $(u, v)$ by $\mathbf{y}_{uv} \in \mathbb{R}^{c_{uv}}$ and signals entering node $v$ along edge $(u, v)$ by $\mathbf{z}_{uv}$. If communication is noiseless, $\mathbf{z}_{uv} = \mathbf{y}_{uv}$. For all relay nodes and receiver nodes, we further define $\mathbf{z}_v$ to be the concatenation of all signals incident to node $v$ along edges $(u, v) \in \mathcal{E}$.

A noisy communication link is modeled as $\mathbf{z}_{uv} = \mathbf{y}_{uv} + \mathbf{n}_{uv}$. The channel noise $\mathbf{n}_{uv}$ is a Gaussian random vector with zero mean and covariance $\Sigma_{n_{uv}}$. The channel input is power constrained so that $E\big[\|\mathbf{y}_{uv}\|_2^2\big] \le P_{uv}$. The power constraints for a network are given by set $\mathcal{P} = \{P_{uv}\}$. The signal-to-noise ratio (SNR) along a link is

$\mathrm{SNR}_{uv} = \frac{E\big[\|\mathbf{y}_{uv}\|_2^2\big]}{E\big[\|\mathbf{n}_{uv}\|_2^2\big]}.$   (4)
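The noisy link model can be simulated directly; in the sketch below (our own construction), the input is scaled to meet its power constraint and the SNR of Eqn. (4) is estimated for one realization:

```python
import numpy as np

rng = np.random.default_rng(2)
c = 2                                    # link bandwidth c_uv
Sigma_n = 0.1 * np.eye(c)                # noise covariance
P = 1.0                                  # power constraint E||y||^2 <= P

y = rng.standard_normal(c)
y *= np.sqrt(P / (y @ y))                # scale this realization to power P
n = np.linalg.cholesky(Sigma_n) @ rng.standard_normal(c)
z = y + n                                # signal entering the receiving node

snr = (y @ y) / np.trace(Sigma_n)        # signal power over mean noise power
print(snr)
```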

Fig. 1(b) illustrates the signal flow of an LTN graph.

II-C Linear Encoding over Graph $G$

Source and relay nodes encode random vector signals by applying reduced-dimension linear transforms.

Definition 5 (Linear Encoding)

Given a relay network $G$, weight-set $\mathcal{W}$, source/receiver nodes $(\mathcal{S}, \mathcal{T})$, sources $\{\mathbf{x}_i\}_{i \in \mathcal{S}}$, and the communication model of Definition 4, the linear encoding matrices for $G$ are denoted by set $\mathcal{L} = \{L_{uv} : (u, v) \in \mathcal{E}\}$. Each $L_{uv}$ represents the linear transform applied by node $u$ in communication with node $v$. For a source node $u \in \mathcal{S}$, transform $L_{uv}$ is of size $c_{uv} \times n_u$ and represents the encoding $\mathbf{y}_{uv} = L_{uv}\,\mathbf{x}_u$. For a relay node $u \notin \mathcal{S}$, transform $L_{uv}$ is of size $c_{uv} \times d_{\mathrm{in}}(u)$, and $\mathbf{y}_{uv} = L_{uv}\,\mathbf{z}_u$. The compression ratio along edge $(u, v)$ is

$\alpha_{uv} = \frac{c_{uv}}{n_u}, \quad u \in \mathcal{S},$   (5a)
$\alpha_{uv} = \frac{c_{uv}}{d_{\mathrm{in}}(u)}, \quad u \notin \mathcal{S}.$   (5b)

In Fig. 1, the linear encoding matrices for source nodes 1 and 2 are $L_{13}$ and $L_{23}$ respectively. The linear encoding matrices for the relays are $L_{34}$, $L_{45}$, $L_{46}$. The output signals of the source nodes are $\mathbf{y}_{13} = L_{13}\,\mathbf{x}_1$ and $\mathbf{y}_{23} = L_{23}\,\mathbf{x}_2$. Similarly, the output signal of relay 3 is

$\mathbf{y}_{34} = L_{34}\,\mathbf{z}_3 = L_{34} \begin{bmatrix} \mathbf{z}_{13} \\ \mathbf{z}_{23} \end{bmatrix}.$   (6)
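The encoding chain of Fig. 1 can be sketched in a few lines (the topology and bandwidths reflect our reading of the figure; the matrices are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(3)
n1 = n2 = 2                              # source dimensions
c13, c23, c34 = 1, 1, 2                  # illustrative edge bandwidths

L13 = rng.standard_normal((c13, n1))     # encoder at source node 1
L23 = rng.standard_normal((c23, n2))     # encoder at source node 2
L34 = rng.standard_normal((c34, c13 + c23))  # relay 3 combines its inputs

x1, x2 = rng.standard_normal(n1), rng.standard_normal(n2)
y13, y23 = L13 @ x1, L23 @ x2            # subspace projections at the sources
z3 = np.concatenate([y13, y23])          # ideal channels: z = y
y34 = L34 @ z3                           # relay output, as in Eqn. (6)
```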

II-D Linear Estimation over $G$

Definition 6 (Linear Estimation)

Given relay network $G$, weight-set $\mathcal{W}$, source/receiver nodes $(\mathcal{S}, \mathcal{T})$, sources $\{\mathbf{x}_i\}_{i \in \mathcal{S}}$, and the communication model of Def. 4, the set of linear decoding matrices is denoted $\mathcal{B} = \{B_t : t \in \mathcal{T}\}$. Each receiver $t \in \mathcal{T}$ estimates a (zero-mean) random vector $\mathbf{r}_t$ which is correlated with the sources in $\mathbf{x}$. We assume that the second-order statistics $\Sigma_{r_t}$ and $\Sigma_{r_t x}$ are known. Receiver $t$ applies a linear estimator given by matrix $B_t$ to estimate $\mathbf{r}_t$ given its observations $\mathbf{z}_t$, and computes $\hat{\mathbf{r}}_t = B_t\,\mathbf{z}_t$. The linear least squares estimate (LLSE) of $\mathbf{r}_t$ is denoted by $\hat{\mathbf{r}}_t$.

In Fig. 1, receiver 5 reconstructs $\mathbf{r}_5$ while receiver 6 reconstructs $\mathbf{r}_6$. The LLSE signals $\hat{\mathbf{r}}_5$ and $\hat{\mathbf{r}}_6$ are computed as

$\hat{\mathbf{r}}_5 = \Sigma_{r_5 z_5}\, \Sigma_{z_5}^{-1}\, \mathbf{z}_5,$   (7)
$\hat{\mathbf{r}}_6 = \Sigma_{r_6 z_6}\, \Sigma_{z_6}^{-1}\, \mathbf{z}_6.$   (8)
Definition 7 (Distortion Metric)

Let $\mathbf{r}$ and $\hat{\mathbf{r}}$ be two real vectors of the same dimension. The MSE distortion metric is defined as

$D(\mathbf{r}, \hat{\mathbf{r}}) \triangleq E\big[\|\mathbf{r} - \hat{\mathbf{r}}\|_2^2\big].$   (9)

II-E Compression-Estimation in Networks

Definition 8 (Linear Transform Network $\mathcal{N}$)

An LTN model $\mathcal{N}$ is a communication network modeled by DAG $G = (\mathcal{V}, \mathcal{E})$, weight-set $\mathcal{W}$, source/receiver nodes $(\mathcal{S}, \mathcal{T})$, sources $\{\mathbf{x}_i\}_{i \in \mathcal{S}}$, and sets $\mathcal{L}$ and $\mathcal{B}$ from Definitions 1-6. Second-order source statistics are given by $\Sigma_x$ (Definition 3). The operational meaning of compression-estimation matrices in $\mathcal{L}$ and $\mathcal{B}$ is in terms of signal flows on $G$ (Definition 4). The desired reconstruction vectors $\mathbf{r}_t$ have known second-order statistics $\Sigma_{r_t}$ and $\Sigma_{r_t x}$. The set $\{\hat{\mathbf{r}}_t\}_{t \in \mathcal{T}}$ denotes the LLSE estimates formed at receivers (Definition 6). For noisy networks, noise variables along link $(u, v)$ have known covariances $\Sigma_{n_{uv}}$. Power constraints are given by set $\mathcal{P}$ in Definition 4.

Given an LTN graph $\mathcal{N}$, the task is to design a network transform code: the compression-estimation matrices in $\mathcal{L}$ and $\mathcal{B}$ that minimize the end-to-end weighted MSE distortion. Let positive weights $w_t$ represent the relative importance of reconstructing a signal at receiver $t \in \mathcal{T}$. Using indexing term $t$ for receiver nodes, we concatenate the desired vectors as $\mathbf{r} = [\mathbf{r}_{t_1}^T\; \cdots\; \mathbf{r}_{t_{|\mathcal{T}|}}^T]^T$ and the LLSE estimates as $\hat{\mathbf{r}} = [\hat{\mathbf{r}}_{t_1}^T\; \cdots\; \hat{\mathbf{r}}_{t_{|\mathcal{T}|}}^T]^T$. The average weighted MSE, written via a weighted $\ell_2$-norm, is

$D_{\mathrm{MSE}} = E\big[\|\mathbf{r} - \hat{\mathbf{r}}\|_W^2\big],$   (10)

where $W$ contains diagonal blocks $w_t I$.
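For a single realization, the weighted distortion inside the expectation of Eqn. (10) is evaluated as in the following sketch (the weights and dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
w5, w6 = 1.0, 2.0                        # importance weights of receivers 5, 6
r = rng.standard_normal(4)               # concatenated desired vectors [r5; r6]
r_hat = r + 0.1 * rng.standard_normal(4) # some estimate of r

W = np.diag([w5, w5, w6, w6])            # block-diagonal weighting matrix
err = r - r_hat
print(err @ W @ err)                     # ||r - r_hat||_W^2 for one realization
```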

Remark 2

The distortion $D_{\mathrm{MSE}}$ is a function of the compression matrices in $\mathcal{L}$ and the estimation matrices in $\mathcal{B}$. In most network topologies, the weighted MSE distortion is non-convex over the set of feasible matrices. Even in the particular case of distributed compression [5], the optimal linear transforms are currently not solvable in closed form.

III Linear Signal Processing in Networks

The linear processing and filtering of source signals by an LTN graph is modeled compactly as a linear system with inputs, outputs, and memory elements. At each time step, LTN nodes transmit random signals through edges/channels of the graph.

III-A Linear System

Consider each edge $(u, v) \in \mathcal{E}$ as a memory element storing random vector $\mathbf{z}_{uv}$. Let $n = \sum_{i \in \mathcal{S}} n_i$ and $c = \sum_{(u,v) \in \mathcal{E}} c_{uv}$. The network is modeled as a linear system with the following signals: (i) input sources concatenated as global source vector $\mathbf{x} \in \mathbb{R}^n$; (ii) input noise variables concatenated as global noise vector $\mathbf{n}$; (iii) memory elements concatenated as global state vector $\mathbf{s}[t] \in \mathbb{R}^c$ at time $t$; (iv) receiver observation vectors concatenated as output $\mathbf{z}$.

III-A1 State-space Equations

The linear system is described by the following state-space equations for $t \geq 0$ (when discussing zero-delay linear coding, the time indices on vectors $\mathbf{x}$, $\mathbf{s}$, and $\mathbf{z}$ are omitted for greater clarity of presentation),

$\mathbf{s}[t+1] = A\,\mathbf{s}[t] + B_x\,\mathbf{x}[t] + B_n\,\mathbf{n}[t],$   (11)
$\mathbf{z}_r[t] = C_r\,\mathbf{s}[t] + D_r\,\mathbf{x}[t] + E_r\,\mathbf{n}[t].$   (12)

The matrix $A$ is the state-evolution matrix common to all receivers, $B_x$ is the source-network connectivity matrix, and $B_n$ is the noise-to-network connectivity matrix. The matrices $C_r$, $D_r$, and $E_r$ represent how each receiver's output $\mathbf{z}_r$ is related to the state, source, and noise vectors respectively. For networks considered in this paper, $D_r = \mathbf{0}$ and $E_r = \mathbf{0}$.

III-A2 Linear Transfer Function

A standard result in linear system theory yields the transfer function (assuming a unity indeterminate delay operator) for each receiver $r \in \mathcal{T}$,

$\mathbf{z}_r = C_r (I - A)^{-1} \big(B_x\,\mathbf{x} + B_n\,\mathbf{n}\big)$   (13)
$\phantom{\mathbf{z}_r} = T_r\,\mathbf{x} + T'_r\,\mathbf{n},$   (14)

where $T_r \triangleq C_r (I - A)^{-1} B_x$ and $T'_r \triangleq C_r (I - A)^{-1} B_n$. For acyclic graphs, $A$ is a nilpotent matrix and $(I - A)^{-1} = I + A + A^2 + \cdots + A^{\kappa}$ for finite integer $\kappa$. Using indexing term $r$ for receiver nodes, the observation vectors collected by receivers are concatenated as $\mathbf{z} = [\mathbf{z}_{r_1}^T\; \cdots\; \mathbf{z}_{r_{|\mathcal{T}|}}^T]^T$. Let

$T \triangleq \big[T_{r_1}^T\; T_{r_2}^T\; \cdots\; T_{r_{|\mathcal{T}|}}^T\big]^T$   (15)

and let $T'$ be defined similarly with respect to matrices $T'_r$. Then the complete linear transfer function of the network is $\mathbf{z} = T\mathbf{x} + T'\mathbf{n}$. Analog processing of signals without error control implies noise propagation; the additive noise $\mathbf{n}$ is also linearly filtered by the network via $T'$.
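The transfer matrices are cheap to compute because $A$ is nilpotent for a DAG; a small sketch (with illustrative stand-in matrices, not a network from the paper) follows:

```python
import numpy as np

c = 3                                     # total state (edge) dimension
A = np.zeros((c, c))
A[2, 0] = A[2, 1] = 1.0                   # edge 3 stores a mix of edges 1 and 2

Bx = np.eye(c)[:, :2]                     # two scalar sources feed edges 1, 2
C = np.eye(c)[2:, :]                      # the receiver reads edge 3 only

# For nilpotent A, (I - A)^{-1} = I + A + A^2 + ... terminates
I = np.eye(c)
T = C @ np.linalg.inv(I - A) @ Bx         # receiver transfer matrix, Eqn. (13)
print(T)
```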

Example 1

Fig. 2 is the LTN graph of a noisy relay network. Let state $\mathbf{s} = [\mathbf{z}_{12}^T\; \mathbf{z}_{13}^T\; \mathbf{z}_{23}^T]^T$, noise $\mathbf{n} = [\mathbf{n}_{12}^T\; \mathbf{n}_{13}^T\; \mathbf{n}_{23}^T]^T$, and output $\mathbf{z}_3 = [\mathbf{z}_{13}^T\; \mathbf{z}_{23}^T]^T$. The linear system representation is given as follows,

$A = \begin{bmatrix} \mathbf{0} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \\ L_{23} & \mathbf{0} & \mathbf{0} \end{bmatrix}, \quad B_x = \begin{bmatrix} L_{12} \\ L_{13} \\ \mathbf{0} \end{bmatrix}, \quad B_n = I, \quad C_3 = \begin{bmatrix} \mathbf{0} & I & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & I \end{bmatrix}.$

By evaluating Eqn. (14),

$\mathbf{z}_3 = \begin{bmatrix} L_{13} \\ L_{23}\,L_{12} \end{bmatrix} \mathbf{x} + \begin{bmatrix} \mathbf{0} & I & \mathbf{0} \\ L_{23} & \mathbf{0} & I \end{bmatrix} \mathbf{n}.$

Dropping the time indices and writing $\mathbf{n}$ in addition to $\mathbf{x}$, the linear transfer function of the noisy relay network is of the following form: $\mathbf{z}_3 = T_3\,\mathbf{x} + T'_3\,\mathbf{n}$.

Fig. 2: The LTN graph of a noisy relay network with $\mathcal{S} = \{1\}$ and $\mathcal{T} = \{3\}$. The linear processing of the network is modeled as a linear system with input $(\mathbf{x}, \mathbf{n})$ and output $\mathbf{z}_3$.

III-B Layered Networks

Definition 9 (Layered DAG Network)

A layering of a DAG $G = (\mathcal{V}, \mathcal{E})$ is a partition of $\mathcal{V}$ into disjoint subsets $\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_K$ such that if directed edge $(u, v) \in \mathcal{E}$, where $u \in \mathcal{V}_i$ and $v \in \mathcal{V}_j$, then $j > i$. A DAG layering (non-unique) is polynomial-time computable [32].

Given a layered partition of an LTN graph, source nodes with in-degree $0$ may be placed in partition $\mathcal{V}_1$. Similarly, receivers with out-degree $0$ may be placed in partition $\mathcal{V}_K$. The transfer function $T$ in Eqn. (15) may be factored into a product of matrices,

$T = T_{K-1}\, T_{K-2} \cdots T_1,$   (16)

where $T_k$ for $1 \le k \le K-1$ is the linear transformation of signals between nodes in partition $\mathcal{V}_k$ and $\mathcal{V}_{k+1}$ (note the reverse ordering of the $T_k$ with respect to the partitions $\mathcal{V}_k$). If an edge exists between nodes in non-consecutive partitions, an identity transform is inserted to replicate signals between multiple layers. Due to the linearity of transforms, for any layered partition of $\mathcal{V}$, the layered transforms $T_k$ can be constructed. The $T_k$ are structured matrices comprised of sub-blocks $L_{uv}$, identity matrices, and/or zero matrices. The block structure is determined by the network topology.
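A sketch of the layered factorization of Eqn. (16) for the Fig. 1 topology as we read it (partitions $\{1,2\}, \{3\}, \{4\}, \{5,6\}$; the encoding matrices are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(5)
n1 = n2 = 2
L13 = rng.standard_normal((1, n1)); L23 = rng.standard_normal((1, n2))
L34 = rng.standard_normal((2, 2))
L45 = rng.standard_normal((1, 2)); L46 = rng.standard_normal((1, 2))

T1 = np.block([[L13, np.zeros((1, n2))],
               [np.zeros((1, n1)), L23]])   # layer 1 -> 2: sources encode
T2 = L34                                    # layer 2 -> 3: relay 3 combines
T3 = np.vstack([L45, L46])                  # layer 3 -> 4: relay 4 fans out

T = T3 @ T2 @ T1                            # end-to-end transfer, Eqn. (16)
```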

Example 2

For the multiple unicast network of Fig. 1, a valid layered partition of $\mathcal{V}$ is $\mathcal{V}_1 = \{1, 2\}$, $\mathcal{V}_2 = \{3\}$, $\mathcal{V}_3 = \{4\}$, and $\mathcal{V}_4 = \{5, 6\}$. Let $\mathbf{x} = [\mathbf{x}_1^T\; \mathbf{x}_2^T]^T$, $\mathbf{z} = [\mathbf{z}_5^T\; \mathbf{z}_6^T]^T$, and let $\mathbf{z}_3$ be partitioned as $\mathbf{z}_3 = [\mathbf{z}_{13}^T\; \mathbf{z}_{23}^T]^T$. According to the layering, the transfer matrix is factored in product form $T = T_3\, T_2\, T_1$,

$T_1 = \begin{bmatrix} L_{13} & \mathbf{0} \\ \mathbf{0} & L_{23} \end{bmatrix}, \qquad T_2 = L_{34}, \qquad T_3 = \begin{bmatrix} L_{45} \\ L_{46} \end{bmatrix}.$

Example 3

Consider the setting of Example 1 for the relay network shown in Fig. 2. A valid layered partition of $\mathcal{V}$ is $\mathcal{V}_1 = \{1\}$, $\mathcal{V}_2 = \{2\}$, $\mathcal{V}_3 = \{3\}$. According to the layering, the transfer matrix may be written in product form $T = T_2\, T_1$,

$T_1 = \begin{bmatrix} L_{12} \\ L_{13} \end{bmatrix}, \qquad T_2 = \begin{bmatrix} \mathbf{0} & I \\ L_{23} & \mathbf{0} \end{bmatrix},$

where the identity block replicates the signal on edge $(1, 3)$, which spans non-consecutive layers.

IV Optimizing Compression-Estimation Matrices

Our optimization method proceeds iteratively over network layers. To simplify the optimization, we first assume ideal channels (high-SNR communication) for which $\mathbf{z}_{uv} = \mathbf{y}_{uv}$. Then the linear operation of the network is $\mathbf{z} = T\mathbf{x}$ with $T = T_{K-1}\, T_{K-2} \cdots T_1$. Linear transform coding is constrained according to bandwidth compression ratios $\alpha_{uv}$.

IV-A MSE Distortion at Receivers

According to the linear system equations, Eqns. (11)-(14), each receiver $t \in \mathcal{T}$ receives filtered source observations $\mathbf{z}_t = T_t\,\mathbf{x}$. Receiver $t$ applies a linear estimator $B_t$ to estimate signal $\mathbf{r}_t$. The MSE cost of estimation is

$D_t = E\big[\|\mathbf{r}_t - B_t\, T_t\, \mathbf{x}\|_2^2\big].$   (17)

Setting the matrix derivative with respect to $B_t$ in Eqn. (17) to zero yields $B_t\, T_t \Sigma_x T_t^T = \Sigma_{r_t x}\, T_t^T$. For a fixed transfer function $T_t$, the optimal LLSE matrix is

$B_t = \Sigma_{r_t x}\, T_t^T \big(T_t \Sigma_x T_t^T\big)^{-1}.$   (18)

If $T_t \Sigma_x T_t^T$ in Eqn. (18) is singular, the inverse may be replaced with a pseudo-inverse operation to compute $B_t$.
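The decoder of Eqn. (18) is a one-line computation; the sketch below includes the pseudo-inverse guard (the receiver's target here is an assumed sub-vector of the source, chosen for illustration):

```python
import numpy as np

def llse_decoder(Sigma_x, Sigma_rx, T_t):
    """B_t = Sigma_rx T_t^T (T_t Sigma_x T_t^T)^+, as in Eqn. (18)."""
    Sigma_z = T_t @ Sigma_x @ T_t.T          # covariance of observations z_t
    return Sigma_rx @ T_t.T @ np.linalg.pinv(Sigma_z)

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4)); Sigma_x = A @ A.T
T_t = rng.standard_normal((2, 4))            # network transfer to receiver t
Sigma_rx = Sigma_x[:2, :]                    # target r_t = first two source dims
B_t = llse_decoder(Sigma_x, Sigma_rx, T_t)
```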

Let $B$ denote a block diagonal global matrix containing the individual decoding matrices $B_t$ on the diagonal. For an LTN graph with encoding transfer function $T$, we write the linear decoding operation of all receivers as $\hat{\mathbf{r}} = B\mathbf{z}$ where $\mathbf{z} = T\mathbf{x}$ are the observations received. The weighted MSE cost in Eqn. (10) for reconstructing signals at all receivers is written as

$D_{\mathrm{MSE}} = E\big[\|\mathbf{r} - B\,T\,\mathbf{x}\|_W^2\big].$   (19)

By construction of the weighting matrix $W$, the MSE in Eqn. (19) is a weighted sum of individual distortions at receivers, i.e. $D_{\mathrm{MSE}} = \sum_{t \in \mathcal{T}} w_t D_t$.

IV-B Computing Encoding Transforms

The optimization of the network transfer function $T$ is more complex due to block constraints imposed by the network topology on the matrices $T_k$. In order to solve for a particular linear transform $T_k$, we assume all other linear transforms $T_j$, $j \neq k$, and the receivers' decoding transform $B$ are fixed. Then the optimal $T_k$ is the solution to a constrained quadratic program. To derive this, we utilize the following identities, in which $\boldsymbol{\theta} = \mathrm{vec}(\Theta)$:

$\mathrm{vec}(M_1\, \Theta\, M_2) = (M_2^T \otimes M_1)\,\boldsymbol{\theta},$   (20)
$\mathrm{tr}(\Theta^T M_1\, \Theta\, M_2) = \boldsymbol{\theta}^T (M_2^T \otimes M_1)\,\boldsymbol{\theta}.$   (21)

We write the network's linear transfer function as $T = T_{K-1} \cdots T_{k+1}\, T_k\, T_{k-1} \cdots T_1$ and define the following matrices

$\Phi \triangleq B\, T_{K-1} \cdots T_{k+1},$   (22)
$\Psi \triangleq T_{k-1} \cdots T_1,$   (23)
$\Omega \triangleq \Psi\, \Sigma_x\, \Psi^T.$   (24)

To write $D_{\mathrm{MSE}}$ in terms of the matrix variable $T_k$, we also define the following, where $\Sigma_r = E[\mathbf{r}\mathbf{r}^T]$ and $\Sigma_{rx} = E[\mathbf{r}\mathbf{x}^T]$,

$c_0 \triangleq \mathrm{tr}(W\,\Sigma_r),$   (25)
$\mathbf{b} \triangleq -2\,\mathrm{vec}\big(\Phi^T W\,\Sigma_{rx}\,\Psi^T\big),$   (26)
$Q \triangleq \Omega^T \otimes \big(\Phi^T W\,\Phi\big),$   (27)

where $c_0$, $\mathbf{b}$, and $Q$ are a scalar, a vector, and a positive semi-definite matrix respectively. The following lemma expresses $D_{\mathrm{MSE}}$ as a function of the unknown matrix variable $T_k$.

Lemma 1

Let transforms $T_j$, $j \neq k$, and $B$ be fixed. Let $\Phi$, $\Psi$, $\Omega$ be defined in Eqns. (22)-(24), and $c_0$, $\mathbf{b}$, and $Q$ be defined in Eqns. (25)-(27). Then the weighted MSE distortion of Eqn. (19) is a quadratic function of $\boldsymbol{\theta} = \mathrm{vec}(T_k)$,

$D_{\mathrm{MSE}} = \boldsymbol{\theta}^T Q\,\boldsymbol{\theta} + \mathbf{b}^T \boldsymbol{\theta} + c_0.$   (28)

Proof: Substituting the expressions for $\Phi$, $\Psi$, $\Omega$ of Eqns. (22)-(24) into Eqn. (19) produces the intermediate equation $D_{\mathrm{MSE}} = \mathrm{tr}(W\Sigma_r) - 2\,\mathrm{tr}\big(W\,\Sigma_{rx}\,\Psi^T T_k^T \Phi^T\big) + \mathrm{tr}\big(W\,\Phi\, T_k\, \Omega\, T_k^T \Phi^T\big)$. Directly applying the vector-matrix identities of Eqns. (20)-(21) results in Eqn. (28).
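As a numerical sanity check of Lemma 1 under the definitions reconstructed above (our own verification, not from the paper), the quadratic form and a direct evaluation of the MSE terms agree:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 3
A = rng.standard_normal((n, n)); Sigma_x = A @ A.T   # take r = x and W = I
Phi, Psi, Tk = (rng.standard_normal((n, n)) for _ in range(3))
W = np.eye(n)

Omega = Psi @ Sigma_x @ Psi.T                        # Eqn. (24)
c0 = np.trace(W @ Sigma_x)                           # Eqn. (25)
b = -2 * (Phi.T @ W @ Sigma_x @ Psi.T).flatten(order="F")   # Eqn. (26)
Q = np.kron(Omega.T, Phi.T @ W @ Phi)                # Eqn. (27)

theta = Tk.flatten(order="F")
quad = theta @ Q @ theta + b @ theta + c0            # Eqn. (28)

M = Phi @ Tk @ Psi
direct = c0 - 2 * np.trace(W @ Sigma_x @ Psi.T @ Tk.T @ Phi.T) \
         + np.trace(W @ M @ Sigma_x @ M.T)
print(np.isclose(quad, direct))                      # True
```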

Fig. 3: (a) Block diagram of the “hybrid network” example. (b) The end-to-end distortion vs. compression for varying bandwidth. The network operates in one of three modes (distributed, hybrid, or point-to-point) as described in Example 4. (c) Convergence of the distortion for five different initializations of the iterative algorithm at a fixed operating point.

IV-C Quadratic Program with Convex Constraints

Due to Lemma 1, the weighted MSE is a quadratic function of $\boldsymbol{\theta} = \mathrm{vec}(T_k)$ if all other network matrices are fixed. The optimal $T_k$ must also satisfy block constraints determined by the network topology. The block constraints are linear equality constraints of the form $C\boldsymbol{\theta} = \mathbf{d}$. For example, if $T_k$ contains an identity sub-block, this is enforced by setting entries of $\boldsymbol{\theta}$ to zero and one accordingly, via linear equality constraints.

1:  Identify compression matrices $T_1, \ldots, T_{K-1}$ and corresponding linear equalities $(C_k, \mathbf{d}_k)$ for network $\mathcal{N}$. Identify estimation matrices $B_t$, $t \in \mathcal{T}$. [Sec. III, Sec. IV-C]
2:  Initialize $T_1, \ldots, T_{K-1}$ randomly to feasible matrices.
3:  Set $D^{(0)} = \infty$, $\ell = 1$.
4:  repeat
5:     Compute $B$ given $T_1, \ldots, T_{K-1}$. [Eqn. (18)]
6:     for $k = 1$ to $K-1$ do
7:        Compute $T_k$ given $B$, $\{T_j\}_{j \neq k}$, $C_k$, $\mathbf{d}_k$. [Theorem 1]
8:     end for
9:     Compute $D^{(\ell)}$. [Eqn. (19)]
10:    Set $\Delta = |D^{(\ell)} - D^{(\ell-1)}|$.
11:    Set $\ell = \ell + 1$.
12:  until $\Delta < \epsilon$ or $\ell > \ell_{\max}$.
13:  return $\mathcal{L}$, $\mathcal{B}$.
Algorithm 1 Ideal-Compression-Estimation($\mathcal{N}$, $\epsilon$, $\ell_{\max}$)
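A condensed sketch of Algorithm 1 for a two-source distributed network with a single encoding layer is given below. The block-diagonal constraint on the encoder is imposed by restricting the QP of Lemma 1 to the free entries of $\boldsymbol{\theta}$, which is equivalent to linear equality constraints that pin the remaining entries to zero. All names, dimensions, and statistics are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(8)
n1 = n2 = 3; n = n1 + n2; c1 = c2 = 1       # per-source bandwidths
A = rng.standard_normal((n, n)); Sigma_x = A @ A.T

# Mask enforcing T = blockdiag(L1, L2): each source encodes only its own signal
mask = np.zeros((c1 + c2, n), dtype=bool)
mask[:c1, :n1] = True; mask[c1:, n1:] = True

T = np.where(mask, rng.standard_normal(mask.shape), 0.0)
D_prev, tol = np.inf, 1e-9
for _ in range(500):
    # Line 5: optimal LLSE decoder for the current T (receiver wants x itself)
    B = Sigma_x @ T.T @ np.linalg.pinv(T @ Sigma_x @ T.T)
    # Line 7: optimal constrained T from the quadratic form of Lemma 1
    Q = np.kron(Sigma_x.T, B.T @ B)         # here Phi = B and Psi = I
    b = -2 * (B.T @ Sigma_x).flatten(order="F")
    f = mask.flatten(order="F")             # free entries of theta = vec(T)
    theta = np.zeros(Q.shape[0])
    theta[f] = np.linalg.lstsq(2 * Q[np.ix_(f, f)], -b[f], rcond=None)[0]
    T = theta.reshape(mask.shape, order="F")
    # Line 9: evaluate the MSE and test convergence
    M = B @ T
    D = np.trace(Sigma_x) - 2 * np.trace(Sigma_x @ M) \
        + np.trace(M @ Sigma_x @ M.T)
    if abs(D_prev - D) < tol:
        break
    D_prev = D
print(D)
```

Removing the mask corresponds to the centralized (point-to-point) case, while the block-diagonal mask mirrors the distributed setting discussed in Section I-A.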
Theorem 1 (Optimal Encoding)

Let encoding matrices $T_j$, $j \neq k$, and decoding matrix $B$ be fixed. Let $\boldsymbol{\theta} = \mathrm{vec}(T_k)$. The optimal encoding transform $T_k$ is given by the following constrained quadratic program (QP) [33, Def. 4.34]

$\min_{\boldsymbol{\theta}} \;\; \boldsymbol{\theta}^T Q\,\boldsymbol{\theta} + \mathbf{b}^T \boldsymbol{\theta} + c_0$   (29)
s. t. $\;\; C\boldsymbol{\theta} = \mathbf{d},$

where $(C, \mathbf{d})$ represent linear equality constraints on elements of $\boldsymbol{\theta}$. The solution to the above optimization for $\boldsymbol{\theta}$ is obtained by solving a corresponding linear system

$\begin{bmatrix} 2Q & C^T \\ C & \mathbf{0} \end{bmatrix} \begin{bmatrix} \boldsymbol{\theta} \\ \boldsymbol{\nu} \end{bmatrix} = \begin{bmatrix} -\mathbf{b} \\ \mathbf{d} \end{bmatrix}.$   (30)

If the constraints determined by the pair $(C, \mathbf{d})$ are feasible, the linear system of Eqn. (30) is guaranteed to have either one or infinitely many solutions.

Proof: The QP of Eqn. (29) follows from Lemma 1 with additional linear equality constraints placed on $\boldsymbol{\theta}$. The closed form solution to the QP is derived using Lagrange dual multipliers for the linear constraints, and the Karush-Kuhn-Tucker (KKT) conditions. Let $\mathcal{J}(\boldsymbol{\theta}, \boldsymbol{\nu})$ represent the Lagrangian formed with dual vector variable $\boldsymbol{\nu}$ for the constraints,

$\mathcal{J}(\boldsymbol{\theta}, \boldsymbol{\nu}) = \boldsymbol{\theta}^T Q\,\boldsymbol{\theta} + \mathbf{b}^T \boldsymbol{\theta} + c_0 + \boldsymbol{\nu}^T (C\boldsymbol{\theta} - \mathbf{d}),$   (31)
$\nabla_{\boldsymbol{\theta}}\,\mathcal{J} = 2Q\boldsymbol{\theta} + \mathbf{b} + C^T \boldsymbol{\nu},$   (32)
$\nabla_{\boldsymbol{\nu}}\,\mathcal{J} = C\boldsymbol{\theta} - \mathbf{d}.$   (33)

Setting $\nabla_{\boldsymbol{\theta}}\,\mathcal{J} = \mathbf{0}$ and $\nabla_{\boldsymbol{\nu}}\,\mathcal{J} = \mathbf{0}$ yields the linear system of Eqn. (30), the solutions to which are the optimal $\boldsymbol{\theta}$ and dual vector $\boldsymbol{\nu}$. Since the MSE distortion is bounded below by zero, the linear system has a unique solution if the KKT matrix is full rank, or infinitely many solutions of equivalent objective value if it is singular.
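For completeness, the KKT system of Eqn. (30) can be solved directly; this sketch (with illustrative stand-ins for $Q$, $\mathbf{b}$, $C$, $\mathbf{d}$) uses a least-squares solve so that a singular KKT matrix, i.e. infinitely many solutions, is handled gracefully:

```python
import numpy as np

rng = np.random.default_rng(9)
m = 4
A = rng.standard_normal((m, m)); Q = A @ A.T     # PSD quadratic term
b = rng.standard_normal(m)
C = np.array([[1.0, 0.0, 0.0, 0.0]])             # pin theta[0] ...
d = np.array([0.0])                              # ... to the value 0

# KKT system of Eqn. (30): [[2Q, C^T], [C, 0]] [theta; nu] = [-b; d]
K = np.block([[2 * Q, C.T], [C, np.zeros((1, 1))]])
rhs = np.concatenate([-b, d])
sol = np.linalg.lstsq(K, rhs, rcond=None)[0]
theta, nu = sol[:m], sol[m:]
print(theta[0])                                  # constraint satisfied: ~0.0
```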

Remark 3

Beyond linear constraints, several other convex constraints on matrix variables could be applied within the quadratic program. For example, the $\ell_1$-norm of a vector, defined by