Analysis of Recurrent Linear Networks for Enabling Compressed Sensing of Time-Varying Signals
Recent interest has developed around the problem of dynamic compressed sensing, or the recovery of time-varying, sparse signals from limited observations. In this paper, we study how the dynamics of recurrent networks, formulated as general dynamical systems, mediate the recoverability of such signals. We specifically consider the problem of recovering a high-dimensional network input, over time, from observation of only a subset of the network states (i.e., the network output). Our goal is to ascertain how the network dynamics lead to performance advantages, particularly in scenarios where both the input and output are corrupted by disturbance and noise, respectively. For this scenario, we develop bounds on the recovery performance in terms of the dynamics. Conditions for exact recovery in the absence of noise are also formulated. Through several examples, we use the results to highlight how different network characteristics may trade off toward enabling dynamic compressed sensing and how such tradeoffs may manifest naturally in certain classes of neuronal networks.
We consider the analysis of recurrent networks for facilitating recovery of a high-dimensional, time-varying, sparse input in the presence of both corrupting disturbance and confounding noise. The network receives an input and generates the observations (network outputs), via its recurrent dynamics, i.e.,
where, here, are the network states, is the corrupting disturbance and is the confounding noise. Our focus is on how the network dynamics, embedded in , impact the extent to which can be inferred from in the case where the dimensionality of the latter is substantially less than that of the former. We will focus exclusively on the case where these dynamics are linear.
Such a problem, naturally, falls into the category of sparse signal recovery or compressed sensing (CS), for under-determined linear systems [1, 2, 3]. It is well known that for such problems, exact and stable recovery can be achieved under certain assumptions related to the statistical properties of the observed signal [4, 5, 6, 7]. Classical CS, however, does not typically consider temporal dynamics associated with the recovery problem.
Given the natural sparsity of electrical signals in the brain, CS has been linked to important questions in neural decoding [8, 9], i.e., how the brain represents and transforms information. Understanding the dynamics of brain networks in the context of CS is a crucial aspect of this overall problem [10, 8, 11, 12]. Such networks are, of course, not static. Thus, recent interest has grown around so-called dynamic CS and, specifically, on the recovery of signals subjected to transformation via a dynamical system (or, network). In this context, sparsity has been formulated in three ways: 1) In the network states (state sparsity) [13, 14, 15]; 2) In the structure/parameters of the network model (model sparsity) [16, 17]; and 3) In the inputs to the network (input sparsity) [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]. Here, we consider this latter category of recovery problems.
Our motivation is to understand how three stages of a generic network architecture – an afferent stage, a recurrent stage, and an output stage (see Fig. 1) – interplay in order to enable an observer, sampling the output, to recover the (sparse) input. Such an architecture is pervasive in sensory networks in the brain wherein a large number of sensory neurons, receiving excitation from the periphery, impinge on an early recurrent network layer that transforms the afferent excitation en route to higher brain regions [29, 30]. Moreover, beyond neural contexts, understanding network characteristics for dynamic CS may aid in the analysis of systems for efficient processing of naturally sparse time-varying signals ; and in the design of resilient cyber-physical systems [18, 19, 21], wherein extrinsic perturbations are sparse and time-varying. Toward these potential instantiations, our specific aim in this paper is to elucidate fundamental dynamical characteristics of linear networks for exact and stable recovery of the (sparse) input signal, corrupted by an input disturbance.
I-B Paper Contributions
To achieve our specific aim, we develop and present the following contributions:
We develop analytical conditions on the network dynamics, related to the classical notion of observability for a linear system, such that the network admits exact and stable (in the presence of output noise) input recovery.
We derive an upper bound in terms of the network dynamics, for the -norm of the reconstruction error over a finite time horizon. This error can be defined in terms of both the disturbance and the noise.
Based on the error analysis, we characterize a basic tradeoff between the ability of a network to simultaneously reject input disturbances while still admitting stable recovery.
We highlight network characteristics that optimally balance this tradeoff, and demonstrate via several examples their ability to reconstruct corrupted time-varying input from noisy observations. In particular, we highlight an example of a rate-based neuronal network, and how specific features of the network architecture mediate these tradeoffs.
I-C Prior Results in Sparse Input Recovery
The sparse input recovery problem for linear systems can be formulated in both the spatial and temporal dimension. Our contributions are related to the former. As mentioned above, in this context, previous work has considered recovery of spatially sparse inputs for network resilience [18, 19] and encoding of inputs with sparse increments . In , conditions for exact sparse recovery are formulated in terms of a coherence-based observability criterion for intended applications in cyber-physical systems. Our contributions herein provide a general set of analytical results, including performance bounds, pertaining to exact and stable sparse input recovery of linear systems in the presence of both noise and disturbance.
A second significant line of research in sparse input recovery problems pertains to the temporal dimension. There, the goal is to understand how a temporally sparse signal (i.e., one that takes the value of zero over a nontrivial portion of its history) can be recovered from the state of the network at a particular instant in time. This problem forms the underpinning of a characterization of ‘memory’ in dynamical models of brain networks [22, 23, 24, 25, 26, 27]. In particular, in  the problem of ascertaining memory is related to CS performed on the network states, over a receding horizon of a scalar-valued input signal. In contrast to these works, we consider spatial sparsity of vector-valued inputs with explicit regard for both disturbance and an overt observation equation, i.e., states are not be directly sampled, but are transformed to an (in general lower-dimensional) output.
I-D Paper Outline
The remainder of the paper is organized as follows. In Section II we provide motivation of the current work and formulate the problem in detail. In Section III we develop theoretical results on the performance of the proposed recovery method. Simulation results for several different scenarios are provided in Section IV. Finally, conclusions are formulated in Section V.
Ii Problem Formulation
We consider a discrete-time model for a linear network, formulated in the typical form a linear dynamical system, i.e.,:
where is an integer time index, is the activity of the network nodes (e.g., in the case of a rate-based neuronal network [31, 32, 33, 34, 35, 36, 37], the firing rate of each neuron), is the extrinsic input, is the input disturbance, is the measurement noise independent from , and is the observation at time . The matrix describes connections between nodes in the network, contains weights between input and output and is the measurement matrix. Such a model is, of course, quite general and can be used to describe recurrent dynamics in neuronal networks [38, 39, 40, 41, 42], machine learning applications such as pattern recognition and data mining [43, 44, 45, 46], etc.
We consider the case of bounded disturbance and noise, i.e., , . Since , the number of input nodes, is larger than , the number of output nodes, takes the form of a “wide” matrix. We assume that at each time at most input nodes are active (-sparse input), leading to an constraint to (1) at each time point:
It is clear that Problem () is a non-convex discontinuous problem, which is not numerically feasible and is NP-Hard in general . For static cases, such optimization problems fall into the category of combinatorial optimization which require exhaustive search to find the solution .
Thus, throughout this paper, we follow the typical relaxation methodology used for such problems wherein the norm is relaxed to the norm, resulting in the problem:
In the case that either input disturbance, or measurement noise, or both exist, we solve the following convex optimization Problem ():
where is 2-norm of a surrogate parameter that aggregates the effects of disturbance and noise. In the case of noisy measurement with no disturbance . In the next section, we show conditions for the network (1) under which Problems () and () result in exact and stable solutions.
We will develop our results in several steps. First, we consider two cases for the observation matrix in the absence of input disturbance, and proceed to establish existence and performance guarantees for solutions to the convex problems () and () for each case. After that, we continue the analysis to characterize the ability of a network to reject input disturbances while still admitting stable recovery in the presence of disturbance and noise, simultaneously.
We begin by recalling some basic matrix notation and matrix norm properties that will be used throughout this paper. Given normed spaces and , the corresponding induced norm or operator norm denoted by over linear maps , is defined by
where is the set of eigenvalues of (or the spectrum of ).
A vector is said to be -sparse if , in other words it has at most nonzero entries.
The restricted isometry constant of a matrix is defined as the smallest number such that for all -sparse vectors the following equation holds
Iii-B Case 1: Full-rank Square Observation Matrix without Input Disturbance
In the first case, we consider (1) in the absence of input disturbance () and we assume that the linear map , , has no nullspace, , which means that the network states can be exactly recovered by inverting the observation equation (1) (the trivial case being C equal to the identity). Our first result establishes a one to one correspondence between sparse input and observed output for the system (1).
Suppose that the sequence from noiseless measurements is given, and , , , are known. Assume the matrix satisfies the RIP condition (7) with isometry constant . Then, there is a unique -sparse sequence of and a unique sequence of that generate .
See Appendix A. ∎
Having established the existence of a unique solution, we now proceed to study convex Problems () and () that recover these solutions. First, we provide theoretical results for the stable recovery of the input where measurements are noisy i.e., ().
(Noisy recovery) Assume that the matrix satisfies the RIP condition (7) with . Suppose that the sequence is given and generated from sequences and -sparse based on
where and , , , are known. Then, the solution to Problem () obeys
are given explicitly below:
Assume that the the sequences and sparse are the solutions of Problem (). First we derive the bound for the in the following Lemma.
Suppose that the sequence is given and generated from sequences and -sparse based on (8), where and , , , are known. Then, any solution to Problem () obeys
See Appendix B ∎
From Lemma 5, non-singularity of and the equation we can derive a bound for as
which results in
Now, denote where can be decomposed into a sum of vectors for each , each of sparsity at most . Here, corresponds to the location of non-zero elements of , to the location of largest coefficients of , to the location of the next largest coefficients of , and so on. Also, let . Extending the technique in [7, 49], it is possible to obtain a cone constraint for the input in the linear dynamical systems.
(Cone constraint) The optimal solution for the input in Problem () satisfies
See Appendix C. ∎
We can further establish a bound for the right hand side of (15):
The optimal solution for the input in Problem () satisfies the following constraint
See Appendix D. ∎
We now state a Theorem that characterizes the solution for the noiseless case (P1), which follows as a special case of (P2) as the noise variance approaches zero.
Iii-C Case 2: Observation Matrix Satisfying Observability Condition without Input Disturbance
In this case, we consider (1) in the absence of input disturbance () with the linear map , . Thus, direct inversion of is not possible in this case. For any positive , we define the standard linear observability matrix as
If , then the system (1) is observable in the classical sense 111The system is said to be observable if, for any initial state and for any known sequence of input there is a positive integer such that the initial state can be recovered from the outputs , ,…, . . However, we do not assume any knowledge of the input other than the fact that it is -sparse at each time. Note that if we simply iterate the output equation in (1) for time steps and exploit the fact that the input vector is -sparse as shown in Fig. 2, we obtain:
where is as follows:
where is the matrix corresponding the active columns of (corresponding to nonzero input entries) at time step (see Fig. 2). In general, we do not know where active columns of are located at each time a priori. We define as the set of all possible matrices satisfying the structure in (20), where the cardinality of this set is .
In the next Theorem, we establish conditions under which a one to one correspondence exists between sparse input and observed output for the system (1).
Suppose that the sequence from noiseless measurements is given, and , and are known. Assume and the matrix satisfies the RIP condition (7) with isometry constant . Further, assume
Then, there is a unique -sparse sequence of and a unique sequence of that generate .
See Appendix E. ∎
The rank condition implies that all columns of the observability matrix must be linearly independent of each other (i.e., the network is observable in the classical sense) and of all columns of . Since the exact location of the nonzero elements of the input vector are not known a priori, this condition is specified over all . Thus, (21) is a combinatorial condition. From our simulation studies, we observe that this condition holds for random Gaussian matrices almost always and, moreover, can be numerically verified for certain salient random networks (see also Example 3 in Section IV).
Having established the existence of a unique solution, we now proceed to study the convex problems () and () that recover these solutions for this case. First, we provide theoretical results for the stable recovery of the input where measurements are noisy i.e., (P2).
See Appendix F. ∎
(Noiseless recovery) Assume and the matrix satisfies the RIP condition (7) with . Further, assume (21) holds. Suppose that the sequence is given and generated from sequences and -sparse inputs based on dynamical equation (8), where and , and are known. Then the sequences and are the unique minimizer to Problem ().
Imposing an RIP condition on the combined matrix bears some conceptual similarity to the formulation of an overcomplete dictionary in the classical compressed sensing literature [53, 54]. In this sense, the matrix (i.e., the afferent stage) can be interpreted as a dictionary that transforms the sparse input onto the recurrent network states.
Iii-D Case 3: Optimal Network Design to Enable Recovery in the Presence of Disturbance and Noise
Finally, we show how eigenstructure of the network implies a fundamental tradeoff between stable recovery and rejection of disturbance (i.e., corruption).
It is easy to see from (9) that the upper-bound of the recovery error is reduced by decreasing the maximum singular value of . Thus, from now on we use the upper-bound of the input recovery error as a comparative measure of performance. In the absence of both disturbance and noise, the best error performance is achieved when , i.e., the network is static, which in intuitive since in this scenario any temporal effects would smear the salient parts of the signal.
On the other hand, having dynamics in the network should improve the error performance in the presence of the disturbance. To demonstrate this, consider (1) with nonzero. When , i.e., a static network, the disturbance can be exactly transformed to the measurement equation resulting in as a surrogate measurement noise with
In this case, the error upper-bound can be obtained by exploiting the result of Theorem 4 as
When is nonzero, it is not possible to exactly map the disturbance to the output as above. Nevertheless, it is straightforward to approximate the relative improvement in performance. For instance, consider a system with symmetric and where the disturbance and input are in displaced frequency bands. Then it is a direct consequence of linear filtering that the power spectral density of the disturbance can be attenuated according to
where is the frequency of the disturbance. So, for instance, if ,
where is the eigenvalue of matrix . Assuming the input is sufficiently displaced in frequency from the disturbance, the error upper-bound can be then readily approximated using the results of Theorem 4 as follows
By comparing (24) and (27), it is easy to verify that A and C can be designed in a way to reduce error upper-bound at least by a factor of two, assuming and are close to each other. In the examples below, we will show that, in fact, performance in many cases can exceed this bound considerably.
In this section, we present several examples that demonstrate the developed results. For solving our convex optimization problems, we used CVX with MATLAB interface [55, 56]. To create example networks, we generated the matrices , , using a Gaussian random number generator in MATLAB.
Iv-a Example 1: One-step and Sequential Recovery
In this experiment, we consider a dynamical system with sparse input which satisfies conditions in Theorem 11. Here, we consider random Gaussian matrices for , and , with , . The input is defined as the image of a digit, shown in Fig. 3 with values between 0 and 1, where the horizontal axis is treated as time, i.e., column of the image is the input to the system at time .
We proceed to perform input recovery in two ways: (i) by solving () in one step over the entire horizon , i.e., one-step recovery; and (ii) by solving () times, sequentially, i.e., recovery at each time step. We compare the outcomes for two cases:
Fig. 3A shows the recovered input for the case that for both one-step and sequential recovery, and it can be seen that sparse input can be recovered in two cases perfectly. This is expected, since in this case, can be inverted at each time step.
Satisfying Observability Condition
Fig. 3B of the figure illustrates the results for the case that , but where satisfies the observability condition. It is clear that sequential dynamic CS can not recover the input exactly. However, from our results (Theorem 12) we expect that one-step recovery (over the entire horizon) is possible, as is evidenced in the figure.
Iv-B Example 2: Recovery in the Presence of Disturbance and Noise
Fig. 4A shows the mean square error () versus the maximum singular value of , for several random realization of , in the case of full rank . In this study, is assumed to follow an uniform distribution while . It can be seen from this figure that by increasing , the recovery performance is degraded, as we expect based on the derived bound for the error in (9).
To contrast Fig. 4A, we consider the case when disturbance is added to the input. In Fig. 4B, we show the versus for several random diagonal matrices when and . As expected from our results, can not be arbitrary small, since in this case the disturbance would entirely corrupt the input.
We conducted simulation experiments to examine the effect of the noise and disturbance strength on the reconstruction error. Fig. 5 shows the average for the reconstructed input versus and , respectively with , for random trials (different random matrices A, B, C in each trial). This figure shows that the reconstruction error decreases as a function of noise energy.
The next study illustrates recovery in the presence of both disturbance and noise for a smoothly changing sequence of 64 images (frames). Each frame, is corrupted with disturbance and at each time, and the difference between two consecutive frames is considered as the sparse input to the network. The disturbance is assumed to be a random variable drown from a Gaussian distribution, , passed through a fifth-order Chebyshev high pass filter. Each frame has pixels and . Furthermore, we consider random Gaussian matrices for and with .
We proceeded to design the matrix to balance the performance bound (9) and the ability to reject the disturbance as per Section III-D. Fig. 6A shows the original frames at different times. We assumed the first frame is known exactly. Frames recovered from the output of a static network , i.e., are depicted in Fig. 6B. In contrast, frames recovered from the output of the designed dynamic network are shown in Fig. 6C. It is clear from the figure that quality of recovery is better in the latter case. Fig. 7 illustrates the PSNR, defined as as a function of frame number with and without dynamics. It can be concluded from this figure that having a designed matrix results in recovery that is more robust to disturbance, while without dynamics, error propagates over time, and the reconstruction quality is degraded.
Iv-C Example 3: Input Recovery in an Overactuated Rate-Based Neuronal Network
A fundamental question in theoretical neuroscience centers on how the architecture of brain networks enables the encoding/decoding of sensory information [57, 10]. In our final example, we use the results of Theorems 4 and 12 to highlight how certain structural and dynamical features of neuronal networks may provide the substrate for sparse input decoding.
Specifically, we consider a firing rate-based neuronal network  of the form
with input rates , output rates , a feed-forward synaptic weight matrix , and a recurrent synaptic weight matrix . We consider neurons which receive synaptic inputs from afferent neurons, i.e., neurons that impinge on the network in question. Here, is a diagonal matrix whose diagonal elements are the time constants of the neurons. A discrete version of (28), alongside a linear measurement equation can be written in the standard form (1) where is related to connections between nodes in the network, and contains weights between input and output nodes. For this example, we assume that the network connectivity has a Watts–Strogatz small-world topology  with connection probability and rewiring probability .
The recurrent synaptic matrix is defined as
For the purposes of illustration, we select the diagonal elements of matrix , from a uniform distribution . We study the recovery performance associated with the network over 100 time steps, assuming a timescale of milliseconds and an discretization step of . At each time step, the nonzero elements of the input vector , i.e., firing rate of the afferent neurons, are drawn from an uniform distribution . Moreover, we assume that elements of the observation matrix are drawn from a Gaussian distribution . Finally, we assume and are drawn from lognormal distributions and , respectively. The latter assumption is chosen for illustration only and is not related to known physiology.
Iv-C1 Recovery Performance from Error Bounds
We proceed to conduct a Monte Carlo simulation of 100 different realizations of , and . Fig. 8A illustrates that the maximum singular value of the matrix decreases as a function of the percent of inhibitory neurons. Thus, we anticipate from our derived performance bounds that performance in terms of mean square error (MSE) should be best for networks with high inhibition in the presence of noise. This prediction bears out in Fig. 8B, where we indeed observe a monotone relationship between MSE and inhibition. On the one hand, low-inhibition is favorable for facilitating recovery in the presence of disturbance depicted in Fig. 8C. Such tradeoffs are interesting to contemplate when considering the functional advantages of network architectures observed in biology, such as the pervasive 80-20 ratio of excitatory to inhibitory neurons [34, 59]. Together, Figs. 8B and 8C illustrate how the excitatory-inhibitory ratio mediate a basic tradeoff in the capabilities of a rate-based neuronal network.
Iv-C2 Recoverable Sparsity based on Theorem 12
Having ascertained the performance tradeoff curves, we sought to characterize in more detail the level of recoverable sparsity with specific connection to Theorem 12 and (21). We considered networks as above, but with and 20/80 for the ratio of inhibitory/excitatory over 10 time steps for 100 random trials. Thus, the output of the network is of lower dimension than the network state space and the observability matrix is of nontrivial construction. Fig. 9A shows that for this setup, the rank condition (21) holds up to . Thus, Theorem 12 predicts that recover will be possible (to within the RIP condition on ) for signals with 13 nonzero elements. Fig. 9B validates this theoretical prediction by illustrating recovery performance in the absence of disturbance and noise for different values . It is observed that when the rank condition holds, reconstruction is perfect and that the probability of exact recovery is decreased by increasing , as expected.
In this paper, we present several results pertaining to the effect of temporal dynamics on compressed sensing of time-varying signals. Specifically, we considered the recovery of sparse inputs to a linear dynamical system (network) from limited observation of the network states. We provide basic conditions on the system that ensure solution existence and, further, derive several bounds in terms of the system dynamics for recovery performance in the presence of both input disturbance and observation noise. We show that dynamics can play opposing roles in mediating accurate recovery with respect to these two different sources of corruption. Thus, our results indicate tradeoffs that may inform the design of dynamical systems for time-varying compressed sensing. These tradeoffs are illustrated through a series of examples, including one that highlights how the developed theory could be used to interrogate the functional role of inhibition in a neuronal network.
V-B Implications and Future Work
The results can have both engineering and scientific impacts. In the former case, the goal may be to design networks to process time-varying signals that are naturally sparse, such as high-dimensional neural data, or to be resilient to time-varying sparse perturbations. In the latter case, the goal is to understand how the naturally occurring architectures of networks, such as those in the brain, confer advantages for processing of afferent signals. In both cases, a precursor to further study are a set of verifiable conditions that overtly link network characteristics/dynamics to sparse input processing. Our paper provides such conditions for networks with linear dynamics and develops illustrative examples that highlight these potential applications. Treatment of systems with nonlinear dynamics, as well as a more detailed examination of random networks using the theory, are left as subjects for future work.
Appendix A Proof of Lemma 3
Based on the assumption on the null space of the linear map , given , there is a unique sequence of . We now prove the uniqueness of . First, consider the following equations:
The remainder of the proof is by contradiction. Let us assume that the sequence is not unique and there is another sequence of -sparse which satisfies (30), leading to
Matrix is non-singular, hence equation (33) can be simplified as follows:
Based on the assumption that the matrix satisfies the RIP condition (7) with isometry constant and the fact that the support of the vectors are at most , the lower bound of the RIP condition for results in
which means that and the sequence of -sparse vectors is unique.
Appendix B Proof of Lemma 5
If is the solution to Problem (), then satisfies the inequality in () which means that which can be reformulated as
which results in
Appendix C Proof of Lemma 6
For each and we have
Therefore, we have the following equation