Fast -Minimization Algorithm for Sparse Approximation Based on an Improved LPNN-LCA framework
The aim of sparse approximation is to estimate a sparse signal according to the measurement matrix and an observation vector. It is widely used in data analytics, image processing, and communication, etc. Up to now, a lot of research has been done in this area, and many off-the-shelf algorithms have been proposed. However, most of them cannot offer a real-time solution. To some extent, this shortcoming limits its application prospects. To address this issue, we devise a novel sparse approximation algorithm based on Lagrange programming neural network (LPNN), locally competitive algorithm (LCA), and projection theorem. LPNN and LCA are both analog neural network which can help us get a real-time solution. The non-differentiable objective function can be solved by the concept of LCA. Utilizing the projection theorem, we further modify the dynamics and proposed a new system with global asymptotic stability. Simulation results show that the proposed sparse approximation method has the real-time solutions with satisfactory MSEs.
Sparse approximation is frequently used in many different application areas including data analytic, image processing, communication, feature extraction, audio processing and so on [1, 2, 3, 4, 5]. The aim of sparse approximation algorithms is to represent an observation using a sparse signal selected from a specified measurement matrix. The relationship between the unknown sparse signal and the observation can be described as
where is an unknown sparse vector, is an observation vector, and is the measurement matrix (also known as dictionary) with rank (). The matrix is fat with full row rank. Apparently, an infinite number of solutions are available for this equality. Therefore, constraints must be introduced. In sparse approximation, the number of nonzero elements in is denoted by , the solution with the smallest is the best representation. Because for real data, like communication signals and images, even though their observations are in high-dimensional spaces, the actual signals are organized in some lower-dimensional subspaces, i.e., . Hence the spare representation problem is to solve:
where is the so-called -norm. implies the number of nonzero elements in . The problem in (2) is a non-convex. And, due to -norm, it is NP-hard . Hence, in the first kind of methods, approximate functions are used to replace the term. It is well known that -norm is the best convex approximate function of -norm. Thus the original problem in (2) can be rewritten as
which is known as basis pursuit (BP). It can be also transformed into the unconstrained form given by
where is a trade-off parameter. (4) is known as least absolute shrinkage and selection operator (LASSO) problem. Both (3) and (4) can be used to calculate the original solution of problem (2) under certain hypotheses . And, for solving the problem in (3) and (4), some elegant implementation packages, for example L1Magic  and SPGL1 [8, 9], are available. Comparing (3) and (4), (3) is more attractive. Because it does not need any trade-off parameter and the solution of this model normally has better performance. Therefore, the proposed algorithm also uses the BP model in (3).
Another common type of methods are greedy iterative algorithms which include the matching pursuit (MP) algorithm  and its variants. The basic procedures of this kind of algorithms are shown as follows. First, the column in which best matches with is found. Second, the residual is computed. Third, the column in which best matches with the residual is calculated; Finally, we repeat the second and third step several times and then calculate a linear summation of all selected columns. Theoretically, the linear summation of these columns is the most optimal sparse representation of the observation.
In practice, the computational speed is a major barrier to the real-time signal processing with high-dimensional signals . However, existing sparse approximation algorithms normally suffer from one or more the following drawbacks: (i) Computational speed for solving the sparse approximation problem largely depends on the dimension and denseness of the problem. With the increasing of dimensions and denseness, the performance of these conventional methods is worse and worse. (ii) Most of them cannot generate exactly zero-valued coefficients in finite time . (iii) Generally speaking, they can only calculate a suboptimal solution due to the relaxation.
The analog neural network is very attractive when real-time solution are needed [15, 16, 17]. Because the analog neural network can be implemented by hardware circuit and its parallel architecture can eliminate the influence of high dimensions and great denseness. Hence, in this paper, we use the analog neural network to solve the BP problem. LPNN is an analog neural network which is generally used to solve the nonlinear constrained optimization problems . However, LPNN requires that all its objective function and constraints should be twice differentiable. Obviously, BP problem does not satisfy this requirement. Hence it cannot be directly used to solve the BP problem. LCA is another analog neural network, which can be used for solving the sparse approximation problem in (4). Even though it is effective to handle the non-differentiable term, this method can only solve the unconstrained optimization problem. For solving the BP problem with LCA, we have to introduce an appropriate trade-off parameter and transform the original problem into the unconstrained form. Besides, LCA cannot ensure that the original constraints in BP are satisfied. Hence, in our previous work, LPNN and LCA are combined to solve the BP problem. Even though the performance of this method is superior, the global stability of this system is hard to prove. Inspired by the projection neural network [19, 20], we note that the dynamics of LPNN-LCA method also contain a projection operation to a convex set, and their structure can be further modified. After the modification, its equilibrium point is still equivalent to the optimal solution of the original BP problem, and the whole system has global stability in the sense of Lyapunov.
The rest of paper is organized as follows. In Section II, the background of LPNN, LCA and projection neural network are described. In Section III, the proposed algorithm are developed. The global convergence of our algorithm is proved in Section IV. Experimental results for algorithm evaluation and comparison are provided in Section V. Finally, conclusions are drawn in Section VI.
We use a lower-case or upper-case letter to represent a scalar while vectors and matrices are denoted by bold lower-case and upper-case letters, respectively. The transpose operator is denoted as , and and respectively represent the identity matrix and zero matrix with appropriate dimensions. Other mathematical symbols are defined in their first appearance.
Ii-B Lagrange Programming Neural Network
LPNN is an analog neural network, which can be used to solve a general nonlinear constrained optimization problem given by
where is the variable vector, is the objective function, () represents equality constraints. In LPNN, we first set up its Lagrangian:
where is the Lagrange multiplier. There are two kinds of neurons in LPNN, namely, variable neurons and Lagrangian neurons. The variable neurons are used to hold the decision variable vector while the Lagrangian neurons hold the Lagrange multiplier vector . In the LPNN framework, the dynamics of the neurons are
where is the time constant of the circuit. Without loss of generality, we let in this paper. The differential equations in (7) govern the state transition of the neurons. After the neurons settle down at an equilibria, the solution is obtained by measuring the neuron outputs at the equilibrium point. The purpose of (7a) is to seek for a state with the minimum objective value, while (7b) aims to constrain its outputs into the feasible region. With (7), the network will settle down at a stable state if several mild conditions are satisfied [18, 21, 22]. It is worth noting that both and should be differentiable. Otherwise, the dynamics cannot be defined. Thus, due to the -norm term in (3), it cannot be directly solved by LPNN.
Ii-C Locally Competitive Algorithm
Due to the non-differentiable term , LCA introduces an internal state variable and defines an element-wise relationship between and which is given by
Where and denote the i-th element of and respectively. The shapes of function (8) with , and are shown in Fig. 1. It is observed that the is a threshold of function (8). From (8), we can further deduce that
where denotes the sub-differential of the -norm term which is equal to its gradient () at the differentiable points and equal to a set at zero. Thus, LCA defines its dynamics on as
It should be noticed that if the dynamics are defined as , we need to implement which is equal to a set at the zero point. Therefore, LCA uses rather than . The properties of LCA has been analysed in [14, 13, 23]. Besides, LCA cannot directly handle the constrained problem in (3).
Ii-D LPNN-LCA Framework
LPNN is a framework for solving the general nonlinear constrained optimization problems. LCA can effectively handle the non-differentiable term in dynamics. But neither LPNN nor LCA can solve the problem in (3) alone. Hence, in our previous study, we have combined these two methods and devised the LPNN-LCA framework which can be used to solve the problem in (3) .
For this method, we first construct the Lagrangian of (3) as
Then, follows the concept of LPNN in (7), we define its dynamics:
To handle the non-differential part, the concept of LCA is utilized and its dynamics can be further modified as
The relationship between and is given by (8). However, according to our preliminary simulation result, this system may not be stable. Hence we further introduce an augmented term into its Lagrangian, then
The augmented term does not affect the objective value at an equilibrium point , because . And it can further improve the stability of the system. Thus, the dynamics can be rewritten as
After the dynamics in (15) settle down, the solution of problem (3) can be obtained at an equilibrium point. Even the augmented term is used, we can only prove the local asymptotic convergence of this system.
Ii-E Projection Neural Network
where is a projection operator defined by
The dynamic of projection neural network is defined as follows:
where , and is a positive scaling constant. This neural network can be used to solve the VI problem, by solving its relevant nonlinear projection formulation in (17). Projection neural network is a kind of recurrent neural network.
Iii Development of Proposed Algorithm
Iii-a The Improved LPNN-LCA Framework
In , we have proved that the equilibrium point of the dynamics (15) is equivalent to the optimal solution of problem (3), and the system has local asymptotic convergence. However, its global convergence is hard to prove. In this section, we further modify the dynamics. After the modification, we hope that the equilibrium point of the new dynamics is still equivalent to the minimal solution of problem (3) and the global convergence of the proposed neural network can be proved.
According to the description in Section II, we see that the dynamic of the projection neural network is constructed by two parts: the projection operation term and a non-projected term. It is worth noting that the operation in LPNN-LCA framework can also be seen as an projection operation. Because according to the threshold function in (8), if we let , the element-wise relationship between and is
Apparently, the operation in (19) can be seen as a projection operation , which can project into a box set . Inspired by the dynamic of the projection neural network, we also modify the dynamics in (13) with the similar trick, after that the dynamics can be expressed as
For the convenience of description, we respectively define that , . While is the original function given by (13b), it includes a non-projected term. Refer to the dynamic of the projection neural network in , we also introduce a projection operation into it, thus we can get (20b). The matrix before the additional term is needed for resizing its dimensions. With the dynamics in (20), the parameters are updated with following steps:
where is the step size ().
Iii-B Properties Analysis
The equilibrium points of the dynamics in (20) are with respect to the internal state variable , while the optimal solution of original problem (3) is in regard to the decision variable . Hence, we need Theorem to show that the equilibrium point of the dynamics in (20) is identical to the optimal solution of problem (3).
Theorem 1: Let denotes an equilibrium point of (20). At the equilibrium point, the KKT conditions of (3) are satisfied. Since the optimization problem given in (3) is convex, and the regularity condition is satisfied. Thus the equilibrium point of (20) is equivalent to the optimal solution of the problem in (3).
Proof: The KKT conditions of problem (3) are
According to the definition of the equilibrium point, we have
Where , . By the concept of LCA, we know that , thus from (23a) we can deduce thatï¼
(22b) is satisfied. So the equilibrium point of (20) satisfies the KKT conditions of problem (3). Besides, we know the problem in (3) is convex and the regularity condition is satisfied. Hence, the KKT conditions in (22) are sufficient and necessary. Moreover, we can say that the equilibrium point of (20) is a global optimal solution of problem (3).
According to the analysis of LPNN given in , we know that (13a) is used for seeking the minimum objective value, while (13b) aims at making the variables fall into the feasible region. After we modify the dynamics, these features are still valid. However, comparing with dynamics in (13), (20) accelerates its convergence to the optimal point, and it does not introduce any augmented term. After modification the complexity of the dynamics is still equal to . We know that the circuit complexity of analog neural network depends on the time derivative calculations. Hence, the proposed improved LPNN-LCA framework basically does not increase the circuit complexity.
Iv Global Stability Analysis
Lemma 1: is a closed and convex set, denotes the projection operator to . For any and ,
The proof is shown in .
For the proposed neural network, is a projection operation to a closed and convex set. Thus, if we let and , the inequality (26) can be rewritten as
Then let and , based on the inequality in (26), we can deduce that
Lyapunov stability theory: Let be an equilibrium point of the system , Let be a continuously differentiable function such that
then the system is globally asymptotically stable in the sense of Lyapunov.
With Lemma 1 and the Lyapunov stability theory, we can deduce the following theorem.
Proof: To prove the global convergence, first, we need to devise a valid Lyapunov function. Here we use the standard form given by
where , and is an equilibrium point of dynamics in (20). According to Theorem 1, we know that is also the optimal solution of problem (3). Obviously, the function is continuous, differentiable and radially unbounded. When , we have . Otherwise, , for any . Next, we prove , for any . The derivative of is given by
Apparently, from (29) we know that, for any , the function in (IV) are less than . So for any . Thus we can say that the proposed neural network is globally asymptotically stable in the sense of Lyapunov. If , we have . Similar with the proof in , we see the system in (20) is globally convergent to an equilibrium point.
Similar with the dynamics in (15), we can also introduce an augment term into its Lagrangian, then we have
V Simulation and Experimental Result
In this section, we conduct several experiments to evaluate the performance of the proposed algorithm. In the following experiments, the length of the signal are or . When , the number of non-zero elements . And their position are randomly chosen with uniform distribution. Their corresponding values are randomly equal to or . While for the case , the number of non-zero elements . The measurement matrix is a random matrix (). And each column of it is normalized to have a unit -norm.
For better observing the performance of the proposed algorithm, several -norm based sparse approximation algorithms are implemented. They are L1Magic package , SPGL1 package , homotopy method , the improved PNN-based sparse reconstruction (IPNNSR) method  and the LPNN-LCA method given by (15). The L1Magic is a toolbox which can be used to solve the BP problem with the primal-dual interior point method. The SPGL1 package is also a common way for solving the BP problem, it is mainly based on the spectral projected gradient (SPG) method which uses the gradient vector as a search direction and chooses its step size according to the spectrum of the underlying local Hessian. Homotopy is a kind of greedy iterative algorithm. IPNNSR and LPNN-LCA are both analog neural networks. The first one is based on the projection neural network, and its global convergence has been proved . While the second one is our previous work which is based on LPNN and LCA, and it only has local asymptotic convergence.
In Section IV, we theoretically prove the global convergence of the proposed method. Then, we use several typical examples to show its convergence in practice. For both Fig. 2 and Fig. 3, the first three columns are the dynamics of the estimated parameters , , and and the corresponding recovered signals are shown in the last column. From Fig. 2, it is observed that when , the dynamics can settle down within around 60-80 iterations. While for in Fig. 3, the dynamics can settle down within 100 to 120 iterations. From these figures, we know that the whole system is stable and can converge to their optimal solutions.
V-B Convergence time
In this experiment, we test the speed of the proposed method. First let , , and . The results are shown in Fig. 4. Where the y-axis denotes the average relative error of ten trials, and the x-axis is the CPU time (unit: sec). While Fig. 5 depicts the results when , , and . From Fig. 4 and Fig. 5, it is observed that, when , the greedy method homotopy is better than others. And compared with other two analog neural networks, the IPNNSR needs less convergence time to achieve a satisfied result. When , the performance of our proposed algorithm is superior. Comparing with others, the proposed algorithm has relatively stable convergence time.
Besides, it is worth noting that these three analog neural networks can be implemented with hardware circuit, hence their speed can be further improved. The computational complexity of IPNNSR’s dynamics is , while, for the proposed algorithm and LPNN-LCA, their corresponding value is . Hence, compared with the improved LPNN-LCA and the original LPNN-LCA, the increase of has a greater impact on IPNNSR. Comparing with LPNN-LCA, the improved method has better stability. And its step size in the experiments. However, the step size of the original LPNN-LCA can only be 0.1 under the same situation, otherwise the system may not converge. So the convergence time of the proposed method is less than the original LPNN-LCA framework.
V-C MSE performance
In this experiment, we fix the length of the signal and the number of its nonzero elements. The MSE performances of the proposed method are investigated with different . We repeat the experiment times with different measurement matrices, initial states and sparse signals. The results are given in Fig. 6. Where the x-axis denotes the number of elements in observation vector , the y-axis is the mean square error (MSE) between the recovered signal and the original one. It can be seen from the figures that as increases, the MSEs of all algorithms decrease. In the same situation, the SPGL1 package and the homotopy method have larger MSEs. For other four approaches, their MSE performance is quite similar with each other.
Even the proposed method is devised for solving the BP problem, it still works under low level Gaussian noise. The corresponding simulation results are given in Fig.7 and Fig.8. Their observations are generated by
This paper proposes a novel sparse approximation algorithm by combining LPNN, LCA and projection theorem. The main target of the proposed algorithm is to calculate a real-time solution of the large-scale problem. For this purpose, this paper uses LPNN framework and introduces LCA to set up an approximate differentiable expression for the sub-differential at the non-differentiable point. Referring to the dynamic of projection neural network, we further modify its dynamics. Thus the equilibrium point of the improved dynamics is equivalent to an optimal solution, and the proposed algorithm is globally stable in the sense of Lyapunov. Besides, according to the simulation results, the MSE performance of the proposed algorithm is similar with several state-of-the-art methods and for large-scale problem it has obvious advantage in speed.
In future work we will try to modify the dynamics of LPNN-LCA method for solving BPDN problem and prove its global convergence. Besides, the modified dynamics may be helpful to improve the performance of the approximate -norm relevant methods.
-  J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust face recognition via sparse representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
-  F. Bach, J. Mairal, J. Ponce, and G. Sapiro, “Sparse coding and dictionary learning for image analysis,” in Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, 2010.
-  K. Huang and S. Aviyente, “Sparse representation for signal classification,” in Advances in Neural Information Processing Systems, 2006, pp. 609–616.
-  Y. Li, A. Cichocki, and S.-i. Amari, “Analysis of sparse representation and blind source separation,” Neural Computation, vol. 16, no. 6, pp. 1193–1234, 2004.
-  J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. S. Huang, and S. Yan, “Sparse representation for computer vision and pattern recognition,” Proceedings of the IEEE, vol. 98, no. 6, pp. 1031–1044, 2010.
-  D. L. Donoho, “For most large underdetermined systems of linear equations the minimal -norm solution is also the sparsest solution,” Communications on Pure and Applied Mathematics, vol. 59, no. 6, pp. 797–829, 2006.
-  E. Candes and J. Romberg, “l1-magic: Recovery of sparse signals via convex programming,” URL: www. acm. caltech. edu/l1magic/downloads/l1magic. pdf, vol. 4, p. 46, 2005.
-  E. van den Berg and M. P. Friedlander, “Probing the Pareto frontier for basis pursuit solutions,” SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, 2008.
-  E. van den Berg and M. P. Friedlander, “SPGL1: A solver for large-scale sparse reconstruction,” June 2007, [Online]. Available: http://www.cs.ubc.ca/labs/scl/spgl1.
-  S. G. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, 1993.
-  R. D. Nowak, S. J. Wright et al., “Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems,” IEEE Journal of Selected Topics in Signal Processing, vol. 1, no. 4, pp. 586–597, 2007.
-  E. Candes and T. Tao, “The dantzig selector: Statistical estimation when p is much larger than n,” The Annals of Statistics, pp. 2313–2351, 2007.
-  A. Balavoine, J. Romberg, and C. J. Rozell, “Convergence and rate analysis of neural networks for sparse approximation,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1377–1389, 2012.
-  C. J. Rozell, D. H. Johnson, R. G. Baraniuk, and B. A. Olshausen, “Sparse coding via thresholding and local competition in neural circuits,” Neural Computation, vol. 20, no. 10, pp. 2526–2563, 2008.
-  A. Cochocki and R. Unbehauen, Neural networks for optimization and signal processing. John Wiley and Sons, Inc., 1993.
-  J. J. Hopfield, “Neural networks and physical systems with emergent collective computational abilities,” Proceedings of the National Academy of Sciences, vol. 79, no. 8, pp. 2554–2558, 1982.
-  L. Chua and G.-N. Lin, “Nonlinear programming without computation,” IEEE Transactions on Circuits and Systems, vol. 31, no. 2, pp. 182–188, 1984.
-  S. Zhang and A. Constantinides, “Lagrange programming neural networks,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 39, no. 7, pp. 441–452, 1992.
-  Y. Xia, H. Leung, and J. Wang, “A projection neural network and its application to constrained optimization problems,” IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 49, no. 4, pp. 447–458, 2002.
-  Y. Xia and J. Wang, “A general projection neural network for solving monotone variational inequalities and related optimization problems,” IEEE Transactions on Neural Networks, vol. 15, no. 2, pp. 318–328, 2004.
-  J. Liang, H. C. So, C. S. Leung, J. Li, and A. Farina, “Waveform design with unit modulus and spectral shape constraints via Lagrange programming neural network,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 8, pp. 1377–1386, 2015.
-  J. Liang, C. S. Leung, and H. C. So, “Lagrange programming neural network approach for target localization in distributed MIMO radar,” IEEE Trans. Signal Process., vol. 64, no. 6, pp. 1574–1585, Mar. 2016.
-  A. Balavoine, C. J. Rozell, and J. Romberg, “Global convergence of the locally competitive algorithm,” in Digital Signal Processing Workshop and IEEE Signal Processing Education Workshop (DSP/SPE). IEEE, 2011, pp. 431–436.
-  R. Feng, C.-S. Leung, A. G. Constantinides, and W.-J. Zeng, “Lagrange programming neural network for nondifferentiable optimization problems in sparse approximation,” IEEE transactions on neural networks and learning systems, 2017.
-  D. Kinderlehrer and G. Stampacchia, An introduction to variational inequalities and their applications. Siam, 1980, vol. 31.
-  Q. Liu and J. Wang, “-minimization algorithms for sparse signal reconstruction based on a projection neural network,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 3, pp. 698–707, 2016.
-  D. M. Malioutov, M. Cetin, and A. S. Willsky, “Homotopy continuation for sparse signal representation,” in Acoustics, Speech, and Signal Processing, 2005. Proceedings.(ICASSP’05). IEEE International Conference on, vol. 5. IEEE, 2005, pp. v–733.
Hao Wang is currently pursuing Ph.D. degree from the Department of Electronic Engineering, City University of Hong Kong. His current research interests include neural networks and machine learning.
Ruibin Feng received the PhD. degree in electronic engineering from the City University of Hong Kong in 2017. His current research interests include Neural Networks and Machine Learning.
Chi-Sing Leung received the Ph.D. degree from the Chinese University of Hong Kong, Hong Kong, in 1995.He is currently a Professor with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong. He has authored over 120 journal papers in the areas of digital signal processing, neural networks, and computer graphics. His current research interests include neural computing and computer graphics. Dr. Leung was a member of the Organizing Committee of ICONIP2006. He received the 2005 IEEE Transactions on Multimedia Prize Paper Award for his paper titled The Plenoptic Illumination Function in 2005. He was the Program Chair of ICONIP2009 and ICONIP2012. He is/was the Guest Editor of several journals, including Neural Computing and Applications, Neurocomputing, and Neural Processing Letters. He is a Governing Board Member of the Asian Pacific Neural Network Assembly (APNNA) and the Vice President of APNNA.