Robust Decoding from 1-Bit Compressive Sampling with Least Squares


Jian Huang, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong 999077, P.R. China (j.huang@polyu.edu.hk)
Yuling Jiao, School of Statistics and Mathematics, Zhongnan University of Economics and Law, Wuhan 430063, P.R. China (yulingjiaomath@whu.edu.cn)
Xiliang Lu, School of Mathematics and Statistics, and Hubei Key Laboratory of Computational Science, Wuhan University, Wuhan 430072, P.R. China (xllv.math@whu.edu.cn)
Liping Zhu, Institute of Statistics and Big Data and Center for Applied Statistics, Renmin University of China, Beijing 100872, P.R. China (zhu.liping@ruc.edu.cn)
Abstract

In 1-bit compressive sensing (1-bit CS), where the target signal is coded into a binary measurement, one goal is to recover the signal from noisy and quantized samples. Mathematically, the 1-bit CS model reads: , where , , is the random error before quantization, and is a random vector modeling the sign flips. Due to the presence of nonlinearity, noise and sign flips, it is quite challenging to decode from 1-bit measurements. In this paper, we consider the least squares approach in both the over-determined and the under-determined settings. For , we show that, up to a constant , with high probability, the least squares solution approximates with precision as long as . For , we prove that, up to a constant , with high probability, the -regularized least squares solution lies in the ball with center and radius , provided that and . We introduce a Newton-type method, the so-called primal dual active set (PDAS) algorithm, to solve the nonsmooth optimization problem. The PDAS possesses the property of one-step convergence. It only requires solving a small least squares problem on the active set, and is therefore extremely efficient for recovering sparse signals when combined with continuation. We propose a novel regularization parameter selection rule which does not introduce any extra computational overhead. Extensive numerical experiments are presented to illustrate the robustness of our proposed model and the efficiency of our algorithm.
Keywords: 1-bit compressive sensing, -regularized least squares, primal dual active set algorithm, one-step convergence, continuation

1 Introduction

Compressive sensing (CS) is an important approach to acquiring low-dimensional signals from noisy under-determined measurements [8, 16, 19, 20]. For storage and transmission, the infinite-precision measurements are often quantized. [6] considered recovering signals in the 1-bit compressive sensing (1-bit CS) setting, where measurements are coded into a single bit, i.e., their signs. The 1-bit CS is superior to the standard CS in terms of inexpensive hardware implementation and storage. However, it is much more challenging to decode from nonlinear, noisy and sign-flipped 1-bit measurements.

1.1 Previous work

Since the seminal work of [6], much effort has been devoted to studying the theoretical and computational challenges of the 1-bit CS. Sample complexity has been analyzed for support and vector recovery with and without noise [21, 28, 40, 23, 29, 22, 23, 41, 50]. Existing works indicate that is adequate for both support and vector recovery. The sample size required here has the same order as that required in the standard CS setting. These results have also been refined by adaptive sampling [22, 14, 4]. Extensions include recovering the norm of the target [32, 3] and non-Gaussian measurement settings [1]. Many first order methods [6, 34, 49, 14] and greedy methods [35, 5, 29] have been developed to minimize the sparsity-promoting nonconvex objective function arising from either the unit sphere constraint or the nonconvex regularizers. To address the nonconvex optimization problem, convex relaxation models have also been proposed [50, 41, 40, 51, 42], which often yield accurate solutions efficiently with polynomial-time solvers. See, for example, [38].

1.2 1-bit CS setting

In this paper we consider the following 1-bit CS model

(1.1)

where is the 1-bit measurement, is an unknown signal, is a random matrix, is a random vector modeling the sign flips of , and is a random vector with independent and identically distributed (iid) entries modeling errors before quantization. Throughout, operates componentwise with if and otherwise, and is the pointwise Hadamard product. Following [40], we assume that the rows of are iid random vectors sampled from the multivariate normal distribution with an unknown covariance matrix , is distributed as with an unknown noise level , and has independent coordinates satisfying . We assume and are mutually independent. Because is known, model (LABEL:setup) is invariant in the sense that , . This indicates that the best one can hope for is to recover up to a scale factor. Without loss of generality we assume .
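To make the sampling model concrete, the following NumPy sketch simulates data of the form described above. It is an illustration only; the variable names (Psi, x_star, Sigma, sigma, q) are ours, and the sign convention sign(t) = 1 for t >= 0 follows the componentwise definition used here.

import numpy as np

def generate_1bit_cs_data(n, p, x_star, Sigma, sigma, q, rng=None):
    """Simulate 1-bit CS data of the form y = eta * sign(Psi x_star + eps).

    Psi has iid N(0, Sigma) rows, eps ~ N(0, sigma^2 I), and eta flips each
    sign independently with probability q (eta_i = -1 with probability q).
    """
    rng = np.random.default_rng(rng)
    L = np.linalg.cholesky(Sigma)
    Psi = rng.standard_normal((n, p)) @ L.T           # rows ~ N(0, Sigma)
    eps = sigma * rng.standard_normal(n)              # noise before quantization
    eta = np.where(rng.random(n) < q, -1.0, 1.0)      # random sign flips
    y = eta * np.where(Psi @ x_star + eps >= 0, 1.0, -1.0)
    return y, Psi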

1.3 Contributions

We study the 1-bit CS problem in both the overdetermined setting with and the underdetermined setting with . In the former setting we allow for dense , while in the latter we assume that is sparse in the sense that . The basic message is that we can recover with the ordinary least squares or the regularized least squares.

(1) When , we propose to use the least squares solution

to approximate . We show that, with high probability, estimates accurately up to a positive scale factor defined by (LABEL:constc) in the sense that, , if . We make the following observation:

  • Up to a constant , the underlying target can be decoded from 1-bit measurements with the ordinary least squares, as long as the probability of sign flips is not equal to .

(2) When and the target signal is sparse, we consider the -regularized least squares solution

\hb@xt@.01(1.2)

The sparsity assumption is widely used in modern signal processing [20, 36]. We show that, with high probability, the error can be bounded by a prefixed accuracy if , which is the same order as that required for standard CS methods to work. Furthermore, the support of can be exactly recovered if the minimum signal magnitude of is larger than . When the target signal is sparse, we obtain the following conclusion:

  • Up to a constant , the sparse signal can also be decoded from 1-bit measurements with the -regularized least squares, as long as the probability of sign flips is not equal to .

(3) We introduce a fast and accurate Newton method, the so-called primal dual active set method (PDAS), to solve the -regularized minimization (LABEL:subreg). The PDAS possesses the property of one-step convergence. The PDAS only solves a small least squares problem on the active set, and is thus extremely efficient when combined with continuation. We propose a novel regularization parameter selection rule, which is incorporated into the continuation procedure without additional cost. The code is available at http://faculty.zuel.edu.cn/tjyjxxy/jyl/list.htm.

  • The optimal solution can be computed efficiently and accurately with the PDAS, a Newton type method which converges after one iteration, even if the objective function (LABEL:subreg) is nonsmooth. Continuation on globalizes the PDAS. The regularization parameter can be automatically determined without additional computational cost.

1.4 Notation and organization

Throughout we denote by and the th column and th row of , respectively. We denote a vector of by 0, whose length may vary in different places. We use to denote the set , and to denote the identity matrix of size . For with length , , and denotes a submatrix of whose rows and columns are listed in and , respectively. We use to denote the th entry of the vector , and to denote the minimum absolute value of . We use to denote the multivariate normal distribution, with symmetric and positive definite. Let and be the largest and the smallest eigenvalues of , respectively, and be the condition number of . We use to denote the elliptic norm of with respect to , i.e., . Let be the -norm of . We denote the number of nonzero elements of by and let . The symbols and stand for the operator norm of induced by norm and the maximum pointwise absolute value of , respectively. We use , , to denote the expectation, the conditional expectation and the probability on a given probability space . We use and to denote generic constants which may vary from place to place. By and , we ignore some positive numerical constant and , respectively.

The rest of the paper is organized as follows. In Section 2 we explain why the least squares works in the 1-bit CS when , and obtain a nonasymptotic error bound for In Section 3 we consider the sparse decoding when and prove a minimax bound on . In Section 4 we introduce the PDAS algorithm to solve (LABEL:subreg). We propose a new regularization parameter selection rule combined with the continuation procedure. In Section 5 we conduct simulation studies and compare with existing 1-bit CS methods. We conclude with some remarks in Section 6.

2 Least squares when

In this section, we first explain why the least squares works in the over-determined 1-bit CS model (LABEL:setup) with . We then prove a nonasymptotic error bound on . The following lemma, inspired by [7], is our starting point.

Lemma 2.1

Let be the 1-bit model (LABEL:setup) at the population level. , , It follows that,

(2.1)

where

(2.2)

Proof. The proof is given in Appendix LABEL:app:parallel.     

Lemma LABEL:parallel shows that the target is proportional to . Note that

(2.3)
(2.4)

As long as is invertible, it is reasonable to expect that

can approximate well up to a constant even if consists of sign flips.
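For illustration, here is a minimal NumPy sketch of the ordinary least squares decoder under the sampling model above (variable names are ours; since the target is only identifiable up to a positive scale factor, one would typically compare the normalized solution with the unit-norm target):

import numpy as np

def ls_decode(Psi, y):
    """Ordinary least squares decoder for the overdetermined case (n >= p)."""
    x_ls, *_ = np.linalg.lstsq(Psi, y, rcond=None)    # argmin_x ||y - Psi x||_2
    x_unit = x_ls / np.linalg.norm(x_ls)              # compare up to scale
    return x_ls, x_unit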

Theorem 2.2

Consider the ordinary least squares:

(2.5)

If , then with probability at least ,

(2.6)

where and are some generic constants not depending on or .

Proof. The proof is given in Appendix LABEL:app:errls.     

Remark 2.1

Theorem LABEL:errls shows that, if , up to a constant, the simple least squares solution can approximate with error of order even if contains very large quantization errors and sign flips with probability unequal to .

Remark 2.2

To the best of our knowledge, this is the first nonasymptotic error bound for the 1-bit CS in the overdetermined setting. Compared with the estimation error of the standard least squares in the complete data model , the error bound in Theorem LABEL:errls is optimal up to a logarithmic factor , which is due to the loss of information caused by the 1-bit quantization.

3 Sparse decoding with -regularized least squares

3.1 Nonasymptotic error bound

Since images and signals are often sparsely represented under certain transforms [36, 15], it suffices for the standard CS to recover the sparse signal with measurements for . In this section we show that in the 1-bit CS setting, similar results can be derived through the -regularized least squares (LABEL:subreg). Model (LABEL:subreg) has been extensively studied when is continuous [44, 9, 8, 16]. Here we use model (LABEL:subreg) to recover from quantized , which is rarely studied in the literature.

Next we show that, up to the constant , is a good estimate of when even if the signal is highly noisy and corrupted by sign flips in the 1-bit CS setting.

Theorem 3.1

Assume , . Set . Then with probability at least , we have,

(3.1)

Proof. The proof is given in Appendix LABEL:app:errsub.     

Remark 3.1

Theorem LABEL:errsub shows that, if , up to a constant , the -regularized least squares solution can approximate with precision .

Remark 3.2

The error bound in Theorem LABEL:errsub achieves the minimax optimal order in the sense that it is the optimal order that can be attained even if the signal is measured precisely without 1-bit quantization [37]. From Theorem LABEL:errsub, if the minimum nonzero magnitude of is large enough, i.e., , the support of coincides with that of .

3.2 Comparison with related works

Assuming , , and , [6] proposed to decode with

A first order algorithm was devised to solve the following Lagrangian version [34], i.e.,

In the presence of noise, [29] introduced

(3.2)

where or . They used a projected sub-gradient method, the so-called binary iterative hard thresholding (BIHT), to solve (LABEL:orrobust). Assuming that there are sign flips in the noiseless model with , [14] considered

(3.3)

where are tuning parameters. An adaptive outlier pursuit (AOP) generalization of (LABEL:orrobust) was proposed in [49] to recover and simultaneously detect the entries with sign flips by

where is the number of sign flips. Alternating minimization on and is adopted to solve the optimization problem. [24] considered the pinball loss to deal with both the noise and the sign flips, which reads

where . Similar to the BIHT, the pinball iterative hard thresholding is utilized. With the binary stable embedding, [29] and [14] proved that, with high probability, the sample complexity of (LABEL:orrobust) and (LABEL:prox) needed to guarantee an estimation error smaller than is , which echoes Theorem LABEL:errsub. However, there are no theoretical results for the other models mentioned above. The aforementioned models and algorithms represent the state of the art in 1-bit CS. However, all of them are nonconvex. It is thus hard to verify whether the corresponding algorithms are faithful to their models.

Another line of research concerns convexification. The pioneering work is [40], where they considered the noiseless case without sign flips and proposed the following linear programming method

As shown in [40], the estimation error is . This result was improved to in [41], where both the noise and the sign flips are allowed, by considering the convex problem

(3.4)

Compared with our result in Theorem LABEL:errsub, the results derived in [40] and [41] are suboptimal.

In the noiseless case and assuming , [50] considered the Lagrangian version of (LABEL:convex1)

(3.5)

In this special case, the estimation error derived in [50] improved that of [41] and matched our results in Theorem LABEL:errsub. However, [50] did not discuss the scenario of . Recently, [42, 47] proposed a simple projected linear estimator , where , to estimate a target with low-dimensional structure belonging to in high dimensions from noisy and possibly nonlinear observations. They derived the same order of estimation error as that in Theorem LABEL:errsub.

[51] proposed an regularized maximum likelihood estimate, and [24] introduced a convex model by replacing the linear loss in (LABEL:convex2) with the pinball loss. However, neither studied sample complexity or estimation error.

4 Primal dual active set algorithm

Existing algorithms for (LABEL:subreg) are mainly first order methods [45, 2, 12]. In this section we use the primal dual active set method [18, 30], which is a generalized Newton type method [27, 43], to solve (LABEL:subreg). An important advantage of the PDAS is that it converges after one iteration if the initial value is good enough. We globalize it with continuation on the regularization parameter. We also propose a novel regularization parameter selection rule which is incorporated into the continuation procedure without any additional computational burden.

4.1 PDAS

In this section we use to denote for simplicity. We begin with the following result [13]. Let be the minimizer of (LABEL:subreg); then satisfies

(4.1)

Conversely, if and satisfy (LABEL:kkteq), then is a global minimizer of (LABEL:subreg), where is the pointwise soft-thresholding operator defined by

Let , where . By (LABEL:kkteq), finding the minimizer of (LABEL:subreg) is equivalent to finding a root of . We use the following primal dual active set (PDAS) method [18, 30] to find the root of .

1:  Input , initial guess , maximum number of iterations MaxIter. Let .
2:  for  do
3:     Compute the active and inactive sets and respectively by
4:     Update and by
5:     If , stop.
6:  end for
7:  Output .
Algorithm 1 PDAS:
Remark 4.1

We can stop when is greater than a user-predefined MaxIter. Since the PDAS converges after one iteration, a desirable property stated in Theorem LABEL:thm:cov, we set .
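To illustrate one iteration of Algorithm LABEL:alg:genew, here is a NumPy sketch. It assumes an ell_1 penalty with the scaling min_x 0.5*||y - Psi x||^2 + lambda*||x||_1 and dual variable d = Psi^T (y - Psi x), so that the KKT system reads x = S_lambda(x + d) with S_lambda the componentwise soft-thresholding operator; the paper's exact normalization may differ, and all names are ours.

import numpy as np

def pdas_step(Psi, y, lam, x, d):
    """One PDAS iteration for min_x 0.5*||y - Psi x||^2 + lam*||x||_1.

    Active set: A = {i : |x_i + d_i| > lam}. On A the dual is fixed to
    lam*sign(x_i + d_i) and the primal solves a small least squares system;
    on the complement the primal is zero and the dual is the residual
    correlation. Requires Psi[:, A] to have full column rank.
    """
    z = x + d
    A = np.abs(z) > lam
    x_new = np.zeros_like(x)
    d_new = np.zeros_like(d)
    d_new[A] = lam * np.sign(z[A])
    PsiA = Psi[:, A]
    x_new[A] = np.linalg.solve(PsiA.T @ PsiA, PsiA.T @ y - d_new[A])
    d_new[~A] = Psi[:, ~A].T @ (y - PsiA @ x_new[A])
    return x_new, d_new, A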

The PDAS is actually a generalized Newton method for finding roots of nonsmooth equations [27, 43], since the iteration in Algorithm LABEL:alg:genew can be equivalently reformulated as

(4.2)
(4.3)

where

(4.4)

We prove this equivalence in Appendix LABEL:app:shownewton for completeness.

Local superlinear convergence has been established for generalized Newton methods for nonsmooth equations [27, 43]. The PDAS requires only one iteration to converge. We state the result, which is proved in [18], here for completeness.

Theorem 4.1

Let and satisfy (LABEL:kkteq). Denote and . Let and be the initial input of Algorithm LABEL:alg:genew. If the columns of are full rank and the initial input satisfies , then , where is updated from after one iteration.

4.2 Globalization and automatic regularization parameter selection

To apply the PDAS (Algorithm LABEL:alg:genew) to (LABEL:subreg), we need an initial guess and a properly specified regularization parameter in . In this section, we address these two issues together via continuation. Since the PDAS is a Newton type algorithm with a fast local convergence rate and is a piecewise linear function of [39], we adopt a continuation strategy to fully exploit the fast local convergence. In particular, this is a simple way to globalize the convergence of the PDAS [18]. Observing that satisfies (LABEL:kkteq) if , we define with for . We run Algorithm LABEL:alg:genew on the sequence with warm starts, i.e., using the solution as the initial guess for the -problem. When the whole continuation is done, we obtain a solution path of (LABEL:subreg). For simplicity, we refer to the PDAS with continuation as the PDASC, described in Algorithm LABEL:alg:hotgenew.

1:  Input, , , , , MaxGrid, MaxIter.
2:  for  do
3:     Run algorithm LABEL:alg:genew with , initialized with .
4:     Check the stopping criterion.
5:  end for
6:  Output, .
Algorithm 2 PDASC:

The regularization parameter in the -regularized 1-bit CS model (LABEL:subreg), which balances the tradeoff between data fidelity and the sparsity level of the solution, is important for both theoretical analysis and practical computation. However, well-known regularization parameter selection rules such as the discrepancy principle [17, 25], the balancing principle [10, 31, 11, 26] or the Bayesian information criterion [18, 33] are not applicable to the 1-bit CS problem considered here, since the model errors are not available directly. Here we propose a novel rule to select the regularization parameter automatically. We run the PDASC to yield a solution path until for the smallest possible . Let be the set of regularization parameters at which the output of the PDAS has nonzero elements. We determine by voting, i.e.,

(4.5)

Therefore, our parameter selection rule is seamlessly integrated with the continuation strategy which serves as a globalization technique without any extra computational overhead.
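As an illustration of how continuation and the voting rule fit together, the sketch below reuses pdas_step from above and runs it along a geometrically decreasing grid with warm starts, stopping once the active set reaches a prescribed size K; it then returns the solution at the support size that occurs most frequently along the path. This is one plausible reading of the voting rule; the grid lambda_t = lambda_0 * rho^t, rho, and K are our illustrative choices.

import numpy as np
from collections import Counter

def pdasc_with_voting(Psi, y, K, rho=0.9, max_grid=200):
    """PDAS with continuation (PDASC) plus a voting-style choice of lambda."""
    p = Psi.shape[1]
    lam = np.max(np.abs(Psi.T @ y))       # x = 0, d = Psi^T y satisfy the KKT here
    x, d = np.zeros(p), Psi.T @ y
    path = []                             # (support size, lambda, solution) per grid point
    for _ in range(max_grid):
        lam *= rho
        x, d, A = pdas_step(Psi, y, lam, x, d)   # warm start from the previous grid point
        path.append((int(A.sum()), lam, x.copy()))
        if A.sum() >= K:
            break
    votes = Counter(size for size, _, _ in path)
    best_size = votes.most_common(1)[0][0]       # most frequent support size on the path
    for size, lam_t, x_t in reversed(path):      # take the smallest lambda with that size
        if size == best_size:
            return x_t, lam_t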

Here we give an example to show the accuracy of our proposed regularization parameter selection rule (LABEL:autoreg) with data . Descriptions of the data can be found in Section LABEL:num. The left panel of Fig. LABEL:fig:aspath shows the size of the active set along the PDASC path, and the right panel shows the underlying true signal and the solution selected by (LABEL:autoreg).

Figure 4.1: Active set size on the path (left panel) on data and the underlying true signal and the solution selected by (LABEL:autoreg) (right panel).

5 Numerical simulation

In this section we showcase the performance of our proposed least squares decoders (LABEL:ls) and (LABEL:subreg). All the computations were performed on a four-core laptop with a 2.90 GHz CPU and 8 GB RAM using MATLAB 2015b. The MATLAB package 1-bitPDASC for reproducing all the numerical results can be found at http://faculty.zuel.edu.cn/tjyjxxy/jyl/list.htm.

5.1 Experiment setup

First we describe the data generation process and our parameter choices. In all numerical examples the underlying target signal with is given, and the observation is generated by , where the rows of are iid samples from with . We keep the convention . The elements of are generated from , and has independent coordinates with . For short, we use to denote the data generated as above. We fix in our proposed PDASC algorithm and use (LABEL:autoreg) to determine the regularization parameter . All the simulation results are based on 100 independent replications.
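For reference, an end-to-end run that mimics the setup described above can be pieced together from the earlier sketches; the concrete parameter values below (n, p, sparsity, Sigma, sigma, q, stopping size K) are our illustrative choices, not the paper's.

import numpy as np

rng = np.random.default_rng(0)
n, p, s = 500, 1000, 5
x_star = np.zeros(p)
support = rng.choice(p, s, replace=False)
x_star[support] = rng.standard_normal(s)
x_star /= np.linalg.norm(x_star)                     # unit-norm target

idx = np.arange(p)
Sigma = 0.3 ** np.abs(idx[:, None] - idx[None, :])   # correlated design
y, Psi = generate_1bit_cs_data(n, p, x_star, Sigma, sigma=0.1, q=0.05, rng=rng)

x_hat, lam_hat = pdasc_with_voting(Psi, y, K=50)
x_hat = x_hat / max(np.linalg.norm(x_hat), 1e-12)    # compare up to the positive scale factor
print("l2 error:", np.linalg.norm(x_hat - x_star))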

5.2 Accuracy and Robustness of when

Now we present numerical results to illustrate the accuracy of the least squares decoder and its robustness to noise and sign flips. Fig. LABEL:fig:lowsigma shows the recovery error on data set . The left panel of Fig. LABEL:fig:lowq shows the recovery error on data set and the right panel gives the recovery error on data . It is observed that the recovery error () of the least squares decoder is small (around 0.1) and robust to the noise level and the sign flip probability . This confirms the theoretical investigation in Theorem LABEL:errls, which states that the error is of order .

Figure 5.1: Recovery error vs. on
Figure 5.2: Recovery error vs. on (left panel) and on (right panel).

5.3 Support recovery of when

We conduct simulations to illustrate the performance of model (LABEL:subreg) and the PDASC algorithm. We report how the exact support recovery probability varies with the sparsity level , the noise level and the probability of sign flips. Fig. LABEL:fig:probsupp indicates that, as long as the sparsity level is not large, recovers the underlying true support with high probability even if the measurement contains noise and is corrupted by sign flips. This confirms the theoretical investigations in Theorem LABEL:errsub.

Figure 5.3: The exact support recovery probability vs. , and on data set (panel (a)), (panel (b)), (panel (c)) and (panel (d)).

5.4 Comparison with other state-of-the-art

Now we compare our proposed model (LABEL:subreg) and the PDASC algorithm with several state-of-the-art methods, including BIHT [28] (http://perso.uclouvain.be/laurent.jacques/index.php/Main/BIHTDemo), AOP [49] and PBAOP [24] (both available at http://www.esat.kuleuven.be/stadius/ADB/huang/downloads/1bitCSLab.zip), and linear projection (LP) [47, 42]. BIHT, AOP, LP and PBAOP all require specifying the true sparsity level . Both AOP and PBAOP also require specifying the sign flip probability . The PDASC requires neither the unknown sparsity level nor the probability of sign flips . We use , , , and , , , and , , . The average CPU time in seconds (Time (s)), the average error (-Err), and the probability of exactly recovering the true support (PrE ()) are reported in Table LABEL:tab:compother. The PDASC is comparatively very fast and the most accurate, even when the correlation , the noise level and the probability of sign flips are large.

   
    (a) (b) (c)
Method Time (s) -Err PrE Time (s) -Err PrE Time (s) -Err PrE
BIHT 1.42e-1 1.89e-1 92 1.31e-1 5.73e-1 19 1.32e-1 9.39e-1 0
AOP 2.72e-1 7.29e-2 100 3.55e-1 2.11e-1 92 3.58e-1 4.22e-1 44
LP 8.70e-3 4.19e-1 98 8.50e-3 4.22e-1 93 8.30e-3 4.81e-1 26
PBAOP 1.46e-1 9.08e-2 100 1.36e-1 2.05e-1 90 1.35e-1 4.53e-1 36
PDASC 4.11e-2 6.77e-2 100 4.38e-2 9.40e-2 99 4.56e-2 2.21e-1 71
   
    (a) (b) (c)
Method Time (s) -Err PrE Time (s) -Err PrE Time (s) -Err PrE
BIHT 4.17e-1 2.10e-1 84 4.25e-1 4.21e-1 25 4.35e-1 6.46e-1 0
AOP 1.09e-0 7.78e-2 100 1.10e-0 1.76e-1 95 1.16e-0 2.86e-1 59
LP 1.95e-2 4.54e-1 85 1.99e-2 4.49e-1 71 2.05e-2 5.03e-1 16
PBAOP 4.22e-1 1.00e-1 100 4.27e-1 1.58e-1 99 4.31e-1 2.99e-1 51
PDASC 1.23e-1 8.66e-2 100 1.27e-1 1.04e-1 98 1.30e-2 1.51e-1 78
   
    (a) (b) (c)
Method Time (s) -Err PrE Time (s) -Err PrE Time (s) -Err PrE
BIHT 2.56e+1 2.16e-1 58 2.58e+1 4.54e-1 0 2.58e+1 6.29e-1 0
AOP 6.44e+1 7.56e-2 100 6.46e+1 1.66e-1 96 6.47e+1 2.57e-1 16
LP 2.35e-1 4.47e-1 38 2.30e-1 4.46e-1 34 2.30e-1 4.47e-1 26
PBAOP 2.56e+1 9.89e-2 100 2.58e+1 1.66e-1 95 2.58e+1 2.60e-1 18
PDASC 7.09e-0 7.97e-2 100 7.17e-0 9.17e-2 99 7.23e-0 1.23e-1 86
Table 5.1: Comparison of the PDASC with state-of-the-art methods: CPU time in seconds (Time (s)), average error (-Err), and probability of exactly recovering the true support (PrE ()).

Now we compare the PDASC with the aforementioned competitors on recovering a one-dimensional signal. The true signal is sparse under the wavelet basis “Db1” [36]. Thus, the matrix is of size and consists of a random Gaussian matrix and the inverse of a one-level Haar wavelet transform. The target coefficient has nonzeros. We set , . The recovered results are shown in Fig. LABEL:fig:1d and Table LABEL:tab:1d. The reconstruction by the PDASC is visually more appealing than the others, as shown in Fig. LABEL:fig:1d. This is further confirmed by the PSNR value reported in Table LABEL:tab:1d, which is defined by , where is the maximum absolute value of the true signal, and MSE is the mean squared error of the reconstruction.
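For completeness, the PSNR used here can be computed as follows, consistent with the description above (V is the maximum absolute value of the true signal and MSE is the mean squared error of the reconstruction):

import numpy as np

def psnr(x_true, x_rec):
    """PSNR = 10 * log10(V^2 / MSE) with V = max |x_true|."""
    mse = np.mean((x_true - x_rec) ** 2)
    V = np.max(np.abs(x_true))
    return 10.0 * np.log10(V ** 2 / mse)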

Figure 5.4: Reconstruction of the one-dimensional signal with .
method CPU time (s) PSNR
BIHT 4.97 29
AOP 4.98 33
LP 0.11 33
PBAOP 4.93 31
PDASC 3.26 36
Table 5.2: The CPU time in seconds and the PSNR of one-dimensional signal recovery with .

6 Conclusions

In this paper we consider decoding from 1-bit measurements with noise and sign flips. For , we show that, up to a constant , with high probability, the least squares solution approximates with precision as long as . For , we assume that the underlying target is -sparse, and prove that, up to a constant , with high probability, the -regularized least squares solution lies in the ball with center and radius , provided that . We introduce the one-step convergent PDAS method to minimize the nonsmooth objective function. We propose a novel tuning parameter selection rule which is seamlessly integrated with the continuation strategy without any extra computational overhead. Numerical experiments are presented to illustrate the salient features of the model and the efficiency and accuracy of the algorithm.

There are several avenues for further study. First, many practitioners have observed that nonconvex sparse regularization often brings additional benefits in the standard CS setting. Whether the theoretical and computational results derived in this paper can still be justified when nonconvex regularizers are used deserves further consideration. The 1-bit CS is a kind of nonlinear sampling approach. Analysis of other nonlinear sampling methods is also of immense interest.

Acknowledgements

The research of Y. Jiao is supported by National Science Foundation of China (NSFC) No. 11501579 and National Science Foundation of Hubei Province No. 2016CFB486. The research of X. Lu is supported by NSFC Nos. 11471253 and 91630313, and the research of L. Zhu is supported by NSFC No. 11731011 and Chinese Ministry of Education Project of Key Research Institute of Humanities and Social Sciences at Universities No. 16JJD910002 and National Youth Top-notch Talent Support Program of China.

A Proof of Lemma LABEL:parallel

Proof. Let . Then due to and the assumption that