Subspace Learning From Bits


Yuejie Chi and Haoyu Fu. The authors are with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210. Email: {chi.97, fu.436}@osu.edu. Preliminary results have been presented in part at the 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP) [1] and the 2016 Asilomar Conference on Signals, Systems, and Computers [2]. This material is based upon work supported by the Air Force Summer Faculty Fellowship Program, by the National Science Foundation under award number ECCS-1462191, and by the Air Force Office of Scientific Research under award number FA9550-15-1-0205.
July 13, 2019
Abstract

Networked sensing, where the goal is to perform complex inference using a large number of inexpensive and decentralized sensors, has become an increasingly attractive research topic due to its applications in wireless sensor networks and internet-of-things. To reduce the communication, sensing and storage complexity, this paper proposes a simple sensing and estimation framework to faithfully recover the principal subspace of high-dimensional data streams using a collection of binary measurements from distributed sensors, without transmitting the whole data. The binary measurements are designed to indicate comparison outcomes of aggregated energy projections of the data samples over pairs of randomly selected directions. When the covariance matrix is a low-rank matrix, we propose a spectral estimator that recovers the principal subspace of the covariance matrix as the subspace spanned by the top eigenvectors of a properly designed surrogate matrix, which is provably accurate as soon as the number of binary measurements is sufficiently large. An adaptive rank selection strategy based on soft thresholding is also presented. Furthermore, we propose a tailored spectral estimator when the covariance matrix is additionally Toeplitz, and show reliable estimation can be obtained from a substantially smaller number of binary measurements. Our results hold even when a constant fraction of the binary measurements is randomly flipped. Finally, we develop a low-complexity online algorithm to track the principal subspace when new measurements arrive sequentially. Numerical examples are provided to validate the proposed approach.

Keywords: network sensing, principal subspace estimation, subspace tracking, binary sensing

I Introduction

Networked sensing, where the goal is to perform complex inference using data collected from a large number of inexpensive and decentralized sensors, has become an increasingly attractive research topic in recent years due to its applications in wireless sensor networks and the internet-of-things. Consider, for example, a data stream which generates a zero-mean high-dimensional data sample x_t ∈ C^n at each time t, where each sensor may access a portion of the data stream. Several main challenges arise when processing the high-dimensional data stream:

• Data on-the-fly: Due to the high rate of data arrival, each data sample may not be fully stored, and computation needs to be accomplished with only one pass or a few passes [3] at the sensors to allow fast processing.

• Resource constraints: The sensors are usually power-limited and resource-constrained; it is therefore highly desirable to minimize the computational and storage cost at the sensor side, as well as the communication overhead to the fusion center, by not transmitting all the data.

• Dynamics: As new data samples arrive and/or new sensors enter, it is desirable to track changes in the information flow at the fusion center in a low-complexity fashion, without storing all historical data.

I-A Contributions of this paper

Many practical datasets exhibit low-dimensional structures, such that a significant proportion of their variance can be captured in the first few principal components, and the subspace spanned by these principal components is the recovery object of interest rather than the datasets themselves. In other words, the covariance matrix Σ = E[x_t x_t^H] of the data is (approximately) low-rank, rank(Σ) = r, where r ≪ n. This assumption is widely applicable to data such as network traffic, wideband spectrum, surveillance, and so on.

To reduce the communication, sensing and storage complexity, it is of great interest to consider one-bit sampling strategies at decentralized sensor nodes [4, 5, 6], where each sensor only transmits a single bit to the fusion center rather than all the data. Our goal in this paper is thus to design a simple yet efficient one-bit sampling strategy to faithfully retrieve the principal subspace of high-dimensional data streams with i.i.d. samples when their covariance matrices are (approximately) low-rank, with provable performance guarantees. We focus on the scenario where each distributed sensor may only access a subset of the whole data, process it locally, and transmit only a single bit to the fusion center, which estimates the principal subspace without referring to the original data. It is of interest to keep the number of sensors, and hence the number of bits communicated to the fusion center, as small as possible while still allowing faithful recovery of the principal subspace.

In the proposed one-bit sampling scheme, each sensor is equipped with a pair of vectors composed of i.i.d. complex standard Gaussian entries, called sketching vectors. At the sensing stage, it compares the energy projections of the data seen at the sensor onto the pair of sketching vectors, and transmits a single bit indicating which of the two energy projections is larger. This is equivalent to comparing the energy projection of a sample covariance matrix onto two randomly selected rank-one subspaces. A key observation is that as long as the number of samples seen at the sensor is not too small (which we characterize theoretically), the comparison outcome will be exactly the same as if it were performed on the true covariance matrix or its best low-rank approximation. By only transmitting the comparison outcome rather than the actual energy measurements, the communication overhead is minimized to a single bit, which is more robust to communication channel errors and outliers. Moreover, as will be shown, the energy projections can be computed in an extremely simple manner without storing the history of data samples, and are always nonnegative, making them suitable for wideband and optical applications.

At the fusion center, the sketching vectors are assumed known, which is a standard assumption for decentralized estimation [7]. When the covariance matrix is exactly low-rank, we propose a spectral estimator, which estimates the principal subspace as the subspace spanned by the top eigenvectors of a carefully designed surrogate matrix using the collected binary measurements, and which can be easily computed via a truncated eigenvalue decomposition (EVD) with low computational complexity. We show that, assuming all the bit measurements exactly measure the covariance matrix, the estimate of the principal subspace is provably accurate as long as the number of bits is on the order of nr, up to logarithmic factors, when the sketching vectors are composed of i.i.d. standard complex Gaussian entries. When the rank is not known a priori, we devise a soft-thresholding strategy to adaptively select the rank, and show it attains a similar performance guarantee. Furthermore, we develop a memory-efficient algorithm to update the principal subspace estimate online when the binary measurements arrive sequentially, which can be implemented with a memory requirement on the order of the size of the principal subspace rather than that of the covariance matrix.

In many applications concerning (power) spectrum estimation, such as array signal processing and cognitive radios, the covariance matrix of the data can be modeled as a low-rank Toeplitz matrix [8]. It is therefore possible to further reduce the required number of bit measurements by exploiting the Toeplitz constraint. We propose to apply the spectral estimator to the projection of the above-designed surrogate matrix onto its nearest Toeplitz matrix in the Frobenius norm. When the covariance matrix is rank-one, this provably admits accurate estimation of the principal subspace as soon as the number of bit measurements is on the order of log⁴ n. In contrast to the scenario where the Toeplitz constraint is not exploited, this eliminates the linear dependency on n, and the sample complexity is thus greatly reduced. Numerical simulations also suggest the algorithm works well even in the low-rank setting. Finally, our results continue to hold even when a constant fraction of the binary measurements is randomly flipped.

I-B Related work

Estimating the principal subspace of a high-dimensional data stream from its sparse observations has been studied in recent years, but most existing work has focused on recovery of the data stream [9, 10]. Recently, sketching has been promoted as a dimensionality reduction method to directly recover the statistics of the data [11]–[17]. The proposed framework in this paper is motivated by the covariance sketching scheme in [15, 16, 17], where a quadratic sampling scheme is designed for low-rank covariance estimation. It is shown in [15] that a number of real-valued quadratic (energy) measurements on the order of nr suffices to exactly recover rank-r covariance matrices via nuclear norm minimization, assuming the measurement vectors are composed of i.i.d. sub-Gaussian entries. However, transmitting these energy measurements with high precision may cause unwanted overhead and require estimating the noise level in practice.

Distributed estimation of a scalar-valued parameter from the one-bit quantization of its noisy observations has been considered in [4, 5, 18, 19]. Recently, one-bit compressed sensing [20]–[25] and one-bit matrix completion [26, 27] have generalized this to the estimation of vector-valued parameters such as sparse vectors and low-rank matrices, where the goal is to recover the signal of interest from the signs of its random linear measurements. Our work is related to one-bit matrix completion as we also consider low-rank structures of the covariance matrices, but differs in several important aspects. First, unlike existing work, our binary measurements are not constructed directly from the low-rank covariance matrix, but rather from a sample covariance matrix; therefore we need to carefully justify when the binary measurements accurately reflect the characteristics of the true covariance matrix. Second, the measurement operators with respect to the covariance matrix take the form of the difference of two rank-one matrices, designed to allow a low-complexity implementation, which leads to very different results from existing ones that assume i.i.d. entries [23]. Third, we propose simple spectral estimators for reconstructing the principal subspace of low-rank covariance matrices, and demonstrate both analytically and through simulations that they obtain performance similar to the more expensive convex programs using trace norm minimization [23]. Finally, the spectral estimator can be further tailored to the case of low-rank Toeplitz covariance matrices.

Distributed wideband spectrum sensing for cognitive radios is an appealing and motivating application [28]. It was recently proposed to estimate the power spectral density of wideband signals via least-squares estimation from sub-Nyquist samples [29]. The frugal sensing framework [6] considered the same estimation problem using one-bit measurements based on comparing the average signal power within a band of interest against a pre-determined or adaptively set threshold [30]. Their algorithm is based on linear programming and may exploit parametric representations of the power spectral density. Our work differs from [6, 30] in several aspects. Instead of comparing the average signal power against a threshold, which introduces the additional issue of how to set the threshold, we compare the average signal power between two different bands of interest and therefore do not need to set any threshold. Our algorithm exploits the low-rank property of the power spectral density rather than its parametric representation, and does not require knowing the noise statistics; instead, it exploits the concentration phenomenon of random matrices.

Finally, the paper [31] studied one-bit phase retrieval, an extension of the one-bit compressed sensing with phaseless measurements. Despite different motivations and applications, our algorithm subsumes the scenario in [31] as a special case when the covariance matrix is assumed rank-one.

I-C Organization of this paper and notations

The rest of this paper is organized as follows. Section II describes the proposed 1-bit sampling framework and formulates the principal subspace estimation problem. Section III presents the proposed spectral estimators and their performance guarantees. Section IV presents an online algorithm to track the low-dimensional principal subspace with sequential bit measurements. Numerical examples are given in Section V. Finally, we conclude in Section VI and outline some future directions. Additional proofs are provided in the appendix.

Throughout this paper, we use boldface letters to denote vectors and matrices, e.g. a and A. The Hermitian transpose of A is denoted by A^H, and ‖A‖, ‖A‖_F, ‖A‖_*, and Tr(A) denote the spectral norm, the Frobenius norm, the nuclear norm, and the trace of the matrix A, respectively. Denote

 T(A) = argmin_T ‖T − A‖_F  s.t.  T is Toeplitz,

as the linear projection of A onto the subspace of Toeplitz matrices. Define the inner product between two matrices as ⟨A, B⟩ = Tr(A^H B). We write A ⪰ 0 if A is positive semidefinite (PSD). The expectation of a random variable is written as E[·].

II One-Bit Sampling Strategy

Let {x_t}_{t≥1} ⊂ C^n be a data stream with zero mean and covariance matrix Σ = E[x_t x_t^H]. In this section we describe the distributed one-bit sampling framework for estimating the principal subspace of Σ based on comparison outcomes of aggregated energy projections from each sensor, as summarized in Algorithm 1.

Consider a collection of m sensors that are deployed distributively to measure the data stream. Each sensor can access either a portion of or the complete data stream. At the ith sensor, define a pair of sketching vectors a_i ∈ C^n and b_i ∈ C^n, 1 ≤ i ≤ m, whose entries are i.i.d. standard complex Gaussian CN(0, 1). Without loss of generality, we assume the ith sensor has access to the samples in the data stream indexed by {ℓ_i^t}_{t=1}^T, of the same size T. Each ith sensor processes the samples locally; namely, for the data sample x_{ℓ_i^t}, it takes the two quadratic (energy) measurements given below:

 u_{i,t} = |⟨a_i, x_{ℓ_i^t}⟩|²,  v_{i,t} = |⟨b_i, x_{ℓ_i^t}⟩|²,  (1)

which are nonnegative and can be measured efficiently in high-frequency applications at a linear complexity using energy detectors. These quadratic measurements are then averaged over the T samples to obtain

 U_{i,T} = (1/T) Σ_{t=1}^T u_{i,t} = a_i^H Σ_{i,T} a_i,  V_{i,T} = (1/T) Σ_{t=1}^T v_{i,t} = b_i^H Σ_{i,T} b_i,

where

 Σ_{i,T} = (1/T) Σ_{t=1}^T x_{ℓ_i^t} x_{ℓ_i^t}^H  (2)

is the sample covariance matrix seen by the ith sensor. It is clear that U_{i,T}, and similarly V_{i,T}, can be updated recursively without storing all the history data. At the end of the T samples, the ith sensor compares the average energy projections U_{i,T} and V_{i,T}, and transmits to the fusion center a single bit indicating the outcome:

 y_{i,T} = { 1, if U_{i,T} > V_{i,T};  −1, otherwise.  (3)

The communication overhead is minimal since only the binary measurement is transmitted rather than the original data stream. It is also straightforward to see that each sensor only needs to store two scalars, U_{i,T} and V_{i,T}. More concretely, defining W_i = a_i a_i^H − b_i b_i^H, we can write (3) as

 y_{i,T} = sign(⟨W_i, Σ_{i,T}⟩),  (4)

where sign(·) is the sign function. Intuitively, (4) can be interpreted as comparing the energy projection of Σ_{i,T} onto two randomly selected rank-one subspaces. Finally, to model potential errors occurring during transmission, we assume each bit has an independent flipping probability of p, and the received bit at the fusion center from the ith sensor is given as

 z_{i,T} = y_{i,T} · ε_i,  1 ≤ i ≤ m,  (5)

where P(ε_i = 1) = 1 − p and P(ε_i = −1) = p, and the ε_i's are i.i.d. across sensors.
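For concreteness, the per-sensor pipeline of (1)–(5) can be sketched in a few lines of numpy; the sizes n, r, m, T and the flip probability p below are illustrative, and the rank-r factor L used to synthesize the data is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m, T, p = 24, 2, 150, 300, 0.05  # illustrative sizes

def cgauss(*shape):
    """i.i.d. standard complex Gaussian CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

L = cgauss(n, r)
Sigma = L @ L.conj().T              # ground-truth rank-r covariance
A, B = cgauss(m, n), cgauss(m, n)   # sketching vectors a_i, b_i as rows

z = np.empty(m)
for i in range(m):
    U_i = V_i = 0.0
    for t in range(T):              # stream the samples; keep only two scalars
        x = L @ cgauss(r)           # x_t ~ CN(0, Sigma)
        U_i += abs(np.vdot(A[i], x)) ** 2 / T   # running average of u_{i,t}, eq. (1)
        V_i += abs(np.vdot(B[i], x)) ** 2 / T   # running average of v_{i,t}, eq. (1)
    y_i = 1.0 if U_i > V_i else -1.0            # transmitted bit, eq. (3)
    eps = -1.0 if rng.random() < p else 1.0     # random flip with probability p
    z[i] = y_i * eps                            # received bit, eq. (5)
```

Note that the inner loop touches each sample exactly once and stores only the two running averages, matching the one-pass, two-scalar memory footprint described above.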

As we are interested in the covariance matrix Σ, and since the sample covariance matrix Σ_{i,T} converges to Σ as T approaches infinity, the binary measurement at the ith sensor also quickly approaches

 y_i = sign(⟨W_i, Σ⟩),  (6)

as if it were measuring the true covariance matrix Σ. In fact, as we will show in Theorem 1, y_{i,T} and y_i start to agree very fast, for T much smaller than n.

For simplicity we assume Σ is an exactly rank-r PSD matrix, where r ≪ n. The extension to the approximately low-rank case will be discussed shortly. Let

 Σ = Σ_{k=1}^r λ_k u_k u_k^H = U Λ U^H,  (7)

where U = [u_1, …, u_r] collects the top-r eigenvectors of Σ and Λ = diag(λ_1, …, λ_r) the top-r eigenvalues, with λ_1 ≥ ⋯ ≥ λ_r > 0. We further define V = [v_{r+1}, …, v_n] as the basis vectors spanning the orthogonal complement of U. Apparently not all information about Σ can be recovered; for example, the sign measurements are invariant to scaling of the data samples and, therefore, to scaling of the covariance matrix. Our goal in this paper is to recover the principal subspace spanned by U from the collected binary measurements.

II-A How large does T need to be?

In the following proposition, whose proof can be found in Appendix B, we establish the sample complexity of T needed to guarantee y_{i,T} = y_i when the data stream follows a Gaussian model with i.i.d. samples.

Proposition 1.

Let δ > 0. Assume the x_t's are i.i.d. Gaussian satisfying x_t ∼ CN(0, Σ). Then y_{i,T} = y_i with probability at least 1 − δ as soon as T > c₀ (Tr(Σ)/‖Σ‖_F)² log²(1/δ) for some sufficiently large constant c₀.

In order to guarantee that all bits are accurate, we need to further apply a union bound to Proposition 1, which yields the following theorem.

Theorem 1.

Let the x_t's be i.i.d. CN(0, Σ). Let δ > 0. With probability at least 1 − δ, all binary measurements are exact, i.e. y_{i,T} = y_i for 1 ≤ i ≤ m, given that the number of samples observed by each sensor satisfies

 T > c (Tr(Σ)/‖Σ‖_F)² log²(m/δ)

for some sufficiently large constant c.

It is worth emphasizing that Theorem 1 holds for any fixed covariance matrix Σ. The term Tr(Σ)/‖Σ‖_F measures the “effective” rank of Σ, as for PSD matrices with fast spectral decays this term will be small [32]. If rank(Σ) = r, then Tr(Σ)/‖Σ‖_F ≤ √r. As soon as T is on the order of r log²(m/δ), all bit measurements are accurate with high probability; this depends on the number of sensors m (which in turn depends on the ambient dimension n, as will be seen from Theorem 2) only logarithmically.

II-B Extension to approximately low-rank covariance matrices

When Σ is only approximately low-rank, denote its best rank-r approximation by Σ_r. Then, if

 sign(⟨W_i, Σ⟩) = sign(⟨W_i, Σ_r⟩),  ∀ i = 1, …, m,  (8)

holds, combined with Theorem 1, our framework can be applied to recover the r-dimensional principal subspace of Σ_r by treating the bits as measurements of Σ_r. Fortunately, (8) holds with high probability as long as the residual Σ − Σ_r is sufficiently small. The interested reader is referred to Appendix C for the details.

III Principal Subspace Estimators and Performance Guarantees

In this section, we first develop a spectral estimator for the case when Σ is a low-rank PSD matrix, based on a truncated EVD when the rank is known exactly; we then develop a soft-thresholding strategy to adaptively select the rank when it is unknown. Finally, we tailor the spectral estimator to the case when Σ is a low-rank Toeplitz PSD matrix. For the rest of this section, we assume the binary measurements satisfy y_{i,T} = y_i and z_i = y_i · ε_i in (5) for all 1 ≤ i ≤ m.

III-A The spectral estimator

We propose an extremely simple and low-complexity spectral estimator whose complexity amounts to computing a few top eigenvectors of a carefully designed surrogate matrix. To motivate the algorithm, consider the special case when Σ = λ₁ u₁ u₁^H is a rank-one matrix with r = 1. A natural way to recover u₁ is via the following:

 max_{ν: ‖ν‖₂ = 1} (1/m) Σ_{i=1}^m z_i ⟨W_i, νν^H⟩,  (9)

which aims to find a rank-one matrix that agrees with the measured signs as much as possible. Since (9) is equivalent to

 max_{ν: ‖ν‖₂ = 1} ν^H ((1/m) Σ_{i=1}^m z_i W_i) ν,

its solution is the top eigenvector of the surrogate matrix:

 J_m = (1/m) Σ_{i=1}^m z_i W_i = (1/m) Σ_{i=1}^m ε_i sign(⟨W_i, Σ⟩) W_i.  (10)

More generally, when Σ is rank-r, we recover its principal subspace as the subspace spanned by the top-r eigenvectors of the surrogate matrix J_m in (10). This procedure is referred to as the spectral estimator.
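The whole estimator is a single eigendecomposition; a minimal numpy sketch follows, where z denotes the received bits and the rows of A and B are the sketching vectors known at the fusion center (all names are illustrative):

```python
import numpy as np

def spectral_estimator(z, A, B, r):
    """Estimate the principal subspace as the top-r eigenvectors of the
    surrogate J_m = (1/m) sum_i z_i (a_i a_i^H - b_i b_i^H), as in (10)."""
    m = len(z)
    # (A.T * z) @ A.conj() = sum_i z_i a_i a_i^H; Hermitian by construction
    Jm = ((A.T * z) @ A.conj() - (B.T * z) @ B.conj()) / m
    w, V = np.linalg.eigh(Jm)      # eigenvalues in ascending order
    return V[:, ::-1][:, :r]       # eigenvectors of the r largest eigenvalues
```

The cost is O(mn²) to form J_m plus one n × n EVD, and no real-valued energy measurements are ever needed at the fusion center.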

III-B Sample complexity of the spectral estimator

We establish the performance guarantee of the proposed spectral estimator by showing that the principal subspace of Σ can be accurately estimated from J_m as soon as the number of bits m is sufficiently large. This is accomplished in two steps. Define J = E[J_m]. We first show that the principal subspace of J is the same as that of Σ, and then show that J_m concentrates around J for sufficiently large m. The following lemma accomplishes the first step.

Lemma 1.

The principal subspace of J = E[J_m] is the same as that of Σ. For 1 ≤ k ≤ r,

 1 ≥ u_k^H J u_k / (1 − 2p) ≥ max{ (1/(1 + κ(Σ)))^{r−1},  (1/(9r)) e^{−κ(Σ)} },  (11)

and for r < k ≤ n,

 v_k^H J v_k = 0,  (12)

where κ(Σ) = λ₁/λ_r is the condition number of Σ. When r = 1, the right-hand side of (11) equals one.

The proof is provided in Appendix D. Lemma 1 establishes that J and Σ share the same principal subspace, but their eigenvectors may still differ. Following Lemma 1, the spectral gap between the rth eigenvalue and the (r+1)th eigenvalue (which is zero) of J is at least

 α := (1 − 2p) max{ (1/(1 + κ(Σ)))^{r−1},  (1/(9r)) e^{−κ(Σ)} },

where the first term (exponential in r) is tighter when r is small, while the second term (polynomial in r) is tighter when r is large. Indeed, when p = 0, α only depends on r and κ(Σ) and can be computed exactly once they are fixed. Fig. 1 plots the derived lower bounds and the exact values of u_k^H J u_k assuming all eigenvalues are equal. It can be seen that the polynomial bound on the order of 1/r is rather accurate, except for the leading constant, when r is large. Moreover, Fig. 1 confirms that although J preserves the principal subspace of Σ, it does not preserve the eigenvectors and eigenvalues.

Next, we show that for sufficiently large m, the matrix J_m is close to its expectation J. We have the following lemma, whose proof can be found in Appendix E.

Lemma 2.

Let δ > 0. Then with probability at least 1 − δ, we have that

 ‖J_m − J‖ ≤ c₀ √((n/m) log(2n/δ)),

where c₀ is an absolute constant.

Our main theorem then follows immediately by applying an improvement of the Davis-Kahan sin-Theta theorem [33], stated in Lemma 7, as given below.

Theorem 2.

Let δ > 0. With probability at least 1 − δ, there exists an orthonormal matrix Q ∈ C^{r×r} such that

 ‖Û − UQ‖_F ≤ α^{−1} √((c₁ n r / m) log(2n/δ)) ≤ (min{ (1 + κ(Σ))^{r−1}, 9r e^{κ(Σ)} } / (1 − 2p)) √((c₁ n r / m) log(2n/δ)),

where c₁ is an absolute constant.

When r is small and κ(Σ) is moderate, α^{−1} scales as a constant, and as soon as the number of binary measurements m exceeds the order of nr, the principal subspace spanned by the columns of U is recovered with high accuracy. Since there are at least nr degrees of freedom to describe U, our bound is near-optimal up to a polynomial factor with respect to r and a logarithmic factor with respect to n. It is worth emphasizing that our result indicates that the required number of binary measurements is much smaller than the ambient dimension n² of the covariance matrix, and even comparable to the sample complexity for low-rank matrix recovery using real-valued measurements [34]. Furthermore, the reconstruction is robust to random flipping errors as long as p < 1/2. In fact, the error scales inversely proportionally to 1 − 2p, the expectation of a correct transmission.

III-C Adaptive Rank Selection via Soft-Thresholding

The performance guarantee of the spectral estimator in Theorem 2 requires perfect knowledge of the rank, which may not be readily available. One possible strategy is to soft-threshold the eigenvalues of J_m to select a proper dimension of the principal subspace. We motivate this choice as the solution to a regularized convex optimization problem whose analysis sheds light on how to select the threshold. To begin with, following the rationale of (9), we may seek a low-rank PSD matrix that obeys

 Σ̂ = argmax_{Σ ⪰ 0} (1/m) Σ_{i=1}^m z_i ⟨Σ, W_i⟩ = argmax_{Σ ⪰ 0} ⟨Σ, J_m⟩,  s.t.  rank(Σ) ≤ r,  ‖Σ‖_F ≤ 1.

However, this formulation is non-convex due to the rank constraint. We therefore consider the following convex relaxation, obtained by relaxing the rank constraint via trace regularization and regularizing the norm of Σ, yielding:

 Σ̂ = argmin_{Σ ⪰ 0} ‖Σ − J_m‖²_F + λ Tr(Σ),  (13)

where λ > 0 is the regularization parameter. The above problem (13) admits a closed-form solution [35]. Let the EVD of J_m be given as J_m = Σ_{i=1}^n σ_i ũ_i ũ_i^H, where the σ_i's are ordered in descending order. Then Σ̂ = Σ_{i=1}^n max{σ_i − λ/2, 0} ũ_i ũ_i^H. Therefore, as with the spectral estimator, the estimated principal subspace is spanned by the eigenvectors of J_m corresponding to the eigenvalues that remain nonzero after thresholding, where the rank is now set adaptively via soft-thresholding by λ/2. The performance of (13) can be bounded by the following lemma from [8, pp. 1267-1268].

Lemma 3.

Set λ ≥ 2‖J_m − J‖; then the solution to (13) satisfies:

 ‖Σ̂ − J‖_F ≤ (1/2) λ √(2r).

Combined with Lemma 2, if we set λ = 2c₀ √((n/m) log(2n/δ)), we then have

 ‖Σ̂ − J‖_F ≤ c √((n r / m) log(2n/δ)),  (14)

for some constant c. Therefore, by the Davis-Kahan theorem, we again obtain a performance guarantee on the recovered principal subspace that is qualitatively similar to that in Theorem 2, except that now the spectral estimator employs an adaptive strategy to select the rank via soft-thresholding.
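The closed-form solution of (13) amounts to shrinking the eigenvalues of the surrogate matrix; a minimal sketch (the surrogate Jm and threshold lam are assumed given, and the function name is ours):

```python
import numpy as np

def soft_threshold_evd(Jm, lam):
    """Closed-form solution of (13): shrink each eigenvalue of the Hermitian
    surrogate Jm by lam/2 and clip at zero; the eigenvectors whose shrunk
    eigenvalues stay positive span the adaptively selected subspace."""
    w, V = np.linalg.eigh(Jm)                  # ascending eigenvalues
    w_hat = np.maximum(w - lam / 2.0, 0.0)     # soft-threshold by lam/2
    keep = w_hat > 0                           # adaptively selected rank
    Sigma_hat = (V[:, keep] * w_hat[keep]) @ V[:, keep].conj().T
    return Sigma_hat, V[:, keep], w_hat[keep]
```

Note that no rank input is needed: the number of retained eigenvectors is determined by the threshold λ/2 alone, which is the point of the adaptive strategy.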

Remark: It is worthwhile to pause and comment on the difference between the convex regularized algorithm (13) and the algorithm proposed by Plan and Vershynin in [23] for one-bit matrix completion, which can be written as

 max_{Σ ⪰ 0} Σ_{i=1}^m y_i ⟨W_i, Σ⟩,  s.t.  ‖Σ‖_F ≤ 1,  Tr(Σ) ≤ √r.  (15)

Indeed, the two algorithms are essentially the same, and (14) is also applicable to (15) (the analysis is straightforward by slightly adapting the arguments, so we omit the details). However, the analysis in [23] assumes that the measurement matrices are composed of i.i.d. Gaussian entries, in which case J can be shown to be a scaled version of Σ, so that the performance bound in (14) guarantees that one can recover the low-rank covariance matrix up to a scaling difference as soon as m is on the order of nr. Unfortunately, in our sampling scheme, due to the dependence among the entries of W_i, J is no longer a scaled variant of Σ (as verified in Fig. 1), and it is not clear whether the covariance matrix itself can be recovered in a straightforward manner. Nonetheless, we evaluate the performance of (15) numerically in Section V for principal subspace estimation, and show it is comparable to that of the spectral estimator, while incurring a much higher computational cost.

III-D Rank-One Toeplitz Subspace Estimation

In many applications, the covariance matrix can be modeled as a low-rank Toeplitz PSD matrix, and it is desirable to further reduce the sampling complexity by exploiting the Toeplitz constraint. Denote by T(J_m) the projection of J_m onto the subspace of Toeplitz matrices; we can show that it concentrates around T(J) at a rate much faster than in Lemma 2, as soon as m scales poly-logarithmically with respect to n.

Lemma 4.

With probability at least 1 − O(n^{−1}), we have

 ‖T(J) − T(J_m)‖ ≤ c₂ · (log² n)/√m,

where c₂ is some constant.

The proof can be found in Appendix F. When Σ is rank-one, J = (1 − 2p) u₁ u₁^H by Lemma 1, where u₁ is the top eigenvector of Σ; therefore J is also rank-one and Toeplitz, so that T(J) = J. Denote the top eigenvector of T(J_m) by û₁; combining Lemma 4 and Lemma 7, we have the following theorem.

Theorem 3.

Assume r = 1. With probability at least 1 − O(n^{−1}), there exists θ ∈ [0, 2π) such that

 ‖û₁ − e^{jθ} u₁‖ ≤ (c₃/(1 − 2p)) √((log⁴ n)/m)

for some constant c₃.

Therefore, Theorem 3 indicates that, in the rank-one case, u₁ can be accurately estimated as long as m scales on the order of log⁴ n, and the reconstruction is robust to random flipping errors in the received bits. This is a much smaller sample complexity than in Theorem 2, where the Toeplitz structure is not exploited and m is required to scale at least linearly with n.
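The Frobenius-norm projection T(·) simply averages the surrogate matrix along each diagonal, after which the tailored estimator is again a single eigendecomposition. A minimal sketch (function names are ours; Jm denotes a surrogate built as before):

```python
import numpy as np

def toeplitz_project(Jm):
    """Nearest Toeplitz matrix in Frobenius norm: replace each diagonal
    of Jm by its average (the orthogonal projection T(.))."""
    n = Jm.shape[0]
    out = np.zeros_like(Jm)
    for k in range(-(n - 1), n):
        out += np.diagonal(Jm, offset=k).mean() * np.eye(n, k=k)
    return out

def toeplitz_spectral_estimator(Jm):
    """Rank-one tailored estimator: top eigenvector of T(J_m)."""
    w, V = np.linalg.eigh(toeplitz_project(Jm))
    return V[:, -1]  # eigenvector of the largest eigenvalue
```

Since the projection of a Hermitian matrix is again Hermitian, the eigendecomposition step is unchanged; only the diagonal-averaging preprocessing is added.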

Remark: In the general low-rank case, J may not be Toeplitz even when Σ is Toeplitz, which prevents us from obtaining a similar performance guarantee. However, simulations suggest that the spectral estimator continues to perform well in the low-rank case; we leave its investigation to future work.

IV Subspace Tracking with Online Binary Measurements

In this section, we develop an online subspace estimation and tracking algorithm for the fusion center to update the principal subspace estimate of the low-rank covariance matrix when new binary measurements arrive sequentially. This is particularly useful when the fusion center is memory-limited, since the proposed algorithm only requires a memory space on the order of nr, the size of the principal subspace.

Essentially, we need to update the principal subspace of J_m from that of J_{m−1} given a rank-two update, as follows:

 J_m = ((m − 1)/m) J_{m−1} + (y_m/m)(a_m a_m^H − b_m b_m^H).  (16)

We can rewrite (16) using more general notation as

 J_m = η_m J_{m−1} + K_m Λ_m K_m^H,  (17)

where (16) can be obtained from (17) by letting η_m = (m − 1)/m, K_m = [a_m, b_m], and Λ_m = (y_m/m) diag(1, −1). Note that it might be of interest to incorporate an additional discounting factor on J_{m−1} to emphasize the current measurement, by letting η_m take a smaller value, as done in [36].

Assume the EVD of the rank-r approximation of J_{m−1} can be written as U_{m−1} Π_{m−1} U_{m−1}^H, where U_{m−1} ∈ C^{n×r} is orthonormal and Π_{m−1} is diagonal. The goal is to find the best rank-r approximation of J_m by updating U_{m−1} and Π_{m−1}.

We develop a fast rank-two update of the EVD of a symmetric matrix by introducing necessary modifications of the incremental SVD approach in [36, 37]. A key difference from [36, 37] is that we do not allow the size of the principal subspace to grow; it is fixed as r. In the update we first compute an expanded principal subspace of rank r + 2 and then keep only its r largest principal components.

Let P_m ∈ C^{n×2} be the orthonormal columns spanning the column space of (I − U_{m−1} U_{m−1}^H) K_m. We write J_m as

 J_m = [U_{m−1}  P_m] Γ_m [U_{m−1}  P_m]^H,

where Γ_m ∈ C^{(r+2)×(r+2)} is a small matrix whose EVD can be computed easily and yields

 Γ_m = U′_m Π′_m U′_m^H.

Let Π_m be the top r × r sub-matrix of Π′_m, assuming the eigenvalues are given in absolute descending order; the principal subspace of J_m can then be updated correspondingly as

 U_m := [U_{m−1}  P_m] U′_m I_r,

where I_r is the matrix formed by the first r columns of the (r + 2) × (r + 2) identity matrix.
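One sweep of the rank-two update above can be sketched as follows; the function name and argument conventions are ours, pi_prev is the vector of diagonal entries of Π_{m−1}, and the 1/m factor of (16) is assumed folded into the scalar y:

```python
import numpy as np

def rank_two_update(U_prev, pi_prev, a, b, y, eta):
    """Given the rank-r EVD approximation J_{m-1} ~ U_prev diag(pi_prev) U_prev^H,
    absorb the rank-two term y*(a a^H - b b^H) of (16) and return the updated
    orthonormal basis U_m and its r eigenvalues."""
    n, r = U_prev.shape
    K = np.column_stack([a, b])
    Lam = np.diag([y, -y])
    # orthonormal basis of the component of K outside span(U_prev)
    P, _ = np.linalg.qr(K - U_prev @ (U_prev.conj().T @ K))
    Q = np.column_stack([U_prev, P])          # n x (r+2), orthonormal
    C = Q.conj().T @ K                        # coordinates of K in the Q basis
    Gamma = np.zeros((r + 2, r + 2), dtype=complex)
    Gamma[:r, :r] = eta * np.diag(pi_prev)    # eta * Pi_{m-1} block
    Gamma += C @ Lam @ C.conj().T             # rank-two correction
    w, V = np.linalg.eigh(Gamma)              # small (r+2)-dimensional EVD
    idx = np.argsort(np.abs(w))[::-1][:r]     # keep the r largest |eigenvalues|
    return Q @ V[:, idx], w[idx]
```

Each step costs O(nr) memory and O(nr²) time, so the fusion center never forms the n × n surrogate matrix.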

V Numerical Experiments

In the numerical experiments, we first examine the performance of the spectral estimator in a batch setting in terms of reconstruction accuracy and robustness to flipping errors, with comparisons against the convex optimization algorithm in (15). We then examine the performance of the tailored spectral estimator when the covariance matrix is additionally Toeplitz. Next, we examine the performance of the online subspace estimation algorithm in Section IV and apply it to the problem of line spectrum estimation. Finally, we examine the effects of aggregation over finite samples.

V-A Recovery for low-rank covariance matrices

We generate the covariance matrix as Σ = LL^H, where L ∈ C^{n×r} is composed of standard Gaussian entries. The one-bit measurements are then collected according to (5). After the bit measurements are collected, we run the spectral estimator using the constructed surrogate matrix J_m, as well as the convex optimization algorithm (15), assuming the rank r of the principal subspace is known perfectly. The algorithm (15) is implemented using the MOSEK toolbox available in CVX [38] and obtains an estimate Σ̂, from which we extract the top-r eigenvectors. The normalized mean squared error (NMSE) of the estimated principal subspace Û, which has orthonormal columns, is computed against the true subspace U.

Fig. 2 (a) shows the NMSE with respect to the number of bit measurements m for different problem dimensions, averaged over Monte Carlo runs. Given the high complexity of the convex algorithm (15), we only run it for the smaller problem sizes. We can see that its performance is comparable to that of the spectral estimator. Fig. 2 (b) further examines the performance of the spectral estimator for different ranks r. For the same number of bit measurements, the NMSE grows gracefully as the rank increases. The spectral estimator also exhibits reasonable robustness against flipping errors. Fig. 3 shows the reconstruction NMSE with respect to the flipping probability p for different numbers of bit measurements, where the error increases as the flipping probability increases.

V-B Recovery for low-rank Toeplitz covariance matrices

We generate a rank-r PSD Toeplitz matrix via its Vandermonde decomposition [39], given as Σ = VDV^H, where V is a Vandermonde matrix whose columns are complex sinusoids at the mode frequencies, and D is a diagonal matrix describing the power of each mode.

Fig. 4 depicts the NMSE with respect to the number of bit measurements for the spectral estimators using either J_m or T(J_m), averaged over Monte Carlo simulations. It is clear that the tailored spectral estimator using T(J_m) achieves a smaller error using far fewer bits. Although Theorem 3 only applies to the rank-one case, the simulation suggests performance improvements even in the low-rank setting.

We perform ESPRIT [40] on the estimated subspace to recover the mode locations . Fig. 5 shows the estimated mode locations plotted vertically with respect to the number of bit measurements, with color indicating the power of the estimated modes, where the true mode locations are set as and . The spectral estimator using resolves closely located modes and detects weak modes from a much smaller number of bits.
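ESPRIT itself is not spelled out in this section; a minimal least-squares variant (assuming the modes are complex sinusoids $e^{2\pi i k f_j}$, so the signal subspace is shift-invariant; names are ours) looks like:

```python
import numpy as np

def esprit_freqs(U):
    """Estimate mode frequencies from an n x r basis U of the signal subspace.
    Shift invariance implies U[:-1] @ Psi ~= U[1:] for an r x r matrix Psi
    whose eigenvalues are exp(2*pi*1j*f_j); read frequencies off the phases."""
    Psi, *_ = np.linalg.lstsq(U[:-1], U[1:], rcond=None)
    freqs = np.angle(np.linalg.eigvals(Psi)) / (2 * np.pi)
    return np.sort(freqs % 1.0)  # wrap into [0, 1)
```

Note that any basis of the signal subspace works: a change of basis conjugates Psi without changing its eigenvalues, which is why the routine can be fed the subspace estimated from the one-bit measurements.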

V-C Online subspace tracking

We now examine the performance of the online subspace estimation algorithm proposed in Section IV. Let and . Fig. 6 shows the NMSE of the principal subspace estimate with respect to the number of bit measurements, where the result is averaged over Monte Carlo runs. Compared with Fig. 2 (a), the estimation accuracy is comparable to that of the batch setting.

In the sequel, we apply the online estimator to the problem of line spectrum estimation, in a setup similar to Fig. 5. Note that here we do not exploit the additional Toeplitz structure of . At each new binary measurement, we first use the online subspace estimation algorithm proposed in Section IV to estimate a principal subspace of rank , and then apply ESPRIT [40] to recover the mode locations. Fig. 7 shows the estimation results for various parameter settings, where the estimates of the mode locations are plotted vertically at each new bit measurement. Fig. 7 (a) and (b) use the same set of modes, with two closely located frequencies separated by the Rayleigh limit, . When all the modes have strong powers, the modes can be accurately estimated, as depicted in (a); when one of the close modes is relatively weak, the algorithm requires more measurements to pick up the weak mode, as depicted in (b). Fig. 7 (c) examines the case when all the modes are well separated and one of them is weak. The algorithm picks up a weak mode with a smaller number of measurements when the modes are well separated. Taken together, these results suggest that the tracking performance depends on the eigengap and the condition number of the covariance matrix.

V-D Performance with finite data samples

The above simulations assume that the bit measurements are exact. We now examine the performance of the spectral estimator when each bit is measured via (3) using a finite number of data samples. Let and . Assume a collection of samples is generated as , where is an orthogonal matrix normalized from a random matrix with i.i.d. Gaussian entries, is generated with standard Gaussian entries, and has independently generated Gaussian entries. In other words, . All sensors measure the same set of samples (i.e., , for ) and communicate their bit measurements for subspace estimation.
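The sensing equations (3) and (5) are not reproduced in this excerpt; the sketch below (names and shapes are our own) follows the verbal description of the scheme: each sensor draws a pair of random directions, aggregates the projected energies over the $T$ samples, and reports the one-bit comparison outcome.

```python
import numpy as np

def energy_comparison_bits(X, m, rng=None):
    """One-bit energy-comparison measurements of the data matrix X (n x T).
    Sensor i reports sign(mean_t |a_i^H x_t|^2 - mean_t |b_i^H x_t|^2)
    for its own random direction pair (a_i, b_i)."""
    rng = np.random.default_rng(rng)
    n, _ = X.shape
    A = rng.standard_normal((n, m))  # one column per sensor
    B = rng.standard_normal((n, m))
    ea = np.mean(np.abs(A.conj().T @ X) ** 2, axis=1)
    eb = np.mean(np.abs(B.conj().T @ X) ** 2, axis=1)
    return np.sign(ea - eb), A, B
```

As $T$ grows, the aggregated energies concentrate around their means, so each bit agrees with the noiseless comparison with high probability, consistent with the finite-sample behavior reported below.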

Fig. 8 shows the NMSE of the principal subspace estimate with respect to the number of bit measurements for different numbers of samples and averaged over Monte Carlo runs when the samples are (a) noise-free with , and (b) noisy with . As increases, the NMSE decreases since the bit measurements become more accurate in light of Theorem 1. Note that the gain diminishes once is sufficiently large, as all bit measurements are then accurate with high probability. For noisy data samples, more samples are necessary for the aggregation procedure to yield accurate bit measurements, and the performance improves as more samples are averaged.

VI Conclusions

In this paper, we present a low-complexity distributed sensing and central estimation framework to recover the principal subspace of low-rank covariance matrices from a small number of one-bit measurements based on aggregated energy comparisons of the data samples. Spectral estimators are proposed with appealing computational complexity and theoretical performance guarantees. In the future, it is of interest to develop principal subspace estimation algorithms from quantized measurements beyond the one-bit scheme exploited in this paper.

Acknowledgement

The authors thank the anonymous reviewers for useful feedback that significantly improved the quality of this paper. The first author thanks Lee Seversky, Lauren Huie and Matt Berger for their hospitality and helpful discussions during her stay at the Air Force Research Lab, Rome, New York, where part of this work was accomplished. She also thanks Yuxin Chen for helpful discussions.

Appendix A Supporting Lemmas

Lemma 5 (scalar Bernstein’s inequality with sub-exponential norm, [41]).

Let be independent random variables with and , and for some constants and . Define and . Then

\[
\mathbb{P}\left[\,\Big|\sum_{k=1}^{L} z_k\Big| > u\right] \le 2\exp\left(-\frac{u^2}{2C\sigma^2 + 2Bu}\right).
\]
Lemma 6 (matrix Bernstein’s inequality with sub-exponential norm, [42]).

Let , , be independent zero-mean symmetric random matrices of dimension . Suppose and almost surely for all , where is the Orlicz norm [32]. Then for any ,

\[
\mathbb{P}\left[\,\Big\|\sum_{k=1}^{L} X_k\Big\| > \tau\right] \le 2n\exp\left(-\frac{\tau^2}{2\sigma^2 + 2B\tau/3}\right). \tag{18}
\]
Lemma 7 (Davis-Kahan, [33]).

Let be symmetric matrices. Denote as the subspace spanned by the top eigenvectors of , and as the subspace spanned by the top eigenvectors of . Let be the spectral gap between the th and the th eigenvalues of . Then there exists an orthogonal matrix such that the distance between the two subspaces and is bounded by

\[
\|\widetilde{V} - VQ\|_F \le \max\left\{\frac{\sqrt{2}\,\|\widetilde{A}-A\|_F}{\delta},\; \frac{2\sqrt{2r}\,\|\widetilde{A}-A\|}{\delta}\right\}.
\]
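A quick numeric sanity check of this bound (the spectrum, perturbation size, and dimensions below are arbitrary choices of ours, and the optimal rotation $Q$ is computed via orthogonal Procrustes):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 30, 3

def top_r_eigvecs(M, r):
    """Orthonormal basis for the top-r eigenvectors of a symmetric matrix."""
    w, V = np.linalg.eigh(M)
    return V[:, np.argsort(w)[::-1][:r]]

# Symmetric A with top-r eigenvalues near 10 and the rest below 0.1
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.concatenate([10.0 + np.arange(r), 0.1 * rng.random(n - r)])
A = Q @ np.diag(eigs) @ Q.T

E = rng.standard_normal((n, n))
A_tilde = A + 1e-3 * (E + E.T) / 2          # small symmetric perturbation

V, V_t = top_r_eigvecs(A, r), top_r_eigvecs(A_tilde, r)
W, _, Zh = np.linalg.svd(V.T @ V_t)          # optimal Procrustes rotation
dist = np.linalg.norm(V_t - V @ W @ Zh, "fro")

delta = 10.0 - 0.1                           # lower bound on the eigengap of A
diff_F = np.linalg.norm(A_tilde - A, "fro")
diff_2 = np.linalg.norm(A_tilde - A, 2)
bound = max(np.sqrt(2) * diff_F / delta, 2 * np.sqrt(2 * r) * diff_2 / delta)
assert dist <= bound
```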
Lemma 8 (Hanson-Wright inequality, [43]).

Let $A$ be a fixed matrix. Consider a random vector $x = (x_1, \ldots, x_n)$ where the $x_i$'s are independent random variables satisfying $\mathbb{E}[x_i] = 0$ and $\|x_i\|_{\psi_2} \le K$. Then for any $t > 0$, we have
\[
\mathbb{P}\left[\left|x^H A x - \mathbb{E}[x^H A x]\right| > t\right] \le 2\exp\left(-c\min\left\{\frac{t^2}{K^4\|A\|_F^2},\, \frac{t}{K^2\|A\|}\right\}\right)
\]
for some absolute constant $c > 0$.

Appendix B Proof of Proposition 1

Proof.

For notational simplicity, we drop the sensor index and let and . Conditioned on and , we have . Let

\[
y_{i,T} - \mathbb{E}\left[y_{i,T}\,\middle|\,a_i, b_i\right] = \frac{1}{T}\sum_{t=1}^{T} \langle W_i, \Sigma - x_t x_t^H\rangle := \frac{1}{T}\sum_{t=1}^{T} Q_t,
\]

where . Since each $Q_t$ is a scalar, we may appeal to the Bernstein-type inequality in Lemma 5. First, . Second,

\[
\begin{aligned}
\mathrm{Var}\left[Q_t\,\middle|\,a_i,b_i\right] &= \mathrm{Var}\left[\langle W_i, x_t x_t^H\rangle \,\middle|\, a_i,b_i\right] \\
&= \mathrm{Var}\left[|a_i^H x_t|^2 - |b_i^H x_t|^2 \,\middle|\, a_i,b_i\right] \\
&\le 2\,\mathbb{E}\left[|a_i^H x_t|^2\,\middle|\,a_i\right] + 2\,\mathbb{E}\left[|b_i^H x_t|^2\,\middle|\,b_i\right] \\
&= 2\left(a_i^H \Sigma a_i + b_i^H \Sigma b_i\right) := 2B,
\end{aligned}
\]

where and are two correlated Gaussian variables. Third,

\[
\begin{aligned}
|Q_t| &= \left|\langle W_i, \Sigma - x_t x_t^H\rangle\right| \\
&\le \left|a_i^H\Sigma a_i - b_i^H\Sigma b_i - |x_t^H a_i|^2 + |x_t^H b_i|^2\right| \\
&\le a_i^H\Sigma a_i + b_i^H\Sigma b_i + |x_t^H a_i|^2 + |x_t^H b_i|^2.
\end{aligned}
\]

Since and are exponential random variables with parameter , we have

\[
\|Q_t\|_{\psi_1} \le 2\left(a_i^H\Sigma a_i + b_i^H\Sigma b_i\right) := 2B.
\]

Assuming is positive without loss of generality, we have

\[
\mathbb{P}\left[\left|y_{i,T} - \mathbb{E}\left[y_{i,T}|a_i,b_i\right]\right| > \langle W_i,\Sigma\rangle \,\middle|\, a_i,b_i\right] \le 2\exp\left(-\frac{|\langle W_i,\Sigma\rangle|^2\, T}{4B\left(1+|\langle W_i,\Sigma\rangle|/3\right)}\right). \tag{19}
\]

Next, we can bound the quadratic form using the Hanson-Wright inequality [43] in Lemma 8. Since , with probability at least , we have

\[
\left|a_i^H\Sigma a_i - \mathrm{Tr}(\Sigma)\right| \le c\,\|\Sigma\|_F \log(1/\delta)
\]

for some constant . Since , we have with probability at least for some absolute constant . Denote this event by $\mathcal{G}_1$. Further, from the arguments in [15, Proposition 1], we have that

\[
c_2\,\|\Sigma\|_F \log(1/\delta) \le |\langle W_i,\Sigma\rangle| \le c_3\,\|\Sigma\|_F \log(1/\delta) \tag{20}
\]

with probability at least for some absolute constants and . Denote this event by $\mathcal{G}_2$.

Conditioned on the events $\mathcal{G}_1$ and $\mathcal{G}_2$, and plugging the above into (19), we have that as soon as

\[
T \ge c_4\,\frac{\mathrm{Tr}(\Sigma)}{\|\Sigma\|_F}\log^2(1/\delta) \tag{21}
\]

for some constant , the RHS of (19) can be upper bounded by . To summarize, assuming (21) holds,

\[
\begin{aligned}
\mathbb{P}\left[y_i \ne y_{i,T}\right] &\le \int \mathbb{P}\left[y_i \ne y_{i,T}\,\middle|\,a_i,b_i\right] d\mu(a_i)\,d\mu(b_i) \\
&\le \mathbb{P}(\mathcal{G}_1^c) + \mathbb{P}(\mathcal{G}_2^c) + \int_{\mathcal{G}_1,\mathcal{G}_2} \mathbb{P}\left[\left|y_{i,T} - \mathbb{E}\left[y_{i,T}|a_i,b_i\right]\right| > \langle W_i,\Sigma\rangle \,\middle|\, a_i,b_i\right] d\mu(a_i)\,d\mu(b_i) \\
&\le \delta.
\end{aligned}
\]

Our proposition then follows. ∎

Appendix C Approximate Low-Rank Covariance Matrices

Without loss of generality, assume is positive. We wish to also be positive so that the two have the same sign. Noting that , it is sufficient to have

\[
\left|\langle W_i, \Sigma - \Sigma_r\rangle\right| \le \left|\langle W_i, \Sigma\rangle\right| \tag{22}
\]

for all . Since

\[
\begin{aligned}
\left|\langle W_i, \Sigma-\Sigma_r\rangle\right| &\le \|W_i\| \cdot \|\Sigma-\Sigma_r\|_* \\
&\le \left(\|a_i\|_2^2 + \|b_i\|_2^2\right) \cdot \|\Sigma-\Sigma_r\|_*,
\end{aligned}
\]

and from [44], with probability at least , we have

\[
\max_i\left\{\|a_i\|_2^2,\; \|b_i\|_2^2\right\} \le n\left(1 + 2\sqrt{\log(m/\delta)}\right).
\]

Combined with (20), and renaming the constants, (22) is guaranteed to hold with probability at least , as long as

 ∥Σ−Σr∥∗≤c∥Σ∥Flog