Low-Rank Signal Processing: Design, Algorithms for Dimensionality Reduction and Applications
We present a tutorial on reduced-rank signal processing, design methods and algorithms for dimensionality reduction, and cover a number of important applications. A general framework based on linear algebra and linear estimation is employed to introduce the reader to the fundamentals of reduced-rank signal processing and to describe how dimensionality reduction is performed on an observed discrete-time signal. A unified treatment of dimensionality reduction algorithms is presented with the aid of least squares optimization techniques, in which several techniques for designing the transformation matrix that performs dimensionality reduction are reviewed. Among the dimensionality reduction techniques are those based on the eigen-decomposition of the observed data vector covariance matrix, Krylov subspace methods, joint and iterative optimization (JIO) algorithms and JIO with simplified structures and switching (JIOS) techniques. A number of applications are then considered using a unified treatment, which includes wireless communications, sensor and array signal processing, and speech, audio, image and video processing. This tutorial concludes with a discussion of future research directions and emerging topics.
Reduced-rank signal processing is an area of signal processing that is strategic for dealing with high-dimensional data, low-sample-support situations and large optimization problems, and it has gained considerable attention in the last decade [2, 3]. The origins of reduced-rank signal processing lie in the problem of feature selection encountered in statistical signal processing, which refers to a dimensionality reduction process whereby a data space is transformed into a feature space. The fundamental idea is to devise a transformation that performs dimensionality reduction so that the data vector can be represented by a reduced number of effective features and yet retain most of the intrinsic information content of the input data. The goal is to find the best trade-off between model bias and variance in a cost-effective way, yielding a reconstruction error as small as desired.
Dimensionality reduction is an emerging and strategic topic that promises great advances in the fields of statistical signal processing, linear algebra, communications, multimedia, artificial intelligence, optimization, control and physics due to its ability to deal with large systems and to offer high performance at low computational cost. Central to this idea is the existence of some form of redundancy in the signals being processed, which a designer can judiciously exploit by selecting the key features of the signals. While the data in these applications may be represented in high dimensions due to the immense capacity for data retrieval of some systems, the important features are typically concentrated on lower-dimensional subsets (manifolds) of the measurement space. This allows for significant dimension reduction with minor or no loss of information. In a number of applications, the dimensionality reduction may also lead to a performance improvement due to the denoising property: one retains the signal subspace and eliminates the noise subspace. Specifically, this redundancy can typically be characterized by data that exhibit reduced-rank properties and by sparse signals. In these situations, dimensionality reduction provides a means to increase the speed and the performance of signal processing tasks, reduce the requirements for storage and improve the tracking performance of dynamic signals. This is particularly relevant to problems which involve large systems, where the design and applicability of methods are constrained by factors such as complexity and power consumption.
In general, the dimensionality reduction problem is associated with reduced-rank operators, characterized by a mapping performed by an $M \times D$ transformation matrix $\mathbf{S}_D$ with $D < M$ that compresses the $M \times 1$ observed data vector $\mathbf{r}$ into a $D \times 1$ reduced-rank data vector $\bar{\mathbf{r}}$. Mathematically, this relationship is given by $\bar{\mathbf{r}} = \mathbf{S}_D^H \mathbf{r}$, where $(\cdot)^H$ is the Hermitian operator. It is desirable to perform these operations so that the reconstruction error and the computational burden are minimal. The dimensionality reduction and the system performance are characterized by (a) accuracy, (b) compression ratio (CR), and (c) complexity. The main challenge is how to efficiently and optimally design $\mathbf{S}_D$. After dimensionality reduction, a signal processing algorithm is used to perform the task desired by the designer. The resulting scheme with $D$ elements can benefit from a reduced number of parameters, which may lead to lower complexity, smaller requirements for storage, faster convergence and better tracking capabilities of time-varying signals.
In the literature, a number of dimensionality reduction techniques have been considered based on principal components (PC) analysis, random projections [8, 9], diffusion maps, incremental manifold learning, clustering techniques [12, 13], Krylov subspace methods that include the multi-stage Wiener filter (MSWF) [18, 20, 21, 22] and the auxiliary vector filtering (AVF) algorithm [23, 24], joint and iterative optimization (JIO) techniques [25, 26, 27, 28, 29] and JIO techniques with simplified structures and switching mechanisms (JIOS). It is well known that the optimal linear dimensionality reduction transformation is based on the eigenvalue decomposition (EVD) of the known input data covariance matrix $\mathbf{R}$ and the selection of the PC. In practice, however, this covariance matrix must be estimated. The approach employed to estimate $\mathbf{R}$ and perform dimensionality reduction is of central importance and directly affects the performance of the method. Some methods are plagued by numerical instability, high computational complexity and large sensitivity to the selected rank $D$. A common and fundamental limitation of a number of existing methods is that they rely on estimates of the covariance matrix $\mathbf{R}$ of the data vector to design $\mathbf{S}_D$, which requires a number of data vectors proportional to the dimension $M$ of $\mathbf{R}$.
The goal of this paper is to provide a tutorial on this important set of methods and algorithms, to identify important applications of reduced-rank signal processing techniques as well as new directions and key areas that deserve further investigation. The virtues and deficiencies of existing methods will be reviewed in this article, and a discussion of future trends in the area will be provided taking into account application requirements such as the ability to track dynamic signals, complexity and flexibility. The paper is structured as follows. Section II introduces the fundamentals of reduced-rank signal processing, the signal model and the idea of dimensionality reduction using a transformation matrix. Section III covers the design of the transformation matrix and a subsequent parameter vector. Section IV reviews several methods available in the literature for dimensionality reduction and provides a discussion of their main advantages and drawbacks. Section V is devoted to the applications of these methods, whereas Section VI draws the main conclusions and discusses future research directions.
II Fundamentals and signal model
In this section, our goal is to present the fundamental ideas of reduced-rank signal processing and how the dimensionality reduction is performed. We will rely on an approach based on linear algebra to describe the signal processing of a basic linear signal model. This model is sufficiently general to account for numerous applications and topics of interest. Let us consider the following linear signal model at time instant $i$ that comprises a set of $M$ samples organized in a vector as given by
$$\mathbf{r}[i] = \mathbf{H}\mathbf{s}[i] + \mathbf{n}[i], \quad i = 1, 2, \ldots, P, \tag{1}$$
where $\mathbf{r}[i]$ is the $M \times 1$ observed signal vector which contains the samples to be processed, $\mathbf{H}$ is the matrix that describes the mixing nature of the model, $\mathbf{s}[i]$ is the signal vector that is generated by a given source, $\mathbf{n}[i]$ is an $M \times 1$ vector of noise samples, and $P$ is the number of observed signal vectors or simply the data record size.
In reduced-rank signal processing, the main idea is to process the observed signal in two stages, as illustrated in Fig. 1. The first stage corresponds to the dimensionality reduction, whereas the second corresponds to the signal processing in an often low-dimensional subspace. The dimensionality reduction is performed by a mapping represented by a transformation matrix $\mathbf{S}_D$ with dimensions $M \times D$, where $D < M$, that projects the observed data vector $\mathbf{r}[i]$ with dimension $M \times 1$ onto a $D \times 1$ reduced-dimension data vector $\bar{\mathbf{r}}[i]$. This relationship is expressed by
$$\bar{\mathbf{r}}[i] = \mathbf{S}_D^H \mathbf{r}[i]. \tag{2}$$
Key design criteria for the transformation $\mathbf{S}_D$ and the dimensionality reduction are the reconstruction error, the computational complexity and the compression ratio ${\rm CR} = M/D$. These parameters usually depend on the application and the design requirements.
After the dimensionality reduction, an algorithm is used to perform the signal processing task on the reduced-dimension observed vector $\bar{\mathbf{r}}[i]$ according to the designer's aims. The resulting scheme with $D$ elements will hopefully benefit from a reduced number of parameters, which may lead to lower complexity, smaller requirements for storage, faster convergence and superior tracking capability. In the case of a combination of weights (filtering) by a parameter vector $\bar{\mathbf{w}}$ with $D$ coefficients, we have the following output
$$x[i] = \bar{\mathbf{w}}^H \mathbf{S}_D^H \mathbf{r}[i] = \bar{\mathbf{w}}^H \bar{\mathbf{r}}[i]. \tag{3}$$
It is expected that the output of the reduced-rank signal processing system will yield a small reconstruction error as compared to the full-rank system, and provide extra benefits such as speed of computation and a reduced set of features for extraction from $\mathbf{r}[i]$.
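The two-stage processing described in this section can be sketched numerically as follows. This is only an illustrative sketch: the dimensions, the square mixing matrix, the noise level, the random placeholder for the transformation matrix and the random parameter vector are all assumptions made for the example, and the ratio M/D is used as the compression ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
M, D, P = 8, 3, 100          # data dimension, reduced rank, record size (illustrative)

def crandn(*shape):
    """Circular complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = crandn(M, M)             # mixing matrix (assumed square for this sketch)
s = crandn(M, P)             # source vectors s[i], stored as columns
n = 0.1 * crandn(M, P)       # noise vectors n[i]
r = H @ s + n                # observed vectors r[i] = H s[i] + n[i]

# Stage 1: dimensionality reduction with an M x D matrix S_D.
# A random matrix with orthonormal columns is only a placeholder here.
S_D, _ = np.linalg.qr(crandn(M, D))
r_bar = S_D.conj().T @ r     # D x P reduced-dimension vectors S_D^H r[i]

# Stage 2: filtering in the reduced-dimension subspace.
w_bar = crandn(D)            # placeholder parameter vector with D coefficients
x = w_bar.conj() @ r_bar     # outputs x[i] = w_bar^H S_D^H r[i]

compression_ratio = M / D    # assumed definition of the compression ratio
```

Note that the same output is obtained with the equivalent full-dimension filter `S_D @ w_bar`, which makes explicit that the reduced-rank scheme restricts the filter to the column space of the transformation matrix.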
III Linear MMSE Design of Reduced-Rank Techniques
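In this section, we will consider a framework for reduced-rank techniques based on the linear minimum mean-square error (MMSE) design. The basic idea is to find a reduced-rank model that can represent the original full-rank model by extracting its key features. The main goal is to present the design of the main components employed for reduced-rank processing and examine the model order selection using a simple and yet general approach. Let us consider the $M \times 1$ observed signal vector $\mathbf{r}[i]$ in (1). For the sake of simplicity and for general illustrative purposes, we are interested in designing reduced-rank techniques with the aid of linear MMSE design techniques. In order to process the data vector $\mathbf{r}[i]$ with reduced-rank techniques, we need to solve the following optimization problem
$$[\mathbf{S}_{D,{\rm opt}}, \bar{\mathbf{w}}_{\rm opt}] = \arg\min_{\mathbf{S}_D, \bar{\mathbf{w}}} E\big[|d[i] - \bar{\mathbf{w}}^H \mathbf{S}_D^H \mathbf{r}[i]|^2\big], \tag{4}$$
where $d[i]$ is the desired signal and $E[\cdot]$ stands for the expected value operator. For a fixed $\mathbf{S}_D$, taking the gradient of the cost function in (4) with respect to $\bar{\mathbf{w}}^*$ and equating it to a null vector yields
$$\bar{\mathbf{w}}_{\rm opt} = \bar{\mathbf{R}}^{-1} \bar{\mathbf{p}} = (\mathbf{S}_D^H \mathbf{R} \mathbf{S}_D)^{-1} \mathbf{S}_D^H \mathbf{p}, \tag{5}$$
where $\bar{\mathbf{R}} = E[\bar{\mathbf{r}}[i] \bar{\mathbf{r}}^H[i]] = \mathbf{S}_D^H \mathbf{R} \mathbf{S}_D$ is the reduced-rank correlation matrix, $\bar{\mathbf{p}} = E[d^*[i] \bar{\mathbf{r}}[i]] = \mathbf{S}_D^H \mathbf{p}$ is the cross-correlation vector of the reduced-rank model, $\mathbf{R} = E[\mathbf{r}[i] \mathbf{r}^H[i]]$ and $\mathbf{p} = E[d^*[i] \mathbf{r}[i]]$. The associated MMSE for a rank-$D$ parameter vector is expressed by
$${\rm MMSE} = \sigma_d^2 - \bar{\mathbf{p}}^H \bar{\mathbf{R}}^{-1} \bar{\mathbf{p}} = \sigma_d^2 - \mathbf{p}^H \mathbf{S}_D (\mathbf{S}_D^H \mathbf{R} \mathbf{S}_D)^{-1} \mathbf{S}_D^H \mathbf{p}, \tag{6}$$
where $\sigma_d^2 = E[|d[i]|^2]$.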
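The optimal solution $\mathbf{S}_{D,{\rm opt}}$ of the optimization problem in (4) is obtained by fixing $\bar{\mathbf{w}}$, taking the gradient terms of the associated MMSE in (6) with respect to $\mathbf{S}_D^*$ and equating them to a zero matrix. By considering the eigen-decomposition $\mathbf{R} = \boldsymbol{\Phi} \boldsymbol{\Lambda} \boldsymbol{\Phi}^H$, where $\boldsymbol{\Phi}$ is an $M \times M$ unitary matrix with the eigenvectors of $\mathbf{R}$ and $\boldsymbol{\Lambda}$ is an $M \times M$ diagonal matrix with the eigenvalues of $\mathbf{R}$ in decreasing order, we have
$$\mathbf{S}_{D,{\rm opt}} = \boldsymbol{\Phi}_{1:M,1:D}, \tag{7}$$
where $\boldsymbol{\Phi}_{1:M,1:D}$ is an $M \times D$ matrix with orthonormal columns that corresponds to the signal subspace and contains the $D$ eigenvectors associated with the $D$ largest eigenvalues of $\mathbf{R}$. In our notation, the subscript represents the number of components in each dimension. For example, the matrix $\boldsymbol{\Phi}_{1:M,1:D}$ contains the first $D$ columns of $\boldsymbol{\Phi}$, where each column has $M$ elements.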
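If we substitute the expression of the optimal transformation $\mathbf{S}_{D,{\rm opt}} = \boldsymbol{\Phi}_{1:M,1:D}$ and use the fact that
$$\boldsymbol{\Phi}_{1:M,1:D}^H \mathbf{R} \boldsymbol{\Phi}_{1:M,1:D} = \boldsymbol{\Lambda}_{1:D,1:D}$$
in the expression for the optimal reduced-rank parameter vector in (5), we have
$$\bar{\mathbf{w}}_{\rm opt} = (\boldsymbol{\Phi}_{1:M,1:D}^H \mathbf{R} \boldsymbol{\Phi}_{1:M,1:D})^{-1} \boldsymbol{\Phi}_{1:M,1:D}^H \mathbf{p} = \boldsymbol{\Lambda}_{1:D,1:D}^{-1} \boldsymbol{\Phi}_{1:M,1:D}^H \mathbf{p}. \tag{8}$$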
The development above shows us that the key aspect for constructing reduced-rank techniques is the design of $\mathbf{S}_D$, since the MMSE in (6) depends on $\mathbf{S}_D$, $\mathbf{R}$ and $\mathbf{p}$. The quantities $\mathbf{R}$ and $\mathbf{p}$ are common to both reduced-rank and full-rank designs; however, the matrix $\mathbf{S}_D$ plays a key role in the dimensionality reduction and in the performance. The strategy is to find the most appropriate trade-off between the model bias and variance by adjusting the rank $D$.
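As a hedged numerical illustration of the design above, the sketch below evaluates the reduced-rank Wiener solution in (5) and the MMSE in (6) when the transformation is formed by the eigenvectors associated with the largest eigenvalues of the covariance matrix. The statistics are synthetic assumptions: the covariance matrix is a random Hermitian positive definite matrix and the cross-correlation vector is generated from a hypothetical linear model with residual variance 0.5.

```python
import numpy as np

rng = np.random.default_rng(1)
M = 6

# Assumed second-order statistics: d[i] = h^H r[i] + v[i], with var(v) = 0.5,
# so that p = R h and sigma_d2 = h^H R h + 0.5.
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + 0.1 * np.eye(M)
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
p = R @ h
sigma_d2 = np.real(h.conj() @ R @ h) + 0.5

def pc_transformation(R, D):
    """M x D matrix with the eigenvectors of the D largest eigenvalues of R."""
    eigvals, eigvecs = np.linalg.eigh(R)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # re-sort in decreasing order
    return eigvecs[:, order[:D]]

def reduced_rank_mmse(R, p, sigma_d2, S_D):
    """Reduced-rank Wiener filter and its MMSE for a given transformation."""
    R_bar = S_D.conj().T @ R @ S_D                # reduced-rank correlation matrix
    p_bar = S_D.conj().T @ p                      # reduced-rank cross-correlation
    w_bar = np.linalg.solve(R_bar, p_bar)
    mmse = sigma_d2 - np.real(p_bar.conj() @ w_bar)
    return w_bar, mmse

mmse_per_rank = [reduced_rank_mmse(R, p, sigma_d2, pc_transformation(R, D))[1]
                 for D in range(1, M + 1)]
full_rank_mmse = sigma_d2 - np.real(p.conj() @ np.linalg.solve(R, p))
```

With nested eigen-subspaces the MMSE cannot increase with the rank, and for D = M it coincides with the full-rank value, which is the behaviour expected from (6).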
Our exposition assumes so far a reduced-rank linear model with model order $D$ that is able to represent a full-rank model with dimension $M$. However, it is well known that the performance, compression ratio (CR) and complexity of such a procedure depend on the model order $D$. In order to address this problem, a number of techniques have been reported in the literature, which include Akaike's information-theoretic criterion (AIC), the minimum description length criterion and a number of other techniques. The basic idea of these methods is to determine the model order $D$ for which a given criterion is optimized. This can be cast as the following optimization problem
$$D_{\rm opt} = \arg\min_{d} f(d), \tag{9}$$
where $f(d)$ is the objective function that depends on the model order $d$ and consists of a suitable criterion for the problem. This criterion can be either of an information-theoretic nature, related to the error of the model [21, 30], associated with metrics or projections computed from the basis vectors of $\mathbf{S}_D$, or based on cross-validation techniques.
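The model order selection step can be sketched as a search over candidate ranks. The function names and the toy criterion below (an error term that decreases with the rank plus a linear penalty) are hypothetical and serve only to illustrate the search; any of the criteria mentioned above could be plugged in instead.

```python
def select_model_order(criterion, d_min, d_max):
    """Return the rank in [d_min, d_max] that minimizes criterion(d)."""
    return min(range(d_min, d_max + 1), key=criterion)

def toy_criterion(d, penalty=0.04):
    """Hypothetical criterion: error decreasing with d plus a complexity penalty."""
    return 1.0 / d + penalty * d

d_opt = select_model_order(toy_criterion, 1, 10)
```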
IV Algorithms for Dimensionality Reduction
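For the sake of simplicity and for general illustrative purposes, we will consider in this section algorithms for dimensionality reduction based on least squares (LS) optimization techniques with a fixed model order $D$. In order to process the data vector $\mathbf{r}[i]$ with reduced-rank techniques, we need to solve the following optimization problem
$$[\mathbf{S}_{D,{\rm opt}}, \bar{\mathbf{w}}_{\rm opt}] = \arg\min_{\mathbf{S}_D, \bar{\mathbf{w}}} \sum_{l=1}^{i} \lambda^{i-l} |d[l] - \bar{\mathbf{w}}^H \mathbf{S}_D^H \mathbf{r}[l]|^2, \tag{10}$$
where $d[l]$ is the desired signal and $\lambda$ stands for the forgetting factor that is useful for time-varying scenarios.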
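The optimal solution $\bar{\mathbf{w}}_{\rm opt}$ of the LS optimization in (10) is obtained by fixing $\mathbf{S}_D$, taking the gradient terms of the argument with respect to $\bar{\mathbf{w}}^*$ and equating them to a null vector, which yields
$$\bar{\mathbf{w}}_{\rm opt} = \hat{\bar{\mathbf{R}}}^{-1}[i]\, \hat{\bar{\mathbf{p}}}[i] = (\mathbf{S}_D^H \hat{\mathbf{R}}[i] \mathbf{S}_D)^{-1} \mathbf{S}_D^H \hat{\mathbf{p}}[i], \tag{11}$$
where $\hat{\bar{\mathbf{R}}}[i] = \sum_{l=1}^{i} \lambda^{i-l} \bar{\mathbf{r}}[l] \bar{\mathbf{r}}^H[l] = \mathbf{S}_D^H \hat{\mathbf{R}}[i] \mathbf{S}_D$ is the reduced-rank correlation matrix that is an estimate of the covariance matrix $\bar{\mathbf{R}}$ and $\hat{\bar{\mathbf{p}}}[i] = \sum_{l=1}^{i} \lambda^{i-l} d^*[l] \bar{\mathbf{r}}[l] = \mathbf{S}_D^H \hat{\mathbf{p}}[i]$ is the cross-correlation vector of the reduced-rank model that is an estimate of $\bar{\mathbf{p}}$ at time $i$. The associated sum of error squares (SES) for a rank-$D$ parameter vector is obtained by substituting (11) into the cost function in (10) and is expressed by
$${\rm SES} = \sigma_d^2[i] - \hat{\mathbf{p}}^H[i] \mathbf{S}_D (\mathbf{S}_D^H \hat{\mathbf{R}}[i] \mathbf{S}_D)^{-1} \mathbf{S}_D^H \hat{\mathbf{p}}[i], \tag{12}$$
where $\sigma_d^2[i] = \sum_{l=1}^{i} \lambda^{i-l} |d[l]|^2$. The development above shows us that the optimal filter $\bar{\mathbf{w}}_{\rm opt}$ and the SES in (12) depend on $\mathbf{S}_D$, $\hat{\mathbf{R}}[i]$ and $\hat{\mathbf{p}}[i]$. The quantities $\hat{\mathbf{R}}[i]$ and $\hat{\mathbf{p}}[i]$ are common to both reduced-rank and full-rank designs; however, the matrix $\mathbf{S}_D$ and the algorithm used for its design play a key role in dimensionality reduction, the performance and complexity.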
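The LS design for a fixed transformation can be checked numerically. The sketch below is an illustration under assumptions: the data are synthetic, the transformation is a simple placeholder that keeps the first D samples, and the exponentially weighted estimates are accumulated recursively. It verifies that the closed-form SES matches the weighted cost in (10) evaluated directly with the resulting filter.

```python
import numpy as np

rng = np.random.default_rng(2)
M, D, P, lam = 6, 2, 200, 0.998      # illustrative sizes and forgetting factor

h = rng.standard_normal(M) + 1j * rng.standard_normal(M)            # hypothetical system
r = rng.standard_normal((P, M)) + 1j * rng.standard_normal((P, M))  # rows are r[l]
d = r @ h.conj() + 0.1 * rng.standard_normal(P)                     # d[l] = h^H r[l] + noise

# Exponentially weighted estimates, updated one sample at a time.
R_hat = np.zeros((M, M), dtype=complex)
p_hat = np.zeros(M, dtype=complex)
sigma_d2 = 0.0
for r_l, d_l in zip(r, d):
    R_hat = lam * R_hat + np.outer(r_l, r_l.conj())   # r[l] r^H[l]
    p_hat = lam * p_hat + np.conj(d_l) * r_l          # d^*[l] r[l]
    sigma_d2 = lam * sigma_d2 + abs(d_l) ** 2

S_D = np.eye(M, D, dtype=complex)                     # placeholder transformation
R_bar = S_D.conj().T @ R_hat @ S_D
p_bar = S_D.conj().T @ p_hat
w_bar = np.linalg.solve(R_bar, p_bar)                 # reduced-rank LS filter
ses = sigma_d2 - np.real(p_bar.conj() @ w_bar)        # closed-form SES

# Direct evaluation of the weighted cost with the same filter.
weights = lam ** np.arange(P - 1, -1, -1)
x = (r @ S_D.conj()) @ w_bar.conj()                   # x[l] = w_bar^H S_D^H r[l]
direct_cost = np.sum(weights * np.abs(d - x) ** 2)
```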
A number of algorithms for computing the transformation $\mathbf{S}_D$ that is responsible for dimensionality reduction in our model have been reported in the literature. In this work, we will categorize them into eigen-decomposition techniques, Krylov-based methods, joint iterative optimization (JIO) techniques and techniques based on the JIO with simplified structures and switching (JIOS). An interesting characteristic that has not been fully explored in the literature so far is the fact that algorithms for dimensionality reduction can be devised by looking at different stages of the signal processing. Specifically, one can devise algorithms for dimensionality reduction directly from a cost function associated with the original optimization problem in (10) or by considering the SES in (12). In what follows, we will explore the fundamental ideas behind these methods and their main features.
IV-A Eigen-decomposition techniques
In this part, we review the basic principles of eigen-decomposition techniques for dimensionality reduction. Specifically, we focus on algorithms based on the principal component (PC) and the cross-spectral (CS) approaches. Our aim is to review the main features, advantages and disadvantages of these algorithms. The PC method was the first statistical signal processing method employed for dimensionality reduction and was introduced by Hotelling in the 1930s. The PC method chooses the subspace spanned by the eigenvectors corresponding to the largest eigenvalues, which contain the largest portion of the signal energy. Nevertheless, in the case of a mixture of signals, the PC approach does not distinguish between the signal of interest and the interference signal. Hence, the performance of the algorithm degrades significantly in interference-dominated scenarios. This drawback of the PC algorithm motivated the development of another technique, the CS algorithm, which selects the eigenvectors such that the MSE over all eigen-based methods with the same rank is minimal. To do so, it additionally considers the cross-covariance between the observation and the unknown signal, thus being more robust against strong interference. A common disadvantage of PC and CS algorithms is the need for eigen-decompositions, which are computationally very demanding when the dimensions are large and typically have a cubic cost in $M$ ($O(M^3)$). In order to address this limitation, numerous subspace tracking algorithms have been developed in the last two decades, which can reduce the cost to a quadratic rule in $M$.
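We can illustrate the principle of PC by relying on the framework employed in this section. The optimal solution $\mathbf{S}_{D,{\rm opt}}$ of the least squares optimization in (10) is obtained by fixing $\bar{\mathbf{w}}$, taking the gradient terms of the associated SES in (12) with respect to $\mathbf{S}_D^*$ and equating them to a zero matrix. By considering the eigen-decomposition $\hat{\mathbf{R}}[i] = \boldsymbol{\Phi}[i] \boldsymbol{\Lambda}[i] \boldsymbol{\Phi}^H[i]$, where $\boldsymbol{\Phi}[i]$ is an $M \times M$ unitary matrix with the eigenvectors of $\hat{\mathbf{R}}[i]$ and $\boldsymbol{\Lambda}[i]$ is an $M \times M$ diagonal matrix with the eigenvalues of $\hat{\mathbf{R}}[i]$ in decreasing order, we have
$$\mathbf{S}_{D,{\rm opt}}[i] = \boldsymbol{\Phi}_{1:M,1:D}[i], \tag{13}$$
where $\boldsymbol{\Phi}_{1:M,1:D}[i]$ represents the $D$ eigenvectors associated with the $D$ largest eigenvalues of $\hat{\mathbf{R}}[i]$. The adjustment of the model order $D$ of this method, as well as of the algorithms described in what follows, can be performed by a model order selection algorithm.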
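The difference between the PC and CS selection rules can be illustrated with a small example. The sketch below is an assumption-laden illustration: the CS rule is implemented by ranking each eigenvector by the squared magnitude of its projection onto the cross-correlation vector divided by its eigenvalue, which is the term each eigenvector contributes to the MMSE reduction when the transformation is built from eigenvectors. The scenario (a weak desired signal direction, a strong orthogonal interferer and white noise) is hypothetical.

```python
import numpy as np

def eigen_transformation(R, p, D, method="pc"):
    """Select D eigenvectors of R by the PC or the cross-spectral (CS) rule."""
    eigvals, eigvecs = np.linalg.eigh(R)
    if method == "pc":
        score = eigvals                                        # largest eigenvalues
    else:
        score = np.abs(eigvecs.conj().T @ p) ** 2 / eigvals    # MMSE contribution
    keep = np.argsort(score)[::-1][:D]
    return eigvecs[:, keep]

def mmse(R, p, sigma_d2, S_D):
    """MMSE of the reduced-rank Wiener filter for the transformation S_D."""
    p_bar = S_D.conj().T @ p
    R_bar = S_D.conj().T @ R @ S_D
    return sigma_d2 - np.real(p_bar.conj() @ np.linalg.solve(R_bar, p_bar))

# Hypothetical scenario: unit-power signal along h, interferer 20 dB stronger along g.
h = np.array([1.0, 1.0, 1.0, 1.0]) / 2.0
g = np.array([1.0, -1.0, 1.0, -1.0]) / 2.0
R = np.outer(h, h) + 100.0 * np.outer(g, g) + 0.01 * np.eye(4)
p = h.copy()
sigma_d2 = 1.0

mmse_pc = mmse(R, p, sigma_d2, eigen_transformation(R, p, 1, "pc"))
mmse_cs = mmse(R, p, sigma_d2, eigen_transformation(R, p, 1, "cs"))
```

In this interference-dominated example with rank one, the PC rule locks onto the interferer and leaves the MSE essentially at its initial value, whereas the CS rule keeps the eigenvector aligned with the desired signal.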
IV-B Krylov subspace techniques
The first Krylov methods, namely, the conjugate gradient (CG) method and the Lanczos algorithm, were originally proposed for solving large systems of linear equations. These algorithms used in numerical linear algebra are mathematically equivalent to each other and have been derived for Hermitian and positive definite system matrices. Other techniques have been reported for solving these problems, and the Arnoldi algorithm is a computationally efficient procedure for arbitrary invertible system matrices. The multistage Wiener filter (MSWF) and the auxiliary vector filtering (AVF) algorithms are based on a multistage decomposition of the linear MMSE estimator. A key feature of these methods is that they do not require an eigen-decomposition and have a very good performance. It turns out that Krylov subspace algorithms, which are used for solving very large and sparse systems of linear equations, are suitable alternatives for performing dimensionality reduction. The basic idea of Krylov subspace algorithms is to construct the transformation matrix with the following structure:
$$\mathbf{S}_D[i] = \big[\mathbf{q}[i] \;\; \hat{\mathbf{R}}[i]\mathbf{q}[i] \;\; \ldots \;\; \hat{\mathbf{R}}^{D-1}[i]\mathbf{q}[i]\big], \tag{14}$$
where $\mathbf{q}[i] = \hat{\mathbf{p}}[i] / \|\hat{\mathbf{p}}[i]\|$ and $\|\cdot\|$ denotes the Euclidean norm (or the $2$-norm) of a vector. In order to compute the basis vectors of the Krylov subspace (the vectors of $\mathbf{S}_D[i]$), a designer can either directly employ the expression in (14) or resort to more sophisticated approaches such as the Arnoldi iteration. An appealing feature of the Krylov subspace algorithms is that the required model order $D$ does not scale with the system size. Indeed, when $M$ goes to infinity the required $D$ remains a finite and relatively small value. This result was established by Xiao and Honig. Among the disadvantages of Krylov subspace methods are the relatively high computational cost of constructing $\mathbf{S}_D[i]$ ($O(DM^2)$), the numerical instability of some implementations and the lack of flexibility for imposing constraints on the design of the basis vectors.
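A direct construction of the Krylov basis in (14) can be sketched as follows. This is a minimal sketch rather than an implementation of the MSWF or AVF algorithms: the basis is built from a normalized cross-correlation vector and successive multiplications by the correlation matrix, and an optional QR step (an added assumption that keeps the same subspace) is used to improve conditioning. The test matrix with only three distinct eigenvalues is a hypothetical example in which a rank-3 Krylov subspace already contains the full-rank solution.

```python
import numpy as np

def krylov_transformation(R, p, D, orthonormalize=True):
    """M x D basis of the Krylov subspace span{q, Rq, ..., R^(D-1) q}, q = p/||p||."""
    M = R.shape[0]
    S = np.zeros((M, D), dtype=complex)
    v = p / np.linalg.norm(p)
    for k in range(D):
        S[:, k] = v
        v = R @ v                      # each new vector costs one matrix-vector product
    if orthonormalize:
        S, _ = np.linalg.qr(S)         # same span, orthonormal columns
    return S

rng = np.random.default_rng(4)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
R = Q @ np.diag([5.0, 5.0, 2.0, 2.0, 1.0, 1.0]) @ Q.T    # three distinct eigenvalues
p = rng.standard_normal(6)

S_D = krylov_transformation(R, p, D=3)
w_bar = np.linalg.solve(S_D.conj().T @ R @ S_D, S_D.conj().T @ p)
w_reduced = S_D @ w_bar                                  # equivalent full-dimension filter
w_full = np.linalg.solve(R, p)                           # full-rank solution
```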
IV-C Joint iterative optimization techniques
The aim of this part is to introduce the reader to dimensionality reduction algorithms based on joint iterative optimization (JIO) techniques. The idea of these methods is to design the main components of the reduced-rank signal processing system via a general optimization approach. The basic ideas of JIO techniques have been reported in [25, 26, 27]. Amongst the advantages of JIO techniques are the flexibility to choose the optimization algorithm and to impose constraints, which provides a significant advantage over eigen-based and Krylov subspace methods. One disadvantage that is shared amongst the JIO techniques, eigen-based and Krylov subspace methods is the complexity associated with the design of the matrix $\mathbf{S}_D$. For instance, if we are to design a dimensionality reduction algorithm with a very large $M$, we still have the problem of having to design an $M \times D$ matrix $\mathbf{S}_D$.
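In the framework of JIO techniques, the design of the matrix $\mathbf{S}_D$ and the parameter vector $\bar{\mathbf{w}}$ for a fixed model order $D$ will be entirely dictated by the optimization problem. To this end, we will focus on a generic $\mathbf{S}_D = [\mathbf{s}_1 \; \mathbf{s}_2 \; \ldots \; \mathbf{s}_D]$, in which the basis vectors $\mathbf{s}_d$, $d = 1, 2, \ldots, D$, will be obtained via an optimization algorithm and iterations between the $\mathbf{s}_d$ and $\bar{\mathbf{w}}$ will be performed. The JIO method consists of solving the following optimization problem
$$[\mathbf{S}_{D,{\rm opt}}, \bar{\mathbf{w}}_{\rm opt}] = \arg\min_{\mathbf{s}_1, \ldots, \mathbf{s}_D, \bar{\mathbf{w}}} \sum_{l=1}^{i} \lambda^{i-l} \Big|d[l] - \bar{\mathbf{w}}^H \sum_{d=1}^{D} \mathbf{q}_d \mathbf{s}_d^H \mathbf{r}[l]\Big|^2, \tag{15}$$
where $\mathbf{q}_d$ corresponds to a $D \times 1$ vector with a one in the $d$-th position and zeros elsewhere, so that $\mathbf{S}_D^H = \sum_{d=1}^{D} \mathbf{q}_d \mathbf{s}_d^H$. It should be remarked that the optimization problem in (15) is non-convex; however, the algorithms do not present convergence problems. Numerical studies with JIO methods indicate that the minima are identical and global. Proofs of global convergence have been established with different versions of JIO schemes [25, 27], which demonstrate that the LS algorithm converges to the reduced-rank Wiener filter.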
The solution to the problem in (15) for a particular basis vector $\mathbf{s}_d$, $d = 1, \ldots, D$, can be obtained by fixing the remaining basis vectors $\mathbf{s}_j$ for $j \neq d$ and $\bar{\mathbf{w}}$, computing the gradient terms of the cost function defined in (15) with respect to $\mathbf{s}_d^*$ and equating the terms to a null vector. The solution to (15) for $\bar{\mathbf{w}}$ can be obtained in a similar way. We fix all the basis vectors $\mathbf{s}_d$, compute the gradient terms of the cost function with respect to $\bar{\mathbf{w}}^*$ and equate the terms to a null vector. Moreover, we also allow the basis vectors that are the columns of $\mathbf{S}_D$ and the parameter vector $\bar{\mathbf{w}}$ to exchange information with each other via iterations, as illustrated in Fig. 2. This results in the following JIO algorithm
where is an vector with cross-correlations. The recursions in (16) and (17) are updated over time and iterated times for each instant until convergence to a desired solution. In practice, the iterations can improve the convergence performance of the algorithms and it suffices to use iterations. In terms of complexity, the JIO techniques have a computational cost that is related to the optimization algorithm. With recursive LS algorithms the complexity is quadratic in $M$ ($O(M^2)$), whereas the complexity can be as low as linear in $M$ when stochastic gradient algorithms are adopted.
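The alternation between the transformation matrix and the parameter vector can be sketched with a stochastic gradient variant, which is the low-complexity option mentioned above. The sketch below is not the LS recursions of (16) and (17): it is a normalized stochastic gradient illustration for real-valued data, and the step sizes, the initialization, the hypothetical channel vector and the BPSK training data are all assumptions. Each update has a cost that grows linearly with the dimension of the observed vector.

```python
import numpy as np

def jio_nlms(r, d, D, mu=0.2, eta=0.2, eps=1e-8):
    """Alternating normalized stochastic-gradient updates of S_D and w_bar (real data)."""
    P, M = r.shape
    S = np.eye(M, D)                       # initial transformation: first D coordinates
    w = np.zeros(D)
    w[0] = 1.0                             # initial reduced-rank filter
    sq_err = np.empty(P)
    for i in range(P):
        r_bar = S.T @ r[i]
        e = d[i] - w @ r_bar               # a priori error
        sq_err[i] = e ** 2
        # Update S_D with w_bar fixed (gradient of |e|^2 with respect to S_D).
        S = S + eta * e * np.outer(r[i], w) / (eps + (r[i] @ r[i]) * (w @ w))
        # Update w_bar with the new S_D fixed.
        r_bar = S.T @ r[i]
        e = d[i] - w @ r_bar
        w = w + mu * e * r_bar / (eps + r_bar @ r_bar)
    return S, w, sq_err

rng = np.random.default_rng(3)
P, M, D = 3000, 8, 2
h = np.array([0.3, -0.5, 0.8, 1.0, -0.7, 0.4, 0.2, -0.6])   # hypothetical channel
d = rng.choice([-1.0, 1.0], size=P)                          # BPSK training symbols
r = d[:, None] * h + 0.05 * rng.standard_normal((P, M))      # rows are r[i]

S_D, w_bar, sq_err = jio_nlms(r, d, D)
early_mse = sq_err[:100].mean()
late_mse = sq_err[-100:].mean()
```

Comparing `early_mse` with `late_mse` gives a rough indication of how the joint adaptation reduces the error over the data record in this toy setting.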
IV-D Joint iterative optimization techniques with simplified structures and switching
In this subsection, we introduce the reader to dimensionality reduction algorithms based on JIO techniques with simplified structures aided by the concept of switching (JIOS). One disadvantage that is shared amongst the JIO techniques, eigen-based and Krylov subspace methods is the complexity associated with the design of the matrix $\mathbf{S}_D$. For instance, if we are to design a dimensionality reduction algorithm with a very large $M$, we still have the problem of having to design an $M \times D$ matrix $\mathbf{S}_D$ with the computational costs associated with $MD$ complex coefficients. An approach to circumventing this is based on the design of $\mathbf{S}_D$ with simplified structures, which can be done in a number of ways. For example, a designer can employ random projections or impose design constraints on $\mathbf{S}_D$ such that the number of computations can be significantly reduced. The main drawback of these simplified structures is the associated performance loss evidenced by the reconstruction error, which is typically larger than that obtained with more complex dimensionality reduction algorithms. In order to address this issue, the JIOS framework incorporates multiple simplified structures that are selected according to a switching mechanism, aiming at minimizing the reconstruction error. Switching techniques play a fundamental role in diversity systems employed in wireless communications systems and control systems. They can increase the accuracy of a particular procedure by allowing a signal processing algorithm to choose between a number of signals or estimates.
The basic idea of the JIOS-based methods is to address the problem of reconstruction error associated with the design of a transformation . To this end, the strategy is to employ multiple transformation matrices in order to obtain smaller reconstruction errors by seeking the best available transformation. In order to illustrate the JIOS framework, we consider the block diagram in Fig. 3 in which multiple transformation matrices in for are employed and iterations can be performed between and the parameter vector . In addition to this, another goal is to simplify the structure of by imposing design constraints, which can correspond to having very few non zero coefficients or even having deterministic patterns for each branch .
One example of the JIOS framework is the joint and iterative interpolation, decimation and filtering (JIDF) scheme recently reported in . Let us now review the JIDF scheme and describe it using the JIOS framework. In the JIDF scheme, the basic idea is to employ an interpolator with coefficients, a decimation unit and a reduced-rank parameter vector with coefficients. A key strategy for the design of parameters of the JIDF scheme is to express the output as a function of , the decimation matrix and as follows:
where is an vector, the matrix has an -dimensional identity matrix starting at the -th row, is shifted down by one position for each and the remaining elements are zeros, and the matrix with the samples of has a Hankel structure described by
The expression in (18) indicates that the dimensionality reduction carried out by the proposed scheme depends on finding appropriate , for constructing .
The design of the decimation matrix employs for each row the structure:
and the index () denotes the -th row of the matrix, the rank of the matrix is , the decimation factor is and corresponds to the number of parallel branches. The quantity is the number of zeros chosen according to a given design criterion. Given the constrained structure of , it is possible to devise an optimal procedure for designing via an exhaustive search of all possible design patterns with the adjustment of the variable . The exhaustive procedure has a total number of patterns equal to . The exhaustive scheme is too complex for practical use and it is fundamental to devise decimation schemes that are cost-effective. By adjusting the variable , a designer can obtain various sub-optimal schemes including pre-stored (e.g. ) and random patterns.
The decimation matrix is selected to minimize the square of the instantaneous error signal obtained for the branches employed as follows
where . The vector is computed by
The design of and corresponds to solving the following optimization problem
It should be remarked that the optimization problem in (23) is non-convex; however, the methods do not present problems with local minima. Proofs of convergence are difficult due to the switching mechanism and constitute an interesting open problem. Using a similar approach to the JIO technique, we obtain the following recursions for computing and :
where , , and . In terms of complexity, the computational cost of the JIDF scheme scales linearly with and quadratically with and . Since and are typically very small (- coefficients), this makes the JIDF scheme a low-complexity alternative as compared with PC, Krylov and JIO schemes.
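The switching idea behind the decimation stage can be sketched as follows. Everything specific in this sketch is an assumption made for illustration: the decimation pattern is a uniform one in which row j of branch b (both zero-based) selects sample j*L + b with L = M // D, the interpolated vector `r_I` is taken as given, and the function names are hypothetical. The branch whose output yields the smallest squared instantaneous error is selected.

```python
import numpy as np

def decimation_matrix(M, D, b):
    """D x M selection matrix for branch b: row j has a single one at column j*L + b."""
    L = M // D
    if not 0 <= b < L:
        raise ValueError("branch index must satisfy 0 <= b < M // D")
    Dm = np.zeros((D, M))
    Dm[np.arange(D), np.arange(D) * L + b] = 1.0
    return Dm

def select_branch(r_I, w_bar, d, M, D, B):
    """Pick the branch that minimizes the squared instantaneous error."""
    errors = [d - np.vdot(w_bar, decimation_matrix(M, D, b) @ r_I) for b in range(B)]
    best = int(np.argmin(np.abs(errors) ** 2))
    return best, errors[best]

# Toy example with a known answer.
M, D, B = 8, 2, 4
r_I = np.arange(8.0)              # stands in for the interpolated received vector
w_bar = np.ones(D)                # placeholder reduced-rank filter
best_branch, best_error = select_branch(r_I, w_bar, d=8.3, M=M, D=D, B=B)
```

Because each row of the decimation matrix contains a single one, applying it amounts to picking D samples, which is where the complexity savings of such simplified structures come from.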
IV-E Summary of Dimensionality Reduction Algorithms
In this part, we summarize the most representative LS-based algorithms for dimensionality reduction presented in the previous subsections and provide a table that explains how to simulate these algorithms. Specifically, we consider the PC method, the MSWF, the JIO and the JIDF techniques.
V Applications
In this section, we will describe a number of applications for reduced-rank signal processing and dimensionality reduction algorithms and link them with new research directions and emerging fields. Among the key areas for these methods are wireless communications, sensor and array signal processing, speech and audio processing, and image and video processing. A key aspect that will be considered is the need for dimensionality reduction and typical values for the dimension $M$ of the observed vector, the model order $D$ and the compression ratio (CR).
V-A Wireless communications
In wireless communications, a designer must deal with stringent requirements in terms of quality of service, an increasing demand for higher data rates and scenarios with time-varying channels. At the heart of the problems in wireless communications lies the need for designing transmitters and receivers, algorithms for data detection and channel and parameter estimation. These problems are ubiquitous and common to spread spectrum, multi-carrier and multiple antenna systems. Specifically, when the number of parameters grows beyond a certain level (which is often the case), the level of interference is high and the channel is time-varying, reduced-rank signal processing and algorithms for dimensionality reduction can play a decisive role in the design of wireless communication systems. For instance, in spread spectrum systems we often encounter receiver design problems that require the computation of a few hundred parameters, i.e., and we typically wish to perform a dimensionality reduction which leads to a model order of a few elements, i.e., and which yields a . In a multi-antenna system in the presence of flat fading, the number of parameters is typically small () for spatial processing and corresponds to the number of antennas used at the transmitter and at the receiver. However, if we consider multi-antenna systems in the presence of frequency-selective channels, the numbers can increase to dozens of coefficients. In addition, the combination of multi-antenna systems with multi-carrier transmissions (e.g., MIMO-OFDM) in the presence of time-selective channels can require the equalization of structures with hundreds of coefficients. Therefore, the use of reduced-rank signal processing techniques and dimensionality reduction algorithms can be of fundamental importance in the problems previously described.
In what follows, we will illustrate with a numerical example the design of a space-time reduced-rank linear receiver for interference suppression in a spread spectrum system equipped with antenna arrays.
Numerical Example: Space-Time Interference Suppression for Spread Spectrum Systems
We consider the uplink of a direct-sequence code-division multiple access (DS-CDMA) system with symbol interval , chip period , spreading gain , users, multipath channels with the maximum number of propagation paths , where . The system is equipped with an antenna that consists of a uniform linear array (ULA) and sensor elements. The spacing between the ULA elements is , where is carrier wavelength. We assume that the channel is constant during each symbol, the base station receiver is perfectly synchronized and the delays of the propagation paths are multiples of the chip rate. The received signal after filtering by a chip-pulse matched filter and sampled at the chip period yields the received vector at time
where , denotes the data symbol of user at time , is the amplitude of user , the complex Gaussian noise vector is with , , corresponds to the intersymbol interference and denote transpose and Hermitian transpose, respectively, and stands for expected value. The spatial signatures for previous, current and future data symbols are , , and are constructed with the stacking of the convolution between the signature sequence of user and the channel vector of user at each antenna element .
A linear reduced-rank space-time receiver for a desired user can be designed by linearly combining the received vector with the transformation and the reduced-rank parameter vector as expressed by
where the data detection corresponds to applying the signal to a decision device as given by , where is the function that implements the decision device and depends on the modulation.
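The detection step can be sketched as the reduced-rank combining followed by a hard decision. The sketch assumes a unit-energy QPSK constellation with symbols (±1 ± j)/√2, and the toy transformation, parameter vector and received vector are hypothetical values chosen so that the result can be checked by hand.

```python
import math
import numpy as np

def qpsk_decision(x):
    """Map a complex soft output to the nearest unit-energy QPSK symbol."""
    re = 1.0 if x.real >= 0 else -1.0
    im = 1.0 if x.imag >= 0 else -1.0
    return complex(re, im) / math.sqrt(2)

def reduced_rank_output(w_bar, S_D, r):
    """Soft output x = w_bar^H S_D^H r of the reduced-rank receiver."""
    return np.vdot(S_D @ w_bar, r)        # vdot conjugates its first argument

# Toy values (hypothetical): the transformation keeps the first two samples.
S_D = np.eye(4, 2, dtype=complex)
w_bar = np.array([1.0, 1.0j])
r = np.array([0.9 - 0.8j, 0.2 + 0.1j, 3.0, 4.0])

x = reduced_rank_output(w_bar, S_D, r)
symbol = qpsk_decision(x)
```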
Let us now consider simulation examples of the space-time reduced-rank receiver described above in which the dimensionality reduction techniques using LS optimization are compared. In all simulations, we use the initial values and , , employ randomly generated spreading codes with , assume as an upper bound, use -path channels with relative powers given by , and dB, where in each run the spacing between paths is obtained from a discrete uniform random variable between and chips and average the experiments over runs. The system has a power distribution among the users for each run that follows a log-normal distribution with associated standard deviation equal to dB. We consider antenna arrays equipped with and elements. The dimension of the observed signal in these examples corresponds to , which corresponds to (for ) and (for ). These figures lead to a (for ). We compare the full-rank scheme, the PC method, the MSWF Krylov subspace technique, the JIO technique and the JIOS (JIDF) scheme with an optimized model order for each method. We also include the linear full-rank MMSE receiver that assumes knowledge of the channels and the noise variance at the receiver. Each reduced-rank scheme provides an estimate of the desired symbol for the desired user (user in all experiments) and we assess the bit error rate (BER) against the number of symbols. We transmit packets with QPSK symbols in which are used for training. After the training phase the space-time receivers are switched to decision-directed mode. In Fig. 4, we show an example of the BER performance versus the number of symbols, whereas in Fig. 5 we consider examples with the BER performance versus the SNR and the number of users.
The curves depicted in Figs. 4 and 5 show that dimensionality reduction applied to the received signals, combined with reduced-rank processing, can significantly accelerate the convergence of the adaptive receivers. The best results are obtained by the JIDF and JIO methods, which approach the linear MMSE receiver and are followed by the MSWF, the PC-based receiver and the full-rank technique. The algorithms analyzed perform very well for different values of SNR and numbers of users in the system. A notable feature is the ability of the subspace-based algorithms to converge faster and to obtain good results with short data records, which reduces the requirements for training.
V-B Sensor and array signal processing:
The basic aim of sensor and array signal processing is to consider temporal and spatial information, captured by sampling a wave field with a set of appropriately placed antenna elements or sensor devices. These devices are organized in patterns or arrays which are used to detect signals and to determine information about them. The wave field is assumed to be generated by a finite number of emitters, and contains information about signal parameters characterizing the emitters. A number of applications for sensor and array signal processing have emerged in the last decades and include active noise and vibration control, beamforming, direction finding, harmonic retrieval, distributed processing for networks, radar and sonar systems. In these applications, when the number of parameters grows beyond a certain level, the signal-to-noise ratio is low and the level of interference is high, reduced-rank signal processing and algorithms for dimensionality reduction can offer improved performance as compared with conventional full-rank techniques. For example, in broadband beamforming or space-time adaptive processing applications for radar we may have to deal with an optimization problem that requires the processing of observed signals with dozens to hundreds of coefficients, i.e., . In this case, a dimensionality reduction which leads to a model order with only a few elements, i.e., and which has a can facilitate the design and be highly beneficial to the performance of the system. A number of other problems in sensor and array signal processing can be cost-effectively addressed with reduced-rank signal processing techniques and dimensionality reduction algorithms. In what follows, we will illustrate with a numerical example the design of adaptive beamforming techniques.
Numerical Example: Adaptive reduced-rank beamforming
Let us consider a smart antenna system equipped with a uniform linear array (ULA) of elements. Assuming that the sources are in the far field of the array, the signals of narrowband sources impinge on the array with unknown directions of arrival (DOA) for . The input data from the antenna array can be organized in an observed vector expressed by
where is the matrix of signal steering vectors. The signal steering vector is defined as
for a signal impinging at angle , , where is the inter-element spacing, is the wavelength and denotes the transpose operation. The vector denotes the complex vector of sensor noise, which is assumed to be zero-mean and Gaussian with covariance matrix .
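The array model above can be illustrated with a short NumPy sketch. It is a hedged example under stated assumptions rather than the exact model of the numerical example: the function name `ula_steering_vector`, the half-wavelength spacing, the choice of measuring the angle from the array axis (so that the inter-element phase step depends on the cosine of the angle), the three DOAs and the noise level are all assumptions made for illustration.

```python
import numpy as np

def ula_steering_vector(theta_deg, M, d_over_lambda=0.5):
    """Steering vector of an M-element ULA for a far-field narrowband source.

    Assumed convention: theta is measured from the array axis, so the phase
    step between adjacent elements is 2*pi*(d/lambda)*cos(theta).
    """
    m = np.arange(M)
    phase = -2j * np.pi * d_over_lambda * m * np.cos(np.deg2rad(theta_deg))
    return np.exp(phase)

# Hypothetical scenario: 8 sensors, 3 narrowband sources.
rng = np.random.default_rng(0)
M = 8
doas_deg = [30.0, 60.0, 90.0]

# Matrix of steering vectors, one column per source.
A = np.column_stack([ula_steering_vector(t, M) for t in doas_deg])

# One snapshot: source symbols plus zero-mean complex Gaussian sensor noise.
s = rng.choice([-1.0, 1.0], size=len(doas_deg))
sigma = 0.1
n = sigma * (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
r = A @ s + n
```

Each entry of a steering vector has unit modulus, and a source at 90 degrees under this convention produces identical phases on all sensors.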
Let us now consider the design of an adaptive reduced-rank minimum variance distortionless response (MVDR) beamformer. The design problem consists of computing the transformation and via the following optimization problem
In order to solve the above problem, we can resort to algorithms outlined in Section IV. The main difference is that now we need to minimize the mean-square value of the output of the array and enforce the constraint that ensures the response of the reduced-rank beamforming algorithm to be equal to unity.
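For a fixed transformation, the constrained problem has a well-known closed-form solution: the reduced-rank weight vector is the inverse reduced-rank covariance matrix applied to the projected steering vector, normalized so that the response toward the signal of interest equals one. The sketch below illustrates this with a transformation built from the dominant eigenvectors of the covariance matrix (a principal-components choice used here only as a simple stand-in, not one of the joint optimization algorithms). The helper names, array size, DOAs and power levels are assumptions made for illustration.

```python
import numpy as np

def steering(theta_deg, M, d_over_lambda=0.5):
    """ULA steering vector (assumed convention: angle measured from the array axis)."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * d_over_lambda * m * np.cos(np.deg2rad(theta_deg)))

def reduced_rank_mvdr(R, a, D):
    """Return (S_D, w_bar) with w_bar^H S_D^H a = 1 for a PC-based S_D."""
    _, eigvecs = np.linalg.eigh(R)         # eigenvalues returned in ascending order
    S_D = eigvecs[:, -D:]                  # D dominant eigenvectors as the transformation
    R_bar = S_D.conj().T @ R @ S_D         # D x D reduced-rank covariance matrix
    a_bar = S_D.conj().T @ a               # projected steering vector
    v = np.linalg.solve(R_bar, a_bar)      # R_bar^{-1} a_bar
    w_bar = v / np.vdot(a_bar, v)          # normalize by a_bar^H R_bar^{-1} a_bar
    return S_D, w_bar

# Hypothetical scenario: known covariance with one SoI and two stronger interferers.
M, D = 16, 3
a_soi = steering(60.0, M)
R = np.outer(a_soi, a_soi.conj())
for theta in (30.0, 100.0):
    a_int = steering(theta, M)
    R = R + 10.0 * np.outer(a_int, a_int.conj())
R = R + 0.1 * np.eye(M)                    # sensor noise

S_D, w_bar = reduced_rank_mvdr(R, a_soi, D)
response = np.vdot(w_bar, S_D.conj().T @ a_soi)   # distortionless response toward the SoI
```

In an adaptive setting the covariance matrix is not known and must be estimated from the data, which is where the reduced dimension helps: only a D x D matrix has to be estimated and inverted.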
We consider an example of adaptive beamforming in a non-stationary scenario where the system has users with equal power and the environment experiences a sudden change at time . We assess the performance of the system in terms of the signal-to-interference-plus-noise ratio (SINR), which is defined as
where denotes the covariance matrix of the signal of interest (SoI) and is the covariance matrix of the interference. In the example, we consider the full-rank, the PC, the MSWF, the AVF, the JIO, the JIDF and the MVDR that assumes perfect knowledge of the covariance matrix. The interferers impinge on the ULA at , , , , with equal powers to the SoI, which impinges on the array at . At the time instant we have interferers with dB above the SoI’s power level entering the system with DOAs , and , whereas one interferer with DOA and a power level equal to the SoI exits the system. The results of this example are depicted in Fig. 6. The curves show that the reduced-rank algorithms have a superior performance to the full-rank algorithm. The best performance is obtained by the JIDF scheme, which is followed by the JIO, the AVF, the MSWF, the PC and the full-rank algorithms.
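As a hedged illustration of the performance measure, the SINR of a beamformer can be computed as the ratio of the output power due to the SoI to the output power due to the interference. The sketch below assumes that form, and it assumes that the interference covariance matrix also accounts for the sensor noise. For a reduced-rank scheme, the effective beamformer passed to the function is the transformation multiplied by the reduced-rank weight vector. The function name `sinr` and the toy scenario are hypothetical.

```python
import numpy as np

def sinr(w, R_s, R_i):
    """Return (w^H R_s w) / (w^H R_i w) for an effective beamformer w."""
    num = np.real(np.vdot(w, R_s @ w))    # output power of the signal of interest
    den = np.real(np.vdot(w, R_i @ w))    # output power of interference plus noise
    return num / den

# Toy check: 4 sensors, SoI at broadside, noise only (assumed values).
M = 4
a = np.ones(M, dtype=complex)             # SoI steering vector
R_s = 1.0 * np.outer(a, a.conj())         # unit-power SoI
R_i = 0.5 * np.eye(M)                     # noise variance 0.5, no interferers
w = a / M                                 # conventional beamformer with w^H a = 1

value = sinr(w, R_s, R_i)                 # equals M * 1.0 / 0.5 = 8
value_db = 10 * np.log10(value)
```

In this toy case the array gain is equal to the number of sensors, so the output SINR is the input SNR multiplied by 4.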
V-C Audio, speech, image, and video processing:
Echo cancellation, prediction, compression and recognition of multimedia signals.
V-D Modelling for non-linear and large problems:
Neural networks and other bio-inspired structures, Volterra series.
VI New frontiers and research directions:
We will consider the relationships between reduced-rank signal processing and algorithms for dimensionality reduction with emerging techniques that include compressive sensing and tensor decompositions.
Compressive sensing techniques can substantially improve the performance of sensor array processing systems, including MIMO radars, by exploiting the sparse nature of the signals encountered in these systems. Recent results in the signal processing and information theory literature show that compressive sensing techniques can provide very significant gains in performance at a lower computational complexity than existing techniques, owing to a smarter way of processing information. The main idea behind compressive sensing methods is to use linear projections that extract the key information from signals and then employ a reconstruction algorithm based on optimization techniques. In particular, the linear projections are intended to collect the samples that are meaningful for the rest of the procedure and perform significant signal compression. These linear projections are essentially dimensionality reduction procedures. Samples with very small magnitude that cannot be discerned from noise are typically good candidates for elimination. This is followed by a reconstruction algorithm that aims to recreate the original signal from the compressed version.
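The projection-then-reconstruction idea can be sketched in a few lines of NumPy. The text does not name a specific reconstruction algorithm, so this example uses greedy orthogonal matching pursuit (OMP), a standard method chosen here only for brevity. The function name `omp`, the Gaussian projection matrix, the signal length, the number of measurements and the sparsity level are all assumptions made for illustration.

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: estimate a k-sparse x from y = Phi @ x."""
    n = Phi.shape[1]
    support = []
    coef = np.zeros(0)
    residual = y.copy()
    for _ in range(k):
        # Pick the column most correlated with the current residual.
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat

# Hypothetical example: a length-128 signal with 3 nonzero entries,
# compressed to 80 measurements by a random Gaussian projection.
rng = np.random.default_rng(0)
n, m, k = 128, 80, 3
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = [1.0, -1.5, 2.0]
Phi = rng.standard_normal((m, n)) / np.sqrt(m)    # linear projection (dimensionality reduction)
y = Phi @ x                                       # compressed measurements
x_hat = omp(Phi, y, k)                            # reconstruction
error = np.linalg.norm(x - x_hat)
```

The projection matrix plays the same role as the transformation matrix in reduced-rank processing, with the difference that here it is fixed and random rather than designed from the data, and the burden is shifted to the reconstruction step.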
VII Concluding Remarks
In this tutorial on reduced-rank signal processing, we reviewed design methods and algorithms for dimensionality reduction, and discussed a number of important applications. A general framework based on linear algebra and linear estimation was employed to introduce the reader to the fundamentals of reduced-rank signal processing and to describe how dimensionality reduction is performed on an observed discrete-time signal. A unified treatment of dimensionality reduction algorithms was presented and used to describe some key algorithms for dimensionality reduction and reduced-rank processing. A number of applications were considered, and several examples were provided to illustrate this important area.
-  S. Haykin, Adaptive Filter Theory, edition, Prentice-Hall, 2002.
-  L. L. Scharf, Statistical Signal Processing: Detection, Estimation, and Time Series Analysis, New York: Addison-Wesley Publishing Co., 1990.
-  L. L. Scharf, “The SVD and reduced-rank signal processing,” Signal Processing, 24, pp. 111-130, November 1991.
-  H. Hotelling, “Analysis of a Complex of Statistical Variables into Principal Components,” Journal of Educational Psychology, vol. 24, no. 6/7, pp. 417–441, 498–520, September/October 1933.
-  I. T. Jolliffe, Principal component analysis, New York, Springer Verlag, 1986 (2 ed., 2002).
-  M. O. Ulfarsson and V. Solo, “Sparse Variable PCA Using Geodesic Steepest Descent”, IEEE Transactions on Signal Processing, vol. 56, no. 12, December 2008, pp. 5823-5832.
-  T. Kohonen, S. Kaski, K. Lagus, J. Salojärvi, J. Honkela, V. Paatero and V. Saarela, “Self organization of massive document collection,” IEEE Transactions on Neural Networks, vol. 11, no. 3, May 2000, pp. 574-585.
-  R. C. de Lamare and R. Sampaio-Neto, “Adaptive Reduced-Rank MMSE Filtering with Interpolated FIR Filters and Adaptive Interpolators”, IEEE Sig. Proc. Letters, vol. 12, no. 3, March, 2005.
-  S. Lafon and A.B. Lee, “Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning, and data set parameterization”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 9, 2006, pp.1393–1403.
-  M.H.C. Law, A.K. Jain, “Incremental nonlinear dimensionality reduction by manifold learning”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 3, 2006, pp. 377 - 391.
-  Jun Yan, Benyu Zhang, Ning Liu, Shuicheng Yan, Qiansheng Cheng, W. Fan, Qiang Yang, W. Xi, Zheng Chen, “Effective and efficient dimensionality reduction for large-scale and streaming data preprocessing”, IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 3, 2006, pp. 320 - 333.
-  G. Sanguinetti, “Dimensionality Reduction of Clustered Data Sets”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 3, 2008, pp. 535-540.
-  M. R. Hestenes and E. Stiefel, “Methods of Conjugate Gradients for Solving Linear Systems,” Journal of Research of the National Bureau of Standards, vol. 49, no. 6, pp. 409–436, December 1952.
-  C. Lanczos, “Solution of Systems of Linear Equations by Minimized Iterations,” Journal of Research of the National Bureau of Standards, vol. 49, no. 1, pp. 33–53, July 1952.
-  W. E. Arnoldi, “The Principle of Minimized Iterations in the Solution of the Matrix Eigenvalue Problem,” Quarterly of Applied Mathematics, vol. 9, no. 1, pp. 17–29, January 1951.
-  J. S. Goldstein and I. S. Reed, “Reduced-Rank Adaptive Filtering,” IEEE Transactions on Signal Processing, vol. 45, no. 2, pp. 492–496, February 1997.
-  J. S. Goldstein, I. S. Reed, and L. L. Scharf, “A multistage representation of the Wiener filter based on orthogonal projections,” IEEE Transactions on Information Theory, vol. 44, November 1998.
-  R. Badeau, B. David, and G. Richard, “Fast approximated power iteration subspace tracking,” IEEE Trans. Signal Processing, vol. 53, pp. 2931-2941, Aug. 2005.
-  M. L. Honig and J. S. Goldstein, “Adaptive reduced-rank interference suppression based on the multistage Wiener filter,” IEEE Transactions on Communications, vol. 50, no. 6, June 2002.
-  R. C. de Lamare, M. Haardt, and R. Sampaio-Neto, “Blind Adaptive Constrained Reduced-Rank Parameter Estimation based on Constant Modulus Design for CDMA Interference Suppression”, IEEE Transactions on Signal Processing, June 2008.
-  N. Song, R. C. de Lamare, M. Haardt, and M. Wolf, “Adaptive Widely Linear Reduced-Rank Interference Suppression based on the Multi-Stage Wiener Filter,” IEEE Transactions on Signal Processing, vol. 60, no. 8, 2012.
-  D. A. Pados and G. N. Karystinos, “An iterative algorithm for the computation of the MVDR filter,” IEEE Trans. on Sig. Proc., vol. 49, No. 2, February, 2001.
-  H. Qian and S.N. Batalama, “Data record-based criteria for the selection of an auxiliary vector estimator of the MMSE/MVDR filter”, IEEE Trans. on Communications, vol. 51, no. 10, Oct. 2003, pp. 1700 - 1708.
-  Y. Hua and M. Nikpour, “Computing the reduced rank Wiener filter by IQMD,” IEEE Signal Processing Letters, pp. 240-242, Vol. 6, Sept. 1999.
-  Y. Hua, M. Nikpour, and P. Stoica, “Optimal reduced-rank estimation and filtering,” IEEE Transactions on Signal Processing, 2001, V.49, 457-469.
-  R. C. de Lamare and R. Sampaio-Neto, “Reduced-Rank Adaptive Filtering Based on Joint Iterative Optimization of Adaptive Filters,” IEEE Signal Processing Letters, vol. 14, no. 12, December 2007, pp. 980 - 983.
-  R. C. de Lamare and R. Sampaio-Neto, “Reduced-Rank Space-Time Adaptive Interference Suppression With Joint Iterative Least Squares Algorithms for Spread-Spectrum Systems,” IEEE Transactions on Vehicular Technology, vol.59, no.3, March 2010, pp.1217-1228.
-  R.C. de Lamare and R. Sampaio-Neto, “Adaptive reduced-rank equalization algorithms based on alternating optimization design techniques for MIMO systems,” IEEE Trans. Veh. Technol., vol. 60, no. 6, pp. 2482-2494, July 2011.
-  R. C. de Lamare and R. Sampaio-Neto, “Adaptive Reduced-Rank Processing Based on Joint and Iterative Interpolation, Decimation and Filtering”, IEEE Transactions on Signal Processing, vol. 57, no. 7, July 2009, pp. 2503 - 2514.
-  R. Fa, R. C. de Lamare, and L. Wang, “Reduced-Rank STAP Schemes for Airborne Radar Based on Switched Joint Interpolation, Decimation and Filtering Algorithm,” IEEE Transactions on Signal Processing, vol.58, no.8, Aug. 2010, pp.4182-4194.
-  R.C. de Lamare, R. Sampaio-Neto, M. Haardt, “Blind Adaptive Constrained Constant-Modulus Reduced-Rank Interference Suppression Algorithms Based on Interpolation and Switched Decimation,” IEEE Transactions on Signal Processing, vol. 59, no. 2, pp. 681-695, Feb. 2011.
-  H. Akaike, “A new look at the statistical model identification,” IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716–723, 1974.
-  J. Rissanen, “Modeling by shortest data description,” Automatica, vol. 14, pp. 465-471, 1978.
-  P. Stoica and Y. Selén, “Model-order selection: a review of information criterion rules,” IEEE Signal Processing Magazine, vol. 21, no. 4, pp. 36–47, 2004.
-  W. Xiao and M. L. Honig, “Large System Transient Behavior of Adaptive Least Squares Algorithms”, IEEE Transactions on Information Theory, Vol. 51, No. 7, pp. 2447-2474, July 2005.
-  Z. Sun and S. S. Ge, Switched Linear Systems: Control and Design, London: Springer-Verlag, 2005.
-  I.D. Schizas, G.B. Giannakis, Z.-Q. Luo, “Distributed Estimation Using Reduced-Dimensionality Sensor Observations”, IEEE Transactions on Signal Processing, vol. 55, no. 8, 2007, pp. 4284-4299.
-  D. Luenberger, Linear and Nonlinear Programming, 2nd Ed. Addison-Wesley, Inc., Reading, Massachusetts 1984.
-  E. Candès and M. B. Wakin, “An Introduction to Compressive Sampling”, IEEE Signal Processing Magazine, March 2008, pp. 21 - 30.
-  D. Donoho, “Compressed Sensing,” IEEE Trans. Information Theory, vol. 52, no. 4, pp. 1289-1306.
-  E. Candès and T. Tao, “Near-optimal Signal Recovery from Random Projections: Universal Encoding Strategies?,” IEEE Trans. Information Theory, vol. 52, no. 12, pp. 5406-5425.
-  J. Haupt and R. Nowak, “Signal Reconstruction from Noisy Random Projections”, IEEE Trans. Information Theory, vol. 52, no. 9, pp. 4036-4048.
-  M. Figueiredo, R. Nowak, S. Wright, “Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems”, IEEE Journal of Selected Topics in Signal Processing: Special Issue on Convex Optimization Methods for Signal Processing, vol. 1, no. 4, pp. 586-598, 2007.