# A Novel Approach to Quantized Matrix Completion Using Huber Loss Measure

Ashkan Esmaeili and Farokh Marvasti

A. Esmaeili was with the Department of Electrical Engineering, Stanford University, California, USA (e-mail: esmaeili.ashkan@alumni.stanford.edu). A. Esmaeili and F. Marvasti are now with the Electrical Engineering Department and Advanced Communications Research Institute (ACRI), Sharif University of Technology, Tehran, Iran.
###### Abstract

In this paper, we introduce a novel and robust approach to Quantized Matrix Completion (QMC). First, we propose a rank minimization problem with constraints induced by quantization bounds. Next, we form an unconstrained optimization problem by regularizing the rank function with the Huber loss. The Huber loss is leveraged to control violations of the quantization bounds because of two properties: (1) it is differentiable, and (2) it is less sensitive to outliers than the quadratic loss. A smooth rank approximation is used to promote low rank in the genuine data matrix. Thus, an unconstrained optimization problem with a differentiable objective function is obtained, which allows us to take advantage of the Gradient Descent (GD) technique. A theoretical analysis of the problem model and of the convergence of our algorithm to the global solution is provided. Another contribution of our work is that, unlike the state-of-the-art, our method requires neither projections nor an initial rank estimate. In the Numerical Experiments section, we show that the proposed method noticeably outperforms state-of-the-art methods in learning accuracy and computational complexity.

Quantized Matrix Completion; Huber Loss; Graduated Non-Convexity; Smoothed Rank Function; Gradient Descent Method

## I Introduction

In this paper, we extend the Matrix Completion (MC) problem, which has been considered by many authors in the past decade [1, 2, 3], to the Quantized Matrix Completion (QMC) problem. In QMC, the accessible entries are quantized rather than continuous, and the rest are missing. The purpose is to recover the original continuous-valued matrix under certain assumptions.
According to [7], the QMC problem arises in a wide variety of applications including, but not limited to, collaborative filtering [4], sensor networks [5], and learning and content analysis [6].

A special case of QMC, one-bit MC, has been considered by several authors. In [8], for instance, a convex program is proposed to recover the data by maximizing a log-likelihood function. In [9], a maximum likelihood (ML) set-up with a max-norm constraint is proposed for one-bit MC. In [10], a greedy algorithm, an extension of conditional gradient descent, is proposed to solve an ML problem with a rank constraint. However, the scope of this paper is not confined to one-bit MC; it covers multi-level QMC. We review the multi-level QMC methodologies in the literature below.
In [11], the robust Q-MC method is introduced, based on a projected gradient (PG) approach, in order to optimize a constrained log-likelihood problem. The projection guarantees shrinkage in the trace norm, tuned by a regularization parameter.
Novel QMC algorithms are introduced in [7]. First, an ML estimation under an exact rank constraint is considered. Next, the log-likelihood term is penalized with a log-barrier function, and bilinear factorization is used along with the Gradient Descent (GD) technique to optimize the resulting unconstrained problem. The methodologies suggested in [7] are robust and lead to noticeable accuracy in QMC. However, the two algorithms in [7] depend on knowledge of an upper bound for the rank (an initial rank estimate) and may suffer from local minima or saddle points. In [12], the Augmented Lagrangian Method (ALM) and bilinear factorization are used to address QMC, and enhanced recovery accuracy compared to previous works is observed.

In this paper, the Huber loss and the Smoothed Rank Function (SRF), both of which are differentiable, are used to penalize violations of the quantization bounds and increases in the rank, respectively. Differentiability makes the optimization framework suitable for the GD approach. It is worth noting that although the Huber loss is convex, the SRF is generally non-convex. However, we leverage the Graduated Non-Convexity (GNC) approach to solve a sequence of problems in which local convexity is maintained in a specific domain enclosing the global optimum. The solution to each problem is used as a warm-start for the next problem. Warm-starts and a smooth transition between problems ensure that each warm-start falls in a locally convex enclosure of the global optimum. We analyze theoretically how the SRF parameter can be tuned and shrunk gradually so that local convexity is obtained in each problem, which finally leads to the global optimum. Unlike [7], our method does not require an initial upper bound on the rank, nor does it require projections as in [7] and [11].

The rest of the paper is organized as follows. Section II includes the problem model and a discussion of the Huber loss. In Section III, the smoothed rank approximation is discussed. Section IV includes our proposed algorithm. The theoretical analysis of the global convergence of our algorithm is given in Section V. Simulation results are provided in Section VI. Finally, the paper is concluded in Section VII.

## II Problem Model

We assume a quantized matrix $M$ is partially observed; i.e., the entries of $M$ are either missing or reported as integer values (levels). Different levels are spaced with distance $g$, known as the quantization gap. We also assume the rounding rule forces each entry of the original matrix to be quantized to a level within $g/2$ of its value. We assume the original matrix, from which the quantized data are obtained, is low-rank, as in many practical settings. Thus, the following optimization problem on $X$ is reached:

$$\min_{X}\ \operatorname{rank} X \quad \text{subject to} \quad l_{ij} \le x_{ij} \le u_{ij} \quad \forall (i,j) \in \Omega, \tag{1}$$

where $\Omega$ is the observation set, and $u_{ij}$ and $l_{ij}$ are the upper and lower quantization bounds of the $(i,j)$-th observed entry. In our model, the bounds are assumed to symmetrically enclose $m_{ij}$, i.e., $l_{ij} = m_{ij} - \frac{g}{2}$ and $u_{ij} = m_{ij} + \frac{g}{2}$. We also add that the number of levels is considered to be known and that no entry of the original matrix exceeds the quantization bounds of the extreme levels.
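As an illustration of this observation model, the following minimal sketch quantizes a synthetic low-rank matrix and builds the bounds of problem (1). It assumes levels at integer multiples of the gap `g` and bounds placed half a gap on either side of each observed level; the names `quantize`, `X_true`, `M`, and `mask`, as well as the synthetic data, are hypothetical and not taken from the paper.

```python
import numpy as np

def quantize(X, g):
    """Round every entry to the nearest multiple of the gap g (assumed rounding rule)."""
    return g * np.round(X / g)

rng = np.random.default_rng(0)
# Synthetic rank-2 "original" matrix (illustrative data only).
X_true = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
g = 1.0
M = quantize(X_true, g)                 # observed levels m_ij
mask = rng.random(X_true.shape) < 0.7   # True where (i, j) belongs to Omega

L = M - g / 2   # lower bounds l_ij
U = M + g / 2   # upper bounds u_ij

# The original matrix satisfies every constraint of problem (1).
feasible = bool(np.all((L <= X_true) & (X_true <= U)))
```

Only the entries selected by `mask` would be available to a recovery algorithm; the bounds on the hidden entries are computed here purely to check the model.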
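The Huber function for the $(i,j)$-th entry in $\Omega$ is defined as follows:

$$H_{ij}(x_{ij}) = \begin{cases} (x_{ij} - m_{ij})^2, & |x_{ij} - m_{ij}| \le \frac{g}{2} \\ g\left(|x_{ij} - m_{ij}| - \frac{1}{4} g\right), & \text{o.w.} \end{cases}$$

We modify the Huber loss by subtracting $\frac{1}{4} g^2$; i.e.,

$$\tilde{H}_{ij}(x_{ij}) = H_{ij}(x_{ij}) - \frac{1}{4} g^2$$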
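The Huber function and its translated version can be sketched as follows. This is a minimal illustration, assuming the quadratic branch applies within half a gap of the observed level; the function names `huber` and `translated_huber` are hypothetical.

```python
import numpy as np

def huber(x, m, g):
    """Huber function: quadratic within g/2 of the level m, linear outside."""
    d = np.abs(np.asarray(x, dtype=float) - m)
    return np.where(d <= g / 2, d**2, g * (d - g / 4))

def translated_huber(x, m, g):
    """Huber function shifted down by g^2/4, so it is non-positive inside the bounds."""
    return huber(x, m, g) - g**2 / 4

m, g = 3.0, 1.0
at_level = translated_huber(3.0, m, g)   # minimal value, -g^2/4
at_bound = translated_huber(3.5, m, g)   # zero on the quantization bound
outside = translated_huber(4.5, m, g)    # positive, grows linearly with the violation
```

The two branches meet at the bound with equal value and equal slope, which is what makes the loss differentiable.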

The Huber loss is used in robust regression to take advantage of the desirable properties of both the $\ell_2$ and $\ell_1$ penalties. Noise may have forced the original matrix entries to deviate from their genuine quantization bounds. Thus, we intentionally use a linearly growing ($\ell_1$) penalty for violations of the quantization bounds, so that the loss is less sensitive to outliers than the squared loss is. The interpretation of translating the Huber function as defined above is to reward entries that satisfy the constraints of problem (1). This reward is quadratic, and it does not vary as sharply as the $\ell_1$ penalty on the feasible region. Every point of the feasible region is acceptable, so sharply varying behavior on the feasible region is pointless. In addition, this specific quadratic reward makes the compromise between $\ell_1$ and $\ell_2$ convex and differentiable, which benefits us later in the paper. The motivation for applying the Huber function in our problem is to turn the constrained problem (1) into an unconstrained regularized problem. The regularization term is defined as follows:

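$$H_{\Omega}(X) = \sum_{(i,j) \in \Omega} \tilde{H}_{ij}(x_{ij}) \tag{2}$$

Thus, the unconstrained problem can be written as:

$$\min_{X}\ G(X, \lambda) := \operatorname{rank} X + \lambda H_{\Omega}(X) \tag{3}$$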
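To make the trade-off in the unconstrained objective concrete, the sketch below evaluates it on a toy example. The helper names `huber_regularizer` and `objective`, the toy data, and the value of `lam` are assumptions chosen for illustration only.

```python
import numpy as np

def huber_regularizer(X, M, mask, g):
    """H_Omega(X): sum of translated Huber losses over the observed entries."""
    d = np.abs(X - M)
    h = np.where(d <= g / 2, d**2, g * (d - g / 4)) - g**2 / 4
    return h[mask].sum()

def objective(X, M, mask, g, lam):
    """G(X, lambda) = rank(X) + lambda * H_Omega(X)."""
    return np.linalg.matrix_rank(X) + lam * huber_regularizer(X, M, mask, g)

M = np.ones((3, 3))                  # observed levels (rank-1 toy example)
mask = np.ones((3, 3), dtype=bool)   # every entry observed
g, lam = 1.0, 0.4

# Rank 1, every entry sits exactly on its level and is rewarded.
G_fit = objective(M, M, mask, g, lam)
# Rank 0, but every entry violates its quantization bounds.
G_zero = objective(np.zeros((3, 3)), M, mask, g, lam)
```

With this `lam`, the rank-1 matrix that respects the bounds has the lower objective. With a much smaller `lam` the all-zero matrix would win instead, which is why the regularization weight has to be bounded from below.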

Assumption 1. The set of global solutions to problem (1) is a singleton; i.e., problem (1) has a unique global minimizer $X^*$.
Let $r^* = \operatorname{rank}(X^*)$, and consider the set of global minimizers of problem (3). Let $B_1$ denote the set of matrices with rank smaller than $r^*$, let $B_2$ denote the set of rank-$r^*$ matrices that violate at least one constraint of problem (1), and let $\Delta_1$ and $\Delta_2$ be defined as follows:

$$\Delta_1 = \min_{X \in B_1}\ \max_{(i,j) \in \Omega}\ \{\tilde{H}(x_{ij}) \mid \tilde{H}(x_{ij}) > 0\}. \tag{4}$$

$$\Delta_2 = \min_{X \in B_2}\ \max_{(i,j) \in \Omega}\ \{\tilde{H}(x_{ij}) \mid \tilde{H}(x_{ij}) > 0\}. \tag{5}$$

$\Delta_1$ is trivially greater than zero. Otherwise, a feasible solution to problem (1) exists with rank smaller than $r^*$, which contradicts the optimality of $X^*$. Let $\Delta = \min\{\Delta_1, \Delta_2\}$. We add two assumptions to follow our line of proof. (These assumptions address worst-case scenarios. In practice, they need not be this tight.)
Assumption 2.
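Assumption 3. $\dfrac{r^*}{\Delta - (|\Omega| - 1)\frac{g^2}{4}} \le \lambda \le \dfrac{4}{|\Omega| g^2 + \epsilon}$, where $\epsilon$ is any positive small constant.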

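###### Proposition 1.

Suppose $\epsilon$ is any positive small constant. For each $\lambda$ satisfying Assumption 3, the set of global minimizers of problem (3) is the singleton $\{X^*\}$.

###### Proof.

Suppose $\tilde{X}$ is a global minimizer of problem (3). Three cases can be considered for $\operatorname{rank} \tilde{X}$:
Case 1: $\operatorname{rank} \tilde{X} \ge r^* + 1$. We have:

$$\begin{aligned} G(\tilde{X}, \lambda) &= \operatorname{rank} \tilde{X} + \lambda \sum_{(i,j) \in \Omega} \tilde{H}_{ij}(x_{ij}) \ge r^* + 1 - \lambda |\Omega| \frac{g^2}{4} \\ &\ge r^* + 1 - \frac{|\Omega| g^2}{|\Omega| g^2 + \epsilon} > r^* \ge \operatorname{rank}(X^*) + \lambda H_{\Omega}(X^*) = G(X^*, \lambda), \end{aligned} \tag{6}$$

which contradicts the assumption that $\tilde{X}$ is a global minimizer of problem (3). We used the definition of $r^*$, the minimal value $-\frac{g^2}{4}$ of the translated Huber function, the upper bound on $\lambda$ in Assumption 3, and the fact that $H_{\Omega}(X^*)$ is non-positive due to the feasibility of $X^*$ in problem (1).
Case 2: $\operatorname{rank} \tilde{X} < r^*$; i.e., $\tilde{X} \in B_1$.
Thus, using the lower bound on $\lambda$ in Assumption 3:

$$\begin{aligned} G(\tilde{X}, \lambda) &= \operatorname{rank} \tilde{X} + \lambda \sum_{(i,j) \in \Omega} \tilde{H}_{ij}(x_{ij}) \\ &\ge \operatorname{rank} \tilde{X} + \lambda \Delta - \lambda (|\Omega| - 1) \frac{g^2}{4} \ge \operatorname{rank} \tilde{X} + r^* > r^* \\ &\ge \operatorname{rank} X^* + \lambda H_{\Omega}(X^*) = G(X^*, \lambda) \ \Rightarrow\ G(X^*, \lambda) < G(\tilde{X}, \lambda), \end{aligned} \tag{7}$$

which again contradicts the assumption that $\tilde{X}$ is a global minimizer of problem (3).
Case 3: $\operatorname{rank} \tilde{X} = r^*$.
If at least one entry of $\tilde{X}$ violates the constraints in problem (1), then $\tilde{X} \in B_2$, and reasoning similar to (7) can be applied to contradict the global optimality of $\tilde{X}$. Finally, if $\operatorname{rank} \tilde{X} = r^*$ and $\tilde{X}$ satisfies the constraints in problem (1), then $\tilde{X} = X^*$ by Assumption 1. Thus, $X^*$ is the only global minimizer of problem (3). ∎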

## III Smoothed Rank Approximation

While reviewing the Huber loss, we mentioned that it is convex and differentiable. We aim to find a convex differentiable surrogate for the rank function in order to leverage the GD method. The trace norm is usually considered as the convex surrogate of the rank. However, the trace norm is not differentiable. In addition, sub-gradient methods for the trace norm are computationally complex and have convergence rate issues. Thus, we seek a convex differentiable rank approximation in order to leverage GD instead of sub-gradient based approaches. In this regard, we approximate the rank function in problem (1) with the Smoothed Rank Function (SRF). The SRF is defined using a certain function satisfying the QRA conditions introduced in [13]. Assume $f$ satisfies the QRA conditions, and let $f_{\delta}(x) = f(x / \delta)$. Among functions satisfying the QRA conditions, we consider $f(x) = \exp(-x^2 / 2)$ throughout this paper. Let $\sigma_i(X)$ denote the $i$th singular value of $X$. We define $F_{\delta}(X)$ as follows:

$$F_{\delta}(X) = \sum_{i=1}^{n} f_{\delta}(\sigma_i(X)). \tag{8}$$

Our proposed SRF is considered to be $n - F_{\delta}(X)$. It can be observed that $f_{\delta}$ converges pointwise to the Kronecker delta function $\delta_0$ as $\delta \to 0$. Thus, we have:

$$\lim_{\delta \to 0}\ [n - F_{\delta}(X)] = \lim_{\delta \to 0}\ \Big[n - \sum_{i=1}^{n} f_{\delta}(\sigma_i(X))\Big] = n - \sum_{i=1}^{n} \delta_0(\sigma_i(X)) = \operatorname{rank}(X). \tag{9}$$

Therefore, when $\delta \to 0$, the SRF directly approximates the rank function. As a result, we substitute the rank function in problem (3) with the proposed SRF as follows:

$$\min_{X}\ \tilde{G}_{\delta}(X, \lambda) := n - F_{\delta}(X) + \lambda H_{\Omega}(X) \tag{10}$$
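The behavior of the smoothed rank as its parameter shrinks can be checked numerically. The sketch below assumes the Gaussian choice for the smoothing function, so that each singular value $\sigma$ contributes $\exp(-\sigma^2 / (2\delta^2))$; the function name `smoothed_rank` and the diagonal test matrix are hypothetical.

```python
import numpy as np

def smoothed_rank(X, delta):
    """n - F_delta(X) with f_delta(s) = exp(-s^2 / (2 delta^2))."""
    s = np.linalg.svd(X, compute_uv=False)
    return s.size - np.sum(np.exp(-s**2 / (2 * delta**2)))

# Rank-2 matrix with singular values 3, 2, 0, 0.
X = np.diag([3.0, 2.0, 0.0, 0.0])

# Evaluate the surrogate for a large, a moderate, and a small delta.
values = {delta: smoothed_rank(X, delta) for delta in (100.0, 1.0, 0.1)}
```

For the large `delta` the surrogate is close to zero and says little about the rank, while for the small `delta` it approaches the true rank of 2.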

The advantage of the SRF over the rank function is that it is smooth and differentiable. Hence, GD can be used for minimization. However, the SRF is in general non-convex. When $\delta$ tends to $0$, the SRF is a good rank approximation, as shown in (9), but it has many local minima. In order for GD not to get trapped in local minima, we start with a large $\delta$ for the SRF. When $\delta \to \infty$, the SRF becomes convex (proved in Proposition 2), yielding a unique global minimizer for problem (10). Yet, the SRF with large $\delta$ is a poor rank approximation. This is where the GNC approach introduced in [14] is leveraged; i.e., we gradually decrease $\delta$ to enhance the accuracy of the rank approximation. A sequence of problems as in (10) (one for each value of $\delta$) is obtained. The solution to the problem with a fixed $\delta$ is used as a warm-start for the next problem with the new $\delta$. If $\delta$ is shrunk gradually, then the continuity property of $f$ (a QRA condition) leads to close solutions for subsequent problems. This way, GD is less likely to get trapped in local minima. The Huber loss, which is also convex, acts like the augmented term in the augmented Lagrangian method. It helps make the Hessian of $\tilde{G}_{\delta}$ locally positive-definite. Choosing warm-starts that fall in a convex vicinity of the global minimizer where no other local minimum is present, shrinking $\delta$ gradually (a smooth transition between problems, so as not to be prone to new local minima), and the continuity of $f$ lead to finding the global minimizer. A rigorous mathematical analysis of the shrinkage rate is provided in Section V.

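###### Proposition 2.

When $\delta \to \infty$, $n - F_{\delta}(X) \to \dfrac{\|X\|_F^2}{2\delta^2}$.

###### Proof.

When $\delta \to \infty$, the Taylor expansion for $f_{\delta}$ is as follows:

$$\begin{aligned} \exp\left(-\frac{x^2}{2\delta^2}\right) &= 1 - \frac{x^2}{2\delta^2} + O\left(\frac{1}{\delta^4}\right) \\ F_{\delta}(X) &= \sum_{i=1}^{n} \exp\left(-\frac{\sigma_i^2(X)}{2\delta^2}\right) = \sum_{i=1}^{n} \left(1 - \frac{\sigma_i^2(X)}{2\delta^2}\right) + O\left(\frac{1}{\delta^4}\right) \\ \Rightarrow\ n - F_{\delta}(X) &= \frac{1}{2\delta^2} \sum_{i=1}^{n} \sigma_i^2(X) + O\left(\frac{1}{\delta^4}\right) = \frac{\|X\|_F^2}{2\delta^2} + O\left(\frac{1}{\delta^4}\right) \\ \Rightarrow\ n - F_{\delta}(X) &\to \frac{\|X\|_F^2}{2\delta^2} \end{aligned} \tag{11}$$

Hence, when $\delta \to \infty$, the objective function in problem (10) tends to $\frac{\|X\|_F^2}{2\delta^2} + \lambda H_{\Omega}(X)$, which is strictly convex, and GD can be applied to find its global minimizer. Next, GNC is leveraged until the global minimizer is reached. Algorithm 1 in the subsequent section includes the detailed procedure.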
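The large-parameter limit can be checked numerically. The following sketch compares the smoothed rank with the scaled squared Frobenius norm for one large value of the smoothing parameter. It assumes the Gaussian smoothing function; the name `smoothed_rank`, the random test matrix, and the value `delta = 1e3` are illustrative choices.

```python
import numpy as np

def smoothed_rank(X, delta):
    """n - F_delta(X), written with expm1 to avoid cancellation for large delta."""
    s = np.linalg.svd(X, compute_uv=False)
    # n - sum(exp(-t)) equals -sum(expm1(-t)) with t = s^2 / (2 delta^2).
    return -np.sum(np.expm1(-s**2 / (2 * delta**2)))

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 5))
delta = 1e3

exact = smoothed_rank(X, delta)
approx = np.linalg.norm(X, "fro")**2 / (2 * delta**2)
rel_err = abs(exact - approx) / approx   # shrinks as delta grows
```

The relative gap between the two quantities is governed by the higher-order Taylor terms, so it decreases further when `delta` is increased.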

## IV The Proposed Algorithm

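Suppose $X$ has the SVD $X = U \Sigma V^T$, where $\Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_n)$. It is shown in [13] that $G_{\delta}(X)$ (the gradient of $F_{\delta}$) can be obtained as:

$$G_{\delta}(X) = U \operatorname{diag}\left\{-\frac{\sigma_1}{\delta^2} \exp\left(-\frac{\sigma_1^2}{2\delta^2}\right), \ldots, -\frac{\sigma_n}{\delta^2} \exp\left(-\frac{\sigma_n^2}{2\delta^2}\right)\right\} V^T. \tag{12}$$

The derivative of the univariate Huber loss for entries in $\Omega$ can be calculated as follows:

$$\tilde{H}_{ij}'(x_{ij}) = \begin{cases} -g, & x_{ij} - m_{ij} \le -\frac{g}{2} \\ 2(x_{ij} - m_{ij}), & |x_{ij} - m_{ij}| \le \frac{g}{2} \\ g, & x_{ij} - m_{ij} \ge \frac{g}{2} \end{cases}$$

Let $G_H(X)$ denote $\nabla H_{\Omega}(X)$. We have:

$$[G_H(X)]_{ij} = \begin{cases} \tilde{H}_{ij}'(x_{ij}), & (i,j) \in \Omega \\ 0, & \text{o.w.} \end{cases}$$

The gradient of $\tilde{G}_{\delta}$ is therefore given as:

$$\nabla \tilde{G}_{\delta}(X, \lambda) = -G_{\delta}(X) + \lambda G_H(X) \tag{13}$$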
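The gradient expression can be sanity-checked against central finite differences. The sketch below is illustrative only: it assumes the Gaussian smoothing function and the half-gap Huber threshold, and the names `objective` and `gradient`, the random test data, and the parameter values are hypothetical.

```python
import numpy as np

def objective(X, M, mask, g, lam, delta):
    """Smoothed objective: n - F_delta(X) + lam * H_Omega(X)."""
    s = np.linalg.svd(X, compute_uv=False)
    srf = s.size - np.sum(np.exp(-s**2 / (2 * delta**2)))
    d = np.abs(X - M)
    h = np.where(d <= g / 2, d**2, g * (d - g / 4)) - g**2 / 4
    return srf + lam * h[mask].sum()

def gradient(X, M, mask, g, lam, delta):
    """Gradient of the smoothed objective, following (12) and (13)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    w = (s / delta**2) * np.exp(-s**2 / (2 * delta**2))
    grad_srf = (U * w) @ Vt                      # equals -G_delta(X)
    # Huber derivative: 2(x - m) clipped to [-g, g], zero on unobserved entries.
    grad_h = np.where(mask, np.clip(2 * (X - M), -g, g), 0.0)
    return grad_srf + lam * grad_h

rng = np.random.default_rng(3)
X = rng.normal(size=(4, 3))
M = np.round(rng.normal(size=(4, 3)))
mask = rng.random((4, 3)) < 0.6
g, lam, delta, eps = 1.0, 0.5, 1.0, 1e-6

analytic = gradient(X, M, mask, g, lam, delta)
numeric = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = eps
        numeric[i, j] = (objective(X + E, M, mask, g, lam, delta)
                         - objective(X - E, M, mask, g, lam, delta)) / (2 * eps)
max_err = np.max(np.abs(analytic - numeric))
```

Clipping twice the residual to the interval from `-g` to `g` reproduces the three branches of the Huber derivative in a single vectorized expression.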

Finally, taking into account the GNC procedure and the fact that a gradient look-up table is available for $\tilde{G}_{\delta}$, our proposed algorithm, QMC-HANDS, is given in Algorithm 1. The convergence criteria in Algorithm 1 are based on the relative difference in the Frobenius norm of consecutive updates.
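Since the listing of Algorithm 1 is not reproduced in this text, the following is only a rough sketch of how a GNC loop around gradient descent could look, and it should not be read as the authors' QMC-HANDS procedure. The initialization, the starting value of the smoothing parameter, the shrink factor, the step-size rule, the iteration limits, and all names (`gradient`, `gnc_gradient_descent`, `shrink`, `n_outer`, `n_inner`, `tol`) are assumptions made for illustration.

```python
import numpy as np

def gradient(X, M, mask, g, lam, delta):
    """Gradient of n - F_delta(X) + lam * H_Omega(X) (Gaussian smoothing assumed)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    w = (s / delta**2) * np.exp(-s**2 / (2 * delta**2))
    grad_srf = (U * w) @ Vt
    grad_h = np.where(mask, np.clip(2 * (X - M), -g, g), 0.0)
    return grad_srf + lam * grad_h

def gnc_gradient_descent(M, mask, g, lam, shrink=0.7, n_outer=15,
                         n_inner=300, tol=1e-6):
    """Hypothetical GNC loop: run GD for a fixed delta, then shrink delta."""
    X = np.where(mask, M, 0.0)                     # assumed initialization
    delta = 4.0 * max(np.linalg.norm(X, 2), 1.0)   # assumed large starting delta
    for _ in range(n_outer):
        # Conservative step size from curvature bounds:
        # at most 1/delta^2 for the smoothed rank and 2*lam for the Huber term.
        step = 1.0 / (1.0 / delta**2 + 2.0 * lam)
        for _ in range(n_inner):
            X_new = X - step * gradient(X, M, mask, g, lam, delta)
            # Stop on a small relative Frobenius-norm change between updates.
            rel = np.linalg.norm(X_new - X) / max(np.linalg.norm(X), 1e-12)
            X = X_new
            if rel < tol:
                break
        delta *= shrink                            # warm-start the next problem
    return X

rng = np.random.default_rng(4)
X_true = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))   # synthetic rank-2 data
M = np.round(X_true)                                           # quantized with gap g = 1
mask = rng.random(M.shape) < 0.6                               # observed entries
X_hat = gnc_gradient_descent(M, mask, g=1.0, lam=1.0)
```

The inner stopping test mirrors the relative Frobenius-norm criterion mentioned above, and each outer iteration reuses the previous solution as a warm-start. In practice the shrink factor and the regularization weight would need tuning, for instance by cross-validation.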

## V Theoretical Analysis on Global Convergence

In this section, we propose a sufficient decrease condition for which ensures the GD is not trapped in local minima, and the global minimizer is achieved. Suppose a warm-start holds in for a positive constant , and is convex on . Let and denote the Hessian of SRF and Huber, respectively. We have on . is since is . Let (normalized w.r.t ). Thus, , we have:

$$-\frac{1}{\delta^3} T_1(X) + \lambda D_H(X) \succeq 0 \ \Rightarrow\ \delta^3 \lambda D_H(X) \succeq T_1(X) \tag{14}$$

This gives a lower bound for in the -th iteration as . Assume the problem with warm-start is optimized on to reach at a new minimizer . By assumption, is closer to (in Frobenius norm) than since the smaller , the better rank approximation. Suppose . Now, let . By definition, . Therefore, by the definition of the minimizer. The sufficient decrease ratio is given as follows: , which depends on .

## VI Numerical Experiments

In this section, we provide numerical experiments conducted on the MovieLens100K dataset [15], [16]. MovieLens100K contains 100,000 ratings (instances) from 943 users on 1682 movies, where each user has rated at least 20 movies. Our purpose is to predict the ratings which have not been recorded or completed by users. We assume this rating matrix is a quantized version of a genuine low-rank matrix and recover it using our algorithm. Then, a final quantization can be applied to predict the missing ratings. We compare the learning accuracy and the computational complexity of our proposed approach on MovieLens100K with those of the state-of-the-art methods discussed in the introduction: Logarithmic Barrier Gradient Method (LBG) [7], SPARFA-Lite (abbreviated as SL) [11], [17], and QMC-BIF [12]. The results are reported in Tables I and II.
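The final quantization step can be sketched as a nearest-level mapping. The example below assumes the five integer rating levels 1 to 5 used by MovieLens; the function name `final_quantize` and the small recovered matrix `X_hat` are hypothetical.

```python
import numpy as np

def final_quantize(X_hat, levels):
    """Map each recovered entry to the nearest admissible level."""
    levels = np.asarray(levels, dtype=float)
    idx = np.abs(X_hat[..., None] - levels).argmin(axis=-1)
    return levels[idx]

# Toy recovered (continuous-valued) matrix, for illustration only.
X_hat = np.array([[0.3, 2.6],
                  [4.4, 7.1]])
ratings = final_quantize(X_hat, levels=[1, 2, 3, 4, 5])
# ratings is [[1., 3.], [4., 5.]]
```

Entries that fall outside the range of levels are mapped to the closest extreme level, which matches the assumption that no rating exceeds the extreme quantization levels.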

We abbreviate the term “missing rate” in Tables I and II as MR. We have induced different MR percentages on the MovieLens100K dataset and averaged the performance of each method over repeated simulation runs. The algorithm parameters are set using cross-validation.
It can be seen that, owing to its differentiability and smoothness, our proposed method is fast. As can be seen in Table II, the runtime of QMC-HANDS is noticeably reduced compared to the state-of-the-art. The computational times, in seconds, are measured on an Intel Core i7 6700 HQ system with 16 GB RAM using MATLAB®. In addition, our method outperforms the other mentioned methods in learning accuracy across the simulation scenarios reported in Table I. The superiority of our proposed algorithm over the other mentioned methods is also observed on synthetic datasets. We plan to report the results on synthetic data in an extended version of this work. As in [12], no projection is required in our proposed algorithm, in contrast to [7] and [11].

## VII Conclusion

In this paper, a novel approach to Quantized Matrix Completion (QMC) using the Huber loss measure is introduced. A novel algorithm, which does not require initial rank knowledge, is proposed for an unconstrained differentiable optimization problem. We have established rigorous theoretical analyses and convergence guarantees for the proposed method. The experimental contribution of our work includes enhanced recovery accuracy and a noticeable reduction in computational complexity compared to state-of-the-art methods, as illustrated in the numerical experiments.

## References

• [1] E. J. Candès and B. Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, no. 6, p. 717, 2009.
• [2] J.-F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
• [3] R. H. Keshavan, A. Montanari, and S. Oh, “Matrix completion from a few entries,” IEEE transactions on information theory, vol. 56, no. 6, pp. 2980–2998, 2010.
• [4] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, no. 8, pp. 30–37, 2009.
• [5] P. Biswas and Y. Ye, “Semidefinite programming for ad hoc wireless sensor network localization,” in Proceedings of the 3rd international symposium on Information processing in sensor networks.   ACM, 2004, pp. 46–54.
• [6] A. Esmaeili, K. Behdin, M. A. Fakharian, and F. Marvasti, “Transduction with matrix completion using smoothed rank function,” arXiv preprint arXiv:1805.07561, 2018.
• [7] S. A. Bhaskar, “Probabilistic low-rank matrix completion from quantized measurements,” Journal of Machine Learning Research, vol. 17, no. 60, pp. 1–34, 2016. [Online]. Available: http://jmlr.org/papers/v17/15-273.html
• [8] M. A. Davenport, Y. Plan, E. Van Den Berg, and M. Wootters, “1-bit matrix completion,” Information and Inference: A Journal of the IMA, vol. 3, no. 3, pp. 189–223, 2014.
• [9] T. Cai and W.-X. Zhou, “A max-norm constrained minimization approach to 1-bit matrix completion,” The Journal of Machine Learning Research, vol. 14, no. 1, pp. 3619–3647, 2013.
• [10] R. Ni and Q. Gu, “Optimal statistical and computational rates for one bit matrix completion,” Artificial Intelligence and Statistics, 2016, pp. 426–434.
• [11] A. S. Lan, C. Studer, and R. G. Baraniuk, “Matrix recovery from quantized and corrupted measurements.” ICASSP, 2014, pp. 4973–4977.
• [12] A. Esmaeili, K. Behdin, F. Marvasti et al., “Recovering quantized data with missing information using bilinear factorization and augmented lagrangian method,” arXiv preprint arXiv:1810.03222, 2018.
• [13] M. Malek-Mohammadi, M. Babaie-Zadeh, A. Amini, and C. Jutten, “Recovery of low-rank matrices under affine constraints via a smoothed rank function,” IEEE Transactions on Signal Processing, vol. 62, no. 4, pp. 981–992, 2014.
• [14] A. Blake and A. Zisserman, Visual reconstruction.   MIT press, 1987.
• [15] F. M. Harper and J. A. Konstan, “The MovieLens datasets: History and context,” ACM Transactions on Interactive Intelligent Systems (TiiS), vol. 5, no. 4, p. 19, 2016.
• [16] “Movielens100k website.” [Online]. Available: https://grouplens.org/datasets/movielens/100k/
• [17] A. S. Lan, C. Studer, and R. G. Baraniuk, “Quantized matrix completion for personalized learning,” arXiv preprint arXiv:1412.5968, 2014.