Is Ordered Weighted $\ell_1$ Regularized Regression Robust to Adversarial Perturbation? A Case Study on OSCAR
Abstract
Many state-of-the-art machine learning models such as deep neural networks have recently been shown to be vulnerable to adversarial perturbations, especially in classification tasks. Motivated by adversarial machine learning, in this paper we investigate the robustness of sparse regression models with strongly correlated covariates to adversarially designed measurement noises. Specifically, we consider the family of ordered weighted $\ell_1$ (OWL) regularized regression methods and study the case of OSCAR (octagonal shrinkage and clustering algorithm for regression) in the adversarial setting. Under a norm-bounded threat model, we formulate the process of finding a maximally disruptive noise for OWL-regularized regression as an optimization problem and illustrate the steps towards finding such a noise in the case of OSCAR. Experimental results demonstrate that the regression performance of grouping strongly correlated features can be severely degraded under our adversarial setting, even when the noise budget is significantly smaller than the ground-truth signals.
Pin-Yu Chen, Bhanukiran Vinzamuri, Sijia Liu (P.-Y. Chen and B. Vinzamuri contribute equally to this work.)
IBM Research 
{PinYu.Chen, Bhanu.Vinzamuri, Sijia.Liu}@ibm.com 
Index Terms— Adversarial machine learning, ordered weighted $\ell_1$ norm, OWL-regularized regression, OSCAR
1 Introduction
In recent years, adversarial machine learning has received tremendous attention, as it provides new means of improving machine learning performance and studying model robustness in the adversarial setting, exemplified by generative adversarial networks (GANs) [1] and adversarial examples [2]. In image classification tasks, well-trained machine learning models such as deep convolutional neural networks have been shown to be vulnerable to adversarial examples: human-imperceptible perturbations to natural images can be easily crafted to mislead the decision of a target image classifier [3, 4, 5], leading to new challenges for model robustness. The adversarial perturbations are often evaluated by the $\ell_1$, $\ell_2$, and $\ell_\infty$ norms [6, 7, 8, 9]. Beyond image classification, other machine learning tasks such as image captioning [10] and sequence-to-sequence text learning [11] have also been shown to be vulnerable to adversarial examples. Moreover, it has been made possible to generate adversarial examples in the so-called black-box setting by simply leveraging the input-output correspondence of a target model and performing zeroth-order optimization [12, 13, 14].
Motivated by recent studies in adversarial machine learning, in this paper we shift our focus to the robustness of regression models to adversarial perturbations. Regression is a fundamental task in machine learning, yet to the best of our knowledge its behavior under adversarial perturbations has been little explored. Specifically, we investigate the robustness of ordinary least-squares regression models regularized by the ordered weighted $\ell_1$ (OWL) norm [15, 16]. The OWL family of regularizers is a widely adopted method for sparse regression with strongly correlated covariates. It is worth mentioning that the octagonal shrinkage and clustering algorithm for regression [17], known as OSCAR, is in fact a special case of the OWL regularizer [21]. OSCAR is known to be more effective in identifying feature groups (i.e., strongly correlated covariates) than other feature selection methods such as LASSO [22].
In this paper, we investigate the robustness of OSCAR to adversarial perturbations by formulating the process of finding the maximally disruptive noise of the measurement model as an optimization problem. Although the recent work in [18] has established a finite-sample error upper bound for OWL-regularized regression models under a norm-bounded noise level, it remains unclear whether an adversary can disrupt the identified feature groups by intentionally manipulating the measurement error within the same noise budget. In other words, our adversarial formulation is novel in the sense that it provides a worst-case robustness analysis of OSCAR by finding a disruptive measurement error (still within a specified noise budget) that deviates the detected feature groups from the ground truths. More importantly, upon verifying the lack of robustness to adversarial perturbations, our method could be incorporated to devise resilient OWL-regularized regression models via adversarial learning techniques.
Perhaps surprisingly, the experimental results show that, using our proposed approach, it is possible to generate small norm-bounded perturbations to the measurement model that deviate the regression results of OSCAR from the ground truths. Consequently, our results offer new insights into the robustness analysis of OWL-regularized regression methods in the adversarial setting.
2 Background
For any real-valued vector $\mathbf{x} \in \mathbb{R}^p$, let $|\mathbf{x}|$ denote the vector of its element-wise absolute values and let $|\mathbf{x}|_{(\cdot)}$ denote the element-permuted vector of $|\mathbf{x}|$ such that $|x|_{(1)} \geq |x|_{(2)} \geq \cdots \geq |x|_{(p)}$, where $|x|_{(i)}$ is the $i$-th largest component of $|\mathbf{x}|$. Given a vector $\mathbf{w} \in \mathbb{R}^p$ such that $w_1 \geq w_2 \geq \cdots \geq w_p \geq 0$ and $w_1 > 0$, the OWL norm [15, 16] is defined as
$$\Omega_{\mathbf{w}}(\mathbf{x}) = \sum_{i=1}^{p} w_i \, |x|_{(i)}. \qquad (1)$$
The OSCAR regularizer [17] is a special case of the OWL norm in (1) when $w_i = \lambda_1 + \lambda_2 (p - i)$, where $\lambda_1, \lambda_2 \geq 0$.
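To make the definition concrete, the following NumPy sketch evaluates the OWL norm in (1) with OSCAR weights; the function names `oscar_weights` and `owl_norm` are ours, not from the paper.

```python
import numpy as np

def oscar_weights(p, lam1, lam2):
    # OSCAR weights w_i = lam1 + lam2 * (p - i) for i = 1, ..., p (non-increasing).
    return lam1 + lam2 * (p - np.arange(1, p + 1))

def owl_norm(x, w):
    # OWL norm (1): inner product of w with the entries of |x| sorted descending.
    return np.dot(w, np.sort(np.abs(x))[::-1])

# Example with p = 4, lam1 = 1, lam2 = 0.5, so w = [2.5, 2.0, 1.5, 1.0]:
x = np.array([0.3, -2.0, 0.0, 1.1])
print(owl_norm(x, oscar_weights(4, 1.0, 0.5)))  # 2.5*2 + 2*1.1 + 1.5*0.3 = 7.65
```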
We consider the OWL-regularized linear regression problem taking the following form:
$$\widehat{\boldsymbol{\beta}} \in \arg\min_{\boldsymbol{\beta} \in \mathbb{R}^p} \; \frac{1}{2} \, \| \mathbf{y} - \mathbf{X} \boldsymbol{\beta} \|_2^2 + \Omega_{\mathbf{w}}(\boldsymbol{\beta}), \qquad (2)$$
where $\mathbf{y} \in \mathbb{R}^n$ is the vector of noisy measurements, $\mathbf{X} \in \mathbb{R}^{n \times p}$ is the design matrix, and $\mathbf{w}$ is the regularization parameter of the OWL norm.
The seminal work in [18] establishes a finite-sample error bound on the OWL-regularized regression method under the measurement model
$$\mathbf{y} = \mathbf{X} \boldsymbol{\beta}^* + \mathbf{e}, \qquad (3)$$
where $\boldsymbol{\beta}^* \in \mathbb{R}^p$ is a $k$-sparse vector (i.e., the signal) and $\mathbf{e} \in \mathbb{R}^n$ is the measurement error (i.e., the noise). Let $\bar{\boldsymbol{\beta}}^*$ denote the vector with identical coefficients corresponding to identical columns in $\mathbf{X}$, and assume the entries in each column of $\mathbf{X}$ are i.i.d. (standard Gaussian random variables) but different columns could be strongly correlated. Consider the $\ell_2$-norm bounded measurement error constraint $\|\mathbf{e}\|_2 \leq \epsilon$; then the solution $\widehat{\boldsymbol{\beta}}$ to (2) is guaranteed to satisfy the finite-sample error upper bound [18]:
$$\mathbb{E} \left[ \| \widehat{\boldsymbol{\beta}} - \bar{\boldsymbol{\beta}}^* \|_2 \right] \leq c_1 \, \frac{\epsilon}{\sqrt{n}} \quad \text{whenever } n \geq c_2 \, k \log p, \qquad (4)$$
where $\|\cdot\|_2$ denotes the $\ell_2$ norm, the expectation is taken over the random design matrix $\mathbf{X}$, and $c_1$ and $c_2$ are positive constants that we omit for brevity (see Theorem 1.1 in [18]). Note that in this finite-sample analysis no distributional assumptions are imposed on the measurement error other than its bounded norm. A similar error bound can be obtained when the rows of $\mathbf{X}$ are i.i.d. Gaussian random vectors [18].
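For intuition on the vector $\bar{\boldsymbol{\beta}}^*$, the sketch below assumes the convention of [18] that the coefficients within each group of identical columns are replaced by their group average; the helper `group_average` is hypothetical.

```python
import numpy as np

def group_average(beta, groups):
    # Replace the coefficients within each group of identical columns of X
    # by their group average, yielding the target vector beta_bar* in (4).
    # `groups` is a list of index lists, one per group of identical columns.
    beta_bar = beta.astype(float).copy()
    for idx in groups:
        beta_bar[idx] = beta[idx].mean()
    return beta_bar

# Example: columns {0, 1} identical, columns {2, 3, 4} identical.
beta_star = np.array([1.0, 3.0, 0.0, 0.0, 6.0])
print(group_average(beta_star, [[0, 1], [2, 3, 4]]))  # -> [2. 2. 2. 2. 2.]
```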
In general, the solution $\widehat{\boldsymbol{\beta}}$ to (2) can be efficiently obtained by leveraging the proximal operator of the OWL regularizer. As illustrated in [19], one can use the fast iterative shrinkage-thresholding algorithm (FISTA) [20] to obtain $\widehat{\boldsymbol{\beta}}$, which involves iterating the following optimization steps:

OWL proximal gradient descent – $\mathbf{b}^{(t)} = \text{prox}_{\Omega_{\mathbf{w}}/\rho} \left( \boldsymbol{\beta}^{(t)} - \frac{1}{\rho} \, \mathbf{X}^\top (\mathbf{X} \boldsymbol{\beta}^{(t)} - \mathbf{y}) \right)$

Momentum – $\boldsymbol{\beta}^{(t+1)} = \mathbf{b}^{(t)} + \eta_t \, (\mathbf{b}^{(t)} - \mathbf{b}^{(t-1)})$

Update $t \leftarrow t + 1$ and repeat if not converged

The index $t$ denotes the FISTA iteration, and $\rho$ and $\eta_t$ denote the inverse of the step size and the momentum coefficient, respectively. The notation $(\cdot)^\top$ denotes matrix transpose.
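A minimal FISTA sketch for (2) is given below, assuming a proximal operator `prox_owl(v, step)` of the OWL regularizer (instantiated for OSCAR in the next sketch). We use the standard Beck-Teboulle momentum schedule [20] in place of the abstract coefficient $\eta_t$; variable names and the iteration budget are ours.

```python
import numpy as np

def fista_owl(X, y, prox_owl, rho, n_iter=500):
    # FISTA for problem (2). rho is the inverse step size and should be at
    # least the largest eigenvalue of X^T X for the gradient step to be stable.
    p = X.shape[1]
    beta = np.zeros(p)    # momentum iterate beta^(t)
    b_prev = np.zeros(p)  # previous proximal iterate b^(t-1)
    t = 1.0
    for _ in range(n_iter):
        # OWL proximal gradient descent step
        b = prox_owl(beta - (1.0 / rho) * (X.T @ (X @ beta - y)), 1.0 / rho)
        # Momentum step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = b + ((t - 1.0) / t_next) * (b - b_prev)
        b_prev, t = b, t_next
    return b_prev
```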
Specifically, when the OWL regularizer reduces to OSCAR, its approximate proximity operator (APO) has a closed-form expression given by [21]
$$\text{prox}_{\Omega_{\mathbf{w}}}(\mathbf{v}) \approx \text{sign}(\mathbf{v}) \odot \left( \mathbf{P}_{\mathbf{v}}^\top \max \{ \mathbf{P}_{\mathbf{v}} |\mathbf{v}| - \mathbf{w}, \mathbf{0} \} \right), \qquad (5)$$
where $\text{sign}(\mathbf{v})$ is the vector of entry-wise signs ($+1$, $-1$, or $0$), $\odot$ denotes the entry-wise product, $\max\{\mathbf{a}, \mathbf{b}\}$ denotes the entry-wise maximum of $\mathbf{a}$ and $\mathbf{b}$, and $\mathbf{P}_{\mathbf{v}}$ is a permutation matrix associated with a given vector $\mathbf{v}$ satisfying $\mathbf{P}_{\mathbf{v}} |\mathbf{v}| = |\mathbf{v}|_{(\cdot)}$, which can be obtained by taking $|\mathbf{v}|$ and sorting its entries in descending order. In addition, for OSCAR the vector $\mathbf{w}$ in OWL has the relation $\mathbf{w} = \lambda_1 \mathbf{1} + \lambda_2 \mathbf{q}$, where $\mathbf{q} = [p-1, p-2, \ldots, 1, 0]^\top$.
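The APO in (5) amounts to sorting $|\mathbf{v}|$, shrinking against the sorted weights, and undoing the sort. Below is a sketch following [21]; scaling the weights by the step size for use inside the proximal-gradient loop above is our assumption.

```python
import numpy as np

def prox_oscar_apo(v, step, lam1, lam2):
    # OSCAR APO of (5): sign(v) * P_v^T max(P_v |v| - step * w, 0).
    p = v.size
    w = lam1 + lam2 * (p - np.arange(1, p + 1))  # OSCAR weights, non-increasing
    order = np.argsort(-np.abs(v))               # permutation P_v (descending |v|)
    shrunk = np.maximum(np.abs(v)[order] - step * w, 0.0)
    out = np.empty(p)
    out[order] = shrunk                          # apply P_v^T (undo the sort)
    return np.sign(v) * out

# Usage with the FISTA sketch above:
# beta_hat = fista_owl(X, y, lambda v, s: prox_oscar_apo(v, s, lam1, lam2), rho)
```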
3 Main Results
Although a finite-sample analysis for OWL-regularized regression has been established under the $\ell_2$-norm constrained measurement error in [18], motivated by the recent advances in adversarial machine learning, we are interested in investigating its robustness to adversarially designed noises satisfying the same error budget $\|\mathbf{e}\|_2 \leq \epsilon$. In other words, given an $\ell_2$-norm bounded threat model and a design matrix $\mathbf{X}$, we aim to find an optimal noise $\mathbf{e}$ that maximally degrades the performance of OWL-regularized regression in terms of the detected feature groups. In this paper, we particularly focus on the case of OSCAR with APO as its solver.
Under the same measurement model as in (3), we formulate the problem of finding a norm-bounded noise that could maximally deviate the OWL-regularized regression solution from the ground-truth feature cluster membership vector $\bar{\boldsymbol{\beta}}^*$ by solving
$$\text{maximize}_{\mathbf{e} \in \mathbb{R}^n} \quad \| \widehat{\boldsymbol{\beta}} - \bar{\boldsymbol{\beta}}^* \|_2^2 \qquad (6)$$
$$\text{subject to} \quad \|\mathbf{e}\|_2 \leq \epsilon, \qquad (7)$$
$$\widehat{\boldsymbol{\beta}} \in \arg\min_{\boldsymbol{\beta} \in \mathbb{R}^p} \; \frac{1}{2} \, \| \mathbf{y} - \mathbf{X} \boldsymbol{\beta} \|_2^2 + \Omega_{\mathbf{w}}(\boldsymbol{\beta}). \qquad (8)$$
Essentially, our adversarial formulation studies the robustness of OWL-regularized regression in the worst-case scenario by exploring the space of constrained measurement noise to maximize the feature group identification loss in (6). In our setting, we assume the adversary has access to the ground-truth vector $\boldsymbol{\beta}^*$, so that based on (3), (8) can be written as
$$\widehat{\boldsymbol{\beta}} \in \arg\min_{\boldsymbol{\beta} \in \mathbb{R}^p} \; \frac{1}{2} \, \| \mathbf{X} \boldsymbol{\beta}^* + \mathbf{e} - \mathbf{X} \boldsymbol{\beta} \|_2^2 + \Omega_{\mathbf{w}}(\boldsymbol{\beta}). \qquad (9)$$
Next, we specify how to solve the adversarial regression formulation in (6)-(8) in the case of OSCAR with APO as its solver. Let $\eta$ and $1/\rho$ be the final iterates of the momentum coefficient and step size in FISTA as described in Section 2, and let $\tilde{\boldsymbol{\beta}} = \mathbf{b}^{(T)} + \eta \, (\mathbf{b}^{(T)} - \mathbf{b}^{(T-1)})$ denote the corresponding final momentum iterate, which we treat as fixed. Given the measurement model (3) under OSCAR-APO, (9) becomes
$$\widehat{\boldsymbol{\beta}}(\mathbf{e}) = \text{prox}_{\Omega_{\mathbf{w}}} \left( \tilde{\boldsymbol{\beta}} - \frac{1}{\rho} \, \mathbf{X}^\top \big( \mathbf{X} \tilde{\boldsymbol{\beta}} - \mathbf{X} \boldsymbol{\beta}^* - \mathbf{e} \big) \right), \qquad (10)$$
where $\text{prox}_{\Omega_{\mathbf{w}}}(\cdot)$ is defined in (5). With the method of Lagrange multipliers, we are interested in solving the following alternative optimization problem:
$$\text{maximize}_{\mathbf{e} \in \mathbb{R}^n} \quad \| \widehat{\boldsymbol{\beta}}(\mathbf{e}) - \bar{\boldsymbol{\beta}}^* \|_2^2 - \gamma \, \|\mathbf{e}\|_1, \qquad (11)$$
where $\gamma > 0$ is a tunable regularization coefficient such that the solution $\widehat{\mathbf{e}}$ to (11) will satisfy the norm constraint $\|\widehat{\mathbf{e}}\|_2 \leq \epsilon$.
We note that the formulation in (11) falls into the category of LASSO problems [22] and can be efficiently solved by optimization methods such as the iterative shrinkage-thresholding algorithm (ISTA). Specifically, let $f(\mathbf{e}) = \| \widehat{\boldsymbol{\beta}}(\mathbf{e}) - \bar{\boldsymbol{\beta}}^* \|_2^2$. Then $\widehat{\mathbf{e}}$ (possibly a local optimum) can be obtained by iteratively solving
$$\mathbf{e}^{(t+1)} = S_{\gamma / \rho'} \left( \mathbf{e}^{(t)} + \frac{1}{\rho'} \, \nabla_{\mathbf{e}} f(\mathbf{e}^{(t)}) \right), \qquad (12)$$
where $\mathbf{e}^{(t)}$ denotes the $t$-th iterate, $1/\rho'$ is the step size, $\nabla_{\mathbf{e}} f$ denotes the gradient of $f$ (we use a subgradient of $f$ at points where $f$ is not differentiable), and $S_\alpha(\cdot)$ is an entry-wise function defined as
$$[S_\alpha(\mathbf{x})]_i = \text{sign}(x_i) \cdot \max \{ |x_i| - \alpha, 0 \}. \qquad (13)$$
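A sketch of the update (12)-(13) follows; `soft_threshold` implements (13), and the ascent sign reflects that (11) is a maximization. Function names and argument layout are ours.

```python
import numpy as np

def soft_threshold(x, alpha):
    # Entry-wise shrinkage operator S_alpha of (13).
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

def attack_step(e, grad, step, gamma):
    # One ISTA-style iteration of (12): gradient ascent on f, then shrinkage.
    return soft_threshold(e + step * grad, gamma * step)
```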
In what follows, we explicitly derive the gradient of $f$ with respect to $\mathbf{e}$ for OSCAR-APO. For clarity, the index $i$ specifies the measurement instance, and the index $j$ specifies the covariate instance. To simplify the notation, let $\mathbf{z} = \tilde{\boldsymbol{\beta}} - \frac{1}{\rho} \, \mathbf{X}^\top (\mathbf{X} \tilde{\boldsymbol{\beta}} - \mathbf{X} \boldsymbol{\beta}^* - \mathbf{e})$ such that $\widehat{\boldsymbol{\beta}} = \text{prox}_{\Omega_{\mathbf{w}}}(\mathbf{z})$ based on (5), where $\widehat{\beta}_j = \text{sign}(z_j) \cdot \max\{ |z_j| - w_{\pi(j)}, 0 \}$ and $\pi(j)$ denotes the rank of $|z_j|$ among the entries of $|\mathbf{z}|$ sorted in descending order. Rewriting $f = \sum_{j=1}^{p} (\widehat{\beta}_j - \bar{\beta}_j^*)^2$ and using the chain rule, we can obtain
$$\frac{\partial f}{\partial e_i} = \sum_{j=1}^{p} 2 \, (\widehat{\beta}_j - \bar{\beta}_j^*) \cdot \mathbb{1}[|z_j| > w_{\pi(j)}] \cdot \frac{X_{ij}}{\rho}, \qquad (14)$$
where $\mathbb{1}[A]$ is an indicator function such that $\mathbb{1}[A] = 1$ if event $A$ is true; otherwise $\mathbb{1}[A] = 0$.
The detailed derivations are as follows. To obtain $\partial \widehat{\beta}_j / \partial z_j$, we divide the analysis into three cases based on the value of $z_j$:

Case I – If $|z_j| < w_{\pi(j)}$, then $\widehat{\beta}_j = 0$ and hence $\partial \widehat{\beta}_j / \partial z_j = 0$.

Case II – If $z_j > w_{\pi(j)}$, then $\widehat{\beta}_j = z_j - w_{\pi(j)}$ since $\text{sign}(z_j) = 1$. Therefore, $\partial \widehat{\beta}_j / \partial z_j = 1$. We note that, technically, $w_{\pi(j)}$ is also a function of $\mathbf{z}$ and hence a function of $z_j$. As a result, one needs the further chain-rule factorization $\partial \widehat{\beta}_j / \partial z_j = 1 - \partial w_{\pi(j)} / \partial z_j$. Here we implicitly use the fact that $\partial w_{\pi(j)} / \partial z_j = 0$ almost everywhere, since the permutation $\mathbf{P}_{\mathbf{z}}$ is piecewise constant in $\mathbf{z}$: no matter how we permute the entries of $\mathbf{w}$, the permuted vector remains a constant vector with respect to $\mathbf{z}$.

Case III – If $z_j < -w_{\pi(j)}$, then $\widehat{\beta}_j = z_j + w_{\pi(j)}$. Similar to the analysis of Case II, we obtain $\partial \widehat{\beta}_j / \partial z_j = 1$.
Summarizing these cases, we have the simplified expression $\partial \widehat{\beta}_j / \partial z_j = 1$ if $|z_j| > w_{\pi(j)}$ and $\partial \widehat{\beta}_j / \partial z_j = 0$ otherwise. Next, to obtain $\partial z_j / \partial e_i$, based on the definition of $\mathbf{z}$ we have $z_j = \tilde{\beta}_j - \frac{1}{\rho} \, [\mathbf{X}^\top (\mathbf{X} \tilde{\boldsymbol{\beta}} - \mathbf{X} \boldsymbol{\beta}^* - \mathbf{e})]_j$. Therefore, $\partial z_j / \partial e_i = X_{ij} / \rho$. Finally, combining the analysis of $\partial \widehat{\beta}_j / \partial z_j$ and $\partial z_j / \partial e_i$, we obtain the result in (14).
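Putting the three cases together, the gradient (14) can be computed in closed form. Below is a sketch under our reading of (10): the final FISTA iterate `beta_tilde` is frozen, and the permutation that matches weight $w_{\pi(j)}$ to coordinate $j$ is recomputed from $\mathbf{z}$ but treated as locally constant, as in Case II.

```python
import numpy as np

def grad_f(e, X, beta_star, beta_bar, beta_tilde, rho, w):
    # Gradient of f(e) = ||beta_hat(e) - beta_bar*||_2^2 following (14).
    p = X.shape[1]
    y = X @ beta_star + e                                 # measurement model (3)
    z = beta_tilde - (1.0 / rho) * (X.T @ (X @ beta_tilde - y))
    w_perm = np.empty(p)
    w_perm[np.argsort(-np.abs(z))] = w                    # w_{pi(j)} matched to coordinate j
    beta_hat = np.sign(z) * np.maximum(np.abs(z) - w_perm, 0.0)
    active = (np.abs(z) > w_perm).astype(float)           # indicator in (14)
    # df/de_i = sum_j 2 (beta_hat_j - beta_bar_j) * 1[|z_j| > w_pi(j)] * X_ij / rho
    return (2.0 / rho) * (X @ (active * (beta_hat - beta_bar)))
```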
The pipeline for crafting a norm-bounded adversarial perturbation for OSCAR-APO is described in Algorithm 1. We also note that, beyond OSCAR-APO, it is possible to craft an adversarial noise for generic OWL-regularized regression methods by treating (8) as a black-box function, which will be considered in our future work.
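Algorithm 1 is not reproduced here; the following end-to-end sketch reflects our reading of the pipeline, reusing the helper functions sketched above. The iteration counts, default parameters, and the strategy of retuning $\gamma$ until the budget $\|\mathbf{e}\|_2 \leq \epsilon$ holds are our assumptions.

```python
import numpy as np

def craft_adversarial_noise(X, beta_star, beta_bar, lam1, lam2,
                            eps, gamma, rho, n_attack=200):
    # Step 1: solve the clean OSCAR-APO regression once and freeze the
    # final FISTA iterate beta_tilde, as required by the model (10).
    p = X.shape[1]
    w = lam1 + lam2 * (p - np.arange(1, p + 1))
    prox = lambda v, s: prox_oscar_apo(v, s, lam1, lam2)
    beta_tilde = fista_owl(X, X @ beta_star, prox, rho)
    # Step 2: run the ISTA-style ascent (12) on the noise e.
    e = np.zeros(X.shape[0])
    for _ in range(n_attack):
        g = grad_f(e, X, beta_star, beta_bar, beta_tilde, rho, w)
        e = attack_step(e, g, 1.0 / rho, gamma)
    # If ||e||_2 exceeds eps, gamma should be increased and the loop rerun.
    return e
```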
4 Performance Evaluation
In this experiment, we generated a synthetic dataset of $p$ features and $n$ instances with a predefined grouping structure among the features. The features (entries in $\mathbf{X}$) were generated from a standard Gaussian distribution, and we modeled the response variable using (3). Given a norm-bounded noise strength $\epsilon$, we call the adversarial perturbation found using our proposed approach (Algorithm 1) an "attack" on the considered regression method. For the attack parameters, we set the step size $1/\rho'$ and the regularization coefficient $\gamma$ to fixed values. In Figure 1, the x-axis represents the feature index and the y-axis represents the coefficient values. The left column shows the ground truth with two predefined feature groups. The middle column shows the feature grouping obtained after running the OSCAR algorithm in the noiseless setting. The right column shows how the grouping is adversely affected by our attack. We varied the noise budget $\epsilon$ from 0.05 to 0.3 to assess the effect of the attack. One can observe that although some grouped features are retained to a certain degree, the true effect of the attack can be seen in the features that are misaligned from their original feature groups, even for relatively small $\epsilon$.
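For reference, a hypothetical synthetic setup in the spirit of this section is sketched below, reusing the earlier sketches; the sizes, group structure, and parameter values are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 40                                    # illustrative, not the paper's sizes
base = rng.standard_normal((n, 2))                # one prototype column per group
X = np.hstack([np.repeat(base[:, [0]], p // 2, axis=1),   # group 1: identical columns
               np.repeat(base[:, [1]], p // 2, axis=1)])  # group 2: identical columns
beta_star = np.concatenate([np.full(p // 2, 1.0), np.full(p // 2, -1.0)])
rho = np.linalg.norm(X, 2) ** 2                   # inverse step size >= lambda_max(X^T X)
e = craft_adversarial_noise(X, beta_star, beta_star, lam1=0.1, lam2=0.01,
                            eps=0.3, gamma=1.0, rho=rho)
y = X @ beta_star + e                             # attacked measurements via (3)
```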
5 Conclusion and Future Work
To study the robustness of OWL-regularized regression methods in the adversarial setting, this paper proposes a novel formulation for finding norm-bounded adversarial perturbations in the measurement model and illustrates the pipeline of adversarial noise generation in the case of OSCAR with APO as its solver. The experimental results show that our proposed approach can effectively craft adversarial noises that severely degrade the regression performance in identifying ground-truth grouped features, even in the regime of small noise budgets. Our results indicate the potential risk that the tested regression method lacks robustness to adversarial noises. One possible extension of our approach is to devise adversary-resilient regression methods. Our future work also includes developing a generic framework for generating adversarial noises for the entire family of OWL-regularized regression methods.
References
 [1] I. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, D. WardeFarley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672–2680.
 [2] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” ICLR, arXiv preprint arXiv:1412.6572, 2015.
 [3] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” ICLR, arXiv preprint arXiv:1312.6199, 2014.
 [4] B. Biggio, I. Corona, D. Maiorca, B. Nelson, N. Šrndić, P. Laskov, G. Giacinto, and F. Roli, “Evasion attacks against machine learning at test time,” in Joint European conference on machine learning and knowledge discovery in databases, 2013, pp. 387–402.
 [5] B. Biggio and F. Roli, “Wild patterns: Ten years after the rise of adversarial machine learning,” arXiv preprint arXiv:1712.03141, 2017.
 [6] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial machine learning at scale,” ICLR, arXiv preprint arXiv:1611.01236, 2017.
 [7] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in IEEE Symposium on Security and Privacy (SP), 2017, pp. 39–57.
 [8] P.Y. Chen, Y. Sharma, H. Zhang, J. Yi, and C.J. Hsieh, “EAD: Elastic-net attacks to deep neural networks via adversarial examples,” AAAI, arXiv preprint arXiv:1709.04114, 2018.
 [9] T.W. Weng, H. Zhang, P.Y. Chen, J. Yi, D. Su, Y. Gao, C.J. Hsieh, and L. Daniel, “Evaluating the robustness of neural networks: An extreme value theory approach,” ICLR, arXiv preprint arXiv:1801.10578, 2018.
 [10] H. Chen, H. Zhang, P.Y. Chen, J. Yi, and C.J. Hsieh, “Show-and-fool: Crafting adversarial examples for neural image captioning,” arXiv preprint arXiv:1712.02051, 2017.
 [11] M. Cheng, J. Yi, H. Zhang, P.Y. Chen, and C.J. Hsieh, “Seq2sick: Evaluating the robustness of sequencetosequence models with adversarial examples,” arXiv preprint arXiv:1803.01128, 2018.
 [12] P.Y. Chen, H. Zhang, Y. Sharma, J. Yi, and C.J. Hsieh, “ZOO: Zeroth order optimization based blackbox attacks to deep neural networks without training substitute models,” in ACM Workshop on Artificial Intelligence and Security, 2017, pp. 15–26.
 [13] C.C. Tu, P. Ting, P.Y. Chen, S. Liu, H. Zhang, J. Yi, C.J. Hsieh, and S.M. Cheng, “Autozoom: Autoencoderbased zeroth order optimization method for attacking blackbox neural networks,” arXiv preprint arXiv:1805.11770, 2018.
 [14] S. Liu, B. Kailkhura, P.Y. Chen, P. Ting, S. Chang, and L. Amini, “Zerothorder stochastic variance reduction for nonconvex optimization,” arXiv preprint arXiv:1805.10367, 2018.
 [15] M. Bogdan, E. van den Berg, W. Su, and E. J. Candès, Statistical Estimation and Testing via the Ordered $\ell_1$ Norm. Stanford University, 2013.
 [16] X. Zeng and M. A. Figueiredo, “Decreasing weighted sorted $\ell_1$ regularization,” IEEE Signal Processing Letters, vol. 21, no. 10, pp. 1240–1244, 2014.
 [17] H. D. Bondell and B. J. Reich, “Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR,” Biometrics, vol. 64, no. 1, pp. 115–123, 2008.
 [18] M. Figueiredo and R. Nowak, “Ordered weighted $\ell_1$ regularized regression with strongly correlated covariates: Theoretical aspects,” in Artificial Intelligence and Statistics, 2016, pp. 930–938.
 [19] X. Zeng and M. A. Figueiredo, “The ordered weighted $\ell_1$ norm: Atomic formulation, projections, and algorithms,” arXiv preprint arXiv:1409.4271, 2014.
 [20] A. Beck and M. Teboulle, “A fast iterative shrinkage-thresholding algorithm for linear inverse problems,” SIAM Journal on Imaging Sciences, vol. 2, no. 1, pp. 183–202, 2009.
 [21] X. Zeng and M. A. Figueiredo, “Solving OSCAR regularization problems by fast approximate proximal splitting algorithms,” Digital Signal Processing, vol. 31, pp. 124–135, 2014.
 [22] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 267–288, 1996.