SGDLibrary: A MATLAB library for
stochastic gradient descent algorithms
Abstract
We consider the problem of finding the minimizer of a function of the form $f(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w)$. This problem has been studied intensively in recent years in the machine learning research field. One typical but promising approach for large-scale data is the stochastic optimization algorithm. SGDLibrary is a flexible, extensible and efficient pure-MATLAB library of a collection of stochastic optimization algorithms. The purpose of the library is to provide researchers and implementers with a comprehensive evaluation environment for those algorithms on various machine learning problems.
1 Introduction
This work aims to facilitate research on stochastic optimization for large-scale data. We particularly address a regularized empirical loss minimization problem defined as
$$\min_{w \in \mathbb{R}^d} f(w) := \frac{1}{n}\sum_{i=1}^{n} f_i(w), \qquad f_i(w) = L(w; x_i, y_i) + \lambda R(w), \qquad (1)$$
where $w \in \mathbb{R}^d$ represents the model parameter and $n$ denotes the number of samples. $L(w; x_i, y_i)$ is the loss function and $R(w)$ is the regularizer with the regularization parameter $\lambda \ge 0$. Widely diverse machine learning models fall into this problem. Considering $L(w; x_i, y_i) = \frac{1}{2}(w^\top x_i - y_i)^2$ and $R(w) = \|w\|_2^2$, this results in the $\ell_2$-norm regularized linear regression problem (a.k.a. ridge regression) for the $n$ training samples $(x_1, y_1), \ldots, (x_n, y_n)$. In the case of the binary classification problem with the desired class label $y_i \in \{+1, -1\}$ and $R(w) = \|w\|_1$, the $\ell_1$-norm regularized logistic regression (LR) problem is obtained as $f_i(w) = \log(1 + \exp(-y_i w^\top x_i)) + \lambda \|w\|_1$, which encourages the sparsity of the solution $w$. Other problems are matrix completion, support vector machines (SVM), and sparse principal component analysis, to name but a few.
Full gradient descent (GD) with a step size $\eta$ is the most straightforward approach for (1): its update is $w \leftarrow w - \eta \nabla f(w)$, where the full gradient reduces to $\nabla f(w) = \frac{1}{n}\sum_{i=1}^{n} \nabla f_i(w)$. However, this is expensive especially when $n$ is extremely large. In fact, one needs a sum of $n$ calculations of the inner product of $d$-dimensional vectors, leading to $\mathcal{O}(nd)$ cost overall per iteration. For this issue, a popular and effective alternative is the stochastic gradient update $w \leftarrow w - \eta v$, where $v$ is a random vector. A popular choice is to set $v = \nabla f_i(w)$ for the $i$-th sample chosen uniformly at random, which is called stochastic gradient descent (SGD). Its update rule is $w \leftarrow w - \eta \nabla f_i(w)$. SGD relies on an unbiased estimator of the full gradient, i.e., $\mathbb{E}_i[\nabla f_i(w)] = \nabla f(w)$. As the update rule clearly shows, the calculation cost is independent of $n$, resulting in $\mathcal{O}(d)$ per iteration. Minibatch SGD uses $v = \frac{1}{|\mathcal{S}|}\sum_{i \in \mathcal{S}} \nabla f_i(w)$, where $\mathcal{S}$ is a set of samples of size $|\mathcal{S}|$. Also, SGD needs a diminishing step-size algorithm to guarantee convergence, which causes a severely slow convergence rate. To accelerate this rate, there are two active research directions in machine learning: Variance reduction (VR) techniques [Johnson_NIPS_2013_s, Roux_NIPS_2012_s, Shalev_JMLR_2013_s, Defazio_NIPS_2014_s, Nguyen_ICML_2017] explicitly or implicitly exploit a full gradient estimation to reduce the variance of the noisy stochastic gradient, leading to superior convergence properties. We can regard this approach as a hybrid of GD and SGD. Another promising direction is to modify deterministic second-order algorithms into stochastic settings, which addresses the potential weakness of first-order algorithms on ill-conditioned problems. A direct extension of quasi-Newton (QN) is known as online BFGS [Schraudolph_AISTATS_2007_s]. Its variants include a regularized version (RES) [Mokhtari_IEEETranSigPro_2014], a limited-memory version (oLBFGS) [Schraudolph_AISTATS_2007_s, Mokhtari_JMLR_2015_s], stochastic QN (SQN) [Byrd_SIOPT_2016], incremental QN [Mokhtari_ICASSP_2017], and a nonconvex version [Wang_SIOPT_2017].
Lastly, hybrid algorithms of the stochastic QN algorithm with VR are proposed [Moritz_AISTATS_2016_s, Kolte_OPT_2015]. Others include [Duchi_JMLR_2011, Bordes_JMLR_2009, De_AISTATS_2017].
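To make the update rules above concrete, the following minimal Python sketch (SGDLibrary itself is written in MATLAB; all names here are hypothetical, for illustration only) implements minibatch SGD with a diminishing step size on a toy least-squares problem:

```python
import numpy as np

def sgd(grad_i, w0, n, step0=0.1, decay=0.01, epochs=20, batch=1, seed=0):
    """Minibatch SGD sketch: w <- w - eta_k * (1/|S|) * sum_{i in S} grad_i(w, i),
    with a diminishing step size eta_k = step0 / (1 + decay * k)."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for k in range(epochs * (n // batch)):
        idx = rng.integers(0, n, size=batch)              # minibatch S, drawn uniformly
        g = np.mean([grad_i(w, i) for i in idx], axis=0)  # unbiased gradient estimate
        w = w - step0 / (1.0 + decay * k) * g             # cost per step is O(d), not O(nd)
    return w

# Toy instance: f_i(w) = 0.5 * (w^T x_i - y_i)^2 (no regularizer)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w_hat = sgd(lambda w, i: (X[i] @ w - y[i]) * X[i], np.zeros(3), n=100)
```

Note the contrast with GD: each step touches only `batch` samples, at the price of a noisy direction that forces the step size to shrink over time.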
The performance of stochastic optimization algorithms is strongly influenced not only by the distribution of the data but also by the step-size algorithm. Therefore, results that differ completely from those reported in the original papers are commonly encountered in practice. Consequently, an evaluation framework to test and compare the algorithms at hand is crucially important for fair and comprehensive experiments. SGDLibrary is a flexible, extensible and efficient pure-MATLAB library of a collection of stochastic optimization algorithms. The purpose of the library is to provide researchers and implementers with a collection of state-of-the-art stochastic optimization algorithms that solve a variety of large-scale optimization problems. SGDLibrary provides easy access to many solvers and problem examples including, for example, linear/nonlinear regression problems and classification problems. It also provides visualization tools to show classification results and convergence behaviors. To the best of our knowledge, no report in the literature describes a comprehensive experimental environment specialized for stochastic optimization algorithms. The code is available at https://github.com/hiroyukikasai/SGDLibrary.
2 Software architecture
The software architecture of SGDLibrary follows a typical module-based architecture, which separates the problem descriptor and the optimization solver. To use the library, the user selects one problem descriptor of interest and one or more optimization solvers to be compared.
Problem descriptor:
The problem descriptor, denoted as problem, specifies the problem of interest with respect to the parameter $w$, noted as w in the library. The user does nothing other than calling a problem definition function, for instance, logistic_regression() for the $\ell_2$-norm regularized LR problem. Each problem definition includes the functions necessary for solvers: (i) the (full) cost function $f(w)$, (ii) the minibatch stochastic derivative $\nabla f_{\mathcal{S}}(w)$ for a set of samples $\mathcal{S}$, which is denoted as indices, (iii) the stochastic Hessian $\nabla^2 f_{\mathcal{S}}(w)$ for indices, and (iv) the stochastic Hessian-vector product $\nabla^2 f_{\mathcal{S}}(w) v$ for a vector v and indices. The built-in problems include $\ell_2$-norm regularized multidimensional linear regression, $\ell_2$-norm regularized linear SVM, $\ell_2$-norm regularized LR, $\ell_2$-norm regularized softmax classification (multinomial LR), $\ell_1$-norm multidimensional linear regression, and $\ell_1$-norm LR. The problem descriptor also provides additional problem-specific functions. For example, the LR problem includes the prediction function and the classification accuracy calculation function.
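To illustrate the descriptor pattern, here is a minimal Python analogue (the library's descriptors are MATLAB structs; the function and field names below are hypothetical) bundling the four functions listed above for an $\ell_2$-norm regularized linear regression problem:

```python
import numpy as np

def linear_regression_problem(X, y, lam):
    """Bundle of the four functions a solver needs, for
    f_i(w) = 0.5 * (w^T x_i - y_i)^2 + 0.5 * lam * ||w||_2^2."""
    n, d = X.shape

    def cost(w):                      # (i) full cost f(w)
        r = X @ w - y
        return 0.5 * np.mean(r ** 2) + 0.5 * lam * (w @ w)

    def grad(w, indices):             # (ii) minibatch stochastic gradient
        Xs, ys = X[indices], y[indices]
        return Xs.T @ (Xs @ w - ys) / len(indices) + lam * w

    def hess(w, indices):             # (iii) stochastic Hessian (constant in w here)
        Xs = X[indices]
        return Xs.T @ Xs / len(indices) + lam * np.eye(d)

    def hess_vec(w, v, indices):      # (iv) stochastic Hessian-vector product
        Xs = X[indices]
        return Xs.T @ (Xs @ v) / len(indices) + lam * v

    return {"cost": cost, "grad": grad, "hess": hess, "hess_vec": hess_vec,
            "dim": d, "samples": n}
```

A solver then needs only this bundle, never the raw data, which is exactly what decouples problems from solvers.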
Optimization solver:
The optimization solver implements the main routine of the stochastic optimization algorithm. Once the optimization solver function is called with one selected problem descriptor problem as the first argument, it solves the optimization problem by calling the corresponding functions via problem, such as the cost function and the stochastic gradient calculation function. Calling the solver function with the selected problem mutually binds the two. The optimization solvers supported in the library are listed below by category:
(i) SGD methods: Vanilla SGD [Robbins_MathStat_1951], SGD with classical momentum, SGD with classical momentum with Nesterov's accelerated gradient [Sutskever_ICML_2013], AdaGrad [Duchi_JMLR_2011], RMSProp, AdaDelta, Adam, AdaMax,
(ii) Variance reduction (VR) methods: SVRG [Johnson_NIPS_2013_s], SAG [Roux_NIPS_2012_s], SAGA [Defazio_NIPS_2014_s], SARAH [Nguyen_ICML_2017],
(iii) Second-order methods: SQN [Byrd_SIOPT_2016], oBFGS-Inf [Schraudolph_AISTATS_2007_s, Mokhtari_JMLR_2015_s], oBFGS-Lim (oLBFGS) [Schraudolph_AISTATS_2007_s, Mokhtari_JMLR_2015_s], Reg-oBFGS-Inf (RES) [Mokhtari_IEEETranSigPro_2014], Damp-oBFGS-Inf [Wang_SIOPT_2017],
(iv) Second-order methods with VR: SVRG-LBFGS [Kolte_OPT_2015], SS-SVRG [Kolte_OPT_2015], SVRG-SQN [Moritz_AISTATS_2016_s],
(v) Else: BB-SGD [De_AISTATS_2017], SVRG-BB [Tan_NIPS_2016].
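As an example of the VR family in (ii), the following Python sketch implements the basic SVRG recursion (an illustration with hypothetical names, not the library's svrg.m): each outer stage snapshots the iterate and its full gradient, and the inner steps use a variance-reduced direction whose noise vanishes as the iterates converge, allowing a constant step size.

```python
import numpy as np

def svrg(grad_i, full_grad, w0, n, eta=0.01, outer=10, seed=0):
    """Basic SVRG: snapshot w_tilde and its full gradient mu per outer stage;
    inner steps use the direction grad_i(w) - grad_i(w_tilde) + mu."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(outer):
        w_tilde = w.copy()
        mu = full_grad(w_tilde)                            # one full-gradient pass per stage
        for _ in range(n):                                 # inner loop of length n
            i = int(rng.integers(n))
            v = grad_i(w, i) - grad_i(w_tilde, i) + mu     # unbiased, shrinking variance
            w = w - eta * v                                # a constant step size suffices
    return w

# Toy least-squares instance: f_i(w) = 0.5 * (w^T x_i - y_i)^2
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w_hat = svrg(lambda w, i: (X[i] @ w - y[i]) * X[i],
             lambda w: X.T @ (X @ w - y) / 100,
             np.zeros(3), n=100)
```

This is the sense in which VR methods are hybrids of GD and SGD: one full gradient per stage, cheap stochastic steps in between.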
The solver also receives optional parameters as the second argument, which forms a struct designated as options in the library. It contains elements such as the maximum number of epochs, the minibatch size, and the step-size algorithm with an initial step size. Finally, the solver returns to the caller the final solution w and rich statistical information, such as the histories of the cost function value, the optimality gap, and the number of gradient calculations.
Others:
SGDLibrary accommodates a user-defined step-size algorithm. This is achieved by setting options.stepsizefun = @my_stepsize_alg, which is delivered to the solvers. Additionally, when the regularizer $R(w)$ in the minimization problem (1) is non-smooth, such as the $\ell_1$-norm regularizer $\|w\|_1$, the solver calls the proximal operator as problem.prox(w, stepsize), which is a wrapper function defined in each problem. The $\ell_1$-norm regularized LR problem, for example, calls the soft-thresholding function as w = prox(w, stepsize) = soft_thresh(w, stepsize*lambda), where stepsize is the step size and lambda is the regularization parameter $\lambda$.
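For reference, the soft-thresholding operator, which is the proximal operator of the scaled $\ell_1$-norm, takes only a few lines; this Python sketch is a hypothetical analogue of the library's soft_thresh:

```python
import numpy as np

def soft_thresh(w, t):
    """Proximal operator of t * ||w||_1: elementwise soft-thresholding,
    prox(w)_j = sign(w_j) * max(|w_j| - t, 0)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

# One proximal-gradient step for an l1-regularized problem then reads:
#   w = soft_thresh(w - stepsize * grad_smooth(w), stepsize * lam)
w_new = soft_thresh(np.array([3.0, -0.5, 1.0]), 1.0)  # entries below the threshold become 0
```

Entries whose magnitude falls below the threshold are set exactly to zero, which is what produces sparse solutions.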
3 Tour of the SGDLibrary
We embark on a tour of SGDLibrary, exemplifying the $\ell_1$-norm regularized LR problem. The LR model generates $n$ pairs $(x_i, y_i)$ for an (unknown) model parameter $w$, where $x_i \in \mathbb{R}^d$ is an input $d$-dimensional feature vector and $y_i \in \{+1, -1\}$ is the binary class label, as $y_i = +1$ with probability $1/(1+\exp(-w^\top x_i))$ and $y_i = -1$ otherwise. Then, the problem seeks the unknown parameter $w$ that fits the regularized LR model to the generated data $(x_i, y_i)$ for $i = 1, \ldots, n$. This problem is cast as the minimization problem $\min_w f(w) := \frac{1}{n}\sum_{i=1}^{n} \log(1 + \exp(-y_i w^\top x_i)) + \lambda \|w\|_1$. The code for this particular problem is given in Listing 1.
First, we generate the train/test datasets d using logistic_regression_data_generator(), where $x_i \in \mathbb{R}^d$ is the input feature vector and $y_i \in \{+1, -1\}$ is its class label. The LR problem is then defined by calling logistic_regression(), which internally contains the functions for the cost value, the gradient and the Hessian. This descriptor is stored in problem. Then, we execute the optimization solvers, i.e., SGD and SVRG, by calling the solver functions sgd() and svrg() with problem and options after setting some options in the options struct. They return the final solutions {w_sgd, w_svrg} and the statistics information {info_sgd, info_svrg}. Finally, display_graph() visualizes the behavior of the cost function values in terms of the number of gradient evaluations. It is noteworthy that each algorithm requires a different number of evaluations of samples in each epoch; therefore, it is common to use this value, instead of the number of iterations, to evaluate the algorithms. An illustrative result is presented in Figure 1(a). Figures 1(b) and 1(c) are generated, respectively, by the same display_graph() and by display_classification_result(), which is specialized for classification problems. Thus, SGDLibrary also provides rich visualization tools.
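The workflow of Listing 1 can be mirrored outside MATLAB. The following self-contained Python sketch (all names are hypothetical; this is not SGDLibrary's API) generates data from the logistic model above, defines the $\ell_1$-regularized LR cost, and runs proximal minibatch SGD:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 200, 3, 0.01

# Generate (x_i, y_i) from the logistic model with a hidden parameter w_star
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.where(rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_star)), 1.0, -1.0)

def cost(w):        # f(w) = (1/n) sum_i log(1 + exp(-y_i w^T x_i)) + lam * ||w||_1
    return np.mean(np.log1p(np.exp(-y * (X @ w)))) + lam * np.abs(w).sum()

def grad(w, idx):   # minibatch gradient of the smooth (logistic) part
    Xs, ys = X[idx], y[idx]
    return Xs.T @ (-ys / (1.0 + np.exp(ys * (Xs @ w)))) / len(idx)

def soft_thresh(w, t):  # proximal operator of t * ||.||_1
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w, step, batch = np.zeros(d), 0.5, 10
initial_cost = cost(w)
for k in range(30 * (n // batch)):                 # 30 passes over the data
    idx = rng.integers(0, n, size=batch)
    w = soft_thresh(w - step * grad(w, idx), step * lam)
final_cost = cost(w)
```

Tracking `cost(w)` against the cumulative number of gradient evaluations, rather than iterations, reproduces the kind of convergence plot shown in Figure 1(a).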
Appendix A Supported stochastic optimization solvers
Table A summarizes the supported stochastic optimization algorithms and configurations.
algorithm name  | solver         | sub_mode    | other options        | reference
----------------|----------------|-------------|----------------------|----------
SGD             | sgd.m          | --          | --                   | [Robbins_MathStat_1951]
SGD-CM          | sgd_cm.m       | CM          | --                   | --
SGD-CM-NAG      | sgd_cm.m       | CM-NAG      | --                   | [Sutskever_ICML_2013]
AdaGrad         | adagrad.m      | AdaGrad     | --                   | [Duchi_JMLR_2011]
RMSProp         | adagrad.m      | RMSProp     | --                   | [Tieleman_2012]
AdaDelta        | adagrad.m      | AdaDelta    | --                   | [Zeiler_arXiv_2012]
Adam            | adam.m         | Adam        | --                   | [Kingma_ICLR_2015]
AdaMax          | adam.m         | AdaMax      | --                   | [Kingma_ICLR_2015]
SVRG            | svrg.m         | --          | --                   | [Johnson_NIPS_2013_s]
SAG             | sag.m          | SAG         | --                   | [Roux_NIPS_2012_s]
SAGA            | sag.m          | SAGA        | --                   | [Defazio_NIPS_2014_s]
SARAH           | sarah.m        | --          | --                   | [Nguyen_ICML_2017]
SARAH-Plus      | sarah.m        | Plus        | --                   | [Nguyen_ICML_2017]
SQN             | slbfgs.m       | SQN         | --                   | [Byrd_SIOPT_2016]
oBFGS-Inf       | obfgs.m        | Inf-mem     | --                   | [Schraudolph_AISTATS_2007_s]
oLBFGS-Lim      | obfgs.m        | Lim-mem     | --                   | [Schraudolph_AISTATS_2007_s, Mokhtari_JMLR_2015_s]
Reg-oBFGS-Inf   | obfgs.m        | Inf-mem     | delta                | [Mokhtari_IEEETranSigPro_2014]
Damp-oBFGS-Inf  | obfgs.m        | Inf-mem     | delta & damped=true  | [Wang_SIOPT_2017]
IQN             | iqn.m          | --          | --                   | [Mokhtari_ICASSP_2017]
SVRG-SQN        | slbfgs.m       | SVRG-SQN    | --                   | [Moritz_AISTATS_2016_s]
SVRG-LBFGS      | slbfgs.m       | SVRG-LBFGS  | --                   | [Kolte_OPT_2015]
SS-SVRG         | subsamp_svrg.m | --          | --                   | [Kolte_OPT_2015]
BB-SGD          | bb_sgd.m       | --          | --                   | [De_AISTATS_2017]
SVRG-BB         | svrg_bb.m      | --          | --                   | [Tan_NIPS_2016]