Least Angle Regression in Tangent Space and LASSO for Generalized Linear Model

Abstract

We propose sparse estimation methods for the generalized linear models, which run either Least Angle Regression (LARS) or the Least Absolute Shrinkage and Selection Operator (LASSO) in the tangent space of the manifold of the statistical model. Our approach is to roughly approximate the statistical model and then to use exact calculations. LARS was proposed as an efficient algorithm for parameter estimation and variable selection for the normal linear model. The LARS algorithm is described in terms of Euclidean geometry, with the correlation regarded as the metric of the parameter space. Since the LARS algorithm works only in Euclidean space, we map the manifold of the statistical model to the tangent space at the origin. In the generalized linear regression, this transformation allows us to run the original LARS algorithm for the generalized linear models. The proposed methods are efficient and perform well. Real-data analysis shows that the proposed methods output results similar to those of the $\ell_1$-regularized maximum likelihood estimation for the generalized linear models. Numerical experiments show that our methods work well and that they can be better than the $\ell_1$-regularization in generalization, parameter estimation, and model selection.
Keywords: Exponential family, Generalized linear regression, Information geometry, Sparse modelling

1 Introduction

We propose sparse estimation methods for the generalized linear models (GLM). One of the proposed methods is based on Least Angle Regression (LARS) [6] and is described in terms of information geometry. The main features of our approach are i) we use an approximation of a statistical model rather than the statistical model itself, and ii) the proposed methods are calculated exactly, which allows us to compute the estimators efficiently. In the literature, a few extensions of LARS based on information geometry/Riemannian geometry/differential geometry have been proposed: for example, [9] and [4]. The existing methods take advantage of the dual structure of a model manifold, which incurs substantial computational cost. Our method utilizes only part of the dual structure and runs the original LARS algorithm in the tangent space, which enables us to compute the estimator easily. Furthermore, we show that the Least Absolute Shrinkage and Selection Operator (LASSO) [14] for the normal linear model is also available in the tangent space.

Over the past two decades, sparse modeling has been investigated extensively. LASSO is a representative method and has motivated many researchers in statistics, machine learning, and other fields. LASSO was proposed as an estimation and variable-selection method for the normal linear model. It minimizes the least squares criterion with an $\ell_1$ penalty controlled by a tuning parameter. Various generalizations have been proposed for other problems. For example, [13] and [15] treat the generalized linear regression and Gaussian graphical models, respectively. See also [8].

LARS was proposed for the same problem as LASSO. The LARS algorithm is very efficient, and it can also compute the LASSO estimator with a minor modification. The LARS algorithm uses only the correlation coefficients between the response and the explanatory variables. Therefore, the algorithm is described in terms of Euclidean geometry.

Information geometry is a Riemannian-geometrical framework for statistics and other fields [2, 3, 1, 5]. In this framework, we treat a statistical model as a Riemannian manifold and take advantage of its geometrical properties for estimation, testing, and other tasks. Each probability distribution is treated as a point in the manifold. For example, the estimation problem of the generalized linear regression can be described in terms of the geometry. The GLM is treated as a manifold and an estimator assigns a point in the manifold to observed data. The maximum likelihood estimator (MLE) corresponds to a kind of projection onto the manifold.

Some extensions of LARS have been proposed based on the information geometry of the exponential family of distributions. [9] and [4] proposed different extensions of LARS, which take advantage of the dual structure of the model manifold. Their works are theoretically natural and can be extended to models other than the GLM [10, 11]. However, the existing methods need many iterations of approximate computation, which is inevitable when treating objects more complicated than Euclidean space. For example, [4] treated many tangent spaces, each of which corresponds to an estimate, while our methods use only one tangent space. [4] wrote that their "DGLARS method may be computationally more expensive than other customized techniques" such as the L1-regularization method. One of our aims is to provide a method as efficient as the L1-regularization for the GLM. Note again that our approach is different from that of the existing methods: we roughly approximate the model manifold by the tangent space and use the exact computation of LARS in the tangent space. This approximation is natural from the viewpoint of information geometry. The usefulness of our idea is validated by numerical experiments. One advantage of our methods is that they do not require additional implementation because we can use existing packages.

In Section 2, we introduce our problem and the related works. In Section 3, we propose a sparse estimation method based on LARS; furthermore, LASSO-type estimators are also proposed. In Section 4, we compare our methods with the $\ell_1$-regularization for the GLM by performing numerical experiments. Section 5 concludes. Lemmas are given and proved in Appendix A.

2 Problem and related method

In subsection 2.1, we formulate the problem and introduce our notation. In subsections 2.2 and 2.3, we briefly describe the LARS algorithm and the LASSO estimators, respectively.

2.1 Problem and notation

In this paper, we consider the generalized linear regression, which is an estimation problem of the GLM [12]. In the generalized linear regression, the expectation of a response is represented by a linear combination of explanatory variables as

$g(\mu_i) = x_i^\top \beta, \qquad i = 1, \dots, n,$

where $g$ is called a link function, $n$ is the sample size, $p$ is the number of explanatory variables, and $\beta \in \mathbb{R}^p$ is the parameter to be estimated. Let $X = (x_1, \dots, x_n)^\top$ be the design matrix, which is an $n \times p$ matrix. Let $y$ and $\mu$ be the response vector and its expectation, respectively, which are column vectors of length $n$. In general, the link function is a function of $\mu_i$ and is not determined uniquely. However, in this paper, we only use the canonical link function, which yields useful properties of the GLM.

In terms of probability distributions, the problem above corresponds to estimation for an exponential family of distributions, that is, the GLM,

$p(y \mid X, \beta) = \exp\left\{ \sum_{i=1}^n \left( y_i\, x_i^\top \beta - \psi(x_i^\top \beta) \right) \right\}$ (with respect to an appropriate base measure),

where $\psi$ is called a potential function.

As a special case, the normal linear regression uses the identity link function $g(\mu) = \mu$ and a quadratic function as the potential function. Another example is the logistic regression, where the link function is $g(\mu) = \log\{\mu/(1-\mu)\}$ and the potential function is $\psi(\theta) = \log(1 + e^{\theta})$.

Throughout the paper, we assume that the design matrix is normalized, that is, each column vector of $X$ has mean zero and $\ell_2$-norm one: $\sum_{i=1}^n x_{ij} = 0$ and $\sum_{i=1}^n x_{ij}^2 = 1$ for $j = 1, \dots, p$. Furthermore, we assume that the column vectors of $X$ are linearly independent.
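For reproducibility, the following is a minimal R sketch of this normalization (the function name normalize_design is ours, for illustration only): it centers each column and rescales it to unit $\ell_2$-norm.

    ## Minimal sketch: center each column and scale it to unit l2-norm.
    normalize_design <- function(X) {
      X <- scale(X, center = TRUE, scale = FALSE)       # column means become zero
      sweep(X, 2, sqrt(colSums(X^2)), FUN = "/")        # column l2-norms become one
    }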

2.2 LARS

We briefly describe the LARS algorithm. In subsection 3.2, we use the LARS algorithm to propose an estimation method. Details and further discussion of LARS can be found in, for example, [6] and [8].

LARS was proposed as an algorithm for parameter estimation and variable selection in the normal linear regression. In the LARS algorithm, the estimator moves from the origin to the maximum likelihood estimate (MLE) of the full model, that is, the linear model including all the explanatory variables. The MLE $\hat\beta$ is determined by the design matrix $X$ and the response $y$. The detailed algorithm of LARS is shown in Algorithm 1, where $\hat\beta^{(t)}$ is the $t$-th estimate the algorithm outputs. After $p$ iterations, LARS outputs a sequence of estimates $\hat\beta^{(1)}, \dots, \hat\beta^{(p)}$.

Data: the design matrix $X$ and the response vector $y$
Result: the sequence of the LARS estimates $\hat\beta^{(1)}, \dots, \hat\beta^{(p)}$
Initialization: $\hat\mu^{(0)} = 0$, $t = 1$
while $t \le p$ do
       Calculate the correlations and the active set of the indices:
$\hat c = X^\top (y - \hat\mu^{(t-1)}), \quad \hat C = \max_j |\hat c_j|, \quad A = \{ j : |\hat c_j| = \hat C \}.$
Using $X_A = (\cdots\, \mathrm{sign}(\hat c_j)\, x^{(j)} \cdots)_{j \in A}$, define a bisector of an angle and others:
$u_A = X_A w_A, \quad w_A = A_A (X_A^\top X_A)^{-1} 1_A, \quad A_A = \{ 1_A^\top (X_A^\top X_A)^{-1} 1_A \}^{-1/2}, \quad a = X^\top u_A.$
Define the next estimate as
$\hat\mu^{(t)} = \hat\mu^{(t-1)} + \hat\gamma\, u_A$ with $\hat\gamma = \min^{+}_{j \in A^c} \left\{ \frac{\hat C - \hat c_j}{A_A - a_j},\; \frac{\hat C + \hat c_j}{A_A + a_j} \right\},$
where $\min^{+}$ denotes the minimum over positive components.
       Set $\hat\beta^{(t)}$ to the coefficient vector with $X\hat\beta^{(t)} = \hat\mu^{(t)}$ and $t \leftarrow t + 1$.
end while
Algorithm 1 The Least Angle Regression (LARS) algorithm

The idea of the LARS algorithm is illustrated in Figure 1. Figures 1(a) and 1(b) show the move of the estimator and the move of the residual, respectively, in the parameter space when $p = 2$. The estimator i) selects the coordinate axis that makes the least angle with the current residual, and ii) then moves along the bisector of the angle between the selected axes. The LARS algorithm is described in terms of Euclidean geometry and can be computed efficiently. Furthermore, the correlation $X^\top y$ plays an important role in the LARS algorithm, which is one of our motivations for considering the tangent space of a statistical model.
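As a concrete illustration of the bisector step, the following hedged R sketch computes the equiangular direction for a given active set, following the formulas of Efron et al. [6]; the function and variable names are ours and the sketch is not part of the original algorithm description.

    ## Sketch: the equiangular ("bisector") direction for an active set A,
    ## following Efron et al. [6]. X: normalized design matrix, r: current residual.
    equiangular_direction <- function(X, r, A) {
      s   <- sign(drop(crossprod(X[, A, drop = FALSE], r)))  # signs of current correlations
      XA  <- sweep(X[, A, drop = FALSE], 2, s, FUN = "*")
      GAi <- solve(crossprod(XA))                             # (X_A^T X_A)^{-1}
      one <- rep(1, length(A))
      aa  <- 1 / sqrt(drop(t(one) %*% GAi %*% one))
      w   <- aa * drop(GAi %*% one)
      drop(XA %*% w)   # unit-norm vector making equal angles with the active columns
    }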

(a) The move of the estimator
(b) The move of the residual
Figure 1: The LARS algorithm when there are two explanatory variables. The parameter space is $\mathbb{R}^2$. $\hat\beta$ is the MLE of the full model. In this example, the first variable is selected at the first iteration. The first estimate is $\hat\beta^{(1)}$ and its second element is zero. The second estimate is $\hat\beta^{(2)} = \hat\beta$. In Figure 1(a), the estimator moves along the bisector of an angle from $\hat\beta^{(1)}$ to the second estimate $\hat\beta^{(2)}$. Figure 1(b) is another interpretation of the LARS algorithm: the residual moves from $\hat\beta$ to the origin.

2.3 LASSO

LASSO is an optimization-based method for parameter estimation and variable selection in the normal linear regression. It solves the minimization problem

$\hat\beta(\lambda) = \operatorname*{argmin}_{\beta} \left\{ \frac{1}{2} \| y - X\beta \|_2^2 + \lambda \| \beta \|_1 \right\},$

where $\lambda \ge 0$ is a tuning parameter. The path of the LASSO estimator as $\lambda$ varies can be computed by the LARS algorithm with a minor modification.

LASSO can be applied to the GLM as the $\ell_1$-regularized MLE, which is the minimization problem

$\hat\beta(\lambda) = \operatorname*{argmin}_{\beta} \left\{ -\ell(\beta) + \lambda \| \beta \|_1 \right\}, \qquad (1)$

where $\ell(\beta)$ is the log-likelihood. For example, see [13].
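As an illustration of (1), the following hedged R sketch fits the $\ell_1$-regularized logistic regression with the glmnet package [7]; the data are simulated only for this example.

    ## Sketch: l1-regularized logistic regression, problem (1), via glmnet [7].
    library(glmnet)
    set.seed(1)
    n <- 100; p <- 5
    X <- matrix(rnorm(n * p), n, p)
    beta_true <- c(2, -2, 1, 0, 0)
    y <- rbinom(n, 1, plogis(drop(X %*% beta_true)))
    fit_l1 <- glmnet(X, y, family = "binomial", alpha = 1)  # alpha = 1: lasso penalty
    coef(fit_l1, s = 0.05)                                   # coefficients at one lambda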

3 The proposed methods

Our main idea is to run the LARS algorithm in the tangent space of the model manifold. The idea is very simple; however, it works well, as Sections 3 and 4 show.

In subsection 3.1, we introduce the information geometry we use in this paper. In subsection 3.2, we propose LARS in tangent space, which is an extension of the original LARS to the GLM. The proposed method is identical to the original LARS when applied to the normal linear model. Subsection 3.3 is a remark on the relation between the KL divergence and the correlation matrix. In subsection 3.4, we propose other methods which are related to LASSO. Subsection 3.5 explains the difference between the proposed methods and the existing methods.

3.1 Information geometry

We introduce some tools from information geometry, including the model manifold, the tangent space, and the exponential map (Figure 2). This is a brief introduction; for details, see [2, 3, 1, 5].

In the generalized linear regression, we need to select one distribution from the GLM. A model manifold is a manifold consisting of the probability distributions of interest. That is, the model manifold is $\mathcal{M} = \{ p_\beta : \beta \in \mathbb{R}^p \}$, where $p_\beta$ indicates the probability distribution with the regression coefficient $\beta$. The parameter $\beta$ works as a coordinate system in $\mathcal{M}$.

The tangent space at a point $p_\beta$ is a linear space consisting of directional derivatives, that is, $T_\beta\mathcal{M} = \{ \sum_{j} v_j \partial_j \}$, where $\partial_j = \partial/\partial\beta_j$. We consider the tangent space at $p_0$. For simplicity, we call $p_0$ and $T_0\mathcal{M}$ the origin and the tangent space at the origin, respectively.

Any pair of vectors in $T_\beta\mathcal{M}$ has an inner product. The inner product is determined by the Fisher information matrix $G(\beta) = (g_{jk}(\beta))$:

$g_{jk}(\beta) = \mathrm{E}_\beta\!\left[ \partial_j \ell(\beta)\, \partial_k \ell(\beta) \right],$

where $\ell(\beta)$ is the log-likelihood. Using the Fisher metric $G$, the inner product of $u = \sum_j u_j \partial_j$ and $v = \sum_k v_k \partial_k$ is given by

$\langle u, v \rangle_\beta = \sum_{j,k} u_j v_k\, g_{jk}(\beta) = u^\top G(\beta)\, v.$

In the generalized linear regression, the Fisher metric at $p_0$ is proportional to the correlation matrix of the explanatory variables. That is, $G(0) = c\, X^\top X$ for some $c > 0$. For details, see Lemma 2 in subsection A.1.

A point in the tangent space can be identified with a point in $\mathcal{M}$ via an exponential map. We introduce the e-exponential map $\mathrm{Exp}$ defined as follows. For $v \in T_0\mathcal{M}$, let $\mathrm{Exp}(v) = p_\beta$ with $v = \sum_j \beta_j \partial_j$. Our problem in this paper is estimation for the GLM and the parameter is a regression coefficient vector $\beta$; therefore, we can avoid the technical difficulties of an exponential map. The map $\mathrm{Exp}$ is a bijection from $T_0\mathcal{M}$ to $\mathcal{M}$. For details, see subsection A.3.

(a) The standard flatness perspective
(b) The e-connection perspective
Figure 2: A statistical manifold and the tangent space at the origin. The white surface is $\mathcal{M}$ and the gray plane is $T_0\mathcal{M}$. $\mathcal{M}$ is curved from the standard perspective while it is flat from the e-connection perspective. A point in $T_0\mathcal{M}$ corresponds to a point in $\mathcal{M}$ through the e-exponential map. Furthermore, a curve (an e-geodesic, strictly) in $\mathcal{M}$ corresponds to a line in $T_0\mathcal{M}$. The former is a broken line and the latter is a solid line in the figure.

For readers familiar with information geometry, we make an additional remark. The model manifold $\mathcal{M}$ of the GLM is e-flat and the regression coefficient $\beta$ is an e-affine coordinate system of $\mathcal{M}$. $\{\partial_1, \dots, \partial_p\}$ is the natural basis of $T_0\mathcal{M}$ with respect to the coordinate system $\beta$. Each coordinate axis of $\beta$ in $\mathcal{M}$ corresponds to the corresponding axis in $T_0\mathcal{M}$ via the e-exponential map.

In the following, we also use another representation of a tangent vector, which is useful for our purpose: we identify $v = \sum_j \beta_j \partial_j \in T_0\mathcal{M}$ with its coefficient vector $\beta$. In our notation, $\beta$ thus also indicates a vector in the tangent space $T_0\mathcal{M}$, not only a point $p_\beta$ in $\mathcal{M}$. However, we believe that this is not confusing because a vector in the tangent space and a point in $\mathcal{M}$ are identified through the exponential map.

3.2 LARS in tangent space

The main idea of the proposed method is to run LARS in the tangent space at the origin. First, we identify the model manifold with the tangent space via the e-exponential map. After this identification, our computation is done by the original LARS algorithm. However, we do not use the response $y$ directly; we introduce a virtual response $\tilde y$. The LARS algorithm outputs a sequence of parameter estimates whose length is the same as the dimension of the parameter. Finally, the estimates are mapped to the model manifold.

Before running the original LARS algorithm, we introduce the virtual response $\tilde y$. The virtual response is defined using the design matrix $X$ and the MLE $\hat\beta$ of the full model: $\tilde y = X \hat\beta$. Note that LARS uses only the correlations between the response and the explanatory variables in the form of $X^\top \tilde y$, which is identical with $X^\top X \hat\beta$. Therefore, instead of an explicit representation of the response, we only need the MLE $\hat\beta$ through $\tilde y = X\hat\beta$.

In the estimation step of the proposed method, we run the original LARS algorithm in the tangent space as if the response were $\tilde y$. LARS outputs a sequence of estimates of the model parameter. As is shown in Figure 1, the LARS estimator can be regarded as moving from the origin to the MLE of the full model. At the same time, however, the residual of the estimator is moving from the MLE to the origin (Figure 1(b)). The latter perspective is useful for our method because it allows us to fix the estimator's tangent space to the origin: what moves is the residual $\hat\beta - \hat\beta^{(t)}$, not the estimator $\hat\beta^{(t)}$. Note that Algorithm 1 in subsection 2.2 is actually described from the latter perspective.

LARS in Tangent space (TLARS)

LARS in Tangent space (TLARS) is given as follows:

  1. Calculate the MLE $\hat\beta$ of the full model.

  2. Run the LARS algorithm for the design matrix $X$ and the virtual response $\tilde y = X\hat\beta$.

  3. Using the sequence $\hat\beta^{(1)}, \dots, \hat\beta^{(p)}$ made by LARS, the result is the sequence of distributions $p_{\hat\beta^{(1)}}, \dots, p_{\hat\beta^{(p)}}$ in $\mathcal{M}$.

As a special case, the proposed method coincides with the original LARS when we consider the normal linear regression. Note that TLARS is as computationally efficient as LARS although TLARS solves the estimation problem of the GLM. Furthermore, we can use existing packages of LARS for the computation of TLARS, as illustrated in the sketch below.
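The following hedged R sketch illustrates the three TLARS steps for a logistic regression; the simulated data, the no-intercept setting, and the variable names are our illustrative choices, not prescriptions of the method.

    ## Hedged sketch of TLARS for logistic regression.
    library(lars)
    set.seed(1)
    n <- 200; p <- 5
    X <- matrix(rnorm(n * p), n, p)
    X <- scale(X, center = TRUE, scale = FALSE)
    X <- sweep(X, 2, sqrt(colSums(X^2)), FUN = "/")          # normalized design matrix
    y <- rbinom(n, 1, plogis(drop(X %*% c(8, -8, 5, 0, 0))))
    ## Step 1: MLE of the full model (canonical link, no intercept in this illustration).
    beta_mle <- coef(glm(y ~ X - 1, family = binomial()))
    ## Step 2: run the original LARS algorithm with the virtual response.
    y_virtual <- drop(X %*% beta_mle)
    fit_tlars <- lars(X, y_virtual, type = "lar", intercept = FALSE, normalize = FALSE)
    ## Step 3: each row of coef(fit_tlars) is a TLARS estimate, i.e., a point of the model manifold.
    coef(fit_tlars)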

3.3 KL divergence and correlation

The Kullback-Leibler divergence (KL divergence) is a key quantity in information geometry, which is also important in statistics, machine learning, and information theory. For the GLM, the KL divergence between $p_\beta$ and $p_{\beta'}$ is given by

$D(\beta \,\|\, \beta') = \sum_{i=1}^n \left\{ \psi(x_i^\top \beta') - \psi(x_i^\top \beta) - (x_i^\top \beta' - x_i^\top \beta)\, \mu_i(\beta) \right\},$

where $\mu_i(\beta) = \psi'(x_i^\top \beta)$ is the expectation parameter of the exponential family. The KL divergence is approximated up to second order as

$D(\beta \,\|\, \beta') \approx \frac{1}{2} (\beta' - \beta)^\top G(\beta) (\beta' - \beta),$

where $G(\beta)$ is the Fisher information matrix.

In the generalized linear regression, the Fisher metric at the origin is proportional to the correlation matrix $X^\top X$, that is, $G(0) = c\, X^\top X$ for some $c > 0$ (see Appendix A.1). The KL divergence is therefore approximately related to the correlation matrix as

$D(0 \,\|\, \beta) \approx \frac{c}{2}\, \beta^\top X^\top X \beta. \qquad (2)$

In the proposed method, we used the quantity

$X^\top (\tilde y - X \hat\beta^{(t)}) = X^\top X (\hat\beta - \hat\beta^{(t)}). \qquad (3)$

Eq. (2) implies that the correlation (3) is, up to the constant $c$, the inner product in $T_0\mathcal{M}$ of the residual $\hat\beta - \hat\beta^{(t)}$ and each basis vector $\partial_j$, and that these quantities approximately correspond to a triangle in $\mathcal{M}$ whose squared side lengths are measured by the KL divergence.

3.4 LASSO in tangent space

We propose two estimation methods. One is the LASSO modification of TLARS. The other is an approximation of the $\ell_1$-regularization (1) for the GLM.

LASSO in Tangent space 1 (TLASSO1)

By modifying the LARS algorithm so that it outputs the LASSO estimator [6], we can use LASSO in the tangent space $T_0\mathcal{M}$. LASSO in Tangent space 1 (TLASSO1) is formally defined as the minimization problem

$\hat\beta(\lambda) = \operatorname*{argmin}_{\beta} \left\{ \frac{1}{2} \| \tilde y - X\beta \|_2^2 + \lambda \| \beta \|_1 \right\}, \qquad (4)$

which means that we use the design matrix $X$ and the virtual response $\tilde y$ in the ordinary LASSO. This corresponds to the LASSO modification of TLARS.

As was shown in subsection 3.3, the correlation matrix is regarded as an approximation of the KL divergence, on which the MLE is based. TLASSO1 is thus also an approximation of the $\ell_1$-regularization for the GLM.
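In R, TLASSO1 differs from the TLARS sketch in subsection 3.2 only in the type argument of lars(); continuing that sketch (reusing its X and y_virtual):

    ## TLASSO1: the lasso modification of LARS run on the virtual response.
    fit_tlasso1 <- lars(X, y_virtual, type = "lasso", intercept = FALSE, normalize = FALSE)
    coef(fit_tlasso1)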

LASSO in Tangent space 2 (TLASSO2)

Another LASSO-type method is a direct approximation of (1). TLASSO2 is defined as

$\hat\beta(\lambda) = \operatorname*{argmin}_{\beta} \left\{ \frac{c}{2} \| X r - X\beta \|_2^2 + \lambda \| \beta \|_1 \right\}, \qquad (5)$

where $c = \psi''(0)$ and $r$ satisfies $c\, X^\top X r = X^\top y$. Since the column vectors of the design matrix are assumed to be linearly independent, $r$ uniquely exists. Problem (5) is LASSO for the normal linear regression with the design matrix $X$ and the response $Xr$. TLASSO2 (5) is an approximation of (1). In fact, using $c$ and $r$, the log-likelihood is approximated as follows (see subsection A.2):

$\ell(\beta) \approx -\frac{c}{2} (\beta - r)^\top X^\top X (\beta - r) + \mathrm{const.}$

Note that $r$ is an approximation of the MLE $\hat\beta$.
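A hedged sketch of TLASSO2, under the reading of (5) given above (an ordinary lasso with design matrix $X$ and response $Xr$, with $c = \psi''(0) = 1/4$ for the logistic model); it reuses X and y from the TLARS sketch and should be taken as an assumption-laden illustration rather than a definitive implementation.

    ## TLASSO2 sketch for the logistic model: c = psi''(0) = 1/4 (our reading of (5)).
    c0 <- 1 / 4
    r  <- drop(solve(crossprod(X), crossprod(X, y))) / c0    # r solves c * X'X r = X'y
    fit_tlasso2 <- lars(X, drop(X %*% r), type = "lasso", intercept = FALSE, normalize = FALSE)
    coef(fit_tlasso2)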

3.5 Remarks on other information-geometrical methods

We briefly compare TLARS with two existing methods which are extensions of LARS based on information geometry. One is Bisector Regression (BR) by [9] and the other is Differential-Geometric LARS (DGLARS) by [4]. Our concern here is the algorithms themselves.

First, the BR algorithm is very different from TLARS. BR takes advantage of the dually flat structure of the GLM and tries to construct an equiangular curve using the KL divergence. Furthermore, the BR estimator moves from the MLE of the full model to the origin while, in our method, it is the residual that moves from $\hat\beta$ to the origin.

DGLARS is also different from TLARS. It uses tangent spaces, in which the equiangular vector is considered. However, the DGLARS estimator actually moves from the origin to the MLE in $\mathcal{M}$. Accordingly, the tangent space at the current estimate moves, which forces us to treat the tangent spaces at many points in $\mathcal{M}$. DGLARS treats the model manifold directly and therefore requires many iterations of approximate computation. Note that, on the other hand, the update of the TLARS estimator is described fully in terms of only the tangent space $T_0\mathcal{M}$.

4 Numerical examples

We show the results of numerical examples and compare our methods with a related method. In detail, we compare four methods in the logistic regression setting: LARS in Tangent Space (TLARS), LASSO in Tangent Space (TLASSO1 and TLASSO2), and the $\ell_1$-regularized maximum likelihood estimation for the GLM (L1).

Our methods do not require an extra implementation since the LARS algorithm has already been implemented in the lars package of the software R. Using R, we only needed glm() for calculating the MLE and the lars package for the proposed methods. For the computation of the $\ell_1$-regularization, we used the glmnet package [7].

4.1 Real data

We applied the proposed methods and the L1 method to a real data set. The data set is the South African heart disease (SAheart) data included in the ElemStatLearn package of R. It contains nine explanatory variables for 462 samples. The response is a binary variable.
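For reference, a minimal R sketch of preparing this data set (our own preprocessing choices; the dummy coding of famhist and the normalization are illustrative):

    ## Sketch: loading and preparing the SAheart data.
    library(ElemStatLearn)
    data(SAheart)
    y <- SAheart$chd                                  # binary response
    X <- model.matrix(chd ~ ., data = SAheart)[, -1]  # nine explanatory variables (famhist as a dummy)
    X <- scale(X, center = TRUE, scale = FALSE)
    X <- sweep(X, 2, sqrt(colSums(X^2)), FUN = "/")   # normalization assumed in subsection 2.1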

We show the results of the four methods. Figures 3(a) and 3(b) are the paths produced by TLARS and TLASSO1, respectively; in this example, they are the same. Figure 3(c) is the TLASSO2 path, and Figure 3(d) is the L1 path. The paths of TLARS, TLASSO1, and TLASSO2 are made by the lars() function of R, and that of L1 by glmnet().

As Figure 3 shows, the four paths are very similar. The proposed methods are based only on the tangent space, not on the model manifold itself, while L1 directly takes advantage of the likelihood. These results imply that the approximation of the model does not cause a deterioration of the results for our methods, especially for TLARS and TLASSO1.

(a) TLARS
(b) TLASSO1


(c) TLASSO2
(d) L1
Figure 3: The resulting paths of (a) TLARS, (b) TLASSO1, (c) TLASSO2, and (d) L1. They are very similar. In this example, the paths of TLARS and TLASSO1 are the same. The paths of TLARS, TLASSO1, and TLASSO2 are made by the lars() function of R, and that of L1 by glmnet().

4.2 Numerical experiments

We performed numerical experiments on logistic regression. We examine three aspects: generalization, parameter estimation, and model selection. The results are shown in Table 1, where bold values indicate the best and comparable values.

The procedure of the experiments is as follows. We fixed the number of parameters $p$, the true value of the parameter $\beta^*$, and the sample size $n$. For each trial, we generated the design matrix $X$ using the rnorm() function in R. Furthermore, we generated the response $y$ based on $X$ and $\beta^*$; that is, the elements of $y$ follow different Bernoulli distributions. The four methods were then applied to $(X, y)$.
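A minimal R sketch of one trial's data generation (the dimensions and the true parameter below are placeholders, not the values used in the experiments):

    ## Sketch of one trial of the logistic-regression experiments.
    set.seed(123)
    n <- 200; p <- 10
    beta_star <- c(rep(1, 3), rep(0, p - 3))          # placeholder true parameter
    X <- matrix(rnorm(n * p), n, p)                   # design matrix via rnorm()
    X <- scale(X, center = TRUE, scale = FALSE)
    X <- sweep(X, 2, sqrt(colSums(X^2)), FUN = "/")   # normalize columns
    prob <- plogis(drop(X %*% beta_star))             # Bernoulli success probabilities
    y <- rbinom(n, 1, prob)                           # response: element-wise Bernoulli draws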

For selecting one model and one estimate from a sequence of parameter estimates, we used AIC and BIC:

$\mathrm{AIC} = -2\,\ell(\hat\beta) + 2k, \qquad (6)$
$\mathrm{BIC} = -2\,\ell(\hat\beta) + k \log n, \qquad (7)$

where $k$ is the dimension of the parameter of the model under consideration. For a sequence $\hat\beta^{(1)}, \dots, \hat\beta^{(p)}$ made by each of the four methods, let $M_t$ denote the model selected at step $t$ and $\hat\beta_{M_t}$ the MLE of the model $M_t$. We call (6) with $\hat\beta = \hat\beta^{(t)}$ AIC1, and (6) with $\hat\beta = \hat\beta_{M_t}$ AIC2. Similarly, (7) with $\hat\beta^{(t)}$ is BIC1, and (7) with $\hat\beta_{M_t}$ is BIC2.
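As an illustration, AIC and BIC of the form (6)-(7) can be computed for any logistic-regression estimate as follows (a hedged R sketch with our own names; beta_hat is an estimate produced by one of the four methods):

    ## Sketch: information criteria (6)-(7) for a logistic-regression estimate beta_hat.
    loglik <- function(beta_hat, X, y) {
      eta <- drop(X %*% beta_hat)
      sum(y * eta - log(1 + exp(eta)))                # log-likelihood with the canonical link
    }
    info_criteria <- function(beta_hat, X, y) {
      k <- sum(beta_hat != 0)                         # dimension of the model under consideration
      c(AIC = -2 * loglik(beta_hat, X, y) + 2 * k,
        BIC = -2 * loglik(beta_hat, X, y) + k * log(nrow(X)))
    }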

For evaluating the generalization error of the four methods, we newly generated test observations in each trial. We computed the difference between the test responses and the predictions by each of the methods. The "Generalization" columns of Table 1 show the average prediction error over the trials. Smaller values are better.

The "Model selection" columns show the fraction of the trials in which the methods selected the true model. The "Seq" column indicates the fraction of the trials in which the sequence of estimates included the true model. Larger values are better.

In the "Parameter estimation" columns, each value is the average squared error $\|\hat\beta - \beta^*\|_2^2$ of the selected estimate. Smaller values are better.

In Table 1, we report the results of three cases; the number of trials is the same for all cases except case C2, where it is smaller. In case A, we fixed the dimension $p$ and the true parameter $\beta^*$, and used a smaller sample size for case A1 and a larger one for case A2. In generalization, three methods (TLARS, TLASSO1, and L1) with AIC2 were much better than the other combinations of method and information criterion. In model selection, the four methods with BIC1 were much better regardless of the sample size. In parameter estimation, TLARS and TLASSO1 with AIC1 and BIC2 were better in the small-sample setting. However, in the larger-sample setting, the four methods with AIC2 were better. These tendencies were also observed in other cases not reported here.

Case B is a case where the explanatory variables are correlated: the second and third columns of the design matrix $X$ are related, and $X$ is distributed according to a multivariate normal distribution. Cases B1 and B2 use a smaller and a larger sample size, respectively. In generalization, TLARS and TLASSO1 with AIC1, BIC1, and BIC2 were better than the others in case B1. Three methods (TLARS, TLASSO1, and L1) with AIC1 and BIC2 were better in the larger-sample setting. In case B, our interest is mainly in generalization because estimation of the true model and the parameter value is not very meaningful. However, the four methods with BIC1 were better in model selection.

In case C, we used $p = 50$ and, as $\beta^*$, a vector of length 50 with two groups of ten nonzero elements and thirty zero elements. In generalization and parameter estimation, three methods (TLARS, TLASSO1, and L1) with AIC2 were better than the others regardless of the sample size. In model selection, the four methods with BIC1 were much better than the others.

In summary, the proposed methods worked very well. Of course, the L1 method sometimes performs better than our methods. However, the proposed methods, especially TLARS and TLASSO1, were better than L1 in many situations. Furthermore, TLARS and TLASSO1 output the same results in the vast majority of trials.

Case Method | Generalization: AIC1 AIC2 BIC1 BIC2 | Model selection: Seq AIC1 AIC2 BIC1 BIC2 | Parameter estimation: AIC1 AIC2 BIC1 BIC2
A1 TLARS 10.70 9.80 12.97 10.72 0.7246 0.3969 0.1838 0.4973 0.3672 168.3 178.7 195.5 167.4
TLASSO1 10.70 9.80 12.97 10.72 0.7247 0.3968 0.1838 0.4974 0.3784 168.3 178.7 195.5 167.4
TLASSO2 15.73 12.49 18.35 15.04 0.7086 0.4062 0.0662 0.4865 0.2769 249.3 171.4 310.2 232.4
L1 18.81 9.74 22.60 10.46 0.6897 0.3996 0.0301 0.4824 0.1548 315.7 183.5 404.5 169.1
A2 TLARS 4.04 3.60 5.14 3.96 0.9785 0.4955 0.1252 0.8573 0.4988 58.7 45.6 99.6 56.4
TLASSO1 4.04 3.58 5.14 3.96 0.9785 0.4955 0.1252 0.8573 0.4988 58.7 45.6 99.6 56.4
TLASSO2 4.73 3.69 5.91 4.40 0.9787 0.4959 0.0561 0.8575 0.4022 79.6 47.2 126.9 69.0
L1 8.20 3.59 10.57 4.05 0.9732 0.4968 0.0721 0.8570 0.3810 234.3 45.4 352.9 58.7
B1 TLARS 13.52 13.89 13.42 13.16 0.5500 0.1799 0.1455 0.4293 0.3369 146.7 221.6 104.9 106.5
TLASSO1 13.33 13.67 13.28 13.00 0.5643 0.1784 0.1508 0.4316 0.3474 144.0 214.2 102.8 102.1
TLASSO2 14.05 13.94 14.81 14.23 0.5666 0.1820 0.1152 0.4366 0.3257 105.0 133.8 106.2 100.0
L1 15.98 14.43 19.16 13.59 0.5560 0.1814 0.0785 0.4342 0.2671 131.4 334.1 155.9 101.4
B2 TLARS 4.95 5.20 5.16 4.95 0.5848 0.1926 0.1127 0.5402 0.4157 96.8 140.6 89.9 84.8
TLASSO1 4.95 5.20 5.16 4.94 0.5852 0.1918 0.1127 0.5400 0.4159 96.8 140.6 89.9 84.8
TLASSO2 5.00 5.31 5.29 5.05 0.5850 0.1926 0.0978 0.5400 0.4104 95.0 143.7 90.8 85.1
L1 5.90 5.10 7.62 4.90 0.5793 0.1925 0.0935 0.5367 0.3946 109.1 132.2 158.8 82.0
C1 TLARS 10.18 9.27 14.71 10.97 0.1479 0.0087 0.0009 0.0787 0.0237 373.3 324.8 751.4 435.9
TLASSO1 10.18 9.27 14.71 10.97 0.1479 0.0087 0.0009 0.0787 0.0237 373.3 324.8 751.4 435.9
TLASSO2 14.78 11.65 19.09 15.72 0.1399 0.0098 0.0000 0.0742 0.0147 811.9 537.0 1183.8 895.2
L1 13.48 9.48 19.48 12.05 0.1137 0.0088 0.0000 0.0706 0.0050 690.5 351.6 1215.4 566.4
C2 TLARS 3.98 3.36 6.22 4.17 0.773 0.014 0.000 0.486 0.077 247.4 172.1 608.9 274.2
TLASSO1 3.98 3.36 6.22 4.17 0.773 0.014 0.000 0.486 0.077 247.4 172.1 608.9 274.2
TLASSO2 4.45 3.53 6.64 4.57 0.779 0.014 0.000 0.486 0.068 311.6 190.4 687.7 329.0
L1 4.58 3.40 8.08 4.34 0.736 0.015 0.000 0.486 0.046 330.8 176.0 982.7 297.9
Table 1: The results of the numerical experiments. Generalization: the average prediction error. Model selection: the fraction of trials in which the method selected the true model. Seq: the fraction of trials in which the sequence of estimates included the true model. Parameter estimation: the average squared error of the selected estimate. Bold values indicate the best and comparable values.

5 Conclusion

We proposed sparse estimation methods as an extension of LARS for the GLM. The methods take advantage of the tangent space at the origin, which is a rough approximation of the model manifold. The proposed methods are computationally efficient because the problem is approximated by the normal linear regression. The numerical experiments showed that our idea works well in comparison with the $\ell_1$-regularization for the GLM. One direction of future work is to evaluate TLARS theoretically. Furthermore, we will apply tools developed for LARS and LASSO, for example screening and post-selection inference, to TLARS and TLASSO.

Appendix A Lemmas and remarks

We show some lemmas and make some remarks. Well-known facts are used; see [2, 3, 1, 5, 12].

As introduced in subsection 3.4, $r$ satisfies $c\, X^\top X r = X^\top y$ with $c = \psi''(0)$, where $g$ is the canonical link function and $\psi$ the potential function. As in subsection 3.3, $\mu_i(\beta) = \psi'(x_i^\top \beta)$ is the $i$-th element of the expectation parameter. Letting $\mu_0 = \mu(0)$, it holds that $\mu_0 = \psi'(0)\,\mathbf{1}$. Furthermore, we have $X^\top \mu_0 = 0$ because each column of $X$ has mean zero.

A.1 Metric at the tangent space and correlation between explanatory variables

We show that the Fisher metric at the tangent space is proportional to the correlation matrix of the explanatory variables (Lemma 2). To avoid confusion, in this subsection, we write $G(\beta)$ for the metric of $\mathcal{M}$ at $p_\beta$ and $G(0)$ for the metric of the tangent space at the origin.

Lemma 1.

It holds that

Proof.

Since it is known that ,

Lemma 2.

$G(0) = c\, X^\top X$ for some $c > 0$.

Proof.

It is known that the metric is derived from the potential function $\psi$: $g_{jk}(\beta) = \partial_j \partial_k \sum_{i=1}^n \psi(x_i^\top \beta)$. Therefore, it holds that

$g_{jk}(0) = \sum_{i=1}^n \psi''(0)\, x_{ij} x_{ik},$

where $\psi''$ is the second derivative of $\psi$. Letting $c = \psi''(0)$, we have $G(0) = c\, X^\top X$. Since both $G(0)$ and $X^\top X$ are known to be positive definite, $c$ is a positive constant. ∎

Note that $c = \psi''(0)$ is common to all $j$ and $k$ in the proof. This is why the tangent space at the origin is selected as the space where LARS runs.

A.2 Approximations of the likelihood and MLE

We approximate the log-likelihood and the MLE of the GLM. Lemma 3 implies that $r$ is an approximation of the MLE $\hat\beta$.

Lemma 3.

The log-likelihood is expanded as

$\ell(\beta) \approx -\frac{c}{2} (\beta - r)^\top X^\top X (\beta - r) + \mathrm{const.}$

Proof.

Using a second-order Taylor expansion and Lemmas 1 and 2, the potential function is expanded as follows:

$\sum_{i=1}^n \psi(x_i^\top \beta) \approx n\, \psi(0) + \psi'(0) \sum_{i=1}^n x_i^\top \beta + \frac{c}{2}\, \beta^\top X^\top X \beta = n\, \psi(0) + \frac{c}{2}\, \beta^\top X^\top X \beta.$

At the last equal sign, we used $\sum_{i=1}^n x_{ij} = 0$ since each column vector of $X$ is assumed to be normalized. Therefore,

$\ell(\beta) = y^\top X\beta - \sum_{i=1}^n \psi(x_i^\top \beta) \approx -\frac{c}{2} (\beta - r)^\top X^\top X (\beta - r) + \mathrm{const.},$

where $r$ satisfies $c\, X^\top X r = X^\top y$. ∎

A.3 e-exponential map

In Riemannian geometry, a point in a tangent space is mapped to a point in the manifold via an exponential map. An exponential map is defined using a geodesic. A geodesic in a manifold corresponds to a straight line in Euclidean space. When we consider an exponential map, we need to introduce not only a metric but also a connection. A connection determines flatness and straightness in a manifold. In Section 3, we implicitly introduced the e-connection. From the viewpoint of the e-connection, each coordinate-axis curve of $\beta$ is an e-geodesic in $\mathcal{M}$.

For a manifold $\mathcal{M}$ and a point $q \in \mathcal{M}$, an exponential map at $q$ is formally defined as follows. First, for a tangent vector $v \in T_q\mathcal{M}$, we consider the geodesic $\gamma_v(t)$ which satisfies $\gamma_v(0) = q$ and $\dot\gamma_v(0) = v$. Here the parameter $t$ moves in an interval including $0$. Note that, given a connection, the geodesic locally exists and is uniquely determined. The exponential map is $\mathrm{Exp}_q(0) = q$ and $\mathrm{Exp}_q(v) = \gamma_v(1)$ for $v \ne 0$.

In general, an exponential map is not necessarily easy to treat. For example, the domain of an exponential map is in general only a star-shaped subset and does not coincide with the whole tangent space. However, our exponential map has a useful property: the domain of $\mathrm{Exp}$ is the whole of $T_0\mathcal{M}$ and the range is the whole of $\mathcal{M}$.

Lemma 4.

The map $\mathrm{Exp}$ defined in subsection 3.1 is the e-exponential map for the manifold $\mathcal{M}$ of the GLM. Furthermore, $\mathrm{Exp}$ is a bijection from the tangent space $T_0\mathcal{M}$ to the manifold $\mathcal{M}$.

Proof.

For $v = \sum_j \beta_j \partial_j \in T_0\mathcal{M}$, the value of the map is $\mathrm{Exp}(v) = p_\beta$. It is known that the e-geodesic $\gamma_v(t)$ satisfying $\gamma_v(0) = p_0$ and $\dot\gamma_v(0) = v$ is represented as $\gamma_v(t) = p_{t\beta}$. Therefore, $\gamma_v(1) = p_\beta = \mathrm{Exp}(v)$, which means that $\mathrm{Exp}$ is the e-exponential map.

Since $p_{t\beta}$ is in $\mathcal{M}$ for every $t$, the e-exponential map is defined on the whole of $T_0\mathcal{M}$. For any $p_\beta \in \mathcal{M}$, the vector $v = \sum_j \beta_j \partial_j$ is in $T_0\mathcal{M}$ and $\mathrm{Exp}(v) = p_\beta$, which implies that the e-exponential map is a surjection. Furthermore, if $\beta \ne \beta'$, then $p_\beta \ne p_{\beta'}$ because the column vectors of $X$ are assumed to be linearly independent. ∎

Footnotes

  1. This work was partly supported by JSPS KAKENHI Grant Number JP18K18008 and JST CREST Grant Number JPMJCR1763.

References

  1. S. Amari and H. Nagaoka (2000) Methods of Information Geometry. Translations of Mathematical Monographs, Vol. 191, Oxford University Press.
  2. S. Amari (1985) Differential-Geometrical Methods in Statistics. Lecture Notes in Statistics, Vol. 28, Springer.
  3. S. Amari (2016) Information Geometry and Its Applications. Springer.
  4. L. Augugliaro, A. M. Mineo and E. C. Wit (2013) dgLARS: a differential geometric approach to sparse generalized linear models. Journal of the Royal Statistical Society, Series B 75, pp. 471–498.
  5. N. Ay, J. Jost, H. V. Le and L. Schwachhöfer (2017) Information Geometry. Springer.
  6. B. Efron, T. Hastie, I. Johnstone and R. Tibshirani (2004) Least angle regression. Annals of Statistics 32, pp. 407–499.
  7. J. Friedman, T. Hastie and R. Tibshirani (2010) Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, pp. 1–22.
  8. T. Hastie, R. Tibshirani and J. Friedman (2009) The Elements of Statistical Learning (2nd edition). Springer.
  9. Y. Hirose and F. Komaki (2010) An extension of least angle regression based on the information geometry of dually flat spaces. Journal of Computational and Graphical Statistics 19, pp. 1007–1023.
  10. Y. Hirose and F. Komaki (2013) Edge selection based on the geometry of dually flat spaces for Gaussian graphical models. Statistics and Computing 23, pp. 793–800.
  11. Y. Hirose and F. Komaki (2015) An estimation procedure for contingency table models based on the nested geometry. Journal of the Japan Statistical Society 45, pp. 57–75.
  12. P. McCullagh and J. A. Nelder (1989) Generalized Linear Models. Monographs on Statistics and Applied Probability, Vol. 37, Chapman & Hall/CRC.
  13. M. Y. Park and T. Hastie (2007) $\ell_1$-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society, Series B 69, pp. 659–677.
  14. R. Tibshirani (1996) Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58, pp. 267–288.
  15. M. Yuan and Y. Lin (2007) Model selection and estimation in the Gaussian graphical model. Biometrika 94, pp. 19–35.