Maximum Likelihood Estimation based on
Random Subspace EDA: Application to
Extrasolar Planet Detection
Abstract
This paper addresses maximum likelihood (ML) estimation based model fitting in the context of extrasolar planet detection. This problem has the following properties: 1) the candidate models under consideration are highly nonlinear; 2) the likelihood surface has a huge number of peaks; 3) the parameter space ranges in size from a few to dozens of dimensions. These properties make the ML search very challenging, as no analytical or gradient based solution is available to explore the parameter space. A population based search method, called the estimation of distribution algorithm (EDA), is adopted to explore the model parameter space starting from a batch of random locations. EDAs are characterized by their ability to reveal and exploit problem structure, a property that is desirable for characterizing the detections. However, it is well recognized that EDAs do not scale well to large problems, because they consist of iterative random sampling and model fitting procedures and thus suffer from the well-known curse of dimensionality. A novel mechanism is proposed to perform EDAs in interactive random subspaces spanned by correlated variables, with the hope of alleviating the curse of dimensionality by carrying out the sampling and model fitting operations in lower dimensional subspaces. The effectiveness of the proposed algorithm is verified via both benchmark numerical studies and real data analysis.
Keywords:
estimation of distribution, extrasolar planet detection, maximum likelihood estimation, nonlinear model, optimization, random subspace
1 Introduction
This paper presents an evolutionary computation based maximum likelihood (ML) estimation method for multivariate, highly nonlinear time series models. This work is motivated by a challenging signal detection task, which aims to detect extrasolar planets (exoplanets) based on observations collected by astronomical instruments such as NASA’s Kepler space telescope [1]. The term “exoplanet” denotes a planet outside our solar system. The goal of exoplanet science is to answer the question of whether we are alone, or whether there are other planets that might support life in the universe [2, 1, 3]. Exoplanet science has been a booming field in astrophysics since 1992, when the first detection of an exoplanet was confirmed [4]. In this paper, we focus on the radial velocity (RV) method of exoplanet detection [5, 6, 3], which has become one of the most productive techniques for detecting exoplanets so far.
Signal processing (SP) plays an important part in exoplanet detection: it helps improve the signal-to-noise ratio of the observations, detect signals of potential planets, and so forth. Current SP techniques, however, fall short of the requirements of the field. For example, the ML based periodogram method was developed to deal with correlated noise in RV time series [7], but it is limited to detecting a single signal, whereas many planetary systems are found to contain more than one planet, so the RV time series should exhibit more than one signal. Bayesian simulation techniques have been applied to explore the parameter space of a global RV model using Markov chain Monte Carlo (MCMC) or adaptive importance sampling methods [3, 8, 6], but such methods require a very large computational overhead to guarantee satisfactory performance in parameter estimation and signal detection.
The objective of this paper is to propose a computationally efficient method to address the problem of exoplanet detection based on ML fitting of complex RV models. This problem has the following properties: 1) the candidate models under consideration are highly nonlinear; 2) the likelihood surface has a huge number of peaks; 3) the parameter space ranges in size from a few to dozens of dimensions. No analytical or gradient based solution is available to explore the parameter space. The proposed approach is based on an evolutionary computation method called the estimation of distribution algorithm (EDA) [9, 10, 11]. A novel mechanism is proposed to perform EDAs in a series of random subspaces spanned by correlated variables. The basic idea is that, since the dimension of each subspace is smaller than that of the full parameter space, occurrences of sample-starved model fitting will become rarer, and the curse of dimensionality is thus hoped to be alleviated. A benchmark numerical study and a real data analysis are used to demonstrate the effectiveness of the new algorithm.
2 RV Models and the ML based Model Fitting
In this section, we present the RV time series models and then introduce the ML parameter fitting problem in the context of exoplanet detection.
2.1 RV Models
A succinct introduction to the RV models is presented here; for more details, readers are referred to [6]. We use $\mathcal{H}_m$, $m = 0, 1, 2, \ldots$, to denote the $m$-planet model, corresponding to the hypothesis that there are $m$ planet(s) in the extrasolar system under consideration. In the 0-planet model $\mathcal{H}_0$, the $i$th element $v_i$ of the RV data is modeled to be Gaussian distributed as follows

$$v_i \sim \mathcal{N}\left(C,\; s^2 + \sigma_i^2\right), \qquad (1)$$

where $C$ and $s^2 + \sigma_i^2$ are its mean and variance, respectively. Here $C$ denotes the constant center-of-mass velocity of the star relative to earth and $s$ denotes the square root of the “stellar jitter”, which represents the random fluctuations in a star’s luminosity or fluctuations stemming from other systematic sources, e.g., starspots. The additional variance component $\sigma_i^2$ is a calculated error of $v_i$ due to the observation procedure.
In the 1-planet model $\mathcal{H}_1$, $v_i$ is modeled as follows

$$v_i \sim \mathcal{N}\left(C + \Delta v(t_i, \theta),\; s^2 + \sigma_i^2\right), \qquad (2)$$
where $\Delta v(t, \theta)$ is the velocity shift caused by the presence of the planet. This velocity shift is a family of curves parameterized by a 5-dimensional vector $\theta = (K, P, e, \omega, M_0)$, defined as follows

$$\Delta v(t, \theta) = K\left[\cos\left(\omega + \nu(t)\right) + e \cos \omega\right], \qquad (3)$$
where $\nu(t)$ is the “true anomaly at time $t$” given by

$$\nu(t) = 2 \arctan\left[\sqrt{\frac{1+e}{1-e}}\, \tan\frac{E(t)}{2}\right], \qquad (4)$$
and $E(t)$ is called the “eccentric anomaly at time $t$”, which is the solution of the transcendental (Kepler) equation

$$E(t) - e \sin E(t) = \frac{2\pi t}{P} - M_0. \qquad (5)$$
In the above expressions, $K$ denotes the velocity semi-amplitude, $P$ the orbital period, $e$ the eccentricity, $\omega$ the argument of periastron, and $M_0$ the mean anomaly at time $t = 0$. The parameters $C$, $K$ and $s$ have the same unit as velocity; the velocity semi-amplitude $K$ is usually restricted to be nonnegative to avoid identification problems; $C$ may be positive or negative. The eccentricity parameter $e$, $0 \le e < 1$, is unitless, with $e = 0$ corresponding to a circular orbit and larger $e$ corresponding to more eccentric orbits. Periastron is the point at which the planet is closest to the star, and the argument of periastron measures the angle at which we observe the elliptical orbit. The mean anomaly is the angular distance of the planet from periastron.
More generally, an $m$-planet model $\mathcal{H}_m$ ($m \ge 1$) represents the expected velocity by $C + \Delta v(t_i, \theta)$, in which the overall velocity shift $\Delta v$ takes the form of the summation of the velocity shifts of the individual planets, i.e.,

$$\Delta v(t, \theta) = \sum_{j=1}^{m} \Delta v(t, \theta_j). \qquad (6)$$
Therefore the parameter dimension of an $m$-planet model is $5m + 2$: the more planets covered by the model, the higher its dimension. Moreover, the RV models are highly nonlinear due to the velocity shift term included in them.
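To make the model concrete, the velocity shift of Eqns. (3)–(5) can be evaluated numerically; Kepler's equation (5) has no closed-form solution and is commonly solved by Newton iteration. The sketch below is illustrative only: the function names and the Newton-based solver are our own choices, not part of the paper.

```python
import numpy as np

def solve_kepler(M, e, tol=1e-10, max_iter=50):
    """Solve Kepler's equation E - e*sin(E) = M for the eccentric anomaly E
    via Newton's method; M may be a NumPy array of mean anomalies."""
    E = np.where(e < 0.8, M, np.full_like(M, np.pi))  # common starting guess
    for _ in range(max_iter):
        dE = (E - e * np.sin(E) - M) / (1.0 - e * np.cos(E))
        E = E - dE
        if np.max(np.abs(dE)) < tol:
            break
    return E

def velocity_shift(t, K, P, e, omega, M0):
    """Single-planet velocity shift of Eqns. (3)-(5) evaluated at times t."""
    M = (2.0 * np.pi * np.atleast_1d(t) / P - M0) % (2.0 * np.pi)  # mean anomaly
    E = solve_kepler(M, e)
    # true anomaly from the eccentric anomaly (Eqn. (4)), in a form stable for all E
    nu = 2.0 * np.arctan2(np.sqrt(1.0 + e) * np.sin(E / 2.0),
                          np.sqrt(1.0 - e) * np.cos(E / 2.0))
    return K * (np.cos(omega + nu) + e * np.cos(omega))
```

For a circular orbit ($e = 0$) the shift reduces to a pure sinusoid $K \cos(\omega + 2\pi t/P - M_0)$, which is a quick sanity check on the implementation.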
2.2 ML Parameter Estimation of RV Models
Here we treat exoplanet detection as a problem of model selection: given a set of RV observations, how to select from the candidate models the one that fits the data best in terms of a Bayesian criterion. A full Bayesian solution needs to calculate the marginal likelihood of each candidate model, which involves large scale stochastic integrations over the whole parameter space of the candidate models [3, 6]; such stochastic integrations are computationally expensive. Here we resort to the Bayesian information criterion (BIC) to evaluate the fitness of the candidate models to the data; a Bayesian argument for adopting BIC was presented in [12]. We use $\theta_m$ and $\Theta_m$ to denote the parameters of $\mathcal{H}_m$ and the corresponding value space, respectively. The BIC metric of $\mathcal{H}_m$ is defined as

$$\mathrm{BIC}_m = -2 \ln L_m(\hat{\theta}_m) + k_m \ln N, \qquad (7)$$
where $k_m$ is the number of free parameters to be estimated for $\mathcal{H}_m$, $N$ is the number of observations, and $L_m(\hat{\theta}_m)$ is the maximal likelihood function value associated with $\mathcal{H}_m$, where $\hat{\theta}_m$ are the parameter values that maximize the likelihood function, namely

$$\hat{\theta}_m = \arg\max_{\theta_m \in \Theta_m} L_m(\theta_m). \qquad (8)$$
Given a finite set of models, the model with the lowest BIC value is preferred according to the BIC criterion. Since the second term on the right-hand side of Eqn. (7) is a constant for a given model, the calculation of $\mathrm{BIC}_m$ reduces to solving the maximization problem defined in Eqn. (8).
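As a sketch of the model-comparison step, the following computes the Gaussian log-likelihood of the 0-planet model of Eqn. (1) and the BIC of Eqn. (7); the function names are our own and are not taken from the paper.

```python
import numpy as np

def loglik_h0(v, sigma2, C, s2):
    """Log-likelihood of the 0-planet model H0: v_i ~ N(C, s^2 + sigma_i^2)."""
    var = s2 + sigma2                       # per-observation variance
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (v - C) ** 2 / var)

def bic(loglik_max, k, n):
    """BIC = -2 ln L_max + k ln n (Eqn. (7)); lower values are preferred."""
    return -2.0 * loglik_max + k * np.log(n)
```

Given the maximized log-likelihoods of two candidate models, comparing `bic(...)` values directly implements the selection rule stated above.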
3 General EDA Procedure
To address an optimization problem such as that defined in Eqn. (8), a general EDA procedure typically works with a population of candidate solutions defined over the full parameter space. The initial population is generated according to the uniform distribution over all admissible solutions. The fitness function gives a numerical ranking of the solutions; here the likelihood function plays the role of the fitness function. A subset of the most promising solutions is selected by the selection operator; a commonly used selection operator selects a fixed proportion, e.g., the best 50%, of the solutions. A probabilistic model is then constructed to estimate the probability distribution of the selected solutions, and the algorithm generates new solutions by sampling the distribution defined by the model. The new population of solutions replaces the old population, and the modeling and sampling procedure is repeated until some termination criteria are met. One iteration of the EDA method can be summarized as follows: starting from a population of solutions $P$,

Select a population $S$ of promising solutions from $P$;

Build a probabilistic model $M$ from $S$;

Sample new candidate solutions $O$ based on $M$;

Replace the old population with the new one, namely set $P$ to be $O$.
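The four steps above can be sketched as a minimal full-dimensional Gaussian EDA in the spirit of EMNA [16]; the function name, the default parameter values and the small covariance jitter are our own choices.

```python
import numpy as np

def gaussian_eda(fitness, bounds, pop_size=200, elite_frac=0.5, n_iter=40, rng=None):
    """Minimal full-dimensional Gaussian EDA (EMNA-style) that MAXIMIZES
    `fitness`; `bounds` is a (d, 2) array of per-dimension search limits."""
    rng = np.random.default_rng(rng)
    lo, hi = bounds[:, 0], bounds[:, 1]
    d = len(lo)
    pop = rng.uniform(lo, hi, size=(pop_size, d))        # uniform initial population
    for _ in range(n_iter):
        fit = np.apply_along_axis(fitness, 1, pop)       # rank the solutions
        elite = pop[np.argsort(fit)[::-1][:int(elite_frac * pop_size)]]  # select S
        mu = elite.mean(axis=0)                          # build the model M ...
        cov = np.cov(elite, rowvar=False) + 1e-9 * np.eye(d)
        pop = rng.multivariate_normal(mu, cov, size=pop_size)  # ... sample O from M
        pop = np.clip(pop, lo, hi)                       # replace P with O
    fit = np.apply_along_axis(fitness, 1, pop)
    return pop[np.argmax(fit)]
```

On a smooth unimodal fitness this loop contracts the Gaussian around the optimum; the multimodal likelihood surfaces discussed in this paper are exactly where the plain version struggles as the dimension grows.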
For more details about EDAs, readers are referred to [11]. As an iterative sampling and modeling procedure, the general EDA method suffers from the well-known curse of dimensionality. Specifically, as the dimension of the solution space increases, the volume of the space grows so fast that the available data points used for constructing the model become sparse, leaving the resulting model unqualified to guide the search toward better solutions.
4 Random Subspace EDA (RSEDA)
In this section, we propose a new EDA algorithm, namely RSEDA. The idea is to partition the original multivariate parameter space into a series of random subspaces and then perform EDAs in these subspaces, rather than in the full parameter space, with the hope of alleviating the curse of dimensionality for EDA type methods. Fig. 1 shows one iteration of the RSEDA method, wherein $d$ denotes the dimension of the parameter space. The iterations end when the estimate of the global optimal solution remains unchanged for a fixed number of iterations. Taking the maximization problem as an instance, we describe in what follows the details of the four operators included in Fig. 1.
4.1 The 1st Step: Estimation of Variable Correlations
This step corresponds to the leftmost box in Fig. 1. The purpose is to provide a rough estimate of the correlations among the variables (or dimensions) of the parameter space. This estimate is made based on a pool of promising solutions (PPS) produced by the previous iteration of RSEDA. In the first iteration, we initialize the PPS based on available prior knowledge of the variable correlations. If there is no such prior knowledge, we just draw a number of random samples uniformly from the solution space and initialize the PPS via truncation selection; specifically, we preserve the fittest 20% of the samples and discard the others. The data in the PPS are arranged in an $N_p \times d$ matrix, where $N_p$ denotes the number of samples in the PPS. The correlation estimation operation returns the sample linear partial correlation coefficients between pairs of variables in the PPS, controlling for the remaining variables. We use MATLAB’s built-in function “partialcorr” to perform this operation. The output of this function is a symmetric $d \times d$ matrix $R$, whose $(i, j)$th entry is the sample linear partial correlation corresponding to the $i$th and $j$th columns of the PPS matrix.
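MATLAB's `partialcorr` computes pairwise partial correlations controlling for all remaining variables; the same quantity can be derived from the precision (inverse covariance) matrix. A minimal NumPy analogue, offered as our own sketch rather than the paper's code:

```python
import numpy as np

def partial_corr(X):
    """Sample linear partial correlations between all column pairs of X,
    controlling for the remaining columns (an analogue of MATLAB's
    partialcorr), computed from the precision matrix P = cov(X)^(-1):
    R[i, j] = -P[i, j] / sqrt(P[i, i] * P[j, j])."""
    P = np.linalg.pinv(np.cov(X, rowvar=False))   # pinv guards near-singular cases
    d = np.sqrt(np.diag(P))
    R = -P / np.outer(d, d)
    np.fill_diagonal(R, 1.0)                      # conventionally 1 on the diagonal
    return R
```

For mutually independent columns the off-diagonal entries of the returned matrix shrink toward zero as the sample size grows, which matches the intended use here: weakly related variables end up with small weights in the subspace partition.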
4.2 The 2nd Step: Random Subspace Partition
This step corresponds to the second box in Fig. 1. The purpose is to partition the whole parameter space into subspaces based on the correlation matrix $R$ given by the 1st step. This step consists of a series of procedures performed on each row of $R$. Take the procedures corresponding to $R_i$, the $i$th row of $R$, as an example. We first sort the elements of $R_i$, namely $r_{i,1}, \ldots, r_{i,d}$, according to their values from large to small. This procedure outputs $r_{i,j_1} \ge r_{i,j_2} \ge \cdots \ge r_{i,j_d}$, where $j_1, \ldots, j_d$ is a rearrangement of $1, \ldots, d$ with $r_{i,j_k} \ge r_{i,j_{k+1}}$ as long as $k < d$. Let $s = \sum_{k=1}^{d} r_{i,j_k}$ and then set $p_k = r_{i,j_k} / s$ for each $k$ in $\{1, \ldots, d\}$; now we have $\sum_{k=1}^{d} p_k = 1$. Then we set $c_K = \sum_{k=1}^{K} p_k$ for $K = 1, \ldots, d$, draw a random number $u$ from a uniform distribution between 0 and 1, find the minimum index $K$ from $\{1, \ldots, d\}$ that satisfies $c_K \ge u$, and finally return the indexes $j_1, \ldots, j_K$, which constitute the coordinates of the $i$th subspace. After traversing every row of $R$, we obtain a series of overlapping random subspaces.
The basic idea underlying the above operations is that the more strongly a pair of variables is correlated, the more likely the two are to be incorporated into the same lower dimensional subspace. Thanks to this random mechanism, we do not need to introduce any free parameter, such as a threshold, to cluster the variables into different subspaces.
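The row-wise roulette rule of this step can be sketched as follows. Note that using absolute correlations as weights is our assumption; the handling of negative entries is not spelled out in the text.

```python
import numpy as np

def random_subspace(R, i, rng=None):
    """Draw the coordinates of the i-th random subspace from the partial
    correlation matrix R, following the roulette-style prefix rule of Step 2.
    NOTE: absolute correlations are used as weights -- an assumption, since
    the treatment of negative correlations is not specified in the text."""
    rng = np.random.default_rng(rng)
    w = np.abs(R[i])
    order = np.argsort(-w)                         # variable indexes, weight descending
    p = w[order] / w[order].sum()                  # normalize into probabilities
    u = rng.uniform()                              # u ~ U(0, 1)
    K = int(np.searchsorted(np.cumsum(p), u)) + 1  # minimal prefix with cum. prob >= u
    return order[:min(K, len(order))]
```

Because the diagonal entry $r_{i,i} = 1$ always carries the largest weight, variable $i$ itself is always part of its own subspace, and strongly correlated partners are likely to join it.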
4.3 The 3rd Step: Performing EDAs in Subspaces
This step corresponds to the elliptical box in Fig. 1. We now focus on a specific subspace and present the EDA operations associated with it. Assume that the current estimate of the global optimal solution is $\hat{x}$ and that the subspace under consideration is associated with the variables indexed by $j_1, \ldots, j_K$. We use a Gaussian model to fit the PPS data mapped to this subspace; specifically, we calculate the empirical mean and sample covariance of this Gaussian model based on the PPS data mapped to the dimensions $j_1, \ldots, j_K$. Then we draw a prescribed, constant number of new samples from this Gaussian distribution. As the dimensions of all the exoplanet models of our concern are less than 100, this sample size is chosen big enough to prevent sample-starved model fitting in the follow-up EDA operations. We assign the newly generated samples’ values to the corresponding variables of $\hat{x}$ and keep the other variables unchanged, thereby obtaining a set of full dimensional samples. We calculate the fitness values of these samples, based on which we select the fittest 20% for use in updating the Gaussian model. During the above process, once better solutions are found, we update the estimate of the global optimal solution and the PPS accordingly, to guarantee that the PPS maintains the fittest samples found so far. We then iterate the model fitting, sampling and selection procedure until the estimate of the global optimal solution remains unchanged for five consecutive iterations, which we regard as an indication of convergence.
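One sampling round of this step might look as follows; this is an illustrative sketch in which the function and variable names are our own, and the small covariance jitter is added for numerical stability.

```python
import numpy as np

def subspace_eda_step(pps, x_best, idx, fitness, n_samples=1000, rng=None):
    """One sampling round of Step 3: fit a Gaussian to the PPS restricted to
    the subspace coordinates `idx`, sample there, embed the samples into
    copies of the current best solution, and keep the fittest 20%."""
    rng = np.random.default_rng(rng)
    sub = pps[:, idx]                                # PPS mapped to the subspace
    mu = sub.mean(axis=0)
    cov = np.cov(sub, rowvar=False) + 1e-9 * np.eye(len(idx))  # jitter for stability
    cand = np.tile(x_best, (n_samples, 1))           # other variables stay unchanged
    cand[:, idx] = rng.multivariate_normal(mu, cov, size=n_samples)
    fit = np.apply_along_axis(fitness, 1, cand)
    keep = np.argsort(fit)[::-1][:max(1, n_samples // 5)]      # fittest 20%
    best = cand[keep[0]] if fit[keep[0]] > fitness(x_best) else x_best
    return cand[keep], best
```

The returned elite set refreshes the subspace Gaussian on the next round, while `best` only changes when a strictly fitter full-dimensional sample is found, mirroring the update rule described above.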
4.4 The 4th Step: Updating the estimate of the global optimum and the PPS
For ease of presentation, this step occupies the rightmost box in Fig. 1; in practice, however, it is fully interleaved with the 3rd step presented above. Once better solutions are found in an EDA procedure of the 3rd step, the estimate of the global optimal solution and the PPS are updated accordingly, so that the PPS always keeps a set of the fittest solutions. The EDA procedure for the next subspace is then performed based on the most recent estimate of the global optimal solution and the most recently updated PPS. After the EDA procedures of all subspaces have been carried out, the final PPS acts as the input to the 1st step of the next iteration.
5 Numerical Study with Benchmark Function
To test the potential of our idea and the ability of our algorithm to improve the scalability of EDAs in searching for a near-optimal solution in large-scale problem settings, we tested it on a benchmark function from the suite of test functions released for the special session on real-parameter optimization of the 2013 Congress on Evolutionary Computation (CEC 2013) [13]. We selected the Rotated Schaffers F7 (RSF7) function, which is multimodal, nonseparable, asymmetric, and has a huge number of local optima. It is defined as follows [13]
$$f(x) = \left( \frac{1}{D-1} \sum_{i=1}^{D-1} \left( \sqrt{z_i} + \sqrt{z_i}\, \sin^2\!\left(50\, z_i^{0.2}\right) \right) \right)^2 - 800, \qquad (9)$$

where $z_i = \sqrt{y_i^2 + y_{i+1}^2}$ for $i = 1, \ldots, D-1$, and $y = \Lambda^{10} M_2\, T_{\mathrm{asy}}^{0.5}\!\left(M_1 (x - o)\right)$. In the above expressions, $o$ is the shifted global optimum, $\Lambda^{10}$ denotes a $D$-dimensional diagonal matrix, the $i$th diagonal element of which is $10^{\frac{i-1}{2(D-1)}}$, $i = 1, \ldots, D$. $T_{\mathrm{asy}}^{0.5}$ is an operator that transforms $x_i$ to $x_i^{1 + 0.5 \frac{i-1}{D-1} \sqrt{x_i}}$ for $x_i > 0$, $i = 1, \ldots, D$. $M_1$ and $M_2$ are orthogonal matrices generated from standard normally distributed entries. More details about this function can be found in [13].
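As a reference point, the underlying Schaffers F7 core of Eqn. (9) can be written down directly; the shift, rotation and asymmetry transforms of the CEC 2013 version are omitted here for brevity.

```python
import numpy as np

def schaffers_f7(x):
    """Plain Schaffers F7 core: zero at the origin, highly multimodal
    elsewhere. The CEC 2013 RSF7 applies shift/rotation/asymmetry
    transforms on top of this core and adds a bias."""
    x = np.asarray(x, dtype=float)
    z = np.sqrt(x[:-1] ** 2 + x[1:] ** 2)              # pairwise radial terms
    t = np.sqrt(z) + np.sqrt(z) * np.sin(50.0 * z ** 0.2) ** 2
    return float(t.mean() ** 2)                        # (1/(D-1)) * sum, squared
```

The `sin^2(50 z^0.2)` factor is what produces the huge number of local optima mentioned above: it oscillates ever faster as one moves away from the optimum.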
The RSF7 function is designed for minimization tasks; to mimic the ML estimation problem, we consider maximizing its negation, whose maximum value is 800, the target for the algorithms to reach. The first instance we considered was a low dimensional case, in which $D$ was set to 5. We initialized the PPS of RSEDA using uniformly drawn random samples and allowed a fixed budget of function evaluations. We compared RSEDA with three other leading evolutionary computation methods, namely the Trelea type vectorized Particle Swarm Optimization (PSO) algorithm [14], the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) method [15], and a classical EDA method termed EMNA [16]. The difference between EMNA and our method is that the former always employs a full-dimensional multivariate Gaussian distribution model, while the latter performs EDA operations in subspaces. EMNA and the PSO were run with fixed population sizes. The CMA-ES method of [15] was included in our comparison because it was developed to handle high dimensional problems and currently represents the gold standard for comparisons in new EDA research. We used the MATLAB implementation available from the authors with the diagonal option, default parameter settings and random initialization [17]. All baseline methods were run with the same maximum number of function evaluations as the RSEDA method. We ran each method 50 times and calculated the mean of its estimate of the maximal function value. Fig. 2 gives a visual summary of the comparison results: the proposed RSEDA method finds the global optimum with a slower convergence speed than PSO and CMA-ES but a faster convergence speed than EMNA.
We then focused on a higher dimensional instance with $D$ set to 20. Compared with the first instance, the population size of each method was increased by a factor of 10, and each method was again allowed a fixed budget of function evaluations. Fig. 3 summarizes the results corresponding to 50 independent runs of each method. The EMNA method diverged completely, so it is not included in Fig. 3. We see that RSEDA beats all the other methods. Taking the two instances together, we see the potential of the proposed random subspace idea for improving the scalability of EDAs on higher dimensional problems, although this is worthy of further investigation.
6 Analysis of Real Exoplanet Data
We applied the proposed RSEDA method to analyze a real data set released in [18], which was claimed by the astronomers to contain two planets [18]. An adaptive annealed importance sampling (AAIS) method was developed in [6], which calculates the probabilities of the hypothetical models based on the given RV measurements.
We fit the data with $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively. For each model, we apply RSEDA to estimate the ML model parameters. The search space is constrained by the value ranges of the model parameters listed in Table 1; these ranges are provided by the astronomers based on their experience [6]. Fig. 4 shows the fitting results under $\mathcal{H}_1$ and $\mathcal{H}_2$, respectively. The calculated logarithms of the MLs and the BIC metrics associated with $\mathcal{H}_1$ and $\mathcal{H}_2$ are listed in Table 2. Both the visual fitting result and the quantitative comparison of the BIC metrics suggest that $\mathcal{H}_2$ holds. This result is consistent with those reported in [18] and [6].
Table 1: Value ranges of the model parameters (the orbital period in days/years; the velocity parameters in m/s).
Table 2: Log-MLs and BIC metrics of the candidate models.

Model    log(ML)      BIC
H1       -148.4024    320.6132
H2       -111.5458    263.9060
7 Concluding remarks
In this paper, we proposed the RSEDA algorithm in the context of exoplanet detection based on ML estimation. The key feature of the RSEDA method lies in the proposed operation of constructing random subspaces and then carrying out the routine sampling and model fitting operations of EDAs in these lower dimensional parameter spaces, with the hope of alleviating the curse of dimensionality for EDA type methods. The effectiveness of the proposed method was demonstrated by numerical studies and real data analysis. The results show the potential of the proposed technique in dealing with complex nonlinear models with multimodal likelihood functions and in solving high dimensional optimization tasks. The scalability of the proposed method and new approaches for constructing random subspaces are both worthy of further investigation.
Acknowledgement
This work was partly supported by the National Natural Science Foundation (NSF) of China under grant No. 61571238, China Postdoctoral Science Foundation under grant Nos. 2015M580455 and 2016T90483, the Six Talents Peak Foundation of Jiangsu Province under grant No. XYDXXJSCXTD006 and the Scientific and Technological Support Project (Society) of Jiangsu Province under grant No. BE2016776.
References
 [1] J. J. Lissauer, R. I. Dawson, and S. Tremaine, “Advances in exoplanet science from Kepler,” Nature, vol. 513, no. 7518, pp. 336–344, 2014.
 [2] W. J. Borucki, D. Koch, G. Basri, N. Batalha, T. Brown, D. Caldwell, J. Caldwell, J. Christensen-Dalsgaard, W. D. Cochran, E. DeVore et al., “Kepler planet-detection mission: introduction and first results,” Science, vol. 327, no. 5968, pp. 977–980, 2010.
 [3] T. J. Loredo, J. O. Berger, D. F. Chernoff, M. A. Clyde, and B. Liu, “Bayesian methods for analysis and adaptive scheduling of exoplanet observations,” Statistical Methodology, vol. 9, no. 1, pp. 101–114, 2012.
 [4] A. Wolszczan and D. A. Frail, “A planetary system around the millisecond pulsar PSR 1257+12,” Nature, vol. 355, no. 6356, pp. 145–147, 1992.
 [5] M. Desort, A.-M. Lagrange, F. Galland, S. Udry, and M. Mayor, “Search for exoplanets with the radial-velocity technique: quantitative diagnostics of stellar activity,” Astronomy & Astrophysics, vol. 473, no. 3, pp. 983–993, 2007.
 [6] B. Liu, “Adaptive annealed importance sampling for multimodal posterior exploration and model selection with application to extrasolar planet detection,” The Astrophysical Journal Supplement Series, vol. 213, no. 14, pp. 1–16, 2014.
 [7] R. V. Baluev, “PlanetPack: A radial-velocity time-series analysis tool facilitating exoplanets detection, characterization, and dynamical simulations,” Astronomy and Computing, vol. 2, pp. 18–26, 2013.
 [8] B. J. Brewer and C. P. Donovan, “Fast Bayesian inference for exoplanet discovery in radial velocity data,” Monthly Notices of the Royal Astronomical Society, vol. 448, no. 4, pp. 3206–3214, 2015.
 [9] Q. Zhang and H. Mühlenbein, “On the convergence of a class of estimation of distribution algorithms,” IEEE Trans. on Evolutionary Computation, vol. 8, no. 2, pp. 127–136, 2004.
 [10] M. Pelikan, D. E. Goldberg, and F. G. Lobo, “A survey of optimization by building and using probabilistic models,” Computational Optimization and Applications, vol. 21, no. 1, pp. 5–20, 2002.
 [11] M. Hauschild and M. Pelikan, “An introduction and survey of estimation of distribution algorithms,” Swarm and Evolutionary Computation, vol. 1, no. 3, pp. 111–128, 2011.
 [12] G. Schwarz, “Estimating the dimension of a model,” The Annals of Statistics, vol. 6, no. 2, pp. 461–464, 1978.
 [13] J. Liang, B. Qu, P. Suganthan, and A. G. Hernández-Díaz, “Problem definitions and evaluation criteria for the CEC 2013 special session on real-parameter optimization,” Technical Report, Computational Intelligence Laboratory, Zhengzhou University, Zhengzhou, China, and Nanyang Technological University, Singapore, 2013.
 [14] I. C. Trelea, “The particle swarm optimization algorithm: convergence analysis and parameter selection,” Information Processing Letters, vol. 85, no. 6, pp. 317–325, 2003.
 [15] R. Ros and N. Hansen, “A simple modification in CMA-ES achieving linear time and space complexity,” in International Conference on Parallel Problem Solving from Nature. Springer, 2008, pp. 296–305.
 [16] P. Larrañaga and J. A. Lozano, Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Kluwer Academic Publishers, 2002.
 [17] N. Hansen, “CMAES source code,” https://www.lri.fr/~hansen/cmaes_inmatlab.html.
 [18] C. G. Tinney, R. P. Butler, G. W. Marcy, H. R. Jones, G. Laughlin, B. D. Carter, J. A. Bailey, and S. O’Toole, “The 2:1 resonant exoplanetary system orbiting HD 73526,” The Astrophysical Journal, vol. 647, no. 1, pp. 594–599, 2006.