Smallest Eigenvalue of Large Hankel Matrices at Critical Point: Comparing Conjecture With Parallelised Computation
Abstract
We propose a novel parallel numerical algorithm for calculating the smallest eigenvalues of highly illconditioned matrices. It is based on the LDLT decomposition and involves finding a submatrix of the inverse of the original Hankel matrix . The computation involves extremely high precision arithmetic, message passing interface, and shared memory parallelisation. We demonstrate that this approach achieves good scalability on a high performance computing cluster (HPCC) which constitute a major improvement of the earlier approaches. We use this method to study a family of Hankel matrices generated by the weight supported on and Such weight generates Hankel determinant, a fundamental object in random matrix theory. In the situation where the smallest eigenvalue tend to 0, exponentially fast as gets large. If the situation where the classical moment problem is indeterminate, the smallest eigenvalue is bounded from below by a positive number for all , including infinity. If it is conjectured that the smallest eigenvalue tends to 0 algebraically, with a precise exponent. The algorithm run on the HPCC producing fantastic match between the theoretical value of and the numerical result.
Keywords: Random matrix, Parallel eigensolver, Extremely illconditioned Hankel matrices, Smallest eigenvalue
MSC: 15B57, 34E05, 42C05, 65F15, 65Y05
1 Background and motivation
Hankel matrices has entries that are moments [1, 12] of probabilities measures or weight functions. And plays an important role in the theory of Hermitian random matrices [14]. The study of the largest and smallest eigenvalues are important since they provide useful information about the nature of the Hankel matrix generated by a given density, e.g. they are related with the inversion of Hankel matrices, where the condition numbers are enormously large.
Given the moment sequence of a weight function with infinite support ,
(1.1) 
the Hankel matrices, it is known that
are positive definite [10].
Let be the smallest eigenvalue of . The asymptotic behavior of for large has broad interest, see e.g. [6, 3, 9, 4, 7, 15, 16, 17, 18, 19]. The authors in [2, 13] have studied the behavior of the condition number , where denotes the largest eigenvalue of .
Szegö[15] investigated the asymptotic behavior of for the Hermite weight and the classical Laguerre weight . He found^{1}^{1}1In all of this paper, means =1.
where are certain constants, satisfying . Moreover, in the same paper, it was showed that the largest eigenvalue refers to the Hankel matrices , and were approximated by , and respectively. Widom and Wilf [17] studied the situation that the density is supported on a compact interval , such that the Szegö condition
holds, they found that
Chen and Lawrence [6] found the asymptotic behavior of with the weight function . Critical point at . It marks the transition point at which the moment problem becomes indeterminate. Theoretical challenges to finding the asymptotic behavior at critical point. Berg, Chen and Ismail [3] proved that the moment sequence (1.1) is determinate iff as . This is a new criteria for the determinacy of the Hamburger moment problem. Chen and Lubinsky [7] found the behavior of when . Berg and Szwarc [4] proved that has exponential decay to zero for any measure which has compact support. Recently, Zhu et al. [19] studied the Jacobi weight, and derived a approximation formula of ,
which reduces to Sezgö’s result [15] if .
This paper is concerned with a numerical computation that is motivated by random matrix theory. We are interested in finding the numerical value of the smallest eigenvalue of a class of Hankel matrices.
2 Mathematical statement of the problem
We consider the weight function
(2.1) 
and the moments, based on (1.1), are given by
(2.2) 
In this case, the entries of Hankel matrix
(2.3) 
increase factorially along the diagonal. In this paper, we study the smallest eigenvalue at the critical point .
2.1 Condition number
Let be the positive eigenvalues of . In order to highlight the numerical challenge, we would like to estimate the condition number of the problem
where is the largest eigenvalue and is the smallest. We observe that the largest eigenvalues is bounded by the trace of the matrix
On the other hand it must be larger or equal than the largest entry along the diagonal, i.e.
For sufficiently large , we note
therefore,
For the case of , the smallest eigenvalues are presented in the Table LABEL:fig:table2. They are the order of and slowly decreasing with . Therefore, we estimate the condition number (assuming )
Notice that it grows factorially with increasing , as a result, we say that these matrices are extremely illconditioned. This property makes the numerical computation particularly challenging.
The condition number of a problem quantifies the sensitivity of the output value to small changes in the input. For example, in our case, a finite precision of numerical representation of the Hankel matrix introduces a (hopefully small) change of the computed eigenvalue. We want this change to be less than . However, an astronomically large condition number makes it hard, and the initial Hankel matrix must be initialized with extreme amount of precision for example needs digits of precision. Subsequently the intermediate arithmetic operations must be performed with even higher precision in order for the rounding error not to introduce large errors in the output. This extreme precision requirements demand significant computing resources, and presents novel challenges and trade offs.
We would like to stress that the algorithm we use in this paper was chosen to be as numerically stable as possible (see comparison between Secant, Householder, Jacobi, Lanczos in [9] and the Section 5.2 on page 5.2), and the numerical challenges are intrinsic to the problem and are not due to instabilities of the algorithms.
3 Properties of Hankel matrices
The Hankel matrices we defined in section 2 have many interesting properties, here we would like to describe those properties that are used in this paper. We notice that the matrix in eq. (2.3) is explicitly symmetric. If is a polynomial of degree with real coefficients , , namely,
then the quadratic form
is positive definite.
The authors in [6] have studied the small eigenvalues with respect to the weight function (2.1) and derived the asymptotic expression for smallest eigenvalue for . They have found that the smallest eigenvalue decreases exponentially with increasing . On the other hand, the situation for , the weight (2.1) is Stieltjes indeterminate^{§}^{§}§There are infinitely many measures support on with the moments (2.2)., they have found that the lower bound of the smallest eigenvalue is greater than a positive number for any .
At the critical point , the asymptotic behavior changes abruptly, and we have a phase transition. Furthermore, the authors of [6] argued that under certain assumptions the asymptotic expression for the smallest eigenvalue is
(3.1) 
for large . Some of those assumptions could not be proven, therefore, we treat eq. (3.1) as a conjecture. In the following subsections 3.1 and 3.2 we summarise main steps of [6] performed to find eq. (3.1).
3.1 Polynomials
The main heroes in the derivation of eq. (3.1) are , the orthonormal polynomials associated with
where denotes the Kronecker’s delta, i.e., if and for .
Using the ChristoffelDarboux formula [10] and the result presented in [5] for large offdiagonal recurrence coefficients, it was found that
(3.3) 
Finally, the authors of [6] used the asymptotic expression for for large
(3.4) 
to evaluate eq. (3.3) for large . We perform the integration in the following section.
3.2 Saddle point approximation
In this section we evaluate the integral in eq. (3.3) using the Laplace method, see [8]. We start by substituting the asymptotic expressions eq. (3.4) into eq. (3.3). This leads to an integral with an integrand that has a maximum at and falls off to towards
(3.5) 
Furthermore, the width of the peak decreases as increases, as a result, we can use the Laplace method to evaluate the integral for large . We recast the integrand as and expand in Taylor series
(3.6) 
Expanding around we find that and vanish. Moreover,
for large , therefore, we can treat and higher order terms as small perturbation^{¶}^{¶}¶Similarly and for and expand the exponential
(3.7) 
where
Expanding around we obtain
Taking only the first term in the above equations and neglecting all corrections to the saddle point approximation in eq. (3.7) we obtain
which together with eq. (3.2) gives^{∥}^{∥}∥Authors of this publication have noticed that eq. (3.9) does not exactly match the eq. (3.1) which is quoted from [6], we suspect there was a typo in the earlier publication.
(3.9) 
Subsequently including the nexttoleading order correction in the saddle point, i.e. including the term, see eq. (3.7), we obtain
(3.10)  
(3.11)  
(3.12)  
(3.13) 
The transformations from eq. (3.10) to eq. (3.13) neglect all nexttonexttoleading order terms. Moreover, we need to stress that this expression might not include all of the nexttoleading order contributions. For example, it does not include nexttoleading order contributions (potentially) coming from
However, we notice that the subleading term we calculated corrects only the log term in eq. (3.12) and does not affect the overall factor, and which we refer to as the leading exponent. In section 6 we show that directly computing the determinant of Hankel matrices we were able to find to a good precision.
Keeping in mind that eq. (3.10) might not capture the full nexttoleading order contribution it would be prudent to use only the leading order approximation
(3.14)  
(3.15) 
Notice that the asymptotic expression in eq. (3.1) does not decrease exponentially with as opposed to the case[6]. Moreover, the subleading terms in eq. (3.13) are suppressed only by terms and decrease very slowly with increasing . This makes the difference between eq. (3.14) and the numerically computed also decrease very slowly with . This was indeed observed in [9]. The differences were decreasing with , however, much slower than for , and, as a result, the numerics did not convincingly confirm eq. (3.1). In this paper, we endeavour to improve on that.
4 Numerical algorithm
In this section, we describe the novel numerical algorithm for computing the smallest eigenvalues of a highlyill conditioned matrix. We tweak and optimise the algorithm for Hankel matrix in eq. (2.3).
Large parts of the new algorithm are based on the Secant algorithm described in [9]. However, for completeness, we will explain the new algorithm without assuming any reader’s knowledge of [9].
4.1 Precision
The posed problem is a highly illcondition, therefore, in order to obtain accurate results we have to perform arithmetic operations and store numbers with extremely high precision. In order to obtain appropriate precision, we have used the arbitrary precision integer arithmetic implemented in GNU Multiple Precision Arithmetic Library. Each element of the initial matrix is represented by
(4.1) 
where represents the number of bits of precision of the calculation, and is the GMP arbitrary precision integer. This way we obtain a representation of a number with a fixed number of bits to the right of the decimal point and an arbitrary number of bits to the left of the decimal point. Furthermore, all intermediate computations during the LDLT decomposition are done with twice as much precision to the right of the decimal point. After, the LDLT decomposition we truncate the numbers to bits of precision.
In order to test if the number of bits of precision was enough to obtain the correct output, we perform the same computation with an increased number of bits of precision. Only when the eigenvalues calculated match the eigenvalues calculated with precision up to we record it as the correct output.
4.2 Sketch of algorithm
Frequently it is much easier to compute the largest eigenvalue of a matrix rather than the smallest eigenvalue, see for example [11]. At the same time, the smallest eigenvalue of an invertible matrix is the largest eigenvalue of its inverse. Therefore, it might prove advantages to treat the problem of calculating the smallest eigenvalue of a Hankel matrix as a problem of finding the inverse of the Hankel matrix and subsequently finding its largest eigenvalue.
The challenge with this approach is to compute the inverse of the Hankel matrix. Conveniently it is not necessary to compute the whole . Experimenting with the Hankel matrices in eq. (2.3) for relatively small size i.e. , the authors have observed that largest eigenvalue of is equal (to within precision) to the largest eigenvalue of a matrix constructed from the topleft entries of .
In other words, if one is interested in the largest eigenvalues of , one can discharge all entries of except a small topleft section, and compute the largest eigenvalues of that section. Moreover, does not need to be large. In the Table LABEL:fig:table2 we found that was enough to achieve precision for .
4.3 Ldlt algorithm
In order to find the first entries of the authors have employed LDLT matrix decomposition. In this subsection, we start by describing the sequential algorithm for computing the LDLT matrix decomposition. In the next subsection, we follow by describing the parallelised version of the LDLT algorithm.
It is often considered the paradigm of numerical linear algebra that an algorithm must correspond to a matrix factorisation. The matrices we considered i.e. Hankel matrices are symmetric positivesemidefinite, therefore, it is natural to use Cholesky factorisation. In order to avoid the square root operations, we have to use the LDLT variant of Cholesky factorisation
where^{**}^{**}** The matrix is positive semidefinite, as a result, the LDLT factorization will be numerically stable without pivoting. Therefore, pivoting is not necessary. Further, the authors suspect that for the case of Hankel matrices in eq. (2.3) any permutations would lead to slower execution and higher numerical errors. is a lower triangular matrix with ones along the diagonal and is a diagonal matrix.
Algorithm 1: Serial LDLT code 
for (i=1 to N) { // this loop precomputes \(B_{i}\) for (j=i+1 to N) \(B_{i}[j]=A_{i}[j]/A_{i}[i]\); // these loops apply \(A_{i}\) and \(B_{i}\) to all entries to the right of \(A_{i}\) for (j=i+1 to N) for (k=j to N) \(A_{j}[k]=A_{j}[k]A_{i}[j]*B_{i}[k]\); } 
In order to obtain the matrix factorisation in terms of and , we employed the algorithm presented in the box named Algorithm 1. Its outer loop is over the columns of from left to right. The column of we will call . For each of those columns , we first divide it by the diagonal entry i.e. . The result we call . Subsequently we loop over all entries of the matrix to the right of and subtract
A[j][k] = A[j][k]  A[i][j]*B[i][k]Notice that the term we subtract i.e. involves only the picked by the outer loop and we calculated at this step of the loop.
It is important to stress that the divisions involved in calculating the vector
B[i][j] = A[i][j] / A[i][i]are very expensive computationally. Therefore, significant amount of time is saved by precomputing it as indicated in the Algorithm 1 box rather than subtracting
A[j][k] = A[j][k]  A[i][j]*A[i][k]/A[i][i].
column  node  
column  node  
…  …  
column  node  
column  node  
column  node  
column  node  
…  …  
…  node  
…  node  
…  node  
…  …  
column  …  
column  … 
4.3.1 Parallelising Ldlt
In order to parallelise the decomposition on a computing cluster we followed the steps of [9]. We assigned each column to one and only one nodes on the cluster. In order to assign similar amount of work to each node, we assigned columns to nodes using a (balanced) round robin approach. For nodes this approach assigns first columns to nodes up to and then next columns to nodes down to , see table 1. This process gets repeated until each column is assigned to a node. This approach was found to produce satisfactory results for balancing the computational load and memory requirements on a homogeneous system in [9] and no further improvements were introduced in that paper.
Splitting the columns between nodes enables the parallelisation of the “for (j=i+1 to N)” loop in the algorithm 1 over the nodes, see algorithm 2 box.
Algorithm 2: The parallel version of the LDLT algorithm to be run by each node independently. 
Assign the columns to nodes in a balanced round robin fashion Broadcast the values needed to construct the initial matrix Compute \(B_{1}\) for (i=1 to N) { if(column i+1 is assigned to this node) { Apply the \(B_{i}\) to column \(A_{i+1}\) Compute \(B_{i+1}\) Initiate background transmit of \(A_{i+1}\) (and a part of \(B_{i+1}\)) Apply \(A_{i}\) and \(B_{i}\) to all \(A_{i+2}\) ... \(A_{N}\) assigned to this node. Wait for transmit to complete } else { Initiate background receive of \(A_{i+1}\) (and a part of \(B_{i+1}\)) Apply \(A_{i}\) and \(B_{i}\) to all \(A_{i+1}\) ... \(A_{N}\) assigned to this node. Wait for receive to complete Compute any missing \(B_{i+1}\) elements  discussed in detail below } } 
4.3.2 Communication between nodes
In order to to distribute data between nodes and communicate between them we use OpenMPI library to implement MPI. As indicated in algorithm 2 box, initial distribution of the data between the nodes is done using MPI broadcasts. Subsequently at each loop iteration we broadcast (and part of ) from the node that is assigned to all the other nodes.
We broadcast (and part of ) as soon as they are calculated i.e. at step of the loop, see algorithm 2. Moreover, the communication is run in a separate thread that allows the transmission to overlap with the computation.
When sending we are faced with a choice:

we can broadcast the whole and make each node compute locally,

or computing on the node that is assigned column and broadcast both and to the other nodes.
On one hand, the time needed to broadcast is significantly longer than the time needed to compute it. On the other hand, for small values of there is a significant amount of idle communication bandwidth. We use this bandwidth to broadcast both and in parallel to the computation. However, for smaller values of we limit the portion of we broadcast and let each node calculate the remaining part of . The final transmission algorithm is summarized in algorithm 3:
Algorithm 3: The and transmission algorithm. 
serialize the \(A_{i+1}\) column send the \(A_{i+1}\) column break the \(B_{i+1}\) column into chunks of 100 values while (... there are more chunks ... and ... at least 8000 more multiplications to perform ...) { send the next chunk of \(B_{i+1}\) } 
The thresholds of 100 values and 8000 multiplications as presented in the algorithm 3 box were found empirically by the authors of [9] “to work well on a variety of problem sizes and cluster geometries”. This paper did not try to improve on those.
4.3.3 Hybrid MPI and OpenMP parallelisation
We have implemented the algorithm 2 using both MPI and OpenMP to manage the parallelism. In the preceding subsection we have just described how MPI is used to distribute columns between nodes. In contrast OpenMP is used to spread the computation assigned to a particular node across multiple cores within a node. In particular we have used OpenMP to parallelise the inner most loops associated with

compute ,

apply the and to all … assigned to this node,
see algorithm 2 box. The second of those loops is a loop over both rows and the columns assigned to a node. A primitive parallelisation would involve OpenMP parallelising one of those loops
for (j=i+1 to N) {
#pragma omp parallel for
for (k=j to N)
\(A_{j}[k]=A_{j}[k]A_{i}[j]*B_{i}[k]\);
} 
However, it was noticed in [9] that the implicit barrier at the end of the innermost loop resulted resulted in significant wast of computing time. Following [9] we have implemented as a single index array which could be iterated over with a single loop
#pragma omp parallel for threadprivate(j,k) schedule(dynamic,chunk_size)
for (index=lastIndex to firstIndex) {
j=... decode j from index ...
k=... decode k from index ...
\(V_{index}=V_{index}A_{i}[j]*B_{i}[k]\);
}

This loop is equivalent to the double loop above. Further improvements come from

dynamic scheduling  The computation time of each loop iteration varies considerably depending on the location in the matrix. Dynamic scheduling uses a queue to store block of work and assigns them the processor thread when they become available.

 Dynamic scheduling introduced a trade off. The larger the block size the more time wasted at the final barrier at the end of the loop, however, the smaller block size the larger overhead due to queue management overhead. We have used
(4.2) 
We make the OpenMP loop run down through the indices. The underlying reason for going backwards is that the matrix entries become smaller toward the upper left corner of the matrix. This leads to the smaller execution blocks that lead to less time wasted at the final barrier at the end of the loop.
4.4 Finding the inverse of
In this subsection we will show that the LDLT matrix decomposition described in the previous section is the most computationally heavy step to find the first by entries of . Given and such that
in order to find the inverse
or in index notation
(4.3) 
we need to find . We find the inverse of by performing inplace GaussJordan raw elimination. Let us first present the sequential version of the algorithm
Algorithm 3: The sequential version of inplace GaussJordan raw elimination. 
for (i=0 to N1) { for (j=i+1 to N1) { f=A[j][i]; A[j][i]=f; for(k=0 to i1) A[j][k]=A[j][k]  f*A[i][k]; } } 
Moreover, we are only interested in the first by entries of , therefore, we are interested only the first by entries of . These can be found by a significantly faster algorithm, see algorithm 4 box below.
Algorithm 4: Sequential GaussJordan raw elimination for the first by entries of . 
for (i=0 to N1) { for (j=i+1 to N1) { f=A[j][i]; if(i < m) A[j][i]=f; for(k=0 to min(m,i)1) A[j][k]=A[j][k]  f*A[i][k]; } } 
4.4.1 Parallelisation of GaussJordan elimination
We parallelised the GaussJordan elimination in a very similar way to the LDLT decomposition. We have distributing the rows of A over the nodes. We loop over rows of from top to bottom. At iteration number we broadcast the row number of to the rest of the nodes. After the transmission is finished, each node updates all rows assigned to it
A[k][j]=A[k][j] + A[i][j]*A[k][i];We summarise the parallelised algorithm in Algorithm 5 box.
Algorithm 5: The parallelised inplace GaussJordan raw elimination for the first by entries of . 
Assign the columns to nodes in a balanced round robin fashion for (i=0 to N1) { n = the smallest of i and m if(row i is assigned to this node) Broadcast first n entries of i row of A to the other nodes else Receive first n entries of i row of A for (j=i+1 to N1) { if(row j is assigned to this node) { f=A[j][i]; if(i < k) A[j][i]=f; for (k=0 to n1) A[j][k]=A[j][k]  f*A[i][k]; } } } 
There are some important differences between how LDLT decomposition and GaussJordan elimination parallelised

Unlike for LDLT decomposition, we use no OpenMP parallelisation. It would be natural to use OpenMP to parallelise the innermost for loop. However, we noticed that due to the upper limit () the overhead was much larger than possible gains for small ).

Due to small size of the broadcasted message which contains at maximum entries, we did not need a parallel communication and computation^{††}^{††}††One might suspect that an overhead from starting a parallel communication might actually result in a slower over all performance for small .. Table LABEL:fig:table2 shows that during out tests the time spend communicating in the parallelised GaussJordan was very short.
4.5 Finding the inverse of Hankel matrix
Having finished the algorithm 4 we obtain the first by entries of in place of the first by entries of . Subsequently, we can perform the sum
to obtain the first entries of . We perform the sum on the first node. The other nodes send the relevant entries of as they become needed. The final values of we cast into double precision floating point numbers. Finally, we use GNU Scientific Library to find the largest 2 eigenvalues of the truncated .
When truncating to the first entries, we introduce a truncation error. In section 4.2 we have argued that the truncation error is small, however, we would like to estimate it. For that purpose we calculate the largest 5 eigenvalues of truncated . We then use the difference between the eigenvalues for truncation and truncation to estimate the truncation error, see table LABEL:fig:table2.
5 Numerical results
In section 4, we have presented a new numerical algorithm for computing the smallest eigenvalues of Hankel matrices, which we have implemented the algorithm in C programming language. In this section, we present the numerical results obtained using the new implementation.
5.1 Computed eigenvalues
In this subsection, we present the computations we performed on HighPerformance Computing Cluster of the University of Macau. Each computation was performed on 3 computing nodes, each node with 2 IntelÂ® XeonÂ® E52690 v3 E52697 v2 CPUs and 64 GB of RAM. Each CPU had 12 cores/24 threads and 30 MB of cache. Therefore, in total, we had 36 cores and 192 GB of RAM available.
The numerical results obtained together with the timing profiles for the corresponding calculations we have presented in table LABEL:fig:table2 on page LABEL:fig:table2. The table columns correspond to
 N

is the rank of the matrix.
 Required Precision
 No. of nodes

 the number of nodes used for the computation. For each computation, we have checked that during the computationally heavy part i.e. LDLT decomposition we have used 24 cores at (close to) 100% at each node.
 Total Wall time

is the total time taken by the computation as measured by “a clock on the wall”.
 LDLT time

is the time taken by the LDLT decomposition part of the algorithm.
 Inversion of time

is divided into time spent performing arithmetic operations involved in finding the inverse of , and time spent communicating between the nodes. We also quote the sum the above.
 Transpose time

is the time taken to swap between entire columns being assigned to a particular node for the LDLT decomposition and entire rows being assigned to a particular node for the inversion of .
 Inverse of

is the time taken by multiplication of two and matrices to find the truncated .
 Truncation error

is the estimate of the error resulting from truncation of to by size, see section 4.5 for details.
All of the above times were measured on the host node i.e. the node that was recruiting other nodes to store assigned columns and to perform operations on the assigned columns see section 4.
There are a few important things to notice in table LABEL:fig:table2. First, the inversion of time constitutes only a few percents of the total wall times taken by the computation, which demonstrates that the LDLT decomposition is the computationally heavy part of the computation. Second, the time taken by matrix multiplication to find constitutes an even smaller fraction of the total wall times taken by the computation, even though, it was performed on a single node. Third, the “Send/Receive” time is a small fraction of the “Inversion of time”, therefore, parallelising the communication and arithmetics would not lead to significant speedup.
5.2 Timing: the new algorithm against Secant algorithm
The authors of [9] compared the Secant algorithm with a number of classical eigenvalue algorithms including Householder, Jacobi, Lanczos on the task of finding the smallest eigenvalue of Hankel matrices. It was found that for the case of large and extremely illconditioned matrices Secant algorithm proved to be much more efficient. In this paper, we try to improve on the Secant algorithm with the new algorithm presented in section 4. Here we would like to compare the timing of the new algorithm implementation against the implementation of Secant algorithm provided by [9].
In table 3 we have compared the time needed to compute the smallest eigenvalue of by Hankel matrix from eq. (2.3) to numerical accuracy with the two algorithms for different values of . For each we have scanned over precision , defined in eq. (4.1), and chose the minimum value that produced the desired accuracy and called it the “required precision”. The values of precision we scanned over were 256, 388, 512, 768, 1024, 1536, 2048, 3072, 4096 bits. The computation was performed on Thinkpad T480 with IntelÂ® Coreâ¢ i58250U Processor using close to 100% of all 8 threads for both implementations.
The table contains the number of iterations the Secant algorithm needs to converge on the zero of the characteristic polynomial of the matrix for each , let us call it . The total wall time taken by the Secant implementation was composed of LDLT decompositions plus the initial decomposition. Therefore
Looking at table 3, we can notice two improvements over the Secant algorithm:

The computation time required by the new implementation is very close to the time required for the Secant implementation to complete an average iteration. Therefore, the new implementation is times faster. This is to be expected as the new algorithm performs the computationally heavy LDLT decomposition only once, whilst the Secant algorithm does that number of times.

We notice that the precision required to compute the smallest eigenvalue with the desired precision is typically significantly smaller for the new implementation. This can be attributed to the fact that the Secant algorithm finds a zero of the characteristic polynomial whose values are typically astronomically large, and therefore, finding the zero to satisfactory precision is harder than finding a “satisfactory” LDLT decomposition. This leads to small but significant speed gains for larger matrices.
Secant  New algorithm  
N  Precision Required  Wall time  No. of iterations  Average iteration  Precision Required  Wall time  LDLT time  Inv. L time 
[bits]  [s]  [s]  [bits]  [s]  [s]  [s]  
200  512  230.6  8  25.6  388  22.4  22.2  0.037 
400  1024  505.4  8  56.15  758  55.1  54.2  0.527 
600  1024  1652.9  11  137.7  1024  134.9  131.8  2.24 
800  2048  2395.7  8  266.2  1536  376.4  367.9  6.79 
1000  3072  8542.4  7  1067.8  2048  954.6  935.5  15.7 
5.3 Scaling with number of nodes
Authors of [9] extensively discuss how and present evidence that their implementation of Secant algorithm scales well with the number of nodes. In this section, we would like to check that this statement extends to the new algorithm.
We have timed our implementation of the new algorithm for a different number of nodes. We have used Google Cloud Platform to create a cluster of 6 “f1micro” virtual machines^{‡‡}^{‡‡}‡‡These machines have GHz CPU and GB of memory. and run our implementation on a various number of nodes to check the scaling of the computation time with the number of nodes. The results are presented in table 4. The row names follow the same convention as table LABEL:fig:table2 with one difference; we also quote the total CPU time which is the sum of time consumed by all of the CPUs utilized by the program. For this setup we estimate the CPU time using
We can see that the total CPU time only slowly increases with the number of nodes, which leads to the total wall time steadily decreasing with the number of nodes. We can see that we achieve parallelisation over the nodes of the cluster, and it scales well with the number of recruited nodes.
Number of nodes  2  3  4  5  6  

Total wall time  29.3  20.1  16.5  13.8  12.6  
Total CPU time  58.6  60.4  66.1  69.2  75.5  
LDLT time  Total wall  27.4  18.4  14.7  12.3  11 
Total CPU  54.9  55.1  58.9  61.7  66.3  
Inverse of time  Total wall  0.672  0.465  0.366  0.301  0.263 
Total CPU  1.34  1.40  1.46  1.51  1.58  
Arithmetics  0.608  0.424  0.311  0.257  0.205  
Send/Receive  0.0641  0.041  0.055  0.044  0.058  
Inverse time  0.22  0.158  0.175  0.169  0.176  
Eigenvalue time  0.004  0.034  0.034  0.002  0.002 
6 Leading exponent determination
In this section, we use the numerically calculated eigenvalues for from section 5.1 to compare it to eq. (3.14) and extract the leading exponent introduced in section 3.2.
The authors of [9] have also compared eigenvalues they have numerically calculated for to eq. (3.1). Their approach was to directly compare the numerically calculated eigenvalue and the corresponding value predicted by eq. (3.1). They have observed a significant difference^{**}^{**}**The difference is significantly smaller when we compare the numerics to eq. (3.14) instead of eq. (3.1). even for relatively large e.g. for . These differences are due to the subleading in order term not captured by eq. (3.14).
In contrast, we would like to concentrate on the scaling of the smallest eigenvalue with
and emphasise the algebraic factor in our analysis. For that purpose we recast eq. (3.14) which reads
(6.1) 
as
(6.2) 
We notice that if we plot against for large we should find a straight line with the gradient equal to the leading exponent i.e. . For smaller we would expect some deviation from a straight line due to the subleading terms not captured by eq. (6.1).
This picture emphasises the leading exponent. Moreover, as we have noted in section 3.2, the subleading terms do not affect the overall factor, and they are suppressed by a double logfunction. Therefore, we would expect the leading exponent determination to be less affected by the subleading terms than the direct comparison performed by [9].
6.1 Linear fit
We plotted the eigenvalues presented in table LABEL:fig:table2 on a xy plot where
(6.3)  
(6.4) 
in figure 1. Subsequently, we have fit the data points with a linear model. We have included a constant term in the linear model to allow for the differences for smaller values of . We weighted each point proportionally to to put higher weight on the larger values of and found the best fit is
with adjusted , and residuals presented in figure 2.
We can see that the residuals tend to decrease as increases. The gradient value is only away from the predicted value , i.e.
Moreover, the confidence interval^{*†}^{*†}*†We should mention that the points in figure 1 have negligible (numerical and truncation) error bars associated with them, but since we are fitting the model parameters to the data points those model parameters acquire statistical errors. for the gradient contains the . Therefore, we conclude that the smallest eigenvalue of the Hankel matrices in eq. (2.3) at the critical point decays algebraically for large , and the numerical results are a strong evidence that eq. (3.4) and eq. (3.14) are true asymptotically for large .
7 Discussion
In this paper, we have introduced a novel algorithm for finding the smallest eigenvalues of a matrix that is particularly suited for Hankel matrices generated by the probability density function . We have developed an implementation of the algorithm that runs on a high performance computing cluster making use of both shared memory parallelisation within a node and a distributed memory parallelisation between nodes of the cluster. We have shown that the new algorithm improves on Secant algorithm previously considered to be the state of the art algorithm to study of Hankel matrices [9]. Moreover, we show that it scales well with the number of participating nodes.
We have employed the novel algorithm to find the smallest eigenvalues of Hankel matrices at the critical point for matrices up to . We have plotted the numerical results on a custom loglog plot and determined that at the leading exponent is , therefore
where is the smallest eigenvalue of the Hankel matrix in eq. (2.3) at critical point.
7.1 Significance of the result
Berg, Chen and Ismail [3] proved that the minimum eigenvalues of Hankel matrices generated by probability density function
is bounded away from zero for all including equals infinity if and only if the associated moment problem has more than one solutions. For the weight it can be shown that the smallest eigenvalue tends to zero exponentially fast in if , a critical point, where the moment problem is at the verge of being indeterminate, the smallest eigenvalue tends to zero algebraically in . For , the smallest eigenvalue tends to a strictly positive constant whose value depending on is not yet found.
7.2 Preconditioner
The authors suspect that there might be space for further improvements in the design of the numerical algorithm to calculate the smallest eigenvalue. The Hankel matrix we consider has particularly natural and effective preconditioner and postconditioner that make its condition number much more tame. One can consider
which has a nice property that for and a significantly smaller condition number. Therefore, it might be significantly easier to find the determinant of than the original . However, is just the original Hankel matrix premultiplied and postmultiplied by two diagonal matrices. Therefore, the determinant of the original is just the determinant of multiplied by the diagonal terms of pre and postmultiplier. This sounds like a powerful idea to improve Secant algorithm, which relays on a repeated evaluation of (modified) determinants of Hankel matrix. However, we were surprised to find out that the preconditioner and postconditioner did not bring any efficiency improvements to the Secant algorithm. This might be partially explained by the fact that still has a very large condition number though much smaller than . We feel that this situation deserves further clarification, and we leave it as a future direction.
8 Acknowledgements
Y. Chen, J. Sikorowski and M. Zhu would like to thank the Science and Technology Development Fund of the Macau SAR for generous support in providing FDCT 130/2014/A3 and FDCT 023/2017/A1. We would also like to thank the University of Macau for generous support via MYRG 201400011 FST, MYRG 201400004 FST and MYRG 201800125 FST.
References
 [1] Akhiezer NI. The Classical Moment Problem and Some Related Questions in Analysis. Edinburgh: Oliver and Boyd; 1965.
 [2] Beckermann B. The condition number of real Vandermonde, Krylov and positive definite Hankel matrices. Numer Math. 2000; 85: 553557.
 [3] Berg C, Chen Y, Ismail MEH. Small eigenvalues of large Hankel matrices: The indeterminate case. Math. Scand. 2002; 91: 6781.
 [4] Berg C, Szwarc R. The smallest eigenvalue of Hankel matrices. Constr Approx. 2011; 34: 107133.
 [5] Chen Y, Ismail MEH. Thermodynamic relations the Hermitian matrix ensembles. J Phys A: Math Gen. 1997; 30: 66336654.
 [6] Chen Y, Lawrence N. Small eigenvalues of large Hankel matrices. J. Phys. A: Math. Gen. 1999; 32: 73057315.
 [7] Chen Y, Lubinsky DS. Smallest eigenvalues of Hankel matrices for exponential weights. J Math Anal Appl. 2004; 293: 476495.
 [8] De Bruijn NG. Asymptotic Methods in Analysis. New York: Interscience; 1958.
 [9] Emmart N, Chen, Y, Weems C. Computing the smallest eigenvalue of large illconditioned Hankel matrices. Commun. Comput. Phys. 2015; 18: 104124.
 [10] Ismail MEH. Classical and Quantum Orthogonal Polynomials in One Variable, Encyclopedia of Mathematics and its Applications, Vol. 98. Cambridge, UK: Cambridge University Press; 2005.
 [11] Kuczyński J, Woźniakowski H. Estimating the largest eigenvalue by the power and lanczos algorithms with a random start. SIAM Journal on Matrix Analysis and Applications. 1992; 13: 10941122.
 [12] Krein MG, Nudelman AA. Markov Moment Problems and Extremal Problems. Providence, RI: American Mathematical Society; 1977.
 [13] Lubinsky DS. Condition numbers of Hankel matrices for exponential weights. J Math Anal Appl. 2006; 314: 266285.
 [14] Mehta ML. Random Matrices, Third Edition. Singapore: Elsevier (Singapore) Pte Ltd.; 2006.
 [15] Szegö G. On some Hermitian forms associated with two given curves of the complex plane. Trans Amer Math Soc. 1936; 40: 450461.
 [16] Todd J. Contributions to the solution of systems of linear equations and the determination of eigenvalues. Nat Bur Standards Appl Math Ser. 1959; 39: 109116.
 [17] Widom H, Wilf HS. Small eigenvalues of large Hankel matrices. Proc Amer Math Soc. 1966; 17: 338344.
 [18] Widom H, Wilf HS. Errata: Small Eigenvalues of Large Hankel Matrices. Proc Amer Math Soc. 1968; 19: 1508.
 [19] Zhu M, Chen Y, Emmart N, Weems C. The smallest eigenvalue of large Hankel matrices. Applied Mathematics and Computation. 2018; 334: 375387.