Abstract

$Id: LebesgueQuadratures.tex,v 1.160 2020/02/22 19:25:15 mal Exp$


A new type of quadrature is developed. The Gaussian quadrature, for a given measure, finds optimal values of a function’s argument (nodes) and the corresponding weights. In contrast, the Lebesgue quadrature developed in this paper, finds optimal values of function (value–nodes) and the corresponding weights. The Gaussian quadrature groups sums by function argument; it can be viewed as a –point discrete measure, producing the Riemann integral. The Lebesgue quadrature groups sums by function value; it can be viewed as a –point discrete distribution, producing the Lebesgue integral. Mathematically, the problem is reduced to a generalized eigenvalue problem: Lebesgue quadrature value–nodes are the eigenvalues and the corresponding weights are the square of the averaged eigenvectors. A numerical estimation of an integral as the Lebesgue integral is especially advantageous when analyzing irregular and stochastic processes. The approach separates the outcome (value–nodes) and the probability of the outcome (weight). For this reason, it is especially well–suited for the study of non–Gaussian processes. The software implementing the theory is available from the authors.

1

I Introduction

A Gaussian quadrature is typically considered as ‘‘an integral calculation tool’’. However, the quadrature itself can be considered as a discrete measureTotik (11 Nov. 2005). The major practical drawback of Gauss–type quadratures is that they, like a Riemann integral, are finding the nodes in a function’s argument space. A very attractive idea is to build a quadrature with the nodes in a function’s value space, a Lebesgue–type quadrature. As with the Lebesgue integral, such a quadrature can be applied to integration of irregular functions and interpolating sampled measure by a discrete Lebesgue integral. When implemented numerically such an approach can give a completely new look toward relaxation type processes analysis. This is the goal of this paper.

Ii Measure

Consider a measure , a basis , and a function to integrate . An example of the measure can be: Chebyshev with support , Laguerre with support , experimental data sample of points (discrete –point measure), etc. In this paper basis is a polynomial of the degree , e.g. or some orthogonal polynomials basis, the results are invariant with respect to basis choice, and give identical results, but numerical stability can be drastically differentBeckermann (1996); Malyshkin and Bakhramov (2015). Introduce Paul Dirac quantum mechanic bra–ket notation Wikipedia (2016) and :

 \BraketQkf = ∫dμQk(x)f(t) (1) \BraketQjfQk = ∫dμQj(x)Qk(x)f(t) (2)

The problem we study in this paper is to estimate a Lebesgue integralKolmogorov and Fomin (8 May 2012) by an optimal –point discrete measure (15).

 \Braketf =∫fdμ (3)

We are going to apply the technique originally developed in Refs. Malyshkin and Bakhramov (2015); Bobyl et al. (2016); Malyshkin (2017), the main idea is to consider not a traditional interpolation of an observable as a linear superposition of basis functions such as

 \Braket[f(x)−fLS(x)]2 →min (4) fLS(x) =n−1∑k=0βkQk(x) (5)

but instead to introduce a wavefunction as a linear superposition of basis functions, then to average an observable with the weight:

 ψ(x) = n−1∑j=0αjQj(x) (6) fψ = \Braketψfψ\Braketψψ=n−1∑j,k=0αj\BraketQjfQkαjn−1∑j,k=0αj\BraketQjQkαk (7)

With a positively defined matrix the generalized eigenvalue problem:

 n−1∑k=0\BraketQjfQkα[i]k =λ[i]n−1∑k=0\BraketQjQkα[i]k (8) ψ[i](x) =n−1∑k=0α[i]kQk(x) (9)

has a unique solution. Found eigenfunctions to be normalized as . Then ; ; and .

A -point Gaussian quadrature ; :

 ∫f(x)dμ=\Braketf ≈ n−1∑i=0f(x[i])w[i] (10)

on the measure is integration formula (10) that is exact if is a polynomial of a degree or less, in other cases it can be considered as an approximation of the measure by a discrete –point measure . A question about an efficient numerical approach to calculation is a subject of extensive workTotik (11 Nov. 2005); Gautschi (2004). In our recent workMalyshkin and Bakhramov (2015) we established, that the most practical approach to obtain for an arbitrary measure (often available only through data sample) is to put in Eq. (8) and to solve the generalized eigenvalue problem:

 n−1∑k=0\BraketQjxQkα[i]k = λ[i]n−1∑k=0\BraketQjQkα[i]k (11) x[i] = λ[i] (12) w[i] = 1[ψ[i](x[i])]2 (13)

The –th order orthogonal polynomial relatively the measure is equal to the . The Gaussian quadrature nodes are (11) eigenvalues, the weights are equal to inverse square of the eigenfunction at (the eigenfunctions are normalized as ). The (11) is exactly the threediagonal Jacobi matrix eigenvalue problem (see Ref. Killip and Simon (2003) and references therein for a review), but written in the basis of , not in the basis of as typically studied. Particularly, this makes it easy to obtain three term recurrence coefficients and () from a sampled data numerically: find the moments ; and obtain orthogonal polynomials ; in basis; then calculate and using multiplication operator of basis functions, see the method \seqsplitgetAB() of provided software. An ability to use Chebyshev or Legendre basis as allows us to calculate the and to a very high order (hundreds). The weight expression (13) is typically more convenient numerically than the one with the Christoffel function :

 K(x) = 1∑n−1j,k=0Qj(x)G−1jkQk(x)=1∑n−1i=0[ϕ[i](x)]2 (14)

Here is Gram matrix inverse; in (14) the is an arbitrary orthogonal basis, such that , when obtain (13).

The Gaussian quadrature (10 can be considered as a Riemann integral formula, its nodes select optimal positions of a function’s argument, they are operator eigenvalues (11), this integration formula assumes that exist and can be calculated. As with any Riemann integral it requires the to be sufficiently regular for an integral to exist.

The Riemann integral sums the measure of all intervals. The Lebesgue integral sums the measure of all intervals for which the value of function is in the interval , see demonstrating Fig. 1 of Ref. Malyshkin (2017). Consider a -point Lebesgue quadrature ; :

 ∫f(x)dμ=\Braketf ≈ n−1∑i=0f[i]w[i] (15)

Now quadrature nodes are in function value space, not in function argument space as in (10). We will call them the value–nodes. To obtain the value–nodes and weights of a Lebesgue quadrature for the measure and function consider an arbitrary polynomial of a degree or less and expand it in (8) eigenfunctions:

 P(x) = n−1∑i=0\BraketPψ[i]ψ[i](x) (16)

Taking into account that the expression for can be written (here and are arbitrary polynomials of a degree or less):

 \BraketPfS = n−1∑i=0λ[i]\BraketPψ[i]\BraketSψ[i] (17) \Braketf = n−1∑i=0λ[i]\Braketψ[i]2 (18)

The (18) (the case ) is eigenvalues averaged with the weights (note that ). The (18) gives the Lebesgue quadrature value–nodes and weights:

 f[i] = λ[i] (19) w[i] = \Braketψ[i]2 (20)

The Lebesgue quadrature can be considered as a Lebesgue integral interpolating formula by a –point discrete measure (15). The value–nodes select optimal positions of function values, they are operator eigenvalues (8), the weight is the measure corresponding to the value . The weights (20) give

 \Braket1 =n−1∑i=0w[i] (21)

the same normalizing as for the Gaussian quadrature weights (13). As with the Gaussian quadrature (10) the Lebesgue quadrature (15 is exact for some class of functions.

Theorem 1.

If a –point Lebesgue quadrature (15) is constructed for a measure and a function , then any integral , where is a polynomial of a degree or less, can be evaluated from it exactly.

Proof.

When is of a degree or less, then apply (17) with . For a degree above expand . The matrix is non–unique, but always exists and can be obtained e.g. by synthetic division , or using density matrix approach of the Appendix A. The integral then can be evaluated using (17) formula:

 \BraketfP(x) =n−1∑i=0λ[i]w[i](P)=n−1∑i=0λ[i]\Braketψ[i]ˆPψ[i] (22) w[i](P) =\Braketψ[i]ˆPψ[i]=n−1∑j,k=0\Braketψ[i]QjMjk\BraketQkψ[i] (23)

The formula (22) has the same eigenvalues , but they are now averaged with the weights , that are not necessary positive as in (20), note that . ∎

Remark.

The Gaussian quadrature can be considered as a special case of the Lebesgue quadrature. If one put , then –point Lebesgue quadrature gives exact answer for an integral with a polynomial of a degree or less, is reduced to a quadrature that is exact for a polynomial of a degree or less, i.e. to a Gaussian quadrature. When the Lebesgue quadrature value–nodes are equal to the Gaussian nodes. The most remarkable feature of the Lebesgue quadrature is that it directly estimates the distribution of : each from (20) is the measure of sets. For an application of this feature to the optimal clustering problem see Malyshkin (2019).

Theorem 1 gives an algorithm for integral calculation: use the same value–nodes from (19), but the weights are now from (23). The Lebesgue quadrature allows to obtain the value of any integral, adjusting only the weights, value–nodes remain the same, what provides a range of opportunities in applications.

 \Braket[f(x)−fLS(x)]2 =\Braketf2−n−1∑i=0(f[i])2w[i]=\Braket(f−¯¯¯f)2−n−1∑i=0(f[i]−¯¯¯f)2w[i] (24)

Here . The difference between (24) and regular principal components is that the basis of the Lebesgue quadrature is unique. This removes the major limitation of a principal components method: it’s dependence on the attributes scale.

ii.3 Numerical Estimation Of Radon–Nikodym Derivative

Radon–Nikodym derivativeKolmogorov and Fomin (8 May 2012) is typically considered as a probability density relatively two Lebesgue measures and . Consider , then (8) is generalized eigenvalue problem with and matrices (basis functions products averaged with respect to the measure and respectively). If at least one of these two matrices is positively defined then (8) has a unique solution.

Theorem 2.

The eigenvalues are Radon–Nikodym derivative extremums in the basis of (8).

Proof.

Consider the first variation of in the state , then

 \Braketψ+δψdνdμψ+δψ\Braketψ+δψψ+δψ = \Braketψdνdμψ (25) + 2[\Braketψdνdμδψ−\Braketψdνdμψ\Braketψδψ]+…

when is (8) eigenvector, then the first variation (25) (linear in ) is zero because of relation for (8) eigenvectors. ∎

Remark.

If does not belong to the original basis space of (8) problem — then extremal property no longer holds.

Other estimates of Radon–Nikodym derivative can be easily expressed in terms of (8) eigenvectors. For example Nevai operator Nevai (1986) is equal to eigenvalues averaged with the weights:

 dνdμ(x) =n−1∑i=0λ[i][ψ[i](x)]2n−1∑i=0[ψ[i](x)]2 (26)

Other estimates, such as the ratio of two Christoffel functionsSimon (2011) for the measures and if both are positive, can also be expressed in a form of averaged, but with the other weights:

 dνdμ(x) =n−1∑i=0(λ[i])γ[ψ[i](x)]2n−1∑i=0(λ[i])γ−1[ψ[i](x)]2 −1≤γ≤1 (27)

Different estimators converge to each other for . A weighted type of expression preserves the bounds: if original is bounded then (26) is bounded as well; this is an important difference from positive polynomials interpolationBernard (2009), where only a low bound (zero) is preserved. A distinguishing feature of Radon–Nikodym derivative estimate as (8) spectrum is that it is not linked to the states localized in –space (such as (26)), but instead is linked to extremal states of the Radon–Nikodym derivative .

The in (26) is , i.e. it can be considered as a distribution with a single support point : the distribution moments are equal to . Now assume correspond to some actual distribution of and are the moments of this distribution. Then the is:

 dνdμ(x) = n−1∑i=0λ[i][n−1∑k=0α[i]kqk]2n−1∑i=0[n−1∑k=0α[i]kqk]2 (28)

The (28) is averaged eigenvalues with positive weights, for it coincides with –localized (26). However the (28) is much more general, it allows to obtain a Radon–Nikodym derivative for non–localized states. The (28) is the value of the Radon–Nikodym derivative for a distribution with given moments. Such ‘‘distributed’’ states naturally arise, for example, in a distribution regression problemMalyshkin (2015a, b), where a bag of –observations is mapped to a single –observation. There is one more generalization, considered inMalyshkin (2015c, 2017): density matrix mixed states, that cannot be reduced to a pure state of a form, we are going to discuss this generalization elsewhere, for a few simple examples see Appendix A, where a density matrix corresponding to a given polynomial is constructed and Appendix B, where a density matrix corresponding to the Chrisoffel function (14) is constructed. Our approach can estimate both: the measure (as a Lebesgue quadrature) and two measures density (as a Radon–Nikodym derivative), together with provided numerical implementation, this makes the approach extremely attractive to a number of practical problems, for example to joint probability estimationMalyshkin (2018).

Iii Numerical Estimation

The pairs of (8) eigenproblem (for a Gaussian quadrature with and matrices, and for a Lebesgue one with and matrices) are required to calculate a quadrature. A question arise about numerically most stable and efficient way of doing the calculations. Any matrix () can be calculated from the moments () using multiplication operator:

 QjQk = j+k∑m=0cjkmQm (29)

The value of is analytically known (see numerical implementation in the Appendix A of Ref. Malyshkin and Bakhramov (2015)) for four numerically stable bases: Chebyshev, Legendre, Hermite, Laguerre, and for a basis with given three term recurrence coefficients and it can be calculated numerically2 (all the bases give mathematically identical results, because (8) is invariant with respect to an arbitrary non–degenerated linear transform of the basis, but numerical stability of the calculations depends greatly on basis choice).

Once the matrices and are calculated the (8) can be solved using e.g. generalized eigenvalue problem subroutines from LapackLapack (2013). With a good basis choice numerically stable results can be obtained for a 2D problemMalyshkin (2015d) with up to elements in basis, i.e. for basis functions.

In Appendix A & B of Ref. Malyshkin and Bakhramov (2015) the description of API and java implementation of polynomial operations in Chebyshev, Legendre, HermiteE, Laguerre, Shifted Legendre, Monomials bases is presented. The code is available fromMalyshkin (2014), file \seqsplitcode_polynomials_quadratures.zip. See the program \seqsplitcom/polytechnik/algorithms/ExampleRadonNikodym_F_and_DF.java for usage example. This program reads pairs from a tab–separated file, then calculates (19) value–nodes and (20) weights for Lebesgue integral of the functions: , with the measure , and with the measure , see Ref. Bobyl et al. (2016) for a description, and Appendix D for an example. As a proof–of–concept a simple matlab/octave implementation \seqsplitcom/polytechnik/utils/LebesgueQuadratureWithEVData.m is also provided, the class calculates the Lebesgue quadrature value–nodes and weights either from two matrices, or, second option, given in an analytic form, calculates two matrices first and then finds the Lebesgue quadrature. Usage demonstration in available from \seqsplitcom/polytechnik/utils/LebesgueQuadratures_selftest.m. This unoptimized code calculates and matrices in monomials and Chebyshev bases, then builds Gaussian and Lebesgue quadratures.

Iv Conclusion

Obtained Lebesgue quadrature is a new class of quadratures, besides being suitable for integrals estimation, it can be applied to an estimation of the distribution of : each from (20) is the measure of sets. This is especially important for of relaxation type, this approach is superior to typically used approaches based on , , , , skewness and kurtosis approachesBobyl et al. (2018a). In our early worksBobyl et al. (2016, 2018b) the (8) equation was obtained, but all the eigenvalues were considered to have equal weights, their distribution was interpreted as a one related to the distribution of , this is similar to an interpretation of eigenvalues distribution used in random matrix theoryGuhr et al. (1998).

In this paper an important step forward is made. An eigenvalue should have the Lebesgue quadratures weight (20) , not the same weight as in our previous works (first time the Eq. (20) was obtained in Ref. Malyshkin (2015c) as cluster coverage, formula (20) for , but it’s importance was not then understood).

To demonstrate the difference in weights accounting take two–stage degradation data model from Ref. Bobyl et al. (2018b). Li–ion batteries capacity fade with each cycle, the degradation rate per cycle is the characteristics of interest. Consider and the measure (recent and old cycles are equally important), use as battery degradation rate . As in Ref. Bobyl et al. (2018b) consider for 1000 cycles, the degradation rate for the first and second stages is and per cycle respectively. Two processes with first:second stages ratio as 500:500 ( for ; for ) and 800:200 ( for ; for ) are used as the model data, Fig. 1. In our previous worksBobyl et al. (2016, 2018b) we established, that the distribution of from (8) is related to the distribution of . In this paper this relation is found, the weights are (20) Lebesgue quadrature weights. Note, that for the data in Fig. 1, the peaks height for (c) and (f) correspond exactly to stage length, because of the measure chosen .

A Lebesgue quadrature can be interpreted as discrete distribution. The selection of value–nodes is optimal, such a quadrature performs optimal –point discretization of . The approach is applicable to non–Gaussian distributions (e.g. with infinite standard deviation (but not with infinite mean), burst of many orders of magnitude, etc.). The situation is similar to the one in quantum mechanics: when a quantum Hamiltonian is known incorrectly and have some energy state, that is greatly different from the ground state, such a state does not change system behavior at all, because it has close to zero probability. The Lebesgue quadrature has similar ideology, it separates the state on: an observable value and the probability of it . Similar path have been successfully tried earlier in our quantum–mechanics approach to machine learning of Ref. Malyshkin (2015c), where we separated system properties (described by the outcomes) and system testing conditions (described by the coverage).

Acknowledgment

Vladislav Malyshkin would like to thank S. V. Bozhokin and I. A. Komarchev from Peter the Great St.Petersburg Polytechnic University for fruitful discussions.

Appendix A Density matrix, corresponding to a given polynomial

In Section II.2 the integral with a polynomial of a degree or less is considered. The technique of Malyshkin and Bakhramov (2015) deals mostly with type of integrals, and it is of practical value to be able to reduce a state described by an arbitrary polynomial:

 P(x) =2n−2∑k=0γkQk(x) (30)

to the state described by the density matrix:

 ρ(x,y) =n−1∑i=0λ[i]ψ[i](x)ψ[i](y) (31) P(x) =ρ(x,x) (32)

such that , and are the eigenvalues and the eigenvectors of some operator .

Theorem 3.

For a non–degenerated basis relatively the measure such operator always exists and is generated by a measure with the moments .

Proof.

To find a measure, such that (here is Gram matrix inverse) apply multiplication operator from (29) to obtain:

 2n−2∑m=0γmQm(x) =n−1∑j,s,t,k=0j+k∑m=0s+t∑l=0cjkmG−1jscstlG−1tk\BraketQlPQm(x) (33)

Comparing the coefficients by obtain a linear system of dimension, from which the ; moments can be found:

 n−1∑j,s,t,k=0s+t∑l=0cjkmG−1jscstlG−1tk\BraketQlP =γm (34)

Then construct Gram matrix of the measure corresponding to found moments , this gives the required . To construct operator, eigenvalues/eigenvectors of which give (32): solve (8) generalized eigenvalue problem with the matrices and in (8) left– and right– hand side respectively, obtained eigenvalues/eigenvectors pairs give (32) expansion over the states of operator:

 n−1∑k=0\BraketQjQkPα[i]k =λ[i]n−1∑k=0\BraketQjQkα[i]k (35) ρ(x,y) =n−1∑i=0λ[i]ψ[i](x)ψ[i](y)=n−1∑i=0\Ketψ[i]λ[i]\Braψ[i]=∥ρ∥ (36) P(x) =ρ(x,x) (37)

Remark.

The expansion of with the matrix generated by a measure is unique, the measure moments are (34) linear system solution; without a requirement that the matrix to be generated by a measure, the solution is non–unique. Another non–uniqueness can arise from a degeneracy of matrix, for example, take Christoffel function (14), : the solution (34) and the matrix are unique, but the (32) expansion is non–unique due to (35) spectrum degeneracy (all the eigenvalues are equal to one), for an arbitrary orthogonal basis .

Note.

This prof is actually an algorithm to construct the density matrix , producing a given polynomial . In provided implementation \seqsplitcom/polytechnik/utils/BasisFunctionsMultipliable.java the method \seqsplitgetMomentsOfMeasureProducingPolynomialInKK_MQQM(), for a given , solves the linear system (34) and obtains the moments . The method \seqsplitgetDensityMatrixProducingGivenPolynomial() uses these moments to solve (35) and to obtain the from (36) as a Lebesgue quadrature, the spectrum of which corresponds to a given polynomial (32).

From (32) it immediately follows that the sum of all eigenvectors is equal to , particularly for Christoffel function we have: , and in general case:

 \Braketf(x)P(x) =n−1∑i=0λ[i]\Braketψ[i]fψ[i] (38)

The (38) is a representation of integral as a sum of –moments over the states of the density matrix operator (35). This formula is a complementary one to (22), which is a representation of integral as a sum of –moments over the states of operator (8).

Finally, we want to emphasize, that used all of the above is a special case of a density matrix. Consider , then , and for an operator , Similarly, a spur with a density matrix , e.g. corresponding to a polynomial , can be used instead of all averages:

 \Braketf →Spur∥f|ρ∥ (39)

This way the approach we developed can be extended not only to polynomial by operator products study, but also to operator–by–operator products. Then, instead of , which can be written either in (22) or in (38) representation, a general case of two operators can be considered. The first attempt to explore this direction is presented in Malyshkin (2018).

Appendix B On The Christoffel Function Spectrum

In the consideration above was a given function with finite moments in (2). It’s selection depends on the problem approached, for example we used to obtain Gaussian quadrature (11) and for Li–ion degradation rate study in Fig. 1. A question arise what the result we can expect if the Christoffel function (14) is used as .

Theorem 4.

If is equal to the Christoffel function the eigenproblem

 n−1∑k=0\BraketQjK(x)Qkα[i]k =λ[i]Kn−1∑k=0\BraketQjQkα[i]k (40) ψ[i]K(x) =n−1∑k=0α[i]kQk(x) (41)

has the sum of all eigenvalues equals to the total measure:

 \Braket1 =∫dμ=n−1∑i=0λ[i]K (42)
Proof.

For a given Christoffel function vanishes at large with asymptotic, the integrals (2) are finite and (40) has a solution with eigenvalues (possibly degenerated) and eigenfunctions . The Christoffel function (14) can be expressed in any orthogonal basis, take . From and obtain . ∎

The eigenfunctions (11) of a Gaussian quadrature correspond to –localized states, they are operator eigenfunctions and the total weight is with ; the is (11) eigenproblem solution. The states of (40) eigenproblem satisfy Theorem 4 and the Lebesgue quadrature weights sum (21): . However an eigenvalue of (40) is not equal to the Lebesgue quadrature weight , see (46) below. A density matrix operator can be constructed from (40) eigenvalues and eigenfunctions:

 ρK(x,y) =n−1∑i=0λ[i]Kψ[i]K(x)ψ[i]K(y)=n−1∑i=0\Ketψ[i]Kλ[i]K\Braψ[i]K=∥ρK∥ (43)

it is similar to ‘‘regular average’’ density matrix considered in the Appendix A, e.g. both have the same Spur (equals to total measure). The (43) is the same as (36) but the eigenvalues/eigenfunctions are (40) instead of (35). The density matrix operator corresponds to the Christoffel function . The problem of averaging an operator with the Christoffel function used as a weight is a difficult problem Malyshkin (2015a). The (43) allows this problem to be approached directly: take the . A question arise about mapping: whether it is a one–to–one mapping or not? For , a polynomial of degree, the mapping is (36). For this requires a separate consideration. Anyway, built from the Christoffel function density matrix operator (43) allows us to consider an operator average with the Christoffel function in a regular ‘‘operatorish’’ way: by taking a Spur of operators product.

Recent progressMalyshkin (2019) in numerical computability of Radon–Nikodym derivative for multi–dimensional allows us to demonstrate Theorem 4 numerically. Take a simple demonstration measure of the Appendix C of Malyshkin (2019):

 dμ =dx (44) x ∈[−1:1]

The file \seqsplitdataexamples/runge_function.csv is bundled with provided software. It has 10001 rows (the measure support is split to 10000 intervals) and 9 columns. In the first seven columns there are the powers of : . Then, in the next two columns, follow: Runge function and the (44) weight. Run the program to obtain Christoffel function value for all observations in data file (column indexes are base 0):

java com/polytechnik/utils/RN --data_cols=9:0,6:1:8:1 \
--data_file_to_build_model_from=dataexamples/runge_function.csv


Here as we use the , the data is in the column with index . The Lebesgue quadrature then produces the Gaussian quadrature for the measure (44):

 x[0] =−0.9491080257215085 w[0] =0.1294848235792277 w[0]K =0.11746154871932572 (45) x[1] =−0.7415313130354606 w[1] =0.279705429437816 w[1]K =0.2794795769155739 x[2] =−0.40584522389537203 w[2] =0.3818301175303132 w[2]K =0.38911964330481996 x[3] =0 w[3]