Semiparametric estimation of rotation and scaling parameters for aeronautic loads
Abstract
In this paper, we perform registration of noisy curves. We propose a model for estimating the rotation and scaling parameters needed to adjust a set of curves, through an M-estimation procedure. We prove the consistency and the asymptotic normality of our estimators. Numerical simulations and a real-life aeronautic example illustrate our methodology.
Keywords:
Semiparametric model, Registration of curves, Industrial monitoring
MSC Classification:
62F12, 62F30, 62P30
1 Introduction
An airframe structure is a complex system, and its design is a complex task that today involves many simulation activities generating massive amounts of data. An example is the process of loads and stress computation of an aircraft, that is, the computation of the forces and mechanical strains endured by the structure. The overall process shown in Figure 1 is run to identify the load cases (i.e., aircraft missions and configurations: maneuvers, speed, loading, stiffness…) that are critical in terms of stress endured by the structure and, of course, the parameters which make them critical. The final aim is to size and design the structure (and potentially to reduce loads in order to reduce its weight). Typically, for an overall aircraft structure, millions of load cases can be generated, and for each of these load cases millions of structural responses (i.e., how structural elements react under such conditions) have to be computed. As a consequence, computational times can be significant.
In an effort to continuously improve its methods, tools and ways of working, Airbus has invested heavily in digital transformation and in the development of infrastructures for processing data (newly or already produced). The main industrial challenge for Airbus is to reduce the lead time of the computation and preliminary sizing of an airframe, as well as to extract value from already calculated loads.
In this paper, we focus on the external loads of a wing: for each load case, the shear forces (near-vertical transverse forces arising from aerodynamic pressure and inertia) and the bending moments (resulting from the shear forces, they represent the flexion of the wing) are calculated, as shown in Figure 2.
These external loads appear to be extremely regular, and one can legitimately suppose that there is a link between all these curves. Indeed, it is natural to assume that there exists a reference bending moment (a reference curve) which can be morphed through a deformation model to produce all the other curves. This problem of registration of point sets, curves and images has been widely studied and extended to image and shape analysis [1, 8].
The first problem in registration is the identification of a reference function binding all the others; the second is the definition of the transformation model. Numerous papers deal with the identification of a proper reference template, such as [3], who used wavelet thresholding to identify the mean pattern, or [5], who proposed to estimate a common template through an approximated geodesic distance for manifold embedding. Nevertheless, the use of a template is not always necessary, as shown in [4]. In our work, we do not focus on the identification of the reference curve and simply use the mean curve as a template.
While the estimation of rotation and scaling parameters has been widely studied for image registration, as shown in [4], [10], research on curve registration has focused on warping strategies and the estimation of shifts, as in [12], [7]. However, to the best of our knowledge, no work has addressed the estimation of rotation and scaling parameters when the transformation is applied jointly to the abscissa and the ordinate of a curve. More precisely, in our study we observe a large number of functions differing from each other only by a nonlinear combination of transformations: a rotation, a homothetic transformation and a dilation of the design space. The idea is then to use a transformation model in order to predict the deformation parameters and thus estimate loads for different load configurations.
Our paper is organized as follows. In Section 2, we define the observations as well as the transformation model and the nonlinear regression model. The estimation of the rotation and scaling parameters follows the classical guidelines of M-estimation: we provide M-estimators of the unknown parameters and give their asymptotic behaviour. Then, in Section 3, we give a simulated example illustrating our methodology and discuss an application to the aeronautic problem. The proofs are postponed to the last section.
2 Framework, model and analytic results
In this section, we describe the statistical model studied and give the asymptotic behaviour of the M-estimators of the unknown parameters.
2.1 The observations
Notation 1
Let x\in[0,1] and f:[0,1]\to\mathbb{R}^{+}. We denote by C the curve
C:=\begin{pmatrix}x\\
f(x)\end{pmatrix}_{x\in[0,1]}.
In our framework, we have K+1 curves at hand. \tilde{C} is the reference curve, and we assume that the K other curves C_{j},\ j=1,...,K are the images of \tilde{C} by the transformation model described below. The K curves are observed on the same random grid \mathcal{D}_{N}:=\{X_{1},...,X_{N}\}, where (X_{i})_{i=1,...,N} are iid random variables with uniform distribution on [0,1]. Hence we have at our disposal
C_{j}^{N}=(X_{i},f_{j}(X_{i}))_{i=1,...,N,\ j=1,...,K}. 
We will consider the following two cases

i) \tilde{C} is known everywhere: \tilde{C}=(x,\tilde{f}(x)), \forall x\in[0,1],

ii) \tilde{C} is only known on \mathcal{D}_{N}: \tilde{C}=(x,\tilde{f}(x)), \forall x\in\mathcal{D}_{N}.
2.2 Transformation model
Before defining the transformation model linking the K observed curves to the pattern \tilde{C}, one must ensure that the functions f_{j}, j=1,...,K and \tilde{f} are "admissible". This is the aim of the next definition:
Definition 2
Let \theta_{0}\in]0,\frac{\pi}{2}[. \mathcal{F}_{\theta_{0}} is the set of functions defined by:
\mathcal{F}_{\theta_{0}}:=\{f:[0,1]\to\mathbb{R}^{+},\ f\ \mathrm{differentiable\ on}\ [0,1],
f(0)>0,\ f(1)=0,
f^{\prime}(x)<0\ \mathrm{for}\ x\in[0,1),\ f^{\prime}(1)=0,\ f^{\prime}(0)>-\cot{\theta_{0}}\}.
Let us now define the parametric family of transformations that we will use.
Notation 3
For any function T_{\alpha}:\mathbb{R}^{2}\to\mathbb{R}^{2} depending on a parameter \alpha\in\Theta, we will denote by T_{\alpha}^{1} and T_{\alpha}^{2} its two coordinates.
In the following, we define our transformation model.
Set 0<\theta_{0}<\theta_{1}<\frac{\pi}{2} and 0<\lambda_{min}<\lambda_{max}. For any \alpha:=(\theta,\lambda)\in\Theta=[\theta_{0},\theta_{1}]\times[\lambda_{min},\lambda_{max}], set
T_{\alpha}:=H_{\lambda}\circ S_{\theta}\circ R_{\theta}
the composition of a rotation R_{\theta}, a rescaling S_{\theta} applied to the x-axis, and a homothetic transformation H_{\lambda} on the y-axis. More precisely:

R_{\theta} is the rotation centered at \begin{pmatrix}1\\ 0\end{pmatrix} and of angle \theta. That is
\displaystyle R_{\theta}:[0,1]\times\mathbb{R}^{+}\to[A_{min},1]\times\mathbb{R},\quad\begin{pmatrix}x\\ y\end{pmatrix}\mapsto\begin{pmatrix}(x-1)\cos{\theta}-y\sin{\theta}+1\\ (x-1)\sin{\theta}+y\cos{\theta}\end{pmatrix}, with A_{min}=1-\sqrt{1+y(0)^{2}}.

After rotating a curve, depending on the angle \theta, the resulting design space is in general no longer [0,1]. Hence, we define the following transformation to reposition the curve on [0,1]:
\displaystyle S_{\theta}:[A_{min},1]\times\mathbb{R}\to[0,1]\times\mathbb{R},\quad\begin{pmatrix}x\\ y\end{pmatrix}\mapsto\begin{pmatrix}\frac{x-a}{1-a}\\ y\end{pmatrix}, where a is the minimum value of the design space after rotation of the curve.

The scaling transformation of parameter \lambda>0 acts on the second coordinate of the curve:
\displaystyle H_{\lambda}:[0,1]\times\mathbb{R}^{+}\to[0,1]\times\mathbb{R}^{+},\quad\begin{pmatrix}x\\ y\end{pmatrix}\mapsto\begin{pmatrix}x\\ \lambda y\end{pmatrix}.
We now introduce the assumptions that will be used:

(A1): The functions (f_{j})_{j=1,...,K}, and \tilde{f} belong to \mathcal{F}_{\theta_{0}}.

(A2): For any \theta\in[\theta_{0},\theta_{1}], any x\in[0,1] and any f\in\mathcal{F}_{\theta_{0}}, the second coordinate of R_{\theta}(x,f(x)) is positive.

(A3): \Theta=[\theta_{0},\theta_{1}]\times[\lambda_{min},\lambda_{max}].

(A4): \tilde{C} is known on [0,1].

(A5): \tilde{C} is known on the grid \mathcal{D}_{N}.
Thus, the transformation model considered is as follows:
\displaystyle T_{\alpha}:[0,1]\times\mathbb{R}^{+}\to[0,1]\times\mathbb{R}^{+},
\displaystyle\begin{pmatrix}x\\ f(x)\end{pmatrix}\mapsto\begin{pmatrix}\frac{(x-1)\cos{\theta}-f(x)\sin{\theta}}{\cos{\theta}+f(0)\sin{\theta}}+1\\ \lambda((x-1)\sin{\theta}+f(x)\cos{\theta})\end{pmatrix}.
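As a sanity check, the deformation T_{\alpha} above can be sketched in a few lines of Python. This is our own illustration, not the authors' code; the function name T_alpha is ours, and f is assumed to be a vectorized callable on [0,1] with f(1)=0:

```python
import numpy as np

def T_alpha(x, f, theta, lam):
    """Sketch of T_alpha = H_lambda o S_theta o R_theta applied to (x, f(x)).

    Returns the transformed abscissae and ordinates of the curve."""
    fx = f(x)
    # 1 - a = cos(theta) + f(0) sin(theta): the rescaling factor of S_theta
    denom = np.cos(theta) + f(0.0) * np.sin(theta)
    u = ((x - 1.0) * np.cos(theta) - fx * np.sin(theta)) / denom + 1.0
    v = lam * ((x - 1.0) * np.sin(theta) + fx * np.cos(theta))
    return u, v
```

By construction, for an admissible f (with f(1)=0) the endpoints x=0 and x=1 are mapped back to 0 and 1, which is what the rescaling S_{\theta} is designed to guarantee.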
2.3 Nonlinear regression model
Recall that we wish to adjust the reference curve \tilde{C} to the other curves C_{j} (j=1,...,K) by the transformations defined in the previous subsection. Notice that these transformations act on both axes. For any \alpha, we want to compare the value of the transformed curve (T_{\alpha}\tilde{C})(X_{i}) with f_{j}(X_{i}). Since the abscissa points are affected by the transformation, we denote by X_{i}(\alpha) the point such that T_{\alpha}^{1}(X_{i}(\alpha))=X_{i}. For that reason, we introduce the following definition:
Definition 4
We denote by x(\alpha,g) the solution of the equation:
T_{\alpha}^{1}(u,g(u))=x.  (1) 
To ease the notation, we finally set x(\alpha):=x(\alpha,g), so X_{i}(\alpha)=X_{i}(\alpha,\tilde{f}). We consider the nonlinear parametric regression model:
f_{j}(X_{i})=T_{\alpha_{j}^{*}}^{2}(X_{i}(\alpha_{j}^{*}),\tilde{f}(X_{i}(\alpha_{j}^{*})))+\epsilon_{j,i},\quad j=1,...,K,  (2)
where:

(X_{i}) are iid with distribution \mathbb{P}_{X}. It is the design on which we observe the curves C_{j};

\alpha_{j}^{*}=(\theta_{j}^{*},\lambda_{j}^{*}) is the couple of true parameters for each curve (j=1,...,K);

\epsilon_{j,i},\ \forall i=1,...,N,\ \forall j=1,...,K are iid \mathcal{N}(0,\sigma^{2}) random variables. These variables are assumed to be independent of X_{i}.
2.4 Estimation
2.4.1 Estimation when \tilde{C} is known on [0,1]
For the sake of simplicity, let us fix j. Relying on a classical Mestimation procedure, we consider a semiparametric method to estimate the parameters and define consequently the following empirical contrast function to fit the reference curve \tilde{C} to C_{j} (j=1,...,K):
\begin{split}\displaystyle M^{j}_{N}(\alpha)&\displaystyle=\frac{1}{N}\sum_{i=1}^{N}(f_{j}(X_{i})-T_{\alpha}^{2}(X_{i}(\alpha),\tilde{f}(X_{i}(\alpha))))^{2}\\ &\displaystyle=\frac{1}{N}\sum_{i=1}^{N}m_{\alpha}^{j}(X_{i}).\end{split}  (3)
The random function M^{j}_{N} is nonnegative and, intuitively, its minimum should be reached close to the true parameter \alpha_{j}^{*}. Indeed, the following theorem gives the consistency of the M-estimator, defined by:
\hat{\alpha}^{j}_{N}=\underset{\alpha\in\Theta}{\mathrm{argmin}}\ M^{j}_{N}(\alpha).  (4)
Recall that our empirical contrast function fits into the general theory of M-estimation; the Central Limit Theorem will be obtained using M-estimation arguments.
Theorem 5
Assume that A1, A2, A3 and A4 are satisfied. Then
\displaystyle i)\ \hat{\alpha}_{N}^{j}\xrightarrow[N\to+\infty]{\mathbb{P}}\alpha_{j}^{*},  (5)
\displaystyle ii)\ \sqrt{N}(\hat{\alpha}_{N}^{j}-\alpha_{j}^{*})\xrightarrow[N\to+\infty]{\mathcal{L}}\mathcal{N}(0,\Gamma_{\alpha_{j}^{*}}).  (6)
In particular, the covariance matrix has the following form:
\Gamma_{\alpha_{j}^{*}}=2\sigma^{2}V_{\alpha_{j}^{*}}^{-1},  (7)
with V_{\alpha_{j}^{*}}=2\mathbb{E}[\dot{T}_{\alpha_{j}^{*}}^{2}\dot{T}_{\alpha_{j}^{*}}^{2\textbf{T}}], where \dot{T}_{\alpha_{j}^{*}}^{2} is the vector of partial derivatives of T_{\alpha}^{2} w.r.t. the elements of \alpha taken at \alpha_{j}^{*}.
2.4.2 Estimation when \tilde{C} is observed on \mathcal{D}
In this section, we consider the case where the reference curve \tilde{C} is observed on the same grid \mathcal{D}:=(X_{i})_{i=1,..,N} as the other curves C_{j}, i.e. \tilde{C}=\begin{pmatrix}x\\
\tilde{f}(x)\end{pmatrix}_{x\in\mathcal{D}}. After applying the transformation T_{\alpha} to \tilde{C}, the transformed pattern T_{\alpha}\tilde{C} is no longer observable on \mathcal{D}. As a consequence, one must make use of an approximation of \tilde{f}. Let \tilde{f}_{N} be the linear interpolate of \tilde{f}, defined by:
\tilde{f}_{N}(x)=\sum_{i=1}^{N}\Delta_{i}(x)\mathds{1}_{x\in[X_{(i)},X_{(i+1)})},  (8)
where
\Delta_{i}(x)=\frac{\tilde{f}(X_{(i+1)})-\tilde{f}(X_{(i)})}{X_{(i+1)}-X_{(i)}}\ x\ +\ \tilde{f}(X_{(i)})-\frac{\tilde{f}(X_{(i+1)})-\tilde{f}(X_{(i)})}{X_{(i+1)}-X_{(i)}}X_{(i)}.  (9)
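The interpolate (8)-(9) is ordinary piecewise-linear interpolation through the points (X_{(i)},\tilde{f}(X_{(i)})); between the order statistics it coincides with numpy's np.interp. A quick sanity check of (9), with an arbitrary example function of our choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0.0, 1.0, 50))               # ordered design X_(1) <= ... <= X_(N)
f_tilde = lambda t: 2.0 * (np.cos(np.pi * t) + 1.0)  # an example reference function
Y = f_tilde(X)

x0 = 0.5 * (X[10] + X[11])                  # a point strictly between two knots
slope = (Y[11] - Y[10]) / (X[11] - X[10])
delta = slope * x0 + Y[10] - slope * X[10]  # Delta_i(x0) of eq. (9)
interp_val = np.interp(x0, X, Y)            # numpy's piecewise-linear interpolation
```

Both evaluations agree, so the analysis below can be reproduced with any standard linear-interpolation routine.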
It is easy to see that \tilde{f}_{N} also belongs to \mathcal{F}_{\theta_{0}}. Replacing \tilde{f} by \tilde{f}_{N} in (3), we obtain
\hat{M}^{j}_{N}(\alpha)=\frac{1}{N}\sum_{i=1}^{N}(f_{j}(X_{i})-T_{\alpha}^{2}(X_{i}(\alpha,N),\tilde{f}_{N}(X_{i}(\alpha,N))))^{2},
where X_{i}(\alpha,N) is the solution of the equation T_{\alpha}^{1}(u,\tilde{f}_{N}(u))=X_{i}.
Using the linear interpolate defined by (8), we show the consistency and asymptotic normality of our M-estimator, defined as follows:
\hat{\hat{\alpha}}^{j}_{N}=\underset{\alpha\in\Theta}{\mathrm{argmin}}\ \hat{M}^{j}_{N}(\alpha).  (10)
Theorem 6
Assume that A1, A2, A3 and A5 are satisfied. Let \tilde{f}_{N} be defined by (8) and assume that there exists C>0 such that, for all x\in[0,1], |\tilde{f}_{N}^{\prime}(x)|\leq C and |\tilde{f}^{\prime}(x)|\leq C. Then
\displaystyle i)\ \hat{\hat{\alpha}}_{N}^{j}\xrightarrow[N\to+\infty]{\mathbb{P}}\alpha_{j}^{*},  (11)
\displaystyle ii)\ \sqrt{N}(\hat{\hat{\alpha}}_{N}^{j}-\alpha_{j}^{*})\xrightarrow[N\to+\infty]{\mathcal{L}}\mathcal{N}(0,\Gamma_{\alpha_{j}^{*}}),  (12)
with \Gamma_{\alpha_{j}^{*}} as defined in (7).
3 Simulations and applications
In this section, we illustrate the method on numerical applications. The first subsection is dedicated to a simulated toy example, the second to a real problem. The optimisation problems (4) and (10) are numerically solved using the BFGS algorithm [11].
3.1 Simulated toy example
We consider the following model:
f_{\lambda,\theta}(x)=\lambda[(x-1)\sin{\theta}+g(x)\cos{\theta}],
with g(x)=2(\cos{\pi x}+1). We observe f_{\lambda_{j},\theta_{j}}(x_{i}) for i=1,...,100, for J=25 values of (\lambda,\theta) with some iid errors \epsilon_{ij}.

The observation points x_{i},\ i=1,...,100 are iid random variables with uniform distribution on [0,1].

The parameters are chosen randomly with the following arbitrary distribution: (\lambda,\theta)\sim\mathcal{U}([0,15])\times\mathcal{U}([\frac{\pi}{30},\frac{\pi}{3}]).

The errors are assumed to be \mathcal{N}(0,0.01).
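The simulated data can be generated along these lines (our own sketch; we read \mathcal{N}(0,0.01) as variance 0.01, i.e. standard deviation 0.1):

```python
import numpy as np

rng = np.random.default_rng(2)
g = lambda x: 2.0 * (np.cos(np.pi * x) + 1.0)

def f_sim(x, lam, theta):
    """Toy model of Section 3.1: lambda[(x-1)sin(theta) + g(x)cos(theta)]."""
    return lam * ((x - 1.0) * np.sin(theta) + g(x) * np.cos(theta))

x = rng.uniform(0.0, 1.0, 100)                      # common design points x_i
lam = rng.uniform(0.0, 15.0, 25)                    # lambda_j ~ U([0, 15])
theta = rng.uniform(np.pi / 30.0, np.pi / 3.0, 25)  # theta_j ~ U([pi/30, pi/3])
eps = rng.normal(0.0, 0.1, (25, 100))               # iid N(0, 0.01) errors
curves = np.array([f_sim(x, l, t) for l, t in zip(lam, theta)]) + eps
```

Each row of `curves` is one noisy observed curve; the registration procedure is then run independently on each row.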
Results are given in Figure 3. The simulated data are shown in Figure 3 (a). Each of these curves is rescaled to [0,1] to avoid numerical issues. The curve in blue is the reference curve. After the estimation of the parameters \lambda_{j} and \theta_{j}, all curves can be rescaled back to their original space as shown in Figure 3 (b).
Rescaling the curves makes it easy to choose the initial parameter values for the optimization algorithm: 1 for \lambda and 0 for \theta.
3.2 Aeronautic loads
In [6], the authors present an aeronautic model that computes the loads (forces and moments) on the wing of an aircraft model denoted by ACM1, and present several statistical methods to study these data. In this section, we compare the method used in [6] with the model presented in Section 2 on a new aircraft model called ACM2. The data at our disposal represent the bending moments of a wing (representing its flexion) calculated for 1152 different configurations (load cases). Each configuration is defined by 28 features (speed of the aircraft, mass, altitude, quantity of fuel, etc.) and leads to a bending moment calculated at 45 stations along the wing. More formally, we observe the couples (X_{j},Y_{j})_{j=1,...,1152}, where X_{j}=(X_{j}^{1},...,X_{j}^{28}) are the features and Y_{j}=(Y_{j}^{1},...,Y_{j}^{45}) is the bending moment. The goal is to predict the bending moment for new configurations. The data are represented in Figure 4.
Due to the discontinuities at the 3^{rd} and 20^{th} stations, we apply our methodology to each section independently. Each section can then be represented by its minimum and maximum values and by its rotation and scaling coefficients \theta_{j} and \lambda_{j}. Figure 5 assesses the quality of the matching process (the reference curve used is the average bending moment).
Thus, the dimension of the output space is reduced from 45 to 12. We compare our method to three other methods of [6] applied to the outputs: no transformation (we call it raw; we build 45 models, one per station); a PCA (the first three principal components represent 99.9% of the explained variance, so 3 models instead of 45); and a polynomial fit per section (of degree 4 for the first section, degree 2 for the second and degree 1 for the third), which leads to 10 models instead of 45. Table 1 sums up the number of outputs to predict depending on the method used.
Method               Number of outputs   Names of outputs
Raw                  45                  Bending moment value at stations 0 to 44
Deformation model    12                  \theta_{1},\theta_{2},\theta_{3},\lambda_{1},\lambda_{2},\lambda_{3},min_{1},min_{2},min_{3},max_{1},max_{2},max_{3}
PCA                  3                   Principal components 1 to 3
Polynomial fitting   10                  Coefficients of the polynomials
A significant advantage of the dimension reduction techniques used is that the response of the model has a physical form, contrary to simple linear models fitted on the raw data. To build our models, we use the Orthogonal Greedy Algorithm (OGA), also known as the Matching Pursuit Algorithm; detailed explanations can be found in [2], [14] and [9]. Roughly speaking, we consider the problem of approximating a function by a sparse linear combination of the inputs.
To assess the goodness of fit of our models, we define, for each bending moment curve j, the following error rate:
error(j)=\sqrt{\frac{\sum_{i=1}^{45}(\hat{y}_{j}(x_{i})-y_{j}(x_{i}))^{2}}{\sum_{i=1}^{45}y_{j}^{2}(x_{i})}},\quad j=1,...,n_{test},
where n_{test} is the size of the test sample. We compute the error rates on the test sample (25% of the size of the total database); they give an idea of how far off our predictions are. From this standpoint, we can easily compute the empirical cumulative distribution function (CDF): for \alpha\in[0,1], it is defined as
\alpha\to G(\alpha)=\frac{1}{n_{test}}\sum_{j=1}^{n_{test}}\mathds{1}_{(error(j)\leq\alpha)}.
In Table 2, we give the values of G(\alpha) for \alpha=1\%,2\%,5\%,10\%, as well as the mean error. In Figure 6, we plot the function G(\alpha) for the different methods.
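For reference, the error rate and the empirical CDF above are straightforward to compute; a small sketch (the function and variable names are ours):

```python
import numpy as np

def error_rate(y_hat, y):
    """Relative L2 error of one predicted bending-moment curve (45 stations)."""
    y_hat, y = np.asarray(y_hat, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((y_hat - y) ** 2) / np.sum(y ** 2))

def G(alpha, errors):
    """Empirical CDF: fraction of test curves with error rate at most alpha."""
    return np.mean(np.asarray(errors) <= alpha)
```

For instance, with error rates (0.005, 0.02, 0.04, 0.15) on four test curves, G(1%) = 0.25 and G(5%) = 0.75.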
Concerning the approximation and prediction of loads, our model is on average equivalent to the other tested methods; there are just slightly more observations with an error below 1%. Nevertheless, the linear models built through the deformation model are sparser than the others. Indeed, on average, 11 variables are selected by cross-validation as the optimal parameter of the greedy algorithm for the deformation model, against 13 for the polynomial fitting, 15 for the PCA and 14 for the raw outputs.
Even though the prediction of loads with the deformation model is roughly equivalent to using no transformation, it obtains better results than the polynomial fitting and the PCA. More importantly, it gives engineers a physical interpretation and an idea of how the wing reacts to new constraints: the deformation model yields a physical response, contrary to a simple linear model per station, whose response could be irregular.
4 Proofs and technical result
4.1 Technical result
This section is dedicated to the technical result used in the proof of Theorem 6.
Lemma 7
Let X_{1},...,X_{N} be N independent and identically distributed random variables with uniform distribution on [0,1], and let X_{(1)}\leq...\leq X_{(N)} be the reordered sample. Let a_{N}=O(\sqrt{N}). Then:
\mathbb{P}(a_{N}\underset{j}{\sup}\ (X_{(j+1)}-X_{(j)})\geq\epsilon)\xrightarrow[N\to+\infty]{}0.
Proof of Lemma 7. Let Z_{1},...,Z_{N+1} be N+1 independent and identically distributed random variables with exponential distribution of parameter 1. It is a well-known fact that
(\frac{Z_{1}}{\sum_{k=1}^{N+1}Z_{k}},\frac{Z_{1}+Z_{2}}{\sum_{k=1}^{N+1}Z_{k}},...,\frac{Z_{1}+...+Z_{N}}{\sum_{k=1}^{N+1}Z_{k}})\overset{(\mathcal{L})}{=}(X_{(1)},...,X_{(N)}),
and we have
X_{(j+1)}-X_{(j)}\overset{(\mathcal{L})}{=}\frac{Z_{j+1}}{\sum_{k}Z_{k}}.  (13)
Now, for \epsilon>0
\begin{split}\displaystyle\mathbb{P}(\underset{j}{\sup}\ (X_{(j+1)}-X_{(j)})\geq\epsilon)&\displaystyle\leq\sum_{j}\mathbb{P}(X_{(j+1)}-X_{(j)}\geq\epsilon)\\ &\displaystyle\leq N\max_{j}\mathbb{P}(X_{(j+1)}-X_{(j)}\geq\epsilon).\end{split}
By using (13), we have
\mathbb{P}(X_{(j+1)}-X_{(j)}\geq\epsilon)=(1-\epsilon)^{N-1}.
Then,
\mathbb{P}(\underset{j}{\sup}\ (X_{(j+1)}-X_{(j)})\geq\epsilon)\leq N(1-\epsilon)^{N-1}.
The result follows on replacing \epsilon by \frac{\epsilon}{a_{N}} and letting N\to+\infty.
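Lemma 7 is easy to check by simulation: the maximal uniform spacing is of order \log N/N, so \sqrt{N}\,\sup_{j}(X_{(j+1)}-X_{(j)}) shrinks as N grows. A quick Monte Carlo sketch (our own check, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(3)

def scaled_max_spacing(N, reps=20):
    """Average of sqrt(N) * sup_j (X_(j+1) - X_(j)) over `reps` uniform samples."""
    vals = []
    for _ in range(reps):
        x = np.sort(rng.uniform(0.0, 1.0, N))
        vals.append(np.sqrt(N) * np.max(np.diff(x)))
    return float(np.mean(vals))
```

With a_N = \sqrt{N}, the scaled maximal spacing decreases roughly like \log N/\sqrt{N} as N increases, consistent with the lemma.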
4.2 Proofs of Theorems 5 and 6
Proof of Theorem 5. To ease the notation, we do not display the dependency on j.
i) By (3), it is easy to see that M_{N}(\alpha) is an empirical mean of iid bounded random variables. Thus, by the Strong Law of Large Numbers (SLLN),
M_{N}(\alpha)\xrightarrow[N\to+\infty]{a.s.}M(\alpha),
with M(\alpha)=\mathbb{E}[\epsilon^{2}]+\mathbb{E}[(T_{\alpha}^{2}(X(\alpha),\tilde{f}(X(\alpha)))-T_{\alpha^{*}}^{2}(X(\alpha^{*}),\tilde{f}(X(\alpha^{*}))))^{2}].
M(\alpha) is continuous and has an obvious unique minimum at \alpha^{*}. Since \Theta is compact, this implies that \underset{\alpha:d(\alpha,\alpha^{*})\geq\epsilon}{\inf}\ M(\alpha)>M(\alpha^{*}) (see Problem 27, p. 84 in [15]).
It remains to prove that \{m_{\alpha}:\alpha\in\Theta\} is a Glivenko-Cantelli class. Thanks to the remark following the proof of Theorem 5.9 in [15], this is an easy consequence of the continuity of \alpha\to m_{\alpha} and of the fact that the function is bounded by a continuous and integrable function on [0,1]. Indeed, there exists at least one function f^{*} in \mathcal{F}_{\theta_{0}} which bounds every other function, and two constants K_{1}>0,K_{2}>0 such that
m_{\alpha}(x)\leq K_{1}(f^{*}(x)+K_{2})^{2},
so that
\underset{\alpha\in\Theta}{\sup}\ |M_{N}(\alpha)-M(\alpha)|\xrightarrow[N\to+\infty]{\mathbb{P}}\ 0.  (14)
The result then follows from Theorem 5.7 in [15].
ii) The Central Limit Theorem will be a consequence of Theorem 5.23 in [15]. Recall that
m_{\alpha}(x)=[f(x)-\lambda((x(\alpha)-1)\sin\theta+\cos\theta\,\tilde{f}(x(\alpha)))]^{2}.
By the Implicit Function Theorem, it is easy to see that \alpha\to x(\alpha) is C^{1} on a compact set. This implies that the norm of the gradient of m_{\alpha} is uniformly bounded in \alpha. Hence there exists \dot{\phi}(x)\in L^{1} such that \|\nabla_{\alpha}m_{\alpha}(x)\|\leq\dot{\phi}(x), and hence
|m_{\alpha_{1}}(x)-m_{\alpha_{2}}(x)|\leq\dot{\phi}(x)\times\|\alpha_{1}-\alpha_{2}\|.
In order to give an explicit formula for the limit variance, we apply the results of Example 5.27 in [15], where f_{\theta} becomes in our case T_{\alpha}^{2}, and hence we have
\sqrt{N}(\hat{\alpha}_{N}^{j}-\alpha_{j}^{*})\xrightarrow[N\to+\infty]{\mathcal{L}}\mathcal{N}(0,\Gamma_{\alpha_{j}^{*}}),
with \Gamma_{\alpha_{j}^{*}}=2\sigma^{2}V_{\alpha_{j}^{*}}^{-1} and V_{\alpha_{j}^{*}}=2\mathbb{E}[\dot{T}_{\alpha_{j}^{*}}^{2}\dot{T}_{\alpha_{j}^{*}}^{2\textbf{T}}].
Proof of Theorem 6.
i) To prove the consistency of \hat{\hat{\alpha}}_{N}, we have to show that
\underset{\alpha\in\Theta}{\sup}\ |\hat{M}_{N}(\alpha)-M(\alpha)|\xrightarrow[N\to+\infty]{\mathbb{P}}\ 0.
We have
\underset{\alpha\in\Theta}{\sup}\ |\hat{M}_{N}(\alpha)-M(\alpha)|\leq\underset{\alpha\in\Theta}{\sup}\ |\hat{M}_{N}(\alpha)-M_{N}(\alpha)|+\underset{\alpha\in\Theta}{\sup}\ |M_{N}(\alpha)-M(\alpha)|.
It has been shown in the proof of Theorem 5 that
\underset{\alpha\in\Theta}{\sup}\ |M_{N}(\alpha)-M(\alpha)|\xrightarrow[N\to+\infty]{\mathbb{P}}\ 0.
It remains to prove that
\underset{\alpha\in\Theta}{\sup}\ |\hat{M}_{N}(\alpha)-M_{N}(\alpha)|\xrightarrow[N\to+\infty]{\mathbb{P}}\ 0.
To ease the notation, we write T_{\alpha}^{2}(i,N)=T_{\alpha}^{2}(X_{i}(\alpha,N),\tilde{f}_{N}(X_{i}(\alpha,% N))), and T_{\alpha}^{2}(i)=T_{\alpha}^{2}(X_{i}(\alpha),\tilde{f}(X_{i}(\alpha))). Set
\begin{split}\displaystyle D_{N}(\alpha)&\displaystyle=M_{N}(\alpha)-\hat{M}_{N}(\alpha)\\ &\displaystyle=\frac{1}{N}\sum_{i=1}^{N}2y(X_{i})[T_{\alpha}^{2}(i,N)-T_{\alpha}^{2}(i)]-\frac{1}{N}\sum_{i=1}^{N}[T_{\alpha}^{2}(i,N)-T_{\alpha}^{2}(i)][T_{\alpha}^{2}(i,N)+T_{\alpha}^{2}(i)].\end{split}
As f and T_{\alpha}^{2} are continuous and bounded on \Theta\times[0,1], this implies that:
\begin{split}\displaystyle|D_{N}(\alpha)|&\displaystyle\leq\frac{1}{N}\sum_{i=1}^{N}2f(X_{i})|T_{\alpha}^{2}(i,N)-T_{\alpha}^{2}(i)|+\frac{1}{N}\sum_{i=1}^{N}|T_{\alpha}^{2}(i,N)-T_{\alpha}^{2}(i)||T_{\alpha}^{2}(i,N)+T_{\alpha}^{2}(i)|\\ &\displaystyle\leq K(\frac{1}{N}\sum_{i=1}^{N}[T_{\alpha}^{2}(i,N)-T_{\alpha}^{2}(i)]^{2})^{\frac{1}{2}}\\ &\displaystyle\leq K^{\prime}(\frac{1}{N}\sum_{i=1}^{N}[|X_{i}(\alpha,N)-X_{i}(\alpha)|(1+C)+|\tilde{f}_{N}(X_{i}(\alpha))-\tilde{f}(X_{i}(\alpha))|]^{2})^{\frac{1}{2}}.\end{split}
By construction, there exists j such that X_{(j)}\leq X_{i}(\alpha)\leq X_{(j+1)} and X_{(j)}\leq X_{i}(\alpha,N)\leq X_{(j+1)}, which leads to
|X_{i}(\alpha,N)-X_{i}(\alpha)|=\gamma(X_{(j+1)}-X_{(j)}) for some \gamma\in[0,1].
Besides, since there exists \gamma^{\prime}\in[0,1] such that \tilde{f}_{N}(X_{i}(\alpha))=\gamma^{\prime}\tilde{f}(X_{(j+1)})+(1-\gamma^{\prime})\tilde{f}(X_{(j)}), we have
\begin{split}\displaystyle|\tilde{f}_{N}(X_{i}(\alpha))-\tilde{f}(X_{i}(\alpha))|&\displaystyle\leq\gamma^{\prime}|\tilde{f}(X_{(j+1)})-\tilde{f}(X_{(j)})|\\ &\displaystyle\leq C\gamma^{\prime}|X_{(j+1)}-X_{(j)}|,\end{split}
and
\displaystyle|D_{N}(\alpha)|\leq K^{\prime}(\frac{1}{N}\sum_{j=1}^{N}[X_{(j+1)}-X_{(j)}]^{2})^{\frac{1}{2}},
\displaystyle\underset{\alpha\in\Theta}{\sup}\ |D_{N}(\alpha)|\leq K^{\prime}(\frac{1}{N}\sum_{j=1}^{N}\underset{\alpha\in\Theta}{\sup}\ [X_{(j+1)}-X_{(j)}]^{2})^{\frac{1}{2}}
\leq K^{\prime}\underset{j}{\sup}\ (X_{(j+1)}-X_{(j)}).
By Lemma 7, \mathbb{P}(K^{\prime}\underset{j}{\sup}\ (X_{(j+1)}-X_{(j)})\geq\epsilon)\xrightarrow[N\to+\infty]{}0. Hence D_{N} is bounded by an integrable and continuous function which goes to 0 in probability on \Theta, and
\underset{\alpha\in\Theta}{\sup}\ |\hat{M}_{N}(\alpha)-M(\alpha)|\xrightarrow[N\to+\infty]{\mathbb{P}}\ 0.
So we may conclude.
ii) First, we write \sqrt{N}(\hat{\hat{\alpha}}_{N}-\alpha^{*})=\sqrt{N}(\hat{\hat{\alpha}}_{N}-\hat{\alpha}_{N})+\sqrt{N}(\hat{\alpha}_{N}-\alpha^{*}). By Theorem 5, \sqrt{N}(\hat{\alpha}_{N}-\alpha^{*})\xrightarrow[N\to+\infty]{\mathcal{L}}\mathcal{N}(0,\Gamma_{\alpha^{*}}), with \Gamma_{\alpha^{*}} defined in (7). It remains to prove that \sqrt{N}(\hat{\hat{\alpha}}_{N}-\hat{\alpha}_{N})\xrightarrow[N\to+\infty]{\mathbb{P}}0.
Using the same arguments as in the proof of i), we have
\mathbb{P}(\sqrt{N}\underset{\alpha\in\Theta}{\sup}\ |\hat{M}_{N}(\alpha)-M_{N}(\alpha)|\geq\epsilon)\leq\mathbb{P}(K\sqrt{N}\underset{j}{\sup}\ (X_{(j+1)}-X_{(j)})\geq\epsilon).  (15)
The right-hand side of (15) converges to 0 by Lemma 7. This implies that \sqrt{N}(\hat{\hat{\alpha}}_{N}-\hat{\alpha}_{N})\xrightarrow[N\to+\infty]{\mathbb{P}}0.
5 Perspectives and conclusion
One of the main qualities of our approach is that it is easy to implement and execute. The cost function being simple, we use a BFGS algorithm to find the optimal parameters, and, thanks to the regularity of the curves we deal with, the initial points for the optimization can easily be defined.
Furthermore, finding the coordinate of the reference curve that is sent to a given coordinate of the curve to fit can easily be implemented with a simple value search.
Besides, the deformation parameters can be exploited through an explainable model, such as the linear model used in the real-world problem.
The deformation model appears to be robust as long as the noise is controlled. An interesting extension of this work would be to study what happens when the reference curve is noisy. A generalization to less regular functions would also be worthwhile. Finally, it would be interesting to include in the model a way to handle discontinuities, in order to reduce the dimension further and obtain a more global representation of the deformation.
References
 [1] Allassonniere, S., Bigot, J., Glaunes, J., Maire, F., Richard, F.: Statistical models for deformable templates in image and shape analysis. Annales Mathematiques Blaise Pascal 20(1), 1–35 (2013)
 [2] Barron, A.R., Cohen, A., Dahmen, W., DeVore, R.A.: Approximation and learning by greedy algorithms. Ann. Statist. 36(1), 64–94 (2008). DOI 10.1214/009053607000000631. URL https://doi.org/10.1214/009053607000000631
 [3] Bigot, J., Gadat, S.: A deconvolution approach to estimation of a common shape in a shifted curves model. Ann. Statist. 38(4), 2422–2464 (2010). DOI 10.1214/10-AOS800. URL https://doi.org/10.1214/10-AOS800
 [4] Bigot, J., Gamboa, F., Vimond, M.: Estimation of translation, rotation, and scaling between noisy images using the Fourier-Mellin transform. SIAM Journal on Imaging Sciences 2, 614–645 (2009). ISSN 1936-4954
 [5] Dimeglio, C., Loubes, J.M., Maza, E.: Manifold embedding for curve registration (2011). URL https://hal.archives-ouvertes.fr/hal-00593180. Working paper or preprint
 [6] Fournier, E., Grihon, S., Klein, T.: A case study: Influence of Dimension Reduction on regression trees-based Algorithms Predicting Aeronautics Loads of a Derivative Aircraft (2018). URL https://hal.archives-ouvertes.fr/hal-01700314. Working paper or preprint
 [7] Gamboa, F., Loubes, J.M., Maza, E.: Semiparametric estimation of shifts. Electron. J. Statist. 1, 616–640 (2007). DOI 10.1214/07-EJS026. URL https://doi.org/10.1214/07-EJS026
 [8] Grenander, U.: General pattern theory. Oxford Science Publications (1993)
 [9] Mallat, S.G., Zhang, Z.: Matching pursuits with timefrequency dictionaries. IEEE Transactions on Signal Processing 41(12), 3397–3415 (1993). DOI 10.1109/78.258082
 [10] McGuire, M.: An image registration technique for recovering rotation, scale and translation parameters. Tech. rep., NEC Tech Report (1998). URL http://casual-effects.com/research/McGuire1998ParameterRecovery/index.html
 [11] Nocedal, J., Wright, S.J.: Numerical Optimization, second edn. Springer, New York, NY, USA (2006)
 [12] Ramsay, J.O., Li, X.: Curve registration. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60(2), 351–363 (1998). DOI 10.1111/1467-9868.00129
 [13] Ritter, M., Dillinger, J.: Nonlinear numerical flight dynamics for the prediction of maneuver loads. IFASD 2011 pp. 1–10 (2011)
 [14] Sancetta, A.: Greedy algorithms for prediction. Bernoulli 22(2), 1227–1277 (2016). DOI 10.3150/14-BEJ691. URL https://doi.org/10.3150/14-BEJ691
 [15] Van der Vaart, A.W.: Asymptotic statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press (1998)