Solving Large-Scale Robust Stability Problems by Exploiting the Parallel Structure of Polya’s Theorem

# Solving Large-Scale Robust Stability Problems by Exploiting the Parallel Structure of Polya’s Theorem

## Abstract

In this paper, we propose a distributed computing approach to solving large-scale robust stability problems on the simplex. Our approach is to formulate the robust stability problem as an optimization problem with polynomial variables and polynomial inequality constraints. We use Polya’s theorem to convert the polynomial optimization problem to a set of highly structured Linear Matrix Inequalities (LMIs). We then use a slight modification of a common interior-point primal-dual algorithm to solve the structured LMI constraints. This yields a set of extremely large yet structured computations. We then map the structure of the computations to a decentralized computing environment consisting of independent processing nodes with a structured adjacency matrix. The result is an algorithm which can solve the robust stability problem with the same per-core complexity as the deterministic stability problem with a conservatism which is only a function of the number of processors available. Numerical tests on cluster computers and supercomputers demonstrate the ability of the algorithm to efficiently utilize hundreds and potentially thousands of processors and analyze systems with 100+ dimensional state-space. The proposed algorithms can be extended to perform stability analysis of nonlinear systems and robust controller synthesis.

Robust stability, Polynomial optimization, Large-scale systems, Decentralized computing

## I Introduction

This paper addresses the problem of stability of large-scale systems with several unknown parameters. Control system theory when applied in practical situations often involves the use of large state-space models, typically due to inherent complexity of the system, the interconnection of subsystems, or the reduction of an infinite-dimensional or PDE model to a finite-dimensional approximation. One approach to dealing with such large scale models has been to use model reduction techniques such as balanced truncation [1]. However, the use of model reduction techniques are not necessarily robust and can result in arbitrarily large errors. In addition to large state-space, practical problems often contain uncertainty in the model due to modeling errors, linearization, or fluctuation in the operating conditions. The problem of stability and control of systems with uncertainty has been widely studied. See, e.g. the texts [2, 3, 4, 5]. However, a limitation of existing computational methods for analysis and control of systems with uncertainty is high complexity. This is a consequence of fact that the problem of robust stability of systems with parametric uncertainty is known to be NP-hard [6, 7]. The result is that for systems with parametric uncertainty and with hundreds of states, existing algorithms will fail with the primary point of failure usually being lack of unallocated memory.

In this paper, we seek to distribute the computation laterally over an array of processors within the context of existing computational resources. Specifically, we seek to utilize cluster-computing, supercomputing and Graphics Processing Unit (GPU)-computing architectures. When designing algorithms to run in a parallel computing environment, one must both synchronize computational tasks among the processors while minimizing communication overhead among the processors. This can be difficult, as each architecture has a specific communication graph. we account for communication by explicitly modeling the required communication graph between processors. This communication graph is then mapped to the processor architecture using the Message-Passing Interface (MPI) [8]. While there are many algorithms for robust stability analysis and control of linear systems, ours is the first which explicitly accounts for the processing architecture in the emerging multi-core computing environment.

Our approach to robust stability is based on the well-established use of parameter-dependent Quadratic-In-The-State (QITS) Lyapunov functions. The use of parameter-dependent Lyapunov QITS functions eliminates the conservativity associated with e.g. quadratic stability [9], at the cost of requiring some restriction on the rate of parameter variation. Specifically, our QITS Lyapunov variables are polynomials in the vector of uncertain parameters. This is a generalization of the use of QITS Lyapunov functions with affine parameter dependence as in [10] and expanded in, e.g. [11, 12, 13, 14]. The use of polynomial QITS Lyapunov variables can be motivated by [15], wherein it is shown that any feasible parameter-dependent LMI with parameters inside a compact set has a polynomial solution or [16] wherein it is shown that local stability of a nonlinear vector field implies the existence of a polynomial Lyapunov function.

There are several results which use polynomial QITS Lyapunov functions to prove robust stability. In most cases, the stability problem is reduced to the general problem of optimization of polynomial variables subject to LMI constraints - an NP-hard problem [17]. To avoid NP-hardness, the polynomial optimization problem is usually solved in an asymptotic manner by posing a sequence of sufficient conditions of increasing accuracy and decreasing conservatism. For example, building on the result in [15][18] provides a sequence of increasingly precise LMI conditions for robust stability analysis of linear systems with affine dependency on uncertain parameters on the complex unit ball. Necessary and sufficient stability conditions for linear systems with one uncertain parameter are derived in [19], providing an explicit bound on the degree of the polynomial-type Lyapunov function. The result is extended to multi-parameter-dependent linear systems in [20]. Another important approach to optimization of polynomials is the Sum of Squares (SOS) methodology which replaces the polynomial positivity constraint with the constraint that the polynomial admits a representation as a sum of squares of polynomials [21, 22, 23, 24]. A version of this theorem for polynomials with matrix coefficients can be found in [23]. While we have worked extensively with the SOS methodology, we have not, as of yet, been able to adapt algorithms for solving the resulting LMI conditions to a parallel-computing environment. Finally, there have been several results in recent years on the use of Polya’s theorem to solve polynomial optimization problems [25] on the simplex. An extension of the Polya’s theorem for uncertain parameters on the multisimplex or hypercube can be found in [26]. The approach presented in this paper is an extension of the use of Polya’s theorem for solving polynomial optimization problems in a parallel computing environment.

The goal of this project is to create algorithms which explicitly map computation, communication and storage to existing parallel processing architectures. This goal is motivated by the failure of existing general-purpose Semi-Definite Programming (SDP) solvers to efficiently utilize platforms for large-scale computation. Specifically, it is well-established that linear programming and semi-definite programming both belong to the complexity class P-Complete, also known as the class of inherently sequential problems. Although there have been several attempts to map certain SDP solvers to a parallel computing environment [27, 28], certain critical steps cannot be distributed. The result is that as the number of processors increases, certain bottleneck computations dominate, leading a saturation in computational speed of these solvers (Amdahl’s law [29]). We avoid these bottleneck computations and communications by exploiting the particular structure of the LMI conditions associated with Polya’s theorem. Note that, in principle, a perfectly designed general-purpose SDP algorithm could identify the structure of the SDP, as we have, and map the communication, computation and memory constraints to the parallel architecture. Indeed, there has been a great deal of research on creating programming languages which attempt to do just this [30, 31]. However, at present such languages are mostly theoretical and have certainly not been incorporated into existing SDP solvers.

In addition to parallel SDP solvers, there have been some efforts to exploit structure in certain polynomial optimization algorithms to reducing the size and complexity of the resulting LMI’s. For example, in [32] symmetry was used to reduce the size of the SDP variables. Specific sparsity structure was used in [33, 34, 35] to reduce the complexity of the linear algebra calculations. Generalized approaches to the use of sparsity in SDP algorithms can be found in [34]. Groebner basis techniques [36, 37] have been used by [33] to simplify the formulation of the SDPs associated with the SOS decomposition problems.

The paper is organized around two independent problems: setting up the sequence of structured SDPs associated with Polya’s theorem and solving them. Note that the problem of decentralizing the set-up algorithm is significant in that for large-scale systems, the instantiation of the problem may be beyond the memory and computational capacity of a single processing node. For the set-up problem, the algorithm that we propose has no centralized memory or computational requirements whatsoever. Furthermore, if a sufficient number of processors are available, the number of messages does not change with the size of the state-space or the number of Polya’s iterations. In addition, the ideal communication architecture for the set-up algorithm does not correspond to the communication structure of GPU computing or supercomputing. In the second problem, we propose a variant of a standard SDP primal-dual algorithm and map the computational, memory and communication requirements to a parallel computing environment. Unlike the set-up algorithm, the primal-dual algorithm does have a small centralized component corresponding to the update of the set of dual variables. However, we have structured the algorithm so that the size of this dual computation is solely a function of the degree of the polynomial QITS Lyapunov function and does not depend on the number of Polya’s iterations, meaning that the sequence of algorithms has fixed centralized computational and communication complexity. In addition, there is no communication between processors, which means that the algorithm is well suited to most parallel computing architectures. A graph representation of the communication architecture of both the set-up and SDP algorithms has also been provided in the relevant sections.

Combining the set-up and SDP components and testing the result of both in cluster computing environments, we demonstrate the capability of robust analysis and control of systems with 100+ states and several uncertain parameters. Specifically, we ran a series of numerical experiments using a local Linux cluster and the Blue Gene supercomputer (with 200 processor allocation). First, we applied the algorithm to a current problem in robust stability analysis of magnetic confinement fusion using a discretized PDE model. Next, we examine the accuracy of the algorithm as Polya’s iterations progress and compare this accuracy with the SOS approach. We show that unlike the general-purpose parallel SDP solver SDPARA [28], the speed-up - the increase in processing speed per additional processor - of our algorithm shows no evidence of saturation. Finally, we calculate the envelope of the algorithm on the Linux cluster in terms of the maximum state-space dimension, number of processors and Polya’s iterations.

NOTATION

We represent variate monomials as , where is the vector of variables and is the vector of exponents and is the degree of the monomial. We define as the totally ordered set of the exponents of variate monomials of degree , where the ordering is lexicographic. In lexicographical ordering precedes , if the left most non-zero entry of is positive. The lexicographical index of every can be calculated using the map defined as [38]

 ⟨γ⟩=l−1∑j=1γi∑i=1f\bbl(l−j,d+1−j−1∑k=1γk−i\bbr)+1, (1)

where as in [39]

 f(l,d):=⎧⎪⎨⎪⎩0forl=0(l+d−1l−1)=(d+l−1)!d!(l−1)!forl>0, (2)

is the cardinality of , i.e., the number of variate monomials of degree . For convenience, we also define the index of a monomial to be . We represent variate homogeneous polynomials of degree as

 P(α)=∑γ∈WdpP⟨γ⟩αγ, (3)

where is the matrix coefficient of the monomial . We denote the element corresponding to the row and column of matrix as . The subspace of symmetric matrices in is denoted by . We define a basis for as

 [Ek]i,j:={1if i=j=k0otherwise,fork≤nand
 [Ek]i,j:=[Fk]i,j+[Fk]Ti,j,fork>n, (4)

where

 [Fk]i,j:={1if i=j−1=k−n0otherwise. (5)

Note that this choice of basis is arbitrary - any other basis could be used. However, any change in basis would require modifications to the formulae defined in this paper. The canonical basis for is denoted by for , where The vector with all entries equal to one is denoted by . The trace of is denoted by . The block-diagonal matrix with diagonal blocks is denoted or occasionally as . The identity and zero matrices are denoted by and .

## Ii Preliminaries

Consider the linear system

 ˙x(t)=A(α)x(t), (6)

where and is a vector of uncertain parameters. In this paper, we consider the case where is a homogeneous polynomial and where is the unit simplex, i.e.,

 Δl={α∈Rl,l∑i=1αi=1,αi⩾0}. (7)

If is not homogeneous, we can obtain an equivalent homogeneous representation in the following manner. Suppose is a non-homogeneous polynomial with , is of degree and has monomials with non-zero coefficients. Define , where is the degree of the monomial of according to lexicographical ordering. Now define the polynomial as per the following.

1. Let .

2. For , multiply the monomial of , according to lexicographical ordering, by .

Then, since , for all and hence all properties of are retained by the homogeneous system .

1) Example: Construction of the homogeneous system .

Consider the non-homogeneous polynomial of degree , where . Using the above procedure, the homogeneous polynomial can be constructed as

 B(α)=Cα21+Dα2(α1+α2+α3)+Eα3(α1+α2+α3) +F(α1+α2+α3)2=(C+FB1)α21+(D+2F)B2α1α2 +(E+2F)B3α1α3+(D+F)B4α22+(D+E+2F)B5α2α3 +(E+F)B6α23=∑γ∈W2B⟨γ⟩αγ. (8)

The following is a stability condition [25] for System (6). {thm} System (6) is stable if and only if there exists a polynomial matrix such that and

 AT(α)P(α)+P(α)A(α)≺0 (9)

for all . A similar condition also holds for discrete-time linear systems. The conditions associated with Theorem II are infinite-dimensional LMIs, meaning they must hold at infinite number of points. Such problems are known to be NP-hard [17]. In this paper we derive a sequence of polynomial-time algorithms such that their outputs converge to the solution of the infinite-dimensional LMI. Key to this result is Polya’s Theorem [40]. A variation of this theorem for matrices is given as follows.

{thm}

(Polya’s Theorem) The homogeneous polynomial for all if and only if for all sufficiently large ,

 (l∑i=1αi)dF(α) (10)

has all positive definite coefficients.

Upper bounds for Polya’s exponent can be found as in [41]. However, these bounds are based on the properties of and are difficult to determine a priori. In this paper, we show that applying Polya’s Theorem to the robust stability problem, i.e., the inequalities in Theorem II yields a semi-definite programming condition with an efficiently distributable structure. This is discussed in the following section.

## Iii Problem Set-Up

In this section, we show how Polya’s theorem can be used to determine the robust stability of an uncertain system using linear matrix inequalities with a distributable structure.

### Iii-a Polya’s Algorithm

We consider the stability of the system described by Equation (6). We are interested in finding a which satisfies the conditions of Theorem II. According to Polya’s theorem, the constraints of Theorem II are satisfied if for some sufficiently large and , the polynomials

 (l∑i=1αi)d1P(α)and (11)
 −(l∑i=1αi)d2(AT(α)P(α)+P(α)A(α)) (12)

have all positive definite coefficients.

Let be a homogeneous polynomial of degree which can be represented as

 P(α)=∑γ∈WdpP⟨γ⟩αγ, (13)

where the coefficients and where we recall that is the set of the exponents of all -variate monomials of degree . Since is a homogeneous polynomial of degree , we can write it as

 A(α)=∑γ∈WdaA⟨γ⟩αγ, (14)

where the coefficients . By substituting (13) and (14) into (11) and (12) and defining as the degree of , the conditions of Theorem II can be represented in the form

 ∑h∈Wdpβ⟨h⟩,⟨γ⟩P⟨h⟩≻0;γ∈Wdp+d1and (15)
 ∑h∈Wdp(HT⟨h⟩,⟨γ⟩P⟨h⟩+P⟨h⟩H⟨h⟩,⟨γ⟩)≺0;γ∈Wdpa+d2. (16)

Here is defined to be the scalar coefficient which multiplies in the -th monomial of the homogeneous polynomial using the lexicographical ordering. Likewise is the term which left or right multiplies in the -th monomial of using the lexicographical ordering. For an intuitive explanation as to how these and terms are calculated, we consider a simple example. Precise formulae for these terms will follow the example.

1) Example: Calculating the and coefficients.

Consider and . By expanding Equation (11) for we have

 (α1+α2)P(α)=P1α21+(P1+P2)α1α2+P2α22. (17)

The terms are then extracted as

 β1,1=1,β2,1=0,β1,2=1,β2,2=1,β1,3=0,β2,3=1. (18)

Next, by expanding Equation (12) for we have

 (α1+α2)(AT(α)P(α)+P(α)A(α))=(AT1P1+P1A1)α31 +(AT1P1+P1A1+AT2P1+P1A2+AT1P2+P2A1)α21α2 +(AT2P1+P1A2+AT1P2+P2A1+AT2P2+P2A2)α1α22 +(AT2P2+P2A2)α32. (19)

The terms are then extracted as

 H1,1=A1, H2,1=0, H1,2=A1+A2, H2,2=A1, H1,3=A2, H2,3=A1+A2, H1,4=0, H2,4=A2. (20)

2) General Formula: The can be formally defined recursively as follows. Let the initial values for be defined as

 β(0)⟨h⟩,⟨γ⟩={1if% h=γ0otherwiseforγ∈Wdpandh∈Wdp. (21)

Then, iterating for , we let

 β(i)⟨h⟩,⟨γ⟩=∑λ∈W1β(i−1)⟨h⟩,⟨γ−λ⟩% forγ∈Wdp+iandh∈Wdp. (22)

Finally, we set . To obtain , set the initial values as

 H(0)⟨h⟩,⟨γ⟩=∑λ∈Wda:λ+h=γA⟨λ⟩forγ∈Wdp+daandh∈Wdp. (23)

Then, iterating for , we let

 H(i)⟨h⟩,⟨γ⟩=∑λ∈W1H(i−1)⟨h⟩,⟨γ−λ⟩forγ∈Wdpa+iandh∈Wdp. (24)

Finally, set .

For the case of large-scale systems, computing and storing and is a significant challenge due to the number of these coefficients. Specifically, the number of terms increases with (number of uncertain parameters in system (6)), (degree of ), (degree of ) and (Polya’s exponents) as follows.

3) Number of coefficients: For given and , since and , the number of coefficients is the product of and . Recall that card is the number of all -variate monomials of degree and can be calculated using (2) as follows.

 L0=f(l,dp)=⎧⎪⎨⎪⎩0forl=0(dp+l−1l−1)=(dp+l−1)!dp!(l−1)!forl>0. (25)

Likewise, card, i.e., the number of all variate monomials of degree is calculated using (2) as follows.

 L=f(l,dp+d1)= ⎧⎪⎨⎪⎩0forl=0(dp+d1+l−1l−1)=(dp+d1+l−1)!(dp+d1)!(l−1)!forl>0. (26)

The number of coefficients is .

4) Number of coefficients: For given and , since and , the number of coefficients is the product of and . By using (2), we have

 M=f(l,dpa+d2)= ⎧⎪⎨⎪⎩0forl=0(dpa+d2+l−1l−1)=(dpa+d2+l−1)!(dpa+d2)!(l−1)!forl>0. (27)

The number of coefficients is .

The number of and coefficients and the required memory to store these coefficients are shown in Figs. 1 and 2 in terms of the number of uncertain parameters and for different Polya’s exponents. In all cases .

It is observed from Fig. 2 that, even for small and , the required memory is in the Terabyte range. In [38], we proposed a decentralized computing approach to the calculation of on large cluster computers. In the present work, we extend this method to the calculation of and the SDP elements which will be discussed in the following section. We express the LMIs associated with conditions (15) and (16) as an SDP in both primal and dual forms. We also discuss the structure of the primal and dual SDP variables and the constraints.

### Iii-B SDP Problem Elements

A semi-definite programming problem can be stated either in primal or dual format. Given , and , the primal problem is of the form

 maxXtr(CX)
 subject toa−B(X)=0
 X⪰0, (28)

where the linear operator is defined as

 B(X)=[tr(B1X)tr(B2X)⋯tr(BKX)]T. (29)

is the primal variable. Given a primal SDP, the associated dual problem is

 miny,ZaTy
 subject toBT(y)−C=Z
 Z⪰0,y∈RK, (30)

where is the transpose operator and is given by

 BT(y)=K∑i=1yiBi (31)

and where and are the dual variables. The elements , and of the SDP problem associated with the LMIs in (15) and (16) are defined as follows. We define the element as

 C:=diag(C1,⋯CL,CL+1,⋯CL+M), (32)

where

 Ci:={δIn⋅(∑h∈Wdpβ⟨h⟩,idp!h1!⋯hl!),1≤i≤L0n,L+1≤i≤L+M, (33)

where recall that is the number of monomials in , is the number of monomials in , where is the dimension of system (6), is the number of uncertain parameters and is a small positive parameter.

For , define elements as

 Bi:=diag(Bi,1,⋯Bi,L,Bi,L+1,⋯Bi,L+M), (34)

where is the number of dual variables in (30) and is equal to the product of the number of upper-triangular elements in each (the coefficients in ) and the number of coefficients in (i.e. the cardinality of ). Since there are coefficients in and each coefficient has upper-triangular elements, we find

 K=(dp+l−1)!dp!(l−1)!~N. (35)

To define the blocks, first we define the function ,

 V⟨h⟩(x):=~N∑j=1Ejxj+~N(⟨h⟩−1)for allh∈Wdp, (36)

which maps each variable to a basis matrix , where recall that is the basis for . Note that a different choice of basis would require a different function . Then for ,

 Bi,j:= ⎧⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪⎩∑h∈Wdpβ⟨h⟩,jV⟨h⟩(ei),1≤j≤L(I)−∑h∈Wdp(HT⟨h⟩,j−LV⟨h⟩(ei)+V⟨h⟩(ei)H⟨h⟩,j−L),L+1≤j≤L+M.(II) (37)

Finally, to complete the SDP problem associated with Polya’s algorithm set

 a=→1∈RK. (38)

### Iii-C Parallel Set-up Algorithm

In this section, we propose a decentralized, iterative algorithm for calculating the terms , , and as defined in (22), (24), (32) and (34). The algorithm has been implemented in C++, using MPI (Message Passing Interface) and is available at: www.sites.google.com/a/asu.edu/kamyar/software. We present an abridged description of this algorithm in Algorithm 1, wherein is the number of available processors.

Note that we have only addressed the problem of robust stability analysis, using the polynomial inequality

 P(α)≻0,AT(α)P(α)+P(α)A(α)≺0

for . However, we can generalize the decentralized set-up algorithm to consider a more general class of feasibility problems, i.e.,

 ^N∑i=1(~Ai(α)~X(α)~Bi(α)+~BTi(α)~X(α)~ATi(α)+Ri(α))≺0 (41)

for . One motivation behind the development of such generalized set-up algorithm is that the parameter-dependent versions of the LMIs associated with and synthesis problems in [42, 43] can be formulated in the form of (41).

### Iii-D Set-up algorithm: Complexity Analysis

Since checking the positive definiteness of all representatives of a square matrix with parameters on proper real intervals is intractable [7], the question of feasibility of (9) is also intractable. To solve the problem of inherent intractability we establish a trade off between accuracy and complexity. In fact, we develop a sequence of decentralized polynomial-time algorithms whose solutions converge to the exact solution of the NP-hard problem. In other words, the translation of a polynomial optimization problem to an LMI problem is the main source of complexity. This high complexity is unavoidable and, in fact, is the reason we seek parallel algorithms.

Algorithm 1 distributes the computation and storage of and among the processors and their dedicated memories, respectively. In an ideal case, where the number of available processors is sufficiently large (equal to the number of monomials in , i.e. ) only one monomial ( of and of ) are assigned to each processor.

1) Computational complexity analysis: The most computationally expensive part of the set-up algorithm is the calculation of the blocks in (37). Considering that the cost of matrix-matrix multiplication is , the cost of calculating each block is According to (34) and (37), the total number of blocks is . Hence, as per Algorithm 1, each processor processes of the blocks, where is the number of available processors. Thus the per processor computational cost of calculating the at each Polya’s iteration is

 ∼card(Wdp)⋅n3⋅K(floor(LN)+floor(MN)). (42)

By substituting for from (35), card from (25), from (26) and from (27), the per processor computation cost at each iteration is

 ∼((dp+l−1)!dp!(l−1)!)2n42(n+1)⎛⎜ ⎜ ⎜ ⎜ ⎜⎝floor⎛⎜ ⎜ ⎜ ⎜ ⎜⎝(dp+d1+l−1)!(dp+d1)!(l−1)!N⎞⎟ ⎟ ⎟ ⎟ ⎟⎠+floor⎛⎜ ⎜ ⎜ ⎜ ⎜⎝(dpa+d2+l−1)!(dpa+d2)!(l−1)!N⎞⎟ ⎟ ⎟ ⎟ ⎟⎠1212⎞⎟ ⎟ ⎟ ⎟ ⎟⎠ (43)

assuming that and . For example, for the case of large-scale systems (large and ), the computation cost per processor at each iteration is having processors, having processors and having processors. Thus for the case where , the number of operations grows more slowly in than in .

2) Communication complexity analysis: Communication between processors can be modeled by a directed graph , where the set of nodes is the set of indices of the available processors and the set of edges is the set of all pairs of processors that communicate with each other. For every directed graph we can define an adjacency matrix . If processor communicates with processor , then , otherwise . In this section, we only define the adjacency matrix for the part of the algorithm that performs Polya’s iterations on . For Polya’s iterations on , the adjacency matrix can be defined in a similar manner. For simplicity, we assume that at each iteration, the number of available processors is equal to the number of monomials in