Probabilistic analysis of Wiedemann’s algorithm for minimal polynomial computation Research supported by National Science Foundation Grants CCF-1018063 and CCF-1016728

# Probabilistic analysis of Wiedemann’s algorithm for minimal polynomial computation ††thanks: Research supported by National Science Foundation Grants CCF-1018063 and CCF-1016728

Gavin Harrison
Drexel University
Jeremy Johnson
Drexel University
B. David Saunders
University of Delaware,
doi:10.1016/j.jsc.2015.06.005
###### Abstract

Blackbox algorithms for linear algebra problems start with projection of the sequence of powers of a matrix to a sequence of vectors (Lanczos), a sequence of scalars (Wiedemann) or a sequence of smaller matrices (block methods). Such algorithms usually depend on the minimal polynomial of the resulting sequence being that of the given matrix. Here exact formulas are given for the probability that this occurs. They are based on the generalized Jordan normal form (direct sum of companion matrices of the elementary divisors) of the matrix. Sharp bounds follow from this for matrices of unknown elementary divisors. The bounds are valid for all finite field sizes and show that a small blocking factor can give high probability of success for all cardinalities and matrix dimensions.

## 1 Introduction

The minimal polynomial of a matrix may be viewed as the minimal scalar generating polynomial of the linearly recurrent sequence of powers of . Wiedemann’s algorithm (Wiedemann, 1986) projects the matrix sequence to a scalar sequence ), where . The vectors are chosen at random. The algorithm continues by computing the minimal generating polynomial of which, with high probability, is the minimal polynomial of . Block Wiedemann algorithms (Coppersmith, 1995; Eberly et al., 2006; Kaltofen, 1995; Villard, 1997, 1999) fatten to matrix having several rows and to a matrix having multiple columns, so that the projection is to a sequence of smaller matrices, , where, for chosen block size , are uniformly random matrices of shape and , respectively. A block Berlekamp/Massey algorithm is then used to compute the matrix minimal generating polynomial of B (Kaltofen and Yuhasz, 2013; Giorgi et al., 2003), and from it the minimal scalar generating polynomial. All of the algorithms based on these random projections rely on preservation of some properties, including at least the minimal generating polynomial. In this paper we analyze the probability of preservation of minimum polynomial under random projections for a matrix over a finite field.

Let and let denote the probability that for uniformly random and . is the focus of this paper and this notation will be used throughout. Our analysis proceeds by first giving exact formulas for in terms of field cardinality , projected dimension , and the elementary divisors of . Let , a function of field cardinality, , projected block size, , and the matrix dimension, . Building from our formula for , we give a means to compute precisely and hence to derive a sharp lower bound. Our bound is less pessimistic than earlier ones such as (Kaltofen and Saunders, 1991; Kaltofen, 1995) which primarily apply when the field is large.

Even for cardinality 2, we show that a modest block size (such as b = 22) assures high probability of preserving the minimal polynomial. A key observation is that when the cardinality is small the number of low degree irreducible polynomials is also small. Wiedemann (1986) used this observation to make a bound for probability of minimal polynomial preservation in the non-blocked algorithm. Here, we have exact formulas for which are worst when the irreducibles in the elementary divisors of are as small as possible. Combining that with information on the number of low degree irreducibles, we obtain a sharp lower bound for the probability of minimal polynomial preservation for arbitrary matrix (when the elementary divisors are not known a priori).

Every square matrix, , over a finite field is similar over to its generalized Jordan normal form, , a block diagonal direct sum of the Jordan blocks of its elementary divisors, which are powers of irreducible polynomials in . and have the same distribution of random projections. Thus we may focus attention on matrices in Jordan form. After section 2 on basic definitions concerning matrix structure and linear recurrent sequences, the central result, theorem 16 is the culmination of section 3 where probability of preserving the minimal polynomial for a matrix of given elementary divisors is analyzed. Examples immediately following theorem 16 illustrate the key issues. The exact formulation of the probability of minimal polynomial preservation in terms of matrix, field, and block sizes is our main result, theorem 20, in section 4. It’s corollaries provide some simplified bounds. Section 4.2, specifically figure 1, illustrates practical applicability. We finish with concluding remarks, section 5.

## 2 Definitions and Jordan blocks

Let be the vector space of matrices over , and the vector space of sequences of matrices over . For a sequence and polynomial , define as the sequence whose -th term is . This action is a multiplicative group action of on , because for and for and . Further, if we say annihilates . In this case, is completely determined by and its leading coefficient matrices . Then is said to be linearly generated, and is also called a generator of . Moreover, for given , the set of polynomials that generate is an ideal of . Its unique monic generator is called the minimal generating polynomial, or just minimal polynomial of and is denoted . In particular, the ideal of the whole of is generated by 1 and, acting on sequences, generates only the zero sequence. For a square matrix , the minimal polynomial of the sequence is also called the minimal polynomial of . .

We will consider the natural transforms of sequences by matrix multiplication on either side. For over and for over For any polynomial , it follows from the definitions that . It is easy to see that the generators of also generate and , so that and

More specifically, we are concerned with random projections, , of a square matrix , where are uniformly random, . By uniformly random, we mean that each of the (finitely many) matrices of the given shape is equally likely.

###### Lemma 1.

Let be similar square matrices over and let be any block size. Then . In particular, where is the generalized Jordan form of .

###### Proof.

Suppose and are similar, so that , for a nonsingular matrix . The projection of is the projection of . But when are uniformly random variables, then so are and , since the multiplications by and are bijections. ∎

Thus, without loss of generality, in the rest of the paper we will restrict attention to matrices in generalized Jordan normal form. We describe our notation for Jordan forms next.

The companion matrix of a monic polynomial of degree is

is the Jordan block corresponding to , a matrix. It is standard knowledge that the minimal polynomial of is . When , .

In particular, we use these basic linear algebra facts: For irreducible , (1) is zero everywhere except in the lowest leftmost block where it is a nonsingular polynomial in (see, for example, Robinson (1970)), and (2) the Krylov matrix is nonsingular unless .

Generalized Jordan normal forms are (block diagonal) direct sums of primary components,

 J=⨁i⨁jJfei,ji,

where the are distinct irreducibles and the are positive exponents, nonincreasing with respect to . Every matrix is similar to a generalized Jordan normal form, unique up to order of blocks.

## 3 Probability Computation, Matrix of Given Structure

Recall our definition that, for , denotes the probability that minimal polynomial is preserved under projection to , i.e., for uniformly random and . For the results of this paper the characteristic of the field is not important. However the cardinality is a key parameter in the results. For simplicity, we are restricting to projection to square blocks. It is straightforward to adjust these formulas to the case of rectangular blocking.

By lemma 1, we may assume that the given matrix is in generalized Jordan form, which is a block diagonal matrix. The projections of a block diagonal matrix are sums of independent projections of the blocks. In other words, for the projection of let be the blocks of columns of and rows of conformal with the block sizes of the . Then . In additionto this observation the particular structure of the Jordan form is utilized.

In subsection 3.1 we show that the probability may be expressed in terms of for the primary components, , associated with the distinct irreducible factors of the minimal polynomial of . This is further reduced to the probability for a direct sum of companion matrices in 3.2.1. Finally, the probability for is calculated in 3.2.2 by reducing it to the probability that a sum of rank 1 matrices over the extension field is zero. In consequence we obtain a formula for in theorem 16. Examples Examples illustrating theorem 16 are given in subsection 3.3.

### 3.1 Reduction to Primary Components

Let where the are distinct irreducible polynomials and the are positive exponents, nonincreasing with respect to . In this section, we show that

 \boldmathPq,b(A)=∏i\boldmathPq,b(⨁jJfei,ji).
###### Lemma 2.

Let and be linearly generated matrix sequences. Then .

###### Proof.

Let , and . The lemma follows from the observation that

 (fg/d)(S+T)=(fg/d)(S)+(fg/d)(T)=(g/d)(f(S))+(f/d)(g(T))=0.

As an immediate corollary we get equality when and are relatively prime.

###### Corollary 3.

Let and be linearly generated matrix sequences with and such that . Then .

###### Proof.

By the previous lemma, with and . We show that and . Under our assumptions, so that is a generator of . But if is a proper divisor of , then is not in the ideal generated by , a contradiction. Similarly must equal . ∎

###### Theorem 4.

Let where the are distinct irreducibles and the are positive exponents, nonincreasing with respect to . Then, .

###### Proof.

Let , and , where are blocks of conforming to the dimensions of the blocks of . Then, . Let . Because and all are unique irreducibles, then when . Therefore, by corollary 3, . Therefore if and only if for all , and . ∎

### 3.2 Probability for a Primary Component

Next we calculate , where is an irreducible polynomial and are positive integers. We begin with the case of a single Jordan block before moving on to the case of a direct sum of several blocks.

Consider the Jordan block determined by an irreducible power, . is independent of . Thus, . This fact and are the subject of the next lemma.

###### Theorem 5.

Given a finite field , an irreducible polynomial of degree , an exponent , and a block size , let be the Jordan block of and let be the sequence . For and the following properties of minimal polynomials hold.

1. If the entries of are uniformly random in , then

 Prob(fe=minpoly(¯JV))=1−1/qdb.

Note that the probability is independent of .

2. If is fixed and the entries of are uniformly random in , then

 Prob(minpoly(¯JV)=minpoly(U¯JV))≥1−1/qdb,

with equality if .

3. If and are both uniformly random, then

 \boldmathPq,b(J)=Prob(fe=minpoly(U¯JV))=(1−1/qdb)2=\boldmathPq,b(Cf).
###### Proof.

For parts 1 and 2, let be the lower left block of . is nonzero and all other parts of are zero. Note that , the set of polynomials in the companion matrix , is isomorphic to . Since is nonzero and a polynomial in , it is nonsingular. Since for any polynomial and matrix one has , the lower left blocks of the sequence form the sequence .

Part 1. is zero except in its lower rows which are , where is the top rows of . This sequence is nonzero with minimal polynomial unless which has probability .

Part 2. If the inequality is trivially true. For , is zero except in its lower left corner , where is the top rows of and is the rightmost columns of . Since is nonsingular, is uniformly random and the question is reduced to the case of projecting a companion matrix.

Let for irreducible of degree . For nonzero , is nonzero and has minpoly . We must show that if is nonzero then also has minpoly . Let be a nonzero column of . The Krylov matrix has as it’s columns the first vectors of the sequence . Since is nonzero, this Krylov matrix is nonsingular and implies . Thus, for any nonzero vector , we have so that, for nonzero , the sequence is nonzero and has minimal polynomial as needed. Of the possible , only fails to preserve the minimal polynomial.

Part 3. By parts 1 and 2, we have probability of preservation of minimum polynomial , first at right reduction by to the sequence and then again the same probability at the reduction by to block sequence . Therefore, . ∎

#### 3.2.1 Reduction to a Direct Sum of Companion Matrices

Consider the primary component , for irreducible , and let . We reduce the question of projections preserving minimal polynomial for to the corresponding question for direct sums of the companion matrix , which is then addressed in the next section.

###### Lemma 6.

Let , where is irreducible, and are positive integers. Let . Let be the number of equal to . Then,

 \boldmathPq,b(J)=\boldmathPq,b(s⨁i=1Cf).
###### Proof.

The minimal polynomial of is and that of is . A projection preserves minimal polynomial if and only if has minimal polynomial . For all we have , so it suffices to consider direct sums of Jordan blocks for a single (highest) power .

Let be the Jordan block for , and let . A projection is successful if it has the same minimal polynomial as . This is the same as saying the minimal polynomial of is . We have

 fe−1(U¯AV)=Ufe−1(¯A)V=s∑i=1Uife−1(¯Je)Vi=s∑i=1Ui,e¯Cf~Vi,1.

For the last expression is the rightmost block of and is the top block of . The equality follows from the observation in the proof of theorem 5 that is the sequence that has ( nonsingular) in the lower left block and zero elsewhere. Thus, . ∎

#### 3.2.2 Probability for a Direct Sum of Companion Matrices

Let be irreducible of degree . To determine the probability that a block projection of preserves the minimal polynomial of , we need to determine the probability that . We show that this is equivalent to the probability that a sum of rank one matrices over is zero and we establish a recurrence relation for this probability in corollary 14. This may be considered the heart of the paper.

###### Lemma 7.

Let , where is irreducible of degree . is equal to the probability that , where and are chosen uniformly randomly, and are blocks of , respectively, conforming to the dimensions of the blocks of .

###### Proof.

Because and , then . Because is irreducible, it has just two divisors: and . The divisor 1 generates only the zero sequence. Therefore, if then . Otherwise, . Thus equals the probability that . ∎

The connection between sums of sequences and sums of rank one matrices over the extension field is obtained through the observation that for column vectors , one has where is the regular matrix representation of , i.e. in . The vectors and can be interpreted as elements of by associating them with the polynomials and . Moreover, if is chosen as a basis for over , then and .

Letting , the initial segment of is , which is , where is the Krylov matrix whose columns are . The following lemma shows that and establishes the connection .

###### Lemma 8.

Let be an irreducible polynomial and be the extension field defined by . Let be the regular representation of and the companion matrix of . Then .

###### Proof.

Let be the vector with a one in the -th location and zeros elsewhere. Then, abusing notation, and . Since this is true for arbitrary the lemma is proved. ∎

Let and be and matrices over . Let be the -th row of and be -th column of . The sequence of matrices can be viewed as a matrix of sequences whose element is equal, by the discussion above, to . This matrix can be mapped to the matrix over whose element is the product . This is the outer product , with and viewed as a column vector over and a row vector over respectively. Hence it is a rank one matrix over provided neither nor is zero. Since any rank one matrix is an outer product, this mapping can be inverted. There is a one to one association of sequences with rank one matrices over .

To show that this mapping relates rank to the probability that the block projection preserves the minimum polynomial of , we must show that if then the corresponding sum of rank one matrices over is the zero matrix and vice versa. This will be shown using the fact that the transpose is similar to . While it is well known that a matrix is similar to its transpose, we provide a proof in the following lemma which constructs the similarity transformation and shows that the same similarity transformation works independent of .

###### Lemma 9.

Given an irreducible monic polynomial of degree , there exists a symmetric nonsingular matrix such that , for all .

###### Proof.

We begin with . Every matrix is similar to it’s transpose by a symmetric transform (Taussky and Zassenhaus, 1959). Let be a similarity transform such that . Then . ∎

It may be informative to have an explicit construction of such a transform . It can be done with Hankel structure (equality on antidiagonals). Let denote the Hankel matrix with first row and last row . For example Then define as . A straightforward computation verifies .

###### Lemma 10.

Given an irreducible monic polynomial and it’s extension field , there exists a one-to-one, onto mapping from the projections of to that preserves zero sums, i.e. iff .

###### Proof.

The previous discussion shows that the mapping from projections of onto rank one matrices over is one-to-one. Let and be the -th row of and and the -th column of , respectively. Let P be a matrix, whose existence follows from lemma 9, such that . Assume . Then using lemma 8 and properties of

 t∑k=1uTk,i¯Cfvk,j=0 ⇒ t∑k=1uTk,iρ(vk,j)=0  ⇒  t∑k=1uTk,iPP−1ρ(vk,j)PP−1=0 ⇒ t∑k=1uTk,iPρ(vk,j)TP−1=0  ⇒  t∑k=1(uTk,iP)ρ(vk,j)T=0 ⇒ t∑k=1~uk,i⋅vk,j=0,where ~uk,i=(uTk,iP).

Let be the vector whose -th row is then the corresponding sum of outer projects . Because is invertible, the argument can be done in reverse, and for any zero sum of rank one matrices over we can construct the corresponding sum of projections equal to zero. ∎

Thus the probability that is the probability that randomly selected -term outer products over sum to zero. The next lemma on rank one updates provides basic results leading to these probabilities.

###### Lemma 11.

Let be given and consider rank one updates to . For conformally blocked column vectors . we have that
if and only if and are both zero, and
if and only if are both nonzero.

###### Proof.

Without loss of generality (orthogonal change of basis) we may restrict attention to the case that and , where is the -th unit vector, if and otherwise, and similarly for vis a vis . Suppose that in this basis . Then

 (Ir⊕0)+uvT=⎛⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜⎝1…00…0⋮⋱⋮⋮⋱⋮αw1…1+αwrαzr+1…αznβw1…βwrβzr+1…βzn⋮⋱⋮⋮⋱⋮0…00…0⎞⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟⎠.

The rank of is just in case (Meyer, 2000). In our setting this condition is that . We see that, for a rank of , we must have that and both zero. For rank it is clearly necessary that both of are nonzero. It is also sufficient because for the order minor has determinant . These conditions translate into the statements of the lemma before the change of basis. ∎

###### Corollary 12.

Let be of rank , and let be uniformly random in . Then,

1. the probability that is

 D(r)=qr−1(qr−1)q2n,
2. the probability that is

 U(r)=(qn−r−1)2q2(n−r),
3. the probability that is

 N(r)=1−D(r)−U(r)≥2qn−1q2n,

with equality when .

###### Proof.

There exist nonsingular such that and . Since and are uniformly random when are, we may assume without loss of generality that .

For part 1, by corollary 12, the rank of is less than only if both are zero in their last rows and . For , only when and we have, for the first such that , that . Counting, there are possible and then ’s satisfying the conditions. The stated probability follows.

For part 2, by the preceding lemma, the rank is increased only if the last rows of and are both nonzero. The probability of this is .

For the part 3 inequality, if the sign is changed and 1 is added to both sides, the inequality becomes . Note that and . Let and . Note that and are positive. Thus, it is obvious that . That is,

 U(r)+D(r)≤(qn−qrqn)2+(qr−1qn)2≤(qn−1qn)2.

Therefore, . ∎

###### Definition 13.

For uniformly random in , and , let denote the probability that .

###### Corollary 14.

Let , for uniformly random , and let , and be defined as described in corollary 12. Let (definition 13). Then, satisfies the recurrence relation

 \boldmathQt(r) = ⎧⎨⎩0,if r<0 or r>min(t,n)1,if r=0 and t=0ϕt−1(r),otherwise,

where ; and