Parallel degree computation for solution space of binomial systems with an application to the master space of \mathcal{N}=1 gauge theories

# Parallel degree computation for solution space of binomial systems with an application to the master space of N=1 gauge theories

## Abstract

The problem of solving a system of polynomial equations is one of the most fundamental problems in applied mathematics. Among them, the problem of solving a system of binomial equations form a important subclass for which specialized techniques exist. For both theoretic and applied purposes, the degree of the solution set of a system of binomial equations often plays an important role in understanding the geometric structure of the solution set. Its computation, however, is computationally intensive. This paper proposes a specialized parallel algorithm for computing the degree on GPUs that takes advantage of the massively parallel nature of GPU devices. The preliminary implementation shows remarkable efficiency and scalability when compared to the closest CPU-based counterpart. Applied to the “master space problem of gauge theories” the GPU-based implementation achieves nearly 30 fold speedup over its CPU-only counterpart enabling the discovery of previously unknown results. Equally important to note is the far superior scalability: with merely 3 GPU devices on a single workstation, the GPU-based implementation shows better performance, on certain problems, than a small cluster totaling 100 CPU cores.

## 1Introduction

The problem of solving a system of polynomial equations is one of the most fundamental problems in applied mathematics and science. Among them, the problem of solving a system of binomial equations is of special interest for they appear naturally in many applications, and specialized and much more efficient algorithm exists (e.g. [25]). In many applications, only the solutions of a system of binomial equations for which no variable is zero are needed. Such solutions are known as the -solution set and will be the focus of this article.

In this article, we propose a parallel algorithm for computing the degree of a -solution set of a system of binomial equations. This algorithm is specially designed for GPU (graphics processing unit) devices by taking advantage of the massively parallel nature of GPUs. When applied to a binomial system coming from particle physics, called the master space of gauge theories, this algorithm is able to produce previously unknown results. Furthermore, the experimental implementation for GPU built on top of the CUDA framework has already shown promising results. Remarkably, with multiple GPU devices (on the same computer), the GPU based implementation exhibits much better performance, in many cases, than small to medium sized computer clusters.

This article is structured as follows: First, necessary notations and concepts are introduced. In particular, we shall review basic geometric properties of the -solution set defined by a binomial system. Then the algorithm for computing the Smith Normal Form of an integer matrix is reviewed in §Section 3, as it is an important tool necessary in understanding the structure of the -solution set of a binomial system. The core of this article is §Section 4 where a highly scalable parallel algorithm for computing the degree of the -solution set of a system of binomial equations is presented. A natural by-product of the degree computation is a series of homotopy constructions that can be used to compute the “witness sets” of components of the -solution set of a binomial system, which is an important and ubiquitous construction in numerical algebraic geometry. This process is explained in §Section 5. The problem of studying the master space of gauge theories, arising from string theory, is briefly reviewed in §Section 6, and we show the interesting and previously unknown results obtained by applying the parallel algorithm for solving systems of binomial equations and computing the degree of the solution set to the master space problem.

## 2Laurent binomial systems and its solution set

First, we shall introduce necessary concepts and notations. For positive integers and , let denote the set of all integer matrices. A square integer matrix is said to be unimodular if its determinant is . Note that such a matrix has a unique inverse that is also in , where is the adjoint matrix of . The identity matrix in is denoted by .

Even though the main application considered in this article are binomial systems, its theory is more naturally developed in the context of more general “Laurent binomial systems” where negative exponents are allowed. For variables , a Laurent monomial in is an expression of the form where are integers (which may be zero or negative). For convenience, we shall write and use the “vector exponent” notation

to denote a Laurent monomial. Similarly, for an integer matrix with columns , the “matrix exponent” notation will be used for an -tuple of Laurent monomials:

This notation is particularly convenient since the familiar identities and still hold. Since the exponents here may be negative, it is only meaningful to consider the function when we restrict each to be nonzero. In particular, throughout this article, we shall let for each . In this case, each matrix induces a function from to given by . Of particular importance is the function induced by a unimodular matrix since is also in , and hence functions and are the inverses of each other ().

A Laurent binomial is an expression of the form for some and . This article focuses on the properties of the solution set of systems of Laurent binomials equations, or simply Laurent binomial systems, over . Stated formally, given exponent vectors and the coefficients , the goal is to describe the set of all that satisfies the system of equations

Since only the solutions in are concerned, this system is clearly equivalent to

With the more compact “matrix exponent” notation in , this system can simply be written as

where the integer matrix , having columns , represents the exponents appeared in the Laurent monomials and the vector collects all the coefficients. The solution set of over shall be denoted by

The goal of this article is to present efficient parallel algorithms for computing the structural properties of the set : its dimension, number of components, global parametrizations, and, most importantly, degree. We shall first briefly review some basic facts about the -solution set of a Laurent binomial system. A more detailed summary can be found in the article [6] by the first author and Tien-Yien Li. In depth theoretical discussions can be found in standard references such as [8]. Certain computational aspects have been studied in [25].

An important tool in understanding the structure of is the Smith Normal Form of the exponent matrix : there are unimodular square matrices and such that

with nonzero integers for , unique up to the signs. Here, means divides as usual. This decomposition of the matrix provides important topological information about summarized in the following proposition:

This description can be strengthened significantly. Here we shall briefly outline the derivation of the stronger description of as it provides the important data that form the starting point of the degree computation to be discussed in §Section 4. For and in the Smith Normal Form of in (Equation 4), let and be the top rows and the remaining rows of respectively. Similarly, let and be the left columns and the remaining columns of respectively. With these notations, the Smith Normal Form of of can be written as

with and ’s representing zero block matrices of appropriate sizes. With this we can transform the binomial system into a form from which more detailed information can be easily extracted.

Since and are both unimodular the maps and are both bijections on and respectively. Therefore, considering the solution set in , the original system is equivalent to . Similarly, the solution sets remain equivalent after the change of variables , which produces

Since , the original system    can now be decomposed into a combined system

where appears when with , and appears when . The word “free” in means the system imposes no constraints on the variables .

Focusing on the above decomposed system, it is clear that if , then the system is inconsistent unless . If the system is consistent (namely, holds), then the solutions to are exactly

where each    is a fixed choice of the  -th  root of  -th  coordinate of  . Clearly, all of them are isolated and the total number of these solutions is . If , then the solution set of the decomposed system – in breaks into “components” of the form , and they are in one-to-one correspondence with solutions in . Since each component is parametrized by the free variables , it is smooth and of dimension . Furthermore, they are disjoint, because these components have distinct coordinates.

To translate the above description of the -solution set of the decomposed system (in ) into a description the original solution set , one may simply apply the change of variables . Note that this map and its inverse are both given by by monomials (bi-regular maps [21]), the basic properties of the solution set, such as, the number of solution components, their dimensions, and smoothness are therefore preserved. To summarize, the above elaborations assert the following proposition.

Note that, as previously stated, in the case of , the solution set is of dimension , that is, consists of isolated points. Then the “parametrizations” are understood as constants each describes a single isolated point.

As indicated in Proposition ?, for a consistent Laurent binomial system where with , each component of the solution set in will be of dimension . In this situation, for both theoretical interests and demands from concrete applications, like the Master Space problem to be discussed in §Section 6, one often wishes to identify another important property: the degrees of the components. Degree is a classic concept developed for plane algebraic curves. For example, the quadratic equation   defines a curve of degree 2, i.e., the parabola. The generalized notation of degree for irreducible algebraic sets is usually formulated algebraically via Hilbert Polynomials. In this article, following the common practice of Numerical Algebraic Geometry, we shall take a geometric approach: Let be a component of for some fixed choice of as defined in Proposition ?. The number of isolated intersection point between and a “generic” affine space of complementary dimension is a fixed number, and this number is the degree of , denoted by . In algebraic terms, we are considering the degree of the projective closure of .

Stated more precisely, let be the set of all affine space in of dimension . Then it can be shown that in a fixed open and dense subset of , all the affine spaces intersect with at a fixed number of isolated points. This geometric interpretation of degree is explained in [13].

From a computational standpoint, a generic affine space in can be represented by the solution set of a system of linear equations with generic coefficients. Therefore is precisely the number of points that satisfies the system of linear equations

where for and are generic complex numbers. But recall that the set is precisely the image of the injective map

in Proposition ?. If we let and then

where for each , and are the -th columns of and respectively. In other words, has the global parametrization . Therefore the intersections between and the generic affine space defined by are precisely the solutions of the polynomial system

By letting and for each and , the above is a systems of polynomial equations in variables with generic complex coefficients and the same set of monomials .

It is important to note that for generic coefficients, the -solutions of the above system are all isolated (0-dimensional), and the total number is a constant. Indeed, in [27], Kushnirenko has shown that this number can be expressed in terms of the volume of a geometric object: the Newton polytope of the above system. Here we state the result in the context of degree computation and leave the technical statement of Kushnirenko’s theorem, as well as its related concepts to Appendix § ?.

The degree of the solution set of can be computed efficiently through methods in combinatorial geometry.

## 3Parallel Smith Normal Form computation

As summarized in , the key to finding the dimension, number of components, and global parametrization of the -solution set is the Smith Normal Form of the exponent matrix . In this section, we briefly review the procedure for computing the Smith Normal Form of an integer matrix and then outline the parallel modification that is suitable for both multi-core systems and GPU.

We first briefly review the standard algorithm for computing the Smith Normal Form of a matrix with integer entries. One of the classic algorithms for computing the Smith Normal Form uses successive row () and column () reductions of the input matrix, as listed in [7] and [16]: Consider the special case where with nonzero, that is, take and . By the Bézout’s identity, there exist and such that . Let

then and

Similarly, for the special case , let , then . In general, and version of of the above matrices and can be constructed to perform row and column reduction respectively for a integer matrix.

After repeated such row and column reduction together with potential row and column permutations one can construct unimodular matrices and such that

with and nonzero. As noted in standard references such as [16], further reduction can ensure , but for the purpose of solving binomial system it is not necessary.

By their design, GPUs are naturally well suited to perform the row and column reductions [40] used in computing the Smith Normal Form. As shows, GPUs have a clear advantage over CPUs in performing simple row reductions for sufficiently large matrices: over 30 fold speedup can be achieved. §Section 6 shows the result of this algorithm when applied to the master space problem.

## 4Parallel degree computation

When the solution set consists of positive dimensional components, Proposition ? provides a computationally viable means for computing the degree of each component as the normalized volume of a convex polytope. In this section we shall present a parallel algorithm for computing the degree that is suitable for both multi-core systems and GPUs, though the focus is the GPU-based implementation. Throughout this section, let be a component of , and let . Here we shall focus on the case where . Let be the matrix appears in the Smith Normal For of . Considering each as a point in , let be the finite point set. Then by Proposition ?,

For brevity, let be the normalized volume function in , then the above equation can be written as

Therefore the task of computing the degree for is equivalent to the computation of the normalized volume of a lattice polytope (a polytope whose vertices have integer coordinates).

Our parallel algorithm for the degree computation is developed based on the parallel “mixed cell enumeration” algorithm presented in [5]. (See Remark ? below) Among many different approaches for computing the normalized volume, here we adopt a technique known as regular simplicial subdivision [31]. This approach produces an important byproduct that will be used in the computation of witness set, which will be the subject of §Section 5. In this approach, we are interested in computing the normalized volume by dividing the lattice polytope into a collection of smaller pieces for which the volume computation is easy.

A simplicial subdivision plays an important role in computing : the normalized volume of a -simplex in is easy to compute: given a -simplex ,

So the volume of can be computed easily as the sum of the volume of all simplices in .

Note that the simplicial subdivision for a given polytope is, in general, not unique, and there are many different approaches for constructing them. Here we focus on the approach of regular simplicial subdivision: One can define a “lifting function” by assigning a real number to each point in . For each point , a new point can be created by using as an additional coordinate. This procedure “lifts” points of into , the space of one higher dimension. Let

be the lifted version of via the lifting function . Figures ? and ? show examples of this lifting procedure.. Let be the projection that simply erases the last coordinate, then .

Recall that for a face of the lifted polytope , its inner normal is a vector such that the linear functional attains its minimum over on . Moreover, a face of is called a lower face with respect to the projection if its inner normal has positive last coordinate. Without loss of generality, in this case, we may assume the last coordinate of to be 1, that is, . It can be shown that for almost all choices of the lifting function , the projections of all the -dimensional lower faces of via form a simplicial subdivision for which is called a regular simplicial subdivision of . The construction of this simplicial subdivision is therefore equivalent to the enumeration of all the lower faces of .

Algebraically speaking, a -dimensional lower face of is the convex hull of a set of points for which there exists a such that the system of inequalities

is satisfied. In other words, the existence of the lower face defined by is equivalent to the feasibility of the above system of inequalities . This algebraic description of the lower faces is the basis on which enumeration methods are developed. While a brute-force approach of checking all the possible combinations of points in against the system of inequalities may be possible, the combinatorial explosion will likely render it impractical for all but the most trivial cases.

In the following subsections, we shall present an approach that results in a parallel algorithm which is suitable for both multi-core systems and GPU devices. In this approach, we employ two complementing processes of “extension” and “pivoting”. We shall outline them below.

### 4.1Extension of k-faces

Intuitively speaking, in the extension process, one starts with the vertices of the lower hull of . For each of these vertices, systematic attempts are made to “extend” it by finding another lower vertex so that the two vertices form a “lower edge” (an edge on the lower hull of ). The possible extensions may not be unique, and for each possibility, further attempts are made to extend it to 2-dimensional lower faces. This process continues until one reaches all the -dimensional lower faces. Finally, the collection of such -dimensional lower faces will project down, via , to form a simplicial subdivision for .

To describe this process, we first extend the characterization to include lower faces of all dimensions: A set of affinely independent points in is said to determine a lower -face if their convex hull form a -dimensional lower face of with respect to the projection . Stated algebraically, the affinely independent set determines a lower -face if and only if there exists an , such that the system of inequalities

is satisfied.

Clearly, a lower -face is a vertex on the lower hull of . Similarly, a lower -face is simply a lower edge. We can conveniently organize all possible system of inequalities of the above form into a directed acyclic graph, as illustrated ine , where each node represents a system of inequalities and there is an edge from to whenever is obtained by joining a new points in into . With this construction, the resulting graph is graded by the number of points involved.

It can be easily verified that for generic lifting function , containment relation between lower -faces of the same dimension is impossible. That is, for a fixed , no lower -face is contained in another lower -face. Therefore the the graph describes precisely the containment relationship among possible lower -faces.

A node is said to be feasible if the corresponding system of inequality is feasible. shows an example of the labeling of the graph via the feasibility of the nodes: dark for infeasible nodes and white for feasible ones. Recall that a node determines to lower -faces if and only if it is feasible. Hence we only need to explore of the feasible subgraph (the white subgraph in ).

One crucial observation is that if two points do not define a lower edge, then they cannot be a part of any lower faces of dimension greater than 1. More generally, if a set of points does not define a lower -face, then there are no lower -faces containing them for any . Stated formally, for ,

In terms of the graph, if a node is infeasible, then the entire subgraph reachable by that node is infeasible.

Therefore during the exploration of the graph, once an infeasible node is encountered, no further exploration from that node is needed as all nodes reachable are infeasible. This simple observation produces significant savings in terms of computation.

A key procedure in the exploration of the feasible subgraph is the jump from one feasible node to another along an edge. Assuming, for some , the node is feasible, then the feasibility of an adjacent node, say via the edge , can be determined by solving the linear programming problem

with the variable for .

Note that under the constraints, the value of the objective function must be nonnegative. Indeed, the minimum value of 0 is attainable precise when there is an for which the constraints are satisfied, simultaneously to . That is, minimum value is 0 if and only if is feasible. In this case, a new node is discovered. Geometrically, we have “extended” the lower -face determined by into a lower -face by joining it the new vertex .

Using the extension procedure as a basic building block, we shall discuss, in §Section 4.3, we shall discuss the complete parallel algorithm for the exploration of the feasible subgraph.

### 4.2Simplicial pivoting

In the above we have described a process that gradually explore the feasible subgraph via extension procedures. This process is complemented by another process which we shall call “simplicial pivoting” which explore the feasible subgraph by “moving sideways” in the graph from one lower -face to another.

This process starts with a lower -face of already obtained. Consider, for example, one of the lower face shown in . Using an edge as a hinge, we shall “pivot” one lower face until another lower face is obtained. More generally, recall that a lower -face is determined by a set of affinely independent points in that has an inner normal of the form with . Stated algebraically, the system of inequalities

is satisfied. Note that the equalities form a system of linearly independent constrains on and hence uniquely determines . By removing a single equality from the above system, we give the inner normal one degree of freedom which would allow it to “pivot”. The goal is to let it pivot until it defines a different lower -face.

For any choice , with the equality corresponding to in the above system removed, the inner normal , now with one degree of freedom, is characterized by the system

Note that this system has equalities. If a solution with equalities exists, then that solution corresponds to a different lower -face. In the context of Linear Programming, such a solution is called a basic feasible solution. The problem of finding a basic feasible solution is known as the Phase One problem in Linear Programming. It can be solved exactly and efficiently.

This procedure is called “simplicial pivoting”. It allows us to pivot from one lower -face to another. By repeatedly applying this procedure, more lower -faces can be gathered. illustrates this process.

### 4.3Traverse the feasible subgraph

In the above we have formulated the enumeration of lower -faces as the problem of exploring the feasible subgraph of which the lower -faces is a subset. We also have two procedures for “walking” within the graph: The extension procedure moves from one lower face to another of one higher dimension while the pivoting procedure jumps from one lower -face to another lower -face. With these building blocks in place, the exploration can be handled by classic graph traversal algorithms which we shall briefly review for completeness.

Most graph traversal algorithms follow a “discover-explore” procedure with proper book keeping [41]: They gradually explore the graph node by node through the connection between them while keeping track of the nodes visited so that no node is explored twice. For a single node, such an algorithm is divided into the discover and explore stages: a node is first discovered, and then its connections to other yet unknown nodes are explored. Clearly, each node only needs to be visited once. That is, one only needs to explore a spanning tree of the graph, (a subgraph that contains all the vertices but is a tree in structure), so some mechanism must be used to prevent a node from being visited twice. To keep track of the nodes as they are being visited, we assign each task a dynamic marker – its state. A node can be in one of the following three states:

undiscovered

The initial status of every node. In this state, the existence of the node is completely unknown to us.

discovered

The existence of the node is known, but its connections to other nodes are not yet explored.

completely-explored

The existence of the node is known and its connections to other nodes have been fully explored.

Obviously, a node cannot be completely-explored before it is first discovered, so in the course of the algorithm, the state of vertices progresses from undiscovered to discovered to completely-explored. This point of view also reveals the parallelism in such algorithms: nodes on different branches of the spanning tree can be explored in parallel, while consecutive nodes on a single branch must be discovered and explored in order. To start the algorithm, an initial set of nodes are generated by some other means (bootstrapping). The algorithm then discovers other nodes through their connections. From these newly discovered nodes the algorithm can discover even more node. This will continue as a self-sustaining process until all connected vertices are visited.

A complete algorithm also need a data structure to keep track of the discovered but not yet completely explored vertices (bookkeeping). In the present work, a queue is used. It is a linear data structure where newly discovered nodes are added to the back-end of the queue. The use of the queue structure essentially imposes an implicit ordering of “first-in-first-out”, that is, nodes discovered first are explored first. In the context of graph traversal algorithms, this is referred to as a breadth-first strategy in exploring the feasible subgraph. Experiments, presented in [5], suggest that a more flexible ordering of nodes within the queue may provide better performance, scalability, and memory usage. However, for simplicity, in this work, only the breadth-first approach has been studied. The detail of this class of algorithms can be found in standard textbooks such as [41]. In §Section 4.5, we list the pseudo code.

### 4.4Checking for duplicated discovery

One important problem we must deal with, in the parallel algorithm, is that same nodes may be discovered by different threads at the same time. Since the degree is the sum of the normalized volume of the projection of all the lower -faces which are represented by the nodes in the feasible subgraph, duplicated nodes will produce incorrect results. Therefore, the mechanism for ensuring no duplicated lower -faces are listed is the key to the correctness of the algorithm.

This mechanism appears to be the bottleneck, in terms of performance, of the original algorithm [14] for enumerating “mixed cells” using mainly the pivoting process which is one of the main inspiration of the present work (see Remark ?). Our experiments confirm that an inefficient checking mechanism would be the limiting factor of the scalability in a parallel implementation. Since on a GPU, it is typical to have thousands of threads active simultaneously, the efficiency of such a mechanism is crucial.

In the present work, the hash table data structure is used to keep track of the nodes, in the graph, that have been discovered or completely-explored. The great advantage of this choice is that unlike a sorted data structure, hash table provides nearly constant access time, in most cases. In our current implementation, for simplicity, the well-tested bit-string hash function from the standard C++ library is used.

Our experiments suggest that a hash table with to entries is sufficient for all problems considered in §Section 6 in the sense that the collision rate in hash table access can be virtually ignored.

### 4.5Summary of the algorithm

In the above, we formulate the degree computation for solution components defined by binomial systems as the exploration of the feasible subgraph to be accomplished by the two complementing processes: extension and pivoting. In this section, we list the main algorithms.

These algorithms are designed for a system with one or more GPU devices and a single CPU with the GPU performance most of the computation intensive tasks. For simplicity, we restrict ourselves to modern GPUs manufactured by NVidia and build our program based on NVidia CUDA (a GPU programming framework). All the GPU devices must share memory since they must all have access to data structures WaitingNodes, KnownNodes, and NewNodes. In the current implementation, this is accomplished via a technique known as pinned memory [40] provided by the CUDA framework.

In the following algorithms, the list WaitingNodes contains nodes whose feasibility are to be determined by the extension procedure. Cells is the unordered collection of lower -faces already discovered. KnownNodes is the hash table that record the discovery of nodes, and it is crucial mechanism by which we ensure the uniqueness of the discovered nodes. Finally, NewNodes is an unordered list that keeps track of nodes discovered through pivoting or extension. They need to be checked against KnownNodes for uniqueness.

Random is a function that randomly choose an item from a collection using pseudo random number generator. The randomness is employed to achieve a more uniform performance from one run to another which simplifies the benchmarking process. SimplexPhaseOne and SimplexPhaseTwo are the phase-one and phase-two algorithm of the simplex method for the linear programming problems and respectively. Even though, at over 3000 lines, the C++ code for these two components are the longest and most complicated parts of the entire program, they have been a fixture of the long line of “mixed volume computation” software developed over the last two decades whence the present work inherits much of its techniques and design. Therefore we choose to not describe them in detail and refer to works including [5].

The Extend procedure tests the feasibility of a node (see §Section 4.1) in the waiting list WaitingNodes, and it is designed to run simultaneously on all available threads across all GPU devices.

The Pivot procedure implements the simplicial pivoting process detailed in §Section 4.2, and it is designed to run simultaneously on all available threads across all GPU devices. It picks a random lower -face already discovered and apply simplicial pivoting to potentially obtain a new lower faces. Just like the Extend procedure above, newly discovered nodes will be place in the NewNodes list.

The procedure CheckUniq checks newly discovered nodes against the hash table KnownNodes to make sure they have not already been discovered. It will run on a GPU device with a large number of threads simultaneously checking the uniqueness of all nodes in the list of NewNodes.

Finally, the main procedure, which runs on the CPU, coordinates all the different processes.

## 5Computation of witness sets

The concept of “witness sets” [42] is one of the most fundamental and versatile tool in numerical algebraic geometry. In its most basic form, given a pure dimensional algebraic set, it can be shown that its intersection with a generic affine space of complementary dimension consists of finitely many isolated points. This finite set is called a witness set of the algebraic set. It can be used to compute, among many other things, the irreducible decomposition and primary decomposition numerically. In many scenarios, it produces the degree of each component as a byproduct. Indeed, this technique (via witness sets) was first used to numerically compute the degrees of the “Master Space” problem in the work [22].

Given the ubiquity of the use of witness sets in numerical algebraic geometry, in this section, we shall briefly outline a homotopy construction for computing witness sets for a component of . It is a special case of the polyhedral homotopy [24].

Recall that by Proposition ?, the intersection between a component and a generic affine space of complementary dimension consists of precisely the points that satisfy the system of Laurent polynomial equation in variables given by

where the coefficients depends on both the choice of the component in and the choice of the -dimensional affine space.

Reusing the notations from §Section 4, let , and let be the generic lifting function used for constructing regular simplicial subdivision of in §Section 4. Without loss of generality, we can pick to have images only in . With these, we introduce a new variable and consider

which is constructed by multiplying each term in by a rational power of the new variable whose exponent is determined by the lifting function . Clearly, is exactly the system which we aim to solve (inside ). As varies, however, represents a continuous deformation of the system , or a homotopy. The central idea behind the homotopy continuation method for solving systems of equations is the deformation of a system into a “starting system” which one can solve easily. Then numerical continuation methods are employed to trace the movement of the solutions of the starting system under the deformation toward the solutions of the original system which one aims to solve.

The key here is to find an appropriate starting system that can be easily solved. As is, cannot be used as the starting system since at , the system is either identically zero or undefined. Therefore certain transformation is necessary to produce a meaningful and solvable starting system. Such transformations are given by the regular simplicial subdivision discussed in §Section 4.

Still let be a regular simplicial subdivision obtained by the algorithm presented in §Section 4.5. Recall that each cell in is a projection of a cell of the form such that is a lower -face of that is characterized by . That is, there exists a (unique) vector of the form such that

Using , we shall consider the change of variables

with which becomes

Let and define a new homotopy

Note that the new homotopy still has the necessary property that is identical to the system which we aim to solve.

One important observation here is that, by , there are precisely terms in each component of having no power of (the terms corresponding to ), and all other terms have positive powers of . Consequently, at , terms with positive powers of vanish, leaving only

To simplify the notation, let

then the above equation can be written as

For generic choices of the coefficients, there exists a nonsingular matrix such that