A Rigorous Extension of the Schönhage–Strassen Integer Multiplication Algorithm Using Complex Interval Arithmetic
Abstract
Multiplication of $n$-digit integers by long multiplication requires $O(n^2)$ operations and can be time-consuming. In 1970 A. Schönhage and V. Strassen published an algorithm capable of performing the task with only $O(n \log n)$ arithmetic operations over $\mathbb{C}$; naturally, finite-precision approximations to $\mathbb{C}$ are used and rounding errors need to be accounted for. Overall, using variable-precision fixed-point numbers, this results in an $O(n (\log n)^{2+\varepsilon})$-time algorithm. However, to make this algorithm more efficient and practical we need to make use of hardware-based floating-point numbers. How do we deal with rounding errors? And how do we determine the limits of the fixed-precision hardware? Our solution is to use interval arithmetic to guarantee the correctness of results and to determine the hardware's limits. We examine the feasibility of this approach and are able to report that 75,000-digit base-256 integers can be handled using double-precision containment sets. This demonstrates that our approach has practical potential, although at this stage our implementation does not yet compete with commercial ones.
X. Zheng and N. Zhong (Eds.): Computability and Complexity in Analysis (CCA 2010). EPTCS 24, 2010, doi:10.4204/EPTCS.24.19. © T. Steinke & R. Sainudiin. This work is licensed under the Creative Commons Attribution License.
Thomas Steinke (tas74@student.canterbury.ac.nz) and Raazesh Sainudiin (r.sainudiin@math.canterbury.ac.nz)
1 Introduction
Multiplication of very large integers is a crucial subroutine of many algorithms such as the RSA cryptosystem [8]. Consequently, much effort has gone into finding fast and reliable multiplication algorithms; [5] discusses several methods. The asymptotically fastest known algorithm [2] requires $n \log n\, 2^{O(\log^* n)}$ steps, where $\log^*$ is the iterated logarithm, defined as the number of times one has to repeatedly take the logarithm before the number is at most $1$. However, being asymptotically faster does not translate to being faster in practice. We shall concern ourselves with the practicalities of the subject; we will analyse our algorithm's performance on a finite range of numbers.
The algorithm we are studying here is based on the first of two asymptotically fast multiplication algorithms by A. Schönhage and V. Strassen [9]. These algorithms are based on the convolution theorem and the fast Fourier transform. The first algorithm (the one we are studying) performs the discrete Fourier transform over $\mathbb{C}$ using finite-precision approximations. The second algorithm uses the same ideas as the first, but it works over the finite ring $\mathbb{Z}_{2^n+1}$ rather than the uncountable field $\mathbb{C}$. We wish to point out that “the Schönhage–Strassen algorithm” usually refers to the second algorithm. However, in this document we use it to refer to the first, $\mathbb{C}$-based, algorithm.
From the theoretical viewpoint, the second algorithm is much nicer than the first. The second algorithm does not require the use of finite-precision approximations to $\mathbb{C}$. Also, the second algorithm requires $O(n \log n \log\log n)$ steps to multiply two $n$-bit numbers, making it asymptotically faster than the first algorithm. However, the second algorithm is much more complicated than the first, and it is outperformed by asymptotically slower algorithms, such as long multiplication, for small-to-medium input sizes. In practice, both of the Schönhage–Strassen algorithms are rarely used.
The first Schönhage–Strassen algorithm is more elegant, if the finite-precision approximations are ignored. More importantly, it is faster in practice. Previous studies [3] have shown that the first algorithm can be faster than even highly optimised implementations of the second. However, the first algorithm's reliance on finite-precision approximations, despite exact answers being required, leads to it being discounted.
The saving grace of the Schönhage–Strassen algorithm is that at the end of the computation an integral result will be obtained. So the finite-precision approximations are rounded to integers. Thus, as long as rounding errors are sufficiently small for the rounding to be correct, an exact answer will be obtained. Schönhage and Strassen showed that fixed-point numbers with a variable precision of $O(\log n)$ bits would be sufficient to achieve this.
For the Schönhage–Strassen algorithm to be practical, we need to make use of hardware-based floating-point numbers; software-based variable-precision numbers are simply too slow. However, we need to be able to manage the rounding errors. At the very least, we must be able to detect when the error is too large and more precision is needed. The usual approach to this is to prove some kind of worst-case error bound (for an example, see [7]). Then we can be sure that, for sufficiently small inputs, the algorithm will give correct results. However, worst-case bounds are rarely tight. We propose the use of dynamic error bounds using existing techniques from computer-aided proofs.
Dynamic error detection allows us to move beyond worst-case bounds. For example, using standard single-precision floating-point numbers, our naïve implementation of the Schönhage–Strassen algorithm sometimes gave an incorrect result when we tried multiplying two 120-digit base-256 numbers, but it usually gave correct results. Note that by a ‘naïve implementation’ we simply mean a modification of the Schönhage–Strassen algorithm that uses fixed-precision floating-point arithmetic and does not guarantee correctness. A worst-case bound would not allow us to use the algorithm in this case, despite it usually being correct. Dynamic error detection, however, would allow us to try the algorithm, and, in the rare instances where errors occur, it would inform us that we need to use more precision.
We will use complex interval containment sets for all complex arithmetic operations. This means that at the end of the computation, where we would ordinarily round to the nearest integer, we simply choose the unique integer in the containment set. If the containment set contains multiple integers, then we report an error. This rigorous extension of the Schönhage–Strassen algorithm therefore constitutes a computer-aided proof of the desired product. When an error is detected, we must increase the precision being used or we must use a different algorithm.
2 The Algorithm
For the sake of completeness we explain the Schönhage–Strassen algorithm, as it is presented in [9]. We also explain how we have modified the algorithm using interval arithmetic in subsection 2.5. Those already familiar with the material may skip all or part of this section.
We make the convention that a positive integer $x$ is represented in base $b$ (usually $b = 2^k$ for some $k \in \mathbb{N}$) as a vector $(x_0, x_1, \ldots, x_{n-1})$ with $0 \le x_i \le b-1$ for every $i$; the value of $x$ is
$$x = \sum_{i=0}^{n-1} x_i\, b^i.$$
2.1 Basic Multiplication Algorithm
The above definition immediately leads to a formula for multiplication. Let $x$ and $y$ be positive integers with representations $(x_0, \ldots, x_{n-1})$ and $(y_0, \ldots, y_{m-1})$. Then
$$xy = \sum_{i=0}^{n-1} \sum_{j=0}^{m-1} x_i\, y_j\, b^{i+j} = \sum_{k=0}^{n+m-2} \Big( \sum_{i+j=k} x_i\, y_j \Big) b^k.$$
Of course, we cannot simply set $z_k = \sum_{i+j=k} x_i\, y_j$; this may violate the constraint that $0 \le z_k \le b-1$ for every $k$. We must ‘carry’ the ‘overflow’. This leads to the long multiplication algorithm (see [5]).
The Long Multiplication Algorithm
1. Input: $(x_0, \ldots, x_{n-1})$ and $(y_0, \ldots, y_{m-1})$
2. Output: $(z_0, \ldots, z_{n+m-1})$ # $z = xy$
3. Set $c = 0$. # $c$ = carry
4. For $k = 0$ up to $n+m-1$ do {
5. Set $s = c$. # $s$ = sum
6. For $i = \max\{0, k-m+1\}$ up to $\min\{k, n-1\}$ do {
7. Set $s = s + x_i\, y_{k-i}$.
8. }.
9. Set $z_k = s \bmod b$.
10. Set $c = \lfloor s / b \rfloor$.
11. }.
12. # $c = 0$ at the end.
This algorithm requires $O(nm)$ steps (for a fixed base $b$). Close inspection of the long multiplication algorithm might suggest that more steps are required, as the sum $s$ can become very large. However, adding a bounded number ($x_i\, y_{k-i} \le (b-1)^2$) to an unbounded number ($s$) is, on average, a constant-time operation.
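As a concrete illustration, the long multiplication algorithm above can be sketched in C++ for little-endian base-256 digit vectors. The function name and layout are ours, not the paper's implementation:

```cpp
#include <cstdint>
#include <vector>

// Long multiplication of little-endian base-256 digit vectors,
// following the pseudocode above. A sketch with our own naming,
// not the paper's implementation.
std::vector<std::uint8_t> long_multiply(const std::vector<std::uint8_t>& x,
                                        const std::vector<std::uint8_t>& y) {
    std::vector<std::uint8_t> z(x.size() + y.size(), 0);
    std::uint64_t c = 0;                            // carry
    for (std::size_t k = 0; k < z.size(); ++k) {
        std::uint64_t s = c;                        // running sum for digit k
        for (std::size_t i = 0; i < x.size(); ++i) {
            std::size_t j = k - i;                  // may wrap; guarded below
            if (i <= k && j < y.size())
                s += std::uint64_t(x[i]) * y[j];
        }
        z[k] = std::uint8_t(s % 256);               // digit in [0, 255]
        c = s / 256;                                // carry to position k+1
    }
    return z;                                       // c == 0 here
}
```

For example, squaring the two-digit vector $(232, 3)$, i.e. the integer $1000$, yields the four-digit vector representing $1{,}000{,}000$.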
2.2 The Discrete Fourier Transform
The basis of the Schönhage–Strassen algorithm is the discrete Fourier transform and the convolution theorem. The discrete Fourier transform is a map from $\mathbb{C}^N$ to $\mathbb{C}^N$. In this section we will define the discrete Fourier transform and we will show how it and its inverse can be calculated with $O(N \log N)$ complex additions and multiplications. See [5] for further details.
Definition 1 (Discrete Fourier Transform).
Let $N \in \mathbb{N}$ and let $\omega = e^{-2\pi i/N}$. Then define the discrete Fourier transform $\hat{a} \in \mathbb{C}^N$ of $a \in \mathbb{C}^N$ by
$$\hat{a}_k = \sum_{j=0}^{N-1} a_j\, \omega^{jk} \qquad (0 \le k \le N-1).$$
There is nothing special about our choice of $\omega = e^{-2\pi i/N}$; the consequences of the following lemma are all that we need $\omega$ to satisfy. Any other element of $\mathbb{C}$ with the same properties would suffice.
Lemma 2.
Let $N \in \mathbb{N}$ and $\omega = e^{-2\pi i/N}$. Then
$$\omega^N = 1$$
and, for all $k$ with $0 < k < N$,
$$\sum_{j=0}^{N-1} \omega^{jk} = 0.$$
Note that the case where $N = 1$ is uninteresting, as $\omega = 1$ and the discrete Fourier transform is the identity mapping on $\mathbb{C}^1$.
Now we can prove that the discrete Fourier transform is a bijection.
Proposition 3 (Inverse Discrete Fourier Transform).
Let $N \in \mathbb{N}$ and let $\hat{a} \in \mathbb{C}^N$. Define $\check{a} \in \mathbb{C}^N$ by
$$\check{a}_j = \frac{1}{N} \sum_{k=0}^{N-1} \hat{a}_k\, \omega^{-jk} \qquad (0 \le j \le N-1).$$
Then this defines the inverse of the discrete Fourier transform; that is, if $\hat{a}$ is the discrete Fourier transform of $a \in \mathbb{C}^N$, then $\check{\hat{a}} = a$.
Now we explain the fast Fourier transform; this is simply a fast algorithm for computing the discrete Fourier transform and its inverse.
Let $N$ be a power of $2$ and let $a \in \mathbb{C}^N$ be given. Now define $a^{\mathrm{even}}, a^{\mathrm{odd}} \in \mathbb{C}^{N/2}$ by
$$a^{\mathrm{even}}_j = a_{2j} \quad \text{and} \quad a^{\mathrm{odd}}_j = a_{2j+1}$$
for all $j$ with $0 \le j < N/2$.
Now the critical observation of the Cooley–Tukey fast Fourier transform algorithm is the following. Fix $k$ with $0 \le k < N$ and let $k' = k \bmod (N/2)$. Then we have
$$\hat{a}_k = \widehat{a^{\mathrm{even}}}_{k'} + \omega^k\, \widehat{a^{\mathrm{odd}}}_{k'}.$$
Note that $\widehat{a^{\mathrm{even}}}$ and $\widehat{a^{\mathrm{odd}}}$ are vectors of length $N/2$, so taking $k$ modulo $N/2$ is justified. This observation leads to the following divide-and-conquer algorithm.
The Cooley–Tukey Fast Fourier Transform
1. Input: $N$ a power of $2$ and $a \in \mathbb{C}^N$
2. Output: $\hat{a} \in \mathbb{C}^N$
3. function FFT($a$, $N$) {
4. If $N = 1$, then set $\hat{a} = a$ and return.
5. Partition $a$ into $a^{\mathrm{even}}$ and $a^{\mathrm{odd}}$.
6. Compute $\widehat{a^{\mathrm{even}}}$ = FFT($a^{\mathrm{even}}$, $N/2$) by recursion.
7. Compute $\widehat{a^{\mathrm{odd}}}$ = FFT($a^{\mathrm{odd}}$, $N/2$) by recursion.
8. Compute $\omega = e^{-2\pi i/N}$.
9. For $k = 0$ up to $N-1$ do {
10. Set $\hat{a}_k = \widehat{a^{\mathrm{even}}}_{k \bmod (N/2)} + \omega^k\, \widehat{a^{\mathrm{odd}}}_{k \bmod (N/2)}$.
11. }.
12. }.
It is easy to show that this algorithm requires $O(N \log N)$ complex additions and multiplications. With very little modification we are also able to obtain a fast algorithm for computing the inverse discrete Fourier transform.
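The recursion above can be sketched in C++ using std::complex. This is a simplified sketch rather than the paper's implementation, and it assumes the convention $\omega = e^{-2\pi i/N}$ for the forward transform:

```cpp
#include <cmath>
#include <complex>
#include <vector>

// A minimal recursive Cooley-Tukey FFT sketch (not the paper's
// implementation). sign = -1 gives the forward transform with
// omega = e^{-2*pi*i/N}; sign = +1 gives the unscaled inverse
// (divide by N afterwards to invert).
std::vector<std::complex<double>>
fft(const std::vector<std::complex<double>>& a, int sign = -1) {
    const std::size_t n = a.size();           // n must be a power of 2
    if (n == 1) return a;
    std::vector<std::complex<double>> even(n / 2), odd(n / 2);
    for (std::size_t j = 0; j < n / 2; ++j) { // partition into halves
        even[j] = a[2 * j];
        odd[j]  = a[2 * j + 1];
    }
    auto e = fft(even, sign);                 // recurse on each half
    auto o = fft(odd, sign);
    std::vector<std::complex<double>> out(n);
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        // w = omega^k; out[k] and out[k + n/2] share the same k mod n/2
        std::complex<double> w = std::polar(1.0, sign * 2.0 * pi * k / n);
        out[k]         = e[k] + w * o[k];
        out[k + n / 2] = e[k] - w * o[k];
    }
    return out;
}
```

Copying the input into even- and odd-indexed halves keeps the sketch close to the pseudocode; production implementations usually work in place with a bit-reversal permutation instead.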
Note that, to compute the powers of $\omega$, we can use the recurrence
$$\omega^0 = 1, \qquad \omega^k = \omega\, \omega^{k-1} \quad (k \ge 1),$$
where $\omega = e^{-2\pi i/N}$. Other efficient methods of computing $\omega^k$ are also available.
2.3 The Convolution Theorem
We start by defining the convolution. Let $u, v \in \mathbb{C}^N$. We can interpret $u$ and $v$ as the coefficients of two polynomials; that is,
$$p_u(t) = \sum_{j=0}^{N-1} u_j\, t^j \quad \text{and} \quad p_v(t) = \sum_{j=0}^{N-1} v_j\, t^j.$$
The convolution of $u$ and $v$, denoted by $u * v$, is, for our purposes, the vector of coefficients obtained by multiplying the polynomials $p_u$ and $p_v$. Thus we have $(u * v)_k = \sum_{i+j=k} u_i\, v_j$ for all $k$. Note that we can add ‘padding zeroes’ to the end of the coefficient vectors without changing the corresponding polynomial.
The convolution theorem relates convolutions to Fourier transforms. We only use a restricted form.
Theorem 4 (Convolution Theorem).
Let $u \in \mathbb{C}^n$ and $v \in \mathbb{C}^m$, where $N \ge n + m - 1$. Pad $u$ and $v$ by setting
$$u_j = 0 \ \ (n \le j < N) \quad \text{and} \quad v_j = 0 \ \ (m \le j < N).$$
Then, for every $k$ with $0 \le k < N$,
$$\widehat{(u * v)}_k = \hat{u}_k\, \hat{v}_k.$$
The convolution theorem gives us a fast method of computing convolutions and, thus, of multiplying polynomials. Given $u \in \mathbb{C}^n$ and $v \in \mathbb{C}^m$, we can calculate $u * v$ using only $O((n+m)\log(n+m))$ arithmetic operations as follows.
The Fast Convolution Algorithm
1. Input: $u \in \mathbb{C}^n$ and $v \in \mathbb{C}^m$
2. Output: $w = u * v$
3. Set $N = 2^{\lceil \log_2 (n+m) \rceil}$.
4. Set $u_j = 0$ for $n \le j < N$ and $v_j = 0$ for $m \le j < N$. # Pad $u$ and $v$ so they are in $\mathbb{C}^N$.
5. Set $\omega = e^{-2\pi i/N}$.
6. Compute $\hat{u}$ = FFT($u$, $N$) and $\hat{v}$ = FFT($v$, $N$).
7. For $0 \le k < N$, set $\hat{w}_k = \hat{u}_k\, \hat{v}_k$.
8. Compute $w$ = FFT$^{-1}$($\hat{w}$, $N$).
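Reusing the recursive FFT sketch from subsection 2.2, the fast convolution algorithm can be illustrated in C++ as follows. The names are ours, this is not the paper's implementation, and we round the real parts at the end because the exact convolution of integer vectors is integral:

```cpp
#include <cmath>
#include <complex>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Recursive FFT as in subsection 2.2 (sign = -1 forward, sign = +1
// inverse up to division by N). A sketch, not the paper's code.
cvec fft(const cvec& a, int sign = -1) {
    const std::size_t n = a.size();
    if (n == 1) return a;
    cvec even(n / 2), odd(n / 2);
    for (std::size_t j = 0; j < n / 2; ++j) {
        even[j] = a[2 * j];
        odd[j]  = a[2 * j + 1];
    }
    cvec e = fft(even, sign), o = fft(odd, sign), out(n);
    const double pi = std::acos(-1.0);
    for (std::size_t k = 0; k < n / 2; ++k) {
        std::complex<double> w = std::polar(1.0, sign * 2.0 * pi * k / n);
        out[k]         = e[k] + w * o[k];
        out[k + n / 2] = e[k] - w * o[k];
    }
    return out;
}

// Fast convolution of integer coefficient vectors: pad to a power of
// two at least n + m, transform, multiply pointwise, transform back.
// The exact convolution is integral, so the real parts are rounded.
std::vector<long long> convolve(const std::vector<long long>& x,
                                const std::vector<long long>& y) {
    std::size_t N = 1;
    while (N < x.size() + y.size()) N *= 2;            // pad to a power of two
    cvec X(N), Y(N);
    for (std::size_t i = 0; i < x.size(); ++i) X[i] = double(x[i]);
    for (std::size_t i = 0; i < y.size(); ++i) Y[i] = double(y[i]);
    X = fft(X);                                        // transform of x
    Y = fft(Y);                                        // transform of y
    for (std::size_t k = 0; k < N; ++k) X[k] *= Y[k];  // pointwise product
    X = fft(X, +1);                                    // unscaled inverse
    std::vector<long long> z(N);
    for (std::size_t k = 0; k < N; ++k)
        z[k] = std::llround(X[k].real() / double(N));
    return z;
}
```

For example, convolving $(1, 2)$ with $(3, 4)$, i.e. multiplying $1 + 2t$ by $3 + 4t$, gives the coefficients $(3, 10, 8, 0)$ after padding to length $4$.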
2.4 The SchönhageStrassen Algorithm
The Schönhage–Strassen algorithm multiplies two integers by convolving them and then performing carries. Let two base-$b$ integer representations be $x = (x_0, \ldots, x_{n-1})$ and $y = (y_0, \ldots, y_{m-1})$. We consider the digits as the coefficients of two polynomials. Then $x = p_x(b)$, $y = p_y(b)$, and
$$xy = p_x(b)\, p_y(b) = p_{x*y}(b) = \sum_{k=0}^{n+m-2} (x * y)_k\, b^k.$$
So, to compute $xy$, we can first compute $x * y$ in $O((n+m)\log(n+m))$ steps and then we can evaluate $p_{x*y}(b)$. The evaluation of $p_{x*y}(b)$ to yield an integer representation is simply the process of performing carries.
The Schönhage–Strassen Algorithm
1. Input: $(x_0, \ldots, x_{n-1})$ and $(y_0, \ldots, y_{m-1})$
2. Output: $(z_0, \ldots, z_{n+m-1})$ # $z = xy$
3. Compute $u = x * y$ using the fast convolution algorithm.
4. Set $c = 0$. # $c$ = carry
5. For $k = 0$ up to $n+m-2$ do {
6. Set $z_k = (u_k + c) \bmod b$.
7. Set $c = \lfloor (u_k + c)/b \rfloor$.
8. }.
9. Set $z_{n+m-1} = c$.
Clearly the Schönhage–Strassen algorithm performs the multiplication using $O((n+m)\log(n+m))$ complex arithmetic operations.
When finite-precision complex arithmetic is used, rounding errors are introduced. However, this can be countered: we know that $x * y$ must be a vector of integers. As long as the rounding errors introduced are sufficiently small, we can round each component to the nearest integer and obtain the correct result. Schönhage and Strassen [9] proved that $O(\log n)$-bit floating-point numbers give sufficient precision.
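For illustration, the carry loop that turns the integer-valued convolution into base-256 digits might look as follows in C++. This is a sketch with our own naming, not the paper's code:

```cpp
#include <cstdint>
#include <vector>

// Carry propagation (the loop at the end of the algorithm above):
// turn the integer-valued convolution u of two base-256 digit
// vectors into proper base-256 digits. A sketch with our own naming.
std::vector<std::uint8_t> carry(const std::vector<long long>& u) {
    std::vector<std::uint8_t> z;
    long long c = 0;                            // running carry
    for (long long v : u) {
        long long s = v + c;
        z.push_back(std::uint8_t(s % 256));     // digit in [0, 255]
        c = s / 256;                            // overflow carried upward
    }
    while (c > 0) {                             // flush the remaining carry
        z.push_back(std::uint8_t(c % 256));
        c /= 256;
    }
    return z;
}
```

For example, the convolution of $(232, 3)$ with itself (the integer $1000$ squared) is $(53824, 1392, 9)$, and carrying yields the digits $(64, 66, 15)$, i.e. $1{,}000{,}000$.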
2.5 Interval Arithmetic
Our rigorous extension of the algorithm uses containment sets. By replacing all complex numbers with complex containment sets, we can modify the Schönhage–Strassen algorithm to find a containment set of $x * y$; if the containment set contains only one integer-valued vector, then we can be certain that this is the correct value. We have used rectangular containment sets of machine-representable floating-point intervals with directed rounding to guarantee the desired integer product. A brief overview of the needed interval analysis [6] is given next.
Let $\underline{x}$ and $\overline{x}$ be real numbers with $\underline{x} \le \overline{x}$. Let $\boldsymbol{x} = [\underline{x}, \overline{x}]$ be a closed and bounded real interval and let the set of all such intervals be $\mathbb{IR}$. Note that $\mathbb{R} \subset \mathbb{IR}$ since we allow thin or punctual intervals with $\underline{x} = \overline{x}$. If $\star$ is one of the arithmetic operators $+$, $-$, $\times$, $\div$, we define arithmetic over operands in $\mathbb{IR}$ by $\boldsymbol{x} \star \boldsymbol{y} = \{x \star y : x \in \boldsymbol{x},\ y \in \boldsymbol{y}\}$, with the exception that $\boldsymbol{x} \div \boldsymbol{y}$ is undefined if $0 \in \boldsymbol{y}$. Due to continuity and monotonicity of the operations and compactness of the operands, arithmetic over $\mathbb{IR}$ is given by real arithmetic operations with the bounds:
$$\boldsymbol{x} + \boldsymbol{y} = [\underline{x} + \underline{y},\ \overline{x} + \overline{y}], \qquad \boldsymbol{x} - \boldsymbol{y} = [\underline{x} - \overline{y},\ \overline{x} - \underline{y}],$$
$$\boldsymbol{x} \times \boldsymbol{y} = [\min\{\underline{x}\underline{y}, \underline{x}\overline{y}, \overline{x}\underline{y}, \overline{x}\overline{y}\},\ \max\{\underline{x}\underline{y}, \underline{x}\overline{y}, \overline{x}\underline{y}, \overline{x}\overline{y}\}], \qquad \boldsymbol{x} \div \boldsymbol{y} = \boldsymbol{x} \times [1/\overline{y},\ 1/\underline{y}] \ \ (0 \notin \boldsymbol{y}).$$
In addition to the above elementary operations over elements in $\mathbb{IR}$, our algorithm requires us to contain the range of the square root function over elements in $\mathbb{IR}$. Once again, due to the monotonicity of the square root function over the nonnegative reals, it suffices to work with the real image of the bounds: $\sqrt{\boldsymbol{x}} = [\sqrt{\underline{x}}, \sqrt{\overline{x}}]$ for $0 \le \underline{x}$. To complete the requirements for our rigorous extension of the Schönhage–Strassen algorithm we need to extend addition, multiplication and division by a nonzero integer to elements in $\mathbb{IC}$.
Interval arithmetic over $\mathbb{IR}$ naturally extends to $\mathbb{IC} = \{\boldsymbol{x} + i\boldsymbol{y} : \boldsymbol{x}, \boldsymbol{y} \in \mathbb{IR}\}$, the set of rectangular complex intervals. Addition and subtraction over $\mathbb{IC}$, given by
$$(\boldsymbol{x}_1 + i\boldsymbol{y}_1) \pm (\boldsymbol{x}_2 + i\boldsymbol{y}_2) = (\boldsymbol{x}_1 \pm \boldsymbol{x}_2) + i(\boldsymbol{y}_1 \pm \boldsymbol{y}_2),$$
are sharp, but multiplication and division are not, due to rectangular wrapping effects. Complex interval multiplication, and division of a complex interval by a nonnegative integer, can be contained with real interval multiplications, given by
$$(\boldsymbol{x}_1 + i\boldsymbol{y}_1) \times (\boldsymbol{x}_2 + i\boldsymbol{y}_2) \subseteq (\boldsymbol{x}_1 \boldsymbol{x}_2 - \boldsymbol{y}_1 \boldsymbol{y}_2) + i(\boldsymbol{x}_1 \boldsymbol{y}_2 + \boldsymbol{y}_1 \boldsymbol{x}_2).$$
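To illustrate how directed rounding yields containment, here is a minimal C++ sketch of real interval multiplication using the standard floating-point environment. C-XSC performs this internally; the sketch assumes the compiler honours runtime rounding-mode changes (in principle `#pragma STDC FENV_ACCESS ON` is required):

```cpp
#include <algorithm>
#include <cfenv>

// Real interval multiplication with outward (directed) rounding: the
// lower bound is computed rounding toward -infinity and the upper
// bound rounding toward +infinity, so the true product set is
// contained. An illustrative sketch only; it assumes the compiler
// honours std::fesetround (strictly, FENV_ACCESS must be on).
struct Interval { double lo, hi; };

Interval mul(Interval a, Interval b) {
    Interval r;
    std::fesetround(FE_DOWNWARD);     // round toward -infinity
    r.lo = std::min({a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi});
    std::fesetround(FE_UPWARD);       // round toward +infinity
    r.hi = std::max({a.lo * b.lo, a.lo * b.hi, a.hi * b.lo, a.hi * b.hi});
    std::fesetround(FE_TONEAREST);    // restore the default mode
    return r;
}
```

The rectangular complex product above is then four such real interval multiplications combined with interval additions and subtractions.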
See [4] for details about how C-XSC manipulates rectangular containment sets over $\mathbb{IR}$ and $\mathbb{IC}$.
3 Results
We have implemented the Schönhage–Strassen algorithm, our containment-set version with rectangular complex intervals, and long multiplication in C++ using the C-XSC library [4]. Our implementation is available at http://www.math.canterbury.ac.nz/~r.sainudiin/codes/capa/multiply/ (please also download the C-XSC library from http://www.math.uni-wuppertal.de/~xsc/). Results show that, using base 256, our version of the algorithm is usually able to guarantee correct answers for up to 75,000-digit numbers.
The following graph compares the speed of long multiplication (labelled ‘Long multiplication’), the conventional Schönhage–Strassen algorithm with different underlying data types (the implementation using the C-XSC complex data type is the line labelled ‘complex naïve SS’ and the one using our own implementation of complex numbers based on the C++ double data type is labelled ‘double naïve SS’) and our containment-set version (‘cinterval extended SS’) on uniformly random $n$-digit base-256 inputs. All tests were performed on a 2.2 GHz 64-bit AMD Athlon 3500+ processor running Ubuntu 9.04, using C-XSC version 2.2.4 and gcc version 4.3.1. Times were recorded using the C++ clock() function, that is to say, CPU time was recorded. Note that only the ‘Long multiplication’ and ‘cinterval extended SS’ implementations are guaranteed to produce correct results. The ‘double naïve SS’ and ‘complex naïve SS’ implementations may have produced erroneous results, as the implementations do not necessarily provide sufficient precision; these are still shown for comparison. Note also that by ‘naïve’ we mean that these implementations use fixed-precision floating-point arithmetic, whereas the ‘real’ Schönhage–Strassen algorithm uses variable precision, which is much slower.
The above graph shows that the Schönhage–Strassen algorithm is much more efficient than long multiplication for large inputs. However, our modified version of the algorithm is slower than the naïve Schönhage–Strassen algorithm. We believe that C-XSC is not well optimised; for example, its punctual complex data type (used in the ‘complex naïve SS’ implementation) is much slower than our double-based complex data type (used in the ‘double naïve SS’ implementation), even though ostensibly they are the same thing. We see that the C-XSC cinterval type (used in the ‘cinterval extended SS’ implementation) is roughly three times slower than the complex type. This leaves open the possibility that a more optimised implementation of containment sets would be able to compete with commercial algorithms. Investigations such as [3] have shown that the naïve Schönhage–Strassen algorithm is able to compete with commercial implementations.
Note that the “steps” seen in the graph can be explained by the fact that the algorithm always rounds the size up to the nearest power of two; thus there are steps at the powers of two. The most important feature of our results is the range of input sizes for which our algorithm successfully determines the answer. Using only standard double-precision IEEE floating-point numbers, we are able to use the algorithm to multiply 75,000-digit, or 600,000-bit, integers; this range is more than sufficient for most applications, and beyond this point the second Schönhage–Strassen algorithm becomes competitive.
4 Conclusion
Our investigation has demonstrated that the Schönhage–Strassen algorithm with containment sets is a practical algorithm that could be used reliably for applications such as cryptography. Schönhage and Strassen never showed that their algorithm had any practical value. However, as our implementation was not optimised, this is more of a feasibility study than a finished product.
Note that the advantage of our algorithm over the original Schönhage–Strassen algorithm is that we make use of hardware-based floating-point arithmetic, whereas the original is designed to make use of much slower software-based arithmetic. Both the original algorithm and our adaptation always produce correct results; however, we use a different approach to guaranteeing correctness. The naïve algorithms we mention are not guaranteed to be correct because they are modifications of the Schönhage–Strassen algorithm that do not take measures to ensure correctness; they simply use fixed-precision floating-point arithmetic and hope for the best. These are only useful for speed comparisons.
It remains to optimise our implementation of the algorithm to compete with commercial libraries. This is dependent on a faster implementation of interval arithmetic. It may also be interesting to use circular containment sets rather than rectangular containment sets. The advantage of circular containment sets is that they easily deal with complex rotations, that is, multiplication by $\omega^k$; this is in fact the only type of complex multiplication (other than division by an integer) that our algorithm performs.
References
 [2] Martin Fürer (2007): Faster Integer Multiplication. In: 39th ACM STOC, San Diego, California, USA, pp. 57–66.
 [3] Pierrick Gaudry, Alexander Kruppa & Paul Zimmermann (2007): A GMP-based implementation of Schönhage–Strassen's large integer multiplication algorithm. In: ISSAC '07: Proceedings of the 2007 International Symposium on Symbolic and Algebraic Computation, ACM, New York, NY, USA, pp. 167–174.
 [4] W. Hofschuster & W. Krämer (2004): C-XSC 2.0: A C++ library for extended scientific computing. In: R. Alt, A. Frommer, R.B. Kearfott & W. Luther, editors: Numerical Software with Result Verification, Lecture Notes in Computer Science 2991, Springer-Verlag, pp. 15–35.
 [5] Donald E. Knuth (1998): The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Addison-Wesley, 3rd edition.
 [6] Ramon E. Moore, R. Baker Kearfott & Michael J. Cloud (2009): Introduction to Interval Analysis. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
 [7] Colin Percival (2003): Rapid multiplication modulo the sum and difference of highly composite numbers. Math. Comput. 72(241), pp. 387–395.
 [8] R.L. Rivest, A. Shamir & L. Adleman (1978): A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Communications of the ACM 21(2), pp. 120–126.
 [9] A. Schönhage & V. Strassen (1971): Schnelle Multiplikation großer Zahlen (Fast Multiplication of Large Numbers). Computing: Archiv für elektronisches Rechnen (Archives for electronic computing) 7, pp. 281–292. (German).