A New Algorithm for Double Scalar Multiplication over Koblitz Curves
Koblitz curves are a special class of elliptic curves on which scalar multiplication, the core operation of elliptic curve cryptography, can be accelerated using the Frobenius endomorphism. The double-base number system approach to Frobenius expansions has improved the performance of single scalar multiplication. In this paper, we present a new algorithm that generates a sparse, joint $\tau$-adic representation for a pair of scalars, and we apply it to double scalar multiplication. The new algorithm is inspired by the double-base number system. We achieve a 12% improvement in speed over the state-of-the-art $\tau$-adic joint sparse form.
Keywords
Elliptic curve cryptography, Koblitz curves, field programmable gate array, $\tau$-adic joint sparse form.
1 Introduction
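Let the Koblitz curve $E_a$ be
$$E_a : y^2 + xy = x^3 + ax^2 + 1 \qquad (1)$$
where $a \in \{0, 1\}$, $E_a(\mathbb{F}_{2^m})$ is the group of points on $E_a$ for some extension field $\mathbb{F}_{2^m}$ and $\#E_a(\mathbb{F}_{2^m})$ is the group order of $E_a(\mathbb{F}_{2^m})$. Any point $P = (x, y) \in E_a(\mathbb{F}_{2^m})$ has the following properties: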
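$$\tau(x, y) = (x^2, y^2), \qquad (\tau^2 + 2)P = \mu\tau(P) \qquad (2)$$
where $\tau$ is called the Frobenius map over $\mathbb{F}_2$ and $\mu = (-1)^{1-a}$. Further, there exists a point at infinity denoted by $\mathcal{O}$ [1]. The point at infinity satisfies the properties
$$P + \mathcal{O} = \mathcal{O} + P = P, \qquad \tau(\mathcal{O}) = \mathcal{O} \qquad (3)$$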
The Frobenius map of a point can be computed by squaring its coordinates. Squaring is very cheap and fast in hardware with both polynomial and normal basis representations [2]. In the digital signature verification of an elliptic curve cryptosystem, the double scalar multiplication $k_1P + k_2Q$ consumes most of the computational power, where $k_1, k_2 \in \mathbb{Z}$ and $P, Q \in E_a(\mathbb{F}_{2^m})$. The scalars are represented in $\tau$-adic expansions so that applications of the Frobenius map replace point doublings.
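To illustrate why the Frobenius map is cheap, the following minimal Python sketch applies it on a toy Koblitz curve over $\mathbb{F}_{2^5}$ and lets one check the standard identity $(\tau^2 + 2)P = \mu\tau(P)$ with $\mu = (-1)^{1-a}$. The field size, the reduction polynomial $x^5 + x^2 + 1$, the polynomial basis and all function names are assumptions made for illustration; they are not taken from the hardware implementation described later.

```python
M = 5              # toy extension degree (assumption; far too small for cryptography)
POLY = 0b100101    # x^5 + x^2 + 1, irreducible over F_2


def gf_mul(a, b):
    """Multiply two elements of F_{2^M} (polynomial basis, bit-packed ints)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:          # degree reached M: reduce modulo POLY
            a ^= POLY
    return r


def gf_inv(a):
    """Invert a nonzero element as a^(2^M - 2)."""
    r = 1
    for _ in range(2**M - 2):
        r = gf_mul(r, a)
    return r


def on_curve(P, a):
    """Check y^2 + xy = x^3 + a*x^2 + 1 for an affine point P."""
    x, y = P
    x2 = gf_mul(x, x)
    return gf_mul(y, y) ^ gf_mul(x, y) == gf_mul(x2, x) ^ gf_mul(a, x2) ^ 1


def ec_neg(P):
    """Negation is a single field addition: -(x, y) = (x, x + y)."""
    return None if P is None else (P[0], P[0] ^ P[1])


def ec_add(P, Q, a):
    """Affine addition; None stands for the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        if y1 ^ y2 == x1:                       # Q = -P (also covers x = 0)
            return None
        lam = x1 ^ gf_mul(y1, gf_inv(x1))       # doubling
        x3 = gf_mul(lam, lam) ^ lam ^ a
        return (x3, gf_mul(x1, x1) ^ gf_mul(lam, x3) ^ x3)
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ a
    return (x3, gf_mul(lam, x1 ^ x3) ^ x3 ^ y1)


def frobenius(P):
    """Frobenius map: square both coordinates."""
    return None if P is None else (gf_mul(P[0], P[0]), gf_mul(P[1], P[1]))
```

In this sketch the Frobenius map costs two field squarings, whereas a doubling needs an inversion and several multiplications, which is the motivation for replacing doublings with Frobenius maps.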
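Let $\mathbb{Z}[\tau]$ be the ring of polynomials of the form $\sum_{i=0}^{l-1} u_i \tau^i$, where $l$ is the length of the polynomial, $u_i \in \mathbb{Z}$ for all $i$ and $u_{l-1} \neq 0$. First, both scalars are reduced in $\mathbb{Z}[\tau]$ to complex numbers $\rho = r_0 + r_1\tau$ such that the norm of $\rho$ is minimal. The reduction in $\mathbb{Z}[\tau]$ is defined as $\rho \equiv k \pmod{\delta}$, where $k$ is an integer scalar and $\delta = (\tau^m - 1)/(\tau - 1)$. Next, the $\tau$-adic nonadjacent form (NAF), with average Hamming weight $m/3$, is computed [3]. The two NAFs are then used as inputs to generate the $\tau$-adic joint sparse form (JSF) of both scalars, with average joint Hamming weight $m/2$ [4].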
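The digit-by-digit conversion of an element $r_0 + r_1\tau$ into a $\tau$-adic NAF can be sketched as follows, using the well-known digit rule of Solinas [3]. This is a minimal software sketch under the assumption that the reduction step has already produced the pair $(r_0, r_1)$; the function names are illustrative and do not correspond to the converter circuit used later.

```python
def tau_naf(r0, r1, mu):
    """Return the tau-adic NAF digits of r0 + r1*tau, least significant first.

    mu is +1 or -1 and satisfies tau^2 = mu*tau - 2.
    """
    digits = []
    while r0 != 0 or r1 != 0:
        if r0 % 2:
            # Choose u in {1, -1} so that the next digit is forced to zero.
            u = 2 - ((r0 - 2 * r1) % 4)
            r0 -= u
        else:
            u = 0
        digits.append(u)
        # Exact division of (r0 + r1*tau) by tau.
        r0, r1 = r1 + mu * (r0 // 2), -(r0 // 2)
    return digits


def tau_eval(digits, mu):
    """Evaluate sum(digits[i] * tau^i) as a pair (a, b) meaning a + b*tau."""
    a, b = 0, 0
    for u in reversed(digits):
        # Horner step: (a + b*tau)*tau + u, using tau^2 = mu*tau - 2.
        a, b = -2 * b + u, a + mu * b
    return a, b
```

For example, `tau_eval(tau_naf(9, 0, 1), 1)` returns `(9, 0)`, and no two consecutive digits of the returned list are nonzero.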
Dimitrov et al. introduced the two dimensional Frobenius expansion (TDFE) $k = \sum_{i=1}^{d} s_i \tau^{a_i} (\tau - 1)^{b_i}$, where $d$ is the length of the expansion, $s_i \in \{-1, 1\}$, $a_i, b_i \geq 0$ and $k$ is an integer, to compute single scalar multiplication [5]. Note that a TDFE can be reduced to a polynomial in $\mathbb{Z}[\tau]$ [5].
Our approach to double scalar representation is based on the TDFE. Our algorithm delivers a joint and sparse two dimensional representation that can be reduced to $\mathbb{Z}[\tau]$. It is used with Straus' idea [6] to compute the double scalar multiplication with a minimum number of point additions. Our new algorithm, the joint two dimensional Frobenius expansion (JTDFE), achieves a speed improvement of about 12% over the JSF when implemented on a field programmable gate array (FPGA).
2 Two Dimensional Frobenius Expansion
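The Frobenius map $\tau$ can be regarded as the complex number $\tau = (\mu + \sqrt{-7})/2$, where $\mu = (-1)^{1-a}$. A complex number of the form $x + y\tau$, where $x, y \in \mathbb{Z}$, is called a Kleinian integer [7]. Next, we define the $\{\tau, \tau - 1\}$-Kleinian integer.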
Definition 1
($\{\tau, \tau - 1\}$-Kleinian integer) A Kleinian integer of the form $\pm\tau^{a}(\tau - 1)^{b}$, where $a, b \geq 0$, is called a $\{\tau, \tau - 1\}$-Kleinian integer.
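The two dimensional Frobenius expansion of an integer $k$ can be represented as in the following equation:
$$k = \sum_{i=1}^{d} s_i \tau^{a_i} (\tau - 1)^{b_i} \qquad (4)$$
where $d$ is the length of the expansion, $s_i \in \{-1, 1\}$ and $a_i, b_i \geq 0$. We rearrange (4) as follows:
$$k = \sum_{j=0}^{\max(b_i)} (\tau - 1)^{j} \sum_{i \,:\, b_i = j} s_i \tau^{a_i} \qquad (5)$$
where $\max(b_i)$ is the maximum power of $(\tau - 1)$ that is multiplied by a power of $\tau$ in (5).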
Algorithm 1 illustrates the routine to compute the single scalar multiplication when the $\{\tau, \tau - 1\}$-expansion of $k$ is given. To simplify, we denote the terms corresponding to $(\tau - 1)^j$ in (5) by $r_j(\tau)$, i.e. $r_j(\tau) = \sum_{i \,:\, b_i = j} s_i \tau^{a_i}$. The multiplication $(\tau - 1)P$ costs one Frobenius map and one point addition, since $(\tau - 1)P = \tau(P) - P$. Therefore, $\max(b_i)$ should be limited when the two dimensional Frobenius expansion is computed.
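The grouping in (5) can be sketched in Python as follows. To keep the sketch short and checkable, a "point" is modelled as an element of $\mathbb{Z}[\tau]$ and the Frobenius map as multiplication by $\tau$; on a real curve each `zt_add` would be a point addition. The term format `(s, a, b)` for $s\,\tau^{a}(\tau-1)^{b}$, the function names and the loop structure are assumptions for illustration and are not claimed to reproduce Algorithm 1 line by line.

```python
def zt_add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def zt_scale(s, x):
    return (s * x[0], s * x[1])


def zt_mul(x, y, mu):
    """Multiply a + b*tau by c + d*tau using tau^2 = mu*tau - 2."""
    (a, b), (c, d) = x, y
    return (a * c - 2 * b * d, a * d + b * c + mu * b * d)


def tau(x, mu):
    """Multiplication by tau; stands in for the Frobenius map."""
    return (-2 * x[1], x[0] + mu * x[1])


def tdfe_value(terms, mu):
    """Directly evaluate the sum of s * tau^a * (tau - 1)^b over all terms."""
    total = (0, 0)
    for s, a, b in terms:
        z = (s, 0)
        for _ in range(a):
            z = zt_mul(z, (0, 1), mu)
        for _ in range(b):
            z = zt_mul(z, (-1, 1), mu)
        total = zt_add(total, z)
    return total


def tdfe_scalar_mul(terms, P, mu):
    """Compute k*P for k given as TDFE terms, grouped by the power of (tau - 1)."""
    result = (0, 0)
    base = P                                   # holds (tau - 1)^j * P
    for j in range(max(b for _, _, b in terms) + 1):
        row = [(s, a) for s, a, b in terms if b == j]
        acc = (0, 0)
        top = max((a for _, a in row), default=-1)
        for e in range(top, -1, -1):           # Horner evaluation in tau
            acc = tau(acc, mu)
            for s, a in row:
                if a == e:
                    acc = zt_add(acc, zt_scale(s, base))
        result = zt_add(result, acc)
        base = zt_add(tau(base, mu), zt_scale(-1, base))   # (tau - 1) * base
    return result
```

Each increment of `j` costs one extra Frobenius map and one extra addition to update `base`, which is why the maximum exponent of $(\tau - 1)$ is kept small.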
An algorithm that returns a fairly short representation of $k$ as a sum of $\{\tau, \tau - 1\}$-Kleinian integers is essential. The greedy algorithm given in [5] is used to obtain such a representation. The greedy algorithm does not always return the canonical $\{\tau, \tau - 1\}$-expansion. Note that the complexity of the greedy algorithm depends crucially on the time spent finding the closest $\{\tau, \tau - 1\}$-Kleinian integer to the current Kleinian integer.
In an intermediate step of the greedy algorithm, the closest $\{\tau, \tau - 1\}$-Kleinian integer is found by precomputing all $\{\tau, \tau - 1\}$-Kleinian integers with exponents below certain bounds and searching them exhaustively. Using the divide-and-conquer principle, Dimitrov et al. devised an effective method to generate an efficient two dimensional Frobenius expansion for single scalar multiplication [5]. They further conjectured the following:
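The greedy step with a precomputed table and exhaustive search can be sketched as below. This is a simplified illustration, not the method of [5]: the exponent bounds `max_a` and `max_b`, the tie-breaking by table order and all names are assumptions, and no divide-and-conquer is applied.

```python
def zt_mul(x, y, mu):
    """Multiply a + b*tau by c + d*tau using tau^2 = mu*tau - 2."""
    (a, b), (c, d) = x, y
    return (a * c - 2 * b * d, a * d + b * c + mu * b * d)


def norm(z, mu):
    """Norm of a + b*tau: a^2 + mu*a*b + 2*b^2."""
    a, b = z
    return a * a + mu * a * b + 2 * b * b


def kleinian_table(max_a, max_b, mu):
    """All +/- tau^a * (tau - 1)^b with 0 <= a <= max_a, 0 <= b <= max_b."""
    table = []
    tau_pow = (1, 0)
    for a in range(max_a + 1):
        z = tau_pow
        for b in range(max_b + 1):
            table.append(((1, a, b), z))
            table.append(((-1, a, b), (-z[0], -z[1])))
            z = zt_mul(z, (-1, 1), mu)          # multiply by (tau - 1)
        tau_pow = zt_mul(tau_pow, (0, 1), mu)   # multiply by tau
    return table


def greedy_tdfe(z, mu, max_a=8, max_b=4):
    """Greedily write z as a sum of terms (s, a, b) = s * tau^a * (tau - 1)^b."""
    table = kleinian_table(max_a, max_b, mu)
    terms = []
    while z != (0, 0):
        # Exhaustive search for the table entry closest to z.
        term, c = min(table, key=lambda e: norm((z[0] - e[1][0], z[1] - e[1][1]), mu))
        rest = (z[0] - c[0], z[1] - c[1])
        assert norm(rest, mu) < norm(z, mu)     # the remainder must shrink
        terms.append(term)
        z = rest
    return terms
```

Because the table always contains $\pm 1$, $\pm\tau$ and $\pm(\tau - 1)$, the norm of the remainder strictly decreases in every step, so the loop terminates; the result is a valid expansion but not necessarily the shortest one.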
Conjecture 1
(Length of Two Dimensional Frobenius Expansion) Every Kleinian integer $\zeta = x + y\tau$ can be represented as the sum of at most $O(\log N(\zeta) / \log\log N(\zeta))$ $\{\tau, \tau - 1\}$-Kleinian integers, where $N(\zeta)$ is the norm of $\zeta$.
3 Joint Blocking Algorithm
In this section, we present the construction of our new algorithm, which returns a joint and sparse representation for a pair of Kleinian integers. Algorithm 2 illustrates the procedure to compute a joint two dimensional Frobenius expansion in $\mathbb{Z}[\tau]$ for a pair of Kleinian integers.
A window size $w$ is fixed prior to running the algorithm. Then the optimal joint two dimensional Frobenius expansions for all possible pairs of $w$-bit $\tau$-adic representations are precomputed and given as another input.
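First, two $\tau$-adic expansions $\sum_{i=0}^{l-1} u_i \tau^i$ and $\sum_{i=0}^{l-1} v_i \tau^i$ in $\mathbb{Z}[\tau]$ are computed, where $u_i, v_i \in \{0, 1\}$ and $l$ is the length of the longer expansion. Next, both $\tau$-adic expansions are arranged as in (6) to generate joint columns.
$$\begin{pmatrix} u_{l-1} & \cdots & u_1 & u_0 \\ v_{l-1} & \cdots & v_1 & v_0 \end{pmatrix} \qquad (6)$$
The joint column $i$ in (6) has two elements, $u_i$ and $v_i$, for all $i$ satisfying $0 \leq i \leq l - 1$. If one $\tau$-adic expansion is shorter than the other, the coefficients of the higher powers of $\tau$ in the shorter expansion are set to zero.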
The two $\tau$-adic expansions are separated into $\lceil l/w \rceil$ blocks of $w$ bits. The least significant $w$ bits of each $\tau$-adic expansion are labelled block 0, while the most significant bits are labelled block $\lceil l/w \rceil - 1$.
At step 6 of Algorithm 2, block $j$ of both representations is considered to find block $j$ of the optimal joint two dimensional Frobenius expansion. This is achieved by a lookup-table approach. Once block $j$ of the optimal joint two dimensional Frobenius expansion is obtained, all elements are multiplied by $\tau^{jw}$ and appended to the relevant lists. We repeat this step $\lceil l/w \rceil$ times to obtain the complete joint two dimensional Frobenius expansion. Example 1 illustrates the execution of Algorithm 2.
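The padding and blocking steps can be sketched as follows. The sketch only covers the alignment into joint blocks and the recombination with powers of $\tau$; the lookup of the optimal joint expansion for each block is deliberately omitted, since the precomputed table is not reproduced here. The symbol `w` for the window size, the least-significant-first digit lists and the function names are assumptions for illustration.

```python
def zt_mul(x, y, mu):
    """Multiply a + b*tau by c + d*tau using tau^2 = mu*tau - 2."""
    (a, b), (c, d) = x, y
    return (a * c - 2 * b * d, a * d + b * c + mu * b * d)


def tau_eval(digits, mu):
    """Evaluate sum(digits[i] * tau^i) as a pair (a, b) meaning a + b*tau."""
    a, b = 0, 0
    for u in reversed(digits):
        a, b = -2 * b + u, a + mu * b
    return (a, b)


def joint_blocks(u, v, w):
    """Zero-pad both expansions to a common multiple of w and cut w-digit blocks.

    Digit lists are least significant first; block 0 is the least significant.
    """
    n = max(len(u), len(v))
    n += (-n) % w                       # round up to a multiple of w
    u = u + [0] * (n - len(u))
    v = v + [0] * (n - len(v))
    return [(u[j:j + w], v[j:j + w]) for j in range(0, n, w)]


def recombine(blocks, w, mu, which):
    """Sum block_j * tau^(j*w) for the first (which=0) or second (which=1) scalar."""
    total = (0, 0)
    shift = (1, 0)                      # tau^(j*w)
    for blk in blocks:
        part = zt_mul(tau_eval(blk[which], mu), shift, mu)
        total = (total[0] + part[0], total[1] + part[1])
        for _ in range(w):
            shift = zt_mul(shift, (0, 1), mu)
    return total
```

Recombining the blocks returns the original values, which is the property that allows each block to be replaced independently by a precomputed joint expansion of the same value.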
Example 1
Joint Two Dimensional Frobenius Expansion We consider two Kleinian integers and with in this example. As the first step we compute adic expansions for both and (Step 3 of Algorithm 2):
To construct the joint expansion we need to make both expansions in the same length. Therefore we append a zero to the beginning of the adic representation of :
Let . We divide both adic expansions into bit blocks and find the optimal joint two dimensional Frobenius expansion for each block. The pair and have and . Most significant bits pair have and (Step 6 of Algorithm 2). We multiply last pair by to obtain final results (Step 7 of Algorithm 2). The optimal joint two dimensional Frobenius expansion is given by:
The main advantage of a joint representation in double scalar multiplication in elliptic curve cryptography is that Straus' method can be applied with some precomputations to improve the efficiency [6]. Considering Example 1, if and are precomputed, we can compute with four point additions. We do not consider any additions due to terms in the total cost.
(7) 
Then we can apply Algorithm 1 to compute the final point with (7). Point negation over Koblitz curves is only a field addition, and its cost can be neglected compared to that of a field multiplication. Fig. 1 illustrates the graphical representation of joint two dimensional Frobenius expansion of and .
The generation of precomputed optimal joint representations for all possible combinations of pairs of Kleinian integers for a given window is achieved by an exhaustive search. This computation needs to be done only once per curve and a given window size.
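Straus' idea of processing joint columns with a small table of precomputed points can be sketched as follows for a plain joint $\tau$-adic expansion with digits in $\{-1, 0, 1\}$. As a stand-in for curve points, elements of $\mathbb{Z}[\tau]$ are used, with multiplication by $\tau$ playing the role of the Frobenius map. The names, the digit set and the choice of precomputing the sum and the difference of the two points are assumptions for illustration; the sketch does not include the $(\tau - 1)$ dimension of the joint two dimensional expansion.

```python
def zt_add(x, y):
    return (x[0] + y[0], x[1] + y[1])


def zt_neg(x):
    return (-x[0], -x[1])


def tau(x, mu):
    """Multiplication by tau; stands in for the Frobenius map."""
    return (-2 * x[1], x[0] + mu * x[1])


def straus_tau(u, v, P, Q, mu):
    """Compute (sum u_i tau^i)*P + (sum v_i tau^i)*Q column by column.

    u and v are digit lists (least significant first) with digits in {-1, 0, 1}.
    Returns the result and the number of additions in the main loop.
    """
    n = max(len(u), len(v))
    u = u + [0] * (n - len(u))
    v = v + [0] * (n - len(v))
    # Precomputation: only P + Q and P - Q need an actual addition.
    table = {(1, 0): P, (0, 1): Q,
             (1, 1): zt_add(P, Q), (1, -1): zt_add(P, zt_neg(Q))}
    R = (0, 0)
    additions = 0
    for ui, vi in zip(reversed(u), reversed(v)):
        R = tau(R, mu)                          # one Frobenius map per column
        if (ui, vi) == (0, 0):
            continue                            # zero column: no addition
        if (ui, vi) in table:
            R = zt_add(R, table[(ui, vi)])
        else:
            R = zt_add(R, zt_neg(table[(-ui, -vi)]))   # negation is cheap
        additions += 1
    return R, additions
```

The number of additions in the main loop equals the number of nonzero joint columns, which is why a sparse joint representation directly reduces the cost.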
4 Hardware Implementation and Results
The double scalar multiplication over with joint two dimensional Frobenius expansion is implemented in VHDL and placed and routed on a Xilinx XC4VLX200 FPGA using the Xilinx Integrated Software Environment (ISE™) version 9.2i. The window size is set to and the maximum exponent of $(\tau - 1)$ is limited to four. We describe the hardware architecture of our circuit in this section. The top-level design components and architecture of the circuit are illustrated in Fig. 2.
The circuit is partitioned into four high-level components, namely, the main controller (MC), the binary arithmetic processor (BAP), the integer-to-$\tau$ converter (ITC) and the registers. In our implementation, the data bus width is set to 163 bits. Other than and , the inputs and outputs of the main controller are handshaking signals between the MC and the other units.
The binary arithmetic processor performs the four basic arithmetic operations needed for point multiplication, namely, addition, squaring, multiplication and inversion. All arithmetic operations are performed in the normal basis representation. Addition and squaring can each be executed in a single clock cycle: addition is an exclusive OR (XOR) operation and squaring is a cyclic shift operation in the normal basis representation [9]. Multiplication is a direct implementation of the Massey-Omura multiplier that computes four bits per clock cycle [10]. With a 163-bit field element, a multiplication therefore needs only $\lceil 163/4 \rceil = 41$ clock cycles. The inversion is performed with the Itoh-Tsujii architecture [11]. It needs nine multiplications to calculate the inverse of an element in $\mathbb{F}_{2^{163}}$. Once a multiplication or an inversion is completed, the binary arithmetic processor sends out a job completion signal by setting DONE of BAP to high.
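The cycle counts quoted above can be checked with a short calculation. The sketch assumes a 163-bit field (the data bus width given earlier), a multiplier producing four bits per cycle, and the standard multiplication count of the Itoh-Tsujii method, $\lfloor \log_2(m-1) \rfloor + HW(m-1) - 1$, where $HW$ denotes the Hamming weight; the variable and function names are illustrative.

```python
import math

M = 163            # field degree, taken from the 163-bit data bus width
BITS_PER_CYCLE = 4 # output bits of the multiplier per clock cycle

# Clock cycles for one field multiplication.
mul_cycles = math.ceil(M / BITS_PER_CYCLE)


def itoh_tsujii_mults(m):
    """Field multiplications needed by Itoh-Tsujii inversion in GF(2^m)."""
    e = m - 1
    return (e.bit_length() - 1) + bin(e).count("1") - 1


inv_mults = itoh_tsujii_mults(M)
# Lower bound on inversion cycles: multiplications only, squarings ignored.
inv_mul_cycles = inv_mults * mul_cycles
```

With these assumptions `mul_cycles` is 41 and `inv_mults` is 9, matching the figures in the text; `inv_mul_cycles` is only a lower bound because the single-cycle squarings and control overhead are not counted.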
The primary job of the integer-to-$\tau$ converter is to compute the joint two dimensional Frobenius expansion from a pair of integers. Our implementation comprises two integer-to-$\tau$ converters with the lazy reduction introduced in [8], because it is faster and needs less area in hardware implementations. The converter is slightly modified to generate nonnegative elements for the $\tau$-adic expansion, whereas the circuit proposed in [8] generates the NAF. A precomputed lookup table is then used to compute the joint two dimensional Frobenius expansion. The signal DONE of ITC is high when the first $w$ bits of each expansion are available for processing.
The registers are used to store point coordinates and intermediate values during point additions. Further some registers can perform cyclic shift operation to facilitate Frobenius mapping on points , , , and .
The main controller is designed as a finite state machine that performs the double scalar multiplication with the other three components. When INIT of MC is set to high, the main controller begins loading the two integers and the point coordinates into the registers. The two integers are then loaded into the integer-to-$\tau$ converters simultaneously. Once DONE of ITC is high, the joint two dimensional Frobenius expansion is read into the main controller and the double scalar multiplication is started. The main controller knows that it has reached the end of the computation when TOP of ITC is high. The final results are stored in the registers and DONE of MC is set to high.
Affine coordinates are used in precomputations. Mixed coordinates are used for computing and related calculations, and each of these point additions needs 8 field multiplications and 5 field squarings. For other point additions, i.e. related computations, we have used López-Dahab projective coordinates in this implementation [12]. These point additions require 13 field multiplications and 4 field squarings.
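As a rough illustration of what these operation counts mean in clock cycles, the following sketch combines them with the 41-cycle multiplication and single-cycle squaring stated earlier. It is an estimate under the assumption that field additions, register transfers and controller overhead are ignored, so the real circuit needs more cycles; the names are illustrative.

```python
MUL_CYCLES = 41   # one field multiplication, as stated for the multiplier
SQR_CYCLES = 1    # one field squaring (cyclic shift in normal basis)


def addition_cycles(mults, squarings):
    """Arithmetic-only cycle estimate for one point addition."""
    return mults * MUL_CYCLES + squarings * SQR_CYCLES


mixed_cycles = addition_cycles(8, 5)        # mixed-coordinate addition
projective_cycles = addition_cycles(13, 4)  # López-Dahab projective addition
```

Under these assumptions a mixed-coordinate addition takes about 333 cycles and a projective addition about 537 cycles, so multiplications dominate the cost and the number of point additions is the quantity worth minimizing.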
Hardware implementations are carried out for double scalar multiplication based on both the $\tau$-adic joint sparse form and the joint two dimensional Frobenius expansion. A window value and maximum exponent are selected for the joint two dimensional Frobenius expansion implementation. The curve is considered over the binary field $\mathbb{F}_{2^{163}}$. The curve parameters and the field used for the implementation are those specified by NIST. We have implemented both circuits on a Xilinx XC4VLX200 FPGA and tested them with 10,000 pairs of integers and pairs of points. A summary of the experimental results is given in Table 1.
| Algo. | Max. freq. (MHz) | Area (num. of slices) | Increase in area (%)^a | Av. time per calc. (μs) | Gain in time (%)^b |
|---|---|---|---|---|---|
| JSF | 75.364 | 9,217 | - | 479.609 | - |
| J2DFE | 76.559 | 13,403 | 45.42 | 418.675 | 12.70 |

^a Increase in area is given against JSF.
^b Gain in time is given against JSF.
The results collected in Table 1 are based on the synthesis goals set for speed maximization. Time is read for each algorithm when the circuit is operating at its maximum frequency.
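The percentage columns of Table 1 follow directly from the raw figures, as the short check below shows. The dictionary layout and names are illustrative, and the cycle estimate assumes that the time column is in microseconds.

```python
# Raw figures from Table 1.
jsf = {"freq_mhz": 75.364, "slices": 9217, "time_us": 479.609}
j2dfe = {"freq_mhz": 76.559, "slices": 13403, "time_us": 418.675}

# Relative increase in area and relative gain in time, both against JSF.
area_increase = 100 * (j2dfe["slices"] - jsf["slices"]) / jsf["slices"]
time_gain = 100 * (jsf["time_us"] - j2dfe["time_us"]) / jsf["time_us"]

# Approximate clock cycles per computation (MHz * microseconds = cycles).
jsf_cycles = jsf["freq_mhz"] * jsf["time_us"]
j2dfe_cycles = j2dfe["freq_mhz"] * j2dfe["time_us"]

print(f"area increase: {area_increase:.2f}%  time gain: {time_gain:.2f}%")
```

Rounded to two decimals the values are 45.42% and 12.70%. Since the two maximum frequencies differ by less than 2%, most of the gain in time comes from a lower cycle count rather than from a faster clock.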
Note:
The timings given for single scalar multiplication in [5] are much smaller than the figures for double scalar multiplication presented in this paper. This is mainly due to three reasons. Firstly, the clock speed in [5] is twice that of this implementation. Secondly, the average number of point additions in double scalar multiplication is more than twice that in single scalar multiplication. Thirdly, a field multiplication in [5] needs 9 clock cycles, while in this implementation it needs 41 clock cycles.
5 Conclusions
According to the experimental results presented in Table 1, the joint two dimensional Frobenius expansion outperforms the state-of-the-art $\tau$-adic joint sparse form in speed for double scalar multiplication over Koblitz curves. The area of the new architecture is about 45% larger than that of the JSF architecture. With larger window sizes and maximum exponents, the speed of the double scalar multiplication can be improved further. However, when the window size is increased, the size of the lookup table in the integer-to-$\tau$ conversion grows exponentially. We will investigate different combinations of window size and maximum exponent as future work.
References
 [1] N. Koblitz, “CM-Curves with Good Cryptographic Properties,” in CRYPTO ’91: Proceedings of the 11th Annual International Cryptology Conference on Advances in Cryptology. London, UK: Springer-Verlag, 1992, pp. 279–287.
 [2] D. Hankerson, A. Menezes, and S. Vanstone, Guide to Elliptic Curve Cryptography. Springer, 2004.
 [3] J. A. Solinas, “Efficient arithmetic on Koblitz curves,” Designs, Codes and Cryptography, vol. 19, no. 2-3, pp. 195–249, 2000.
 [4] M. Ciet, T. Lange, F. Sica, and J. Quisquater, “Improved algorithms for efficient arithmetic on elliptic curves using fast endomorphisms,” in Advances in Cryptology, EUROCRYPT 2003, 2003, pp. 388–400.
 [5] V. S. Dimitrov, K. U. Järvinen, M. J. Jacobson, W. F. Chan, and Z. Huang, “Provably sublinear point multiplication on Koblitz curves and its hardware implementation,” Computers, IEEE Transactions on, vol. 57, pp. 1469–1481, 2008.
 [6] E. G. Straus, “Addition chains of vectors (problem 5125),” American Mathematical Monthly, vol. 70, pp. 806–808, 1964.
 [7] J. H. Conway and D. Smith, On Quaternions and Octonions, 1st ed. AK Peters, 2003.
 [8] B. B. Brumley and K. U. Järvinen, “Conversion algorithms and implementations for Koblitz curve cryptography,” Computers, IEEE Transactions on, vol. 59, pp. 81–92, 2009.
 [9] G. Agnew, T. Beth, R. Mullin, and S. Vanstone, “Arithmetic operations in GF(2^m),” Journal of Cryptology, vol. 6, no. 1, pp. 3–13, Mar. 1993.
 [10] J. K. Omura and J. L. Massey, “Computational method and apparatus for finite field arithmetic,” no. 4587627, May 1986, (US Patent 4587627).
 [11] T. Itoh and S. Tsujii, “A fast algorithm for computing multiplicative inverses in GF(2^m) using normal bases,” Information and Computation, vol. 78, no. 3, pp. 171–177, Sept. 1988.
 [12] J. López and R. Dahab, “Fast multiplication on elliptic curves over GF(2^m) without precomputation,” in Cryptographic Hardware and Embedded Systems, ser. Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 1999, vol. 1717, pp. 316–327.