Exploiting Binary Floating-Point Representations
for Constraint Propagation:
The Complete Unabridged Version
Roberto Bagnara
BUGSENG srl and Dept. of Mathematics and Computer Science, University of Parma, Italy,
Matthieu Carlier
INRIA Rennes Bretagne Atlantique, France
Roberta Gori
Dept. of Computer Science, University of Pisa, Italy,
Arnaud Gotlieb
Certus Software V&V Center, SIMULA Research Laboratory, Norway,
Floating-point computations are quickly finding their way into the design of safety- and mission-critical systems, despite the fact that designing floating-point algorithms is significantly more difficult than designing integer algorithms. For this reason, verification and validation of floating-point computations is a hot research topic. An important verification technique, especially in some industrial sectors, is testing. However, generating test data for floating-point intensive programs has proved to be a challenging problem. Existing approaches usually resort to random or search-based test data generation, but without symbolic reasoning it is almost impossible to generate test inputs that execute complex paths controlled by floating-point computations. Moreover, as constraint solvers over the reals or the rationals do not natively support the handling of rounding errors, the need arises for efficient constraint solvers over floating-point domains. In this paper, we present and fully justify improved algorithms for the propagation of arithmetic IEEE 754 binary floating-point constraints. The key point of these algorithms is a generalization of an idea by B. Marre and C. Michel that exploits a property of the representation of floating-point numbers.
Key words: software verification; testing; floating-point numbers; constraint solving
During the last decade, the use of floating-point computations in the design of critical systems has become increasingly acceptable. Even in the civil and military avionics domain, which is among the most critical domains for software, floating-point numbers are now seen as a sufficiently safe, faster and cheaper alternative to fixed-point arithmetic, to the point that, in modern avionics, floating-point is the norm rather than the exception (BurdyDL12).
Acceptance of floating-point computations in the design of critical systems took a long time. In fact, rounding errors can cause subtle bugs which are often missed by non-experts (Monniaux08), and can lead to catastrophic failures. For instance, during the first Persian Gulf War, the failure of a Patriot missile battery in Dhahran was traced to an accumulating rounding error in the continuous execution of tracking and guidance software: this failure prevented the interception of an Iraqi Scud that hit the barracks in Dhahran, Saudi Arabia, killing 28 US soldiers (Skeel92). A careful analysis of this failure revealed that, even though the rounding error obtained at each step of the floating-point computation was very small, its propagation during a long loop-iterating path could lead to dramatic imprecision.
Adoption of floating-point computations in critical systems involves the use of thorough unit testing procedures that are able to exercise complex chains of floating-point operations. In particular, a popular practice among software engineers in charge of the testing of floating-point-intensive computations consists in executing carefully chosen loop-iterating paths in programs. They usually pay more attention to the paths that are most likely to expose the system to unstable numerical computations. (A computation can be called numerically stable if it can be proven not to magnify approximation errors; it can be called, potentially, unstable otherwise.) For critical systems, a complementary requirement is to demonstrate the infeasibility of selected paths, in order to convince a third-party certification authority that certain unsafe behaviors of the system cannot be reached. As a consequence, software engineers face two difficult problems:

How to accurately predict the expected output of a given floating-point computation? (This is the well-known oracle problem; see Weyuker82.)

How to find a test input that is able to exercise a given path, the execution of which depends on the results of floatingpoint computations, or to guarantee that such a path is infeasible?
The first problem has been well addressed in the literature (Kuliamin10) through several techniques. AmmannK88 report on a technique known as the data diversity approach, which uses multiple related executions of a program to check their results. Metamorphic testing (ChanCCLY98) generalizes this technique by using known numerical relations of the function implemented by a program to check the results of two or more executions. Goubault01 proposes using the abstract interpretation framework (CousotC77) to estimate the deviation of the floating-point results with respect to an interpretation over the reals. ScottJDC07 propose using a probabilistic approach to estimate round-off error propagation. More recently, TangBLS10 propose to exploit perturbation techniques to evaluate the stability of a numerical program. In addition to these approaches, it is possible to use a (partial) specification, a prototype or an old implementation in order to predict the results of a new implementation.
In contrast, the second problem has received little attention. Beyond the seminal work of MillerS76, who proposed to guide the search for floating-point inputs that execute a selected path, few approaches try to reason exactly over floating-point computations. The work of MillerS76 paved the way to the development of search-based test data generation techniques, which consist in searching for test inputs by minimizing a cost function that evaluates the distance between the currently executed path and the targeted path (Korel90a, McMinn04, LakhotiaHG10). Although these techniques enable quick and efficient coverage of testing criteria such as “all decisions,” they are unfortunately sensitive to the rounding errors incurred in the computation of the branch distance (Arcuri09). Moreover, search-based test data generation cannot be used to study path feasibility, i.e., to decide whether a possible execution path involving floating-point computations is feasible or not. In addition, these techniques can get stuck in local minima without being able to provide a meaningful result (Arcuri09). An approach that tackles these problems combines program execution and symbolic reasoning (GodefroidKS05), and requires solving constraints over floating-point numbers in order to generate test inputs that exercise a selected behavior of the program under test. However, solving floating-point constraints is hard and requires dedicated filtering algorithms (MichelRL01, Michel02). To our knowledge, this approach is currently implemented in only four solvers: ECLAIR (http://bugseng.com/products/eclair), FPCS (BlancBGJ+06), FPSE (http://www.irisa.fr/celtique/carlier/fpse.html; BotellaGM06), and GATeL, a test data generator for Lustre programs (MarreB05).
It is worth noticing that existing constraint solvers dedicated to continuous domains (such as, e.g., RealPaver (GranvilliersB06), IBEX and Quimper (ChabertJ09) or ICOS (Lebbah09)) correctly handle real or rational computations, but they cannot preserve the solutions of constraints over floating-point computations in all cases (see Section LABEL:sec:discussion for more on this subject). “Surprising” properties of floating-point computations such as absorption and cancellation (Goldberg91) show that the rounding operations can severely compromise the preservation of the computation semantics between the reals and the floats.
Example 1
Consider the C functions f1 and f2. For both functions, let us ask whether the paths traversing lines 2–6 are feasible. The condition that must be satisfied in order for a certain path to be traversed is called a path condition. For f1, the path conditions admit solutions on the reals, whereas on the floats they have no solution. Conversely, for f2 the path conditions have no solution on the reals but are satisfied by all IEEE 754 single precision floating-point numbers in a whole range.
To illustrate the concrete problem raised by floating-point computations in program verification settings, consider the code depicted in Listing 1. It is a somewhat reduced version of a real-world example extracted from a critical embedded system. (The original source code is available at http://paparazzi.enac.fr, file sw/airborne/modules/cam_control/cam.c, last checked on November 29, 2013.)
In order to gain confidence in this code, a test suite should be created that contains enough test cases to achieve a specified level of coverage. The basic coverage criterion is “all statements”: it prescribes that each statement is reached at least once by at least one test. (There exist more sophisticated and, correspondingly, more challenging coverage criteria, such as the already-mentioned “all decisions” and Modified Condition/Decision Coverage; MC/DC, see AmmannOH03.)
For each statement, a set of constraints is defined that encodes the reachability of the statement, and then a solution is attempted: if a solution is found, then that solution, projected onto the explicit inputs (read parameters) and the implicit inputs (read global variables) of the function, constitutes the input part of a test case; if it is determined that no solution exists, then the statement is dead code; if the solution process causes a timeout, then we do not know.
For example, if CAM_PAN_NEUTRAL is defined to expand to the integer literal 5 (or, for that matter, 45 or many other values), then we can prove that the statements in lines 45 and 47 are unreachable. (All the experiments mentioned in this paper have been conducted using the ECLAIR system.) The presence of dead code is not acceptable under several industry standards, such as MISRA C (MISRAC2012), MISRA C++ (MISRACPP2008), and JSF C++ (JSFCPP2005).
Another application of the same technology is the proof of absence of run-time anomalies, such as overflows or the unwanted generation of infinities. For each operation possibly leading to such an anomaly, a constraint system is set up that encodes the conditions under which the anomaly takes place. A solution is then searched for: if one is found, then we have a proof that the code is unsafe; if it can be determined that no solution exists, then we know the code is safe; otherwise we do not know.
For the code of Listing 1, if CAM_PAN_NEUTRAL is defined to expand to 5 or 45, then we can prove that no run-time anomaly is possible, whatever the value of variable cam_pan_c when the function is invoked.
Now, let us take another point of view and consider that the macro CAM_PAN_NEUTRAL is not defined in the same file, as it is a configuration parameter. Its definition is (partially) validated by means of the preprocessor directives shown in the listing at lines 13–20: are these directives enough to protect against dangerous definitions of CAM_PAN_NEUTRAL?
We can provide an answer to this question by treating CAM_PAN_NEUTRAL as a variable of any type that is compatible with its uses in the code. This way we discover that, if CAM_PAN_NEUTRAL is defined to expand to, e.g., 2147483558, then we will have an overflow in line 36 on a 32-bit machine. Most compilers will catch this particular mistake, but this will not be the case if someone, someday, defines CAM_PAN_NEUTRAL as, e.g., +0x1ca5dc14c57550.p81 (roughly $1.9 \times 10^{40}$): then in line 34 an infinity will be generated, something that is unacceptable in the aviation and other industries. One might also wonder whether CAM_PAN_NEUTRAL can be defined as a double precision floating-point literal such that the denominator of the divisions in lines 36 and 38 becomes so small as to cause an overflow: constraint solving over floating-point numbers is able to answer this question negatively.
A promising approach to improve the filtering capabilities of constraints over floating-point variables consists in using some peculiar numerical properties of floating-point numbers. For linear constraints, this led to a relaxation technique where floating-point numbers and constraints are converted into constraints over the reals by using linear programming approaches (BelaidMR12). For interval-based consistency approaches, MarreM10 identified a property of the representation of floating-point numbers and proposed to exploit it in filtering algorithms for addition and subtraction constraints. CarlierG11 proposed a reformulation of the Marre-Michel property in terms of “filtering by maximum ULP” (Units in the Last Place) that is generalizable to multiplication and division constraints.
BagnaraCGG13ICST addressed the question of whether the Marre-Michel property can be useful for the automatic solution of realistic test input generation problems: they sketched (without proofs) a reformulation and correction of the filtering algorithm proposed in (MarreM10), along with a uniform framework that generalizes the property identified by Marre and Michel to the case of multiplication and division. Most importantly, (BagnaraCGG13ICST) presented the implementation of filtering by maximum ULP in FPSE and some of its critical design choices, together with an experimental evaluation on constraint systems extracted from programs engaging in intensive floating-point computations. These results show that the Marre-Michel property and its generalization are effective, practical properties for solving constraints over the floats with an acceptable overhead. The experiments reported in (BagnaraCGG13ICST) showed that improving the filtering procedures with these techniques can bring substantial speedups of the overall constraint solving process (we have observed up to an order of magnitude); in the cases where the techniques do not allow significant extra pruning, the slowdowns are always very modest (up to a few percent on the overall solution time).
The present paper is, on the one hand, the theoretical counterpart of (BagnaraCGG13ICST), in that all the results are thoroughly proved; on the other hand, it generalizes and extends (BagnaraCGG13ICST) as far as the handling of subnormals and floating-point division is concerned. More precisely, the contributions of the paper are:

a uniform framework for filtering by maximum ULP is thoroughly defined and justified;

the framework is general enough to encompass all floating-point arithmetic operations and subnormals (the latter are not treated in (BagnaraCGG13ICST));

a second indirect projection by maximum ULP is defined for division (this was not present in any previous work);

all algorithms use only floating-point machine arithmetic operations on the same formats used by the analyzed computations.
The plan of the paper is as follows. The next section presents the IEEE 754 standard of binary floating-point numbers and introduces the notions and notation used throughout the paper. The section after that recalls the basic principles of interval-based consistency techniques over floating-point variables and constraints. Section LABEL:sec:maxulp presents our generalization of the Marre-Michel property along with a precise definition and motivation of all the required algorithms. Section LABEL:sec:discussion discusses related work. Section LABEL:sec:conclusion concludes the main body of the paper. The most technical proofs are available in the Appendix.
In this section we recall some preliminary concepts and introduce the notation used throughout the paper.
This section recalls the arithmetic model specified by the IEEE 754 standard for binary floating-point arithmetic (IEEE7542008). Note that, although the IEEE 754 standard also specifies formats and methods for decimal floating-point arithmetic, in this paper we only deal with binary floating-point arithmetic.
IEEE 754 binary floating-point formats are uniquely identified by three quantities: $p$, the number of significant digits (precision); $e_{\max}$, the maximum exponent; $e_{\min}$, the minimum exponent. (Note that, although the IEEE 754 formats have $e_{\min} = 1 - e_{\max}$, we never use this property and decided to keep the extra generality, which might be useful to accommodate other formats.) The single precision format has $p = 24$ and $e_{\max} = 127$; the double precision format has $p = 53$ and $e_{\max} = 1023$ (IEEE 754 also defines extended precision formats). A finite, nonzero IEEE 754 floating-point number $z$ has the form $(-1)^s\, b_0.b_1 \cdots b_{p-1} \cdot 2^e$, where $s \in \{0, 1\}$ is the sign bit, $b_0$ is the hidden bit, $b_0.b_1 \cdots b_{p-1}$ is the $p$-bit significand, and $e$, with $e_{\min} \le e \le e_{\max}$, is the exponent. Hence the number is positive when $s = 0$ and negative when $s = 1$. $b_0$ is termed “hidden bit” because in the binary interchange format encodings it is not explicitly represented, its value being encoded in the exponent (IEEE7542008).
Each format defines several classes of numbers: normal numbers, subnormal numbers, signed zeros, infinities and NaNs (Not a Number). The smallest positive normal floating-point number is $f_{\min}^{\mathrm{nor}} = 2^{e_{\min}}$ and the largest is $f_{\max} = (2 - 2^{1-p}) \cdot 2^{e_{\max}}$; normal numbers have the hidden bit $b_0 = 1$. The nonzero floating-point numbers whose absolute value is less than $2^{e_{\min}}$ are called subnormals: they always have exponent equal to $e_{\min}$ and fewer than $p$ significant digits, as their hidden bit is $b_0 = 0$. Every finite floating-point number is an integral multiple of the smallest subnormal, $f_{\min} = 2^{e_{\min}+1-p}$. There are two infinities, denoted by $+\infty$ and $-\infty$, and two signed zeros, denoted by $+0$ and $-0$: they allow some algebraic properties to be maintained (Goldberg91), e.g., $1/(1/x) = x$ for $x = \pm\infty$. NaNs are used to represent the results of invalid computations, such as a division of two infinities or a subtraction of infinities with the same sign: they allow the program execution to continue without being halted by an exception.
IEEE 754 defines five rounding directions: toward negative infinity (roundTowardNegative or, briefly, down), toward positive infinity (roundTowardPositive, a.k.a. up), toward zero (roundTowardZero, a.k.a. chop), and toward the nearest representable value (a.k.a. near); the latter comes in two flavors that depend on different tie-break rules for numbers exactly halfway between two representable numbers: roundTiesToEven (a.k.a. tail-to-even) or roundTiesToAway (a.k.a. tail-to-away), in which values with even significand or values away from zero are preferred, respectively. This paper is only concerned with roundTiesToEven, which is, by far, the most widely used. The roundTiesToEven value of a real number $x$ will be denoted by $\circ(x)$.
The most important requirement of IEEE 754 arithmetic is the accuracy of floating-point computations: add, subtract, multiply, divide, square root, remainder, conversion and comparison operations must deliver to their destination the exact result rounded as per the rounding mode in effect and the format of the destination. It is said that these operations are “correctly rounded.”
The accuracy requirement of IEEE 754 can still surprise the average programmer: for example, the single precision, round-to-nearest addition of $2^{24}$ and $1$ (both numbers can be exactly represented) gives $2^{24}$, i.e., the second operand is absorbed. The maximum error committed by representing a real number with a floating-point number under some rounding mode can be expressed in terms of the $\mathrm{ulp}$ function (Muller05). Its value on $1.0$ is $2^{-23} \approx 1.19 \cdot 10^{-7}$ for the single precision format.
The set of real numbers is denoted by $\mathbb{R}$, while $\mathbb{F}$ denotes a subset of the binary floating-point numbers, defined from a given IEEE 754 format, which includes the infinities $+\infty$ and $-\infty$ and the signed zeros $+0$ and $-0$, but neither subnormal numbers nor NaNs. Subnormals are introduced in the set $\mathbb{F}^{\mathrm{sub}} \supseteq \mathbb{F}$. In some cases, the exposition can be much simplified by allowing the $e_{\min}$ of $\mathbb{F}$ to be $-\infty$, i.e., by considering an idealized set of floats where the exponent is unbounded. Among the advantages is the fact that subnormals in $\mathbb{F}^{\mathrm{sub}}$ can be represented as normal floating-point numbers in the idealized set. Given a set of floating-point numbers $F$, $F^+$ denotes the “nonnegative” subset of $F$, i.e., the set of elements of $F$ that are greater than or equal to $+0$.
For a finite, nonzero floating-point number $z$, we will write $\mathrm{even}(z)$ (resp., $\mathrm{odd}(z)$) to signify that the least significant digit of $z$'s significand is $0$ (resp., $1$).
When the format is clear from the context, a real decimal constant denotes the corresponding roundTiesToEven floating-point value, i.e., the result of rounding it with $\circ$.
Henceforth, for $z \in \mathbb{F}$, $z^+$ (resp., $z^-$) denotes the smallest (resp., greatest) floating-point number strictly greater (resp., smaller) than $z$ with respect to the considered IEEE 754 format. Of course, we have $f_{\max}^+ = +\infty$ and $(-f_{\max})^- = -\infty$.
Binary arithmetic operations over the floats will be denoted by $\oplus$, $\ominus$, $\otimes$ and $\oslash$, corresponding to $+$, $-$, $\times$ and $/$ over the reals, respectively. According to IEEE 754, they are defined, under roundTiesToEven, by $x \oplus y = \circ(x + y)$, $x \ominus y = \circ(x - y)$, $x \otimes y = \circ(x \times y)$ and $x \oslash y = \circ(x / y)$.
As IEEE 754 floating-point numbers are closed under negation, we denote the negation of $z \in \mathbb{F}$ simply by $-z$. Note that negation is a bijection. The symbol $\odot$ denotes any of $\oplus$, $\ominus$, $\otimes$ or $\oslash$. A floating-point variable $\mathtt{x}$ is associated with an interval of possible floating-point values; we will write $\mathtt{x} \in [\underline{x}, \overline{x}]$, where $\underline{x}$ and $\overline{x}$ denote the smallest and greatest value of the interval, and $\underline{x} \le \overline{x}$ with respect to an ordering of the floats in which $-0$ precedes $+0$. Note that $[+0, -0]$ is not an interval, whereas $[-0, +0]$ is the interval denoting the set of floating-point numbers $\{-0, +0\}$.
In this section, we briefly recall the basic principles of interval-based consistency techniques over floating-point variables and constraints.