Example 1

Exploiting Binary Floating-Point Representations

for Constraint Propagation:

The Complete Unabridged Version

Roberto Bagnara

BUGSENG srl and Dept. of Mathematics and Computer Science, University of Parma, Italy,

bagnara@cs.unipr.it, http://www.cs.unipr.it/~bagnara

Matthieu Carlier

INRIA Rennes Bretagne Atlantique, France

Roberta Gori

Dept. of Computer Science, University of Pisa, Italy,

gori@di.unipi.it, http://www.di.unipi.it/~gori

Arnaud Gotlieb

Certus Software V&V Center, SIMULA Research Laboratory, Norway,

arnaud@simula.no, http://simula.no/people/arnaud

Floating-point computations are quickly finding their way in the design of safety- and mission-critical systems, despite the fact that designing floating-point algorithms is significantly more difficult than designing integer algorithms. For this reason, verification and validation of floating-point computations is a hot research topic. An important verification technique, especially in some industrial sectors, is testing. However, generating test data for floating-point intensive programs proved to be a challenging problem. Existing approaches usually resort to random or search-based test data generation, but without symbolic reasoning it is almost impossible to generate test inputs that execute complex paths controlled by floating-point computations. Moreover, as constraint solvers over the reals or the rationals do not natively support the handling of rounding errors, the need arises for efficient constraint solvers over floating-point domains. In this paper, we present and fully justify improved algorithms for the propagation of arithmetic IEEE 754 binary floating-point constraints. The key point of these algorithms is a generalization of an idea by B. Marre and C. Michel that exploits a property of the representation of floating-point numbers.

Key words: software verification; testing; floating-point numbers; constraint solving


During the last decade, the use of floating-point computations in the design of critical systems has become increasingly acceptable. Even in the civil and military avionics domain, which are among the most critical domains for software, floating-point numbers are now seen as a sufficiently-safe, faster and cheaper alternative to fixed-point arithmetic. To the point that, in modern avionics, floating-point is the norm rather than the exception (BurdyDL12).

Acceptance of floating-point computations in the design of critical systems took a long time. In fact, rounding errors can cause subtle bugs which are often missed by non experts (Monniaux08), and can lead to catastrophic failures. For instance, during the first Persian Gulf War, the failure of a Patriot missile battery in Dhahran was traced to an accumulating rounding error in the continuous execution of tracking and guidance software: this failure prevented the interception of an Iraqi Scud that hit the barracks in Dhahran, Saudi Arabia, killing 28 US soldiers (Skeel92). A careful analysis of this failure revealed that, even though the rounding error obtained at each step of the floating-point computation was very small, the propagation during a long loop-iterating path could lead to dramatic imprecision.

Adoption of floating-point computations in critical systems involves the use of thorough unit testing procedures that are able to exercise complex chains of floating-point operations. In particular, a popular practice among software engineers in charge of the testing of floating-point-intensive computations consists in executing carefully chosen loop-iterating paths in programs. They usually pay more attention to the paths that are most likely to expose the system to unstable numerical computations.111A computation can be called numerically stable if it can be proven not to magnify approximation errors. It can be called (potentially) unstable otherwise. For critical systems, a complementary requirement is to demonstrate the infeasibility of selected paths, in order to convince a third-party certification authority that certain unsafe behaviors of the systems cannot be reached. As a consequence, software engineers face two difficult problems:

  1. How to accurately predict the expected output of a given floating-point computation?222This is the well-known oracle problem (see Weyuker82).

  2. How to find a test input that is able to exercise a given path, the execution of which depends on the results of floating-point computations, or to guarantee that such a path is infeasible?

The first problem has been well addressed in the literature (Kuliamin10) through several techniques. AmmannK88 report on a technique known as the data diversity approach, which uses multiple related program executions of a program to check their results. Metamorphic testing (ChanCCLY98) generalizes this technique by using known numerical relations of the function implemented by a program to check the results of two or more executions. Goubault01 proposes using the abstract interpretation framework (CousotC77) to estimate the deviation of the floating-point results with respect to an interpretation over the reals. ScottJDC07 propose using a probabilistic approach to estimate round-off error propagation. More recently, TangBLS10 propose to exploit perturbation techniques to evaluate the stability of a numerical program. In addition to these approaches, it is possible to use a (partial) specification, a prototype or an old implementation in order to predict the results for a new implementation.

In contrast, the second problem received only little attention. Beyond the seminal work of MillerS76, who proposed to guide the search of floating-point inputs to execute a selected path, few approaches try to exactly reason over floating-point computations. The work of MillerS76 paved the way to the development of search-based test data generation techniques, which consist in searching test inputs by minimizing a cost function, evaluating the distance between the currently executed path and a targeted selected path (Korel90a, McMinn04, LakhotiaHG10). Although these techniques enable quick and efficient coverage of testing criteria such as “all decisions,” they are unfortunately sensitive to the rounding errors incurred in the computation of the branch distance (Arcuri09). Moreover, search-based test data generation cannot be used to study path feasibility, i.e., to decide whether a possible execution path involving floating-point computations is feasible or not in the program. In addition, these techniques can be stuck in local minima without being able to provide a meaningful result (Arcuri09). An approach to tackle these problems combines program execution and symbolic reasoning (GodefroidKS05), and requires solving constraints over floating-point numbers in order to generate test inputs that exercise a selected behavior of the program under test. However, solving floating-point constraints is hard and requires dedicated filtering algorithms (MichelRL01, Michel02). According to our knowledge, this approach is currently implemented in four solvers only: ECLAIR333http://bugseng.com/products/eclair, FPCS (BlancBGJ+06), FPSE444http://www.irisa.fr/celtique/carlier/fpse.html (BotellaGM06), and GATeL, a test data generator for Lustre programs (MarreB05). It is worth noticing that existing constraint solvers dedicated to continuous domains (such as, e.g., RealPaver (GranvilliersB06), IBEX and Quimper (ChabertJ09) or ICOS (Lebbah09)) correctly handle real or rational computations, but they cannot preserve the solutions of constraints over floating-point computations in all cases (see Section LABEL:sec:discussion for more on this subject). “Surprising” properties of floating-point computations such as absorption and cancellation (Goldberg91) show that the rounding operations can severely compromise the preservation of the computation semantics between the reals and the floats.

Example 1

Consider the C functions f1 and f2:

1float f1(float x) {                                          float f2(float x) {
2  float y = 1.0e12F;                                           float y = 1.0e12F;
3  if (x < 10000.0F)                                            if (x > 0.0F)
4     z = x + y;                                                  z = x + y;
5  if (z > y)                                                   if(z == y)

For both functions, let us ask the question whether the paths traversing lines 2-3-4-5-6 are feasible. The condition that must be satisfied in order for a certain path to be traversed is called path condition. For f1, the path conditions and , which on the reals are equivalent to whereas on the floats they have no solution. Conversely, for f2 the path conditions are and , which have no solutions on the reals but are satisfied by all IEEE 754 single precision floating-point numbers in the range .

1#define MAX_PPRZ 9600
4#ifndef CAM_PAN_MAX
5#define CAM_PAN_MAX 90
7#ifndef CAM_PAN_MIN
8#define CAM_PAN_MIN -90
10#define M_PI 3.14159265358979323846
11#define RadOfDeg(x) ((x) * (M_PI/180.))
15#error CAM_PAN_MAX has to be different from CAM_PAN_NEUTRAL
18#error CAM_PAN_MIN has to be different from CAM_PAN_NEUTRAL
22float cam_pan_c;
24void cam_angles( void ) {
25  float cam_pan = 0;
26  if (cam_pan_c > RadOfDeg(CAM_PAN_MAX)) {
27    cam_pan_c = RadOfDeg(CAM_PAN_MAX);
28  } else {
29    if (cam_pan_c < RadOfDeg(CAM_PAN_MIN))
30      cam_pan_c = RadOfDeg(CAM_PAN_MIN);
31  }
34  float pan_diff = cam_pan_c - RadOfDeg(CAM_PAN_NEUTRAL);
35  if (pan_diff > 0)
36    cam_pan = MAX_PPRZ * (pan_diff / (RadOfDeg(CAM_PAN_MAX - CAM_PAN_NEUTRAL)));
37  else
38    cam_pan = MIN_PPRZ * (pan_diff / (RadOfDeg(CAM_PAN_MIN - CAM_PAN_NEUTRAL)));
40  cam_pan = ((float)RadOfDeg(cam_pan_c - CAM_PAN_MIN))
41              * ((float)MAX_PPRZ / (float)RadOfDeg(CAM_PAN_MAX-CAM_PAN_MIN) );
44  if (cam_pan < MIN_PPRZ)
45    cam_pan = MIN_PPRZ;
46  else if (cam_pan > MAX_PPRZ)
47    cam_pan = MAX_PPRZ;
Listing 1: Code excerpted from a real-world avionic library

To illustrate the concrete problem raised by floating-point computations in program verification settings, consider the code depicted in Listing 1. It is a somewhat reduced version of a real-world example extracted from a critical embedded system.555The original source code is available at http://paparazzi.enac.fr, file sw/airborne/modules/cam_control/cam.c, last checked on November 29, 2013. In order to gain confidence in this code, a test-suite should be created that contains enough test cases to achieve a specified level of coverage. The basic coverage criterion is “all statements”, and prescribes that each statement is reached at least once by at least one test.666There exist more sophisticate and, correspondingly, more challenging coverage criteria, such as the already-mentioned “all decisions” and Modified Condition Decision Coverage (MCDC, see AmmannOH03). For each statement, a set of constraints is defined that encodes the reachability of the statement and then solution is attempted: if one solution is found, then such a solution, projected on the explicit inputs (read parameters) and the implicit inputs (read global variables) of the function, constitutes the input part of a test case; if it is determined that a solution does not exist, then the statement is dead code; if the solution process causes a timeout, then we don’t know. For example, if the CAM_PAN_NEUTRAL is defined to expand to the integer literal 5 (or, for that matter, 45 or many other values), then we can prove that the statements in lines 45 and 47 are unreachable.777All the experiments mentioned in this paper have been conducted using the ECLAIR system. The presence of dead code is not acceptable for several industry standards such as MISRA C (MISRA-C-2012), MISRA C++ (MISRA-CPP-2008), and JSF C++ (JSF-CPP-2005).

Another application of the same technology is the proof of absence of run-time anomalies, such as overflows or the unwanted generation of infinities. For each operation possibly leading to such an anomaly, a constraint system is set up that encodes the conditions under which the anomaly takes place. A solution is then searched: if it is found, then we have the proof that the code is unsafe; if it can be determined that no solution exists, then we know the code is safe; otherwise we don’t know. For the code of Listing 1, if the CAM_PAN_NEUTRAL is defined 5 or 45, then we can prove that no run-time anomaly is possible, whatever is the value of variable cam_pan_c when the function is invoked.

Now, let us take another point of view and consider that the macro CAM_PAN_NEUTRAL is not defined in the same file, as it is a configuration parameter. Its definition is (partially) validated by means of preprocessor directives as shown in the listing at lines 13–20: these directives enough to protect against dangerous definitions of CAM_PAN_NEUTRAL? We can provide an answer to this question by treating CAM_PAN_NEUTRAL as a variable of any type that is compatible with its uses in the code. This way we discover that, if CAM_PAN_NEUTRAL is defined to expand to, e.g., -2147483558, then we will have an overflow in line 36 on a 32-bit machine. Most compilers will catch this particular mistake, but this will not be the case if someone, someday, defines CAM_PAN_NEUTRAL as, e.g., +0x1ca5dc14c57550.p81 (roughly ): then in line 34 an infinity will be generated, something that in the aviation and other industries is unacceptable. One might also wonder whether one can define CAM_PAN_NEUTRAL as a double precision floating-point literal so that the denominator of divisions in lines 36 and 38 can be so small to cause an overflow: constraint solving over floating-point numbers is able to answer negatively to this question.

A promising approach to improve the filtering capabilities of constraints over floating-point variables consists in using some peculiar numerical properties of floating-point numbers. For linear constraints, this led to a relaxation technique where floating-point numbers and constraints are converted into constraints over the reals by using linear programming approaches (BelaidMR12). For interval-based consistency approaches, MarreM10 identified a property of the representation of floating-point numbers and proposed to exploit it in filtering algorithms for addition and subtraction constraints. CarlierG11 proposed a reformulation of the Marre-Michel property in terms of “filtering by maximum ULP” (Units in the Last Place) that is generalizable to multiplication and division constraints.

BagnaraCGG13ICST addressed the question of whether the Marre-Michel property can be useful for the automatic solution of realistic test input generation problems: they sketched (without proofs) a reformulation and correction of the filtering algorithm proposed in (MarreM10), along with a uniform framework that generalizes the property identified by Marre and Michel to the case of multiplication and division. Most importantly, (BagnaraCGG13ICST) presented the implementation of filtering by maximum ULP in FPSE and some of its critical design choices, and an experimental evaluation on constraint systems that have been extracted from programs engaging into intensive floating-point computations. These results show that the Marre-Michel property and its generalization are effective, practical properties for solving constraints over the floats with an acceptable overhead. The experiments reported in (BagnaraCGG13ICST) showed that improvement of filtering procedures with these techniques brings speedups of the overall constraint solving process that can be substantial (we have observed up to an order of magnitude); in the cases where such techniques do not allow significant extra-pruning, the slowdowns are always very modest (up to a few percent on the overall solution time).

The present paper is, on the one hand, the theoretical counterpart of (BagnaraCGG13ICST) in that all the results are thoroughly proved; on the other hand, this paper generalizes and extends (BagnaraCGG13ICST) as far as the handling of subnormals and floating-point division are concerned. More precisely, the contributions of the paper are:

  1. a uniform framework for filtering by maximum ULP is thoroughly defined and justified;

  2. the framework is general enough to encompass all floating-point arithmetic operations and subnormals (the latter are not treated in (BagnaraCGG13ICST));

  3. a second indirect projection by maximum ULP for division (not present in any previous work);

  4. all algorithms only use floating-point machine arithmetic operations on the same formats used by the analyzed computations.

The plan of the paper is as follows. Next section presents the IEEE 754 standard of binary floating-point numbers and introduces the notions and notations used throughout the paper. Section Document recalls the basic principles of interval-based consistency techniques over floating-point variables and constraints. Section LABEL:sec:max-ulp presents our generalization of the Marre-Michel property along with a precise definition and motivation of all the required algorithms. Section LABEL:sec:discussion discusses related work. Section LABEL:sec:conclusion concludes the main body of the paper. The most technical proofs are available in the Appendix.

In this section we recall some preliminary concepts and introduce the used notation.

This section recalls the arithmetic model specified by the IEEE 754 standard for binary floating-point arithmetic (IEEE-754-2008). Note that, although the IEEE 754 standard also specifies formats and methods for decimal floating-point arithmetic, in this paper we only deal with binary floating-point arithmetic.

IEEE 754 binary floating-point formats are uniquely identified by quantities: , the number of significant digits (precision); , the maximum exponent; , the minimum exponent.888Note that, although the IEEE 754 formats have , we never use this property and decided to keep the extra-generality, which might be useful to accommodate other formats. The single precision format has and , the double precision format has and (IEEE 754 also defines extended precision formats). A finite, non-zero IEEE 754 floating-point number has the form where is the sign bit, is the hidden bit, is the -bit significand and the exponent is also denoted by or . Hence the number is positive when and negative when . is termed “hidden bit” because in the binary interchange format encodings it is not explicitly represented, its value being encoded in the exponent (IEEE-754-2008).

Each format defines several classes of numbers: normal numbers, subnormal numbers, signed zeros, infinities and NaNs (Not a Number). The smallest positive normal floating-point number is and the largest is ; normal numbers have the hidden bit . The non-zero floating-point numbers whose absolute value is less than are called subnormals: they always have exponent equal to and fewer than significant digits as their hidden bit is . Every finite floating-point number is an integral multiple of the smallest subnormal . There are two infinities, denoted by and , and two signed zeros, denoted by and : they allow some algebraic properties to be maintained (Goldberg91).999Examples of such properties are and for . NaNs are used to represent the results of invalid computations such as a division of two infinities or a subtraction of infinities with the same sign: they allow the program execution to continue without being halted by an exception.

IEEE 754 defines five rounding directions: toward negative infinity (roundTowardNegative or, briefly, down), toward positive infinity (roundTowardPositive, a.k.a. up), toward zero (roundTowardZero, a.k.a. chop) and toward the nearest representable value (a.k.a. near); the latter comes in two flavors that depend on different tie-break rules for numbers exactly halfway between two representable numbers: roundTiesToEven (a.k.a. tail-to-even) or roundTiesToAway (a.k.a. tail-to-away) in which values with even significand or values away from zero are preferred, respectively. This paper is only concerned with roundTiesToEven, which is, by far, the most widely used. The roundTiesToEven value of a real number will be denoted by .

The most important requirement of IEEE 754 arithmetic is the accuracy of floating-point computations: add, subtract, multiply, divide, square root, remainder, conversion and comparison operations must deliver to their destination the exact result rounded as per the rounding mode in effect and the format of the destination. It is said that these operations are “correctly rounded.”

The accuracy requirement of IEEE 754 can still surprise the average programmer: for example the single precision, round-to-nearest addition of and (both numbers can be exactly represented) gives , i.e., the second operand is absorbed. The maximum error committed by representing a real number with a floating-point number under some rounding mode can be expressed in terms of the function (Muller05). Its value on is about for the single precision format.

The set of real numbers is denoted by while denotes a subset of the binary floating-point numbers, defined from a given IEEE 754 format, which includes the infinities and , the signed zeros and , but neither subnormal numbers nor NaNs. Subnormals are introduced in the set . In some cases, the exposition can be much simplified by allowing the of to be , i.e., by considering an idealized set of floats where the exponent is unbounded. Among the advantages is the fact that subnormals in can be represented as normal floating-point numbers in . Given a set of floating-point numbers , denotes the “non-negative” subset of , i.e., with .

For a finite, non-zero floating-point number , we will write (resp., ) to signify that the least significant digit of ’s significand is  (resp., ).

When the format is clear from the context, a real decimal constant (such as ) denotes the corresponding roundTiesToEven floating-point value (i.e., for ).

Henceforth, for , (resp., ) denotes the smallest (resp., greatest) floating-point number strictly greater (resp., smaller) than with respect to the considered IEEE 754 format. Of course, we have and .

Binary arithmetic operations over the floats will be denoted by , , and , corresponding to , , and over the reals, respectively. According to IEEE 754, they are defined, under roundTiesToEven, by

As IEEE 754 floating-point numbers are closed under negation, we denote the negation of simply by . Note that negation is a bijection. The symbol denotes any of , , or . A floating-point variable x is associated with an interval of possible floating-point values; we will write , where and denote the smallest and greatest value of the interval, and either or . Note that is not an interval, whereas is the interval denoting the set of floating-point numbers .

In this section, we briefly recall the basic principles of interval-based consistency techniques over floating-point variables and constraints.

Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
Add comment
Loading ...
This is a comment super asjknd jkasnjk adsnkj
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test description