LifeJacket: Verifying precise floating-point optimizations in LLVM
Abstract
Optimizing floating-point arithmetic is vital because it is ubiquitous, costly, and used in compute-heavy workloads. Implementing precise optimizations correctly, however, is difficult, since developers must account for all the esoteric properties of floating-point arithmetic to ensure that their transformations do not alter the output of a program. Manual reasoning is error-prone and stifles the incorporation of new optimizations.
We present an approach to automate reasoning about floating-point optimizations using satisfiability modulo theories (SMT) solvers. We implement the approach in LifeJacket, a system for automatically verifying precise floating-point optimizations for the LLVM assembly language. We have used LifeJacket to verify 43 LLVM optimizations and to discover eight incorrect ones, including three previously unreported problems. LifeJacket is an open-source extension of the Alive system for optimization verification.
Andres Nötzli and Fraser Brown, Stanford University, {noetzli,mlfbrown}@stanford.edu
1 Introduction
In this paper, we present LifeJacket, a system for automatically verifying floating-point optimizations. Floating-point arithmetic is ubiquitous—modern hardware architectures natively support it and programming languages treat it as a canonical representation of real numbers—but writing correct floating-point programs is difficult. Optimizing these programs is even more difficult. Unfortunately, despite hardware support, floating-point computations are still expensive, so avoiding optimization is undesirable.
Reasoning about floating-point optimizations and programs is difficult because of floating-point arithmetic’s unintuitive semantics. Floating-point arithmetic is inherently imprecise and lossy, and programmers must account for rounding, signed zeros, special values, and non-associativity [goldberg1991every]. Before standardization, a wide range of incompatible floating-point hardware with varying support for range, precision, and rounding existed. These implementations were not only incompatible but also had undesirable properties, such as numbers that were not equal to zero for comparisons but were treated as zeros for multiplication and division [severance1998interview]. The IEEE 754-1985 standard and its stricter IEEE 754-2008 successor were carefully designed to avoid many of these pitfalls, and they were designed (contrary to popular opinion, perhaps) for non-expert users. Despite these advances, program correctness and reproducibility still rests on a fragile interplay between developers, programming languages, compilers, and hardware implementations.
Compiler optimizations that alter the semantics of programs, even in subtle ways, can confuse users, make problems hard to debug, and cause cascading issues. IEEE 754-2008 acknowledges this by recommending that language standards and implementations provide means to generate reproducible results for programs, independent of optimizations. In practice, many transformations that are valid for real numbers change the precision of floating-point expressions. As a result, compilers optimizing floating-point programs face the dilemma of choosing between speed and reproducibility. They often address this dilemma by dividing floating-point optimizations into two groups, precise and imprecise optimizations, where imprecise optimizations are optional (e.g. the -ffast-math flag in clang). While precise optimizations always produce the same result, imprecise ones produce reasonable results on common inputs (e.g. not for special values) but are arbitrarily bad in the general case. To implement precise optimizations, developers have to reason about all edge cases of floating-point arithmetic, making it challenging to avoid bugs.
To illustrate the challenge of developing floating-point optimizations, Figure 1 shows an example of an invalid transformation implemented in LLVM 3.7.1. We discuss the specification language in more detail in Section 3.2 but, at a high level, the transformation simplifies 0.0 − (0.0 − %x) to %x, an optimization that is correct in the realm of real numbers. Because floating-point numbers distinguish between negative and positive zero, however, the optimization is not valid if %x = −0.0, because the original code returns +0.0 and the optimized code returns −0.0. While the zero’s sign may be insignificant for many applications, the unexpected sign change may cause a ripple effect. For example, the reciprocal of zero is defined, so 1/+0.0 = +∞ and 1/−0.0 = −∞.
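Because Python floats are IEEE 754 doubles, the sign-flip counterexample can be reproduced concretely. A minimal sketch (the source/target helpers are our own naming for the original and optimized code):

```python
import math

def source(x):
    # %a = fsub 0.0, %x ; %r = fsub 0.0, %a  (original code)
    a = 0.0 - x
    return 0.0 - a

def target(x):
    # %r = %x  (the "optimized" code)
    return x

x = -0.0
print(math.copysign(1.0, source(x)))     # 1.0: source returns +0.0
print(math.copysign(1.0, target(x)))     # -1.0: target returns -0.0
print(1.0 / source(x), 1.0 / target(x))  # inf -inf: the ripple effect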
Since reasoning manually about floating-point operations and optimizations is difficult, we argue that automated reasoning can help ensure correct optimizations. The goal of LifeJacket is to allow LLVM developers to automatically verify precise floating-point optimizations. Our work focuses on precise optimizations because they are both more amenable to verification and arguably harder to get right. LifeJacket builds on Alive [lopes2015provably], a tool for verifying LLVM optimizations, extending it with floating-point support.
Our contributions are as follows:

We describe the background for verifying precise floating-point optimizations in LLVM and propose an approach using SMT solvers.

We implemented the approach in LifeJacket, an open-source fork of Alive that adds support for floating-point types, floating-point instructions, floating-point predicates, and certain fast-math flags.

We validated the approach by verifying 43 optimizations. LifeJacket finds 8 incorrect optimizations, including three previously unreported problems in LLVM 3.7.1.
In addition to the core contributions, our work also led to the discovery of two issues related to floating-point support in Z3 [de2008z3], the SMT solver used by LifeJacket.
2 Related Work
Alive is a system that verifies LLVM peephole optimizations. LifeJacket is a fork of this project that extends it with support for floating-point arithmetic. We are not the only ones interested in verifying floating-point optimizations; close to the submission deadline, we found that one of the Alive authors had independently begun a reimplementation of Alive that seems to include support for floating-point arithmetic (https://github.com/rutgers-apl/alive-nj).
Our work intersects with the areas of compiler correctness, optimization correctness, and analyzing floating-point expressions.
Research on compiler correctness has addressed floating-point arithmetic and floating-point optimizations. CompCert, a formally verified compiler, supports IEEE 754-2008 floating-point types and implements two floating-point optimizations [boldo2015verified]. In CompCert, developers use Coq to prove optimizations correct, while LifeJacket proves optimization correctness automatically.
Regarding optimization correctness, researchers have explored both the consequences of existing optimizations and techniques for generating new optimizations. Recent work has discussed consequences of unexpected optimizations [wang2013towards]. In terms of new optimizations, STOKE [schkufza2014stochastic] is a stochastic optimizer that supports floating-point arithmetic and verifies instances of floating-point optimizations with random testing. Souper [souper] discovers new LLVM peephole optimizations using an SMT solver. Similarly, Optgen generates peephole optimizations and verifies them using an SMT solver [buchwald2015optgen]. All of these approaches are concerned with the correctness of new optimizations, while our work focuses on existing ones. Vellvm, a framework for verifying LLVM optimizations and transformations using Coq, also operates on existing transformations but does not do automatic reasoning.
Researchers have explored debugging floating-point accuracy [chiang2014efficient] and improving the accuracy of floating-point expressions [panchekha2015automatically]. These efforts are more closely related to imprecise optimizations and provide techniques that could be used to analyze them. Z3’s support for reasoning about floating-point arithmetic relies on a model construction procedure instead of naive bit-blasting [zeljic2014approximations].
3 Background
Table 1: Fast-math flags supported by LifeJacket. In the formulas, a and b are the instruction’s arguments, r is its result, and x is a fresh, unconstrained variable modeling undef.

Flag  Description  Formula

nnan  Assume arguments and result are not NaN. Result undefined over NaNs.  ite (or (isNaN a) (isNaN b) (isNaN r)) (x (_ FP <ebits> <sbits>)) r

ninf  Assume arguments and result are not ±∞. Result undefined over ±∞.  ite (or (isInf a) (isInf b) (isInf r)) (x (_ FP <ebits> <sbits>)) r

nsz  Allow optimizations to treat the sign of a zero argument or result as insignificant.  or (a = b) (and (isZero a) (isZero b))
Our work verifies LLVM floating-point optimizations. These optimizations take place on the LLVM assembly language, a human-readable, low-level language. The language serves as a common representation for optimizations, transformations, and analyses. Front ends (like clang) output the language, and, later, back ends use it to generate machine code for different architectures.
Our focus is verifying peephole optimizations implemented in LLVM’s InstCombine pass. This pass replaces small subtrees in the program tree without changing the control-flow graph. Alive already verifies some InstCombine optimizations, but it does not support optimizations involving floating-point arithmetic. Instead of building LifeJacket from scratch, we extend Alive with the machinery to verify floating-point optimizations. To give the necessary context for discussing our implementation in Section 4, we describe LLVM’s floating-point types and instructions and give a brief overview of Alive.
3.1 Floating-point arithmetic in LLVM
In the following, we discuss LLVM’s semantics for floating-point types and instructions. The information is largely based on the LLVM Language Reference Manual for LLVM 3.7.1 [llvmlangref] and the IEEE 754-2008 standard. For completeness, we note that the language reference does not explicitly state that LLVM floating-point arithmetic is based on IEEE 754. However, the language reference refers to the IEEE standard multiple times, and LLVM’s floating-point software implementation, APFloat, is explicitly based on the standard.
Floating-point types
LLVM defines six different floating-point types with bit widths ranging from 16 to 128 bits. Floating-point values are stored in the IEEE binary interchange format, which encodes them in three parts: the sign s, the exponent e, and the significand m. The value of a normal floating-point number is given by (−1)^s · 1.m · 2^(e−b), where b = 2^(k−1) − 1 is the exponent bias and k is the number of bits in the exponent. The range of the exponents for normal floating-point numbers is [−b + 1, b]. Exponents outside of this range are used to encode special values: subnormal numbers, Not-a-Number values (NaNs), and infinities.
Floating-point zeros are signed, meaning that +0.0 and −0.0 are distinct. While most operations ignore the sign of a zero, the sign has an observable effect in some situations: a division by zero (generally) returns +∞ or −∞ depending on the zero’s sign, for example. As a consequence, a = b does not imply 1/a = 1/b. If a = +0.0 and b = −0.0, then a = b is true, since floating-point comparison treats the two zeros as equal. On the other hand, 1/a = 1/b is false, since +∞ ≠ −∞.
Infinities (±∞) are used to represent an overflow or a division by zero. They are encoded by setting the exponent field to all ones and m = 0. Subnormal numbers, on the other hand, are numbers with exponents below the minimum exponent; normal floating-point numbers have an implicit leading 1 in the significand that prevents them from representing these numbers. The IEEE standard defines the value of a subnormal number as (−1)^s · 0.m · 2^(−b+1), where b = 2^(k−1) − 1 is the exponent bias.
NaNs are used to represent the result of an invalid operation (such as 0/0) and are encoded with an all-ones exponent field and a nonzero significand m. There are two types of NaNs: quiet NaNs (qNaNs) and signalling NaNs (sNaNs). The first bit in the significand determines the type of NaN (1 in the case of a qNaN) and the remaining bits can be used to encode debug information. Operations generally propagate qNaNs and quiet sNaNs: If one of the operands is a qNaN, the result is a qNaN; if an operand is an sNaN, it is quieted by setting the first bit to 1.
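These encodings can be made concrete by decoding the bits of a binary64 value. A sketch using Python’s struct module (binary64 only; the field layout is 1 sign bit, 11 exponent bits, 52 significand bits):

```python
import struct

def decode_double(x):
    # Split a binary64 value into its sign, biased exponent field,
    # and trailing significand (1 + 11 + 52 bits), then classify it.
    bits = struct.unpack('<Q', struct.pack('<d', x))[0]
    s = bits >> 63
    e = (bits >> 52) & 0x7FF
    m = bits & ((1 << 52) - 1)
    if e == 0x7FF:  # all-ones exponent: infinities and NaNs
        kind = 'inf' if m == 0 else ('qnan' if m >> 51 else 'snan')
    elif e == 0:    # all-zeros exponent: zeros and subnormals
        kind = 'zero' if m == 0 else 'subnormal'
    else:
        kind = 'normal'
    return s, e, m, kind

print(decode_double(-0.0))             # (1, 0, 0, 'zero')
print(decode_double(float('inf')))     # (0, 2047, 0, 'inf')
print(decode_double(float('nan'))[3])  # 'qnan': first significand bit set
print(decode_double(5e-324)[3])        # 'subnormal': smallest positive double
```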
Floating-point exceptions occur in situations like division by zero or computation involving an sNaN. By default, floating-point exceptions do not alter control flow but raise a status flag and return a default result (e.g. a qNaN).
Floating-point instructions
In its assembly language, LLVM defines several instructions for binary floating-point operations (fadd, fsub, fmul, fdiv, …), conversion instructions (fptrunc, fpext, fptoui, uitofp, …), and allows floating-point arguments in other operations (e.g. select). We assert that floating-point instructions cannot generate poison values (values that cause undefined behavior for instructions that depend on them) or result in undefined behavior. The documentation is not entirely clear, but our interpretation is that undefined behavior does not occur in the absence of sNaNs and that sNaNs are not fully supported.
While IEEE 754-2008 defines different rounding modes, LLVM does not yet allow users to specify them. As a consequence, the rounding performed by fptrunc (casting a floating-point value to a smaller floating-point type) is undefined for inexact results.
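The precision loss that makes this rounding matter is easy to observe by round-tripping a double through half precision. A sketch using Python’s binary16 codec (our illustration; the codec’s rounding is one legal choice for the behavior fptrunc leaves unspecified):

```python
import struct

def to_half_and_back(x):
    # Round a binary64 value to half precision (binary16) and back.
    # Python's codec rounds to nearest, ties to even, which is one
    # legal choice for the rounding that fptrunc leaves unspecified.
    return struct.unpack('<e', struct.pack('<e', x))[0]

print(to_half_and_back(1.0))  # 1.0 (exactly representable in binary16)
print(to_half_and_back(1.1))  # 1.099609375 (inexact, rounded)
```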
Fast-math flags
Some programs either do not depend on the exact semantics of special floating-point values or do not expect special values (such as NaN) to occur. For these cases, LLVM binary operators can carry fast-math flags, which allow LLVM to do additional optimizations with the knowledge that special values will not occur. Table 1 summarizes the fast-math flags that LifeJacket supports. There are two additional flags, arcp (allows replacing arguments of a division with the reciprocal) and fast (allows imprecise optimizations), that we do not support.
Discussion
The properties of floating-point arithmetic discussed in this section hint at how difficult it is to manually reason about floating-point optimizations. The floating-point standard is complex, so compilers do not always follow it completely—as we mentioned earlier, LLVM does not currently support different rounding modes (more details: http://lists.llvm.org/pipermail/llvm-dev/2016-February/094869.html). Similarly, it does not yet support access to the floating-point environment, which makes reliable checks for floating-point exceptions in clang impossible, for example. This runs counter to the IEEE standard, which defines reproducibility as including “invalid operation,” “division by zero,” and “overflow” exceptions.
3.2 Verifying transformations with Alive
Alive is a tool that verifies peephole optimizations on LLVM’s intermediate representation; these optimizations are expressed (as input) in a domain-specific language. At a high level, verifying an optimization with Alive takes the following steps:

The user specifies a new or an existing LLVM optimization using the Alive language.

Alive translates the optimization into a series of SMT queries that express the equivalence of the source and the target.

Alive uses Z3, an SMT solver, to check whether any combination of values makes the source and target disagree. If the optimization is incorrect, Alive returns a counterexample that breaks the optimization.
Alive specializes in peephole optimizations that are highly local and do not alter the controlflow graph of a program. This type of optimization is performed by the LLVM InstCombine pass in lib/Transforms/InstCombine and InstructionSimplify in lib/Analysis.
Alive can also generate code for an optimizer pass that performs all of the verified optimizations. We do not discuss this feature further since LifeJacket does not support it for floatingpoint optimizations. In the following, we discuss the Alive language and the role of SMT solvers in proving optimization correctness.
Specifying transformations with the Alive language
In the domain-specific Alive language, each transformation consists of a list of preconditions, a source template, and a target template. Alive verifies whether it is safe to replace the instructions in the source template with the instructions in the target template, given that the preconditions hold. Figure 1 is an example of a transformation in the Alive language. This transformation has no preconditions, so it always applies. The instructions above the “=>” delimiter are the source template; the target template is below.
Preconditions are logical expressions enforced by the compiler at compile time, and Alive takes them for granted. The precondition isNormal(%x), for example, expresses the fact that an optimization only holds when %x is a normal floating-point value.
Alive interprets the instructions in the sources and targets as expression trees, so the order of instructions does not matter, only the dependencies. Verifying the equivalence of the source and the target is done on the root of the trees. The arguments for instructions are either inputs (e.g. %x), constant expressions (e.g. C), or immediate values (e.g. 0.0). Inputs model registers, constant expressions correspond to computations that LLVM performs at compiletime, and immediate values are values known at verification time. Constant expressions consist of constant functions and compiletime constants. Inputs and constant expressions can be subjects for predicates in the precondition.
In contrast to actual LLVM code, the Alive language does not require type information for instructions and inputs. Instead, it uses the types expected by instructions to restrict the types and bit widths of values. Then, it issues an SMT query that encodes these constraints to infer all possible types and sizes of registers, constants, and values. This mirrors the fact that LLVM optimizations often apply to multiple bit widths and makes specifying optimizations less repetitive. Alive instantiates the source and target templates with the possible type and size combinations and verifies each instance.
Undefined values (undef) in LLVM represent input values of arbitrary bit patterns when used and may be of any type. For each undef value in the target template, Alive has to verify that any value can be produced, and for each undef value in the source, Alive may assume any convenient value. Figure 2 is a known incorrect optimization in LLVM that LifeJacket confirms and that illustrates this concept: The source template cannot produce all possible bit patterns, so it cannot be replaced with undef (discussion of this optimization: https://groups.google.com/d/topic/llvm-dev/iRb0gxroT9o/discussion).
Verifying transformations with SMT solvers
Alive translates the source and target templates into SMT formulas. For each possible combination of variable types in the templates, it creates SMT formulas for definedness constraints, poison-free constraints, and the execution values for the source and target. Alive checks the definedness and poison-free constraints of the source and target for consistency. These checks are not directly relevant to floating-point arithmetic, so we do not discuss them further. Instead, we deal more directly with the execution values of the source and target.
An optimization is only correct if the source and the target always produce the same value. To check this property, Alive asks an SMT solver to verify that v_src ≠ v_tgt is unsatisfiable—that there is no assignment that can make the formula true. If there is, the optimization is incorrect: there is an assignment for which the source value is different from the target value. When Alive encounters an incorrect optimization, it uses the output of the SMT solver to return a counterexample in the form of input and constant assignments that lead to different source and target values.
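Half precision is small enough that this quantified query can be imitated by brute force. The following toy checker (our own sketch, not how Alive or Z3 work internally) exhaustively searches all 2^16 binary16 inputs for an assignment on which source and target disagree:

```python
import math
import struct

def half(bits):
    # Interpret a 16-bit pattern as an IEEE binary16 value
    # (returned as a Python float, which represents it exactly).
    return struct.unpack('<e', struct.pack('<H', bits))[0]

def find_counterexample(source, target):
    # Exhaustively check all 2^16 half-precision inputs: a toy
    # stand-in for the SMT query "exists x. source(x) != target(x)".
    for bits in range(1 << 16):
        x = half(bits)
        s, t = source(x), target(x)
        agree = (math.isnan(s) and math.isnan(t)) or (
            s == t and math.copysign(1.0, s) == math.copysign(1.0, t))
        if not agree:
            return x
    return None  # no counterexample in this toy domain

# The broken transformation from Figure 1: 0.0 - (0.0 - x) => x
cex = find_counterexample(lambda x: 0.0 - (0.0 - x), lambda x: x)
print(cex, math.copysign(1.0, cex))  # -0.0 -1.0
```

For real 32- and 64-bit types this space is far too large to enumerate, which is exactly why Alive hands the problem to an SMT solver.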
Ultimately, Alive relies on Z3 to determine whether an optimization is correct (by answering the SMT queries). LifeJacket would have been impossible without Z3’s floating-point support, which was added less than a year ago, in version 4.4.0, by implementing the SMT-LIB standard for floating-point arithmetic [smtFPA2010].
4 Implementation
Our implementation extends Alive in four major ways: It adds support for floating-point types, floating-point instructions, floating-point predicates, and fast-math flags. In the following, we describe our work in those areas, briefly comment on our experience with floating-point support in Z3, and conclude with a discussion of the limitations of the current version of LifeJacket.
Floating-point types
LifeJacket implements support for the half, single, and double floating-point types. Alive itself provides support for integer and pointer types of arbitrary bit widths up to 64 bits. Following the philosophy of the original implementation, we do not require users to explicitly annotate floating-point types. Instead, we use a logical disjunction (in the SMT formula for type constraints) to limit floating-point types to bit widths of 16, 32, or 64 bits. Then, we use Alive’s existing mechanisms to determine all possible type combinations for each optimization (as discussed in Section 3.2).
Adding a new type required us to relax some assumptions, e.g. that the arguments of select are integers. Additionally, we modified the parser to support floating-point immediate values.
Floating-point predicates and constant functions
LifeJacket adds precondition predicates and constant functions related to floating-point arithmetic.
Recall that preconditions are logical formulas that describe facts that must be true in order to perform an optimization; they are fulfilled by LLVM and assumed by Alive. In the context of floating-point optimizations, preconditions may include predicates about the type of a floating-point number (e.g. isNormal(%x) to make sure that %x is a normal floating-point number) or checks to ensure that conversions are lossless. We discuss more predicates in the following paragraphs.
Constant functions mirror computation performed by LLVM at compile time and are evaluated symbolically by Alive at verification time. For example, the constant function fptosi(C) (not to be confused with the instruction) converts a floating-point number to a signed integer, corresponding to a conversion LLVM does at compile time. Constant expressions (expressions that contain constant functions) can be assigned to registers in the target template, mirroring the common strategy of optimizing operations by partially evaluating them at compile time.
In contrast to Alive, LifeJacket supports precondition predicates that refer to constant expressions in target templates. For example, some optimizations have restrictions about precise conversions, and we express those restrictions in the precondition. If the target converts a floating-point constant to an integer with %c = fptosi(C), then the precondition can ensure that the conversion is lossless by including sitofp(%c) == C (which guarantees that converting the number back and forth results in the original number). If the precondition did not refer to %c in the target and instead imposed sitofp(fptosi(C)) == C, it would not restrict the bit width of %c, so %c could be too narrow to represent the number.
Floating-point instructions
Our implementation supports binary floating-point instructions (fadd, fsub, fmul, fdiv, and frem), conversions involving floating-point numbers (fptrunc, fpext, fptoui, fptosi, uitofp, sitofp), the fabs intrinsic, and floating-point comparisons (fcmp). Most of these instructions directly correspond to operations in the SMT-LIB floating-point standard, so translating them to SMT formulas is straightforward. Next, we discuss our support for frem, fcmp, conversions, and the equivalence check for floating-point optimizations.
The frem instruction does not correspond to remainder as defined by IEEE 754 but rather to fmod in the C POSIX library, so translating it to an SMT formula involves multiple operations. Both fmod and remainder calculate x − n·y (where n is x/y rounded to an integer), but fmod rounds the quotient toward zero whereas remainder rounds it to the nearest value, with ties to even. Figure 3 shows how the C standard defines fmod in terms of remainder for doubles [c11, §F.10.7.1] and the corresponding SMT formula that LifeJacket implements. The formula uses a fixed rounding mode because the rounding mode of the environment does not affect fmod.
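The reduction can be sketched in Python, whose math module exposes both operations: math.remainder implements the IEEE remainder and math.fmod the POSIX fmod. This mirrors the shape of the C-standard definition; it is our illustration, not LifeJacket’s actual SMT formula:

```python
import math

def fmod_via_remainder(x, y):
    # fmod expressed with IEEE remainder: remainder rounds the
    # quotient to nearest (ties to even), so its result may be
    # negative; shifting it back by |y| and restoring the sign of x
    # gives fmod's toward-zero semantics.
    r = math.remainder(math.fabs(x), math.fabs(y))
    if r < 0.0:
        r += math.fabs(y)
    return math.copysign(r, x)

for x, y in [(5.1, 3.0), (-5.1, 3.0), (9.2, -3.7), (-6.0, 3.0)]:
    assert fmod_via_remainder(x, y) == math.fmod(x, y)
print(fmod_via_remainder(5.1, 3.0) == math.fmod(5.1, 3.0))  # True
```

Both operations are exact in IEEE arithmetic, so the two sides agree bit for bit, including the sign of zero results.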
The fcmp instruction compares two floating-point values. In addition to the two floating-point values, it expects a third operand, the condition code, which determines the type of comparison. There are two broad classes of comparison: ordered comparisons can only be true if neither input is NaN, and unordered comparisons are true if either input is NaN. LLVM supports an ordered and an unordered version of the usual comparisons such as equality, inequality, greater-than, etc. Additionally, there are condition codes that just check whether both inputs are not NaN (ord) or whether any input is NaN (uno).
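The ordered/unordered split can be modeled directly over Python floats. A small sketch covering a few condition codes (the fcmp helper and its string codes are our own model):

```python
import math

def fcmp(cond, a, b):
    # Model a few fcmp condition codes over Python floats (binary64).
    # Ordered codes ('o...') are false if either input is NaN;
    # unordered codes ('u...') are true if either input is NaN.
    unordered = math.isnan(a) or math.isnan(b)
    if cond == 'oeq': return not unordered and a == b
    if cond == 'ueq': return unordered or a == b
    if cond == 'olt': return not unordered and a < b
    if cond == 'ult': return unordered or a < b
    if cond == 'ord': return not unordered
    if cond == 'uno': return unordered
    raise ValueError(cond)

nan = float('nan')
print(fcmp('oeq', nan, nan))  # False: ordered comparison, NaN input
print(fcmp('ueq', nan, 1.0))  # True: unordered comparison, NaN input
print(fcmp('olt', 1.0, 2.0))  # True
```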
Optimizations involving comparisons often apply to multiple condition codes. To allow users to efficiently describe such optimizations, LifeJacket supports predicates in the precondition that describe the applicable set of condition codes. For example, there are predicates for constraining the set of condition codes to either ordered or unordered conditions. We also support predicates that express a relationship between multiple condition codes. This is useful, for example, to describe an optimization that performs a multiplication by negative one on both sides of a comparison: To replace the comparison C1 between −%x and C with the comparison C2 between %x and −C, we use the swap(C1, C2) predicate.
When no sensible conversion between floating-point values and integers is possible, LLVM defaults to returning undef. For conversions from floating-point to integer values (signed or unsigned), LifeJacket checks whether the (symbolic) floating-point value is NaN, ±∞, too small, or too large and returns undef if necessary. Conversions from integer to floating-point values similarly return undef for values that are too small or too large.
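These range checks can be sketched as a concrete Python model of fptosi, with None standing in for undef (our own illustration; fptosi truncates toward zero):

```python
import math

def fptosi(x, bits):
    # Model of fptosi: convert toward zero to a signed 'bits'-wide
    # integer; return None as a stand-in for undef when no sensible
    # conversion exists (NaN, infinities, out-of-range values).
    if math.isnan(x) or math.isinf(x):
        return None
    n = math.trunc(x)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return n if lo <= n <= hi else None

print(fptosi(-3.9, 8))           # -3: rounds toward zero
print(fptosi(300.0, 8))          # None: exceeds the int8 range
print(fptosi(float('nan'), 32))  # None: no sensible conversion
```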
Recall that LifeJacket must determine the unsatisfiability of v_src ≠ v_tgt to verify optimizations. The SMT-LIB standard defines two equality operators for floating-point numbers, one implementing bitwise equality and one implementing the IEEE equality operator. The latter treats signed zeros as equal and NaNs as different, so using it to verify optimizations would not work: it would accept optimizations that produce different zeros and reject source-target pairs that both produce NaN. Bitwise equality works because SMT-LIB uses a single NaN value (recall that there are multiple bit patterns that correspond to NaN). While this is convenient, it also means that we cannot model different NaNs. We discuss the implications later.
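The difference between the two equality operators is easy to demonstrate on Python doubles (both helper names are our own; bitwise_eq compares binary64 bit patterns):

```python
import struct

def ieee_eq(a, b):
    # IEEE equality: +0.0 == -0.0, NaN != NaN.
    return a == b

def bitwise_eq(a, b):
    # Bitwise equality on the binary64 representation.
    return struct.pack('<d', a) == struct.pack('<d', b)

nan = float('nan')
print(ieee_eq(0.0, -0.0), bitwise_eq(0.0, -0.0))  # True False
print(ieee_eq(nan, nan), bitwise_eq(nan, nan))    # False True
```

Note that, unlike SMT-LIB’s single NaN, real doubles have many NaN bit patterns, so a bitwise check on hardware values would distinguish them, which is exactly the modeling gap discussed above.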
Fast-math flags
LifeJacket currently supports three of the five fast-math flags that LLVM implements: nnan, ninf, and nsz.
LifeJacket handles the nnan and ninf flags in a similar way, by modifying the SMT formula for the instruction on which the flag appears. As Table 1 shows, if one of the instruction’s arguments or its result is NaN or ±∞, respectively, the formula returns a fresh unconstrained variable that it treats as an undef value. This is a direct translation of the description in the language reference and works for root and non-root instructions.
The nsz flag is different: Instead of relaxing the requirements for the behavior for certain inputs and results, it states that the sign of a zero value can be ignored. This primarily affects how LifeJacket compares the source and target values: it adds a logical conjunction to the SMT query that states that the source and target values are only different if both are nonzero (shown in Table 1). The flag itself has no effect on zero values at runtime, meaning that it does not affect the computation performed by instructions with the flag. Thus, we do not change the SMT formula for the instruction.
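The nsz agreement check can be sketched as a predicate over concrete doubles (our own model of the Table 1 formula, with matching NaN results also counted as agreement, mirroring the single-NaN equality discussed earlier):

```python
import math

def agree_under_nsz(s, t):
    # Source and target agree under nsz if they are exactly equal
    # (including the sign of zero) or if both are zeros of either
    # sign; matching NaN results also count as agreement.
    if math.isnan(s) and math.isnan(t):
        return True
    both_zero = s == 0.0 and t == 0.0
    exact = s == t and math.copysign(1.0, s) == math.copysign(1.0, t)
    return exact or both_zero

print(agree_under_nsz(0.0, -0.0))  # True: zero signs are ignored
print(agree_under_nsz(1.0, -1.0))  # False
```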
Since the nsz flag has no direct effect on how LLVM does matching, this flag also does not change the significance of the sign of immediate zeros (e.g. +0.0) in the optimization templates. Instead, we mirror how LLVM determines whether an optimization applies. In LLVM, optimizations that match a certain sign of zero do not automatically apply to other zeros when the nsz flag is set. For example, an optimization that applies to fadd x, 0.0 does not automatically apply to fadd nsz x, +0.0. If applicable, developers explicitly match any zero if the nsz flag is set. We mirror this design by implementing an AnyZero(C) predicate, which makes C negative or positive zero.
Limitations
While Section 5 shows that LifeJacket is a useful tool, it does not support all floating-point types or imprecise optimizations, uses a fixed rounding mode, and does not model floating-point exceptions or debug information in NaNs.
Currently, LifeJacket does not support LLVM’s vectors, the two 128-bit floating-point types, or the 80-bit floating-point type. Supporting those would likely not require fundamental changes.
There are many imprecise optimizations in LLVM. These optimizations need a different style of verification because they do not make any guarantees about how much they affect the program output. A possible way to deal with these optimizations would be to verify that they are correct for real numbers and to estimate accuracy changes by randomly sampling inputs, similar to Herbie [panchekha2015automatically].
LifeJacket’s verification ultimately relies on the SMT-LIB standard for floating-point arithmetic. The standard corresponds to IEEE 754-2008, but it only defines a single NaN value and does not distinguish between signalling and quiet NaNs. Thus, our implementation cannot verify whether an operation with NaN operands returns one of the input NaNs, propagating debug information encoded in the NaN, as recommended by the IEEE standard. In practice, LLVM does not attempt to preserve information in NaNs, so this limitation does not affect our ability to verify LLVM optimizations. We do not model floating-point exceptions, either, since LLVM does not currently make guarantees about handling floating-point exceptions. Floating-point exceptions could be verified with separate SMT queries, similar to how Alive verifies definedness.
LifeJacket currently rounds to nearest with ties to even, mirroring the most common rounding mode. Even though LLVM does not yet support different rounding modes, we are planning to add support for them soon.
The limited type and rounding-mode support and the missing floating-point exceptions make our implementation unsound at worst: LifeJacket may label some incorrect optimizations as correct, but optimizations labelled as incorrect are certainly wrong.
Working with Z3
Even though Z3’s implementation of floating-point support is recent, we found it to be an effective tool for the job. Due to the youth of the floating-point support, LifeJacket does not work with the newest release of Z3 because of issues in the implementation and the Python API. During the development of LifeJacket, we reported issues that were fixed quickly and fixed some issues, mostly in the Python API, ourselves. This suggests that LifeJacket is an interesting test case for floating-point support in SMT solvers.
5 Evaluation
To evaluate LifeJacket, we translated 54 optimizations from LLVM 3.7.1 into the Alive language and tried to verify them. We discovered 8 incorrect optimizations and verified 43 optimizations to be correct. In the following, we outline the optimizations that we checked and describe the bugs that we found.
We performed our evaluation on a machine with an Intel i3-4160 CPU and 8 GB RAM, running Ubuntu 15.10. We compiled Z3 commit b66fc4e (full disclaimer: we ran into regression issues with this version, so we verified some optimizations with an older version; this will change for the camera-ready) with GCC 5.2.1, the default compiler, used the qffpbv tactic, and chose a 5-minute timeout for SMT queries. Table 2 summarizes the results for the different source files: AddSub contains optimizations with fadd/fsub at the root, MulDivRem with fmul/fdiv/frem, Compares deals with fcmps, and Simplify contains simple optimizations for all instructions.
Using this process, LifeJacket found 43 out of 54 optimizations to be correct and timed out on four. The AddSub optimization that times out contains a sitofp instruction, and verification is slow for integers with a large bit width. The two MulDivRem optimizations that time out both contain nsz flags and AnyZero predicates; similar optimizations without those features do not time out. In general, fdiv seems to slow down verification, as appears to be the case for the timeout in Simplify. Out of the 8 optimizations that we found to be incorrect, four had been reported before. The bug in Figure 1 had already been fixed in a newer version of LLVM when we discovered it. The rest of the previously reported bugs resemble the example in Figure 2 and are all caused by an unjustified undef in the target. Figure 4 depicts the three previously unreported incorrect optimizations that we reported to the LLVM developers. We discuss these bugs in the next paragraphs.
PR26958 optimizes %a = fsub 0.0, %x; %r = fadd %a, %x to %r = 0.0. The implementation of this optimization requires that the nnan and ninf flags each appear at least once on the source instructions. We translate four variants of this optimization: one where both flags are on fsub, one where both are on fadd, and two where each instruction has one of the flags. As it turns out, it is not enough to have both flags on either of the instructions. For the case where both flags are on fsub, the transformation is invalid if %x is NaN or ±∞. The nnan and ninf flags require the optimized program to retain defined behavior over NaN and ±∞, so %r must be 0.0 even for those inputs (if they resulted in undefined behavior, any result would be correct). If %x is NaN, however, there is no value for %a that would result in %r being 0.0, because NaN added to any other number is NaN.
PR26943 optimizes fmod(x, c ? 0 : C) to fmod(x, C) (select acts like a ternary operator, and frem corresponds to fmod). The implementation of this optimization shares its code with the same optimization for the rem instruction, which deals with integers. For integers, rem %x, 0 results in undefined behavior, so the optimization is valid. The POSIX standard specifies that fmod(x, 0.0) returns NaN, though, so the optimization is incorrect for frem: if the divisor is 0.0, %r must be NaN and not frem %x, C.
PR27036 illustrates the last incorrect optimization that LifeJacket identified. It transforms (float) x + (float) y into (float) (x + y), replacing an fadd instruction with a more efficient integer add. This transformation is invalid, though, since adding two rounded numbers is not equivalent to adding two numbers and rounding their sum. For example, assuming 16-bit floating-point numbers, let %x = 4095 and %y = -17. In the source, %a = sitofp %x cannot store 4095 exactly and stores 4096 instead (4095 lies halfway between the representable values 4094 and 4096, and ties round to even). The target, on the other hand, can represent the result 4078 of the integer addition exactly, so source and target disagree.
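This counterexample can be reproduced with nothing but the Python standard library, since struct supports the IEEE 754 binary16 format (a sketch of the arithmetic only, not of LLVM's transformation code):

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 binary16 value
    (round-to-nearest, ties-to-even), then widen it back to a double."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

x, y = 4095, -17
# Source: %a = sitofp %x; %b = sitofp %y; %r = fadd %a, %b
source = to_half(to_half(x) + to_half(y))
# Target: %r = sitofp (add %x, %y)
target = to_half(x + y)

print(to_half(x))  # 4096.0: 4095 is not representable, ties round to even
print(source)      # 4080.0
print(target)      # 4078.0
```

Rounding %x up to 4096 before the addition makes the two sides differ by 2, even though both use the same rounding mode throughout.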
Our results confirm that it is difficult to write correct floating-point optimizations; we found bugs in almost all the LLVM files from which we collected our optimizations. Unsurprisingly, all of these bugs relate to floating-point-specific properties such as rounding, NaN and ±∞ inputs, and signed zeros. These edge cases are clearly difficult for programmers to reason about.
Table 2: Verification results by source file.

File        Verified  Timeouts  Bugs
AddSub             7         1     1
MulDivRem          3         2     1
Compares          11         0     0
Simplify          22         1     6
Total             43         4     8
6 Conclusion
In an ideal world, programming languages and compilers are boring. They do what the user expects. They exhibit the same behavior with and without optimization, at all optimization levels, and on all hardware. “Boring,” however, is surprisingly difficult to achieve, especially in the context of the complicated semantics of floatingpoint arithmetic. With LifeJacket, we hope to make LLVM’s precise floatingpoint optimizations more predictable (and boring) by automatically checking them for correctness.
References
 [1] Souper. https://github.com/google/souper.
 [2] LLVM language reference manual. http://llvm.org/releases/3.7.1/docs/LangRef.html, 2016.
 [3] Sylvie Boldo, Jacques-Henri Jourdan, Xavier Leroy, and Guillaume Melquiond. Verified compilation of floating-point computations. Journal of Automated Reasoning, 54(2):135–163, 2015.
 [4] Sebastian Buchwald. Optgen: A generator for local optimizations. In 24th International Conference on Compiler Construction (CC), 2015.
 [5] Wei-Fan Chiang, Ganesh Gopalakrishnan, Zvonimir Rakamaric, and Alexey Solovyev. Efficient search for inputs causing high floating-point errors. In PPoPP, 2014.
 [6] Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In TACAS, 2008.
 [7] David Goldberg. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys (CSUR), 23(1), 1991.
 [8] ISO/IEC JTC1/SC22/WG14. ISO/IEC 9899:2011, Programming languages - C. Technical report, 2011.
 [9] Nuno P. Lopes, David Menendez, Santosh Nagarakatte, and John Regehr. Provably correct peephole optimizations with Alive. In PLDI, 2015.
 [10] Pavel Panchekha, Alex Sanchez-Stern, James R. Wilcox, and Zachary Tatlock. Automatically improving accuracy for floating point expressions. In PLDI, 2015.
 [11] Philipp Rümmer and Thomas Wahl. An SMT-LIB theory of binary floating-point arithmetic. In Informal Proceedings of the 8th International Workshop on Satisfiability Modulo Theories (SMT) at FLoC, Edinburgh, Scotland, 2010.
 [12] Eric Schkufza, Rahul Sharma, and Alex Aiken. Stochastic optimization of floating-point programs with tunable precision. ACM SIGPLAN Notices, 49(6), 2014.
 [13] Charles Severance. An interview with the old man of floating point. IEEE Computer, pages 114–115, 1998.
 [14] Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama. Towards optimization-safe systems: Analyzing the impact of undefined behavior. In SOSP, 2013.
 [15] Aleksandar Zeljić, Christoph M. Wintersteiger, and Philipp Rümmer. Approximations for model construction. In Automated Reasoning, pages 344–359, 2014.