All floats are equal, but some floats are more equal than others
Just bumped into the most bizarre bug at work. Given this source file
compiled with g++ -O1
gives the following output when executed
[doctest] doctest version is "2.4.8"
[doctest] run with "--help" for options
===============================================================================
sin-cos-doctest.cxx:5:
TEST CASE: gcc-sin-cos-O1-bug
sin-cos-doctest.cxx:16: FATAL ERROR: REQUIRE( c == cos(a) ) is NOT correct!
values: REQUIRE( -0.1912064051 == -0.1912064051 )
logged: 833
===============================================================================
[doctest] test cases: 1 | 0 passed | 1 failed | 0 skipped
[doctest] assertions: 2501 | 2500 passed | 1 failed |
[doctest] Status: FAILURE!
What is going on!? Printing out the actual value of a in iteration 833 shows that the magic value that makes it fail is: 1.2475784324341769870869711667182855308055877685546875
(yes, I have to print out every digit to ensure that the decimal representation corresponds exactly to the underlying double).
So, I put together a reduced test case.
The reduced test case only fails if compiled without optimization and gives the following output
[doctest] doctest version is "2.4.8"
[doctest] run with "--help" for options
===============================================================================
sin-cos-doctest-reduced.cxx:4:
TEST CASE: gcc-cos-bug
sin-cos-doctest-reduced.cxx:8: FATAL ERROR: REQUIRE( c == cos(a) ) is NOT correct!
values: REQUIRE( 0.317619466 == 0.317619466 )
===============================================================================
[doctest] test cases: 1 | 0 passed | 1 failed | 0 skipped
[doctest] assertions: 1 | 0 passed | 1 failed |
[doctest] Status: FAILURE!
Nice, we have a minimal test case that we can inspect further.
Let us expand those REQUIRE macros and see what the compiler actually sees with g++ -E.
After some cleanup of the preprocessed output, one ends up with this:
void DOCTEST_ANON_FUNC_5()
{
    const double a = 1.2475784324341769870869711667182855308055877685546875;
    const double c = cos(a);
    [&] {
        doctest::detail::ResultBuilder DOCTEST_RB(doctest::assertType::DT_REQUIRE, "sin-cos-doctest-reduced.cxx", 8, "c == cos(a)");
        try {
            DOCTEST_RB.setResult(doctest::detail::ExpressionDecomposer(doctest::assertType::DT_REQUIRE) << c == cos(a));
        } catch (...) {
            DOCTEST_RB.translateException();
        }
        if (DOCTEST_RB.log()) {
            __asm__("int $3\n" : :);
        }
        DOCTEST_RB.react();
        return !DOCTEST_RB.m_failed;
    }();
}
Ok, let’s see what we got here. The magical buggy constant a is still present, and the expected value c is computed from it.
Nothing looks suspicious yet.
There is some weird-looking inline assembly going on, and I have no idea what it does.
I decided to create a minimal example that resembles the structure of the expanded reduced unit test, without the doctest library, and came up with this:
Compiling with the same flags still makes this fail! Seems like the bug has something to do with different computations happening inside and outside a lambda!?
Let’s file a bug report with GCC! But hold on, gotta do some searching for duplicate issues first. Looking at the most frequently reported bugs, one quickly bumps into the infamous bug 323. This is an over 20 year old “bug” (within quotes since there is disagreement about where the bug actually lies). Long story short: when doing floating point computations there is something called excess precision, which means that mathematical expressions are allowed to be carried out in a higher precision than the type of the source and target variables. To be concrete, in this little pseudo code snippet
double x = ...;
double y = ...;
double z = x + y;
the expression x + y
might be carried out in a precision higher than the 64 bits that a double guarantees.
In particular, on the x86 family of processors, the floating point unit (FPU) supports 80-bit extended precision doubles, which tend to be used.
That sounds good right? More precision for free, yay.
While the intent is good, namely to allow for more precise floating point computations, it has the weird side effect of creating non-deterministic results
when floating point values produced by computations with and without extended precision are compared.
This leads to extremely weird bugs and can make harmless examples like this fail (thanks to Kasper Westman for pointing out a typo 1)
double x = ...;
double y = ...;
double z = x + y;
double w = x + y;
assert(z == w);
This can fail if the compiler decides to keep z in a floating point register with extended precision, but write w back to memory, causing double rounding.
Reading up on excess precision, I learn about the following compiler flags in GCC:
- -fexcess-precision=standard: This makes GCC follow ISO conventions regarding excess precision in variables. Without this, GCC is allowed to store a double in an extended double register.
- -ffloat-store: This ensures that floating point variables are flushed to memory when assigned. This option will make code slower, since you basically make sure registers are not used properly. However, when debugging floating point stuff, this option removes one state from the Heisencode and makes you a bit less paranoid.
- -mfpmath=sse: This option forces the usage of SSE floating point instructions instead of the x87 FPU. A note of caution though: this does not affect linked libraries compiled without the option, so you still might have excess precision bugs. Also note that intermediate computations are then no longer carried out in extended precision, so if you rely on that extra precision this should not be used.
When I discover this bug I am on Ubuntu 22.04 LTS, and the -fexcess-precision flag is not yet implemented for C++ in the distributed GCC.
I compile GCC 14.2.0 from source (if you do this, take a lunch break or something, because it takes a long time to build).
Using the freshly built toolchain, I compile the example in super paranoia mode (g++ -std=c++11 -O0 -Wall -Wextra -Wpedantic -Werror -ffloat-store -fexcess-precision=standard -mfpmath=sse
), but the test still fails!
We have to go deeper…
Time to look at the generated assembly (add the -S -fverbose-asm flags and you will get a .s file with the assembly code next to your source file).
This can be a bit intimidating, but don’t worry: one does not need to understand all of it to do some useful analysis.
Let’s check those function calls using grep call sin-cos-lambda.s | c++filt
call cos #
call __assert_fail #
call main::{lambda()#1}::operator()() const #
Hmm, interesting. There is only one call to cos, one assertion, and an evaluation of the lambda.
This can’t be correct, can it? Turns out that GCC evaluates math calls on constant arguments at compile time (std::cos is required to be constexpr in C++26, and GCC is early on the ball).
The fancy word for this optimization is constant folding.
So the issue here is that the compile time result differs from the value computed at runtime!
I file a bug with GCC and they confirm the behavior (though they don’t classify it as a bug) 2.
I still think it is one (but it’s pointless to try to convince others of that), because what GCC essentially does is to add another overload constexpr double cos(constexpr double) (note that this is an invalid prototype, since you can’t overload on constexpr, but this is how the compiler behaves).
The bug appears when the const double is promoted to constexpr double (you can always add constness, right?).
Now that would be fine if the constexpr overload gave the same result as the non-constexpr one.
But it doesn’t! And I think it is going to create a lot of headaches for the GCC devs once people start doing crazy compile time computations with advanced math.
If this issue is not resolved, the hidden type information turns C/C++ programs into weird Heisencode where the type of every floating point variable can be in several states,
but you wouldn’t know unless you bring out your assembler x-ray glasses and have a look underneath the high level language abstraction.
This is abstraction leakage.
So, back to the original bug, which was provoked using random numbers. That can’t be due to constant folding, so the original bug must be something else!
It’s remarkable how the magic constant 1.2475784324341769870869711667182855308055877685546875 managed to trigger a completely different type of bug!
So, realizing that one has to read the generated assembly to understand these bugs, we do the same exercise for the original test case.
The compiler generates almost 100K lines of assembly code for the original example, so I’ll try to rewrite it without doctest again.
I had to implement a custom assertion macro, since the builtin version does not provoke the bug.
Interestingly, this also makes the bug appear without the lambda. Understanding how the optimizer in the compiler thinks is non-trivial indeed…
Compiling it with g++ -std=c++17 -O1 -Wall -Wextra -Wpedantic -Werror -ffloat-store -fexcess-precision=standard -mfpmath=sse -fverbose-asm sin-cos-non-const.cxx
and running the program with the magic constant on stdin produces the following output
c != std::cos(a) @ sin-cos-non-const.cxx:29
This is a true minimal reproducing example. Let’s look at the assembly.
Inspecting the assembly with grep call
again reveals that the compiler optimizes the first sin/cos pair into a non-standard sincos call!
call printf #
call exit #
call __isoc99_scanf #
call sincos #
call assert_eq_impl(char const*, double, double, char const*, unsigned int) #
call cos #
call assert_eq_impl(char const*, double, double, char const*, unsigned int) #
call puts #
call exit #
Also, the sin call disappears; the optimizer seems to realize that sincos has already computed it, but for some reason not the cos. Is this a compiler bug? Is it a bug in glibc? Let us just check whether sincos really produces different results from separate sin and cos calls.
Compiling this with gcc -O1 -Wall -Wextra -Wpedantic -Werror -ffloat-store -fexcess-precision=standard -mfpmath=sse -fverbose-asm sin-cos-glibc.c -lm
(note that this is C and not C++ now)
reproduces the bug
c != cos(x) @ sin-cos-glibc.c:30
Hence, this is most likely a bug in glibc. Checking the documentation 3 does indicate that sincos might give a different result compared with calling sin and cos separately (but they do have a very elaborate list of measured accuracies for each function across various platforms)! All three functions, sin, cos and sincos, have up to 1 ULP error on x86_64. But the error should be the same! Hence, a bug. Turns out that this was discovered 2 years ago and fixed in glibc 2.36 4.
This demonstrates how weird things can get once you have an error in the libc implementation, and the compiler assumes this bug does not exist.
So how do we compare the results of floating point computations, knowing that there might be a certain amount of error in the last place? I searched the internet without finding any good answer, so I came up with a suggestion for the doctest library 5 to add a feature for comparing floats in terms of ULPs. The current draft at the time of writing this document is given below:
#define DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN
#include <doctest/doctest.h>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <random>
#include <type_traits>
#define MAX_ULPS 100
#define ITERATIONS 1'000'000
template <typename T>
std::enable_if_t<!std::numeric_limits<T>::is_integer, T>
ulpadd(T x, int n)
{
    if (n < 0) {
        // Negation is guaranteed to be exact for IEEE 754 floats
        return -ulpadd(-x, -n);
    }
    if (n == 0) {
        return x;
    }
    return ulpadd(std::nextafter(x, std::numeric_limits<T>::infinity()), n - 1);
}
template <class T>
std::enable_if_t<!std::numeric_limits<T>::is_integer, bool>
ulpeq(T x, T y, int n)
{
    assert(n >= 0);
    return (ulpadd(y, -n) <= x) && (x <= ulpadd(y, n));
}
TEST_CASE_TEMPLATE("ulpeq-random", T, float, double, long double)
{
    std::mt19937 g(0);
    std::uniform_int_distribution<int> ulp_distrib(-MAX_ULPS, +MAX_ULPS);
    std::lognormal_distribution<T> value_distrib(T{0}, T{20});
    for (int i = 0; i < ITERATIONS; ++i)
    {
        const int n = ulp_distrib(g);
        const T x = value_distrib(g);
        const T y = ulpadd(x, n);
        REQUIRE(ulpeq(x, y, std::abs(n)));
        REQUIRE(ulpeq(y, x, std::abs(n)));
    }
}
TEST_CASE_TEMPLATE("ulpeq-edge-cases", T, float, double, long double)
{
    // Extreme values
    REQUIRE(ulpeq(std::numeric_limits<T>::max(), std::numeric_limits<T>::max(), 0));
    REQUIRE(ulpeq(+std::numeric_limits<T>::min(), +std::numeric_limits<T>::min(), 0));
    REQUIRE(ulpeq(T{0}, T{0}, 0));
    REQUIRE(ulpeq(-std::numeric_limits<T>::min(), -std::numeric_limits<T>::min(), 0));
    REQUIRE(ulpeq(std::numeric_limits<T>::lowest(), std::numeric_limits<T>::lowest(), 0));
    // Infinities
    REQUIRE(ulpeq(+std::numeric_limits<T>::infinity(), +std::numeric_limits<T>::infinity(), 0));
    REQUIRE(ulpeq(-std::numeric_limits<T>::infinity(), -std::numeric_limits<T>::infinity(), 0));
    REQUIRE(ulpeq(-std::numeric_limits<T>::infinity(), std::numeric_limits<T>::lowest(), 1));
    REQUIRE(ulpeq(std::numeric_limits<T>::lowest(), -std::numeric_limits<T>::infinity(), 1));
    REQUIRE(ulpeq(std::numeric_limits<T>::infinity(), std::numeric_limits<T>::max(), 1));
    REQUIRE(ulpeq(std::numeric_limits<T>::max(), std::numeric_limits<T>::infinity(), 1));
    // Subnormals
    for (int i = 1; i <= MAX_ULPS; ++i) {
        REQUIRE(ulpeq(T{0}, +ulpadd(T{0}, +i), i));
        REQUIRE(ulpeq(T{0}, -ulpadd(T{0}, +i), i));
        REQUIRE(ulpeq(T{0}, +ulpadd(T{0}, -i), i));
        REQUIRE(ulpeq(T{0}, -ulpadd(T{0}, -i), i));
    }
}
The draft implements a function ulpeq, which checks if two floating point values are equal within a specified number of ULPs.
There are two unit tests: one randomized (with a log-normal distribution to cover as much of the real line as possible), and one checking edge cases.
If you have ideas on how to improve it, then please reach out, either by email or by continuing the discussion in the linked GitHub issue.
The conclusion is that floating point computations can be more mysterious than you think. If you run into issues and suspect floating point errors, compile your code in the paranoid mode described earlier.
Appendix: Building glibc
When building GCC from source and testing different versions of glibc, I had to compile the latter from source as well. With some help from SO on using multiple versions of libc, this is how I did it: clone, check out the tag, create a separate build folder, configure, build, and install
version=2.36
git clone https://sourceware.org/git/glibc.git
cd glibc
git checkout glibc-${version}
cd ..
mkdir glibc-build-${version}
cd glibc-build-${version}
../glibc/configure --prefix=$(realpath ../glibc-install-${version})
make -j$(nproc)
make install