Just bumped into the most bizarre bug at work. Given this source file

#define DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN
#include <doctest/doctest.h>
#include <random>

TEST_CASE("gcc-sin-cos-O1-bug")
{
    std::mt19937 g(0);
    std::normal_distribution<double> d(0.0, 1.0);
    for (int i = 0; i < 100000; ++i)
    {
        const double a = d(g);
        const double s = sin(a);
        const double c = cos(a);
        INFO(i);
        REQUIRE(s == sin(a));
        REQUIRE(c == cos(a));
        REQUIRE(a == a);
    }
}

compiled with g++ -O1 gives the following output when executed

[doctest] doctest version is "2.4.8"
[doctest] run with "--help" for options
===============================================================================
sin-cos-doctest.cxx:5:
TEST CASE:  gcc-sin-cos-O1-bug

sin-cos-doctest.cxx:16: FATAL ERROR: REQUIRE( c == cos(a) ) is NOT correct!
  values: REQUIRE( -0.1912064051 == -0.1912064051 )
  logged: 833

===============================================================================
[doctest] test cases:    1 |    0 passed | 1 failed | 0 skipped
[doctest] assertions: 2501 | 2500 passed | 1 failed |
[doctest] Status: FAILURE!

What is going on!? Printing out the actual value of a in iteration 833 shows that the magic value that makes it fail is 1.2475784324341769870869711667182855308055877685546875 (yes, I have to print the whole thing out to ensure that the decimal representation is exactly representable by a double). So, I put together a reduced test case

#define DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN
#include <doctest/doctest.h>

TEST_CASE("gcc-cos-bug")
{
    const double a = 1.2475784324341769870869711667182855308055877685546875;
    const double c = cos(a);
    REQUIRE(c == cos(a));
}

The reduced test case fails only when compiled without optimization, and gives the following output

[doctest] doctest version is "2.4.8"
[doctest] run with "--help" for options
===============================================================================
sin-cos-doctest-reduced.cxx:4:
TEST CASE:  gcc-cos-bug

sin-cos-doctest-reduced.cxx:8: FATAL ERROR: REQUIRE( c == cos(a) ) is NOT correct!
  values: REQUIRE( 0.317619466 == 0.317619466 )

===============================================================================
[doctest] test cases: 1 | 0 passed | 1 failed | 0 skipped
[doctest] assertions: 1 | 0 passed | 1 failed |
[doctest] Status: FAILURE!

Nice, we have a minimal test case that we can inspect further. Let us expand those REQUIRE macros and see what the compiler actually sees with g++ -E. After some cleanup of the preprocessed output one ends up with this

void DOCTEST_ANON_FUNC_5()
{
    const double a = 1.2475784324341769870869711667182855308055877685546875;
    const double c = cos(a);
    [&] {
        doctest::detail::ResultBuilder DOCTEST_RB(doctest::assertType::DT_REQUIRE, "sin-cos-doctest-reduced.cxx", 8, "c == cos(a)");
        try {
            DOCTEST_RB.setResult(doctest::detail::ExpressionDecomposer(doctest::assertType::DT_REQUIRE) << c == cos(a));
        } catch (...) {
            DOCTEST_RB.translateException();
        }
        if (DOCTEST_RB.log()) {
            __asm__("int $3\n" : :);
        }
        DOCTEST_RB.react();
        return !DOCTEST_RB.m_failed;
    }();
}

Ok, let’s see what we got here. The magical buggy constant a is still present and the expected value c is computed from it. Nothing looks suspicious yet. There is some weird-looking inline assembly going on (the int $3 turns out to be a software breakpoint trap, not relevant here). I decided to create a minimal example that resembles the structure of the expanded reduced unit test, without the doctest library, and came up with this

#include <cassert>
#include <cmath>

int main()
{
    const double a = 1.2475784324341769870869711667182855308055877685546875;
    const double c = cos(a);
    [&] {
        assert(c == cos(a));
    }();
}

Compiling with the same flags still makes this fail! Seems like the bug has something to do with different computations happening inside and outside a lambda!?

Let’s file a bug report with GCC! But hold on, gotta search for duplicate issues first. Looking at the most frequently reported bugs, one quickly bumps into the infamous bug 323. This is an over 20 year old “bug” (within quotes since there is disagreement about where the bug actually lies). Long story short: when doing floating point computations there is something called excess precision, which means that mathematical expressions are allowed to be carried out in a higher precision than the type of the source and target variables. To be concrete, in this little pseudo code snippet

double x = ...;
double y = ...;
double z = x + y;

the expression x + y might be carried out in a precision higher than the normal 64 bits that a double guarantees. In particular, on the x86 family of processors, the floating point unit (FPU) supports 80-bit extended precision doubles, which tend to be used. That sounds good, right? More precision for free, yay. While the intent is good, namely to allow for more precise floating point computations, it has the weird side effect of creating non-deterministic results when floating point values produced by computations with and without extended precision are compared. This leads to extremely weird bugs and can make harmless examples like this fail (thanks to Kasper Westman for pointing out a typo 1)

double x = ...;
double y = ...;
double z = x + y;
double w = x + y;
assert(z == w);

This can fail if the compiler decides to keep z in a floating point register with extended precision, but puts w back to memory, causing double rounding.
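To make double rounding concrete, here is a small sketch of my own (not part of the original bug hunt) that forces the effect by routing one of two otherwise identical sums through long double. It assumes long double is the 80-bit x87 extended format, as it is on x86-64 Linux; on such a system the two printed values differ.

#include <cstdio>

int main()
{
    // 0x1.002p-53 is 2^-53 + 2^-64, exactly representable as a double.
    // volatile keeps the compiler from constant folding the sums.
    volatile double x = 1.0;
    volatile double y = 0x1.002p-53;
    // Rounded once, directly to double: the exact sum lies above the
    // halfway point, so it rounds up to 1 + 2^-52.
    const double direct = x + y;
    // Rounded twice: first to 80-bit extended precision (a tie, rounded
    // to even: 1 + 2^-53), then to double (a tie again, rounded to even:
    // exactly 1.0).
    const double twice = static_cast<double>(
        static_cast<long double>(x) + static_cast<long double>(y));
    std::printf("direct: %.17g\n", direct);
    std::printf("twice:  %.17g\n", twice);
}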

Reading up on excess precision, I learn about the following compiler flags in GCC

  • -fexcess-precision=standard: This makes GCC follow the ISO conventions regarding excess precision in variables. Without it, GCC is allowed to keep a double in an extended precision register.
  • -ffloat-store: This ensures that values are flushed to memory when assigned to a variable. It will make code slower, since registers are no longer used effectively. However, when debugging floating point issues, this option removes one state from the Heisencode and makes you a bit less paranoid.
  • -mfpmath=sse: This forces the use of SSE floating point instructions instead of the x87 FPU. A note of caution though: it does not affect linked libraries compiled without this option, so you still might have excess precision bugs. Also note that long double arithmetic still goes through the x87 unit (SSE has no 80-bit type), so this flag alone does not eliminate extended precision entirely.

When I discovered this bug I was on Ubuntu 22.04 LTS, where the -fexcess-precision flag was not yet implemented for C++ in the distributed GCC. So I compiled GCC 14.2.0 from source (if you do this, take a lunch break or something, because it takes a long time to build).

Using the freshly built toolchain, I compile the example in super paranoia mode (g++ -std=c++11 -O0 -Wall -Wextra -Wpedantic -Werror -ffloat-store -fexcess-precision=standard -mfpmath=sse), but the test still fails!

We have to go deeper…

Let’s look at the generated assembly (add the -S -fverbose-asm flags and you will get a .s file next to your source file with the assembly code).

	.file	"sin-cos-lambda.cxx"
# GNU C++11 (GCC) version 14.2.0 (x86_64-pc-linux-gnu)
#	compiled by GNU C version 14.2.0, GMP version 6.3.0, MPFR version 4.2.1, MPC version 1.3.0, isl version none
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed: -mfpmath=sse -mtune=generic -march=x86-64 -O0 -std=c++11 -ffloat-store -fexcess-precision=standard
	.text
	.section	.rodata
.LC0:
	.string	"main()::<lambda()>"
.LC1:
	.string	"sin-cos-lambda.cxx"
.LC2:
	.string	"c == cos(a)"
	.text
	.align 2
	.type	_ZZ4mainENKUlvE_clEv, @function
_ZZ4mainENKUlvE_clEv:
.LFB248:
	.cfi_startproc
	pushq	%rbp	#
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp	#,
	.cfi_def_cfa_register 6
	subq	$48, %rsp	#,
	movq	%rdi, -40(%rbp)	# __closure, __closure
# sin-cos-lambda.cxx:9:         assert(c == cos(a));
	movq	-40(%rbp), %rax	# __closure, tmp100
	movq	(%rax), %rax	# __closure_7(D)->__c, _1
	movsd	(%rax), %xmm0	# *_1, tmp101
	movsd	%xmm0, -8(%rbp)	# tmp101,
	movq	-40(%rbp), %rax	# __closure, tmp102
	movq	8(%rax), %rax	# __closure_7(D)->__a, _3
	movsd	(%rax), %xmm0	# *_3, tmp103
	movsd	%xmm0, -16(%rbp)	# tmp103,
	movq	-16(%rbp), %rax	#, tmp104
	movq	%rax, %xmm0	# tmp104,
	call	cos	#
	movq	%xmm0, %rax	#, tmp105
	movq	%rax, -24(%rbp)	# tmp105,
# sin-cos-lambda.cxx:9:         assert(c == cos(a));
	movsd	-8(%rbp), %xmm0	#, tmp106
	ucomisd	-24(%rbp), %xmm0	#, tmp106
	jp	.L4	#,
	movsd	-8(%rbp), %xmm0	#, tmp107
	ucomisd	-24(%rbp), %xmm0	#, tmp107
	je	.L5	#,
.L4:
	movl	$.LC0, %ecx	#,
	movl	$9, %edx	#,
	movl	$.LC1, %esi	#,
	movl	$.LC2, %edi	#,
	call	__assert_fail	#
.L5:
# sin-cos-lambda.cxx:10:     }();
	nop	
	leave	
	.cfi_def_cfa 7, 8
	ret	
	.cfi_endproc
.LFE248:
	.size	_ZZ4mainENKUlvE_clEv, .-_ZZ4mainENKUlvE_clEv
	.globl	main
	.type	main, @function
main:
.LFB247:
	.cfi_startproc
	pushq	%rbp	#
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp	#,
	.cfi_def_cfa_register 6
	subq	$32, %rsp	#,
# sin-cos-lambda.cxx:6:     const double a = 1.2475784324341769870869711667182855308055877685546875;
	movsd	.LC3(%rip), %xmm0	#, tmp100
	movsd	%xmm0, -24(%rbp)	# tmp100, a
# sin-cos-lambda.cxx:7:     const double c = cos(a);
	movsd	.LC4(%rip), %xmm0	#, tmp101
	movsd	%xmm0, -32(%rbp)	# tmp101, c
# sin-cos-lambda.cxx:8:     [&] {
	leaq	-32(%rbp), %rax	#, tmp102
	movq	%rax, -16(%rbp)	# tmp102, D.11114.__c
	leaq	-24(%rbp), %rax	#, tmp103
	movq	%rax, -8(%rbp)	# tmp103, D.11114.__a
# sin-cos-lambda.cxx:10:     }();
	leaq	-16(%rbp), %rax	#, tmp104
	movq	%rax, %rdi	# tmp104,
	call	_ZZ4mainENKUlvE_clEv	#
# sin-cos-lambda.cxx:11: }
	movl	$0, %eax	#, _10
	leave	
	.cfi_def_cfa 7, 8
	ret	
	.cfi_endproc
.LFE247:
	.size	main, .-main
	.section	.rodata
	.align 8
.LC3:
	.long	-848822549
	.long	1072952852
	.align 8
.LC4:
	.long	-1732049745
	.long	1070879712
	.ident	"GCC: (GNU) 14.2.0"
	.section	.note.GNU-stack,"",@progbits

This can be a bit intimidating to look at, but don’t worry, one does not need to understand it to do some analysis on it. Let’s check those function calls using grep call sin-cos-lambda.s | c++filt

	call	cos	#
	call	__assert_fail	#
	call	main::{lambda()#1}::operator()() const	#

Hmm, interesting. There is only one call to cos, one assertion, and an evaluation of the lambda. This can’t be correct, can it? It turns out that GCC evaluates calls on constant arguments at compile time (std::cos is required to be constexpr in C++26, and GCC is early on the ball). The fancy word for this optimization is constant folding. So the issue here is that the compile time result differs from the value computed at runtime! I filed a bug with GCC and they confirmed the issue (though they don’t classify it as a bug) 2. I still think it is one (but it’s pointless to try to convince others of that), because what GCC essentially does is add another overload constexpr double cos(constexpr double) (note that this is an invalid prototype, since you can’t overload on constexpr, but this is how the compiler behaves). The bug appears when the const double is promoted to constexpr double (you can always add constness, right?). Now, that would be fine if the constexpr overload gave the same result as the non-constexpr one. But it doesn’t! And I think it is going to create a lot of headache for the GCC devs once people start doing crazy compile time computations with advanced math. If this issue is not resolved, the hidden type information turns C/C++ programs into weird Heisencode where the type of every floating point variable can be in several states, but you wouldn’t know unless you bring out your assembler x-ray glasses and have a look underneath the high level language abstraction. This is abstraction leakage.
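To see the two values side by side, here is a hedged little experiment (again my own, not from the bug report): hiding the argument behind volatile blocks the constant folding, so the same program computes both the folded value (correctly rounded by GCC via MPFR) and the runtime libm value. Whether the two lines differ depends on your libm, but with the glibc on my system they do for the magic constant.

#include <cmath>
#include <cstdio>

int main()
{
    const double a = 1.2475784324341769870869711667182855308055877685546875;
    // GCC may fold this call at compile time (correctly rounded via MPFR).
    const double folded = std::cos(a);
    // volatile hides the argument from the optimizer and forces a runtime libm call.
    volatile double hidden = a;
    const double runtime = std::cos(hidden);
    std::printf("folded:  %.17g\n", folded);
    std::printf("runtime: %.17g\n", runtime);
    return folded == runtime ? 0 : 1;
}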

So, back to the original bug, which was provoked using random numbers. That can’t be due to constant folding, so the original bug must be something else! It’s remarkable how the magic constant 1.2475784324341769870869711667182855308055877685546875 managed to trigger a completely different type of bug! So, realizing that one has to read the generated assembly to understand these bugs, we do the same exercise for the original test case. The compiler generates almost 100K lines of assembly code for the original example, so I’ll try to rewrite it without doctest again. This time I had to implement a custom assertion macro, since the builtin assert does not provoke the bug. Interestingly, this also makes the bug appear without the lambda. Understanding how the optimizer in the compiler thinks is non-trivial indeed…

#include <cmath>
#include <cstdio>
#include <cstdlib>

void assert_eq_impl(const char *expr,
                    const char *file,
                    int line,
                    double expected,
                    double actual)
{
    if (expected != actual) {
        printf("%s @ %s:%i\n", expr, file, line);
        exit(1);
    }
}

#define assert_eq(expected, actual) \
    assert_eq_impl(#expected " != " #actual,  __FILE__, __LINE__, expected, actual)

int main()
{
    double a;
    if (!scanf("%lf", &a)) {
        exit(1);
    }
    const double s = std::sin(a);
    const double c = std::cos(a);
    assert_eq(s, std::sin(a));
    assert_eq(c, std::cos(a));
    return 0;
}

Compiling it with g++ -std=c++17 -O1 -Wall -Wextra -Wpedantic -Werror -ffloat-store -fexcess-precision=standard -mfpmath=sse -fverbose-asm sin-cos-non-const.cxx and running the program with the magic constant as stdin produce the following output

c != std::cos(a) @ sin-cos-non-const.cxx:29

This is a true minimal reproducing example. Let’s look at the assembly.

	.file	"sin-cos-non-const.cxx"
# GNU C++17 (GCC) version 14.2.0 (x86_64-pc-linux-gnu)
#	compiled by GNU C version 14.2.0, GMP version 6.3.0, MPFR version 4.2.1, MPC version 1.3.0, isl version none
# GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
# options passed: -mfpmath=sse -mtune=generic -march=x86-64 -O1 -std=c++17 -ffloat-store -fexcess-precision=standard
	.text
	.section	.rodata.str1.1,"aMS",@progbits,1
.LC0:
	.string	"%s @ %s:%i\n"
	.text
	.globl	_Z14assert_eq_implPKcS0_idd
	.type	_Z14assert_eq_implPKcS0_idd, @function
_Z14assert_eq_implPKcS0_idd:
.LFB1033:
	.cfi_startproc
	subq	$24, %rsp	#,
	.cfi_def_cfa_offset 32
	movsd	%xmm0, 8(%rsp)	# expected, expected
	movsd	%xmm1, (%rsp)	# actual, actual
# sin-cos-non-const.cxx:11:     if (expected != actual) {
	movsd	8(%rsp), %xmm0	# expected, expected
	ucomisd	(%rsp), %xmm0	# actual, expected
	jp	.L4	#,
	movsd	8(%rsp), %xmm0	# expected, expected
	ucomisd	(%rsp), %xmm0	# actual, expected
	jne	.L4	#,
# sin-cos-non-const.cxx:15: }
	addq	$24, %rsp	#,
	.cfi_remember_state
	.cfi_def_cfa_offset 8
	ret	
.L4:
	.cfi_restore_state
# sin-cos-non-const.cxx:12:         printf("%s @ %s:%i\n", expr, file, line);
	movl	%edx, %ecx	# line,
	movq	%rsi, %rdx	# file,
	movq	%rdi, %rsi	# expr,
	movl	$.LC0, %edi	#,
	movl	$0, %eax	#,
	call	printf	#
# sin-cos-non-const.cxx:13:         exit(1);
	movl	$1, %edi	#,
	call	exit	#
	.cfi_endproc
.LFE1033:
	.size	_Z14assert_eq_implPKcS0_idd, .-_Z14assert_eq_implPKcS0_idd
	.section	.rodata.str1.1
.LC1:
	.string	"%lf"
.LC2:
	.string	"sin-cos-non-const.cxx"
.LC3:
	.string	"s != std::sin(a)"
.LC4:
	.string	"c != std::cos(a)"
	.text
	.globl	main
	.type	main, @function
main:
.LFB1034:
	.cfi_startproc
	subq	$88, %rsp	#,
	.cfi_def_cfa_offset 96
# sin-cos-non-const.cxx:23:     if (!scanf("%lf", &a)) {
	leaq	16(%rsp), %rsi	#, tmp100
	movl	$.LC1, %edi	#,
	movl	$0, %eax	#,
	call	__isoc99_scanf	#
# sin-cos-non-const.cxx:23:     if (!scanf("%lf", &a)) {
	testl	%eax, %eax	# tmp117
	je	.L9	#,
# sin-cos-non-const.cxx:26:     const double s = std::sin(a);
	movsd	16(%rsp), %xmm0	# a, a
	movsd	%xmm0, 72(%rsp)	# a,
	leaq	8(%rsp), %rdi	#, tmp102
	movq	%rsp, %rsi	#, tmp103
	movsd	72(%rsp), %xmm0	#,
	call	sincos	#
	movsd	(%rsp), %xmm0	#, tmp106
	movsd	%xmm0, 24(%rsp)	# tmp106,
	movsd	8(%rsp), %xmm0	#, tmp105
	movsd	%xmm0, 32(%rsp)	# tmp105,
	movsd	32(%rsp), %xmm0	#, tmp107
	movsd	%xmm0, 48(%rsp)	# tmp107, s
# sin-cos-non-const.cxx:27:     const double c = std::cos(a);
	movsd	24(%rsp), %xmm0	#, tmp108
	movsd	%xmm0, 40(%rsp)	# tmp108, c
# sin-cos-non-const.cxx:28:     assert_eq(s, std::sin(a));
	movsd	48(%rsp), %xmm0	# s, s
	movapd	%xmm0, %xmm1	# s,
	movl	$28, %edx	#,
	movl	$.LC2, %esi	#,
	movl	$.LC3, %edi	#,
	call	_Z14assert_eq_implPKcS0_idd	#
# sin-cos-non-const.cxx:29:     assert_eq(c, std::cos(a));
	movsd	16(%rsp), %xmm0	# a, a
	movsd	%xmm0, 64(%rsp)	# a,
	movsd	64(%rsp), %xmm0	#,
	call	cos	#
	movsd	%xmm0, 56(%rsp)	# tmp118,
# sin-cos-non-const.cxx:29:     assert_eq(c, std::cos(a));
	movsd	56(%rsp), %xmm1	#,
	movsd	40(%rsp), %xmm0	# c,
	movl	$29, %edx	#,
	movl	$.LC2, %esi	#,
	movl	$.LC4, %edi	#,
	call	_Z14assert_eq_implPKcS0_idd	#
# sin-cos-non-const.cxx:31: }
	movl	$0, %eax	#,
	addq	$88, %rsp	#,
	.cfi_remember_state
	.cfi_def_cfa_offset 8
	ret	
.L9:
	.cfi_restore_state
# sin-cos-non-const.cxx:24:         exit(1);
	movl	$1, %edi	#,
	call	exit	#
	.cfi_endproc
.LFE1034:
	.size	main, .-main
	.ident	"GCC: (GNU) 14.2.0"
	.section	.note.GNU-stack,"",@progbits

Inspecting the assembly with grep call again reveals that the compiler optimizes the first sin/cos pair into a single call to the non-standard sincos function!

	call	printf	#
	call	exit	#
	call	__isoc99_scanf	#
	call	sincos	#
	call	assert_eq_impl(char const*, char const*, int, double, double)	#
	call	cos	#
	call	assert_eq_impl(char const*, char const*, int, double, double)	#
	call	exit	#

Also, the sin call disappears; the optimizer seems to realize that sincos has already computed it, but for some reason not the cos. Is this a compiler bug? Is this a bug in glibc? Let us check whether sincos really produces a different result from separate sin and cos calls

#define _GNU_SOURCE
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

void assert_eq_impl(const char *expr,
                    const char *file,
                    int line,
                    double expected,
                    double actual)
{
    if (expected != actual) {
        printf("%s @ %s:%i\n", expr, file, line);
        exit(1);
    }
}

#define assert_eq(expected, actual) \
    assert_eq_impl(#expected " != " #actual,  __FILE__, __LINE__, expected, actual)

int main()
{
    double x;
    if (!scanf("%lf", &x)) {
        return 1;
    }
    double c, s;
    sincos(x, &s, &c);
    assert_eq(s, sin(x));
    assert_eq(c, cos(x));
    return 0;
}

Compiling this with gcc -O1 -Wall -Wextra -Wpedantic -Werror -ffloat-store -fexcess-precision=standard -mfpmath=sse -fverbose-asm sin-cos-glibc.c -lm (note that this is C and not C++ now) reproduces the bug

c != cos(x) @ sin-cos-glibc.c:30

Hence, this is most likely a bug in glibc. Checking the documentation 3 does indicate that sincos might give a different result than calling sin and cos separately (they do have a very elaborate list of measured accuracies for each function across various platforms). All three functions, sin, cos and sincos, have up to 1 ULP of error on x86_64. But the error should be the same for all of them! Hence, a bug. Turns out that this was discovered 2 years ago and fixed in glibc 2.36 4.

This demonstrates how weird things can get once you have an error in the libc implementation, and the compiler assumes this bug does not exist.
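Since the fix landed in glibc 2.36, a quick way to check whether a machine is affected is to ask glibc for its version at runtime. A minimal sketch using the GNU-specific gnu_get_libc_version function:

#include <gnu/libc-version.h>
#include <cstdio>

int main()
{
    // Prints e.g. "glibc 2.36"; versions before 2.36 may still ship the sincos bug.
    std::printf("glibc %s\n", gnu_get_libc_version());
}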

So how do we compare the results of floating point computations, knowing that there might be a certain amount of error in the last place? I searched the internet without finding any good answer, so I came up with a suggestion for the doctest library 5 to add a feature for comparing floats in terms of ULPs. The current draft at the time of writing this document is given below

#define DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN
#include <doctest/doctest.h>
#include <cassert>
#include <cstdlib>
#include <limits>
#include <random>

#define MAX_ULPS 100
#define ITERATIONS 1'000'000

template <typename T>
std::enable_if_t<!std::numeric_limits<T>::is_integer, T>
ulpadd(T x, int n)
{
    if (n < 0) {
        // Negation is guaranteed to be exact for IEEE 754 floats
        return -ulpadd(-x, -n);
    }
    if (n == 0) {
        return x;
    }
    return ulpadd(std::nextafter(x, std::numeric_limits<T>::infinity()), n - 1);
}

template <class T>
std::enable_if_t<!std::numeric_limits<T>::is_integer, bool>
ulpeq(T x, T y, int n)
{
    assert(n >= 0);
    // x must lie within [y - n ULPs, y + n ULPs]
    return (ulpadd(y, -n) <= x) && (x <= ulpadd(y, n));
}

TEST_CASE_TEMPLATE("ulpeq-random", T, float, double, long double)
{
    std::mt19937 g(0);
    std::uniform_int_distribution<int> ulp_distrib(-MAX_ULPS, +MAX_ULPS);
    std::lognormal_distribution<T> value_distrib(T{0}, T{20});
    for (int i = 0; i < ITERATIONS; ++i)
    {
        const int n = ulp_distrib(g);
        const T x = value_distrib(g);
        const T y = ulpadd(x, n);
        REQUIRE(ulpeq(x, y, std::abs(n)));
        REQUIRE(ulpeq(y, x, std::abs(n)));
    }
}

TEST_CASE_TEMPLATE("ulpeq-edge-cases", T, float, double, long double)
{
    // Extreme values
    REQUIRE(ulpeq(std::numeric_limits<T>::max(), std::numeric_limits<T>::max(), 0));
    REQUIRE(ulpeq(+std::numeric_limits<T>::min(), +std::numeric_limits<T>::min(), 0));
    REQUIRE(ulpeq(T{0}, T{0}, 0));
    REQUIRE(ulpeq(-std::numeric_limits<T>::min(), -std::numeric_limits<T>::min(), 0));
    REQUIRE(ulpeq(std::numeric_limits<T>::lowest(), std::numeric_limits<T>::lowest(), 0));

    // Infinities
    REQUIRE(ulpeq(+std::numeric_limits<T>::infinity(), +std::numeric_limits<T>::infinity(), 0));
    REQUIRE(ulpeq(-std::numeric_limits<T>::infinity(), -std::numeric_limits<T>::infinity(), 0));
    REQUIRE(ulpeq(-std::numeric_limits<T>::infinity(), std::numeric_limits<T>::lowest(), 1));
    REQUIRE(ulpeq(std::numeric_limits<T>::lowest(), -std::numeric_limits<T>::infinity(), 1));
    REQUIRE(ulpeq(std::numeric_limits<T>::infinity(), std::numeric_limits<T>::max(), 1));
    REQUIRE(ulpeq(std::numeric_limits<T>::max(), std::numeric_limits<T>::infinity(), 1));

    // Subnormals
    for (int i = 1; i <= MAX_ULPS; ++i) {
        REQUIRE(ulpeq(T{0}, +ulpadd(T{0}, +i), i));
        REQUIRE(ulpeq(T{0}, -ulpadd(T{0}, +i), i));
        REQUIRE(ulpeq(T{0}, +ulpadd(T{0}, -i), i));
        REQUIRE(ulpeq(T{0}, -ulpadd(T{0}, -i), i));
    }
}

The draft implements a function ulpeq which checks whether two floating point values are equal within a specified number of ULPs. There are two unit tests: one randomized (with a log-normal distribution to cover as much of the real line as possible), and one checking edge cases. If you have ideas on how to improve it, please reach out, either by email or by continuing the discussion in the linked GitHub issue.
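For illustration, here is how the original failing test could look with such a helper (hypothetical usage, assuming the ulpeq definition above and the includes from the first test case; whatever API eventually lands in doctest may look different):

TEST_CASE("gcc-sin-cos-O1-bug-ulp")
{
    std::mt19937 g(0);
    std::normal_distribution<double> d(0.0, 1.0);
    for (int i = 0; i < 100000; ++i)
    {
        const double a = d(g);
        const double s = sin(a);
        const double c = cos(a);
        INFO(i);
        // Tolerate 1 ULP between repeated evaluations instead of
        // demanding bit-exact equality.
        REQUIRE(ulpeq(s, sin(a), 1));
        REQUIRE(ulpeq(c, cos(a), 1));
    }
}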

The conclusion is that floating point computations can be more mysterious than you might think. If you run into issues and suspect floating point errors, compile your code in the “paranoid mode” described earlier.

Appendix: Building glibc

When building GCC from source and testing different versions of glibc, I had to compile glibc from source as well. With some help from SO on using multiple versions of libc, this is how I did it: clone, check out the tag, create a separate build folder, configure, build and install

version=2.36
git clone https://sourceware.org/git/glibc.git
cd glibc
git checkout glibc-${version}
cd ..
mkdir glibc-build-${version}
cd glibc-build-${version}
../glibc/configure --prefix=$(realpath ../glibc-install-${version})
make -j$(nproc)
make install
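To actually run a test binary against the freshly built glibc instead of the system one, the trick from the same SO thread is to bake the alternative dynamic linker and rpath into the executable at link time. A sketch, assuming version 2.36 and the install prefix created above:

g++ -O1 sin-cos-non-const.cxx -o sin-cos-non-const \
    -Wl,--rpath=$(realpath glibc-install-2.36/lib) \
    -Wl,--dynamic-linker=$(realpath glibc-install-2.36/lib/ld-linux-x86-64.so.2)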