
@mazunki
Contributor

@mazunki mazunki commented Oct 23, 2025

It works! All tests are passing!

@mazunki
Contributor Author

mazunki commented Oct 23, 2025

Regarding std::launder, see https://eel.is/c++draft/ptr.launder. We used to rely on undefined behaviour here.

Thanks compiler:)

@mazunki mazunki mentioned this pull request Oct 23, 2025
@alfreb
Contributor

alfreb commented Oct 26, 2025

Love it!

@alfreb-scalemem

Ok, looks like the std::launder is the only thing I'm tripping over here. I think it's worth having a quick discussion on that here, to make sure at least two of us understand this new thing before we start using it. Besides that I think it all looks good!

@mazunki
Contributor Author

mazunki commented Oct 27, 2025

C++ distinguishes addresses from objects at a conceptual (compile-time) level. Objects have lifetimes, and they just /happen/ to live at addresses. This matters when it comes to optimizations. std::launder helps us by giving us a pointer to the object that currently exists at an address.

Suppose you do something like

int *a1 = (int *)malloc(sizeof(int));
int *a2 = a1;
*a1 = 42;
printf("%d\n", *a2);  // 42

a1 = (int *)realloc(a1, 16*sizeof(int));  // assuming we get the same address back
*a1 = 1337;

printf("%d\n", *a2);  // undefined behaviour

The first print statement is guaranteed to print 42, but in the second one the value of *a2 may never have been updated, because we never claimed that a2 refers to the same object. The compiler would have been free to internally turn the read into a copy of the earlier value instead of a pointer that is dereferenced later.

Instead, if we did

a1 = (int *)realloc(a1, 16*sizeof(int));  // assuming we get the same address back
*a1 = 1337;

a2 = std::launder(a2);
printf("%d\n", *a2);  // this is fine now :)

we are no longer relying on undefined behaviour. With a2 = std::launder(a2) we're basically saying "please give me the object that resides at that address now". The compiler is no longer allowed to optimize the address away, guaranteeing a print of 1337.

Of course, realloc() can actually move the allocation: this is just an example where we pretend it stays in place and is extended at the end (which for small size differences is probably happening anyway, but that's beside the point). In our code we use placement new, which puts us in control of the address of the allocation.
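To make the placement-new point concrete, here is a minimal sketch. The `Packet` type, the `arena` buffer and `make_packet` are hypothetical names for illustration, not code from this PR. Placement new does not allocate anything: it constructs the object at exactly the address we pass in, so unlike realloc() the address can never move.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical type, for illustration only.
struct Packet { int id; };

// Storage we own; placement new decides nothing about the address.
alignas(Packet) std::byte arena[sizeof(Packet)];

// Constructs a Packet at exactly the start of `arena` and returns a
// pointer to the new object.
Packet* make_packet(int id) {
    Packet* p = ::new (static_cast<void*>(arena)) Packet{id};
    // Same address as the buffer, by construction.
    assert(static_cast<void*>(p) == static_cast<void*>(arena));
    return p;
}
```

The pointer returned by placement new already points to the new object, so it needs no laundering. std::launder only comes into play when we reach the object through some other pointer to that address.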

@alfreb-scalemem

Wait - if you are getting the same pointer back here, which your explanation depends on - I don't see why this is UB.

A) You get the same pointer back. Now a1 == a2. They are both int*
B) You get a different pointer back - launder doesn't help with that at all.

#include <cstdlib>
#include <cstdint>
#include <cstdio>
#include <new>

int main()
{
  int *a1 = (int *)malloc(sizeof(int));
  int *a2 = a1;
  *a1 = 42;
  printf("%d\n", *a2); // 42

  a1 = (int *)realloc(a1, 16 * sizeof(int));  // We have to cast to int*, otherwise the assignment won't work.
  *a1 = 1337;

  printf("First addr: 0x%p Second addr: 0x%p \n", (void *)a1, (void *)a2);

  a2 = std::launder(a2); // No effect?
  printf("%d\n", *a2); // undefined behaviour if the pointer changed - the old pointer is freed.
}

Compiling this locally with clang++ -Wall -Wextra -std=c++23 -o test_cpp23 test.cpp gives no warnings and the output is basically garbage because the pointer changes after realloc:

❯ ./test_cpp23
42
First addr: 0x0x60000133c180 Second addr: 0x0x600000438020 
-1995145184

I think you need to find a different example here. And the original warning I assume you got, which prompted you to introduce launder in the first place, would be very helpful. I think placement new might be more relevant, but not sure.

@mazunki
Contributor Author

mazunki commented Oct 28, 2025

> Wait - if you are getting the same pointer back here, which your explanation depends on - I don't see why this is UB.

Because the compiler has no guarantee that the same address is referring to the same object.

We know it is, since that is what we intended; but realloc() here is really no different from free() + malloc(), in which case the semantics become a bit more apparent:

int *addr = (int *)malloc(sizeof(int));
int *bananas = addr;

*bananas = 42;
printf("%d\n", *addr); // 42


int *apples = addr;  // this being up here causes UB
addr = (int*) realloc(addr, sizeof(int));  // even assuming the same address, it might not refer to the same object

// int *apples = addr;  // if it was down here it'd be fine
// apples = std::launder(apples);  // or launder the stale pointer instead

*apples = 1337;
printf("%d\n", *addr);

> a2 = std::launder(a2); // No effect?

You're right that a2 = std::launder(a2) has no effect on the value or on the pointer itself, but it updates which object the compiler considers a2 to point to. We can read it as a hint to the compiler rather than a CPU instruction.

> printf("%d\n", *a2); // undefined behaviour if the pointer changed - the old pointer is freed.

We are using placement new, so this is not a concern: we are in control of the address returned.

If it's not clear, I can draw up a placement-new example, but the syntax is a bit more awkward than simply using malloc/realloc. The key insight here is that addresses and objects are conceptually different things, even if both the value at the symbol and its address are identical.

@mazunki
Contributor Author

mazunki commented Oct 28, 2025

#include <new>
#include <cstdio>
#include <print>

struct Count { int n; };

int main() {
    const Count* counter = new const Count{3};
    int apples = counter->n;

    new (const_cast<Count*>(counter)) const Count{5};

    int bananas = counter->n; // UB
    std::println("apples={}", apples);
    std::println("bananas={}", bananas);
}
$ clang++ -Wall -Wextra -Wpedantic -O3 -std=c++23 -fsanitize=address,undefined main.cpp
$ ./a.out                                                                              
apples=3
bananas=5

=================================================================
==7668==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x562bad2c87c1 in operator new(unsigned long) (/home/maz/nyaa/launder/a.out+0x15e7c1)
    #1 0x562bad2c9c6d in main (/home/maz/nyaa/launder/a.out+0x15fc6d)

SUMMARY: AddressSanitizer: 4 byte(s) leaked in 1 allocation(s).

@mazunki
Contributor Author

mazunki commented Oct 28, 2025

> And the original warning I assume you got, which prompted you to introduce launder in the first place, would be very helpful

The warning which made me look into this is aligned_storage_t being deprecated. I'm trying to get a warning about the UB itself, but can't find a way to have the compiler tell me about it other than through runtime sanitizers.

Related: https://www.think-cell.com/assets/en/career/talks/pdf/think-cell_talk_lifetime.pdf
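For reference: std::aligned_storage_t is deprecated in C++23, and the usual replacement is an alignas-qualified std::byte array. Once the storage is just bytes, every access that starts from the buffer's address needs std::launder. A sketch, assuming a hypothetical `Uninitialized<T>` wrapper (the name and interface are ours, not the PR's actual code):

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-in for a std::aligned_storage_t member: raw, suitably
// aligned bytes plus explicit construction and destruction.
template <class T>
class Uninitialized {
public:
    // The pointer returned by placement new already points to the new T.
    template <class... Args>
    T& emplace(Args&&... args) {
        return *::new (static_cast<void*>(storage_)) T(std::forward<Args>(args)...);
    }

    // Later accesses only have the buffer's address. reinterpret_cast alone
    // still points at the byte array, not at a T, so we launder it.
    T& get() noexcept {
        return *std::launder(reinterpret_cast<T*>(storage_));
    }

    // Ends the lifetime of the contained T.
    void destroy() noexcept { std::destroy_at(&get()); }

private:
    alignas(T) std::byte storage_[sizeof(T)];
};
```

Keeping the launder inside get() means the rest of the code never touches the raw buffer. The caller is still responsible for calling emplace() before get() and destroy() before reusing the slot; this sketch does not track that state.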

@mazunki
Contributor Author

mazunki commented Oct 28, 2025

This example is only leaking sometimes for me, lol

#include <new>
#include <cstdio>
#include <memory>  // std::destroy_at
#include <print>

int main() {
    const int* ptr = new const int(11); // provenance A
    std::destroy_at(ptr);

    void* raw = const_cast<void*>(static_cast<const void*>(ptr));

    int* new_ptr = ::new (raw) int(42); // provenance B

    std::print("*new_ptr = {}\n", *new_ptr);       // okay
    std::print("*ptr = {}\n", *ptr);               // UB
    std::print("*ptr = {}\n", *std::launder(ptr)); // okay
}

Interesting bit: if I use println instead of print, it always fails. Gotta love some UB.

clang++ -Wall -Wextra -Wpedantic -O3 -std=c++23 -fsanitize=address,undefined -ggdb3 main.cpp

@mazunki
Contributor Author

mazunki commented Oct 31, 2025

Cleanly rebased due to conflicts. Still good on tests.

@alfreb
Contributor

alfreb commented Nov 11, 2025

Ok, so this makes a lot more sense - and it's completely different from the first example. Right? I think it's important that we understand anything new we introduce to the kernel properly.

In the malloc / realloc example literally nothing changed, unless the pointer returned from realloc changed. And if the pointer changed, there's nothing std::launder could do for us. So the only examples we should consider are examples where the pointer itself doesn't change but what is stored at that pointer does. Agreed?
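A sketch of exactly that case: the address stays the same, but the object stored there changes. It mirrors mazunki's const int example, but (as our own choice) takes the storage from ::operator new so it can be released cleanly afterwards; `read_after_replace` is a hypothetical name.

```cpp
#include <cassert>
#include <new>

int read_after_replace() {
    void* raw = ::operator new(sizeof(int));  // storage we control

    // old_ptr points to a *complete const* object, so a later replacement
    // is not transparent: old_ptr will not automatically refer to it.
    const int* old_ptr = ::new (raw) const int(11);

    // int is trivially destructible; reusing the storage ends the old lifetime.
    int* fresh = ::new (raw) int(42);

    // The address value is identical...
    assert(static_cast<const void*>(fresh) == static_cast<const void*>(old_ptr));

    // ...but *old_ptr would be UB here. *fresh and *std::launder(old_ptr)
    // both refer to the new int.
    int result = *std::launder(old_ptr);
    assert(result == *fresh);

    ::operator delete(raw);
    return result;
}
```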

@alfreb
Contributor

alfreb commented Nov 12, 2025

We had a good discussion offline. These are the two cases for std::launder according to cppreference (copy-pasting to preserve for the future):

#include <cassert>
#include <cstddef>
#include <new>
 
struct Base
{
    virtual int transmogrify();
};
 
struct Derived : Base
{
    int transmogrify() override
    {
        new(this) Base;
        return 2;
    }
};
 
int Base::transmogrify()
{
    new(this) Derived;
    return 1;
}
 
static_assert(sizeof(Derived) == sizeof(Base));
 
int main()
{
    // Case 1: the new object failed to be transparently replaceable because
    // it is a base subobject but the old object is a complete object.
    Base base;
    int n = base.transmogrify();
    // int m = base.transmogrify(); // undefined behavior
    int m = std::launder(&base)->transmogrify(); // OK
    assert(m + n == 3);
 
    // Case 2: access to a new object whose storage is provided
    // by a byte array through a pointer to the array.
    struct Y { int z; };
    alignas(Y) std::byte s[sizeof(Y)];
    Y* q = new(&s) Y{2};
    const int f = reinterpret_cast<Y*>(&s)->z; // Class member access is undefined
                                               // behavior: reinterpret_cast<Y*>(&s)
                                               // has value "pointer to s" and does
                                               // not point to a Y object
    const int g = q->z; // OK
    const int h = std::launder(reinterpret_cast<Y*>(&s))->z; // OK
 
    [](...){}(f, g, h); // evokes [[maybe_unused]] effect
}

https://en.cppreference.com/w/cpp/utility/launder.html

I think it's absolutely terrible that reinterpret_cast<Y*>(&s) is not a pointer to a Y object, making this UB:

  const int f = reinterpret_cast<Y*>(&s)->z; 

even if we can guarantee that there is in fact a valid Y at the address pointed to by &s, because we just put it there with placement new.

I also think it's scary that std::launder can itself cause UB when not every condition on its list of requirements is fulfilled, but I guess that's a consequence of the fact that in the end it's just a pointer, and the compiler can't have full knowledge of what a given pointer points to at any given point in time because of the halting problem.

This is apparently OK:

    const int h = std::launder(reinterpret_cast<Y*>(&s))->z; // OK

In addition to the clunky cast, we now also have to reassure the compiler that yes, I know what I'm doing, I'm both telling you that there is in fact a valid Y at this address, and that the Y at that address is still alive. In two separate statements. Should I also sign a form claiming legal responsibility?

Anyway, launder it is. I don't fully understand it, but thanks @mazunki for finding this new protective gear against the dangers I wish we didn't have to know about.
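If the "two separate statements" (the cast plus the launder) bother us, one option is to fold them into a single helper. A sketch, assuming a hypothetical `object_at<T>` function that is not part of the PR or of the standard library. It does not remove the precondition; it only keeps the cast and the launder in one place:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper. Precondition (unchecked): a live T object occupies
// `storage`. If not, the result is UB, exactly as with std::launder itself.
template <class T>
T* object_at(void* storage) noexcept {
    return std::launder(static_cast<T*>(storage));
}

struct Y { int z; };  // same shape as in the cppreference example

int read_z_through_buffer() {
    alignas(Y) std::byte s[sizeof(Y)];
    ::new (static_cast<void*>(s)) Y{2};
    return object_at<Y>(s)->z;  // one call instead of reinterpret_cast + launder
}
```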

@alfreb alfreb merged commit 62337fc into includeos:main Nov 12, 2025
@mazunki mazunki deleted the bump-cxx-23 branch November 12, 2025 20:02