
How to explain linker symbols used as integers (and not pointers to an allocation)? #554

Open
ia0 opened this issue Feb 4, 2025 · 4 comments

Comments

@ia0

ia0 commented Feb 4, 2025

Is there a way to explain linker symbols that are used only for their address, in particular when that address is not meant as a pointer but only as an integer? I have two examples.

Example 1: Using a linker symbol to communicate a number

The riscv-rt crate uses a _heap_size symbol to let users configure the heap size through the linker script. In particular, their documentation contains code that looks like this:

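```rust
extern "C" {
    static _sheap: u8;
    static _heap_size: u8;
}

fn main() {
    unsafe {
        let heap_bottom = &_sheap as *const u8 as usize;
        // `&_heap_size` creates a `&u8`, which asserts one dereferenceable byte.
        let heap_size = &_heap_size as *const u8 as usize;
        // `some_allocator` is a placeholder for the allocator the application uses.
        some_allocator::initialize(heap_bottom, heap_size);
    }
}
```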

One could argue about whether u8 is the correct type. Let's assume it's a ZST to simplify this particular case.

Example 2: Using a linker symbol to "allocate" a unique (to the program) value

The defmt crate uses static variables in a specific linker section (with a specific name, but that is orthogonal to this issue) to allocate identifiers for interned strings. The address of each static variable (i.e., the value of the symbol) is the identifier. These static variables do not represent proper allocations, and in fact have no allocation at all from Rust's point of view (the linker section is NOLOAD).
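A minimal sketch of the "address as identifier" idea in plain Rust, assuming Rust 1.82 or later for `&raw const`. The static names `STR_HELLO` and `STR_WORLD` are hypothetical, and unlike defmt's statics these are ordinary allocated statics with no `link_section`, so this only shows how an identifier is derived from an address, not the NOLOAD part this issue is about.

```rust
// Hypothetical marker statics, one per interned string. These are ordinary
// one-byte statics (no `link_section`, no NOLOAD), so each is a real allocation.
static STR_HELLO: u8 = 0;
static STR_WORLD: u8 = 0;

/// Identifier of the "hello" string: the address of its marker static.
fn hello_id() -> usize {
    // `&raw const` produces a raw pointer without creating a reference,
    // and the byte behind it is never read.
    (&raw const STR_HELLO) as usize
}

/// Identifier of the "world" string.
fn world_id() -> usize {
    (&raw const STR_WORLD) as usize
}
```

Because two non-zero-sized statics are distinct allocations, `hello_id()` and `world_id()` differ. The open question is what justifies this when, as in defmt, the statics have no allocation at all.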

The proc-macro generating those static variables looks like this:

```rust
quote!(
    #[cfg_attr(target_os = "macos", link_section = #section_for_macos)]
    #[cfg_attr(not(target_os = "macos"), link_section = #section)]
    #[export_name = #sym_name]
    static #name: u8 = 0;
)
```

In this case, the u8 type matters: each static occupies exactly one byte, which is how the identifiers end up unique and consecutive (they start at 1 so they can fit in a u16).
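To see why the element size matters, here is a toy model of a linker packing the marker symbols back to back. It is a hypothetical sketch, not defmt's actual linker script: it assumes the section starts at address 1, that symbols are placed in order with no padding, and that an identifier is simply the symbol's address. `layout` and `to_u16_ids` are illustrative names.

```rust
/// Hypothetical model of a linker laying out symbols back to back in a
/// section starting at `base`: each symbol's address is the previous
/// address plus the previous symbol's size (no padding, alignment 1).
fn layout(base: usize, sizes: &[usize]) -> Vec<usize> {
    let mut next = base;
    sizes
        .iter()
        .map(|&size| {
            let addr = next;
            next += size;
            addr
        })
        .collect()
}

/// Converts addresses to `u16` identifiers, or `None` if any does not fit.
fn to_u16_ids(addrs: &[usize]) -> Option<Vec<u16>> {
    addrs
        .iter()
        .map(|&a| if a <= u16::MAX as usize { Some(a as u16) } else { None })
        .collect()
}
```

In this model, one-byte symbols yield the identifiers 1, 2, 3, and so on, while zero-sized symbols all collapse onto the same address, so a ZST could not provide unique identifiers here.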

Related issues

I'm opening a new issue, even though there are many related ones, because I feel this particular concern (static variables without an allocation) has not been addressed yet. Here are the related issues:

Please deduplicate if I missed an issue or if my analysis is wrong.

Theoretical suggestion

There could be an attribute indicating that a static has no allocation. It would then remain UB for a static without this attribute to lack an allocation. Such "inaccessible statics" could only have their address taken: they have no allocation and cannot be dereferenced.

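```rust
// `#[inaccessible_static]` is the attribute proposed above; it does not exist in Rust today.
#[inaccessible_static]
static MY_ADDRESS_IS_AN_IDENTIFIER: u8 = 0;

unsafe extern "C" {
    #[inaccessible_static]
    static MY_ADDRESS_IS_A_VALUE: ();
}
```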
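For comparison, the closest thing in today's Rust is a zero-sized static whose address is taken with `&raw const` (Rust 1.82 or later), which never creates a reference or reads memory. This sketch covers only the ZST case (as in Example 1 under the ZST assumption); it uses a local static rather than a linker-provided one so it runs on its own, and `address_of_marker` is a hypothetical helper name.

```rust
// A zero-sized static. Current Rust only requires non-zero-sized statics
// to point to an allocation, so this one is not required to be backed by memory.
static MY_ADDRESS_IS_AN_IDENTIFIER: () = ();

/// Returns the static's address as an integer.
fn address_of_marker() -> usize {
    // `&raw const` yields a `*const ()` without creating a reference;
    // nothing is ever read through it.
    (&raw const MY_ADDRESS_IS_AN_IDENTIFIER) as usize
}
```

This does not help Example 2, where the one-byte size is what makes the identifiers unique, which is the gap the proposed attribute would fill.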
@RalfJung
Member

RalfJung commented Feb 8, 2025

Wow, those are terrifying hacks...

The current state is quite simple: non-zero-sized statics must point to an allocation; violating this is UB. I don't know how much of this we are at liberty to change -- does LLVM even permit statics that are not backed by actual memory? Cc @nikic

@nikic

nikic commented Feb 8, 2025

LLVM requires globals to be dereferenceable up to at least their size, so if you have a u8 static, it must be dereferenceable for one byte. If it's a ZST static, then of course it need not be.

Of course, from a practical perspective, LLVM will never materialize a load from a global out of thin air, only hoist it out of control flow. But I don't think it's possible to specify that operationally :)

@ia0
Author

ia0 commented Feb 8, 2025

> But I don't think it's possible to specify that operationally :)

I assume this conclusion refers to current Rust and LLVM, where there is only one notion of static/global. In a world with two notions of statics/globals (one you can always dereference and one you can never dereference), this seems easy to specify because there is no "maybe dereferenceable" case. That's essentially the #[inaccessible_static] suggestion. But given the quote below, it seems that even LLVM has only one notion of globals:

> LLVM requires globals to be dereferenceable up to at least their size

Note that these use cases are not specific to Rust, so I guess most people assume stronger guarantees from LLVM (namely, that it does not assume globals are dereferenceable unless the program actually dereferences them, which, as you mentioned, is not a clean specification).

@RalfJung
Member

RalfJung commented Feb 8, 2025 via email


3 participants