Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It is UB to use most of the library before init code is run #1046

Open
lilizoey opened this issue Feb 12, 2025 · 17 comments
Open

It is UB to use most of the library before init code is run #1046

lilizoey opened this issue Feb 12, 2025 · 17 comments
Labels
bug c: ffi Low-level components and interaction with GDExtension API ub Undefined behavior

Comments

@lilizoey
Copy link
Member

Most of the library relies on accessing the Godot binding, which is only initialized when the initialization code is run. But in most cases we only check if the initialization code is run when debug assertions are turned on.

This means that if you for instance make a unit test that accesses these APIs and run it in release mode, you'll hit UB.

Some possible solutions

  1. Check when debug assertions are turned off as well.

This would make everything safe, but would likely also incur performance penalties in situations where the initialization code will guaranteed be run, such as when actually being called by Godot.

The performance issues is why we only check when debug assertions are turned on in the first place.

  1. Track when code has been run using some kind of token in code.

This is hard to weave through every relevant part of the code, especially in a way that doesn't impact ergonomics. Additionally it will still require at least one check every time Godot calls the rust part of the code.

  1. Have an unsafe compile time flag for when the initialization code is run.

I am not entirely sure how we should decide when this flag is enabled or not. Initially i thought we could enable it when we're compiling as a cdylib but we cant actually determine that very easily, and it likely wouldn't work as expected if we could. As if you list several crate types in the Cargo.toml it'll compile against all of them at the same time.

So if for instance we had crate-type = ["rlib", "cdylib"] then anything behind cfg(crate-type = "cdylib") would also get compiled for the rlib target anyway.

See also this issue: rust-lang/rust#20267

We could also just require the user to explicitly enable this flag somehow, say a feature. But that means that suddenly people's code will become slower and it'd be an extra hurdle for making code run fast. But maybe we could add some useful warnings for when people are running the initialization code in release mode? since if the initialization code gets run in release mode they're likely in a situation where they'd want this feature enabled. We might also be able to add some other useful heuristics to give information to the user when they may be using it wrong.

  1. Make compiling in release mode unsafe

I.e we basically just say "you need to make sure you don't access the Godot API before the initialization code is run when you compile code in release mode".

This is very similar to the previous point but we basically just use debug_assertions as the relevant flag (or rather it's the inverse of the previous point, as not using this flag is what is unsafe).

However i think this is very strange from a Rust-perspective. In Rust it is generally assumed that --release wont make anything unsound, but will merely make the compiled code faster, compilation slower, and debug information worse. For beginners to the library it may be helpful, but in the long run i suspect it may be more confusing than anything else, leading to some subtle bugs.

@lilizoey lilizoey added bug c: ffi Low-level components and interaction with GDExtension API ub Undefined behavior labels Feb 12, 2025
@Bromeon
Copy link
Member

Bromeon commented Feb 23, 2025

Thanks for formalizing this and coming up with suggestions! Apart from the UB, agreeing on a way forward here would also allow us to make many of the low-level functions safe (under the assumption that Godot is loaded). At the moment it's a bit of a theatre: FFI functions are unsafe, but their callers don't really have more guarantees and yet those are safe.


3. Track when code has been run using some kind of token in code.

This is hard to weave through every relevant part of the code, especially in a way that doesn't impact ergonomics. Additionally it will still require at least one check every time Godot calls the rust part of the code.

Unfortunately, I don't see this as viable, either. Even basic functionality like godot_print! or GString::new() would need this token, unleashing a hell of boilerplate onto everyone. But it's a nice idea, maybe there are ways to use such type-state approaches in more limited contexts 🤔


3. Have an unsafe compile time flag for when the initialization code is run.

I am not entirely sure how we should decide when this flag is enabled or not. Initially i thought we could enable it when we're compiling as a cdylib but we cant actually determine that very easily, and it likely wouldn't work as expected if we could. As if you list several crate types in the Cargo.toml it'll compile against all of them at the same time.

So if for instance we had crate-type = ["rlib", "cdylib"] then anything behind cfg(crate-type = "cdylib") would also get compiled for the rlib target anyway.

See also this issue: rust-lang/rust#20267

We could also just require the user to explicitly enable this flag somehow, say a feature. But that means that suddenly people's code will become slower and it'd be an extra hurdle for making code run fast. But maybe we could add some useful warnings for when people are running the initialization code in release mode? since if the initialization code gets run in release mode they're likely in a situation where they'd want this feature enabled. We might also be able to add some other useful heuristics to give information to the user when they may be using it wrong.

Even cfg(crate-type = "cdylib") wouldn't guarantee safety; it would just state the the crate is used as a dynamic library, but it can be invoked by anything, doesn't have to be Godot. Even Godot would cause UB if the cdylib is treated as a GDNative library and loaded by Godot 3.

So it seems to me it's not just a matter of tooling -- at some point, the user needs to provide the guarantee "I use this as a dynamic library from a binary-compatible Godot engine version". And since this is the entire purpose of this project, it feels a bit strange to require extra steps (such as Cargo features) for this?

Unit tests could maybe be covered with #[cfg(test)] (although that flag has some pitfalls), are there other situations we should realistically protect against? Are there people trying to call godot-rust APIs from a regular Rust binary? (Debug mode is already covered; we're mostly talking about straight-to-release builds, which is somewhat unlikely on its own).


  1. Make compiling in release mode unsafe

I.e we basically just say "you need to make sure you don't access the Godot API before the initialization code is run when you compile code in release mode".

This is very similar to the previous point but we basically just use debug_assertions as the relevant flag (or rather it's the inverse of the previous point, as not using this flag is what is unsafe).

However i think this is very strange from a Rust-perspective. In Rust it is generally assumed that --release wont make anything unsound, but will merely make the compiled code faster, compilation slower, and debug information worse. For beginners to the library it may be helpful, but in the long run i suspect it may be more confusing than anything else, leading to some subtle bugs.

I agree it's a bit strange, but if other approaches don't work, this might be our best option?

The problem is that unlike when using godot-rust the intended way, we cannot force someone to implement an unsafe impl ExtensionLibrary which bears certain assumptions; users can try to access any public APIs directly.


  1. Check when debug assertions are turned off as well.

This would make everything safe, but would likely also incur performance penalties in situations where the initialization code will guaranteed be run, such as when actually being called by Godot.

Maybe worth mentioning that "in situations where" is 99.99% of the library usage 😉

I made some very superficial experiments with #[bench] and it seemed like both static mut bool and static AtomicBool had an impact, although tiny. It might need proper analysis in Godbolt, but I imagine it's hard to optimize atomic bool checks completely away. And since every single Godot interaction would require an extra check, even though in practice they're guaranteed to pass, we would give away performance for no benefit.

One idea might also be to use intrinsics such as unlikely and hope that branch prediction will be enough to make conditions non-noticeable. While those are nightly-only, the #[cold] attribute is stable, and there are ways to emulate likely/unlikely based on #[cold].

We would need to measure this. We might still pay a bit (e.g. larger instruction size for checks, thus tiny chance of other code to be cache-missed), but it would likely be the least impactful version of runtime checks.


Maybe last, how are others dealing with this problem? I know we're in a somewhat niche spot in that we not only expose a C API, but also allow custom Rust code to be executed. But are we really the only such project? 🤔

@lilizoey
Copy link
Member Author

lilizoey commented Mar 3, 2025

another option i just suddenly thought of, can we initialize the values to some kind of mock values initially? values and functions that are safe to access but which do nothing/panic when actually used.

@Bromeon
Copy link
Member

Bromeon commented Mar 3, 2025

It's a great idea! 💡

The Godot class methods are of type sys::GDExtensionMethodBindPtr and all have different (high-level) signatures. But we might not even need to mock those, because they aren't called directly, but first fetched.

However, we'd possibly need to mock the "foundation" of the GDExtension C binding, i.e. the generated sys::GDExtensionInterface with all the function pointers:

pub struct GDExtensionInterface {
    pub get_godot_version: crate::GDExtensionInterfaceGetGodotVersion,
    pub mem_alloc: crate::GDExtensionInterfaceMemAlloc,
    pub mem_realloc: crate::GDExtensionInterfaceMemRealloc,
    pub mem_free: crate::GDExtensionInterfaceMemFree,
    pub print_error: crate::GDExtensionInterfacePrintError,
    pub print_error_with_message: crate::GDExtensionInterfacePrintErrorWithMessage,
    ...
    pub get_native_struct_size: crate::GDExtensionInterfaceGetNativeStructSize,
    pub variant_new_copy: crate::GDExtensionInterfaceVariantNewCopy,
    pub variant_new_nil: crate::GDExtensionInterfaceVariantNewNil,
    pub variant_destroy: crate::GDExtensionInterfaceVariantDestroy,
    ...
}

Which would also mean distinct signatures for each. And their signatures aren't declarative, we'd need a C parser or... write C directly or... modify bindgen output 🤔

Then we'd need to give the global BINDING variable an initial value, which would need to be const. I assume this could work if all function pointers are generated and thus statically known?

@lilizoey
Copy link
Member Author

lilizoey commented Mar 4, 2025

i feel like the nicest outcome of this is if we can call interface functions by doing like interface.variant_new_copy(..) rather than something like interface_fn!(variant_new_copy)(..). not sure if that's actually possible but it would be nice.

@lilizoey
Copy link
Member Author

lilizoey commented Mar 4, 2025

It's a great idea! 💡

The Godot class methods are of type sys::GDExtensionMethodBindPtr and all have different (high-level) signatures. But we might not even need to mock those, because they aren't called directly, but first fetched.

I dont think we need to mock any of the method tables actually, since they can only be accessed by going through classes that can only be created once their respective method table has been loaded. might need to rework the safety invariants slightly to take into account that this assumption is load-bearing, but if im right then it should be fine to leave that basically as-is.

@lilizoey
Copy link
Member Author

lilizoey commented Mar 5, 2025

so i've been experimenting a bit, here's a branch where i mostly managed to roughly implement this:
master...lilizoey:gdextension:refactor/mock-init

this passes integration tests on my machine for single threaded usage. but is also very rough.

It's a great idea! 💡

The Godot class methods are of type sys::GDExtensionMethodBindPtr and all have different (high-level) signatures. But we might not even need to mock those, because they aren't called directly, but first fetched.

However, we'd possibly need to mock the "foundation" of the GDExtension C binding, i.e. the generated sys::GDExtensionInterface with all the function pointers:

pub struct GDExtensionInterface {
pub get_godot_version: crate::GDExtensionInterfaceGetGodotVersion,
pub mem_alloc: crate::GDExtensionInterfaceMemAlloc,
pub mem_realloc: crate::GDExtensionInterfaceMemRealloc,
pub mem_free: crate::GDExtensionInterfaceMemFree,
pub print_error: crate::GDExtensionInterfacePrintError,
pub print_error_with_message: crate::GDExtensionInterfacePrintErrorWithMessage,
...
pub get_native_struct_size: crate::GDExtensionInterfaceGetNativeStructSize,
pub variant_new_copy: crate::GDExtensionInterfaceVariantNewCopy,
pub variant_new_nil: crate::GDExtensionInterfaceVariantNewNil,
pub variant_destroy: crate::GDExtensionInterfaceVariantDestroy,
...
}

Which would also mean distinct signatures for each. And their signatures aren't declarative, we'd need a C parser or... write C directly or... modify bindgen output 🤔

This is a bit annoying yeah, i was able to parse the important functions by just modifying the regex a bit and making a simple c type parser. but it's not very reliable and stable. i looked a bit into c parsers, and these are the ones i found: cexpr is very limited, mini-c-parser and lang-c are both not very widely used so hard to judge if they're good, and tree-sitter-c requires learning how to use tree sitter, which is a lot.

we also need to modify the utility and lifecycle tables to have similar mock initialization values, since they arent optional. though maybe we could find another way to do it?

Then we'd need to give the global BINDING variable an initial value, which would need to be const. I assume this could work if all function pointers are generated and thus statically known?

this isnt too hard, once the above codegen stuff is dealt with. but figuring out exactly everything that needs to be done is a lot, and we'd probably need to rethink the binding storages a bit.

one issue is that i dont entirely know if we can let people keep stale function pointers after the library is deinitialized. it might be fine, since it'll only happen for hot reloading, but im not sure. we might need to enable some extra checks if hot reloading is enabled.

i'll probably take another stab at it at some point, but at least in principle this shows that it should be doable i think. now it's more about figuring out the higher level (though still low-level overall) api for this.

@Bromeon
Copy link
Member

Bromeon commented Mar 5, 2025

Very cool to see that there's at least a possible path forward here 👍

We might get far enough with some heuristics for function declarations, without needing to support the majority of the C syntax. Otherwise, before we start a huge not-invented-here project, we can definitely look into the lightweight parser libraries you suggested.

I was also wondering if bindgen has anything in store. The only thing I found is ParseCallbacks, but it seems to be very limited -- the trait only exposes names of some items, not their C code, let alone an AST.


one issue is that i dont entirely know if we can let people keep stale function pointers after the library is deinitialized. it might be fine, since it'll only happen for hot reloading, but im not sure. we might need to enable some extra checks if hot reloading is enabled.

But even hot reloading should unload the library completely, if implemented correctly, no? One problem was that some parts (TLS destructors) kept the library loaded, but hopefully this is mostly fixed 🤔

Otherwise, if we have an initial value where all function pointers are set to dummies, we could technically reset the binding to that immediately before the final line of godot-rust code runs?


All that said, I have some concerns:

  1. Implementing a sophisticated C parser is becoming a bit of a stretch 🙈 We went from "we should formally state that things are initialized" to "we need to see how we can parse C code" 😀

    • We need to make sure we don't deviate too far from our core goal, providing bindings to the Godot engine.
    • If small external crates help to reduce the work on our side, let's do that. It might also be possible to pre-generate some code into the gdextension-api crate, so that users don't have to recompile it (except for api-custom).
  2. There may also be quite a bit of complexity resulting from all the extra parsing and specialized codegen.

  3. If we have to generate hundreds to thousands of dummy functions, this may have an impact on binary size, memory usage and loading times, especially on mobile/Wasm platforms. We already observed negative impact from the function tables, which is why lazy-function-tables exists.

  4. We then need to ask ourselves, are we doing all the above only to save ourselves from an unlikely bool check? We don't even know if that's faster -- or maybe swapping out function pointers is harmful for other LLVM optimizations...

We need to keep in mind that solve a primarily formal problem, while the practical implications are limited to "you run a test in Release mode" and similar edge cases. Although making more code safe, like in your PoC, is a very nice result indeed. But we need to see how much machinery is necessary to achieve that -- maybe simpler approaches are more reasonable 🙂

@lilizoey
Copy link
Member Author

lilizoey commented Mar 6, 2025

I was also wondering if bindgen has anything in store. The only thing I found is ParseCallbacks, but it seems to be very limited -- the trait only exposes names of some items, not their C code, let alone an AST.

I did look into it a bit and from what i can tell bindgen only really lets you change the bindings that they generate. maybe it's possible to hack it into generating the entire interface struct and tables and stuff, but it seems difficult. under the hood it does use cexpr but im not entirely sure how and to what degree.

one issue is that i dont entirely know if we can let people keep stale function pointers after the library is deinitialized. it might be fine, since it'll only happen for hot reloading, but im not sure. we might need to enable some extra checks if hot reloading is enabled.

But even hot reloading should unload the library completely, if implemented correctly, no? One problem was that some parts (TLS destructors) kept the library loaded, but hopefully this is mostly fixed 🤔

Otherwise, if we have an initial value where all function pointers are set to dummies, we could technically reset the binding to that immediately before the final line of godot-rust code runs?

I guess yeah, do we even need deinitialization functions for the binding itself then?

All that said, I have some concerns:

1. Implementing a sophisticated C parser is becoming a bit of a stretch 🙈 We went from "we should formally state that things are initialized" to "we need to see how we can parse C code" 😀
   
   * We need to make sure we don't deviate too far from our core goal, providing bindings to the Godot engine.
   * If small external crates help to reduce the work on our side, let's do that. It might also be possible to pre-generate some code into the `gdextension-api` crate, so that users don't have to recompile it (except for `api-custom`).

Yeah definitely, we're parsing the header with regex currently and it's kinda fragile. if one of the c parsing crates are small and effective we could consider using that instead even if we dont end up doing this.

3. If we have to generate hundreds to thousands of dummy functions, this may have an impact on binary size, memory usage and loading times, especially on mobile/Wasm platforms. We already observed negative impact from the function tables, which is why `lazy-function-tables` exists.

Yeah, though luckily the function tables each have a single function signature so they just need one dummy function. builtins lifecycle has multiple different signatures, but a lot of them have very similar signatures. we might be able to get away with less than 10 dummy functions there. interface is a bit more tricky, a lot of the functions take pointers of different types but we could just type erase the pointers and coalesce a bunch of them into one dummy, but it also has integers, bools, and floats in some places so would probably need a bunch of functions. but it does also "only" have 150 ish functions to begin with. my guess is we'd probably only need like 100 ish dummy functions?

4. We then need to ask ourselves, are we doing all the above _only_ to save ourselves from an [unlikely](https://doc.rust-lang.org/std/intrinsics/fn.unlikely.html) bool check? We don't even know if that's faster -- or maybe swapping out function pointers is harmful for other LLVM optimizations...

we can't use unlikely but we can use #[cold] yeah. and im not sure about swapping out function pointers tbh. my initial guess is that it might not change much in terms of optimizations, since we already need to initialize the pointers at some point anyway and the pointers come from outside so im skeptical llvm can optimize much based on it. but i dont know really.

@lilizoey
Copy link
Member Author

lilizoey commented Mar 6, 2025

So mini-c-parser seems to not really be usable for this purpose at least according to this issue.

lang-c is usable, though it parses very much according to the c-standard. and the formal c ast is kinda weird tbh. Here's an example of me trying to take a top-level element and parsing it as the kind of typedef that we are dealing with in the gdextension interface:

code
struct Typedef<'a> {
    ty: &'a Node<TypeSpecifier>,
    kind: &'a Node<Declarator>,
}

fn parse_typedef(decl: &Node<ExternalDeclaration>) -> Result<Typedef, String> {
    let declaration = match &decl.node {
        ExternalDeclaration::Declaration(node) => node,
        other => return Err(format!("not a declaration, got {other:?}")),
    };

    let specifiers = &declaration.node.specifiers;
    let declarators = &declaration.node.declarators;

    let storage_class = specifiers
        .get(0)
        .ok_or(format!("specifiers missing <typedef> storage class"))?;
    let storage_class = match &storage_class.node {
        DeclarationSpecifier::StorageClass(node) => node,
        other => {
            return Err(format!(
                "first specifier isn't storage class, got: {other:?}"
            ));
        }
    };
    if !matches!(storage_class.node, StorageClassSpecifier::Typedef) {
        return Err(format!(
            "storage class isn't typedef, got: {:?}",
            storage_class.node
        ));
    }

    let type_specifier = specifiers
        .get(1)
        .ok_or(format!("specifiers missing type specifier"))?;
    let type_specifier = match &type_specifier.node {
        DeclarationSpecifier::TypeSpecifier(node) => node,
        other => {
            return Err(format!(
                "second specifier isn't type specifier, got: {other:?}"
            ));
        }
    };

    if specifiers.len() > 2 {
        return Err(format!(
            "too many declaration specifiers, expected 2 got {}",
            specifiers.len()
        ));
    }

    let init_declaration = declarators
        .get(0)
        .ok_or(format!("missing init declarator"))?;

    if declarators.len() > 1 {
        return Err(format!(
            "multiple init declarators are unsupported, got {}",
            declarators.len()
        ));
    }

    if init_declaration.node.initializer.is_some() {
        return Err(format!("init declarator should not have initializer"));
    }

    let init_declarator = &init_declaration.node.declarator;

    Ok(Typedef {
        ty: type_specifier,
        kind: init_declarator,
    })
}

Im not 100% sure this is fully correct but it should be pretty close. it also requires either gcc or clang in order to preprocess the c-file first. the biggest issue with it really is just a lack of QOL functions to help access stuff, it's basically just working with the C AST.

@lilizoey
Copy link
Member Author

lilizoey commented Mar 6, 2025

I think it would be possible to use bindgen by using an attribute macro. It's possible to tell bindgen to add attributes to the items it creates, so we could make an attribute macro that creates a dummy function with some known naming convention. then we can reuse the current interface generation code and not have to parse the C code any more than we currently do.

edit: this apparently only works for enums, structs, and unions. so we'd need to make bindgen generate functions as newtype wrappers. this is possible by changing the default alias style for typedefs, but this changes everything and not just the functions.

@Bromeon
Copy link
Member

Bromeon commented Mar 6, 2025

Thanks for the research!


I guess yeah, do we even need deinitialization functions for the binding itself then?

Yes, probably not -- I added those at the time because I had a misconception about hot-reloading on Linux, but once that cleared up I left them, as I thought they may come in handy one day.


Yeah definitely, we're parsing the header with regex currently and it's kinda fragile. if one of the c parsing crates are small and effective we could consider using that instead even if we dont end up doing this.

Most of the regex is about the comment though -- we'd still need to parse that even if we have a C parser. I'd recommend to change this only once we hit the limits with regex and not prematurely, as it has worked well so far. (I'm happy to fine-tune parts of the regex like you did in the PoC).


Im not 100% sure this is fully correct but it should be pretty close. it also requires either gcc or clang in order to preprocess the c-file first. the biggest issue with it really is just a lack of QOL functions to help access stuff, it's basically just working with the C AST.

Having gcc/clang dependency is not ideal, although bindgen itself also needs LLVM. But many people don't need this as it's moved to gdextension-api.

And also, the code to iterate through the AST doesn't really look simple, and it likely suffers from the same problem as parsing the code "by hand": if we forget to handle a certain syntax, it still breaks. I think I underestimated this a bit, but the library tries to be as general as possible...

Considering that we're dealing with a very small subset of C if we only check the typedef declarations -- and there's even extra information like @param names we can use if that helps -- it seems like parsing raw C may still be easier. We could just make it fail in CI if ever the header introduces new syntax -- that usually happens long before stable releases.


I think it would be possible to use bindgen by using an attribute macro. It's possible to tell bindgen to add attributes to the items it creates, so we could make an attribute macro that creates a dummy function with some known naming convention. then we can reuse the current interface generation code and not have to parse the C code any more than we currently do.

edit: this apparently only works for enums, structs, and unions. so we'd need to make bindgen generate functions as newtype wrappers. this is possible by changing the default alias style for typedefs, but this changes everything and not just the functions.

Too bad, that sounded too nice 😔 changing everything just for this seems a bit excessive.


we can't use unlikely but we can use #[cold] yeah. and im not sure about swapping out function pointers tbh. my initial guess is that it might not change much in terms of optimizations, since we already need to initialize the pointers at some point anyway and the pointers come from outside so im skeptical llvm can optimize much based on it. but i dont know really.

I just think our approach so far is largely based on guessing, and for a change this deep into the FFI binding, it would probably not hurt to have some real data. I wonder if it's possible to either do benchmarking without changing too much... or maybe have a minimal project without Godot which is still reproducing things reasonably closely? I don't want us to spend too much time on measuring either...


TLDR

I have to admit, replacing the function pointers sounds like a cool exercise, and personally I'd like to see it 😉 but I'm not sure if it's really the best for the project?

Maybe to approach this evaluation as objectively as possible:

  1. We should find out if the unlikely bool is adding measurable performance impact (let's test static mut just to get things out of the way, then we can also check with AtomicBool).
  2. If yes, we should also check if swapping function pointers actually performs better. And that the (100-200?) dummy pointers don't cause any other notable downsides 🤔
  3. If the fptr-swapping is the way to go, we should probably try the simplest approach first (least amount of new code/complexity needed). It seems like this may still be parsing raw C?

Other ideas on the approach?

@Bromeon
Copy link
Member

Bromeon commented Mar 6, 2025

Regarding an earlier comment of yours:

i feel like the nicest outcome of this is if we can call interface functions by doing like interface.variant_new_copy(..) rather than something like interface_fn!(variant_new_copy)(..). not sure if that's actually possible but it would be nice.

I think we could provide small #[inline] wrappers for the function pointers.

struct Interface {...};

impl Interface {
    /// [4.1] Creates a new Variant containing nil.
    /// 
    /// # Signature
    /// - `r_dest`: A pointer to the destination Variant.
    #[inline]
    pub unsafe fn variant_new_nil(&self, r_dest: GDExtensionUninitializedVariantPtr) {
        assert_initialized();
        (self.raw_variant_new_nil)(r_dest)
    }
    ...
}

Since they're effectively global, interface.variant_new_nil(...) may not be very practical though.
With a module, something like ffi::variant_new_nil(...) could be possible.

@lilizoey
Copy link
Member Author

lilizoey commented Mar 6, 2025

I think it would be possible to use bindgen by using an attribute macro. It's possible to tell bindgen to add attributes to the items it creates, so we could make an attribute macro that creates a dummy function with some known naming convention. then we can reuse the current interface generation code and not have to parse the C code any more than we currently do.
edit: this apparently only works for enums, structs, and unions. so we'd need to make bindgen generate functions as newtype wrappers. this is possible by changing the default alias style for typedefs, but this changes everything and not just the functions.

Too bad, that sounded too nice 😔 changing everything just for this seems a bit excessive.

So we can actually selectively change it by passing a couple of regexes for the function names, i came up with these two:

const FUNC_REGEX: &str = r#"(?x)
    ^(

    # Starts with
    (GDExtensionsInterface|GDExtensionInterface|GDExtensionScriptInstance|GDExtensionClass|GDExtensionCallable).+
    
    |
    
    # Ends with
    .+(Func|Method|Constructor|Destructor|Setter|Getter|Checker|Function|Callback|Set|Get|Evaluator|PropertyList)
    
    # Might be numbered
    [\d]*
    
    )$
"#;

const FUNC_REGEX_EXCEPT: &str = r#"(?x)
    ^(GDExtensionClassInstancePtr|GDExtensionClassMethodArgumentMetadata|GDExtensionClassMethodFlags)$
"#;

Which are passed to bindgen like this:

let bindings = bindgen::Builder::default()
    .new_type_alias(FUNC_REGEX)
    .type_alias(FUNC_REGEX_EXCEPT)
    ...

The order seems to work out correctly, in that it first applies FUNC_REGEX and then FUNC_REGEX_EXCEPT which means that GDExtensionClassInstancePtr isn't a newtype in the end.

@lilizoey
Copy link
Member Author

lilizoey commented Mar 6, 2025

TLDR

I have to admit, replacing the function pointers sounds like a cool exercise, and personally I'd like to see it 😉 but I'm not sure if it's really the best for the project?

Maybe to approach this evaluation as objectively as possible:

1. We should find out if the unlikely bool is adding measurable performance impact (let's test `static mut` just to get things out of the way, then we can also check with `AtomicBool`).

2. If yes, we should also check if swapping function pointers actually performs _better_. And that the (100-200?) dummy pointers don't cause any other notable downsides 🤔

3. If the fptr-swapping is the way to go, we should probably try the simplest approach first (least amount of new code/complexity needed). It seems like this may still be parsing raw C?

Other ideas on the approach?

Yeah i think it would be good to get some actual data on this.

for 3: it may be yeah, or maybe we can parse the raw gdextension.rs file? im not sure it's easier but it might be.

@lilizoey
Copy link
Member Author

lilizoey commented Mar 8, 2025

So i did a little bit of testing on the benchmark times, i increased the repeat of all the benchmarks in itest/src/benchmarks/mod.rs to 1500 so that the benchmark times would be a bit bigger, leading to most times being between 2.4-2.5 seconds, then i ran the benchmarks ten times each in release mode with release godot, and i got this:

  static_mut single_threaded multi_threaded single_threaded_cold atomic once_lock mock_fns
min 2.36 2.38 2.37 2.47 2.46 2.45 2.38
max 2.46 2.43 2.40 2.54 2.51 2.50 2.48
mean 2.41 2.41 2.39 2.50 2.48 2.47 2.44
stddev 0.03 0.01 0.01 0.02 0.02 0.02 0.03

the three first ones, static_mut, single_threaded, multi_threaded, and mock_fns are all implementations that dont check if the bindings are initialized on access, and the last three single_threaded_cold, atomic, and once_lock all do check if the bindings are initialized on access.

  • static_mut just stores the bindings as an Option<GodotBinding> in a static mut, and uses unwrap_unchecked in the accessor method
  • single_threaded and multi_threaded are the current implementations.
  • single_threaded_cold checks but uses #[cold] for when the bindings are uninitialized.
  • atomic uses an AtomicU8 to track initialization
  • once_lock uses OnceLock to track initialization
  • mock_fns is my above implementation

I also made one that uses Mutex but it turned out to be really annoying to weave through everything since i can't have it return a &'static GodotBinding like all the other ones.

As we can see there isn't a big difference between the ones that dont check (in fact according to this data the multi-threaded implementation is better on average than both the other ones?). and checking seems to incur about a 2.5-4.6% penalty. finally mock_fnsis better than checking, but not quite as good as not checking (about 1.7-2.1% difference) however it is also pretty inconsistent. i wasnt trying to make something super optimized either so it's possible some of the difference between not checking and pointer swapping would go away if we paid more attention to performance with it.

one thing i could consider doing is rerunning all the above, but making two versions of each. one that checks and one that doesnt. should give a more accurate picture of the impact of checking.

full data
  static_mut single_threaded multi_threaded single_threaded_cold atomic once_lock mock_fns
  2.39 2.40 2.37 2.50 2.46 2.47 2.48
  2.42 2.41 2.40 2.52 2.46 2.47 2.42
  2.37 2.40 2.40 2.51 2.48 2.47 2.44
  2.46 2.41 2.38 2.49 2.51 2.47 2.38
  2.42 2.42 2.40 2.51 2.48 2.45 2.42
  2.40 2.38 2.38 2.50 2.47 2.45 2.47
  2.44 2.40 2.39 2.47 2.47 2.49 2.45
  2.41 2.43 2.40 2.47 2.47 2.47 2.43
  2.36 2.41 2.38 2.54 2.50 2.47 2.46
  2.38 2.40 2.37 2.48 2.47 2.50 2.47
               
min 2.36 2.38 2.37 2.47 2.46 2.45 2.38
max 2.46 2.43 2.40 2.54 2.51 2.50 2.48
mean 2.41 2.41 2.39 2.50 2.48 2.47 2.44
stddev 0.03 0.01 0.01 0.02 0.02 0.02 0.03

@Bromeon
Copy link
Member

Bromeon commented Mar 12, 2025

Thanks so much, fantastic work! ❤️

single_threaded_cold checks but uses #[cold] for when the bindings are uninitialized.

How is the check done in this case? bool, Option (static mut), AtomicBool or something else?

Did you test a run that used a static mut bool but with checks (no unwrap_unchecked)? Even if thread-unsafe, that would give us a baseline for how much the pure check might cost, without synchronization or messing with #[cold] intrinsics.


Not directly related, but currently we disallow combining experimental-threads and lazy-function-tables due to thread-unsafe lazy initialization. That might also be addressed when solving this issue.

@lilizoey
Copy link
Member Author

Thanks so much, fantastic work! ❤️

single_threaded_cold checks but uses #[cold] for when the bindings are uninitialized.

How is the check done in this case? bool, Option (static mut), AtomicBool or something else?

it's just calling the is_initialized function and if false it calls a function marked with #[cold] that panics.

Did you test a run that used a static mut bool but with checks (no unwrap_unchecked)? Even if thread-unsafe, that would give us a baseline for how much the pure check might cost, without synchronization or messing with #[cold] intrinsics.

oh i did do a test where i added a check to the static mut case actually, it changed the mean from 2.41 to 2.46 which is about a 2% increase.

static_mut
no_check check
2.39 2.46
2.42 2.45
2.37 2.44
2.46 2.48
2.42 2.45
2.40 2.45
2.44 2.47
2.41 2.50
2.36 2.46
2.38 2.45
min 2.36 2.44
max 2.46 2.50
mean 2.41 2.46
stddev 0.03 0.02

Not directly related, but currently we disallow combining experimental-threads and lazy-function-tables due to thread-unsafe lazy initialization. That might also be addressed when solving this issue.

true yeah

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug c: ffi Low-level components and interaction with GDExtension API ub Undefined behavior
Projects
None yet
Development

No branches or pull requests

2 participants