Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Godot-rust is hard depending on fragile parts of the Godot API and panicking if anything changes. #423

Closed
lyuma opened this issue Sep 23, 2023 · 12 comments · Fixed by #424
Closed
Labels
bug c: engine Godot classes (nodes, resources, ...)

Comments

@lyuma
Copy link

lyuma commented Sep 23, 2023

Ideally, the FFI would only load methods which are in use, and panic if the hash of the used APIs change.

However, currently, Godot rust gdext is loading all functions exported by the Godot API upfront, including deprecated and experimental methods (which by nature are subject to change).

In a particular example, Godot 4.2 has a few changes pending which could change method hashes on rarely used APIs. Sure, it would be better if they didn't, but it could conceivably happen.
And more worryingly, we might have future changes planned which may change some APIs marked as experimental (which in general should not be used if you want forward compatibility).

However, because ClassSceneMethodTable::load loads all methods defined in the JSON, used or not, it guarantees that any slight change in the Godot API even to a rarely-used, experimental or deprecated function, will break every godot-rust extension.

A good start on improving this might be to put experimental or deprecated APIs behind a flag, so at least those are not exported except for extensions that need them.
Beyond that, perhaps if there is some way to memoize APIs used (similar to how a C++ function-level static member is initialized exactly once on first call).

I'm not sure exactly, but I'm just filing this because it does seem like it could be a continuing point of frustration for deploying Godot rust plugins into the long term.

I know Godot 4.0 and Godot 4.1 had explicit GDExtension compatibility breakage because GDExt was new. But it is not intended to break the whole API globally, and ideally almost all APIs will remain compatible where possible.

I'm not really a contributor on the Godot GDExtension team, but I just ran into this and I am responsible for some of the APIs that might be subject to change, which are breaking all of my rust plugins, so I want to flag this issue and start the discussion on how to improve this situation, be it on Godot's side or Godot-rust's side.

@Bromeon
Copy link
Member

Bromeon commented Sep 23, 2023

Thanks for bringing this up! This is a good discussion to have.

However, currently, Godot rust gdext is loading all functions exported by the Godot API upfront, including deprecated and experimental methods (which by nature are subject to change).

Experimental, absolutely. I already planned to hide them behind a feature flag 🙂
There's even a TODO in the code:

// TODO feature-gate experimental classes.
/*
if !cfg!(feature = "experimental-godot-api") && is_class_experimental(class_name) {
return true;
}
*/

But should it also apply to deprecated methods? With the exception of rare cases that are totally broken, deprecated methods are supposed to keep working -- this is the whole point of compatibility after all?

@Bromeon
Copy link
Member

Bromeon commented Sep 23, 2023

Ideally, the FFI would only load methods which are in use, and panic if the hash of the used APIs change. However, currently, Godot rust gdext is loading all functions exported by the Godot API upfront, [...]

I'm not sure exactly, but I'm just filing this because it does seem like it could be a continuing point of frustration for deploying Godot rust plugins into the long term.

Worth noting here is that you have the same problems in other bindings, but you move the problem to runtime. Sure, you won't see it if you never use these methods. But imagine this: you develop your game with Godot 4.1, test it extensively. Then you update to 4.2, still test it. But you miss one rarely used code branch which calls experimental_method_that_now_breaks(), and your game explodes in release mode at the player. godot-rust would have caught this.

What I am missing in Godot is a more reliable way to know which APIs are stable, experimental, deprecated etc. This is often "implicit" knowledge, sometimes part of XML docs, sometimes just by convention, but most importantly it's not part of extension_api.json. Do you think it would make sense to expose more such information? It would require a bit more discipline but would manage expectations better. Even if there are hard breaking changes, that's fine, but at least developers have the chance to always know about them (through automated tooling).

@lyuma
Copy link
Author

lyuma commented Sep 23, 2023

Putting experimental behind a feature flag would address the most important concern, so I'd appreciate that.

And yes you are technically correct: any deprecated method / class should remain for the rest of the major version. Removing or renaming a method is against the policy (EDIT: linking to godot's compat policy https://docs.godotengine.org/en/stable/about/release_policy.html#what-are-the-criteria-for-compatibility-across-engine-versions ), so it shouldn't happen unless there is a very very good reason. However, everything in life is a trade-off, and there could come a time where an extremely rarely used method is deemed better to be removed or altered than kept, in order to achieve a major benefit in some form to the code.

I will provide the concrete example which is leading to this discussion: godotengine/godot#80813

This wasn't technically against policy, because the godot policy hadn't considered that changing the name of an enum parameter type would break compatibility. It only talks about adding or removing parameters needing a compat api. But now there's a problem here because two identical enums in two separate classes had to be merged in order to create a common base class method.

The issue I have basically is fragility anywhere, in any api, however unlikely that particular API is to be called, for any reason, will entirely break godot rust extensions.

I've also brought this up to gdextension and I think Tokage is adding a workaround in the PR I linked.

@Bromeon
Copy link
Member

Bromeon commented Sep 23, 2023

However, everything in life is a trade-off, and there could come a time where an extremely rarely used method is deemed better to be removed or altered than kept, in order to achieve a major benefit in some form to the code.

I agree, such things happen. But I also think it's justified to notify the developer about this, by not letting their function_that_previously_worked() compile anymore, instead of having a bug at runtime.


The issue I have basically is fragility anywhere, in any api, however unlikely that particular API is to be called, for any reason, will entirely break godot rust extensions.

Likeliness of being called is not a factor to me -- actually the opposite. Methods that are rarely used have much greater damage potential if they suddenly stop working, because fewer users test them, and users that do call them might not notice for a long time.

To me, this shows even more that there is need for stricter categorization of experimental/stable/deprecated. Until recently, I wasn't even aware that something like "experimental" existed; only the pre-fetching of functions made me realize (and also highlighted some bugs like godotengine/godot#80852). Do you think it would be possible to officially mark such APIs in a way that automated tools could pick their status up (e.g. extra field in extension_api.json?)


I will provide the concrete example which is leading to this discussion: godotengine/godot#80813

This wasn't technically against policy, because the godot policy hadn't considered that changing the name of an enum parameter type would break compatibility. It only talks about adding or removing parameters needing a compat api. But now there's a problem here because two identical enums in two separate classes had to be merged in order to create a common base class method.

I agree that this is a special case that should probably not break, because ABI is unaffected.

But technically, it is against policy. Such renamings were not foreseen. I'm not saying that's how it should be, of course 🙂 There are some solutions to fix/workaround the policy, but I don't see this as a reason to not load methods at startup time. Again, this would have equally failed at runtime, so... what do we gain?

I can post some solutions I can think of in the other thread.

@Bromeon
Copy link
Member

Bromeon commented Sep 23, 2023

Lyuma mentioned that EditorHelp::get_doc_data()->class_list[p_type].is_experimental is an option in C++ to get the experimental status (the one in the XML). So maybe we can have this automated once we import Godot docs. Until then, a manually maintained list should do.

@lyuma
Copy link
Author

lyuma commented Sep 23, 2023

For the record, while you are the maintainer and this is your decision, I still disagree with the approach of depending on every single API, to avoid the tiny chance of a rare API being used in a rare case causing a runtime panic. This wastes a lot of space in the binary, and it exacerbates the maintainability challenges. It is permissible to runtime panic in rust: there's no absolute need to catch panics earlier rather than later.

Is there any way to use rust macros to only bind APIs from classes that are used? Even this would drastically reduce the scope of compat breakage, should the unthinkable ever happen. For example, there's almost no reason a godot-rust should ever need to hard-depend on something random like VisualShaderNodeCurveXYZTexture...
(Binding to every imaginable method probably also increases binary size)

Or perhaps the behavior of binding methods upfront vs late could be opt-in like with a compile time flag.

If this were done, then it would not be so necessary to depend on the experimental flag other than for individual method bindings perhaps.

Oh, another reason to consider only binding classes that are used: Some users like to compile their templates to only include engine features which are being used (such as disable 3D in a 2D game. or disabling some module like GLTF if the game does not load GLTF files at runtime.)

Anyway this is all I'll say on the matter. But just wanted to make this argument.

@lilizoey
Copy link
Member

lilizoey commented Sep 24, 2023

loading functions the first time they're used is in fact the easier way to implement caching than what gdext currently has, and was where some of the initial experiments with the caching came from.

i guess it could be possible to have a toggle, like debug builds load all functions on startup but release builds load them on demand or something, but that might also add a lot of complexity to the library.

personally i do lean more towards the late loading of functions. It seems more in line with our api design principles, in particular simplicity. i think most people would find it more understandable and intuitive if their code panics because they called a broken method, than their code panicking because a broken method exists somewhere even if they don't ever use it.

maybe it could somehow be possible to detect which functions are actually ever referenced but im very skeptical of that.

@Bromeon
Copy link
Member

Bromeon commented Sep 24, 2023

This wastes a lot of space in the binary,

Last time I counted, there were 924 (server) + 9756 (scene) + 306 (editor) methods. Assuming function pointers to be 8 bytes wide, (924 + 9756 + 306) * 8 == 10986 * 8 == 87'888 Bytes == 86 KiB. It is quite a bit, but Godot's binary itself is 1000 times the size, so I'm not sure if this truly matters in practice.


It is permissible to runtime panic in rust: there's no absolute need to catch panics earlier rather than later.

I still haven't heard a convincing argument why catching errors later is preferable. You're arguing from the assumptions that methods are not being used; I'm saying if they happen to be used, I don't want to learn about it on the player's machine.


Oh, another reason to consider only binding classes that are used: Some users like to compile their templates to only include engine features which are being used (such as disable 3D in a 2D game. or disabling some module like GLTF if the game does not load GLTF files at runtime.)

If the Godot binary itself is also compiled without certain modules, this workflow is already supported using the custom-godot feature flag. If we want to allow even more customization, I still think the user should explicitly exclude certain APIs which are then unavailable. Randomly running into runtime errors is a recipe for frustration.


Is there any way to use rust macros to only bind APIs from classes that are used? Even this would drastically reduce the scope of compat breakage, should the unthinkable ever happen. For example, there's almost no reason a godot-rust should ever need to hard-depend on something random like VisualShaderNodeCurveXYZTexture...

This would be great -- on release, going over all possible method invocations and only retaining those which are actually called. I have no idea how we could possibly implement that though.


loading functions the first time they're used is in fact the easier way to implement caching than what gdext currently has, and was where some of the initial experiments with the caching came from.

It's easier from an implementation point of view (keeping everything local to the function), but logically, it adds complexity because you have to consider threads. Pointers need to be initialized via OnceLock, which incurs extra synchronization on every single access, even though a write happens exactly once.

Also, this must be stored in a static, which needs to be stored somewhere. Memory-wise, this is worse than a global table, because it additionally needs to book-keep whether it has already been initialized or not.

The only way how you could save memory is to have a single global HashMap of function pointers by name. This would also be possible, but it means a) you now need to synchronize between accesses to different functions, and b) you need to go through a hash computation for each access -- so you might as well just request the function from Godot, which looks it up via hash from its ClassDB.

@Bromeon
Copy link
Member

Bromeon commented Sep 24, 2023

TLDR: I'm open to reconsider the choice of global function pointer tables, if we have the indication that this makes the experience more robust.

What does not convince me though is the line of arguing "godot-rust is fragile because it relies on stable Godot APIs, however we break them from time to time". That sounds like something that we should fix in Godot. (Of course this excludes experimental APIs.)

Starting with Godot 4.1, the agreement was that GDExtension compatibility is taken seriously, and we are building the Rust extension bindings based on this promise. So it would be nice if we can stick to that, and not find reasons why extension bindings are in the wrong if they rely on this contract, just because Godot doesn't uphold their side.

As mentioned, there are always exceptions, but those should be rare and deserve breaking user code early rather than late. Examples like the enum renaming are valid use cases, but the way the hashes are currently designed, this needs to be accounted for on Godot side and is not a reason why we should loosen the contract in Rust.

I hope that makes my rationale on the matter a bit more clear 😉 and it's also not some theoretical principle I'm trying uphold. I've been around for a large part of the bindings during Godot 3, and it was a nightmare how even patch releases kept breaking stuff. Breaking changes also make it very hard to build an ecosystem. If you have different tools that were built against multiple Godot versions, and you now need to manually port everything to the exact same version, that's just impossible to scale. As such I really hope we can do what's necessary on Godot side to enable such use cases 🙂

@RedMser
Copy link

RedMser commented Nov 14, 2024

(I'll continue discussion here since it is a similar train of thought and discussion. I can also open a new issue instead.)

@Bromeon I've run into the panics, not in the context of unstable and changing Godot APIs, but via an intentional workflow which is made impossible due to how eagerly functions are loaded and compared.

I'm making a simple app which does not need many Godot features, so for example I'm setting disable_3d on my release export template.
But the editor requires 3D to be enabled when compiling, meaning my editor and export template build must use different build configurations (which in turn have a different set of classes).

As such, I am unable to build my rust gdext for release, because the custom-godot binary may not point to a export template executable. When running cargo build, it simply opens a dialog saying this:

Error: Couldn't load project data at path ".". Is the .pck file missing?
  If you've renamed the executable, the associated .pck file should also be renamed to match the executable's name (without the extension).

Using a DLL targetting the editor's API in the export template shows this familiar error:

ERROR: [panic]
  Failed to load class method PhysicsDirectBodyState3D::get_total_gravity (hash 3360562783).
  Make sure gdext and Godot are compatible: https://godot-rust.github.io/book/gdext/advanced/compatibility.html
   at: godot_core::private::handle_panic_with_print (E:\tools\cargo\git\checkouts\gdext-76630c89719e160c\f40fa27\godot-core\src\private.rs:418)

What is the recommended course of action to take for this use case of disabled classes?

  1. Should Godot allow generating the extension API via export templates?
  2. Should godot-rust allow disabling certain classes (e.g. via build profile files generated from Godot)?
  3. Ignore missing classes when doing hash comparisons (probably not a good idea)?
  4. Rework the FFI system to only consider used APIs (lots of discussion about this above, I'm not deep enough to know how feasible this is).
  5. Maybe there are more options?

For now I can try to work around this by enabling 3D nodes. But other build flags like disabling advanced GUI, or different server implementations, etc. might cause similar problems for other users, if those also can't be set in editor builds.

@Bromeon
Copy link
Member

Bromeon commented Nov 14, 2024

@RedMser the lazy-function-tables feature is not an option for you?

@RedMser
Copy link

RedMser commented Nov 14, 2024

Oh, I had not heard of this feature.

Docs say "This feature is not yet thread-safe and can thus not be combined with experimental-threads" and initially, I needed experimental-threads to workaround a limitation of godot-rust. But this is no longer needed nowadays it seems, so lazy-function-tables works perfectly.

Thanks for making me aware of it! ❤️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug c: engine Godot classes (nodes, resources, ...)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants