`GString`, `StringName`: add conversions from bytes and C-strings #1062

Bromeon · 2025-03-02T11:26:57Z

Allows conversion to:

GString
StringName

from:

&CStr
&[u8]

with following encodings:

ASCII
Latin-1
UTF-8

The encoding is validated and results in a newly added StringError on failure.

This PR also adds an Encoding enum used as a parameter to the conversion constructors.

GodotRust · 2025-03-02T11:37:10Z

API docs are being generated and will be shortly available at: https://godot-rust.github.io/docs/gdext/pr-1062

lilizoey · 2025-03-03T17:37:36Z

godot-core/src/builtin/string/gstring.rs

+    pub fn try_from_cstr(cstr: &std::ffi::CStr, encoding: Encoding) -> Result<Self, StringError> {
+        Self::try_from_bytes(cstr.to_bytes(), encoding)
+    }


try_from_bytes checks for null bytes, but c-strings can't contain null bytes. idk if this is super important for performance, but may at least be worth mentioning in a comment?

Good call. I added an intermediate method taking a bool: 37bb78e

lilizoey · 2025-03-03T17:40:06Z

godot-core/src/builtin/string/mod.rs

+///
+/// Used in functions such as [`GString::try_from_bytes()`][GString::try_from_bytes] to handle multiple input string encodings.
+#[non_exhaustive]
+#[derive(Copy, Clone, Eq, PartialEq, Debug)]


maybe include Default? utf8 would make sense to me as it's the default in rust elsewhere.

This would lead to code such as:

GString::try_from_bytes(bytes, Default::default())

which I find much less readable than:

GString::try_from_bytes(bytes, Encoding::Utf8)

Since UTF-8 isn't a 100% obvious default (Godot itself uses a mix of Latin-1 and UTF-32), I'd rather make this explicit.

lilizoey · 2025-03-03T17:43:08Z

godot-core/src/builtin/string/string_name.rs

+            // Note: CStr guarantees no intermediate NUL bytes, so we don't need to check for them.
+
+            let is_static = sys::conv::SYS_FALSE;
+            let s = unsafe {


missing safety comment.

Many such FFI accesses have nothing:

gdext/godot-core/src/builtin/string/gstring.rs

Lines 258 to 260 in d1a25b3

unsafe {

Self::new_with_string_uninit(|string_ptr| {

let ctor = interface_fn!(string_new_with_utf8_chars_and_len);

gdext/godot-core/src/obj/gd.rs

Line 146 in d1a25b3

unsafe { Gd::from_obj_sys(object_ptr) }

gdext/godot-core/src/builtin/callable.rs

Lines 57 to 59 in d1a25b3

unsafe {

Self::new_with_uninit(|self_ptr| {

let ctor = sys::builtin_fn!(callable_from_object_method);

gdext/godot-core/src/builtin/signal.rs

Lines 48 to 51 in d1a25b3

unsafe {

Self::new_with_uninit(|self_ptr| {

let ctor = sys::builtin_fn!(signal_from_object_signal);

let raw = object.to_ffi();

gdext/godot-core/src/builtin/variant/mod.rs

Lines 179 to 182 in d1a25b3

let result = unsafe {

Variant::new_with_var_uninit(|variant_ptr| {

interface_fn!(variant_call)(

sys::SysPtr::force_mut(self.var_sys()),

Some have a comment that's not descriptive...

gdext/godot-core/src/builtin/callable.rs

Lines 232 to 234 in d1a25b3

// SAFETY: callable_custom_create() is a valid way of creating callables.

unsafe {

Callable::new_with_uninit(|type_ptr| {

...or only mentions one particular behavior, ignoring the wider safety:

gdext/godot-core/src/builtin/string/gstring.rs

Lines 273 to 275 in d1a25b3

// SAFETY: A `char` value is by definition a valid Unicode code point.

unsafe {

Self::new_with_string_uninit(|string_ptr| {

gdext/godot-core/src/builtin/variant/mod.rs

Lines 155 to 156 in d1a25b3

// SAFETY: safe to call for non-object variants (returns 0).

let raw_id: u64 = unsafe { interface_fn!(variant_get_object_instance_id)(self.var_sys()) };

gdext/godot-core/src/classes/class_runtime.rs

Lines 55 to 56 in d1a25b3

// SAFETY: Godot looks up ID in ObjectDB and returns null if not found.

unsafe { sys::interface_fn!(object_get_instance_from_id)(instance_id.to_u64()) }

(Yes, I wrote many of those 😅 )

When calling low-level Godot functions, there's often not much that can be said beyond "passed pointer is valid", "Godot function pointer is initialized" or other repetitive points that hold for all low-level constructors.

Lints that enforce safety comments are deliberately not turned on, as this would often lead to bureaucracy and likely decreased doc quality. Being more selective gives those unsafe statements with long safety comments more attention. This goes a bit into the topic of #1046.

That's at least how I see it 🙂

i dont think i entirely agree. i do agree that just saying "godot function pointer is initialized" isn't really needed at the moment, that's something which falls under #1046 and so is already technically unsound anyway. but the other parts, mainly explaining what kind of data is expected and why it's valid to call the function with this data. that part i think is still useful.

like here and the other place in this PR, you rely on a cstr/byte-array without null always being a valid latin1 encoded string. which isn't necessarily immediately obvious.

How would you write the safety statement in this case?

honestly i'd probably just make the Note above into the safety comment, and reword it slightly. so something like
// SAFETY: CStr guarantees no intermediate NUL bytes, and so is automatically a valid latin1 encoded string.

Edit: sorry I missed your intermediate response, took me a bit longer to formulate.

like here and the other place in this PR, you rely on a cstr/byte-array without null always being a valid latin1 encoded string. which isn't necessarily immediately obvious.

I agree with focusing on non-obvious cases, but here is a good example why a safety comment can be worse than none 🙂 the fact that it's valid Latin-1 isn't relevant for safety.

It's only relevant for correctness -- and that part is explained in the RustDoc already:

Latin-1: Since every byte is a valid Latin-1 character, no validation besides the NUL byte is performed.
It is your responsibility to ensure that the input is valid Latin-1.

And "valid" here means "correctly encoded text". If you pass UTF-8, you'll get garbage, but no UB.

What makes the GDExtension call actually sound is the validity of the pointers and length. Since this is true for every pointer-accepting FFI call in existence, I don't see the value of repeating it every time.

What makes the GDExtension call actually sound is the validity of the pointers and length. Since this is true for every pointer-accepting FFI call in existence, I don't see the value of repeating it every time.

no? what godot decides to do with the data behind the pointers you pass it is important for safety. so when making an ffi call into godot you should (in the safety comment) explain why the data you are passing to godot wont cause UB when godot uses it. and in this case, that involves explaining that godot will treat this as a latin1 encoded string.

so the fact that we are passing godot a latin1 encoded string, and that godot will treat it as a latin1 encoded string, that is what makes the call safe (in addition to the pointer validity and length).

my safety comment may not express that as clearly, the previous paragraph is left kinda implied. my assumption was that it was already understood that the safety comment before an ffi-call explains what godot does with the input and why this input wont be UB in that case. a pointer being valid for reads only explains that godot will access the value, not what godot will subsequently do with the value.

no? what godot decides to do with the data behind the pointers you pass it is important for safety. so when making an ffi call into godot you should (in the safety comment) explain why the data you are passing to godot wont cause UB when godot uses it.

I don't know the implementation. I know the signature, and from that I can infer that one pointer is the to-be-constructed string instance, and that the other (ptr, len) pair points to an allocated byte range representing the characters. I need to rely on Godot doing the expected thing.

If I didn't, I'd need to study every single implementation. We don't do this for all the examples I showed above, let alone the thousands of individual class methods. We rely on the signature from Godot being a contract. I don't see why this here is different from my examples or class methods.

Latin-1 is only relevant to mention if you can cause UB when not passing in Latin-1.

lilizoey · 2025-03-03T17:43:53Z

godot-core/src/builtin/string/gstring.rs

+                    return Err(StringError::new("intermediate NUL byte in Latin-1 string"));
+                }
+
+                let s = unsafe {


missing safety comment

Bromeon added the feature label Mar 2, 2025

Bromeon added this to the 0.3 milestone Mar 2, 2025

Bromeon force-pushed the feature/strings-from-cstr branch from 2e3a120 to 7a4868e Compare March 2, 2025 12:10

Bromeon added the c: core label Mar 2, 2025

Bromeon force-pushed the feature/strings-from-cstr branch 4 times, most recently from 6e12575 to 6872653 Compare March 2, 2025 15:17

Bromeon changed the title ~~GString: add try_from_bytes(), try_from_cstr(), Encoding~~ GString, StringName: add try_from_bytes() + try_from_cstr() + Encoding Mar 2, 2025

Bromeon changed the title ~~GString, StringName: add try_from_bytes() + try_from_cstr() + Encoding~~ GString, StringName: add conversions from bytes and C-strings Mar 2, 2025

h0lybyte mentioned this pull request Mar 3, 2025

[Concept] : [Q] Crate : Godot Extension KBVE/kbve#4145

Open

lilizoey reviewed Mar 3, 2025

View reviewed changes

GString, StringName: add try_from_bytes(), try_from_cstr(), Encoding

Loading
Loading status checks…

db41515

Bromeon force-pushed the feature/strings-from-cstr branch from 37bb78e to db41515 Compare March 4, 2025 20:31

Bromeon enabled auto-merge March 4, 2025 20:31

Bromeon added this pull request to the merge queue Mar 4, 2025

Merged via the queue into master with commit da23b37 Mar 4, 2025
16 checks passed

Bromeon deleted the feature/strings-from-cstr branch March 4, 2025 20:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`GString`, `StringName`: add conversions from bytes and C-strings #1062

`GString`, `StringName`: add conversions from bytes and C-strings #1062

Bromeon commented Mar 2, 2025 •

edited

Loading

GodotRust commented Mar 2, 2025

lilizoey Mar 3, 2025

Bromeon Mar 3, 2025

lilizoey Mar 3, 2025

Bromeon Mar 3, 2025

lilizoey Mar 3, 2025

Bromeon Mar 3, 2025

lilizoey Mar 3, 2025

Bromeon Mar 3, 2025

lilizoey Mar 3, 2025 •

edited

Loading

Bromeon Mar 3, 2025 •

edited

Loading

lilizoey Mar 4, 2025

Bromeon Mar 4, 2025 •

edited

Loading

lilizoey Mar 3, 2025

	unsafe {
	Self::new_with_string_uninit(\|string_ptr\| {
	let ctor = interface_fn!(string_new_with_utf8_chars_and_len);

	unsafe {
	Self::new_with_uninit(\|self_ptr\| {
	let ctor = sys::builtin_fn!(callable_from_object_method);

	unsafe {
	Self::new_with_uninit(\|self_ptr\| {
	let ctor = sys::builtin_fn!(signal_from_object_signal);
	let raw = object.to_ffi();

	let result = unsafe {
	Variant::new_with_var_uninit(\|variant_ptr\| {
	interface_fn!(variant_call)(
	sys::SysPtr::force_mut(self.var_sys()),

	// SAFETY: callable_custom_create() is a valid way of creating callables.
	unsafe {
	Callable::new_with_uninit(\|type_ptr\| {

	// SAFETY: A `char` value is by definition a valid Unicode code point.
	unsafe {
	Self::new_with_string_uninit(\|string_ptr\| {

	// SAFETY: safe to call for non-object variants (returns 0).
	let raw_id: u64 = unsafe { interface_fn!(variant_get_object_instance_id)(self.var_sys()) };

	// SAFETY: Godot looks up ID in ObjectDB and returns null if not found.
	unsafe { sys::interface_fn!(object_get_instance_from_id)(instance_id.to_u64()) }

GString, StringName: add conversions from bytes and C-strings #1062

GString, StringName: add conversions from bytes and C-strings #1062

Conversation

Bromeon commented Mar 2, 2025 • edited Loading

GodotRust commented Mar 2, 2025

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lilizoey Mar 3, 2025 • edited Loading

Choose a reason for hiding this comment

Bromeon Mar 3, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Bromeon Mar 4, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

`GString`, `StringName`: add conversions from bytes and C-strings #1062

`GString`, `StringName`: add conversions from bytes and C-strings #1062

Bromeon commented Mar 2, 2025 •

edited

Loading

lilizoey Mar 3, 2025 •

edited

Loading

Bromeon Mar 3, 2025 •

edited

Loading

Bromeon Mar 4, 2025 •

edited

Loading