Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GString, StringName: add conversions from bytes and C-strings #1062

Merged
merged 1 commit into from
Mar 4, 2025

Conversation

Bromeon
Copy link
Member

@Bromeon Bromeon commented Mar 2, 2025

Allows conversion to:

  • GString
  • StringName

from:

  • &CStr
  • &[u8]

with following encodings:

  • ASCII
  • Latin-1
  • UTF-8

The encoding is validated and results in a newly added StringError on failure.

This PR also adds an Encoding enum used as a parameter to the conversion constructors.

@Bromeon Bromeon added the feature Adds functionality to the library label Mar 2, 2025
@Bromeon Bromeon added this to the 0.3 milestone Mar 2, 2025
@GodotRust
Copy link

API docs are being generated and will be shortly available at: https://godot-rust.github.io/docs/gdext/pr-1062

@Bromeon Bromeon force-pushed the feature/strings-from-cstr branch from 2e3a120 to 7a4868e Compare March 2, 2025 12:10
@Bromeon Bromeon added the c: core Core components label Mar 2, 2025
@Bromeon Bromeon force-pushed the feature/strings-from-cstr branch 4 times, most recently from 6e12575 to 6872653 Compare March 2, 2025 15:17
@Bromeon Bromeon changed the title GString: add try_from_bytes(), try_from_cstr(), Encoding GString, StringName: add try_from_bytes() + try_from_cstr() + Encoding Mar 2, 2025
@Bromeon Bromeon changed the title GString, StringName: add try_from_bytes() + try_from_cstr() + Encoding GString, StringName: add conversions from bytes and C-strings Mar 2, 2025
Comment on lines 136 to 138
pub fn try_from_cstr(cstr: &std::ffi::CStr, encoding: Encoding) -> Result<Self, StringError> {
Self::try_from_bytes(cstr.to_bytes(), encoding)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try_from_bytes checks for null bytes, but c-strings can't contain null bytes. idk if this is super important for performance, but may at least be worth mentioning in a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call. I added an intermediate method taking a bool: 37bb78e

///
/// Used in functions such as [`GString::try_from_bytes()`][GString::try_from_bytes] to handle multiple input string encodings.
#[non_exhaustive]
#[derive(Copy, Clone, Eq, PartialEq, Debug)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe include Default? utf8 would make sense to me as it's the default in rust elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would lead to code such as:

GString::try_from_bytes(bytes, Default::default())

which I find much less readable than:

GString::try_from_bytes(bytes, Encoding::Utf8)

Since UTF-8 isn't a 100% obvious default (Godot itself uses a mix of Latin-1 and UTF-32), I'd rather make this explicit.

// Note: CStr guarantees no intermediate NUL bytes, so we don't need to check for them.

let is_static = sys::conv::SYS_FALSE;
let s = unsafe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing safety comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many such FFI accesses have nothing:

unsafe {
Self::new_with_string_uninit(|string_ptr| {
let ctor = interface_fn!(string_new_with_utf8_chars_and_len);

unsafe { Gd::from_obj_sys(object_ptr) }

unsafe {
Self::new_with_uninit(|self_ptr| {
let ctor = sys::builtin_fn!(callable_from_object_method);

unsafe {
Self::new_with_uninit(|self_ptr| {
let ctor = sys::builtin_fn!(signal_from_object_signal);
let raw = object.to_ffi();

let result = unsafe {
Variant::new_with_var_uninit(|variant_ptr| {
interface_fn!(variant_call)(
sys::SysPtr::force_mut(self.var_sys()),


Some have a comment that's not descriptive...

// SAFETY: callable_custom_create() is a valid way of creating callables.
unsafe {
Callable::new_with_uninit(|type_ptr| {

...or only mentions one particular behavior, ignoring the wider safety:

// SAFETY: A `char` value is by definition a valid Unicode code point.
unsafe {
Self::new_with_string_uninit(|string_ptr| {

// SAFETY: safe to call for non-object variants (returns 0).
let raw_id: u64 = unsafe { interface_fn!(variant_get_object_instance_id)(self.var_sys()) };

// SAFETY: Godot looks up ID in ObjectDB and returns null if not found.
unsafe { sys::interface_fn!(object_get_instance_from_id)(instance_id.to_u64()) }

(Yes, I wrote many of those 😅 )


When calling low-level Godot functions, there's often not much that can be said beyond "passed pointer is valid", "Godot function pointer is initialized" or other repetitive points that hold for all low-level constructors.

Lints that enforce safety comments are deliberately not turned on, as this would often lead to bureaucracy and likely decreased doc quality. Being more selective gives those unsafe statements with long safety comments more attention. This goes a bit into the topic of #1046.

That's at least how I see it 🙂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think i entirely agree. i do agree that just saying "godot function pointer is initialized" isn't really needed at the moment, that's something which falls under #1046 and so is already technically unsound anyway. but the other parts, mainly explaining what kind of data is expected and why it's valid to call the function with this data. that part i think is still useful.

like here and the other place in this PR, you rely on a cstr/byte-array without null always being a valid latin1 encoded string. which isn't necessarily immediately obvious.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you write the safety statement in this case?

Copy link
Member

@lilizoey lilizoey Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

honestly i'd probably just make the Note above into the safety comment, and reword it slightly. so something like
// SAFETY: CStr guarantees no intermediate NUL bytes, and so is automatically a valid latin1 encoded string.

Copy link
Member Author

@Bromeon Bromeon Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: sorry I missed your intermediate response, took me a bit longer to formulate.

like here and the other place in this PR, you rely on a cstr/byte-array without null always being a valid latin1 encoded string. which isn't necessarily immediately obvious.

I agree with focusing on non-obvious cases, but here is a good example why a safety comment can be worse than none 🙂 the fact that it's valid Latin-1 isn't relevant for safety.

It's only relevant for correctness -- and that part is explained in the RustDoc already:

  • Latin-1: Since every byte is a valid Latin-1 character, no validation besides the NUL byte is performed.
    It is your responsibility to ensure that the input is valid Latin-1.

And "valid" here means "correctly encoded text". If you pass UTF-8, you'll get garbage, but no UB.

What makes the GDExtension call actually sound is the validity of the pointers and length. Since this is true for every pointer-accepting FFI call in existence, I don't see the value of repeating it every time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes the GDExtension call actually sound is the validity of the pointers and length. Since this is true for every pointer-accepting FFI call in existence, I don't see the value of repeating it every time.

no? what godot decides to do with the data behind the pointers you pass it is important for safety. so when making an ffi call into godot you should (in the safety comment) explain why the data you are passing to godot wont cause UB when godot uses it. and in this case, that involves explaining that godot will treat this as a latin1 encoded string.

so the fact that we are passing godot a latin1 encoded string, and that godot will treat it as a latin1 encoded string, that is what makes the call safe (in addition to the pointer validity and length).

my safety comment may not express that as clearly, the previous paragraph is left kinda implied. my assumption was that it was already understood that the safety comment before an ffi-call explains what godot does with the input and why this input wont be UB in that case. a pointer being valid for reads only explains that godot will access the value, not what godot will subsequently do with the value.

Copy link
Member Author

@Bromeon Bromeon Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no? what godot decides to do with the data behind the pointers you pass it is important for safety. so when making an ffi call into godot you should (in the safety comment) explain why the data you are passing to godot wont cause UB when godot uses it.

I don't know the implementation. I know the signature, and from that I can infer that one pointer is the to-be-constructed string instance, and that the other (ptr, len) pair points to an allocated byte range representing the characters. I need to rely on Godot doing the expected thing.

If I didn't, I'd need to study every single implementation. We don't do this for all the examples I showed above, let alone the thousands of individual class methods. We rely on the signature from Godot being a contract. I don't see why this here is different from my examples or class methods.

Latin-1 is only relevant to mention if you can cause UB when not passing in Latin-1.

return Err(StringError::new("intermediate NUL byte in Latin-1 string"));
}

let s = unsafe {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing safety comment

@Bromeon Bromeon force-pushed the feature/strings-from-cstr branch from 37bb78e to db41515 Compare March 4, 2025 20:31
@Bromeon Bromeon enabled auto-merge March 4, 2025 20:31
@Bromeon Bromeon added this pull request to the merge queue Mar 4, 2025
Merged via the queue into master with commit da23b37 Mar 4, 2025
16 checks passed
@Bromeon Bromeon deleted the feature/strings-from-cstr branch March 4, 2025 20:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: core Core components feature Adds functionality to the library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants