-
-
Notifications
You must be signed in to change notification settings - Fork 221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GString
, StringName
: add conversions from bytes and C-strings
#1062
Conversation
API docs are being generated and will be shortly available at: https://godot-rust.github.io/docs/gdext/pr-1062 |
2e3a120
to
7a4868e
Compare
6e12575
to
6872653
Compare
GString
: add try_from_bytes()
, try_from_cstr()
, Encoding
GString
, StringName
: add try_from_bytes()
+ try_from_cstr()
+ Encoding
GString
, StringName
: add try_from_bytes()
+ try_from_cstr()
+ Encoding
GString
, StringName
: add conversions from bytes and C-strings
pub fn try_from_cstr(cstr: &std::ffi::CStr, encoding: Encoding) -> Result<Self, StringError> { | ||
Self::try_from_bytes(cstr.to_bytes(), encoding) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
try_from_bytes
checks for null bytes, but c-strings can't contain null bytes. idk if this is super important for performance, but may at least be worth mentioning in a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call. I added an intermediate method taking a bool: 37bb78e
/// | ||
/// Used in functions such as [`GString::try_from_bytes()`][GString::try_from_bytes] to handle multiple input string encodings. | ||
#[non_exhaustive] | ||
#[derive(Copy, Clone, Eq, PartialEq, Debug)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe include Default
? utf8
would make sense to me as it's the default in rust elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would lead to code such as:
GString::try_from_bytes(bytes, Default::default())
which I find much less readable than:
GString::try_from_bytes(bytes, Encoding::Utf8)
Since UTF-8 isn't a 100% obvious default (Godot itself uses a mix of Latin-1 and UTF-32), I'd rather make this explicit.
// Note: CStr guarantees no intermediate NUL bytes, so we don't need to check for them. | ||
|
||
let is_static = sys::conv::SYS_FALSE; | ||
let s = unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing safety comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many such FFI accesses have nothing:
gdext/godot-core/src/builtin/string/gstring.rs
Lines 258 to 260 in d1a25b3
unsafe { | |
Self::new_with_string_uninit(|string_ptr| { | |
let ctor = interface_fn!(string_new_with_utf8_chars_and_len); |
gdext/godot-core/src/obj/gd.rs
Line 146 in d1a25b3
unsafe { Gd::from_obj_sys(object_ptr) } |
gdext/godot-core/src/builtin/callable.rs
Lines 57 to 59 in d1a25b3
unsafe { | |
Self::new_with_uninit(|self_ptr| { | |
let ctor = sys::builtin_fn!(callable_from_object_method); |
gdext/godot-core/src/builtin/signal.rs
Lines 48 to 51 in d1a25b3
unsafe { | |
Self::new_with_uninit(|self_ptr| { | |
let ctor = sys::builtin_fn!(signal_from_object_signal); | |
let raw = object.to_ffi(); |
gdext/godot-core/src/builtin/variant/mod.rs
Lines 179 to 182 in d1a25b3
let result = unsafe { | |
Variant::new_with_var_uninit(|variant_ptr| { | |
interface_fn!(variant_call)( | |
sys::SysPtr::force_mut(self.var_sys()), |
Some have a comment that's not descriptive...
gdext/godot-core/src/builtin/callable.rs
Lines 232 to 234 in d1a25b3
// SAFETY: callable_custom_create() is a valid way of creating callables. | |
unsafe { | |
Callable::new_with_uninit(|type_ptr| { |
...or only mentions one particular behavior, ignoring the wider safety:
gdext/godot-core/src/builtin/string/gstring.rs
Lines 273 to 275 in d1a25b3
// SAFETY: A `char` value is by definition a valid Unicode code point. | |
unsafe { | |
Self::new_with_string_uninit(|string_ptr| { |
gdext/godot-core/src/builtin/variant/mod.rs
Lines 155 to 156 in d1a25b3
// SAFETY: safe to call for non-object variants (returns 0). | |
let raw_id: u64 = unsafe { interface_fn!(variant_get_object_instance_id)(self.var_sys()) }; |
gdext/godot-core/src/classes/class_runtime.rs
Lines 55 to 56 in d1a25b3
// SAFETY: Godot looks up ID in ObjectDB and returns null if not found. | |
unsafe { sys::interface_fn!(object_get_instance_from_id)(instance_id.to_u64()) } |
(Yes, I wrote many of those 😅 )
When calling low-level Godot functions, there's often not much that can be said beyond "passed pointer is valid", "Godot function pointer is initialized" or other repetitive points that hold for all low-level constructors.
Lints that enforce safety comments are deliberately not turned on, as this would often lead to bureaucracy and likely decreased doc quality. Being more selective gives those unsafe
statements with long safety comments more attention. This goes a bit into the topic of #1046.
That's at least how I see it 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont think i entirely agree. i do agree that just saying "godot function pointer is initialized" isn't really needed at the moment, that's something which falls under #1046 and so is already technically unsound anyway. but the other parts, mainly explaining what kind of data is expected and why it's valid to call the function with this data. that part i think is still useful.
like here and the other place in this PR, you rely on a cstr
/byte-array without null always being a valid latin1
encoded string. which isn't necessarily immediately obvious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you write the safety statement in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
honestly i'd probably just make the Note
above into the safety comment, and reword it slightly. so something like
// SAFETY: CStr guarantees no intermediate NUL bytes, and so is automatically a valid latin1 encoded string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: sorry I missed your intermediate response, took me a bit longer to formulate.
like here and the other place in this PR, you rely on a
cstr
/byte-array without null always being a validlatin1
encoded string. which isn't necessarily immediately obvious.
I agree with focusing on non-obvious cases, but here is a good example why a safety comment can be worse than none 🙂 the fact that it's valid Latin-1 isn't relevant for safety.
It's only relevant for correctness -- and that part is explained in the RustDoc already:
- Latin-1: Since every byte is a valid Latin-1 character, no validation besides the
NUL
byte is performed.
It is your responsibility to ensure that the input is valid Latin-1.
And "valid" here means "correctly encoded text". If you pass UTF-8, you'll get garbage, but no UB.
What makes the GDExtension call actually sound is the validity of the pointers and length. Since this is true for every pointer-accepting FFI call in existence, I don't see the value of repeating it every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What makes the GDExtension call actually sound is the validity of the pointers and length. Since this is true for every pointer-accepting FFI call in existence, I don't see the value of repeating it every time.
no? what godot decides to do with the data behind the pointers you pass it is important for safety. so when making an ffi call into godot you should (in the safety comment) explain why the data you are passing to godot wont cause UB when godot uses it. and in this case, that involves explaining that godot will treat this as a latin1 encoded string.
so the fact that we are passing godot a latin1 encoded string, and that godot will treat it as a latin1 encoded string, that is what makes the call safe (in addition to the pointer validity and length).
my safety comment may not express that as clearly, the previous paragraph is left kinda implied. my assumption was that it was already understood that the safety comment before an ffi-call explains what godot does with the input and why this input wont be UB in that case. a pointer being valid for reads only explains that godot will access the value, not what godot will subsequently do with the value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no? what godot decides to do with the data behind the pointers you pass it is important for safety. so when making an ffi call into godot you should (in the safety comment) explain why the data you are passing to godot wont cause UB when godot uses it.
I don't know the implementation. I know the signature, and from that I can infer that one pointer is the to-be-constructed string instance, and that the other (ptr, len) pair points to an allocated byte range representing the characters. I need to rely on Godot doing the expected thing.
If I didn't, I'd need to study every single implementation. We don't do this for all the examples I showed above, let alone the thousands of individual class methods. We rely on the signature from Godot being a contract. I don't see why this here is different from my examples or class methods.
Latin-1 is only relevant to mention if you can cause UB when not passing in Latin-1.
return Err(StringError::new("intermediate NUL byte in Latin-1 string")); | ||
} | ||
|
||
let s = unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing safety comment
37bb78e
to
db41515
Compare
Allows conversion to:
GString
StringName
from:
&CStr
&[u8]
with following encodings:
The encoding is validated and results in a newly added
StringError
on failure.This PR also adds an
Encoding
enum used as a parameter to the conversion constructors.