Provide alternatives to &str for other encodings and potentially malformed strings #57

Closed
shadaj opened this issue Aug 4, 2021 · 2 comments
Labels
enhancement New feature or request safety Uncertain implications on code safety

Comments

@shadaj
Member

shadaj commented Aug 4, 2021

We should support UTF-16 strings and also provide a MaybeValidStr for situations where the string may not decode properly. Right now, we just panic if the string is not valid UTF-8.

@shadaj shadaj added enhancement New feature or request safety Uncertain implications on code safety labels Aug 4, 2021
@CBenoit
Contributor

CBenoit commented Jan 25, 2022

Regarding the TODO that links to this issue:

// TODO(#57): don't just unwrap? or should we assume that the other side gives us a good value?

My take is that it's okay to assume the other side is giving us a good value and to unwrap the result of core::str::from_utf8. If the function takes a &str, we already know UTF-8 is required when generating the code for the other side (alternative types for UTF-16 and other encodings would follow the same logic).
For the same reason, I also think we could use from_utf8_unchecked, at least in release builds (keeping the checked version in debug builds could be useful when debugging backend code).
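
To make the debug/release split concrete, here is a minimal sketch. The helper name `str_from_ffi` is hypothetical and not part of the project; it only assumes that the generated glue ends up with a byte slice it needs to view as a &str.

```rust
use core::str;

/// Hypothetical helper (not an existing API in this project): view bytes
/// received from the other side of the FFI boundary as a `&str`.
///
/// # Safety
/// The caller must guarantee that `bytes` is valid UTF-8. In release builds
/// this is not verified, so invalid input is undefined behavior.
pub unsafe fn str_from_ffi(bytes: &[u8]) -> &str {
    if cfg!(debug_assertions) {
        // Debug builds: validate, and panic loudly if the other side
        // handed us something that is not UTF-8.
        str::from_utf8(bytes).expect("FFI caller passed invalid UTF-8")
    } else {
        // Release builds: trust the other side and skip validation.
        unsafe { str::from_utf8_unchecked(bytes) }
    }
}
```

The trade-off is that the function has to be `unsafe`: the safe version's panic is replaced by undefined behavior in release builds if the caller breaks the contract.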

I'm not sure about the usefulness of MaybeValidStr though. What would be the advantage over &[u8] (or other as appropriate)?

@Manishearth
Contributor

I think this is probably worth doing at a per-backend level; for example, C and C++ callers can be asked to provide valid UTF-8, while backends for JS, .NET, etc. can enforce it so that we don't have crashes. This means that at the base C layer we always use from_utf8_unchecked or from_utf8 + unwrap, but backends for safe languages perform additional checks if necessary. In many cases (e.g. JS) the string will need to be synthesized at the boundary anyway.
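
A rough sketch of how that layering could look on the Rust side. Both function names are made up for illustration, and a real export would also need a no-mangle attribute, omitted here. The first function is the trusting base layer; the second is a check that glue for a safe language could call before handing bytes over, assuming that backend does not already synthesize a valid string itself.

```rust
use core::{slice, str};

/// Hypothetical base-layer export: trusts the caller, as a C or C++
/// caller would be asked to guarantee valid UTF-8.
///
/// # Safety
/// `ptr` must point to `len` readable bytes of valid UTF-8.
pub unsafe extern "C" fn demo_count_chars(ptr: *const u8, len: usize) -> usize {
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    // No validation here: this is the "base C layer" behavior.
    let s = unsafe { str::from_utf8_unchecked(bytes) };
    s.chars().count()
}

/// Hypothetical validation entry point that a safe-language backend
/// could call first, so bad input becomes an error instead of a crash.
///
/// # Safety
/// `ptr` must point to `len` readable bytes (any content).
pub unsafe extern "C" fn demo_is_valid_utf8(ptr: *const u8, len: usize) -> bool {
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    str::from_utf8(bytes).is_ok()
}
```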

> I'm not sure about the usefulness of MaybeValidStr though. What would be the advantage over &[u8] (or other as appropriate)?

Probably a cleaner generated API, but unsure.
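
For the sake of discussion, one guess at what "cleaner" might mean. Nothing in this thread defines MaybeValidStr, so the shape below is purely an assumption: a thin wrapper over &[u8] whose methods spell out the decoding choices instead of leaving each callee to reach for from_utf8 on a raw slice.

```rust
use std::borrow::Cow;
use std::str::Utf8Error;

/// Hypothetical shape for `MaybeValidStr` (an assumption, not a design
/// taken from the project): bytes that are expected to be UTF-8 but
/// have not been validated yet.
#[derive(Clone, Copy, Debug)]
pub struct MaybeValidStr<'a>(&'a [u8]);

impl<'a> MaybeValidStr<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Self(bytes)
    }

    /// Validate and borrow as `&str`, returning an error instead of panicking.
    pub fn try_as_str(self) -> Result<&'a str, Utf8Error> {
        std::str::from_utf8(self.0)
    }

    /// Decode, replacing invalid sequences with U+FFFD.
    pub fn to_string_lossy(self) -> Cow<'a, str> {
        String::from_utf8_lossy(self.0)
    }

    /// Escape hatch back to the raw bytes.
    pub fn as_bytes(self) -> &'a [u8] {
        self.0
    }
}
```

Functionally this adds nothing over &[u8]; the only gain would be that the signature documents intent and the callee picks between a fallible and a lossy decode explicitly.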


4 participants