Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sensibly handle invalid strings across FFI #2534

Merged
merged 16 commits into from
Sep 9, 2022

Conversation

Manishearth
Copy link
Member

This implements the 1.0 plan laid out in #2520 (comment) for types that exist so far. The only thing missing is Normalizer, which doesn't have FFI yet (I'm working on that).

cc @hsivonen

@@ -609,20 +609,25 @@ impl AnyCalendarKind {
///
/// Returns None if the calendar is unknown
pub fn from_bcp47_string(x: &str) -> Option<Self> {
Self::from_bcp47_bytes(x.as_bytes())
}
/// Construct from a BCP-47 byte string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and everywhere: Clarify the expected encoding, i.e. not UTF-16.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think our plan is to expect std::string_view is always UTF8 (though sometimes allowed to be ill-formed).

I don't want to doc the encoding that way because in JS there's no choice but using UTF16.

if input.is_empty() {
return Err(OperandsError::Empty);
}

let abs_str = input.strip_prefix('-').unwrap_or(input);
let abs_str = input.strip_prefix(&[b'-'; 1]).unwrap_or(input);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: b"-"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to hit the fixed-size array codepath in a guaranteed way, though inlining probably means it doesn't matter

@hsivonen
Copy link
Member

hsivonen commented Sep 9, 2022

LGTM, but noting @sffc's previous terminology preference of well-formed/ill-formed over valid/invalid and various docs and identifiers in this PR using the latter at present.

Thank you.

@Manishearth
Copy link
Member Author

Good point, yeah. I couldn't come up with what to name things, I'll let shane come up with a naming suggestion and uniformly appyl that everywhere.

@sffc
Copy link
Member

sffc commented Sep 9, 2022

"potentially ill-formed UTF-8" is the most precise phrase to describe what we're working with based on my understanding from discussions with @markusicu.

@Manishearth
Copy link
Member Author

I've done some renaming and redoccing

@Manishearth Manishearth closed this Sep 9, 2022
@Manishearth Manishearth reopened this Sep 9, 2022
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First comments; still working

components/plurals/src/lib.rs Show resolved Hide resolved
components/plurals/src/operands.rs Outdated Show resolved Hide resolved
experimental/segmenter/src/grapheme.rs Outdated Show resolved Hide resolved
experimental/segmenter/src/line.rs Outdated Show resolved Hide resolved
@Manishearth Manishearth requested a review from sffc September 9, 2022 21:57
ffi/diplomat/src/collator.rs Outdated Show resolved Hide resolved
ffi/diplomat/src/pluralrules.rs Outdated Show resolved Hide resolved
ffi/diplomat/src/provider.rs Show resolved Hide resolved
ffi/diplomat/src/timezone.rs Show resolved Hide resolved
@Manishearth Manishearth requested a review from sffc September 9, 2022 23:19
@Manishearth Manishearth merged commit fe33171 into unicode-org:main Sep 9, 2022
@Manishearth Manishearth deleted the string-ffi branch September 9, 2022 23:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants