Sensibly handle invalid strings across FFI #2534

Manishearth · 2022-09-08T22:31:44Z

This implements the 1.0 plan laid out in #2520 (comment) for types that exist so far. The only thing missing is Normalizer, which doesn't have FFI yet (I'm working on that).

cc @hsivonen

robertbastian · 2022-09-09T08:20:04Z

components/calendar/src/any_calendar.rs

@@ -609,20 +609,25 @@ impl AnyCalendarKind {
    ///
    /// Returns None if the calendar is unknown
    pub fn from_bcp47_string(x: &str) -> Option<Self> {
+        Self::from_bcp47_bytes(x.as_bytes())
+    }
+    /// Construct from a BCP-47 byte string


Here and everywhere: Clarify the expected encoding, i.e. not UTF-16.

I think our plan is to expect that std::string_view is always UTF-8 (though sometimes allowed to be ill-formed).

I don't want to document the encoding that way because in JS there's no choice but to use UTF-16.
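
For illustration, the pattern in the diff above (a `&str` constructor delegating to a byte-slice constructor) might look roughly like the sketch below. `CalendarKind` and its three variants are hypothetical stand-ins rather than the real `AnyCalendarKind`, and the set of identifiers is only an assumption for the example. The point is that the byte-based entry point needs no UTF-8 validation: ill-formed input simply matches no known identifier.

```rust
/// Hypothetical stand-in for a calendar-kind enum; not the real ICU4X type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum CalendarKind {
    Gregorian,
    Buddhist,
    Japanese,
}

impl CalendarKind {
    /// Construct from a BCP-47 string by delegating to the byte-based parser.
    pub fn from_bcp47_string(x: &str) -> Option<Self> {
        Self::from_bcp47_bytes(x.as_bytes())
    }

    /// Construct from a BCP-47 byte string (potentially ill-formed UTF-8).
    /// Returns None if the identifier is unknown, which includes any
    /// input that is not well-formed.
    pub fn from_bcp47_bytes(x: &[u8]) -> Option<Self> {
        Some(match x {
            b"gregory" => CalendarKind::Gregorian,
            b"buddhist" => CalendarKind::Buddhist,
            b"japanese" => CalendarKind::Japanese,
            _ => return None,
        })
    }
}
```

Because BCP-47 identifiers are ASCII, comparing raw bytes gives the same result as comparing strings whenever the input is well-formed.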

robertbastian · 2022-09-09T08:25:19Z

components/plurals/src/operands.rs

        if input.is_empty() {
            return Err(OperandsError::Empty);
        }

-        let abs_str = input.strip_prefix('-').unwrap_or(input);
+        let abs_str = input.strip_prefix(&[b'-'; 1]).unwrap_or(input);


Nit: b"-"

I wanted to hit the fixed-size array codepath in a guaranteed way, though inlining probably means it doesn't matter.
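
As a small sketch of the two spellings discussed in this thread: both `&[b'-'; 1]` and `b"-"` have type `&[u8; 1]`, so `strip_prefix` on a byte slice should accept either and behave the same. The helper names below are hypothetical and exist only for the comparison.

```rust
/// Strip one leading ASCII minus sign, written with an array-repeat expression.
/// `abs_bytes_array` is a hypothetical helper name used for illustration.
fn abs_bytes_array(input: &[u8]) -> &[u8] {
    // `&[b'-'; 1]` is a reference to a fixed-size array, `&[u8; 1]`.
    input.strip_prefix(&[b'-'; 1]).unwrap_or(input)
}

/// The same operation written with a byte-string literal.
fn abs_bytes_literal(input: &[u8]) -> &[u8] {
    // `b"-"` is also `&[u8; 1]`, so this takes the same generic path.
    input.strip_prefix(b"-").unwrap_or(input)
}
```

Only a single leading `-` is removed in either form; input without the prefix is returned unchanged.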

hsivonen · 2022-09-09T09:17:02Z

LGTM, but note that @sffc previously preferred the terms well-formed/ill-formed over valid/invalid, and various docs and identifiers in this PR currently use the latter.

Thank you.

Manishearth · 2022-09-09T16:38:44Z

Good point, yeah. I couldn't come up with what to name things; I'll let Shane come up with a naming suggestion and uniformly apply that everywhere.

experimental/segmenter/src/indices.rs

sffc · 2022-09-09T17:03:19Z

"potentially ill-formed UTF-8" is the most precise phrase to describe what we're working with based on my understanding from discussions with @markusicu.

Manishearth · 2022-09-09T17:06:48Z

I've done some renaming and redoccing
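
To make "potentially ill-formed UTF-8" concrete, here is a minimal standard-library sketch of iterating `(byte_offset, char)` pairs over such input, substituting U+FFFD for each ill-formed subsequence. This is not the implementation used in this PR (the commits mention `Utf8CharIndices`); `lossy_char_indices` is a hypothetical name, and the one-replacement-per-reported-error policy is an assumption of the sketch.

```rust
/// Decode potentially ill-formed UTF-8 into `(byte_offset, char)` pairs,
/// emitting U+FFFD once for each ill-formed subsequence.
/// Hypothetical helper; not the API added in this PR.
fn lossy_char_indices(mut bytes: &[u8]) -> Vec<(usize, char)> {
    let mut out = Vec::new();
    let mut base = 0; // offset of `bytes` within the original input
    loop {
        match std::str::from_utf8(bytes) {
            Ok(valid) => {
                // The remainder is well-formed: emit it and finish.
                out.extend(valid.char_indices().map(|(i, c)| (base + i, c)));
                return out;
            }
            Err(e) => {
                let good = e.valid_up_to();
                // The prefix before the error is guaranteed well-formed.
                let valid = std::str::from_utf8(&bytes[..good]).unwrap();
                out.extend(valid.char_indices().map(|(i, c)| (base + i, c)));
                out.push((base + good, char::REPLACEMENT_CHARACTER));
                // `error_len()` is None when the input ends mid-sequence;
                // in that case the rest of the input is consumed.
                let skip = match e.error_len() {
                    Some(n) => n,
                    None => bytes.len() - good,
                };
                base += good + skip;
                bytes = &bytes[good + skip..];
            }
        }
    }
}
```

Offsets always refer to the original byte string, so a caller such as a segmenter could report break positions without first copying the input into a validated `String`.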

sffc

First comments; still working

components/plurals/src/lib.rs

components/plurals/src/operands.rs

experimental/segmenter/src/grapheme.rs

experimental/segmenter/src/line.rs

ffi/diplomat/src/collator.rs

ffi/diplomat/src/pluralrules.rs

ffi/diplomat/src/provider.rs

ffi/diplomat/src/timezone.rs

Manishearth added 6 commits September 8, 2022 15:27

Move identifier/basic parsing FFI APIs over to using bytes

9ea2da1

Fix collator APIs

a5a165e

Add PotentiallyInvalidUtf8Indices

3b89c3e

Refactor handle_complex_language to be shareable

94a327f

use in segmenter APIs

5072fcc

Update segmenter ffi

0edb205

Manishearth requested review from sffc, aethanyc, makotokato, zbraniecki and a team as code owners September 8, 2022 22:31

Manishearth removed request for a team, aethanyc, zbraniecki and makotokato September 8, 2022 22:33

fix

94c9d52

Manishearth mentioned this pull request Sep 8, 2022

Fix Segmenter's missing FFI #2533

Merged

robertbastian reviewed Sep 9, 2022

Manishearth commented Sep 9, 2022

experimental/segmenter/src/indices.rs Outdated

Manishearth added 3 commits September 9, 2022 09:43

Use Utf8CharIndices

a8ad31a

rename

9aa9127

ill-formed

2a4cabf

Manishearth closed this Sep 9, 2022

Manishearth reopened this Sep 9, 2022

Manishearth added 2 commits September 9, 2022 10:09

Merge remote-tracking branch 'origin/main' into string-ffi

fbf1f43

fmt

df91bd0

sffc reviewed Sep 9, 2022

components/plurals/src/lib.rs

components/plurals/src/operands.rs Outdated

experimental/segmenter/src/grapheme.rs Outdated

experimental/segmenter/src/line.rs Outdated

Manishearth added 3 commits September 9, 2022 14:56

review-cmt

08ae7ba

rename PotentiallyInvalid

ae1ed14

Undo plurals

ab8d3f3

Manishearth requested a review from sffc September 9, 2022 21:57

sffc requested changes Sep 9, 2022

ffi/diplomat/src/collator.rs Outdated

ffi/diplomat/src/pluralrules.rs Outdated

ffi/diplomat/src/provider.rs

ffi/diplomat/src/timezone.rs

fixes

dbd86e8

Manishearth requested a review from sffc September 9, 2022 23:19

sffc approved these changes Sep 9, 2022

Manishearth merged commit fe33171 into unicode-org:main Sep 9, 2022

Manishearth deleted the string-ffi branch September 9, 2022 23:21

Manishearth mentioned this pull request Sep 9, 2022

Consistently deal with string encodings over FFI #2520

Closed
