We should have APIs that accept potentially-invalid UTF8 #135

Manishearth · 2024-04-30T15:51:31Z

Our UTF16 API is able to handle invalid UTF16 by pretending unpaired surrogates are U+FFFD REPLACEMENT CHARACTERs.
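For reference, the same replacement behaviour can be sketched with the standard library alone. This is not the crate's internal code, and `decode_utf16_lossy` is a hypothetical helper name used only for illustration:

```rust
/// Decode possibly-invalid UTF-16, mapping each unpaired surrogate to
/// U+FFFD REPLACEMENT CHARACTER. Hypothetical helper, std only.
fn decode_utf16_lossy(units: &[u16]) -> String {
    char::decode_utf16(units.iter().copied())
        // Each item is Ok(char) for valid input or Err(..) for an unpaired surrogate.
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}
```

A well-formed surrogate pair still decodes to a single `char`; only a lone surrogate is replaced.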

It would be nice to be able to do the same for the UTF8 API.

We could implement TextSource for [u8] and have a whole other set of BidiInfo copies for unvalidated UTF8.
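As a rough idea of what a `[u8]`-backed source would need to produce, here is a sketch of a lossy `char_indices` over raw bytes that yields U+FFFD for each invalid sequence. It uses only the standard library; `lossy_char_indices` is a hypothetical name, and nothing here is tied to the actual `TextSource` trait signature:

```rust
/// Yield (byte offset, char) pairs for possibly-invalid UTF-8, emitting
/// U+FFFD at the offset of each invalid sequence. Hypothetical helper.
fn lossy_char_indices(bytes: &[u8]) -> Vec<(usize, char)> {
    let mut out = Vec::new();
    let mut offset = 0;
    while offset < bytes.len() {
        match std::str::from_utf8(&bytes[offset..]) {
            Ok(s) => {
                // The rest of the input is valid; emit it and stop.
                out.extend(s.char_indices().map(|(i, c)| (offset + i, c)));
                break;
            }
            Err(e) => {
                // Emit the valid run that precedes the error.
                let end = offset + e.valid_up_to();
                let valid = std::str::from_utf8(&bytes[offset..end])
                    .expect("bytes before valid_up_to() are valid UTF-8");
                out.extend(valid.char_indices().map(|(i, c)| (offset + i, c)));
                offset = end;
                out.push((offset, char::REPLACEMENT_CHARACTER));
                // error_len() is None when the input ends mid-sequence;
                // in that case skip everything that remains.
                offset += e.error_len().unwrap_or(bytes.len() - offset);
            }
        }
    }
    out
}
```

The sketch re-validates the tail after every error, so it is only meant to show the intended output, not an efficient implementation.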

We could also have .char_indices() bail on the first non-UTF8, and then use the same BidiInfo with a separate constructor that is documented to accept invalid UTF8 and truncate the returned levels based on that.
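The truncating option amounts to taking the longest valid UTF-8 prefix of the input. A minimal sketch with the standard library, assuming a hypothetical helper named `valid_utf8_prefix` (the real constructor's name and signature are not decided here):

```rust
/// Return the longest prefix of `bytes` that is valid UTF-8.
/// Hypothetical helper; a constructor accepting invalid UTF-8 could
/// run the existing &str code path on this prefix.
fn valid_utf8_prefix(bytes: &[u8]) -> &str {
    match std::str::from_utf8(bytes) {
        Ok(s) => s,
        Err(e) => {
            // valid_up_to() is the byte index where the first invalid
            // sequence starts, so everything before it is valid.
            std::str::from_utf8(&bytes[..e.valid_up_to()])
                .expect("bytes before valid_up_to() are valid UTF-8")
        }
    }
}
```

Levels computed from this prefix would cover only `prefix.len()` bytes, which is the truncation the documentation would need to call out.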

Manishearth · 2024-04-30T16:46:25Z

cc @robertbastian

I'm inclined to do the "bail on first non-UTF8" thing for now since it's a smaller change.

If we ever do a 2.0, we should make this code generic over encodings.
