Following up on #19194 and discussion with @aturon, I took a look at how things in the std::ascii module are used in the Rust repository and in Servo.
The std::ascii::Ascii type is a newtype of u8 that enforces (unless unsafe code is used) that the value is in the ASCII range, similar to char with u32 and the range of Unicode scalar values. [Ascii] is naturally a string of bytes entirely in the ASCII range.
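To make the newtype idea concrete, here is a minimal sketch in current stable Rust. `AsciiByte` and `to_ascii_bytes` are hypothetical names chosen for illustration; this is not the actual definition of `std::ascii::Ascii`, only the same pattern of a `u8` wrapper whose constructor rejects bytes outside the ASCII range.

```rust
/// Hypothetical stand-in for an ASCII-only byte newtype.
/// This is NOT the real `std::ascii::Ascii`, just the same pattern.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AsciiByte(u8);

impl AsciiByte {
    /// Returns `None` for any byte outside 0x00..=0x7F.
    pub fn new(byte: u8) -> Option<AsciiByte> {
        if byte <= 0x7F {
            Some(AsciiByte(byte))
        } else {
            None
        }
    }

    /// Gives back the wrapped byte.
    pub fn as_byte(self) -> u8 {
        self.0
    }
}

/// Converts a whole byte slice, failing if any single byte is non-ASCII.
pub fn to_ascii_bytes(bytes: &[u8]) -> Option<Vec<AsciiByte>> {
    bytes.iter().map(|&b| AsciiByte::new(b)).collect()
}
```

With this sketch, `to_ascii_bytes(b"hello")` succeeds, while the UTF-8 bytes of "héllo" are rejected as a whole, which is the all-or-nothing behavior discussed next.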
Using the type system like this to enforce data invariants is interesting, but in practice [Ascii] is not that useful. Data (such as from the network) is rarely guaranteed to be ASCII-only, and it is usually not desirable to remove or replace non-ASCII bytes, even when only ASCII-range operations are applied. (E.g. “ASCII case-insensitivity” is common in HTML and CSS.)
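As an illustration of ASCII-range-only operations on data that may contain non-ASCII bytes, the sketch below uses the inherent slice methods `eq_ignore_ascii_case` and `to_ascii_lowercase` from current stable Rust. These postdate this issue, so treat them as a modern equivalent rather than the API available at the time. The function names `matches_keyword` and `ascii_lowercased` are hypothetical.

```rust
/// Compares two byte strings, ignoring case for ASCII letters only.
/// Non-ASCII bytes must match exactly.
pub fn matches_keyword(input: &[u8], keyword: &[u8]) -> bool {
    input.eq_ignore_ascii_case(keyword)
}

/// Returns a copy with ASCII letters lowercased.
/// All other bytes, including non-ASCII ones, pass through unchanged.
pub fn ascii_lowercased(input: &[u8]) -> Vec<u8> {
    input.to_ascii_lowercase()
}
```

No bytes are removed or replaced here: the UTF-8 bytes of "CAFÉ" lowercase to those of "cafÉ", and they do not compare equal to "café" because "É" and "é" fall outside the ASCII range.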
Every single use of the Ascii type that I’ve found existed only to call the to_lowercase or to_uppercase method and then immediately convert back to u8 or char.
Therefore, I suggest:
- Moving the Ascii type as well as the AsciiCast, OwnedAsciiCast, AsciiStr, and IntoBytes traits into a new ascii Cargo package on crates.io
- Marking them as deprecated in std::ascii, and removing them at some point before 1.0
- Reworking the rest of the module to provide the functionality on u8, char, [u8] and str. Specifically:
  - Keep the AsciiExt and OwnedAsciiExt traits. (Maybe rename them?)
  - Implement AsciiExt on char and u8 (in addition to the existing impls for str and [u8])
  - Add is_ascii() -> bool. Maybe on AsciiExt? It’s mostly used on u8 and char, but it also makes sense on str and [u8].
  - Maybe is_ascii_lowercase, is_ascii_uppercase, is_ascii_alphabetic, or is_ascii_alphanumeric could be useful, but I’d be fine with dropping them and reconsidering if someone asks for them. The same result can be achieved with .is_ascii() && and the corresponding UnicodeChar method, which in most cases has an ASCII fast path.
  - I don’t think the remaining Ascii methods are valuable. is_digit and is_hex are identical to Char::is_digit(10) and Char::is_digit(16). is_blank, is_control, is_graph, is_print, and is_punctuation are never used.
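A rough sketch of what the proposed surface looks like in use, written against the inherent methods that current stable Rust provides on u8, char, [u8] and str. These are a later equivalent and not the AsciiExt trait as it existed when this was written. The helper function names are hypothetical.

```rust
/// `is_ascii_lowercase` expressed as `.is_ascii() &&` plus the Unicode check.
pub fn is_ascii_lowercase_via_unicode(c: char) -> bool {
    c.is_ascii() && c.is_lowercase()
}

/// Decimal digit check via the radix-based `char::is_digit`.
pub fn is_decimal_digit(c: char) -> bool {
    c.is_digit(10)
}

/// Hexadecimal digit check via the radix-based `char::is_digit`.
pub fn is_hex_digit(c: char) -> bool {
    c.is_digit(16)
}

/// Applies the same ASCII uppercasing to `u8`, `char`, `[u8]` and `str`.
pub fn uppercase_everywhere() -> (u8, char, Vec<u8>, String) {
    (
        b'a'.to_ascii_uppercase(),
        'a'.to_ascii_uppercase(),
        b"abc"[..].to_ascii_uppercase(),
        "abc".to_ascii_uppercase(),
    )
}
```

The point of the sketch is that no conversion to and from a separate Ascii type is needed: the operations sit directly on the primitive types, and the combined `.is_ascii() &&` check rejects a non-ASCII lowercase letter such as 'é'.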
How does this sound? I can help with the implementation work. Should this go through the RFC process?