Implement std::convert traits for char #35755

SimonSapin · 2016-08-17T17:13:57Z

This is motivated by avoiding the `as` operator, which sometimes silently truncates, in favor of conversions that are explicitly lossless and infallible.

I’m less certain that `From<u8> for char` should be implemented: while it matches an existing behavior of `as`, it’s not necessarily the right thing to use for non-ASCII bytes. It effectively decodes bytes as ISO/IEC 8859-1 (since Unicode designed its first 256 code points to be compatible with that encoding), but that is not apparent in the API name.
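For illustration only (this is not taken from the PR diff), here is a minimal sketch of the difference between `as` and the `From` conversions discussed above. The helper names `truncating_byte`, `code_point`, and `latin1_char` are made up for this example.

```rust
/// Lossy: `as` keeps only the low 8 bits of the code point,
/// with no warning when information is discarded.
fn truncating_byte(c: char) -> u8 {
    c as u8
}

/// Lossless: every `char` fits in a `u32`, so `From` is infallible.
fn code_point(c: char) -> u32 {
    u32::from(c)
}

/// Maps a byte to the code point with the same number.
/// For bytes above 0x7F this amounts to ISO/IEC 8859-1 decoding,
/// which is the concern raised about the API name.
fn latin1_char(b: u8) -> char {
    char::from(b)
}
```

For example, `truncating_byte('€')` silently drops the high bits of U+20AC, while `latin1_char(0xE9)` yields U+00E9 even though the lone byte 0xE9 is not valid UTF-8.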

rust-highfive · 2016-08-17T17:14:05Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @aturon (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

sfackler · 2016-08-17T19:40:31Z

src/libcore/char.rs

+    ///
+    /// Surrogates are used in the UTF-16 encoding, and therefore are not characters.
+    SurrogateCodePoint,
+}


We're never going to need to add any extra cases to this enum, right? Should we stick a __ForExtensibility variant just in case?

sfackler · 2016-08-17T19:42:12Z

I personally feel okay about the u8 -> char impl as it seems like what everyone would expect to happen, but am interested to see what other people think.

cc @rust-lang/libs

alexcrichton · 2016-08-18T02:38:28Z

Seems reasonable to me, but I'd prefer to use an opaque struct with optional method accessors rather than an enum for the error type in TryFrom.

SimonSapin · 2016-08-18T06:06:05Z

@sfackler Re extensibility, I don’t expect this to ever be needed. The range of Unicode Scalar Values changed exactly once in the history of Unicode. At first it was “16 bits ought to be enough for everybody”: 0x0000...0xFFFF. When that turned out not to be enough and a lot of systems were already using u16 as a code unit, UTF-16 was introduced. The current range, 0x0000...0xD7FF | 0xE000...0x10FFFF, is exactly what can be encoded by UTF-16. (UTF-8’s original design can support up to 0x7FFF_FFFF with 6-byte sequences. It is now artificially restricted to match.)
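As a rough sketch of that range (written against current stable Rust, where `TryFrom` is available; it was still unstable when this PR was opened), the check can be spelled out by hand and compared with the standard conversion. The function names `in_scalar_value_range` and `checked_char` are hypothetical.

```rust
use std::convert::TryFrom;

/// Hand-written version of the range quoted above:
/// everything up to 0x10FFFF except the surrogate block 0xD800..=0xDFFF.
fn in_scalar_value_range(i: u32) -> bool {
    matches!(i, 0x0000..=0xD7FF | 0xE000..=0x10FFFF)
}

/// The fallible conversion, flattened to an `Option` for easy comparison.
fn checked_char(i: u32) -> Option<char> {
    char::try_from(i).ok()
}
```

Under these assumptions, `checked_char(i).is_some()` should agree with `in_scalar_value_range(i)` for every `u32`, including the boundary values 0xD7FF, 0xD800, 0xDFFF, 0xE000, and 0x10FFFF.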

In Unicode 9.0, 76% of the million and some code points are unassigned, so they’re not expected to run out. And breaking compatibility with UTF-16 is such a breaking change that I imagine it’s not even considered.

And this concern disappears with…

@alexcrichton Yeah, I also considered an opaque struct, even without an accessor method, since I can’t think of a use case for one. (Code like a WTF-8 implementation that wants to deal with surrogate code points will likely do its own code point arithmetic anyway.) And a method can always be added later. I’ve updated the PR.
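To make the design choice concrete, here is a minimal sketch of the opaque-struct pattern, not the code from this PR. The type name `NotAScalarValue` and the function `char_from_u32` are hypothetical stand-ins; the private unit field is what keeps callers from constructing or matching on the error, so accessors or internal detail can be added later without a breaking change.

```rust
use std::fmt;

/// Hypothetical opaque error type. The private `()` field means code
/// outside this module cannot build it or destructure it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct NotAScalarValue(());

impl fmt::Display for NotAScalarValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("value is not a Unicode scalar value")
    }
}

impl std::error::Error for NotAScalarValue {}

/// Hypothetical fallible conversion returning the opaque error.
pub fn char_from_u32(i: u32) -> Result<char, NotAScalarValue> {
    char::from_u32(i).ok_or(NotAScalarValue(()))
}
```

With an enum, adding a variant later would break exhaustive matches in downstream code; with this shape, the only public surface is the `Display` and `Error` implementations.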

alexcrichton · 2016-08-23T05:48:10Z

Discussed during @rust-lang/libs triage today, conclusion was to merge. Thanks for the update @SimonSapin!

@bors: r+

bors · 2016-08-23T05:48:11Z

📌 Commit 82678c5 has been approved by alexcrichton

bors · 2016-08-23T18:18:42Z

☔ The latest upstream changes (presumably #35656) made this pull request unmergeable. Please resolve the merge conflicts.

ollie27 · 2016-08-27T15:40:26Z

src/libcore/char.rs

@@ -176,6 +172,41 @@ pub unsafe fn from_u32_unchecked(i: u32) -> char {
    transmute(i)
 }

+#[stable(feature = "char_convert", since = "1.12.0")]


Should these not be "1.13.0"?

SimonSapin · 2016-08-29

Not at the time I first opened this PR, but now yes. Fixed.

bors · 2016-08-29T08:56:44Z

🔒 Merge conflict

alexcrichton · 2016-08-29T17:18:35Z

@bors: r+ f040208

bors · 2016-09-01T09:53:29Z

⌛ Testing commit f040208 with merge b2799a5...

bors · 2016-09-01T13:05:02Z

bluss · 2016-09-08T13:13:31Z

Mini-reminder: let's tag more user-visible stuff with relnotes

rust-highfive assigned aturon Aug 17, 2016

sfackler added the T-libs-api label Aug 17, 2016

sfackler reviewed Aug 17, 2016

SimonSapin force-pushed the char_convert branch from 249b789 to 82678c5 on August 18, 2016 06:05

eddyb mentioned this pull request Aug 23, 2016

Rollup of 15 pull requests #35944

Closed

ollie27 reviewed Aug 27, 2016

SimonSapin added 2 commits August 29, 2016 17:34

Implement From<char> for u32, and From<u8> for char

41d0a89

These fit with other From implementations between integer types. This helps the coding style of avoiding the 'as' operator that sometimes silently truncates, and signals that these specific conversions are lossless and infallible.

Implement TryFrom<u32> for char

f040208

For symmetry with From<char> for u32.

SimonSapin force-pushed the char_convert branch from 82678c5 to f040208 on August 29, 2016 15:34

bors merged commit f040208 into rust-lang:master Sep 1, 2016

bluss added the relnotes label Sep 8, 2016

SimonSapin deleted the char_convert branch September 15, 2016 07:06
