std: Stabilize more of the `char` module #23126

alexcrichton · 2015-03-06T19:07:40Z

This commit performs another pass over the `std::char` module for stabilization.
Some minor cleanup is performed, such as migrating documentation from libcore to
libunicode (where the `std`-facing trait resides), as well as a slight
reorganization in libunicode itself. Otherwise, the stability modifications made
are:

* `char::from_digit` is now stable
* `CharExt::is_digit` is now stable
* `CharExt::to_digit` is now stable
* `CharExt::to_{lower,upper}case` are now stable after being modified to return
  an iterator over characters. While the implementation today has not changed,
  this should allow us to implement the full set of Unicode case conversions,
  where some characters can map to multiple characters when doing an upper or
  lower case mapping.
* `StrExt::to_{lower,upper}case` was added as unstable, as a convenience so that
  you don't have to worry about characters expanding to more characters when you
  just want the whole string converted to upper or lower case.
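
For reference, a minimal sketch of the three digit functions listed above. It is written against a current Rust toolchain, where `is_digit` and `to_digit` are inherent methods on `char` rather than `CharExt` trait methods; `is_hex`, `parse_hex`, and `to_hex` are hypothetical helper names used only for illustration.

```rust
/// Returns true if `s` is non-empty and consists only of base-16 digits.
/// (`is_hex` is a hypothetical helper, not part of this PR.)
fn is_hex(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_digit(16))
}

/// Parses a string of hexadecimal digits, returning `None` on the first
/// character that is not a base-16 digit or on overflow.
fn parse_hex(s: &str) -> Option<u32> {
    let mut value: u32 = 0;
    for c in s.chars() {
        // `to_digit` yields `Some(n)` when `c` is a digit in the given radix.
        let digit = c.to_digit(16)?;
        value = value.checked_mul(16)?.checked_add(digit)?;
    }
    Some(value)
}

/// Formats a number as lowercase hexadecimal using `std::char::from_digit`.
fn to_hex(mut n: u32) -> String {
    if n == 0 {
        return "0".to_string();
    }
    let mut digits = Vec::new();
    while n > 0 {
        // `n % 16` is always below the radix, so `from_digit` returns `Some`.
        digits.push(std::char::from_digit(n % 16, 16).unwrap());
        n /= 16;
    }
    // Digits were produced least-significant first, so reverse them.
    digits.into_iter().rev().collect()
}
```

For example, `parse_hex("ff")` gives `Some(255)` and `to_hex(255)` gives `"ff"`.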

This is a breaking change due to the change in the signatures of the
`CharExt::to_{upper,lower}case` methods. Code can be updated to use functions
like `flat_map` or `collect` to handle the difference.
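
A rough sketch of what that update can look like, assuming a current toolchain where these are inherent methods on `char`. The function names (`first_upper`, `upper_all`, `upper_char`) are hypothetical and only illustrate the three usual migration patterns.

```rust
/// Keeps only the first character of the mapping, which roughly matches
/// the old `char`-returning signature. Any extra characters are dropped.
fn first_upper(c: char) -> char {
    // Fall back to `c` itself in case the iterator is ever empty.
    c.to_uppercase().next().unwrap_or(c)
}

/// Upper-cases every character of a string by flattening the
/// per-character iterators with `flat_map`.
fn upper_all(s: &str) -> String {
    s.chars().flat_map(|c| c.to_uppercase()).collect()
}

/// Upper-cases one character into an owned `String` with `collect`.
fn upper_char(c: char) -> String {
    c.to_uppercase().collect()
}
```

The `first_upper` pattern is the smallest change for existing callers, but it silently discards any additional characters, so `flat_map` or `collect` is probably the safer choice where the result ends up in a string anyway.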

[breaking-change]

Closes #20333

rust-highfive · 2015-03-06T19:07:44Z

r? @aturon

(rust_highfive has picked a reviewer for you, use r? to override)

alexcrichton · 2015-03-06T19:07:53Z

r? @aturon
r? @SimonSapin

SimonSapin · 2015-03-06T20:06:42Z

> CharExt::to_{lower,upper}case are now stable after being modified to return an iterator over characters. While the implementation today has not changed this should allow us to implement the full set of case conversions in unicode where some characters can map to multiple when doing an upper or lower case mapping.

Does "stable" mean we can still change their behavior after 1.0?

alexcrichton · 2015-03-06T21:25:01Z

> Does "stable" mean we can still change their behavior after 1.0?

I believe that we reserve the right to update the Unicode standard we're using, which can add new case mappings for existing characters. Along those lines, I see actually parsing the extra data and returning many characters as a similar enhancement.

All in all, yes: I believe this is a change we can make after 1.0.
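
To illustrate the kind of mapping being discussed: on a current Rust toolchain, where the full case mappings are implemented, a single character can expand to several. At the time of this PR the iterator still yielded exactly one character, so the sketch below shows the later behavior, not what this commit does. `upper_expansion` is a hypothetical helper name.

```rust
/// Collects the uppercase mapping of `c` into a vector so the number of
/// resulting characters is easy to inspect.
fn upper_expansion(c: char) -> Vec<char> {
    c.to_uppercase().collect()
}
```

With this, `upper_expansion('a')` is `['A']`, while `upper_expansion('ß')` is `['S', 'S']`.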

tbu- · 2015-03-07T00:10:50Z

src/libunicode/char.rs

    ///
-    /// If the buffer is not large enough, nothing will be written into it
-    /// and a `None` will be returned.
+    /// In both of these examples, 'ß' takes one byte to encode.
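
The excerpt does not show which encoding method this doc comment belongs to, so as a point of reference here is a small sketch of how 'ß' (U+00DF) encodes in both forms. It uses the current `encode_utf8` / `encode_utf16` signatures, which return the written slice and differ from the `Option`-returning API described in the removed lines; `utf8_bytes` and `utf16_units` are hypothetical helper names.

```rust
/// Returns the UTF-8 bytes of `c`. Four bytes is the maximum any `char` needs.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// Returns the UTF-16 code units of `c`. Two units is the maximum any `char` needs.
fn utf16_units(c: char) -> Vec<u16> {
    let mut buf = [0u16; 2];
    c.encode_utf16(&mut buf).to_vec()
}
```

For 'ß' this gives two bytes in UTF-8 (`0xC3 0x9F`) and one code unit in UTF-16 (`0x00DF`).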


alexcrichton · 2015-03-09T16:45:19Z

Pushed some updates, thanks for the comments @tbu- and @SimonSapin!

SimonSapin · 2015-03-09T17:17:02Z

LGTM.

aturon · 2015-03-10T17:28:49Z

@bors: r+ d33b308

aturon · 2015-03-10T17:32:34Z

Note: this closes #20333

SimonSapin · 2015-03-10T17:50:29Z

> Note: this closes #20333

Returning `Iterator<Item=char>` is a better (more idiomatic) solution than the `Option<&'static str>` that I suggested in that bug. `String::to_{upper,lower}case` might gain some performance by not re-encoding to UTF-8, though that remains to be proven, but it can still do so internally without affecting the API of the char methods.
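
A small sketch of the two layers mentioned here, assuming a current toolchain where `str::to_uppercase` is stable: the string-level method and the char-level iterator can be used interchangeably for upper-casing, whatever the string method does internally. The function names are hypothetical.

```rust
/// Upper-cases a string through the char-level iterator API.
fn upper_via_chars(s: &str) -> String {
    s.chars().flat_map(|c| c.to_uppercase()).collect()
}

/// Upper-cases a string through the string-level convenience method.
fn upper_via_str(s: &str) -> String {
    s.to_uppercase()
}
```

Both return `"STRASSE"` for `"straße"`, so callers who only need the whole string converted can stay on the string method and ignore per-character expansion.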

aturon · 2015-03-10T17:53:16Z

@SimonSapin Oh, I agree, but I still take this to be addressing the underlying issue of that bug.

SimonSapin · 2015-03-10T17:54:46Z

My message was a long way of saying “+1” :)

bors · 2015-03-10T19:48:19Z

⌛ Testing commit d33b308 with merge e840e74...

bors · 2015-03-10T19:51:23Z

💔 Test failed - auto-linux-32-opt

alexcrichton · 2015-03-10T20:10:12Z

@bors: r+

bors · 2015-03-10T20:10:13Z

~~@bors r=alexcrichton e74a79a~~

alexcrichton · 2015-03-10T20:10:17Z

@bors: r=aturon

bors · 2015-03-10T20:10:18Z

~~@bors r=aturon e74a79a~~

bors · 2015-03-10T20:14:58Z

⌛ Testing commit e74a79a with merge fde1502...

bors · 2015-03-10T20:20:14Z

💔 Test failed - auto-win-32-opt

alexcrichton · 2015-03-10T20:29:34Z

@bors: r=aturon 3bcd209

alexcrichton · 2015-03-10T22:09:39Z

@bors: r=aturon 0f6a0b5

bors · 2015-03-10T22:45:11Z

⌛ Testing commit 0f6a0b5 with merge cfea8ec...

bors · 2015-03-11T01:12:56Z

☀️ Test successful - auto-linux-32-nopt-t, auto-linux-32-opt, auto-linux-64-nopt-t, auto-linux-64-opt, auto-linux-64-x-android-t, auto-mac-32-opt, auto-mac-64-nopt-t, auto-mac-64-opt, auto-win-32-nopt-t, auto-win-32-opt, auto-win-64-nopt-t, auto-win-64-opt

rust-highfive assigned aturon Mar 6, 2015

aturon mentioned this pull request Mar 6, 2015

Stabilization for 1.0-beta #22500

Closed

91 tasks

alexcrichton force-pushed the char-third-pass branch 2 times, most recently from b0862e5 to 6d6cbbd Compare March 6, 2015 19:46

tbu- reviewed Mar 7, 2015

alexcrichton force-pushed the char-third-pass branch from 6d6cbbd to d33b308 Compare March 9, 2015 16:45

alexcrichton force-pushed the char-third-pass branch from d33b308 to e74a79a Compare March 10, 2015 20:10

alexcrichton force-pushed the char-third-pass branch from e74a79a to 3bcd209 Compare March 10, 2015 20:28

alexcrichton force-pushed the char-third-pass branch from 3bcd209 to 0f6a0b5 Compare March 10, 2015 22:08

bors merged commit 0f6a0b5 into rust-lang:master Mar 11, 2015

kevinmehall mentioned this pull request Mar 13, 2015

Fix build on latest Rust kevinmehall/rust-peg#63

Closed

alexcrichton deleted the char-third-pass branch March 27, 2015 20:35

This was referenced May 4, 2015

Uppercase strings rather than chars for small-caps #25106

Closed

Uppercase strings rather than chars for small-caps servo/servo#5938

Open
