Commit 7f397bd

Auto merge of #43919 - frewsxcv:frewsxcv-char-primitive, r=QuietMisdreavus

Minor rewrite of char primitive unicode intro.

Opened primarily to address #36998.

Despite my love for emoji, the heart example is a little confusing because both heart characters start with the same code point and there can be stark rendering differences across browsers. I also spelled out what each of the code points is in the code block, which (hopefully) sheds light on why one character is one code point while the other is two.

Very much open to suggestions and improvements. I was pretty tired when I wrote this, so I might wake up and realize that this is making things more confusing 😅
2 parents c7e3c79 + 1065ad4 commit 7f397bd

1 file changed, +18 -13 lines changed

src/libstd/primitive_docs.rs

@@ -103,26 +103,31 @@ mod prim_bool { }
 /// [`String`]: string/struct.String.html
 ///
 /// As always, remember that a human intuition for 'character' may not map to
-/// Unicode's definitions. For example, emoji symbols such as '❤️' can be more
-/// than one Unicode code point; this ❤️ in particular is two:
+/// Unicode's definitions. For example, despite looking similar, the 'é'
+/// character is one Unicode code point while 'é' is two Unicode code points:
 ///
 /// ```
-/// let s = String::from("❤️");
+/// let mut chars = "é".chars();
+/// // U+00e9: 'latin small letter e with acute'
+/// assert_eq!(Some('\u{00e9}'), chars.next());
+/// assert_eq!(None, chars.next());
 ///
-/// // we get two chars out of a single ❤️
-/// let mut iter = s.chars();
-/// assert_eq!(Some('\u{2764}'), iter.next());
-/// assert_eq!(Some('\u{fe0f}'), iter.next());
-/// assert_eq!(None, iter.next());
+/// let mut chars = "é".chars();
+/// // U+0065: 'latin small letter e'
+/// assert_eq!(Some('\u{0065}'), chars.next());
+/// // U+0301: 'combining acute accent'
+/// assert_eq!(Some('\u{0301}'), chars.next());
+/// assert_eq!(None, chars.next());
 /// ```
 ///
-/// This means it won't fit into a `char`. Trying to create a literal with
-/// `let heart = '❤️';` gives an error:
+/// This means that the contents of the first string above _will_ fit into a
+/// `char` while the contents of the second string _will not_. Trying to create
+/// a `char` literal with the contents of the second string gives an error:
 ///
 /// ```text
-/// error: character literal may only contain one codepoint: '
-/// let heart = '❤️';
-///             ^~
+/// error: character literal may only contain one codepoint: 'é'
+/// let c = 'é';
+///         ^^^^
 /// ```
 ///
 /// Another implication of the 4-byte fixed size of a `char` is that
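
The distinction the new doc text draws can also be checked outside the doc test. The following is a minimal sketch and is not part of this commit: it builds both spellings of the character from escapes, so the result does not depend on how an editor or browser normalizes the text. The helper name `code_points` is made up for illustration.

```rust
/// Hypothetical helper (not from the commit): collect the Unicode scalar
/// values that `str::chars` yields for a string.
fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}

fn main() {
    // Precomposed form: a single code point, so it fits in one `char`.
    let precomposed = "\u{00e9}";
    // Decomposed form: 'e' followed by a combining acute accent.
    let decomposed = "e\u{0301}";

    assert_eq!(code_points(precomposed), vec![0x00e9u32]);
    assert_eq!(code_points(decomposed), vec![0x0065u32, 0x0301u32]);

    // The strings usually render alike but do not compare equal.
    assert_ne!(precomposed, decomposed);

    // UTF-8 byte lengths differ as well: U+00E9 takes 2 bytes,
    // while 'e' takes 1 byte and U+0301 takes 2 bytes.
    assert_eq!(precomposed.len(), 2);
    assert_eq!(decomposed.len(), 3);

    // A `char` itself always occupies 4 bytes.
    assert_eq!(std::mem::size_of::<char>(), 4);
}
```

Writing the strings as escapes sidesteps the rendering ambiguity the commit message mentions, at the cost of being less readable than the literal characters used in the doc comment.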
