From dcf09632bc7d1ad20315ba8edebcae72b6ed6ed7 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Tue, 12 May 2020 09:22:50 +0200
Subject: [PATCH 1/5] work on char/str descriptions

---
 src/types/textual.md | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/src/types/textual.md b/src/types/textual.md
index d90c89d64..47a415865 100644
--- a/src/types/textual.md
+++ b/src/types/textual.md
@@ -2,15 +2,20 @@
 
 The types `char` and `str` hold textual data.
 
-A value of type `char` is a [Unicode scalar value] (i.e. a code point that
-is not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to
-0xD7FF or 0xE000 to 0x10FFFF range. A `[char]` is effectively a UCS-4 / UTF-32
-string.
+A value of type `char` is a [Unicode scalar value] (i.e. a code point that is
+not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to 0xD7FF
+or 0xE000 to 0x10FFFF range. It is immediate [Undefined Behavior] to create a
+`char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32
+string of length 1.
 
 A value of type `str` is a Unicode string, represented as an array of 8-bit
-unsigned bytes holding a sequence of UTF-8 code points. Since `str` is a
-[dynamically sized type], it is not a _first-class_ type, but can only be
-instantiated through a pointer type, such as `&str`.
+unsigned bytes holding a sequence of UTF-8 code points. Note that this is a
+library-level invariant: for the compiler and core language specification, `str`
+is the same as `[u8]`, but methods working on `str` may assume that the data in
+there is valid UTF-8 and may cause Undefined Behavior otherwise. Since `str` is
+a [dynamically sized type], it can only be instantiated through a pointer type,
+such as `&str`.
 
 [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value
+[Undefined Behavior]: ../behavior-considered-undefined.html
 [dynamically sized type]: ../dynamically-sized-types.md

From 3c6a8f459b351c027e300c16c86ec9c98b5ef9c1 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Tue, 12 May 2020 21:40:05 +0200
Subject: [PATCH 2/5] fix link

Co-authored-by: Eric Huss
---
 src/types/textual.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/types/textual.md b/src/types/textual.md
index 47a415865..6dcbb9d97 100644
--- a/src/types/textual.md
+++ b/src/types/textual.md
@@ -17,5 +17,5 @@ a [dynamically sized type], it can only be instantiated through a pointer type,
 such as `&str`.
 
 [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value
-[Undefined Behavior]: ../behavior-considered-undefined.html
+[Undefined Behavior]: ../behavior-considered-undefined.md
 [dynamically sized type]: ../dynamically-sized-types.md

From bd74860ea129c954b16d751d957be43c6795c6ee Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Tue, 12 May 2020 21:42:43 +0200
Subject: [PATCH 3/5] clarify representation

---
 src/types/textual.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/types/textual.md b/src/types/textual.md
index 6dcbb9d97..076feb959 100644
--- a/src/types/textual.md
+++ b/src/types/textual.md
@@ -9,12 +9,12 @@ or 0xE000 to 0x10FFFF range. It is immediate [Undefined Behavior] to create a
 string of length 1.
 
 A value of type `str` is a Unicode string, represented as an array of 8-bit
-unsigned bytes holding a sequence of UTF-8 code points. Note that this is a
-library-level invariant: for the compiler and core language specification, `str`
-is the same as `[u8]`, but methods working on `str` may assume that the data in
-there is valid UTF-8 and may cause Undefined Behavior otherwise. Since `str` is
-a [dynamically sized type], it can only be instantiated through a pointer type,
-such as `&str`.
+unsigned bytes holding a sequence of UTF-8 encoded Unicode code points. Note
+that this is a library-level invariant: for the compiler and core language
+specification, `str` is the same as `[u8]`, but methods working on `str` may
+assume that the data in there is valid UTF-8 and may cause Undefined Behavior
+otherwise. Since `str` is a [dynamically sized type], it can only be
+instantiated through a pointer type, such as `&str`.
 
 [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value
 [Undefined Behavior]: ../behavior-considered-undefined.md

From ed7ad7290605b96b3c4415cbcd2afb16fce71cb6 Mon Sep 17 00:00:00 2001
From: Ralf Jung
Date: Wed, 13 May 2020 14:10:41 +0200
Subject: [PATCH 4/5] tweak str wording

---
 src/types/textual.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/src/types/textual.md b/src/types/textual.md
index 076feb959..df817e91d 100644
--- a/src/types/textual.md
+++ b/src/types/textual.md
@@ -8,13 +8,13 @@ or 0xE000 to 0x10FFFF range. It is immediate [Undefined Behavior] to create a
 `char` that falls outside this range. A `[char]` is effectively a UCS-4 / UTF-32
 string of length 1.
 
-A value of type `str` is a Unicode string, represented as an array of 8-bit
-unsigned bytes holding a sequence of UTF-8 encoded Unicode code points. Note
-that this is a library-level invariant: for the compiler and core language
-specification, `str` is the same as `[u8]`, but methods working on `str` may
-assume that the data in there is valid UTF-8 and may cause Undefined Behavior
-otherwise. Since `str` is a [dynamically sized type], it can only be
-instantiated through a pointer type, such as `&str`.
+A value of type `str` is represented the same way as `[u8]`, it is a slice of
+8-bit unsigned bytes. However, the Rust standard library makes extra assumptions
+about `str`: methods working on `str` assume and ensure that the data in there
+is valid UTF-8. Calling a `str` method with a non-UTF-8 buffer can cause
+[Undefined Behavior] now or in the future. \
+Since `str` is a [dynamically sized type], it can only be instantiated through a
+pointer type, such as `&str`.
 
 [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value
 [Undefined Behavior]: ../behavior-considered-undefined.md

From 9af5071f876111a09ba54a86655679de83eb464c Mon Sep 17 00:00:00 2001
From: Eric Huss
Date: Thu, 14 May 2020 08:50:32 -0700
Subject: [PATCH 5/5] Split str DST into a separate paragraph.

---
 src/types/textual.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/types/textual.md b/src/types/textual.md
index df817e91d..7f3899d70 100644
--- a/src/types/textual.md
+++ b/src/types/textual.md
@@ -12,7 +12,8 @@ A value of type `str` is represented the same way as `[u8]`, it is a slice of
 8-bit unsigned bytes. However, the Rust standard library makes extra assumptions
 about `str`: methods working on `str` assume and ensure that the data in there
 is valid UTF-8. Calling a `str` method with a non-UTF-8 buffer can cause
-[Undefined Behavior] now or in the future. \
+[Undefined Behavior] now or in the future.
+
 Since `str` is a [dynamically sized type], it can only be instantiated through a
 pointer type, such as `&str`.
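
Reviewer note, not part of the patch series: the `char` range and the `str` UTF-8 invariant described in the final wording can be checked with safe standard-library calls. The following is a minimal sketch; the helper names `is_scalar_value` and `checked_str` are hypothetical and exist only for this illustration. It deliberately avoids `unsafe` constructors such as `char::from_u32_unchecked` and `std::str::from_utf8_unchecked`, since those are the paths on which a violated invariant could lead to Undefined Behavior.

```rust
/// Hypothetical helper: true if `n` is a Unicode scalar value,
/// i.e. a value that a `char` is allowed to hold.
fn is_scalar_value(n: u32) -> bool {
    // `char::from_u32` returns `None` for surrogates and for
    // values above 0x10FFFF.
    char::from_u32(n).is_some()
}

/// Hypothetical helper: view `bytes` as `&str` only if they are valid UTF-8.
fn checked_str(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

fn main() {
    // Boundaries of the two valid ranges: 0x0000..=0xD7FF and 0xE000..=0x10FFFF.
    assert!(is_scalar_value(0x0000));
    assert!(is_scalar_value(0xD7FF));
    assert!(!is_scalar_value(0xD800)); // first surrogate
    assert!(!is_scalar_value(0xDFFF)); // last surrogate
    assert!(is_scalar_value(0xE000));
    assert!(is_scalar_value(0x10FFFF));
    assert!(!is_scalar_value(0x110000)); // beyond the Unicode code space

    // A `&str` exposes its contents as a byte slice of the same length.
    let s: &str = "héllo";
    let bytes: &[u8] = s.as_bytes();
    assert_eq!(bytes.len(), 6); // 'é' takes two bytes in UTF-8

    // The checked conversion accepts valid UTF-8 and rejects anything else.
    assert_eq!(checked_str(bytes), Some("héllo"));
    assert_eq!(checked_str(&[0xFF, 0xFE]), None);
}
```

Because `str` is a dynamically sized type, the sketch only ever handles it behind a reference (`&str`), which matches the last paragraph of the patched text.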