Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes a technicality regarding the size of C's char type #89114

Merged
merged 1 commit into from
Sep 22, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Fix a technicality regarding the size of C's char type
Specifically, ISO/IEC 9899:2018 — better known as "C18" — (and at least
C11, C99 and C89) do not specify the size of `byte` in bits.
Section 3.6 defines "byte" as "addressable unit of data storage" while
section 6.2.5 ("Types") only defines "char" as "large enough to store
any member of the basic execution set" giving it a lower bound of 7 bit
(since there are 96 characters in the basic execution set).
With section 6.5.3.4 paragraph 4 "When sizeof is applied to an operant
that has type char […] the result is 1" you could read this as the size
of `char` in bits being defined as exactly the same as the number of
bits in a byte but it's also valid to read that as an exception.

In general implementations take `char` as the smallest unit of
addressable memory, which for modern byte-addressed architectures is
overwhelmingly 8 bits to the point of this convention being completely
cemented into just about all of our software.

So is any of this actually relevant at all? I hope not. I sincerely hope
that this never, ever comes up.
But if for some reason a poor rustacean is having to interface with C
code running on a Cray X1 that in 2003 is still doing word-addressed
memory with 64-bit words and they trust the docs here blindly it will
blow up in her face. And I'll be truly sorry for her to have to deal
with … all of that.
dequbed committed Sep 20, 2021
commit 23c608f3a1b0cf09344f0e356610e59d8ee581ac
2 changes: 1 addition & 1 deletion library/std/src/os/raw/char.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
Equivalent to C's `char` type.

[C's `char` type] is completely unlike [Rust's `char` type]; while Rust's type represents a unicode scalar value, C's `char` type is just an ordinary integer. This type will always be either [`i8`] or [`u8`], as the type is defined as being one byte long.
[C's `char` type] is completely unlike [Rust's `char` type]; while Rust's type represents a unicode scalar value, C's `char` type is just an ordinary integer. On modern architectures this type will always be either [`i8`] or [`u8`], as they use byte-addresses memory with 8-bit bytes.

C chars are most commonly used to make C strings. Unlike Rust, where the length of a string is included alongside the string, C strings mark the end of a string with the character `'\0'`. See [`CStr`] for more information.