Document C string literal tokens. #1423

Merged: 1 commit, Dec 2, 2023

jmillikin
Contributor

No description provided.

@jmillikin
Contributor Author

Note: this feature is being stabilized in rust-lang/rust#117472. CI will fail until it is run with a rustc that includes that PR.

I ran `mdbook test && mdbook build` locally to verify that the tests pass when run with a rustdoc containing the stabilization PR.

@ehuss added the S-waiting-on-stabilization label (Waiting for a stabilization PR to be merged in the main Rust repository) on Nov 1, 2023
Contributor

@ehuss ehuss left a comment

Thanks!

Can you also include a section that indicates that C-string literals are only available in Edition 2021 or newer? Edition differences are specified in blockquotes (search for "Edition Differences" for the format).

I believe the Reserved prefixes section needs to be updated with `c` and `cr` being excluded.

I believe Literal patterns will need to be updated since C-strings are accepted there syntactically. (They can't really be used since `CStr` doesn't implement `Eq`/`PartialEq`, though.)

@jmillikin
Contributor Author

Thank you for the quick review! I've made the recommended changes, fixed the ASCII vs Unicode misunderstanding, and tried to clarify the wording around NUL escapes.

diff: https://github.com/rust-lang/reference/compare/5d1950799abab6174d8a797952889851dbe2774b..2481014bf2a856425e2c0d9971b8f77becb6eebc

Comment on lines +346 to +351
A _C string literal_ is a sequence of Unicode characters and _escapes_,
preceded by the characters `U+0063` (`c`) and `U+0022` (double-quote), and
followed by the character `U+0022`. If the character `U+0022` is present within
the literal, it must be _escaped_ by a preceding `U+005C` (`\`) character.
Alternatively, a C string literal can be a _raw C string literal_, defined
below. The type of a C string literal is [`&core::ffi::CStr`][CStr].

Whilst below it is mentioned that code point escapes are encoded as UTF-8, nowhere is it stated how the Unicode characters contained within the C string literal are encoded in the ensuing CStr: I presume also UTF-8? Perhaps this should be stated explicitly for the avoidance of any doubt.
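
For illustration, here is a minimal sketch of the encoding being asked about. It assumes a toolchain where C string literals are stable and a crate on edition 2021 or later; the helper name `payload` is hypothetical and only there to keep the comparison readable. Under those assumptions, a character written directly in the literal, the same character as a `\u{...}` escape, and its UTF-8 bytes written as `\x` escapes should all produce the same `CStr` contents.

```rust
use core::ffi::CStr;

/// Hypothetical helper: the bytes of a C string without the trailing NUL.
fn payload(s: &CStr) -> &[u8] {
    s.to_bytes()
}

fn main() {
    // A non-ASCII character written directly in the literal.
    let direct = payload(c"æ");
    // The same character written as a Unicode escape.
    let escaped = payload(c"\u{00E6}");
    // The UTF-8 encoding of that character, spelled out as byte escapes.
    let bytes = payload(c"\xC3\xA6");

    // U+00E6 encodes to the two bytes 0xC3 0xA6 in UTF-8.
    assert_eq!(direct, &[0xC3u8, 0xA6][..]);
    assert_eq!(direct, escaped);
    assert_eq!(direct, bytes);
}
```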

Contributor Author

@jmillikin jmillikin Nov 3, 2023

bors added a commit to rust-lang-ci/rust that referenced this pull request Dec 1, 2023
…ilstrieb

Stabilize C string literals

RFC: https://rust-lang.github.io/rfcs/3348-c-str-literal.html

Tracking issue: rust-lang#105723

Documentation PR (reference manual): rust-lang/reference#1423

# Stabilization report

Stabilizes C string and raw C string literals (`c"..."` and `cr#"..."#`), which are expressions of type [`&CStr`](https://doc.rust-lang.org/stable/core/ffi/struct.CStr.html). Both new literals require Rust edition 2021 or later.

```rust
const HELLO: &core::ffi::CStr = c"Hello, world!";
```

C strings may contain any byte other than `NUL` (`b'\x00'`), and their in-memory representation is guaranteed to end with `NUL`.
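
A small sketch of the two properties in the sentence above, assuming edition 2021 or later and a compiler with this feature stabilized. The helper `ends_with_nul` is a hypothetical name introduced for the example, not part of `CStr`.

```rust
use core::ffi::CStr;

/// Hypothetical helper: checks that the in-memory form ends with a NUL byte.
fn ends_with_nul(s: &CStr) -> bool {
    s.to_bytes_with_nul().last() == Some(&0)
}

fn main() {
    let hello = c"Hello, world!";
    // The terminator is only visible through `to_bytes_with_nul`.
    assert!(ends_with_nul(hello));
    assert_eq!(hello.to_bytes().len() + 1, hello.to_bytes_with_nul().len());

    // Any non-NUL byte is allowed, including ones that are not valid UTF-8.
    let non_utf8 = c"\xFF";
    assert_eq!(non_utf8.to_bytes(), &[0xFFu8][..]);
    assert!(non_utf8.to_str().is_err());

    // A literal containing an interior NUL, such as a `\0` escape,
    // is expected to be rejected at compile time, so it is not shown here.
}
```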

## Implementation

Originally implemented by PR rust-lang#108801, which was reverted due to unintentional changes to lexer behavior in Rust editions < 2021.

The current implementation landed in PR rust-lang#113476, which restricts C string literals to Rust edition >= 2021.

## Resolutions to open questions from the RFC

* Adding C character literals (`c'.'`) of type `c_char` is not part of this feature.
  * Support for `c"..."` literals does not prevent `c'.'` literals from being added in the future.
* C string literals should not be blocked on making `&CStr` a thin pointer.
  * It's possible to declare constant expressions of type `&'static CStr` in stable Rust (as of v1.59), so C string literals are not adding additional coupling on the internal representation of `CStr`.
* The unstable `concat_bytes!` macro should not accept `c"..."` literals.
  * C strings have two equally valid `&[u8]` representations (with or without terminal `NUL`), so allowing them to be used in `concat_bytes!` would be ambiguous.
* Adding a type to represent C strings containing valid UTF-8 is not part of this feature.
  * Support for a hypothetical `&Utf8CStr` may be explored in the future, should such a type be added to Rust.
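
Two of the points above can be sketched without using the new literals at all. The example below is an illustration under assumptions, not the approach the report has in mind: it builds a `&'static CStr` constant with `CStr::from_bytes_with_nul_unchecked` (one way to write such a constant in stable Rust) and then shows the two `&[u8]` views that make `concat_bytes!` ambiguous. The constant name `GREETING` is hypothetical.

```rust
use core::ffi::CStr;

// Hypothetical constant built without a C string literal.
// SAFETY: the byte string ends with exactly one NUL and has no interior NUL.
const GREETING: &'static CStr =
    unsafe { CStr::from_bytes_with_nul_unchecked(b"hi\0") };

fn main() {
    // View 1: the contents without the terminator.
    assert_eq!(GREETING.to_bytes(), b"hi");
    // View 2: the contents including the terminator.
    assert_eq!(GREETING.to_bytes_with_nul(), b"hi\0");
}
```

Both views are reasonable answers to "what bytes does this C string contain", which is the ambiguity the `concat_bytes!` bullet refers to.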
@jmillikin
Contributor Author

The stabilization PR has merged and this PR's CI build is now green.

bors added a commit to rust-lang/miri that referenced this pull request Dec 2, 2023
Stabilize C string literals

Contributor

@ehuss ehuss left a comment

Thanks!

@ehuss ehuss added this pull request to the merge queue Dec 2, 2023
Merged via the queue into rust-lang:master with commit 21a27e1 Dec 2, 2023
1 check passed
@jmillikin jmillikin deleted the c-str-literals branch December 2, 2023 22:55
fee1-dead added a commit to fee1-dead-contrib/rust that referenced this pull request Dec 5, 2023
Update books

## rust-lang/nomicon

1 commits in 1842257814919fa62e81bdecd5e8f95be2839dbb..83d015105e6d490fc30d6c95da1e56152a50e228
2023-11-22 15:35:31 UTC to 2023-11-22 15:35:31 UTC

- Reword the section on general race conditions (rust-lang/nomicon#431)

## rust-lang/reference

5 commits in cd8193e972f61b92117095fc73b67af767b4d6bc..692d216f5a1151e8852ddb308ba64040e634c876
2023-12-04 09:45:06 UTC to 2023-11-21 17:57:18 UTC

- Fix note on `self` coercion (rust-lang/reference#1431)
- Document C string literal tokens. (rust-lang/reference#1423)
- type-layout.md: Warn about repr(align)/repr(packed) and field order (rust-lang/reference#1430)
- Lone `self` in a method body resolves to the self parameter (rust-lang/reference#1427)
- Reference wildcard patterns from underscore expr (rust-lang/reference#1428)

## rust-lang/rust-by-example

4 commits in a6581246f96837113968c02187db24f742af3908..da0a06aada31a324ae84a9eaee344f6a944b9683
2023-11-27 12:50:49 UTC to 2023-11-21 11:58:19 UTC

- fix tiny typo in string conversion docs (rust-lang/rust-by-example#1776)
- fix(arg): Remove reference to Rust Cookbook in arg parsing (rust-lang/rust-by-example#1775)
- fix:typo error (rust-lang/rust-by-example#1774)
- Remove space between `&` and `self` (rust-lang/rust-by-example#1772)

## rust-lang/rustc-dev-guide

5 commits in ddb8b1309f9e905804cea1e248a4572fed6b464b..904bb5aa7b21adad58ffae610e2830c7b0f813b0
2023-11-28 13:13:36 UTC to 2023-11-22 06:13:00 UTC

- Update how-to-build-and-run.md (rust-lang/rustc-dev-guide#1828)
- notification groups: add information about how to ping them (rust-lang/rustc-dev-guide#1818)
- Add explanations on how to run rustc_codegen_gcc tests (rust-lang/rustc-dev-guide#1821)
- Add back the `canonicalization` chapter. (rust-lang/rustc-dev-guide#1532)
- Emphasize that the experts map is not up to date (rust-lang/rustc-dev-guide#1826)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Dec 5, 2023
Update books

flip1995 pushed a commit to flip1995/rust-clippy that referenced this pull request Dec 5, 2023
Stabilize C string literals

rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Dec 5, 2023
Rollup merge of rust-lang#118614 - rustbot:docs-update, r=ehuss

Update books

ehuss added a commit that referenced this pull request Jan 9, 2024
This reverts commit 21a27e1, reversing
changes made to 01a12f2.

This is being reverted in rust-lang/rust#119528
lnicola pushed a commit to lnicola/rust-analyzer that referenced this pull request Apr 7, 2024
Stabilize C string literals

RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this pull request Apr 27, 2024
Stabilize C string literals
