Add negative lookahead #2172

Merged: traviscross merged 12 commits into rust-lang:master from ehuss:negative-lookahead on Feb 18, 2026

ehuss (Contributor) commented Feb 13, 2026

This adds the `!` negative lookahead operator to the grammar to make it easier to express certain rules, and to remove some of the English-based rules.

This updates several rules to use `!`, and also fixes mistakes in several rules. See the individual commits for more details.

As part of this, it also adds the ability to specify `U+xxxx` Unicode values in character ranges, since it was needed to express some things without English rules.

rustbot added the S-waiting-on-review label (Status: The marked PR is awaiting review from a maintainer) Feb 13, 2026
ehuss force-pushed the negative-lookahead branch from 2bd22f8 to 3084dec February 15, 2026 03:44

traviscross (Contributor) left a comment

Looks good. Pushing some fixes and tweaks.

| NegativeExpression

-Unicode -> `U+` [`A`-`Z` `0`-`9`]4..4
+Unicode -> `U+` [`A`-`Z` `0`-`9`]4..6

ehuss (Contributor, Author) commented:

Now that the new range syntax is in.

Suggested change
-Unicode -> `U+` [`A`-`Z` `0`-`9`]4..6
+Unicode -> `U+` [`A`-`Z` `0`-`9`]4..=6

ehuss and others added 12 commits February 18, 2026 02:05
This adds the `!` prefix which represents negative lookahead. This was
included in the original PEG paper, though it was called "NOT", whereas
I went with a more explicit "NegativeLookahead".

This will be helpful in several productions which need to have these
kinds of exclusions.

The concept is also common in regular expression engines, which
usually spell it `(?!expr)`. The `!` syntax is also common in many other PEG libraries.

There is a small risk this could be confusing, since `!` is sometimes
used for other purposes in other contexts. For example, Prolog uses `!`
for its cut operator. I think this should be fine since it is common
with PEG.
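
As a rough illustration of the semantics described above (not the reference's own tooling), negative lookahead can be sketched in Rust as a combinator that succeeds, consuming nothing, only when its operand fails. The names `ParseResult`, `literal`, `not`, and `two_slashes_only` are hypothetical, and the `//` rule is only an example, not a production taken from the grammar.

```rust
/// Result of trying a rule at the start of `input`:
/// `Some(rest)` on success, `None` on failure.
type ParseResult<'a> = Option<&'a str>;

/// Matches the literal `lit` at the start of `input`.
fn literal<'a>(input: &'a str, lit: &str) -> ParseResult<'a> {
    input.strip_prefix(lit)
}

/// Negative lookahead (`!e`): succeeds without consuming input
/// only when `e` fails at the current position.
fn not<'a>(input: &'a str, e: impl Fn(&'a str) -> ParseResult<'a>) -> ParseResult<'a> {
    match e(input) {
        Some(_) => None,
        None => Some(input),
    }
}

/// Hypothetical example rule: `//` !`/`
/// (two slashes that are not followed by a third slash).
fn two_slashes_only(input: &str) -> ParseResult<'_> {
    let rest = literal(input, "//")?;
    not(rest, |s| literal(s, "/"))
}
```

Here `two_slashes_only("// x")` yields `Some(" x")`, while `two_slashes_only("/// x")` yields `None`, because the lookahead sees the third slash and rejects the match without consuming it.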

This adds the ability to specify Unicode code points in a character
range. This will be useful for defining some productions without using
English, and perhaps to be a little clearer.

This also extends the Unicode grammar to allow up to 6 characters for
larger code points.
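
To make the widened rule concrete, here is a minimal sketch of a recognizer for `U+` followed by 4 to 6 characters from the class written in the rule (`A`-`Z`, `0`-`9`). The function name `parse_unicode` is hypothetical, and interpreting the characters as hexadecimal is an assumption of this sketch; it is not taken from the reference's actual parser.

```rust
/// Recognizes `U+` followed by 4 to 6 characters from `A`-`Z` / `0`-`9`
/// and returns the code point value plus the remaining input.
fn parse_unicode(input: &str) -> Option<(u32, &str)> {
    let rest = input.strip_prefix("U+")?;
    // Greedily take at most 6 characters allowed by the character class.
    let len = rest
        .chars()
        .take(6)
        .take_while(|c| c.is_ascii_uppercase() || c.is_ascii_digit())
        .count();
    if len < 4 {
        return None;
    }
    // All accepted characters are ASCII, so the char count is a byte offset.
    let (digits, tail) = rest.split_at(len);
    // Assumption: the value is hexadecimal, so letters past `F` are rejected.
    let value = u32::from_str_radix(digits, 16).ok()?;
    Some((value, tail))
}
```

With the old 4-character limit, a value such as `U+1F600` could not be written in full; with 4 to 6 characters it parses as a single code point.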

This replaces some suffixes and prose with the new negative lookahead
syntax. This should all have the same meaning.

This clarifies that bare `//` is explicitly meant to be followed by
either LF or EOF. Otherwise it incorrectly matches other comment rules.

This fixes the BLOCK_COMMENT grammar so that it follows the rule that
the first alternative that matches wins. With the cut operator, the
previous grammar would fail to parse these two forms.

This fixes the doc comments so that they properly handle a carriage
return by using the cut operator. Rustc will fail parsing if a doc
comment contains a carriage return.

This requires including (LF|EOF) at the end of line so the cut operator
has something to complete the line.

This also removes the negative `/` from OUTER_LINE_DOC. This does not
work correctly with the check for CR, and is not needed because
LINE_COMMENT already matches `////`. Later I plan to include a rule for
comments that makes clear the order in which they are parsed.

A negative lookahead is necessary in OUTER_BLOCK_DOC to prevent it from
trying to parse what should be a BLOCK_COMMENT as an OUTER_BLOCK_DOC and
failing due to the cut operator.

This is intended to indicate the order in which the rules are expected
to be processed (as defined in this grammar). Of course, real parsers
can take a different approach if they produce the same results.

This is roughly similar to the order that rustc takes, though
[`block_comment`](https://github.com/rust-lang/rust/blob/d7daac06d87e1252d10eaa44960164faac46beff/compiler/rustc_lexer/src/lib.rs#L782-L817)
roughly takes the approach of combining the `/*` prefix, and then
deciding if it is an inner doc comment, outer doc comment, or else a
regular block comment.

LINE_COMMENT must be first so that it is not confused with a doc
comment.

BLOCK_COMMENT must be last so that its cut operator does not interfere
with doc comments that start with `/*`. It could be moved up higher in
the list if it had negative lookahead to disambiguate OUTER_BLOCK_DOC,
but the expression for that is more complicated than the one in
OUTER_BLOCK_DOC.

rustc actually includes the spaces for doc comments.

The cut operator after (`e`|`E`) in `FLOAT_EXPONENT` reflects rustc's
actual parsing behavior: once the lexer sees an exponent indicator, it
commits and does not backtrack. This makes the last `RESERVED_NUMBER`
alternative -- which existed to catch the empty-exponent case --
redundant, since the cut in `FLOAT_EXPONENT` now handles it directly.

Co-authored-by: Eric Huss <eric@huss.org>
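
The commit-and-do-not-backtrack behavior described above can be sketched with a three-way result that distinguishes an ordinary failure from a failure after the cut. This is a simplified, hypothetical exponent rule, (`e`|`E`) ^ (`+`|`-`)? DEC_DIGIT+, written for illustration; it is not the full `FLOAT_EXPONENT` production and not rustc's code, and the names `Outcome` and `float_exponent` are assumptions.

```rust
#[derive(Debug, PartialEq)]
enum Outcome<'a> {
    /// The rule matched; holds the remaining input.
    Matched(&'a str),
    /// The rule failed before the cut; the caller may try another alternative.
    NoMatch,
    /// The rule failed after the cut; the caller must not backtrack.
    Error,
}

/// Simplified exponent rule: (`e`|`E`) ^ (`+`|`-`)? DEC_DIGIT+
fn float_exponent(input: &str) -> Outcome<'_> {
    let rest = match input.strip_prefix(|c: char| c == 'e' || c == 'E') {
        Some(rest) => rest,
        None => return Outcome::NoMatch,
    };
    // Cut: the exponent indicator was seen, so any failure below is final.
    let rest = rest
        .strip_prefix(|c: char| c == '+' || c == '-')
        .unwrap_or(rest);
    let digits = rest.chars().take_while(|c| c.is_ascii_digit()).count();
    if digits == 0 {
        // Empty exponent: reported directly instead of falling through
        // to a later alternative.
        return Outcome::Error;
    }
    // Digits are ASCII, so the char count is a valid byte offset.
    Outcome::Matched(&rest[digits..])
}
```

In this sketch the empty-exponent input `e` produces `Outcome::Error` from the rule itself, which is the role the separate catch-all alternative used to play.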

The description says characters can be "surrounded in
backticks", but it'd be better to say "surrounded by".

The grammar now accepts 4-6 hex digits for Unicode code points (needed
for values above U+FFFF), so let's update the notation column to
reflect the variable width. Let's also capitalize "Unicode", which is
a proper noun.

These tests cover:

- Parser: negative lookahead with nonterminals, terminals, charsets,
  grouped expressions, within sequences, repetitions, and
  alternations; error case for trailing `!`; Unicode code points with
  4, 5, and 6 hex digits; charset ranges with `Character::Char`,
  `Character::Unicode`, and mixed forms; charsets combining named
  entries, terminals, and Unicode ranges.

- Markdown renderer: negative lookahead rendering with `!`, Unicode
  rendering as `U+xxxx`, charset rendering with char and Unicode
  ranges, cut and neg expression rendering, and markdown escaping.

- Railroad renderer: negative lookahead renders as a "not followed by"
  labeled box, Unicode renders as terminal, charset ranges, cut
  renders as "no backtracking" labeled box, and neg expression renders
  as "with the exception of" labeled box.

rustbot (Collaborator) commented Feb 18, 2026

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

traviscross added this pull request to the merge queue Feb 18, 2026
Merged via the queue into rust-lang:master with commit c6be577 Feb 18, 2026
6 checks passed
rustbot removed the S-waiting-on-review label (Status: The marked PR is awaiting review from a maintainer) Feb 18, 2026
JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request Feb 24, 2026
Update books

## rust-embedded/book

1 commits in fe88fbb68391a465680dd91109f0a151a1676f3e..99d0341ff4e06757490af8fceee790c4ede50bc0
2026-02-11 12:58:13 UTC to 2026-02-11 12:58:13 UTC

- Remove triagebot.toml (rust-embedded/book#405)

## rust-lang/reference

21 commits in addd0602c819b6526b9cc97653b0fadca395528c..442cbef9105662887d5eae2882ca551f3726bf28
2026-02-22 02:55:12 UTC to 2026-02-11 01:41:05 UTC

- Document importing path-segment keyword (rust-lang/reference#2136)
- avoid needless dereference (rust-lang/reference#2180)
- Use `clobber_abi`s corresponding to the called functions in `[asm.abi-clobbers.many]`'s example. (rust-lang/reference#2170)
- expr.paren.evaluation: fix and make more simple (rust-lang/reference#2158)
- const-eval.const-context.outer-generics: make more clear/obvious (rust-lang/reference#2159)
- Nightly test links: update rust branch name (use `main`) (rust-lang/reference#2185)
- tools/xtask: update rust branch name for linkcheck script (use `main`) (rust-lang/reference#2184)
- specify `if let` guards with updated scoping rules (rust-lang/reference#1957)
- Document assignment expression as coercion site (rust-lang/reference#1954)
- Remove exception WRT same-crate `non_exhaustive` reads (rust-lang/reference#2162)
- Add negative lookahead (rust-lang/reference#2172)
- Switch to new range syntax (rust-lang/reference#2173)
- add mdbook output for dev-guide to ignore file (rust-lang/reference#2178)
- Fix rule name for while syntax (rust-lang/reference#2175)
- block-expr: add new rule expr.block.result-value (rust-lang/reference#2174)
- lifetime-elision.md: add some missing periods (rust-lang/reference#2176)
- Add cut operator (`^`) to grammar (rust-lang/reference#2104)
- dev-guide stabilization.md: add missing "not" (rust-lang/reference#2167)
- Add method call and await expr for Dot in syntax index (rust-lang/reference#2163)
- Fix sort of punctuation list (rust-lang/reference#2161)
- Add a contributor guide (rust-lang/reference#2097)

## rust-lang/rust-by-example

1 commits in bac931ef1673af63fb60c3d691633034713cca20..5383db524711c0c9c43c3ca9e5e706089672ed6a
2026-02-16 12:02:33 UTC to 2026-02-16 12:02:33 UTC

- 1.2.2 Display: Fix typo in bonus instructions (before -> after) (rust-lang/rust-by-example#1998)
rust-timer added a commit to rust-lang/rust that referenced this pull request Feb 24, 2026
Rollup merge of #153023 - rustbot:docs-update, r=ehuss

Update books
