Commit b3e75e8

[overview.md] add lexer updates, parser updates
includes feedback from matklad (lexer) and centril (parser)
1 parent 79c19ab commit b3e75e8

1 file changed

+15
-17
lines changed

src/overview.md

+15-17
@@ -30,13 +30,16 @@ we'll talk about that later.
  to the rest of the compilation process as a [`rustc_interface::Config`].
  - The raw Rust source text is analyzed by a low-level lexer located in
  [`librustc_lexer`]. At this stage, the source text is turned into a stream of
- atomic source code units known as _tokens_. The lexer supports the Unicode
- character encoding.
+ atomic source code units known as _tokens_. The lexer supports the
+ Unicode character encoding.
  - The token stream passes through a higher-level lexer located in
  [`librustc_parse`] to prepare for the next stage of the compile process. The
  [`StringReader`] struct is used at this stage to perform a set of validations
  and turn strings into interned symbols.
- - **TODO**: Add lexer error information / error handling documentation
+ - The lexer has a small interface and doesn't depend directly on the
+ diagnostic infrastructure in `rustc`. Instead it provides diagnostics as plain
+ data which are emitted in `librustc_parse::lexer::mod` as real diagnostics.
+ - The lexer preserves full fidelity information for both IDEs and proc macros.
  - The parser [translates the token stream from the lexer into an Abstract Syntax
  Tree (AST)][parser]. It uses a recursive descent (top-down) approach to syntax
  analysis. The crate entry points for the parser are the `Parser.parse_crate_mod()` and
@@ -46,26 +49,21 @@ we'll talk about that later.
  - Parsing is performed with a set of `Parser` utility methods including `fn bump`,
  `fn check`, `fn eat`, `fn expect`, `fn look_ahead`.
  - Parsing is organized by the semantic construct that is being parsed. Separate
- `parse_*` methods can be found in `librustc_parse` `parser` directory. File
- naming follows the construct name. For example, the following files are found
+ `parse_*` methods can be found in `librustc_parse` `parser` directory. The source
+ file name follows the construct name. For example, the following files are found
  in the parser:
  - `expr.rs`
  - `pat.rs`
  - `ty.rs`
  - `stmt.rs`
- - This naming scheme is used across the parser, lowering, type checking,
- HAIR lowering, & MIR building stages of the compile process and you will
- find either a file or directory with the same name for most of these constructs
- at each of these stages of compilation.
- - For error handling, the parser uses the standard `DiagnosticBuilder` API, but we
+ - This naming scheme is used across many compiler stages. You will find
+ either a file or directory with the same name across the parsing, lowering,
+ type checking, HAIR lowering, and MIR building sources.
+ - Macro expansion, AST validation, name resolution, and early linting take place
+ during this stage of the compile process.
+ - The parser uses the standard `DiagnosticBuilder` API for error handling, but we
  try to recover, parsing a superset of Rust's grammar, while also emitting an error.
- - The `rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}` AST node returned from the parser.
-
- - macro expansion (**TODO** chrissimpkins)
- - ast validation (**TODO** chrissimpkins)
- - nameres (**TODO** chrissimpkins)
- - early linting (**TODO** chrissimpkins)
-
+ - `rustc_ast::ast::{Crate, Mod, Expr, Pat, ...}` AST nodes are returned from the parser.
  - We then take the AST and [convert it to High-Level Intermediate
  Representation (HIR)][hir]. This is a compiler-friendly representation of the
  AST. This involves a lot of desugaring of things like loops and `async fn`.
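
Illustrative note on the lexer and parser bullets in this diff: the two ideas they describe (a lexer that reports problems as plain data, and a recursive descent parser built on `bump`, `check`, `eat`, `expect`, and `look_ahead`) can be sketched in a few lines. This is a toy for a tiny arithmetic grammar, not `rustc` code. The `Token`, `LexError`, `Expr`, `lex`, and `Parser` definitions below are hypothetical, and only the helper method names are borrowed from the list in the text; their signatures here are assumptions.

```rust
// Toy sketch: names echo the guide's vocabulary, but this is NOT rustc's API.
#[derive(Debug, Clone, PartialEq)]
enum Token { Num(i64), Plus, Star, LParen, RParen, Eof }

// A lexer problem reported as plain data; the caller decides how to render it.
#[derive(Debug, PartialEq)]
struct LexError { offset: usize, ch: char }

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

static EOF: Token = Token::Eof;

// Turns source text into tokens and collects errors instead of printing them.
fn lex(src: &str) -> (Vec<Token>, Vec<LexError>) {
    let (mut tokens, mut errors) = (Vec::new(), Vec::new());
    let mut chars = src.char_indices().peekable();
    while let Some(&(offset, ch)) = chars.peek() {
        match ch {
            '0'..='9' => {
                let mut value = 0i64;
                while let Some(&(_, d)) = chars.peek() {
                    let Some(n) = d.to_digit(10) else { break };
                    value = value.saturating_mul(10).saturating_add(i64::from(n));
                    chars.next();
                }
                tokens.push(Token::Num(value));
                continue; // digits were already consumed above
            }
            '+' => tokens.push(Token::Plus),
            '*' => tokens.push(Token::Star),
            '(' => tokens.push(Token::LParen),
            ')' => tokens.push(Token::RParen),
            c if c.is_whitespace() => {}
            other => errors.push(LexError { offset, ch: other }),
        }
        chars.next();
    }
    tokens.push(Token::Eof);
    (tokens, errors)
}

struct Parser { tokens: Vec<Token>, pos: usize }

impl Parser {
    fn new(tokens: Vec<Token>) -> Self { Parser { tokens, pos: 0 } }

    // Peek `dist` tokens ahead without consuming; past the end yields Eof.
    fn look_ahead(&self, dist: usize) -> &Token {
        self.tokens.get(self.pos + dist).unwrap_or(&EOF)
    }
    // Consume and return the current token.
    fn bump(&mut self) -> Token {
        let tok = self.look_ahead(0).clone();
        if self.pos < self.tokens.len() { self.pos += 1; }
        tok
    }
    // Test the current token without consuming it.
    fn check(&self, tok: &Token) -> bool { self.look_ahead(0) == tok }
    // Consume the current token only if it matches.
    fn eat(&mut self, tok: &Token) -> bool {
        let hit = self.check(tok);
        if hit { self.bump(); }
        hit
    }
    // Like `eat`, but a mismatch is an error.
    fn expect(&mut self, tok: &Token) -> Result<(), String> {
        if self.eat(tok) { return Ok(()); }
        Err(format!("expected {:?}, found {:?}", tok, self.look_ahead(0)))
    }

    // expr := term ('+' term)*
    fn parse_expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.parse_term()?;
        while self.eat(&Token::Plus) {
            lhs = Expr::Add(Box::new(lhs), Box::new(self.parse_term()?));
        }
        Ok(lhs)
    }
    // term := atom ('*' atom)*
    fn parse_term(&mut self) -> Result<Expr, String> {
        let mut lhs = self.parse_atom()?;
        while self.eat(&Token::Star) {
            lhs = Expr::Mul(Box::new(lhs), Box::new(self.parse_atom()?));
        }
        Ok(lhs)
    }
    // atom := NUM | '(' expr ')'
    fn parse_atom(&mut self) -> Result<Expr, String> {
        match self.bump() {
            Token::Num(n) => Ok(Expr::Num(n)),
            Token::LParen => {
                let inner = self.parse_expr()?;
                self.expect(&Token::RParen)?;
                Ok(inner)
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}
```

Calling `lex` on a string and passing the tokens to `Parser::new(tokens).parse_expr()` yields an `Expr` tree. Because `lex` returns its errors as a plain `Vec<LexError>`, it has no dependency on any reporting machinery, which is the same separation the diff describes for the real lexer.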
