From d23b66bca34da4796da653fba990d7e1a3295067 Mon Sep 17 00:00:00 2001
From: LeSeulArtichaut
Date: Fri, 11 Sep 2020 00:29:15 +0200
Subject: [PATCH 1/4] Add a chapter on all the identifiers used through `rustc`

---
 src/SUMMARY.md     |  1 +
 src/identifiers.md | 63 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)
 create mode 100644 src/identifiers.md

diff --git a/src/SUMMARY.md b/src/SUMMARY.md
index edf97c6db..7f231209a 100644
--- a/src/SUMMARY.md
+++ b/src/SUMMARY.md
@@ -84,6 +84,7 @@
 - [THIR and MIR construction](./mir/construction.md)
 - [MIR visitor and traversal](./mir/visitor.md)
 - [MIR passes: getting the MIR for a function](./mir/passes.md)
+- [Identifiers in the Compiler](./identifiers.md)
 - [Closure expansion](./closure.md)
 
 # Analysis
diff --git a/src/identifiers.md b/src/identifiers.md
new file mode 100644
index 000000000..852bfe8a7
--- /dev/null
+++ b/src/identifiers.md
@@ -0,0 +1,63 @@
+# Identifiers in the Compiler
+
+If you have read the few previous chapters, you now know that the `rustc` uses
+many different intermediate representations to perform different kinds of analysis.
+However, like in every data structure, you need a way to traverse the structure
+and refer to other elements. In this chapter, you will find information on the
+different identifiers `rustc` uses for each intermediate representation.
+
+## In the AST
+
+A [`NodeId`] is an identifier number that uniquely identifies an AST node within
+a crate. Every node in the AST has its own [`NodeId`], including top-level items
+such as structs, but also individual statements and expressions.
+
+However, because they are absolute within in a crate, adding or removing a single
+node in the AST causes all the subsequent [`NodeId`]s to change. This renders
+[`NodeId`]s pretty much useless for incremental compilation, where you want as
+few things as possible to change.
+
+[`NodeId`]s are used in all the `rustc` bits that operate directly on the AST,
+like macro expansion and name resolution.
+
+[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
+
+## In the HIR
+
+The HIR uses a bunch of different identifiers that coexist and serve different purposes.
+
+- A [`DefId`], as the name suggests, identifies a particular definition, or top-level
+  item, in a given grate. It is composed of two parts: a [`CrateNum`] which identifies
+  the crate the definition comes from, and a [`DefIndex`] which identifies the definition
+  within the crate. Unlike [`NodeId`]s, there isn't a [`DefId`] for every expression, which
+  makes them more stable across compilations.
+- A [`LocalDefId`] is basically a [`DefId`] that is known to come from the current crate.
+  This allows us to drop the [`CrateNum`] part, and use the type system to ensure that
+  only local definitions are passed to functions that expect a local definition.
+- A [`HirId`] uniquely identifies a node in the HIR of the current crate. It is composed
+  of two parts: an `owner` and a `local_id` that is unique within the `owner`. This
+  combination makes for more stable values which are helpful for incremental compilation.
+  Unlike [`DefId`]s, a [`HirId`] can refer to [fine-grained entities][Node] like expressions,
+  but stays local to the current crate.
+- A [`BodyId`] identifies a HIR [`Body`] in the current crate. It is currenty only
+  a wrapper around a [`HirId`]. For more info about HIR bodies, please refer to the
+  [HIR chapter][hir-bodies].
+
+These identifiers can be converted into one another through the [HIR map][map].
+See the [HIR chapter][hir-map] for more detailed information.
+
+[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
+[`LocalDefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.LocalDefId.html
+[`HirId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir_id/struct.HirId.html
+[`BodyId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/struct.BodyId.html
+[`CrateNum`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/enum.CrateNum.html
+[`DefIndex`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefIndex.html
+[`Body`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/struct.Body.html
+[Node]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/enum.Node.html
+[hir-map]: ./hir.md#the-hir-map
+[hir-bodies]: ./hir.md#hir-bodies
+[map]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/hir/map/struct.Map.html
+
+## In the MIR
+
+**TODO**

From 9c7c592eb3c350e393d18777b8fca1b1ddfb9541 Mon Sep 17 00:00:00 2001
From: LeSeulArtichaut
Date: Fri, 11 Sep 2020 01:10:07 +0200
Subject: [PATCH 2/4] Apply suggestions from code review

Co-authored-by: Tshepang Lekhonkhobe
---
 src/identifiers.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/identifiers.md b/src/identifiers.md
index 852bfe8a7..a672c0274 100644
--- a/src/identifiers.md
+++ b/src/identifiers.md
@@ -1,7 +1,7 @@
 # Identifiers in the Compiler
 
-If you have read the few previous chapters, you now know that the `rustc` uses
-many different intermediate representations to perform different kinds of analysis.
+If you have read the few previous chapters, you now know that `rustc` uses
+many different intermediate representations to perform different kinds of analyses.
 However, like in every data structure, you need a way to traverse the structure
 and refer to other elements. In this chapter, you will find information on the
 different identifiers `rustc` uses for each intermediate representation.

From c56ac824a95267ffade49dce0ea050c1de41e499 Mon Sep 17 00:00:00 2001
From: LeSeulArtichaut
Date: Fri, 11 Sep 2020 17:14:48 +0200
Subject: [PATCH 3/4] Make the HIR chapter point to the new chapter on IDs

---
 src/hir.md | 46 ++++++++++------------------------------------
 1 file changed, 10 insertions(+), 36 deletions(-)

diff --git a/src/hir.md b/src/hir.md
index 8b6637ebe..47262c485 100644
--- a/src/hir.md
+++ b/src/hir.md
@@ -68,45 +68,18 @@ the compiler a chance to observe that you accessed the data for
 
 ### Identifiers in the HIR
 
-Most of the code that has to deal with things in HIR tends not to
-carry around references into the HIR, but rather to carry around
-*identifier numbers* (or just "ids"). Right now, you will find four
-sorts of identifiers in active use:
-
-- [`DefId`], which primarily names "definitions" or top-level items.
-  - You can think of a [`DefId`] as being shorthand for a very explicit
-    and complete path, like `std::collections::HashMap`. However,
-    these paths are able to name things that are not nameable in
-    normal Rust (e.g. impls), and they also include extra information
-    about the crate (such as its version number, as two versions of
-    the same crate can co-exist).
-  - A [`DefId`] really consists of two parts, a `CrateNum` (which
-    identifies the crate) and a `DefIndex` (which indexes into a list
-    of items that is maintained per crate).
-- [`HirId`], which combines the index of a particular item with an
-  offset within that item.
-  - the key point of a [`HirId`] is that it is *relative* to some item
-    (which is named via a [`DefId`]).
-- [`BodyId`], this is an identifier that refers to a specific
-  body (definition of a function or constant) in the crate. It is currently
-  effectively a "newtype'd" [`HirId`].
-- [`NodeId`], which is an absolute id that identifies a single node in the HIR
-  tree.
-  - While these are still in common use, **they are being slowly phased out**.
-  - Since they are absolute within the crate, adding a new node anywhere in the
-    tree causes the [`NodeId`]s of all subsequent code in the crate to change.
-    This is terrible for incremental compilation, as you can perhaps imagine.
+There are a bunch of different identifiers to refer to other nodes or definitions
+in the HIR. In short:
+- A [`DefId`] refers to a *definition* in any crate.
+- A [`LocalDefId`] refers to a *definition* in the currently compiled crate.
+- A [`HirId`] refers to *any node* in the HIR.
+
+For more detailed information, check out the [chapter on identifiers][ids].
 
 [`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
+[`LocalDefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.LocalDefId.html
 [`HirId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir_id/struct.HirId.html
-[`BodyId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/struct.BodyId.html
-[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
-
-We also have an internal map to go from `DefId` to what’s called "Def path". "Def path" is like a
-module path but a bit more rich. For example, it may be `crate::foo::MyStruct` that identifies
-this definition uniquely. It’s a bit different than a module path because it might include a type
-parameter `T`, which you can't write in normal rust, like `crate::foo::MyStruct::T`. These are used
-in incremental compilation.
+[ids]: ./identifiers.md#in-the-hir
 
 ### The HIR Map
 
@@ -129,6 +102,7 @@ something outside of the current crate (since then it has no HIR node), but
 otherwise returns `Some(n)` where `n` is the node-id of the
 definition.
 
+[`NodeId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/node_id/struct.NodeId.html
 [as_local_node_id]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/hir/map/struct.Map.html#method.as_local_node_id
 
 Similarly, you can use [`tcx.hir.find(n)`][find] to lookup the node for a

From c58eec88f7ebfddaed939c8c5010bb5101b32a12 Mon Sep 17 00:00:00 2001
From: LeSeulArtichaut
Date: Sat, 12 Sep 2020 20:41:12 +0200
Subject: [PATCH 4/4] Fix typo

Co-authored-by: Who? Me?!
---
 src/identifiers.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/identifiers.md b/src/identifiers.md
index a672c0274..f485132c6 100644
--- a/src/identifiers.md
+++ b/src/identifiers.md
@@ -27,7 +27,7 @@ like macro expansion and name resolution.
 The HIR uses a bunch of different identifiers that coexist and serve different purposes.
 
 - A [`DefId`], as the name suggests, identifies a particular definition, or top-level
-  item, in a given grate. It is composed of two parts: a [`CrateNum`] which identifies
+  item, in a given crate. It is composed of two parts: a [`CrateNum`] which identifies
   the crate the definition comes from, and a [`DefIndex`] which identifies the definition
   within the crate. Unlike [`NodeId`]s, there isn't a [`DefId`] for every expression, which
   makes them more stable across compilations.
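
The relationships the new chapter describes (a `DefId` as a `CrateNum` plus a `DefIndex`, a `LocalDefId` as a `DefId` without the crate part, a `HirId` as an `owner` plus a `local_id`, and a `BodyId` wrapping a `HirId`) can be sketched as a small standalone model. This is only an illustration under stated assumptions: the types below are simplified stand-ins written for this note, not the definitions in `rustc_hir`, and the field names, the `ItemLocalId` type, the `LOCAL_CRATE` constant and the `as_local`/`to_def_id` helpers are modeled loosely on the real ones and may not match the current compiler.

```rust
// Simplified stand-ins for the identifiers described in the chapter.
// NOT the real rustc types: layouts and names here are assumptions.

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct CrateNum(u32);

// Assumption for this sketch: crate number 0 is the crate being compiled.
const LOCAL_CRATE: CrateNum = CrateNum(0);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct DefIndex(u32);

/// A definition in any crate: which crate, and which definition inside it.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct DefId {
    krate: CrateNum,
    index: DefIndex,
}

/// A definition known to be in the current crate, so no `CrateNum` is stored.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct LocalDefId {
    local_def_index: DefIndex,
}

/// Index of a node relative to its owner (hypothetical name in this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ItemLocalId(u32);

/// Any HIR node of the current crate: an owner plus an id unique within it.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct HirId {
    owner: LocalDefId,
    local_id: ItemLocalId,
}

/// A HIR body, modeled as a plain wrapper around a `HirId`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct BodyId {
    hir_id: HirId,
}

impl DefId {
    /// Returns the local form only if the definition is in the current crate.
    fn as_local(self) -> Option<LocalDefId> {
        if self.krate == LOCAL_CRATE {
            Some(LocalDefId { local_def_index: self.index })
        } else {
            None
        }
    }
}

impl LocalDefId {
    /// Widening back to a `DefId` always succeeds: the crate is the local one.
    fn to_def_id(self) -> DefId {
        DefId { krate: LOCAL_CRATE, index: self.local_def_index }
    }
}

fn main() {
    let local_fn = DefId { krate: LOCAL_CRATE, index: DefIndex(3) };
    let foreign_fn = DefId { krate: CrateNum(7), index: DefIndex(3) };

    // Only the local definition can be narrowed to a `LocalDefId`.
    let owner = local_fn.as_local().expect("definition is local");
    assert!(foreign_fn.as_local().is_none());

    // An expression inside that function is addressed relative to its owner.
    let expr = HirId { owner, local_id: ItemLocalId(12) };
    let body = BodyId { hir_id: expr };
    println!("{:?}", body);
}
```

In this model, inserting a node into one function changes only `local_id` values under that owner, which is the stability property the chapter attributes to `HirId` compared with an absolute `NodeId`.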