The ty module defines how the Rust compiler represents types internally. It also defines the
typing context (tcx or TyCtxt), which is the central data structure in the compiler.
When we talk about how rustc represents types, we usually refer to a type called Ty. There are
quite a few modules and types for Ty in the compiler (Ty documentation).
The specific Ty we are referring to is rustc_middle::ty::Ty (and not rustc_hir::Ty). The
distinction is important, so we will discuss it first before going into the details of ty::Ty.
HIR is rustc's high-level intermediate representation. It is more or less the AST (see this chapter): it represents the syntax that the user wrote, and is obtained after parsing and some desugaring. It has a representation of types, but that representation reflects what the user wrote to denote a type rather than the type itself.
In contrast, ty::Ty
represents the semantics of a type, that is, the meaning of what the user
wrote. For example, rustc_hir::Ty
would record the fact that a user used the name u32
twice
in their program, but the ty::Ty
would record the fact that both usages refer to the same type.
Example: fn foo(x: u32) -> u32 { x }
In this function, u32 appears twice. We know that both occurrences are the same type,
i.e. the function takes an argument and returns a value of the same type,
but from the point of view of the HIR,
there are two distinct type instances because they
occur in two different places in the program.
That is, they have two different Spans (locations).
Example: fn foo(x: &u32) -> &u32
In addition, HIR might have information left out. The type &u32 as written is incomplete, since
the full Rust type includes a lifetime that the user did not need to write. There are also some
elision rules that insert information. The result may look like fn foo<'a>(x: &'a u32) -> &'a u32.
At the HIR level, these things are not spelled out, so the picture is rather incomplete.
However, at the ty::Ty level, these details are added and the type is complete. Moreover, we will
have exactly one ty::Ty for a given type, like u32, and that ty::Ty is used for all u32s in the
whole program, not for a specific usage, unlike rustc_hir::Ty.
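The difference between the two views can be sketched with a small standalone model. This is only an illustration under stated assumptions: `HirTy`, `SemTy`, `Span` and `lower` below are hypothetical toy names, not rustc's actual data structures, and the byte offsets are simply positions within the one-line example.

```rust
/// Toy stand-in for a source location (byte offsets); not rustc's `Span`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    lo: u32,
    hi: u32,
}

/// Toy syntactic type: the text that was written, and where.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct HirTy {
    text: &'static str,
    span: Span,
}

/// Toy semantic type: what the text means.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SemTy {
    U32,
    I32,
}

/// Hypothetical lowering step from written text to meaning.
fn lower(hir: &HirTy) -> Option<SemTy> {
    match hir.text {
        "u32" => Some(SemTy::U32),
        "i32" => Some(SemTy::I32),
        _ => None,
    }
}

/// The two `u32` occurrences in `fn foo(x: u32) -> u32 { x }`.
fn occurrences() -> (HirTy, HirTy) {
    let arg = HirTy { text: "u32", span: Span { lo: 10, hi: 13 } };
    let ret = HirTy { text: "u32", span: Span { lo: 18, hi: 21 } };
    (arg, ret)
}

fn main() {
    let (arg, ret) = occurrences();
    // Same text, different locations: the syntactic values differ.
    assert_ne!(arg, ret);
    // After lowering, both occurrences denote the same semantic type.
    assert_eq!(lower(&arg), lower(&ret));
    println!("{:?} and {:?} both lower to {:?}", arg.span, ret.span, lower(&arg));
}
```

The syntactic values compare unequal because their spans differ, while the lowered values are the same, which is the distinction the text draws between rustc_hir::Ty and ty::Ty.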
Here is a summary:
| rustc_hir::Ty | ty::Ty |
|---|---|
| Describe the syntax of a type: what the user wrote (with some desugaring). | Describe the semantics of a type: the meaning of what the user wrote. |
| Each rustc_hir::Ty has its own spans corresponding to the appropriate place in the program. | Doesn’t correspond to a single place in the user’s program. |
| rustc_hir::Ty has generics and lifetimes; however, some of those lifetimes are special markers like LifetimeName::Implicit. | ty::Ty has the full type, including generics and lifetimes, even if the user left them out. |
| fn foo(x: u32) -> u32 { } - Two rustc_hir::Ty representing each usage of u32, each has its own Spans, and rustc_hir::Ty doesn’t tell us that both are the same type. | fn foo(x: u32) -> u32 { } - One ty::Ty for all instances of u32 throughout the program, and ty::Ty tells us that both usages of u32 mean the same type. |
| fn foo(x: &u32) -> &u32 - Two rustc_hir::Ty again. Lifetimes for the references show up in the rustc_hir::Tys using a special marker, LifetimeName::Implicit. | fn foo(x: &u32) -> &u32 - A single ty::Ty. The ty::Ty has the hidden lifetime param. |
Order
HIR is built directly from the AST, so it happens before any ty::Ty
is produced. After
HIR is built, some basic type inference and type checking is done. During the type inference, we
figure out what the ty::Ty
of everything is and we also check if the type of something is
ambiguous. The ty::Ty
is then used for type checking while making sure everything has the
expected type. The astconv
module is where the code responsible for converting a
rustc_hir::Ty
into a ty::Ty
is located. The main routine used is ast_ty_to_ty
. This occurs
during the type-checking phase, but also in other parts of the compiler that want to ask
questions like "what argument types does this function expect?"
How semantics drive the two instances of Ty
You can think of HIR as the perspective of the type information that assumes the least. We assume two things are distinct until they are proven to be the same thing. In other words, we know less about them, so we should assume less about them.
Consider the two occurrences of u32 in the earlier example. Syntactically they are two strings:
"u32" at line N column 20 and "u32" at line N column 35. We
don’t know that they are the same yet. So, in the HIR we treat them as if they are different. Later,
we determine that they semantically are the same type and that’s the ty::Ty we use.
Consider another example: fn foo<T>(x: T) -> u32. Suppose that someone invokes foo::<u32>(0).
This means that T and u32 (in this invocation) actually turn out to be the same type, so we
would eventually end up with the same ty::Ty, but we have distinct rustc_hir::Ty.
(This is a bit over-simplified, though, since during type checking, we would check the function
generically and would still have a T
distinct from u32
. Later, when doing code generation,
we would always be handling "monomorphized" (fully substituted) versions of each function,
and hence we would know what T
represents (and specifically that it is u32
).)
Here is one more example:
mod a {
    type X = u32;
    pub fn foo(x: X) -> u32 { 22 }
}
mod b {
    type X = i32;
    pub fn foo(x: X) -> i32 { x }
}
Here the type X
will vary depending on context, clearly. If you look at the rustc_hir::Ty
,
you will get back that X
is an alias in both cases (though it will be mapped via name resolution
to distinct aliases). But if you look at the ty::Ty
signature, it will be either fn(u32) -> u32
or fn(i32) -> i32
(with type aliases fully expanded).
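As a rough illustration of alias expansion, the following standalone sketch resolves the written form of a type to its meaning. Everything here is hypothetical: `Written`, `SemTy`, `expand` and `signature` are toy names, and the alias lookup is hard-coded for the two modules in the example rather than driven by real name resolution.

```rust
/// Toy semantic type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SemTy {
    U32,
    I32,
}

/// Toy model of what was written in a type position.
#[derive(Debug, Clone, Copy)]
enum Written {
    /// A primitive written directly.
    Prim(SemTy),
    /// An alias path, assumed to be already resolved to (module, name).
    Alias(&'static str, &'static str),
}

/// Expand the written form to its meaning; aliases disappear here.
fn expand(written: Written) -> Option<SemTy> {
    match written {
        Written::Prim(ty) => Some(ty),
        Written::Alias("a", "X") => Some(SemTy::U32),
        Written::Alias("b", "X") => Some(SemTy::I32),
        Written::Alias(..) => None,
    }
}

/// Expanded (argument, return) types of a one-argument function.
fn signature(arg: Written, ret: Written) -> Option<(SemTy, SemTy)> {
    Some((expand(arg)?, expand(ret)?))
}

fn main() {
    // `a::foo` and `b::foo` both write `X` for the argument type.
    let a_foo = signature(Written::Alias("a", "X"), Written::Prim(SemTy::U32));
    let b_foo = signature(Written::Alias("b", "X"), Written::Prim(SemTy::I32));
    println!("a::foo: {:?}", a_foo);
    println!("b::foo: {:?}", b_foo);
}
```

Both functions are written with the same name `X`, yet the expanded signatures differ, mirroring fn(u32) -> u32 and fn(i32) -> i32 above.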
rustc_middle::ty::Ty
is actually a wrapper around
Interned<WithCachedTypeInfo<TyKind>>
.
You can ignore Interned in general; you will basically never access it explicitly.
We always hide it within Ty and skip over it via Deref impls or methods.
TyKind is a big enum with variants to represent many different Rust types
(e.g. primitives, references, algebraic data types, generics, lifetimes, etc).
WithCachedTypeInfo
has a few cached values like flags
and outer_exclusive_binder
. They
are convenient hacks for efficiency and summarize information about the type that we may want to
know, but they don’t come into the picture as much here. Finally, Interned
allows
the ty::Ty
to be a thin pointer-like
type. This allows us to do cheap comparisons for equality, along with the other
benefits of interning.
To allocate a new type, you can use the various new_* methods defined on Ty.
These have names that correspond mostly to the various kinds of types. For example:
let array_ty = Ty::new_array_with_const_len(tcx, ty, count);
These methods all return a Ty<'tcx>
– note that the lifetime you get back is the lifetime of the
arena that this tcx
has access to. Types are always canonicalized and interned (so we never
allocate exactly the same type twice).
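To make the idea of interning concrete, here is a minimal standalone sketch. It is not how rustc implements it: rustc allocates in arenas tied to the 'tcx lifetime, whereas this toy uses `Rc` and a `HashMap`, and the names `Kind`, `Interner`, `intern` and `new_array` are made up for the illustration.

```rust
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::rc::Rc;

/// Toy type structure; the real `TyKind` has many more variants.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Kind {
    U32,
    Array(Ty, u64),
}

/// Thin handle to an interned `Kind`.
#[derive(Debug, Clone)]
struct Ty(Rc<Kind>);

// Equality and hashing look only at the pointer, never at the structure.
impl PartialEq for Ty {
    fn eq(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.0, &other.0)
    }
}
impl Eq for Ty {}
impl Hash for Ty {
    fn hash<H: Hasher>(&self, state: &mut H) {
        std::ptr::hash(Rc::as_ptr(&self.0), state);
    }
}

#[derive(Default)]
struct Interner {
    map: HashMap<Kind, Ty>,
}

impl Interner {
    /// Return the existing handle for `kind`, allocating only on first use.
    fn intern(&mut self, kind: Kind) -> Ty {
        if let Some(existing) = self.map.get(&kind) {
            return existing.clone();
        }
        let ty = Ty(Rc::new(kind.clone()));
        self.map.insert(kind, ty.clone());
        ty
    }

    /// Hypothetical constructor in the spirit of the `new_*` methods.
    fn new_array(&mut self, elem: Ty, len: u64) -> Ty {
        self.intern(Kind::Array(elem, len))
    }
}

/// Build `[u32; 4]` twice and `[u32; 5]` once, then compare the handles.
fn demo() -> (bool, bool) {
    let mut interner = Interner::default();
    let u32_a = interner.intern(Kind::U32);
    let u32_b = interner.intern(Kind::U32);
    let arr_a = interner.new_array(u32_a.clone(), 4);
    let arr_b = interner.new_array(u32_b, 4);
    let arr_c = interner.new_array(u32_a, 5);
    (arr_a == arr_b, arr_a == arr_c)
}

fn main() {
    let (same, different) = demo();
    println!("[u32; 4] == [u32; 4]: {same}");
    println!("[u32; 4] == [u32; 5]: {different}");
}
```

Because sub-types are interned before the types that contain them, comparing two handles is a single pointer comparison no matter how deeply nested the type is.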
You can also find various common types in the tcx
itself by accessing its fields:
tcx.types.bool
, tcx.types.char
, etc. (See CommonTypes
for more.)
Because types are interned, it is possible to compare them for equality efficiently using ==
– however, this is almost never what you want to do unless you happen to be hashing and looking
for duplicates. This is because often in Rust there are multiple ways to represent the same type,
particularly once inference is involved.
For example, the type {integer} (ty::Infer(ty::IntVar(..)), an integer inference variable,
the type of an integer literal like 0) and u8 (ty::Uint(..)) should often be treated as
equal when testing whether they can be assigned to each other (which is a common operation in
diagnostics code). == on them will return false though, since they are different types.
The simplest way to compare two types correctly requires an inference context (infcx
).
If you have one, you can use infcx.can_eq(param_env, ty1, ty2)
to check whether the types can be made equal.
This is typically what you want to check during diagnostics, which is concerned with questions such
as whether two types can be assigned to each other, not whether they're represented identically in
the compiler's type-checking layer.
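The gap between representational equality and "can be made equal" can be shown with a toy model. This sketch does not call into rustc; `Ty`, `IntVar` and the free function `can_eq` below are simplified stand-ins chosen for illustration, and the real check works through an inference context.

```rust
/// Toy types, including a stand-in for an integer inference variable.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ty {
    U8,
    I32,
    Bool,
    /// Plays the role of `{integer}`, the type of a literal like `0`.
    IntVar,
}

impl Ty {
    fn is_integral(self) -> bool {
        matches!(self, Ty::U8 | Ty::I32)
    }
}

/// Toy analogue of asking whether two types can be made equal.
fn can_eq(a: Ty, b: Ty) -> bool {
    match (a, b) {
        (Ty::IntVar, Ty::IntVar) => true,
        // An integer variable can still become any integral type.
        (Ty::IntVar, other) | (other, Ty::IntVar) => other.is_integral(),
        _ => a == b,
    }
}

fn main() {
    // Representationally different...
    println!("IntVar == U8: {}", Ty::IntVar == Ty::U8);
    // ...but the variable could be resolved to `u8`.
    println!("can_eq(IntVar, U8): {}", can_eq(Ty::IntVar, Ty::U8));
    println!("can_eq(IntVar, Bool): {}", can_eq(Ty::IntVar, Ty::Bool));
}
```

In this model `==` answers "are these the same representation?", while `can_eq` answers the question diagnostics usually care about.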
When working with an inference context, you have to be careful to ensure that potential inference variables inside the types actually belong to that inference context. If you are in a function that has access to an inference context already, this should be the case. Specifically, this is the case during HIR type checking or MIR borrow checking.
Another consideration is normalization. Two types may actually be the same, but one is behind an
associated type. To compare them correctly, you have to normalize the types first. This is
primarily a concern during HIR type checking and with all types from a TyCtxt
query
(for example from tcx.type_of()
).
When a FnCtxt
or an ObligationCtxt
is available during type checking, .normalize(ty)
should be used on them to normalize the type. After type checking, diagnostics code can use
tcx.normalize_erasing_regions(ty)
.
There are also cases where using ==
on Ty
is fine. This is for example the case in late lints
or after monomorphization, since type checking has been completed, meaning all inference variables
are resolved and all regions have been erased. In these cases, if you know that inference variables
or normalization won't be a concern, #[allow]-ing or #[expect]-ing the lint is recommended.
When diagnostics code does not have access to an inference context, one should be threaded through the function calls if it is available somewhere up the call chain (for example, during type checking).
If no inference context is available at all, then one can be created as described in
type-inference. But this is only useful when the involved types (for example, if
they came from a query like tcx.type_of()
) are actually substituted with fresh
inference variables using fresh_args_for_item
. This can be used to answer questions
like "can Vec<T>
for any T
be unified with Vec<u32>
?".
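The "replace parameters with fresh inference variables, then unify" idea can be sketched in a few lines of standalone Rust. This is a toy under loud assumptions: `instantiate_fresh`, `unify` and `unifies_for_some_args` are hypothetical helpers, the unifier has no occurs check, and none of it reflects the actual signatures of fresh_args_for_item or rustc's inference machinery.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq, Eq)]
enum Ty {
    U32,
    Bool,
    /// A generic parameter such as `T`.
    Param(&'static str),
    /// An inference variable.
    Var(u32),
    Vec(Box<Ty>),
}

/// Replace each generic parameter with a fresh inference variable.
fn instantiate_fresh(
    ty: &Ty,
    next_var: &mut u32,
    seen: &mut HashMap<&'static str, u32>,
) -> Ty {
    match ty {
        Ty::Param(name) => {
            let var = *seen.entry(*name).or_insert_with(|| {
                let fresh = *next_var;
                *next_var += 1;
                fresh
            });
            Ty::Var(var)
        }
        Ty::Vec(elem) => Ty::Vec(Box::new(instantiate_fresh(elem, next_var, seen))),
        other => other.clone(),
    }
}

/// Minimal structural unifier (no occurs check; illustration only).
fn unify(a: &Ty, b: &Ty, bindings: &mut HashMap<u32, Ty>) -> bool {
    if a == b {
        return true;
    }
    match (a, b) {
        (Ty::Var(var), other) | (other, Ty::Var(var)) => match bindings.get(var).cloned() {
            Some(bound) => unify(&bound, other, bindings),
            None => {
                bindings.insert(*var, other.clone());
                true
            }
        },
        (Ty::Vec(x), Ty::Vec(y)) => unify(x, y, bindings),
        _ => false,
    }
}

/// "Can `generic`, for some choice of its parameters, equal `concrete`?"
fn unifies_for_some_args(generic: &Ty, concrete: &Ty) -> bool {
    let mut next_var = 0;
    let mut seen = HashMap::new();
    let fresh = instantiate_fresh(generic, &mut next_var, &mut seen);
    unify(&fresh, concrete, &mut HashMap::new())
}

fn main() {
    let vec_t = Ty::Vec(Box::new(Ty::Param("T")));
    let vec_u32 = Ty::Vec(Box::new(Ty::U32));
    println!("Vec<T> vs Vec<u32>: {}", unifies_for_some_args(&vec_t, &vec_u32));
    println!("Vec<T> vs bool: {}", unifies_for_some_args(&vec_t, &Ty::Bool));
}
```

Without the instantiation step, `Param("T")` would simply be unequal to `U32`; replacing it with a variable is what lets the question "for any T" be asked at all.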
Note: TyKind
is NOT the functional programming concept of Kind.
Whenever working with a Ty in the compiler, it is common to match on the kind of type:
fn foo(x: Ty<'tcx>) {
    match x.kind() {
        ...
    }
}
The kind() method returns a reference to a TyKind<'tcx>, which is an enum defining all of the
different kinds of types in the compiler.
N.B. inspecting the kind of a type during type inference can be risky, as there may be inference variables and other things to consider, or sometimes types are not yet known and will become known later.
There are a lot of related types, and we’ll cover them in time (e.g., regions/lifetimes, “substitutions”, etc.).
There are many variants on the TyKind
enum, which you can see by looking at its
documentation. Here is a sampling:
- Algebraic Data Types (ADTs): An algebraic data type is a struct, enum or union. Under the hood, struct, enum and union are actually implemented the same way: they are all ty::TyKind::Adt. It’s basically a user defined type. We will talk more about these later.
- Foreign: Corresponds to extern type T.
- Str: Is the type str. When the user writes &str, Str is how we represent the str part of that type.
- Slice: Corresponds to [T].
- Array: Corresponds to [T; n].
- RawPtr: Corresponds to *mut T or *const T.
- Ref: Ref stands for safe references, &'a mut T or &'a T. Ref has some associated parts: Ty<'tcx>, which is the type that the reference references; Region<'tcx>, which is the lifetime or region of the reference; and Mutability, which says whether the reference is mutable or not.
- Param: Represents a type parameter (e.g. the T in Vec<T>).
- Error: Represents a type error somewhere so that we can print better diagnostics. We will discuss this more later.
- And many more...
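Matching on the kind of a type can be sketched with a cut-down enum. The `TyKind` below is a toy that mirrors only a few of the variants sampled above, with simplified payloads (for instance, a region is just a string here); `describe` is a hypothetical helper, not a rustc function.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mutability {
    Not,
    Mut,
}

/// Toy subset of type kinds with simplified payloads.
#[derive(Debug, Clone, PartialEq, Eq)]
enum TyKind {
    Str,
    Slice(Box<TyKind>),
    Array(Box<TyKind>, u64),
    RawPtr(Box<TyKind>, Mutability),
    /// Region name, referent type, mutability.
    Ref(&'static str, Box<TyKind>, Mutability),
    Param(&'static str),
    Error,
}

/// Render a kind back to surface syntax by matching on each variant.
fn describe(kind: &TyKind) -> String {
    match kind {
        TyKind::Str => "str".to_string(),
        TyKind::Slice(elem) => format!("[{}]", describe(elem)),
        TyKind::Array(elem, len) => format!("[{}; {}]", describe(elem), len),
        TyKind::RawPtr(pointee, Mutability::Mut) => format!("*mut {}", describe(pointee)),
        TyKind::RawPtr(pointee, Mutability::Not) => format!("*const {}", describe(pointee)),
        TyKind::Ref(region, referent, Mutability::Mut) => {
            format!("&{} mut {}", region, describe(referent))
        }
        TyKind::Ref(region, referent, Mutability::Not) => {
            format!("&{} {}", region, describe(referent))
        }
        TyKind::Param(name) => name.to_string(),
        TyKind::Error => "{type error}".to_string(),
    }
}

fn main() {
    let str_ref = TyKind::Ref("'a", Box::new(TyKind::Str), Mutability::Not);
    let array = TyKind::Array(Box::new(TyKind::Param("T")), 4);
    println!("{}", describe(&str_ref));
    println!("{}", describe(&array));
    println!("{}", describe(&TyKind::Error));
}
```

Since the match is exhaustive, adding a variant to the enum forces every such function to be revisited, which is one reason matching on kinds is the common pattern.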
Although there is no hard and fast rule, the ty
module tends to be used like so:
use ty::{self, Ty, TyCtxt};
In particular, since they are so common, the Ty
and TyCtxt
types are imported directly. Other
types are often referenced with an explicit ty::
prefix (e.g. ty::TraitRef<'tcx>
). But some
modules choose to import a larger or smaller set of names explicitly.
Let's consider the example of a type like MyStruct<u32>
, where MyStruct
is defined like so:
struct MyStruct<T> { x: u32, y: T }
The type MyStruct<u32>
would be an instance of TyKind::Adt
:
Adt(&'tcx AdtDef, GenericArgsRef<'tcx>)
//  ------------  --------------------
//  (1)           (2)
//
// (1) represents the `MyStruct` part
// (2) represents the `<u32>`, or "substitutions" / generic arguments
There are two parts:
- The AdtDef references the struct/enum/union but without the values for its type parameters. In our example, this is the MyStruct part without the argument u32. (Note that in the HIR, structs, enums and unions are represented differently, but in ty::Ty, they are all represented using TyKind::Adt.)
- The GenericArgsRef is an interned list of values that are to be substituted for the generic parameters. In our example of MyStruct<u32>, we would end up with a list like [u32]. We’ll dig more into generics and substitutions in a little bit.
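The two-part representation can be modelled in a standalone sketch. All names here are hypothetical: the toy `AdtDef` stores declared field types in terms of parameter indices, a `usize` index into a definition table stands in for the reference to the definition, and `subst` and `field_ty` are illustrative helpers rather than rustc APIs.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Ty {
    U32,
    Bool,
    /// Index into the generic parameter list of the enclosing definition.
    Param(usize),
    /// (index of the definition, generic arguments)
    Adt(usize, Vec<Ty>),
}

/// Toy definition: a name plus fields declared in terms of `Param`.
struct AdtDef {
    name: &'static str,
    fields: Vec<(&'static str, Ty)>,
}

/// Replace parameters in `ty` with the corresponding arguments.
fn subst(ty: &Ty, args: &[Ty]) -> Ty {
    match ty {
        Ty::Param(index) => args[*index].clone(),
        Ty::Adt(def, inner) => Ty::Adt(*def, inner.iter().map(|t| subst(t, args)).collect()),
        other => other.clone(),
    }
}

/// Type of `field` for a concrete ADT type, computed on demand.
fn field_ty(defs: &[AdtDef], ty: &Ty, field: &str) -> Option<Ty> {
    let Ty::Adt(def, args) = ty else { return None };
    let (_, declared) = defs[*def].fields.iter().find(|(name, _)| *name == field)?;
    Some(subst(declared, args))
}

/// `struct MyStruct<T> { x: u32, y: T }` as definition number 0.
fn definitions() -> Vec<AdtDef> {
    vec![AdtDef {
        name: "MyStruct",
        fields: vec![("x", Ty::U32), ("y", Ty::Param(0))],
    }]
}

fn main() {
    let defs = definitions();
    // `MyStruct<u32>`: part (1) is the definition, part (2) is `[u32]`.
    let my_struct_u32 = Ty::Adt(0, vec![Ty::U32]);
    println!(
        "{}<u32>.y: {:?}",
        defs[0].name,
        field_ty(&defs, &my_struct_u32, "y")
    );
}
```

Note that the definition itself is never copied or rewritten: `MyStruct<u32>` and `MyStruct<bool>` share it and differ only in their argument lists.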
AdtDef and DefId
For every type defined in the source code, there is a unique DefId
(see this
chapter). This includes ADTs and generics. In the MyStruct<T>
definition we gave above, there are two DefId
s: one for MyStruct
and one for T
. Notice that
the code above does not generate a new DefId
for u32
because it is not defined in that code (it
is only referenced).
AdtDef
is more or less a wrapper around DefId
with lots of useful helper methods. There is
essentially a one-to-one relationship between AdtDef
and DefId
. You can get the AdtDef
for a
DefId
with the tcx.adt_def(def_id)
query. AdtDef
s are all interned, as shown
by the 'tcx
lifetime.
There is a TyKind::Error
that is produced when the user makes a type error. The idea is that
we would propagate this type and suppress other errors that come up due to it so as not to overwhelm
the user with cascading compiler error messages.
There is an important invariant for TyKind::Error
. The compiler should
never produce Error
unless we know that an error has already been
reported to the user. This is usually
because (a) you just reported it right there or (b) you are propagating an existing Error type (in
which case the error should've been reported when that error type was produced).
It's important to maintain this invariant because the whole point of the Error
type is to suppress
other errors -- i.e., we don't report them. If we were to produce an Error
type without actually
emitting an error to the user, then this could cause later errors to be suppressed, and the
compilation might inadvertently succeed!
Sometimes there is a third case. You believe that an error has been reported, but you believe it
would've been reported earlier in the compilation, not locally. In that case, you can invoke
delay_span_bug. This will make a note that you expect compilation to yield an error; if
compilation succeeds anyway, it will trigger a compiler bug report.
For added safety, it's not actually possible to produce a TyKind::Error
value
outside of rustc_middle::ty
; there is a private member of
TyKind::Error
that prevents it from being constructable elsewhere. Instead,
one should use the TyCtxt::ty_error
or
TyCtxt::ty_error_with_message
methods. These methods automatically
call delay_span_bug
before returning an interned Ty
of kind Error
. If you
were already planning to use delay_span_bug
, then you can just pass the
span and message to ty_error_with_message
instead to avoid
delaying a redundant span bug.
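The "cannot construct an error type without having reported an error" idea can be imitated with ordinary module privacy. The following standalone sketch is an assumption-laden toy: `ErrorReported`, `Diagnostics` and `check_add` are invented for illustration and only loosely model the invariant described above.

```rust
mod ty {
    /// Proof that an error was emitted. The private unit field means code
    /// outside this module cannot construct a value of this type.
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct ErrorReported(());

    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Ty {
        U32,
        Bool,
        Error(ErrorReported),
    }

    #[derive(Default)]
    pub struct Diagnostics {
        pub emitted: Vec<String>,
    }

    impl Diagnostics {
        /// Emit an error and hand back the proof token.
        pub fn error(&mut self, message: &str) -> ErrorReported {
            self.emitted.push(message.to_string());
            ErrorReported(())
        }
    }
}

use ty::{Diagnostics, Ty};

/// Toy check for `lhs + rhs`, where only `u32 + u32` is accepted.
fn check_add(diag: &mut Diagnostics, lhs: Ty, rhs: Ty) -> Ty {
    match (lhs, rhs) {
        // Case (b): propagate an existing error without reporting again.
        (Ty::Error(proof), _) | (_, Ty::Error(proof)) => Ty::Error(proof),
        (Ty::U32, Ty::U32) => Ty::U32,
        // Case (a): report right here, which yields the proof token.
        _ => Ty::Error(diag.error("cannot add these types")),
    }
}

fn main() {
    let mut diag = Diagnostics::default();
    let first = check_add(&mut diag, Ty::Bool, Ty::U32);
    // Using the erroneous result again does not produce a second message.
    let second = check_add(&mut diag, first, Ty::U32);
    println!("{:?}, {} error(s) emitted", second, diag.emitted.len());
}
```

Writing `Ty::Error(ty::ErrorReported(()))` outside the `ty` module fails to compile, so the only way to obtain an error type in this toy is to emit a diagnostic or to propagate an existing error.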
Recall that we represent a generic struct with (AdtDef, args)
. So why bother with this scheme?
Well, the alternate way we could have chosen to represent types would be to always create a new,
fully-substituted form of the AdtDef
where all the types are already substituted. This seems like
less of a hassle. However, the (AdtDef, args)
scheme has some advantages over this.
First, the (AdtDef, args) scheme has an efficiency win:
struct MyStruct<T> {
... 100s of fields ...
}
// Want to do: MyStruct<A> ==> MyStruct<B>
In an example like this, we can substitute from MyStruct<A> to MyStruct<B> (and so on) very cheaply,
by just replacing the one reference to A with B. But if we eagerly substituted all the fields,
that could be a lot more work because we might have to go through all of the fields in the AdtDef
and update all of their types.
A bit more deeply, this corresponds to structs in Rust being nominal types — which means that they are defined by their name (and that their contents are then indexed from the definition of that name, and not carried along “within” the type itself).
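The cost difference between the two schemes can be sketched as follows. This is a toy with hypothetical names (`Def`, `Instance`, `with_arg`, `substitute_all`) and a single generic parameter; it only illustrates that swapping the argument never touches the fields, while the eager alternative visits all of them.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
enum Ty {
    A,
    B,
    U32,
    /// The single generic parameter `T` of the toy definition.
    Param,
}

/// Toy definition: declared field types, written in terms of `Param`.
struct Def {
    field_tys: Vec<Ty>,
}

/// Nominal representation: a reference to the definition plus one argument.
struct Instance<'d> {
    def: &'d Def,
    arg: Ty,
}

impl<'d> Instance<'d> {
    /// Swap the argument. The definition's fields are never visited.
    fn with_arg(&self, arg: Ty) -> Instance<'d> {
        Instance { def: self.def, arg }
    }

    /// Substitute lazily, only for the field that is asked about.
    fn field_ty(&self, index: usize) -> Ty {
        match &self.def.field_tys[index] {
            Ty::Param => self.arg.clone(),
            other => other.clone(),
        }
    }
}

/// The eager alternative: rewrite every field up front.
fn substitute_all(def: &Def, arg: &Ty) -> Vec<Ty> {
    def.field_tys
        .iter()
        .map(|t| if *t == Ty::Param { arg.clone() } else { t.clone() })
        .collect()
}

/// A definition with 300 fields alternating between `u32` and `T`.
fn wide_def() -> Def {
    let field_tys = (0..300)
        .map(|i| if i % 2 == 0 { Ty::U32 } else { Ty::Param })
        .collect();
    Def { field_tys }
}

fn main() {
    let def = wide_def();
    let of_a = Instance { def: &def, arg: Ty::A };
    // Going from the `A` instance to the `B` instance copies one pointer
    // and one argument.
    let of_b = of_a.with_arg(Ty::B);
    println!("shares definition: {}", std::ptr::eq(of_a.def, of_b.def));
    println!("field 1 of the B instance: {:?}", of_b.field_ty(1));
    println!("eager rewrite touched {} fields", substitute_all(&def, &Ty::B).len());
}
```

The lazy form pays the substitution cost only for fields that are actually queried, which fits the nominal view that the contents are looked up through the definition rather than carried inside the type.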