Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor impl trait to model abstract type a bit better #44727

Closed
nikomatsakis opened this issue Sep 20, 2017 · 7 comments
Closed

refactor impl trait to model abstract type a bit better #44727

nikomatsakis opened this issue Sep 20, 2017 · 7 comments
Labels
A-trait-system Area: Trait system C-cleanup Category: PRs that clean code up or issues documenting cleanup. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-traits Working group: Traits, https://internals.rust-lang.org/t/announcing-traits-working-group/6804

Comments

@nikomatsakis
Copy link
Contributor

After some discussion with @cramertj, I wanted to write up a rough idea for how to represent impl Trait in the HIR etc. The key idea is to move towards a place where we represent the abstract type that an impl Trait conceptually desugars to as a distinct "item" in the HIR.

Today, for each usage of impl Trait, we create a def-id, which basically represents the abstract type behind the impl Trait. However, in the HIR itself, we continue to mirror the syntax, so for example the variant for ImplTrait includes the bounds listed inline. This is not I think what we really want.

The refactoring then is to do the following:

  • Add to the hir::Crate a "abstract type" vector sort of like the list of bodies.
  • Add a hir::AbstractType struct that contains the following:
    • A scope (which is the def-id of some item)
    • A list of generics (types, lifetimes)
    • A list of bounds
  • Modify the ImplTrait variant of hir::Ty with to include a "list of substs" in some form (and not have bounds)
  • During HIR lowering:
    • track the generics that are in scope at any given item
    • when we encounter an impl trait, we create and push a hir::AbstractType:
      • scope is the enclosing item
      • clone the generics that are in scope to serve as the list of generics
      • move the list of bounds from the ast::Ty into the hir::AbstractType
    • generate the hir::ImplTrait(DefId, [T, 'a]) where the "substs" are references to all the lifetimse/types in scope (I'm not entirely sure how best to represent and synthesize those?)
  • If we do this right:
    • Region name resolution will basically just work, and in particular there is no need to worry about early- vs late-bound resolution.
    • When we add first class abstract types, they can build on this same data structure
      • we'll have to solve the unification problems of course =)
@nikomatsakis nikomatsakis added A-trait-system Area: Trait system E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-traits Working group: Traits, https://internals.rust-lang.org/t/announcing-traits-working-group/6804 labels Sep 20, 2017
@nikomatsakis
Copy link
Contributor Author

I think @cramertj is probably going to be hacking on this.

@nikomatsakis
Copy link
Contributor Author

cc @eddyb -- do you think this rough idea makes any sense?

@Mark-Simulacrum Mark-Simulacrum added C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC C-cleanup Category: PRs that clean code up or issues documenting cleanup. and removed C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Sep 21, 2017
@arielb1
Copy link
Contributor

arielb1 commented Sep 21, 2017

It makes sense to me. Not sure what's the best way to handle generics, but we'll discover that while implementing.

@nikomatsakis
Copy link
Contributor Author

@cramertj -- just checking in, how goes?

@cramertj
Copy link
Member

cramertj commented Sep 25, 2017

@nikomatsakis It goes! I've got most of the generics_of/predicates_of stuff sorted out. I'm trying to figure out the best way to handle the privacy and access visitors, but it's just taking time-- I'm not blocked. I'll try to have something up for review by Wednesday evening (PST).

@nikomatsakis
Copy link
Contributor Author

So there's been a fair amount of discussion on the best way to do this. I am becoming wary of trying to do this desugaring at the HIR level -- I think it's a bit of an open question just how much we should desugar at HIR. I think I like the idea of trying to hew fairly closely to the user's AST, but do "light desugaring" -- e.g., the way we "embed" symbol tables paths now. But that too is a bit challenging, since e.g. doing name resolution on the AST implies that we have to kind of "re-use" those results later on, even if the desugared version might look quite different.

In fact, I think integration with name resolution is a key point here. In particular, if we have something like fn foo<T>() -> impl Iterator<Item = T>, we've already done the name resolution on the path T that appears in Iterator<Item = T> and so we have an entry in the name resolution map that points with the DefId of T.

This is a bit of a complication because if we wanted to "faithfully" desugar to something like this:

abstract type Foo<U>: impl Iterator<Item = U>;
fn foo<T>() -> Foo<T> { ... }

As you can see, the bound impl Iterator<Item = U> is now referencing U, not T -- in particular, when we "desugar" we would be synthesizing new generics for the existential type Foo, but if we just lower the bounds into HIR unchanged, we will wind up with paths whose Def references T.

Interestingly, right now, this problem is specific to type parameters. We don't have this problem with lifetimes yet, since those are resolved after HIR construction. But there's talk of moving lifetime resolution to occur earlier, and then it will encounter a similar tension. In addition, the early- vs late-bound distinction for lifetimes makes this more complex, as I'll get to in a bit.

There are three ways to address this that I can see:

  • We modify name resolution to understand the desugaring better: it would create the parameters, give them def-ids, and hence resolve T to U in the first place.
  • We make the ty::Generics for Foo also include the type parameter T.
  • We modify HIR lowering to "rewrite" paths that resolve to T so that they resolve to U when we are converting the AST bounds into HIR bounds.

The first option seems like a bad idea to me. The more we can contain knowledge of how the desugaring works into a narrow band of the compiler, the better. The first option forces everything between the parser and HIR lowering to "coordinate" on how the desugaring works.


The second choice feels the closest to the current implementation. I think it would work like this. We already have a DefId that we create for the ImplTrait type-node -- that is basically the DefId for the abstract type Foo that we are synthesizing. The generics_of(Foo) predicate would return a Generics whose parent is generics_of(foo) (that is, an extension of the generics of the parent item). This is a tweak on what we do today, where generics_of(Foo) == generics_of(foo) -- the reasons to use an extension are to handle lifetimes, but let's ignore that for a second and focus on type parameters. In particular, because both generics_of(foo) is a prefix of generics_of(Foo), and hence both contain the same type parameter T at the same offset, we don't have to change the name resolution results.

The challenge here is that the generics_of(Foo) then always include all the lifetimes from foo, though, which we don't want! To accommodate this, we would have to do a sort of hack, one suggested by @arielb1. Basically we would have generics_of(Foo) include an extension with all the (named) lifetimes found in the bounds themselves. So if our example were:

fn foo<'a,'b,T>(...) -> impl Iterator<Item = &'a T> { .. }
// for now imagine 'a and 'b are both early bound

then we would effectively wind up desugaring to this:

abstract type Foo<'a, 'b, T, 'c>: impl Iterator<Item = &'c T>;
fn foo<'a,'b,T>(...) -> Foo<'static, 'static, T, 'a>

That is, the generics_of(Foo) would include an extra lifetime (which I'm calling 'c) to represent the 'a that appears in the bounds (more generally, we add an extra lifetime parameter for each lifetime that appears in the bounds of the impl trait):

- ['a, 'b, T] // generics_of(foo) (the common base)
- ['c]        // generics_of(Foo) extends with this

This means of course that the lifetime resolution has to be aware of this: in particular, when it resolves the 'a that appears in impl Iterator<Item = &'a T>, it doesn't want to resolve it to offset 0 but rather offset 3 (what I called 'c in the list above). Since lifetime resolution takes place after HIR construction, though, this isn't so complex to achieve. When we pass into a hir::TyImplTrait we just tweak the bindings in scope appropriately.

On the other hand, we also need to generate the Ty<'tcx> from a hir::TyImplTrait. As you can see from the example above, this needs to be tweaked slightly from what we do today. Today we generate an "identity substitution" for generics_of(foo), but now we would want to instead generate Foo<'static, 'static, T, 'a>. This means we need to keep the "mapping" for each of the synthesized lifetime parameters on Foo to the lifetime parameters declared in foo.

I imagine extending hir::TyImplTrait from carrying just a list of bounds (as today) to also carrying a Vec<hir::Lifetime>. Then the lifetime resolution code can resolve those lifetimes in the scope of the function foo, and resolve the lifetimes in the bounds in to the scope of the abstract type Foo.

This basically means that the code which has to know about how the desugaring works is:

  • the generics_of predicate provider in collect
  • the predicates_of predicate provider in collect
  • the astconv code
  • the lifetime resolution code
  • error messages and reporting (see below)

I think the main downside here is that the "clever trick" to keep the lifetimes of foo in the list of generics but replace them with 'static in the subst is non-obvious and will have to be hidden from the user in error messages and so forth.

The upside is that I do have this nagging feeling that HIR lowering is not the right place to do 'large-scale' desugarings where we synthesize new items and things. To be honest, I'm not even sure if the desugarings we do now (e.g., ?, do catch, for) necessarily make sense -- maybe we should do them in MIR lowering -- although it seems like we've found ways to hide that from the user in error-reporting, so it's actually working out pretty well.


I think that there is a "slightly tweaked" variant of the previous plan in which we do a bit more desugaring during HIR lowering. In particular, we would introduce def-ids for the type and lifetime parameters that the abstract type we will need (rather than piggybacking on the existing def-ids, as I described in the previous section). Perhaps we wind up with hir::TyImplTrait looking like:

TyImplTrait {
    // If `-> impl Iterator<Item = &'a T>` becomes `-> Foo<'a, T>`, then these vectors
    // store the `'a` and `T`
    lifetime_parameters: Vec<Lifetime>, // `['a]` in the above example
    type_parameters: Vec<DefId>, // `[D]` where `D` is the def-id of `T`, in the above example
    exist_ty: ExistTy,
}

struct ExistTy {
    // In above example, this would include `'a` and `T`, each with fresh def-ids
    generics: hir::Generics,

    // Paths in here are "rewritten" to refer to the `T` in our generics above
    bounds: hir::TyParamBounds,
}

The challenges here:

  • We need to "synthesize" def-ids (not impossible, just annoying)
    • need to figure out the "id" story
    • are the hir-ids relative to the enclosing item? probably? or maybe the def-id of the impl trait? does it even matter?
  • We need to intercept HIR lowering to "redirect" the lowering of T into U as described earlier
    • This doesn't seem that hard, it's a very particular case we have to care about (lowering of paths that resolve to a type parameter) that we can easily identify

But after doing it:

  • No need to have "funny games" around generics
  • Lifetime resolution probably needs small alterations but basically "just works"

So I'm not sure where I land, but I think both of these plans seem reasonably viable.

@cramertj
Copy link
Member

cramertj commented Oct 4, 2017

It's probably worth noting just for posterity that I had started on a version of the HIR-desugaring approach here, although it handles IDs incorrectly. I'll continue to pursue this-- I'm going to start on the second choice mentioned in @nikomatsakis's comment above, and I'll update with my progress.

bors added a commit that referenced this issue Nov 21, 2017
impl Trait Lifetime Handling

This PR implements the updated strategy for handling `impl Trait` lifetimes, as described in [RFC 1951](https://github.com/rust-lang/rfcs/blob/master/text/1951-expand-impl-trait.md) (cc #42183).

With this PR, the `impl Trait` desugaring works as follows:
```rust
fn foo<T, 'a, 'b, 'c>(...) -> impl Foo<'a, 'b> { ... }
// desugars to
exists type MyFoo<ParentT, 'parent_a, 'parent_b, 'parent_c, 'a, 'b>: Foo<'a, 'b>;
fn foo<T, 'a, 'b, 'c>(...) -> MyFoo<T, 'static, 'static, 'static, 'a, 'b> { ... }
```
All of the in-scope (parent) generics are listed as parent generics of the anonymous type, with parent regions being replaced by `'static`. Parent regions referenced in the `impl Trait` return type are duplicated into the anonymous type's generics and mapped appropriately.

One case came up that wasn't specified in the RFC: it's possible to write a return type that contains multiple regions, neither of which outlives the other. In that case, it's not clear what the required lifetime of the output type should be, so we generate an error.

There's one remaining FIXME in one of the tests: `-> impl Foo<'a, 'b> + 'c` should be able to outlive both `'a` and `'b`, but not `'c`. Currently, it can't outlive any of them. @nikomatsakis and I have discussed this, and there are some complex interactions here if we ever allow `impl<'a, 'b> SomeTrait for AnonType<'a, 'b> { ... }`, so the plan is to hold off on this until we've got a better idea of what the interactions are here.

cc #34511.
Fixes #44727.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-trait-system Area: Trait system C-cleanup Category: PRs that clean code up or issues documenting cleanup. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-traits Working group: Traits, https://internals.rust-lang.org/t/announcing-traits-working-group/6804
Projects
None yet
Development

No branches or pull requests

5 participants