Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the ParamSpace separation from formal and actual generics in rustc. #35605

Merged
merged 12 commits into from
Aug 17, 2016

Conversation

eddyb
Copy link
Member

@eddyb eddyb commented Aug 11, 2016

This is the first step towards enabling the typesystem implemented by rustc to be extended
(with generic modules, HKT associated types, generics over constants, etc.).

The current implementation splits all formal (ty::Generics) and actual (Substs) lifetime and type parameters (and even where clauses) into 3 "parameter spaces":

  • TypeSpace for enum, struct, trait and impl
  • SelfSpace for Self in a trait
  • FnSpace for functions and methods

For example, in <X as Trait<A, B>>::method::<T, U>, the Substs are [[A, B], [X], [T, U]].
The representation uses a single Vec with 2 indices where it's split into the 3 "parameter spaces".
Such a simplistic approach doesn't scale beyond the Rust 1.0 typesystem, and its existence was mainly motivated by keeping code manipulating generic parameters correct, across all possible situations.

Summary of changes:

  • ty::Generics are uniformly stored and can be queried with tcx.lookup_generics(def_id)
    • the typeck::collect changes for this resulted in a function to lazily compute the ty::Generics for a local node, given only its DefId - this can be further generalized to other kinds of type information
  • ty::Generics and ty::GenericPredicates now contain only their own parameters (or where clauses, respectively), and refer to their "parent", forming a linked list
    • right now most items have one level of nesting, only associated items and variants having two
    • in the future, if <X as mod1<A>::mod2<B>::mod3::Trait<C>>::Assoc<Y> is supported, it would be represented by item with the path mod1::mod2::mod3::Trait::Assoc, and 4 levels of generics: mod1 with [A], mod2 with [B], Trait with [X, C] and Assoc with [Y]
  • Substs gets two new APIs for working with arbitrary items:
    • Substs::for_item(def_id, mk_region, mk_type) will construct Substs expected by the definition def_id, calling mk_region for lifetime parameters and mk_type for type parameters, and it's guaranteed to always return Substs compatible with def_id
    • substs.rebase_onto(from_base_def_id, to_base_substs) can be used if substs is for an item nested within from_base_def_id (e.g. an associated item), to replace the "outer parameters" with to_base_substs - for example, you can translate a method's Substs between a trait and an impl (in both directions) if you have the DefId of one and Substs for the other
  • trait objects, without a Self in their Substs, use solely ExistentialTraitRef now, letting TraitRef assume it always has a Self present
  • both TraitRef and ExistentialTraitRef get methods which do operations on their Substs which are valid only for traits (or trait objects, respectively)
  • Substs loses its "parameter spaces" distinction, with effectively no code creating Substs in an ad-hoc manner, or inspecting them, without knowing what shape they have already

Future plans:

  • combine both lifetimes and types in a single Vec<Kind<'tcx>> where Kind would be a tagged pointer that can be Ty<'tcx>, &'tcx ty::Region or, in the future, potentially-polymorphic constants
    • this would require some performance investigation, if it implies a lot of dynamic checks
  • introduce an abstraction for (T, Substs), where the Substs are even more hidden away from code
    manipulating it; a precedent for this is Instance in trans, which has T = DefId; @nikomatsakis also referred to this, as "lazy substitution", when T = Ty
  • rewrite type pretty-printing to fully take advantage of this to inject actual in the exact places of formal generic parameters in any paths
  • extend the set of type-level information (e.g. beyond ty::Generics) that can be lazily queried during typeck and introduce a way to do those queries from code that can't refer to typeck directly
    • this is almost unrelated but is necessary for DAG-shaped recursion between constant evaluation and type-level information, i.e. for implementing generics over constants

r? @nikomatsakis
cc @rust-lang/compiler

cc @nrc Could get any perf numbers ahead of merging this?

@bors
Copy link
Contributor

bors commented Aug 12, 2016

☔ The latest upstream changes (presumably #35592) made this pull request unmergeable. Please resolve the merge conflicts.

@eddyb eddyb force-pushed the substs branch 2 times, most recently from cdcc1cd to 7060002 Compare August 12, 2016 14:30
@bors
Copy link
Contributor

bors commented Aug 13, 2016

☔ The latest upstream changes (presumably #35138) made this pull request unmergeable. Please resolve the merge conflicts.

@eddyb eddyb force-pushed the substs branch 6 times, most recently from 907d823 to 9e5f42f Compare August 15, 2016 19:34
let trait_method_start = trait_predicates.len();
let impl_method_start = impl_predicates.len();
assert_eq!(&trait_predicates[..], &trait_m.predicates.predicates[..trait_method_start]);
assert_eq!(&impl_predicates[..], &impl_m.predicates.predicates[..impl_method_start]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit yucky but...ok. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should look at this at the end of the PR ;).

@bors
Copy link
Contributor

bors commented Aug 16, 2016

☔ The latest upstream changes (presumably #35162) made this pull request unmergeable. Please resolve the merge conflicts.

@nikomatsakis
Copy link
Contributor

OK, @eddyb, I read (or at least skimmed) the code here and it's lovely as usual. That said, when I've contemplated generalizing VecPerParamSpace, I've debated between this approach and one another (which is admittedly a bit vague).

For example, one thing I was thinking about was adopting more of a debruijn-index-like approach: in this case, we would have instead of Space an index, counting backwards (so, on a trait impl, 0 might be the fn substs, 1 might be the self substs, 2 the trait substs, etc). Then the Substs could be a chain struct Substs<'tcx> { parent: Option<&'tcx Substs<'tcx>>, ... } and we count backwards. This would make it easy to "pop off" the method substs -- and, bonus, would probably require somewhat less interning, or at least hashing, since if the parent substs don't change, we can just re-use the pointer. It does mean that indexing is a bit more expensive when doing substitutions though.

Admittedly I probably should have chatted with you some to hash out the desired approach better before you opened this whole PR. That said, I think you were like 75% of the way through before I found out you were working on it. 👅

Anyway, I would be happy to land this PR as is, and consider refinements (like maybe the one I just proposed) separately. It seems like overall the refinements you've made (e.g., the helper methods for creating substs) are good, and would presumably make it easier to pursue other variations.

Thoughts? Do you agree that this code will be easier to adapt in the future if we choose a different strategy?

@eddyb
Copy link
Member Author

eddyb commented Aug 16, 2016

@nikomatsakis With something like Kind = Ty | Region you'd need dedicated accessors, so the main difference between that and what you're proposing is all of the usize or u32 indices present everywhere, but it should be fairly doable.

However, I think that it doesn't bring us anything, we don't translate anything other than Substs themselves between trait and impl, the two items are otherwise distinct.
How often do you ever see definitions with more than 8 generic parameters? I think we can get much more by reducing the overhead of Substs in the common case.

But yes, many of the changes here are essential to making any other design changes wrt generics.

@nikomatsakis
Copy link
Contributor

@eddyb

However, I think that it doesn't bring us anything, we don't translate anything other than Substs themselves between trait and impl, the two items are otherwise distinct.

I am thinking specifically of a world where mod can have type parameters. I think in that context it might be valuable. But it's not worth going into now, I agree that this PR is a better foundation than what we have.

r+ from me.

@eddyb
Copy link
Member Author

eddyb commented Aug 17, 2016

@bors r=nikomatsakis p=1 (this should really not bitrot)

@bors
Copy link
Contributor

bors commented Aug 17, 2016

📌 Commit 9453d9b has been approved by nikomatsakis

@bors
Copy link
Contributor

bors commented Aug 17, 2016

⌛ Testing commit 9453d9b with merge d6d0590...

bors added a commit that referenced this pull request Aug 17, 2016
Remove the ParamSpace separation from formal and actual generics in rustc.

This is the first step towards enabling the typesystem implemented by `rustc` to be extended
(with generic modules, HKT associated types, generics over constants, etc.).

The current implementation splits all formal (`ty::Generics`) and actual (`Substs`) lifetime and type parameters (and even `where` clauses) into 3 "parameter spaces":
* `TypeSpace` for `enum`, `struct`, `trait` and `impl`
* `SelfSpace` for `Self` in a `trait`
* `FnSpace` for functions and methods

For example, in `<X as Trait<A, B>>::method::<T, U>`, the `Substs` are `[[A, B], [X], [T, U]]`.
The representation uses a single `Vec` with 2 indices where it's split into the 3 "parameter spaces".
Such a simplistic approach doesn't scale beyond the Rust 1.0 typesystem, and its existence was mainly motivated by keeping code manipulating generic parameters correct, across all possible situations.

Summary of changes:
* `ty::Generics` are uniformly stored and can be queried with `tcx.lookup_generics(def_id)`
 * the `typeck::collect` changes for this resulted in a function to lazily compute the `ty::Generics` for a local node, given only its `DefId` - this can be further generalized to other kinds of type information
* `ty::Generics` and `ty::GenericPredicates` now contain only their own parameters (or `where` clauses, respectively), and refer to their "parent", forming a linked list
 * right now most items have one level of nesting, only associated items and variants having two
 * in the future, if `<X as mod1<A>::mod2<B>::mod3::Trait<C>>::Assoc<Y>` is supported, it would be represented by item with the path `mod1::mod2::mod3::Trait::Assoc`, and 4 levels of generics: `mod1` with `[A]`, `mod2` with `[B]`, `Trait` with `[X, C]` and `Assoc` with `[Y]`
* `Substs` gets two new APIs for working with arbitrary items:
 * `Substs::for_item(def_id, mk_region, mk_type)` will construct `Substs` expected by the definition `def_id`, calling `mk_region` for lifetime parameters and `mk_type` for type parameters, and it's guaranteed to *always* return `Substs` compatible with `def_id`
 * `substs.rebase_onto(from_base_def_id, to_base_substs)` can be used if `substs` is for an item nested within `from_base_def_id` (e.g. an associated item), to replace the "outer parameters" with `to_base_substs` - for example, you can translate a method's `Substs` between a `trait` and an `impl` (in both directions) if you have the `DefId` of one and `Substs` for the other
* trait objects, without a `Self` in their `Substs`, use *solely* `ExistentialTraitRef` now, letting `TraitRef` assume it *always* has a `Self` present
* both `TraitRef` and `ExistentialTraitRef` get methods which do operations on their `Substs` which are valid only for traits (or trait objects, respectively)
* `Substs` loses its "parameter spaces" distinction, with effectively no code creating `Substs` in an ad-hoc manner, or inspecting them, without knowing what shape they have already

Future plans:
* combine both lifetimes and types in a single `Vec<Kind<'tcx>>` where `Kind` would be a tagged pointer that can be `Ty<'tcx>`, `&'tcx ty::Region` or, in the future, potentially-polymorphic constants
 * this would require some performance investigation, if it implies a lot of dynamic checks
* introduce an abstraction for `(T, Substs)`, where the `Substs` are even more hidden away from code
manipulating it; a precedent for this is `Instance` in trans, which has `T = DefId`; @nikomatsakis also referred to this, as "lazy substitution", when `T = Ty`
* rewrite type pretty-printing to fully take advantage of this to inject actual in the exact places of formal generic parameters in any paths
* extend the set of type-level information (e.g. beyond `ty::Generics`) that can be lazily queried during `typeck` and introduce a way to do those queries from code that can't refer to `typeck` directly
 * this is almost unrelated but is necessary for DAG-shaped recursion between constant evaluation and type-level information, i.e. for implementing generics over constants

r? @nikomatsakis
cc @rust-lang/compiler

cc @nrc Could get any perf numbers ahead of merging this?
@bors bors merged commit 9453d9b into rust-lang:master Aug 17, 2016
@eddyb eddyb deleted the substs branch August 17, 2016 14:37
Manishearth added a commit to Manishearth/rust that referenced this pull request Aug 27, 2016
…akis

Combine types and regions in Substs into one interleaved list.

Previously, `Substs` would contain types and regions, in two separate vectors, for example:
```rust
<X as Trait<'a, 'b, A, B>>::method::<'p, 'q, T, U>
/* corresponds to */
Substs { regions: ['a, 'b, 'p, 'q], types: [X, A, B, T, U] }
```

This PR continues the work started in rust-lang#35605 by further removing the distinction.
A new abstraction over types and regions is introduced in the compiler, `Kind`.
Each `Kind` is a pointer (`&TyS` or `&Region`), with the lowest two bits used as a tag.
Two bits were used instead of just one (type = `0`, region = `1`) to allow adding more kinds.

`Substs` contain only a `Vec<Kind>`, with `Self` first, followed by regions and types (in the definition order):
```rust
Substs { params: [X, 'a, 'b, A, B, 'p, 'q, T, U] }
```
The resulting interleaved list has the property of being the concatenation of parameters for the (potentially) nested generic items it describes, and can be sliced back into those components:
```rust
params[0..5] = [X, 'a, 'b, A, B] // <X as Trait<'a, 'b, A, B>>
params[5..9] = ['p, 'q, T, U] // <_>::method::<'p, 'q, T, U>
```

r? @nikomatsakis
bors added a commit that referenced this pull request Sep 23, 2016
Don't let a type parameter named "Self" unchanged past HIR lowering.

Fixes #36638 by rewriting `Self` type parameters (which are a parse error) to a `gensym("Self")`.

Background: #35605 introduced code across rustc that determines `Self` by its keyword name.
Reverting the sanity checks around that would inadvertently cause confusion between the true `Self` of a `trait` and other type parameters named `Self` (which have caused parse errors already).

I do not like to use `gensym`, and we may do something different here in the future, but this should work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants