Skip to content

Commit

Permalink
Add language proposal for where clauses (#5015)
Browse files Browse the repository at this point in the history
This is meant to be a starting point, such that we can refine the
proposal and the implementation in tandem until we are happy with both.

Co-authored-by: Yong He <yonghe@outlook.com>
  • Loading branch information
tangent-vector and csyonghe authored Sep 5, 2024
1 parent e5ead40 commit a3b25ce
Showing 1 changed file with 370 additions and 0 deletions.
370 changes: 370 additions & 0 deletions docs/proposals/001-where-clauses.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,370 @@
`where` Clauses
===============

We propose to allow generic declarations in Slang to move the constraints on generic type parameters outside of the `<>` and onto distinct `where` clauses.

Status
------

Unimplemented.

Background
----------

Slang supports generic type parameters with *constraints* on them.
Currently constraints can only be written as part of the declaration of the type parameter itself, e.g.:

void resolve<T: IResolvable, U: IResolver<T>, V: IResolveDestination<T>>(
ResolutionContext<U> context, List<T> stuffToResolve, out V destination)
{ ... }

The above example illustrates how intermixing the declaration of the type parameters with their constraints can make for long declarations that can be difficult for programmers to read and understand.

Introducing `where` clauses allows a programmer to state the constraints *after* the rest of the declaration header, e.g.:

void resolve<T, U, V>(ResolutionContext<U> context, List<T> stuffToResolve, out V destination)
where T : IResolvable,
U : IResolver<T>,
V : IResolveDestination<T>
{ ... }

This latter form makes it easier to quickly glean the overall shape of the function signature.

A second important benefit of `where` clauses is that they open the door to expressing more complicated constraints on and between type parameters.
While this document does not propose to allow any new forms of constraints right away, we can imagine things like allowing constraints on *associated types*, e.g.:

void writePackedData<T, U>(T src, out U dst)
where T : IPackable,
T.Packed : IWritable<U>
{ .. }

Related Work
------------

Many other languages with support for generics have introduced `where` clauses, and most follow a broadly similar shape. To present our `resolve` example in various other languages:

### Rust

Rust supports `where` clauses with a comma-separated list of constraints:

fn resolve<T, U, V>(context: ResolutionContext<U>, stuffToResolve: List<T>, destination: mut& V)
where T : IResolvable,
U : IResolver<T>,
V : IResolveDestination<T>,
{ ... }

### Swift

Swift's `where` clauses are nearly identical to Rust's:

fn resolve<T, U, V>(context: ResolutionContext<U>, stuffToResolve: List<T>, destination: out V)
where T : IResolvable,
U : IResolver<T>,
V : IResolveDestination<T>,
{ ... }

### C#

C# is broadly similar, but uses multiple `where` clauses, one per constraint:

void resolve<T, U, V>(ResolutionContext<U> context, List<T> stuffToResolve, out V destination)
where T : IResolvable
where U : IResolver<T>
where V : IResolveDestination<T>
{ ... }

### Haskell

While Haskell is a quite different language from the others mentioned here, Haskell typeclasses have undeniably influenced the concept of traits/protocols in Rust/Swift.

In Haskell a typeclass is not somethign a type "inherits" from, and instead uses type parameter for even the `This` type.
Type parameters in Haskell are also introduced implicitly rather than explicitly.
The `resolve` example above would become something like:

resolve :: (Resolvable t, Resolver u t, ResolveDestination v t) =>
ResolutionContext u -> List t -> v

We see here that the constraints are all grouped together in the `(...) =>` clause before the actual type signature of the function.
That clause serves a simlar semantic role to `where` clauses in these other languages.

Proposed Approach
-----------------

For any kind of declaration that Slang allows to have generic parameters, we will allow a `where` clause to appear after the *header* of that declaration.
A `where` clause consists of the (contextual) keyword `where`, following by a comma-separated list of *constraints*:

struct MyStuff<T> : Base, IFoo
where T : IFoo,
T : IBar
{ ... }

A `where` clause is only allowed after the header of a declaration that has one or more generic parameters.

Each constraint must take the form of one of the type parameters from the immediately enclosing generic parameter list, followed by a colon (`:`), and then followed by a type expression that names an interface or a conjunction of interfaces.
Multiple constraints can be defined for the same parameter.

We haven't previously defined what the header of a declaration is, so we briefly illustrate what we mean by showing where the split between the header and the *body* of a declaration is for each of the major kinds of declarations that are supported. In each case a comment `/****/` is placed between the header and body:

// variables:
let v : Int /****/ = 99;
var v : Int /****/ = 99;
Int v /****/ = 99;

// simple type declarations:
typealias X : IFoo /****/ = Y;
associatedtype X : IFoo /****/;

// functions and other callables:
Int f(Float y) /****/ { ... }
func f(Float y) -> Int /****/ { ... }
init(Float y) /****/ { ... }
subscript(Int idx) -> Float /****/ { ... }

// properties
property p : Int /****/ { ... }

// aggregates
extension Int : IFoo /****/ { ... }
struct Thing : Base /****/ { ... }
class Thing : Base /****/ { ... }
interface IThing : IBase /****/ { ... }
enum Stuff : Int /****/ { ... }

In practice, the body of a declaration starts at the `=` for declarations with an initial-value expression, at the opening `{` for declarations with a `{}`-enclosed body, or at the closing `;` for any other declarations.

Detailed Explanation
--------------------

### Implementation

The compiler implementation already represents generics in a form where the type parameters are encoded separately from the constraints that depend on them.
The constraints act somewhat like additional unnamed parameters of a generic.
At the Slang IR level these constraint parameters are made into explicit parameters used to pass around *witness tables*.

During parsing, a `where` clause can simply add the constraints to the outer generic (and error out if there isn't one).
The actual representation of constraints will be no different than before, so many downstream compilation steps should be unaffected.

Some parts of the codebase have historically assumed that a given generic type parameter can have at most *one* constraint;
these cases will need to be identified and fixed to allow for zero or more constraints per parameter.

Semantic checking of generics will need to validate that the left-hand side of each constraint is a direct reference to one of the type parameters of the immediately enclosing generic;
previously, the semantic checking logic could *assume* that this was the case, since the parser would only create constraints in that form.

### Interaction With Overloading and Redeclaration

Probably the most important semantic issue that arises from `where` clauses is deciding whether two different function declarations count as distinct overloads, or as redeclarations (or redfinitions) of the same function signature.

The existing form for declaring constraints:

void f<T : IFoo>( ... )
{ ... }

should be treated as sugar for the equivalent `where`-based form:

void f<T>( ... )
where T : IFoo
{ ... }

The two declarations of `f` there should not only be counted as redeclarations/redefinitions, but they should also be *indistinguishable* to all clients of the module where they appear.
A module that `import`s the module defining `f` should not be able to tell which form it was declared with.
Both forms of the definition should result in the *same* signature and mangled name in Slang IR.

Furthermore, with `where` clauses it becomes possible to write equivalent constraints in more than one way.
A `where` clause can be used instead of a conjunction of interfaces:

void f<T : IFoo & IBar>( ... )
{ ... }

void f<T>( ... )
where T : IFoo,
T : IBar
{ ... }

It is also possible to use `where` clauses to introduce constraints that are *redundant*, either by repeating the same constraint:

void f<T>( ... )
where T : IFoo,
T : IFoo
{ ... }

or by constraining a type to two interfaces, where one inherits from the other:

interface IBase {}
interface IDerived : IBase {}

void f<T>( ... )
where T : IBase,
T : IDerived
{ ... }

Technically it was already possible to have redundancy in a constraint by using a conjunction of two interfaces where one inherits from the other:

void f<T : IBase & IDerived>( ... )
{ ... }

One question that is raised by the possiblity of redundant constraints is whether the compiler should produce a diagnostic for them and, if so, whether it should be a warning or an error.
While it may seem obvious that redundant constraints are to be avoided, it is possible that refactoring of `interface` hierarchies could change whether existing constraints are redundant or not, potentially forcing widespread edits to code that is semantically unambiguous (and just a little more verbose than necessary).
We propose that redundant constraints should probably produce a warning, with a way to silence that warning easily.

### Canonicalization

The long and short of the above section is that there can be multiple ways to write semantically equivalent generic declarations, by changing the form, order, etc. of constraints.
We want the signature of a function (and its mangled name, etc.) to be identical for semantically equivalent declaration syntax.
In order to ensure that a declaration's mangled name is indepenent of the form of its constraints, we must have a way to *canonicalize* those constraints.

The Swift compiler codebase includes a document that details the rules used for canonicalization of constraints for that compiler, and we can take inspiration from it.
Our constraints are currently much more restricted, so canonicalization can follow a much simpler process, such as:

* Start with the list of user-written constraints, in declaration order
* Iterate the following to convergence:
* For each constraint of the form `T : ILeft & IRight`, replace that constraint with constraints `T : ILeft` and `T : IRight`
* Remove each constraint that is implied by another constraint
* For now, that means removing `T : IBase` if there is already a constraint `T : IDerived` where `IDerived` inherits from `IBase`
* Sort the constraints
* For constraints `T : IFoo` and `U : IBar` on different type parameters, order them based on the order of the type parameters `T` and `U`
* For constraints `T : IFoo` and `T : IBar` on the *same* type parameter, order them based on a canonicalized ordering on the interfaces `IFoo` and `IBar`

The above ordering assumes that we can produce a canonical ordering of `interface`s.
More generally, we will eventually want a canonical ordering on all types and *values* that might appear in constraints.
For now, we will limit ourselves to an ordering on nominal types, and other declaration references:

* A generic parameter is always ordered before anything other than generic parameters
* Parameters from outer generics are ordered before those from inner generics
* Parameters from the same generic are ordered based on their order in the parameter list
* Two declaration references to distinct declarations are ordered based on a lexicographic order for their qualified names, meaning:
* If one qualified name is a prefix of the other (e.g., `A.B` and `A.B.C`), then the prefix is ordered first
* Otherwise, compare the first name component (from left to right) where the names differ, and order them based on a lexicographic string comparison of the name at that component.

Alternatives Considered
-----------------------

There really aren't any compelling alternatives to `where` clauses among the languages that Slang takes design influence from.
We could try to design something to solve the same problems from first principles, but the hypothetical benefits of doing so are unclear.

When it comes to the syntactic details, we could consider allowing for *multiple* `where` clauses (matching the C# syntax) as an alternative to the comma-separated list:

struct MyStuff<T> : Base, IFoo
where T : IFoo
where T : IBar
{ ... }

This alternative form may result in more compact and tidy diffs when editing the constraints on declarations, at the cost of repeating the `where` keyword many times.

Future Directions
-----------------

### Allow more general types on the left-hand side of `:`

There are many cases where it would be helpful to be able to introduce constraints on associated types.
As a contrived example:

interface IPrintable { ... }
interface ISequence
{
associatedtype Element;
...
}
extention<T> T : IPrintable
where T : ISequence,
T.Element : IPrintable
{ ... }

In this example, an `extension` is used to declare that sequences are printable if their elements are printable.


### Allow more general types on the right-hand side of `:`

Currently, the only constraints allowed using `:` have a concrete (non-`interface`) type on the left-hand side, and an `interface` (or conjunction of interfaces) on the right-hand side.
In the context of `class`-based hierarchies, we can also consider having constraints that limit a type parameter to subtypes of a specific concrete type:

class Base { ... }
class Derived : Base { ... }

void f<T>( ... )
where T : Base
{ ... }

### Equality Constraints

One future direction that we already intend to pursue is allowing exact equality constraints.
The primary use case envisioned for equality constraints is to express restrictions on associated types of type parameters.
As a contrived example:

interface IProducer
{
associatedtype Element;
// ...
}
interface IConsumer
{
associatedtype Element;
// ...
}
void runPipeline<P, C>(P producer, C consumer)
where P : IProducer,
C : IConsumer,
P.Element == C.Element
{ ... }

An equality constraint could either constrain an associated type to be equal to some concrete type, or to some other associated type.

### Allow `where` clauses on non-generic declarations

We could consider allowing `where` clauses to appear on any declaration nested under a generic, such that those declarations are only usable when certain additinal constraints are met.
E.g.,:

struct MyDictionary<K,V>
{
...

K minimumKeyUsed()
where K : IComparable
{ ... }
}

In this example, the user's dictionary type can be queried for the minimum key that is used for any entry, but *only* if the keys are comparable.

Most of what can be done with this more flexible placement of `where` clauses can *also* be accomplished using extensions.
E.g., the above example could instead be written:

struct MyDictionary<K,V>
{ ... }

extension<K,V> MyDictionary<K,v>
where K : IComparable
{
K minimumKeyUsed()
{ ... }
}

### Implied Constraints

In many cases a generic function signature will use the type parameters as explicit arguments to generic types that impose their own requirements.
To be concrete, consider:

struct Dictionary<K, V>
where K : IHashable
{ ... }

V myLookupFunc<K,V>(
Dictionary<K,V> dictionary, K key, V default)
{ ... }

In this case, the current Slang language rules will reject `myLookupFunc`. The type of the `dictionary` parameter is passing `K` as an argument to `Dictionary<...>` but does not have an in-scope constraint that ensures that `K : IHashable`.
The current compiler requires the function to be rewritten as:

V myLookupFunc<K,V>(
Dictionary<K,V> dictionary, K key, V default)
where K : IHashable
{ ... }

But this additional constraint ends up being pointless; in order to invoke `myLookupFunc` the programmer must have a `Dictionary<K,V>` to pass as argument for the `dictionary` parameter, which means that the `Dictionary<K,V>` type must already be well-formed based on the information the caller function has.

The compiler can eliminate the need for such constraints by adding additional rules for expanding the set of constraints on a generic during canonicalization.
For any generic type `X<A, B, C, ...>` appearing in:

* the signature of a function declaration
* the bases of a type declaration
* the existing generic constraints

The expansion step would add whatever constraints are required by `X`, with the arguments `A, B, C, ...` substituted in for the parameters of `X`.

0 comments on commit a3b25ce

Please sign in to comment.