Erase the distinction between terms and types #101
Merged
Conversation
(And merge `Kind` with `Type`.) This is preparation for an experiment in embracing dependent types more fully.
This frees up `MonadCat` for other purposes.
…ent types. We recently added dependent types for ranges and effects. If we lean on them more heavily, we can get rid of several specialized type system features. We can use pi types instead of forall types, Idris-style implicit arguments in place of implicit type application, and ordinary let bindings in place of type aliases. We can also de-duplicate implementation machinery that's replicated at the term level and the type level (e.g. getType/getKind). Adam Paszke has been patiently suggesting this approach for a while, and I'm finally starting to understand what a good idea it is. It would be a big change to do atomically in-place, so we're taking the John Carmack "parallel implementation" approach, implementing the `UExpr` language from scratch, alongside the existing `FExpr`. A command-line flag `--frontend uexpr` swaps out the parser and type inference pass to use the new front-end (everything after normalization is the same). Once it's feature-complete, we'll have the satisfaction of deleting the `FExpr` implementation.
Also fix a bug in (Expr) de Bruijn instantiation/abstraction. (We'll probably eventually get rid of de Bruijn indices there too, but it'll be easier once we've deleted FExpr.)
This time I'm trying to be more careful about making sure that lexeme parsers don't overlap with each other, to avoid misparses like `:=` as `:` followed by `=`. We're also allowing the definitions (but not the fixity and associativity) of infix ops like `+` to be provided in the prelude instead of built into the parser, so the parser itself can be simpler.
…`forall`. We use a "bidirectional" type inference approach similar to that described in "Practical type inference for arbitrary-rank types". Our implicit-argument function types play the role of sigma types (types with leading `forall`). Thanks to Dimitrios and Adam for help figuring it out.
Haskell-style function definitions put the annotation on the let binder, which means writing binders twice if they appear in both the type and the term, like `a` here:

```
id : (a:Type) -> a -> a = \a x. x
```

Coq-style function definitions put the annotations on the lambda binders and let us share the binder between the type and the term:

```
def id (a:Type) (x:a) : a = x
```

I added the `def` keyword because otherwise everything up to the `=` has to be under a backtracking `try` combinator, which causes confusing parser errors when there's a syntax error in the binders.
I tried making these just primitive constants, like `(+)`. It would look something like this:

```
def (=>) (n:Type) (a:Type) : Type = %TabType n a
def (.) {n:Type} {a:Type} (xs: n => a) (i:n) : a = %TabGet xs i
def for {n:Type} {a:Type} (f: n -> a) : n => a = %for f
```

The problem is that this doesn't let us have dependent table arrows: in the above, we can't have an `a` that depends on the index of type `n`. Adam pointed out that we could make it work if the type argument `a` were actually a type-level function (of `i:n`), and maybe you could even infer that function. For now, I'm leaving for/get as builtins, with inference rules very similar to lambda/application.

We have four different function-like things now:

```
a -> b    -- plain function
a --o b   -- linear function
a => b    -- tabulated function
{a} -> b  -- implicit function
```

In UExpr I'm trying to treat them more uniformly than we used to: each of the three function-related constructs (lambda, application, and the arrow type constructor) takes an `ArrowHead` parameter to specify which flavor of arrow we're talking about.
It's becoming too much hassle to maintain it alongside UExpr. This frees us up to make changes to the core `Expr` IR too, without worrying about updating the UExpr->Expr lowering.
* Use the `ArrowHead` parameter for lambda/app/arrow.
* Allow atoms on the right-hand side of decls. (This means we're not syntactically forced to inline lambda.)
* Rename `Expr` to `Block` and `CExpr` to `Expr`.
We'd been trying to use one set of per-primitive typing rules to do three things: checking, inference, and check-free querying, by abstracting over how to compare types (check equality, generate constraints, or do nothing). With this latest refactor we're no longer doing any type inference on primitives (we assume they're trivially wrapped in the prelude). That still leaves checking and querying, but they're already closer: we can just carry an optional typing environment and skip the checks if it's absent. We're also able to unify type checking and kind checking, which simplifies things further.

The better errors come from using do-notation's `MonadFail` desugaring to unpack the patterns our types promise (e.g. `Arrow a b <- ...` for the lhs of a function application). These fail softly, unlike the hard-failing pattern matches we used to use, so we can add context before re-raising them.
De Bruijn indices made it trivial to test equality without worrying about alpha equivalence. But they were a regular source of complexity, since we had to duplicate a lot of our variable-handling machinery. Handling alpha equivalence directly isn't actually too bad after all, because we already have the substitution mechanisms we need. To unify or equality-test two pi types, we just rename the pi-bound variables to a common skolem variable that's fresh with respect to the free variables of the two pi types. We needed to change module structure a bit, since the `Eq` instance for `PiType` now needs to do substitutions. The easiest solution was just to put the substitution implementations in `Syntax.hs`. I also merged the `HasVars` and `Subst` classes while I was at it.
Also split out "higher-order" primitives that take a lambda param (`linearize`, `runWriter` etc) into a separate data type, `PrimHof`. We can now work with (derivable) `Traversable` and `Bitraversable` instances instead of our more complicated triple-functor `TraversableExpr` thing. I also cut out some of the more complex pretty-printer rules. The pretty-printer only needs to be good enough for debugging, and the rules were becoming a hassle to maintain every time we changed things.
This is simpler than reviving records and I think we might soon delete the current record type anyway. For dependent pairs, we'll want to use pairs rather than records. And for everything else we'll want true extensible records rather than our current rigid version. Required some small fixes to implicit arrow inference. I have no idea if it's correct now, but at least it's more symmetric.
This time we have references like `Ref r a`, parameterized by both the region `r` (like the phantom type in the ST monad) and the type of the data they refer to, `a`. So we can have aliased references that share a region, which is important for AD! It also means that our higher-order primitives like `withState` now take a two-argument function:

```
withState : s -> ({r:Region} -> Ref r s -> {State r} a) -> (a ** s)
```

This breaks our pattern of using `AbsBlock` (really a `(Binder, Block)` pair) as the syntactic argument to primitive HOFs. Rather than add another syntactic element, say `(Binder, Binder, Block)`, it seemed simpler to just reuse actual lambda. One concern with this is that a lambda is annotated with all the effects needed by its body, whereas our `AbsBlock` inherited its effects from its context. This means (1) more bookkeeping burden to maintain these annotations and (2) checking effects will be quadratic in the depth of effect nesting (which will be high when we're generating `runAccum` in AD).
The main change is to put effect annotations under the `PlainArrow` case of the `Arrow` parameter. Now table arrows are syntactically prevented from carrying effects, which gives us a little more safety. Also remove the arrow parameter to application. This means that function application and table indexing are only distinguishable by querying the type of the lhs. It makes the AST less redundant but it makes pattern-matching on table indexing more awkward. We may end up reverting it.
We look for free variables in type annotations of top-level let bindings that start with a lowercase letter. These get added as implicit arguments of type `Type`. For example:

```
id : a -> a = \x. x
```

becomes

```
id : {a:Type} -> a -> a = \x. x
```
Improve parse errors by adding a few more labels. Improve type errors by renaming the inference variables like `?_3`, `?_15` to `a`, `b` etc., and reporting the types of the variables being solved for. This should also make the error text less sensitive to the internal details of type inference, so we don't have to change the expected test outputs so often.
Also experimenting with concrete syntax for implicit args: `(a:Type)?->` instead of `{a:Type}`. It's a bit more uniform, and `{..}` is already used for effects.
We now just have a pair type and a unit type, which is simpler. Eventually we want to have extensible records but these will look very different from our old built-in tuple/record type anyway.
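For example, with the `&` pair type constructor:

```
p : (Int & Real) = (1, 2.3)   -- a pair value and its pair type
```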
Also rewrite `x=<e>; x` -> `<e>` when forming blocks.
```
(!) : Ref h (n=>a) -> n -> Ref h a
```

Now that we have effect regions, this can be just an ordinary effect-free operation rather than a complicated combinator like our old `indexReader`.
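A sketch of the intended use inside a stateful computation (assuming the `:=` write operator from the lexeme note above, with hypothetical names `ref` and `i`):

```
subref = ref!i   -- an aliased reference to the i'th element
subref := 1.0    -- write through the sub-reference
```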
Also simplified the top-level environment. Removed the typing environment, because we can get that from the subst/binding environment (which type inference now needs anyway), and removed the derivative rules. I think we can just make AD annotations a first-class language feature if we add `fix` for statically unrollable recursion.
We work with `Atom` instead of `(Atom, Type)` pairs, since the type is cheaply derivable from the (well-typed) atom.
* Fix bug in function argument patterns
* Add parsing rules for underscore binders
* Handle writer and reader effects in Imp lowering
For example, if we have `id : (a:Type) -> a -> a` with an *explicit* type argument, we can ask type inference to infer the type argument by writing `id _ 1` instead of `id Int 1`. It uses the same mechanism as implicit arguments. We just create a fresh variable and proceed with unification.
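A minimal sketch, reusing the `id` from above:

```
id : (a:Type) -> a -> a = \a x. x
y = id _ 1   -- the hole unifies with Int, as if we'd written `id Int 1`
```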
This allows us to easily emit operations like `runWriter`, which need to know about all the effects in the body (not just the new one).
Also some improvements to type error messages and a printing bugfix.
For integer ranges, we'll just stick with `Range` and `Fin` for now.
apaszke approved these changes (Jun 18, 2020)
Looks awesome!
apaszke reviewed (Jun 18, 2020)
We've been gradually adding dependent features on top of our Milner-ish type system and it's become an awkward hybrid. This PR rebuilds the type system from scratch, embracing dependent types and removing the type-term distinction completely. The result is simpler and more expressive. (There must be a catch, and I'm sure we'll learn what it is soon enough!)
Moving to dependent types was Adam Paszke's first suggestion on looking closely at Dex. It also seems to be the direction the typed functional programming community is heading generally. Our design borrows ideas from many places, especially the work of Edwin Brady and Conor McBride.
No more special type-level constructs
One surprising (to me) result is that the dependent system is much simpler to describe and implement than the original Milner version. We've been able to get rid of several bespoke language features for doing things at the type level, since we can now just use the ordinary (and more general) term-level versions:

* pi types instead of forall types
* Idris-style implicit arguments in place of implicit type application
* ordinary let bindings in place of type aliases
Checking type equality
We might be concerned that it's now expensive to compare types for equality, since types can be arbitrary terms. For example, checking that a program like the sketch below is well-typed (with `expensive_plus` standing in for any costly term-level function) would require showing that `expensive_plus 2 2` is equal to `4`:
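```
xs : Fin (expensive_plus 2 2) => Real = for i. 1.0
ys : Fin 4 => Real = xs   -- well-typed only if expensive_plus 2 2 reduces to 4
```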
The obvious pragmatic solution is to only attempt a limited amount of reduction and then fall back to syntactic equality. Currently we inline let bindings and do beta reduction, but we don't evaluate any primitive operations like `+`. In the future we might want to go further. Perhaps we could evaluate any expression that doesn't require a loop, on the assumption that in numerical programming all the non-trivial work is in loops.

Type inference
As Conor McBride points out, there are two different things that we often lump under "type inference". There's implicit application, where we turn `id 1` into `id Int 1`, and then there's implicit abstraction, where we turn `\x. x` into `\a:Type. \x:a. x`. By a happy coincidence, implicit application is very useful and also easy to implement, whereas implicit abstraction, which is much harder to implement, is not actually very useful! I don't think we use it anywhere in the Haskell program that implements Dex, for example. As programmers, we're actively eager to provide explicit annotations for top-level polymorphic functions. It's what lets us reason locally about types.

Roughly following "Practical type inference for arbitrary-rank types", we implement a bidirectional type inference system, in which we're either checking that a term has a particular type or inferring the type of a term. In either case, we produce an explicitly typed term in the core language as a result. We do implicit application everywhere, but we only do implicit abstraction when we're in checking mode, e.g. checking a user-supplied annotation. So we'll wrap `\a:Type. ...` around `\x. x` in `id : (a:Type ?-> a -> a) = \x. x` but not in `id = \x. x`. Importantly, we're in checking mode when we look at the argument of a function application, so we can infer the right polymorphic type when we use our runST-like effect dischargers like

```
withAccum : (h:Type ?-> Ref h w -> {Accum h|eff} a) -> {|eff} (a & w)
```
Thanks to Adam Paszke and Dimitrios Vytiniotis for helping figure out how to make this work.
Implicit arguments and holes
We now allow functions with Idris-style implicit arguments. These replace and generalize forall/type-abstraction, where the arguments can be ordinary terms, not just types. We write the type of such a function as `a ?-> b`, with `?->` a variant on the arrow type to go along with our other three (and counting):
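```
a -> b    -- plain function
a --o b   -- linear function
a => b    -- tabulated function
a ?-> b   -- implicit function
```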
Also following Idris, free lowercase variables in top-level type annotations are lifted as implicit arguments of type `Type` by the parser, so we can still write `id : a -> a`, which is syntactic sugar for `id : (a:Type) ?-> a -> a`.

Arguments to implicit functions are filled in during type inference in exactly the same way we used to fill in implicit type arguments. We can also use the same mechanism to fill in arguments for explicit functions, by providing a hole `_` instead of an ordinary argument. At some point we'll add the converse mechanism, for explicitly providing arguments to implicit functions.

Lexical syntax changes
Since we've lost the syntactic distinction between terms and types, we can't "pun" type constructors and term constructors, because we can't (easily) distinguish based on the context. Previously, we used `(,)` both for the pair constructor, like `(1, 2.3)`, and for the type constructor of the type of pairs, `(Int, Real)`. Now we use `&` (evoking Curry-Howard) for the type constructor, `(Int & Real)`. Similarly, we used literal integers, like `3`, for literal index sets. Now we write `Fin 4` for the index set `{0, 1, 2, 3}`.

We also introduce Coq-style syntactic sugar for let-bound lambdas that lets us share a binder between the type and the term:
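```
def id (a:Type) (x:a) : a = x
```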
It's still valid to use the Haskell-style "unzipped" form:
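```
id : (a:Type) -> a -> a = \a x. x
```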
I can't decide which one I prefer.
Region types
References now carry an ST-style "region" parameter. References with the same region parameter may alias each other, which is important for efficient AD (we want aliased indexed reads to become aliased indexed accumulations).
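For example, indexing into a table reference yields a sub-reference in the same region `h`, which may therefore alias its parent:

```
(!) : Ref h (n=>a) -> n -> Ref h a
```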
Small changes

We also now treat parenthesized operators like `(+)` as names, allowing them to be defined like ordinary variables and used in prefix position, like `(+) 1.0 2.0`.
.Still to do: typeclasses, linearity, AD of effects
We're hoping to handle typeclasses like Idris does, as a special sort of implicit argument representing a dictionary that can be synthesized by the compiler from available definitions. Linearity and AD shouldn't cause any special problems, but I think we should merge with main even though they're currently not ready.