Fix incongruous generics after type-checking. #922

KtorZ · 2024-05-05T12:31:44Z

📍 Panic when encountering unknown generics.
This should not happen; if it does, it's an error from the type-checker. So instead of silently swallowing the error and adopting a behavior which is only sometimes right, it is better to fail loudly and investigate.
📍 Re-use generic id across builtin type-definitions.
This was somehow wrong and corrected by codegen later on, but we should be re-using the same generic id across an entire definition if the variable refers to the same element.
📍 Change pretty-printing of unbound variable to '?'
Until now, we would pretty-print unbound variable the same way we would pretty-print generics. This turned out to be very confusing when debugging, as they have a quite different semantic and it helps to visualize unbound types in definitions.
📍 Infer callee first in function call
The current inferrence system walks expressions from "top to bottom".
Starting from definitions higher in the source file, and down. When a
call is encountered, we use the information known for the callee
definition we have at the moment it is inferred.

This causes interesting issues in the case where the callee doesn't
have annotations and in only partially known. For example:
```
pub fn list(fuzzer: Option<a>) -> Option<List<a>> {
  inner(fuzzer, [])
}

fn inner(fuzzer, xs) -> Option<List<b>> {
  when fuzzer is {
    None -> Some(xs)
    Some(x) -> Some([x, ..xs])
  }
}
```
In this small program, we infer list first and run into inner.
Yet, the arguments for inner are not annotated, so since we haven't
inferred inner yet, we will create two unbound variables.

And naturally, we will link the type of [] to being of the same type
as xs -- which is still unbound at this point. The return type of
inner is given by the annotation, so all-in-all, the unification
will work without ever having to commit to a type of [].

It is only later, when inner is inferred, that we will generalise
the unbound type of xs to a generic which the same as b in the
annotation. At this point, [] is also typed with this same generic,
which has a different id than a in list since it comes from
another type definition.

This is unfortunate and will cause issues down the line for the code
generation. The problem doesn't occur when inner's arguments are
properly annotated or, when inner is actually inferred first.

Hence, I saw two possible avenues for fixing this problem:
1. Detect the presence of 'uncongruous generics' in definitions after
  they've all been inferred, and raise a user error asking for more
  annotations.
2. Infer definitions in dependency order, with definitions used in
  other inferred first.
This commit does (2) (although it may still be a good idea to do (1)
eventually) since it offers a much better user experience. One way to
do (2) is to construct a dependency graph between function calls, and
ensure perform a topological sort.

Building such graph is, however, quite tricky as it requires walking
through the AST while maintaining scope etc. which is more-or-less
already what the inferrence step is doing; so it feels like double
work.

Thus instead, this commit tries to do a deep-first inferrence and
"pause" inferrence of definitions when encountering a call to fully
infer the callee first. To achieve this properly, we must ensure that
we do not infer the same definition again, so we "remember" already
inferred definitions in the environment now.
📍 Add now-necessary type-annotation to 077
📍 Add new acceptance test illustrating need for fn call ordering

Closes #906

KtorZ · 2024-05-05T12:34:01Z

crates/aiken-lang/src/tipo/expr.rs

+    if let Some(typed_fun) = environment.inferred_functions.get(&fun.name) {
+        return Ok(typed_fun.clone());
+    };


Some memoization happening here, now-necessary since we might have already processed that definition. This is because:

There might be multiple function calls to that function

Simply because it's processed as part of the normal "infer_definition" loop, but has already been seen in a call from a previous definition.

KtorZ · 2024-05-05T12:35:13Z

crates/aiken-lang/src/tipo/expr.rs

+        let hydrator = hydrators
+            .remove(name)
+            .unwrap_or_else(|| panic!("Could not find hydrator for fn {name}"));
+
+        let mut expr_typer = ExprTyper::new(environment, hydrators, lines, tracing);
+
+        expr_typer.hydrator = hydrator;


Nothing is really changing in this function other than this bit. We now do pass the hydrators to the expression typer, to allow calling infer_function from within inferring an expression (e.g. a function body).

KtorZ · 2024-05-05T12:35:44Z

crates/aiken-lang/src/tipo/expr.rs

+    environment
+        .inferred_functions
+        .insert(name.to_string(), inferred_fn.clone());


And the last actual change, we store the inferred function once we have it.

KtorZ · 2024-05-05T12:36:26Z

crates/aiken-lang/src/tipo/infer.rs

@@ -83,7 +88,7 @@ impl UntypedModule {
        for def in consts.into_iter().chain(not_consts) {
            let definition = infer_definition(
                def,
-                &name,
+                &module_name,


Just renamed this because it was getting confusing down the line. name is a bit too ambiguous.

KtorZ · 2024-05-05T12:37:48Z

crates/aiken-lang/src/tipo/expr.rs

+            ungeneralised_function_used: false,
+            lines,
+        }
+    }


Just moved this one up and added hydrators.

This should not happen; if it does, it's an error from the type-checker. So instead of silently swallowing the error and adopting a behavior which is only _sometimes_ right, it is better to fail loudly and investigate.

This was somehow wrong and corrected by codegen later on, but we should be re-using the same generic id across an entire definition if the variable refers to the same element.

Until now, we would pretty-print unbound variable the same way we would pretty-print generics. This turned out to be very confusing when debugging, as they have a quite different semantic and it helps to visualize unbound types in definitions.

The current inferrence system walks expressions from "top to bottom". Starting from definitions higher in the source file, and down. When a call is encountered, we use the information known for the callee definition we have at the moment it is inferred. This causes interesting issues in the case where the callee doesn't have annotations and in only partially known. For example: ``` pub fn list(fuzzer: Option<a>) -> Option<List<a>> { inner(fuzzer, []) } fn inner(fuzzer, xs) -> Option<List<b>> { when fuzzer is { None -> Some(xs) Some(x) -> Some([x, ..xs]) } } ``` In this small program, we infer `list` first and run into `inner`. Yet, the arguments for `inner` are not annotated, so since we haven't inferred `inner` yet, we will create two unbound variables. And naturally, we will link the type of `[]` to being of the same type as `xs` -- which is still unbound at this point. The return type of `inner` is given by the annotation, so all-in-all, the unification will work without ever having to commit to a type of `[]`. It is only later, when `inner` is inferred, that we will generalise the unbound type of `xs` to a generic which the same as `b` in the annotation. At this point, `[]` is also typed with this same generic, which has a different id than `a` in `list` since it comes from another type definition. This is unfortunate and will cause issues down the line for the code generation. The problem doesn't occur when `inner`'s arguments are properly annotated or, when `inner` is actually inferred first. Hence, I saw two possible avenues for fixing this problem: 1. Detect the presence of 'uncongruous generics' in definitions after they've all been inferred, and raise a user error asking for more annotations. 2. Infer definitions in dependency order, with definitions used in other inferred first. This commit does (2) (although it may still be a good idea to do (1) eventually) since it offers a much better user experience. One way to do (2) is to construct a dependency graph between function calls, and ensure perform a topological sort. Building such graph is, however, quite tricky as it requires walking through the AST while maintaining scope etc. which is more-or-less already what the inferrence step is doing; so it feels like double work. Thus instead, this commit tries to do a deep-first inferrence and "pause" inferrence of definitions when encountering a call to fully infer the callee first. To achieve this properly, we must ensure that we do not infer the same definition again, so we "remember" already inferred definitions in the environment now.

…a function

KtorZ self-assigned this May 5, 2024

KtorZ commented May 5, 2024

View reviewed changes

crates/aiken-lang/src/tipo/expr.rs

ungeneralised_function_used: false,

lines,

}

}

Copy link

Member Author

KtorZ May 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just moved this one up and added hydrators.

KtorZ changed the title ~~Fix uncongruous generics after type-checking.~~ Fix incongruous generics after type-checking. May 5, 2024

KtorZ force-pushed the fix/uncongruous-generics branch 3 times, most recently from 10b94e9 to c5227a2 Compare May 5, 2024 17:23

KtorZ marked this pull request as ready for review May 5, 2024 17:24

KtorZ requested a review from a team as a code owner May 5, 2024 17:24

MicroProofs force-pushed the fix/uncongruous-generics branch 2 times, most recently from 7e65df5 to 17df3a8 Compare May 6, 2024 15:48

KtorZ and others added 9 commits May 6, 2024 11:58

Panic when encountering unknown generics.

2ba640d

This should not happen; if it does, it's an error from the type-checker. So instead of silently swallowing the error and adopting a behavior which is only _sometimes_ right, it is better to fail loudly and investigate.

Re-use generic id across builtin type-definitions.

afa4cb1

This was somehow wrong and corrected by codegen later on, but we should be re-using the same generic id across an entire definition if the variable refers to the same element.

Add now-necessary type-annotations in tests

4f2030d

Add new acceptance test illustrating need for fn call ordering

0fb99d7

Add new acceptance test 101 capturing type mismatch at runtime.

8c8448b

don't use generic data type lookup when you have a Record Variant as …

465efe8

…a function

activate test 75

9e5e3a3

MicroProofs force-pushed the fix/uncongruous-generics branch from 17df3a8 to 9e5e3a3 Compare May 6, 2024 15:58

update changelog

489d4bd

MicroProofs merged commit 30436a8 into main May 6, 2024
2 checks passed

MicroProofs deleted the fix/uncongruous-generics branch May 6, 2024 19:17

KtorZ mentioned this pull request May 14, 2024

typecheck permitting variables not in (hygenic) scope #941

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix incongruous generics after type-checking. #922

Fix incongruous generics after type-checking. #922

KtorZ commented May 5, 2024 •

edited by MicroProofs

Loading

KtorZ May 5, 2024

KtorZ May 5, 2024

KtorZ May 5, 2024

KtorZ May 5, 2024

KtorZ May 5, 2024

Fix incongruous generics after type-checking. #922

Fix incongruous generics after type-checking. #922

Conversation

KtorZ commented May 5, 2024 • edited by MicroProofs Loading

KtorZ May 5, 2024

Choose a reason for hiding this comment

KtorZ May 5, 2024

Choose a reason for hiding this comment

KtorZ May 5, 2024

Choose a reason for hiding this comment

KtorZ May 5, 2024

Choose a reason for hiding this comment

KtorZ May 5, 2024

Choose a reason for hiding this comment

KtorZ commented May 5, 2024 •

edited by MicroProofs

Loading