Fix incongruous generics after type-checking. #922
if let Some(typed_fun) = environment.inferred_functions.get(&fun.name) {
    return Ok(typed_fun.clone());
};
Some memoization is happening here, which is now necessary since we might have already processed that definition. This can happen because:
- There might be multiple calls to that function.
- The function is processed as part of the normal "infer_definition" loop, but has already been seen in a call from a previous definition.
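The memoization described above can be sketched in isolation. This is a minimal illustration, not the compiler's actual code: `TypedFunction`, the pared-down `Environment`, the `inference_runs` counter, and the simplified `infer_function` signature are all assumptions made for the example. Only the `inferred_functions` lookup-then-insert pattern mirrors the diff.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a fully inferred function.
#[derive(Clone, Debug, PartialEq)]
struct TypedFunction {
    name: String,
    signature: String,
}

/// Hypothetical, pared-down environment: only the memo table matters here.
#[derive(Default)]
struct Environment {
    inferred_functions: HashMap<String, TypedFunction>,
    /// Counts how often the expensive path runs (illustration only).
    inference_runs: usize,
}

/// Simplified signature (an assumption): the real function takes more context.
fn infer_function(environment: &mut Environment, name: &str) -> TypedFunction {
    // Memoized path: the definition was already inferred, either through an
    // earlier call site or through the top-level definition loop.
    if let Some(typed_fun) = environment.inferred_functions.get(name) {
        return typed_fun.clone();
    }

    // Placeholder for the real work of type-checking the function body.
    environment.inference_runs += 1;
    let inferred_fn = TypedFunction {
        name: name.to_string(),
        signature: format!("fn {name}(..)"),
    };

    // Remember the result so that later calls take the memoized path.
    environment
        .inferred_functions
        .insert(name.to_string(), inferred_fn.clone());

    inferred_fn
}
```

Calling `infer_function(&mut environment, "inner")` twice on the same environment returns equal values and leaves `inference_runs` at 1, which is the property the early return in the diff relies on.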
let hydrator = hydrators
    .remove(name)
    .unwrap_or_else(|| panic!("Could not find hydrator for fn {name}"));

let mut expr_typer = ExprTyper::new(environment, hydrators, lines, tracing);

expr_typer.hydrator = hydrator;
Nothing really changes in this function other than this bit: we now pass the hydrators to the expression typer, so that `infer_function` can be called while inferring an expression (e.g. a function body).
environment
    .inferred_functions
    .insert(name.to_string(), inferred_fn.clone());
And the last actual change: we store the inferred function once we have it.
@@ -83,7 +88,7 @@ impl UntypedModule {
         for def in consts.into_iter().chain(not_consts) {
             let definition = infer_definition(
                 def,
-                &name,
+                &module_name,
Just renamed this because it was getting confusing down the line; `name` is a bit too ambiguous.
        ungeneralised_function_used: false,
        lines,
    }
}
Just moved this one up and added `hydrators`.
10b94e9 to c5227a2
7e65df5 to 17df3a8
17df3a8 to 9e5e3a3
📍 Panic when encountering unknown generics.
This should not happen; if it does, it's an error from the type-checker. So instead of silently swallowing the error and adopting a behavior which is only _sometimes_ right, it is better to fail loudly and investigate.
📍 Re-use generic id across builtin type-definitions.
This was somehow wrong and corrected by codegen later on, but we should be re-using the same generic id across an entire definition if the variable refers to the same element.
📍 Change pretty-printing of unbound variable to '?'
Until now, we would pretty-print unbound variables the same way we would pretty-print generics. This turned out to be very confusing when debugging, as the two have quite different semantics, and it helps to visualize unbound types in definitions.
📍 Infer callee first in function call
The current inference system walks expressions from "top to bottom", starting from definitions higher in the source file and working down. When a call is encountered, we use whatever information is known about the callee definition at the moment the call is inferred.

This causes interesting issues in the case where the callee doesn't have annotations and is only partially known. For example:

```
pub fn list(fuzzer: Option<a>) -> Option<List<a>> {
  inner(fuzzer, [])
}

fn inner(fuzzer, xs) -> Option<List<b>> {
  when fuzzer is {
    None -> Some(xs)
    Some(x) -> Some([x, ..xs])
  }
}
```

In this small program, we infer `list` first and run into `inner`. Yet, the arguments for `inner` are not annotated, so since we haven't inferred `inner` yet, we will create two unbound variables. And naturally, we will link the type of `[]` to being of the same type as `xs` -- which is still unbound at this point. The return type of `inner` is given by the annotation, so all-in-all, the unification will work without ever having to commit to a type for `[]`.

It is only later, when `inner` is inferred, that we will generalise the unbound type of `xs` to a generic which is the same as `b` in the annotation. At this point, `[]` is also typed with this same generic, which has a different id than `a` in `list` since it comes from another type definition.

This is unfortunate and will cause issues down the line for the code generation. The problem doesn't occur when `inner`'s arguments are properly annotated, or when `inner` is actually inferred first. Hence, I saw two possible avenues for fixing this problem:

1. Detect the presence of 'incongruous generics' in definitions after they've all been inferred, and raise a user error asking for more annotations.
2. Infer definitions in dependency order, with definitions used by others inferred first.

This commit does (2) (although it may still be a good idea to do (1) eventually) since it offers a much better user experience. One way to do (2) is to construct a dependency graph between function calls, and then perform a topological sort.

Building such a graph is, however, quite tricky, as it requires walking through the AST while maintaining scope etc., which is more-or-less what the inference step is already doing; so it feels like double work.

Thus instead, this commit does a depth-first inference and "pauses" the inference of a definition when encountering a call, in order to fully infer the callee first. To achieve this properly, we must ensure that we do not infer the same definition again, so we now "remember" already inferred definitions in the environment.
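The ordering effect of this "infer the callee first" strategy can be sketched on a toy model. The sketch below is an illustration under stated assumptions, not the compiler's implementation: the `Calls` map, `infer_depth_first`, and `infer_module` are hypothetical names, and the call graph is given up front purely for brevity, whereas the commit discovers callees while walking function bodies. The `visited` set stands in for the remembered definitions; marking a definition on entry (rather than on completion) is a simplification that also keeps the sketch from looping on recursive calls.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical call graph: definition name -> names of the functions it calls.
/// In the compiler, callees are found while inferring the body instead.
type Calls = HashMap<&'static str, Vec<&'static str>>;

/// Infers `name`, pausing at each call to infer the callee first.
/// `order` records the sequence in which definitions finish inference.
fn infer_depth_first(
    name: &'static str,
    calls: &Calls,
    visited: &mut HashSet<&'static str>,
    order: &mut Vec<&'static str>,
) {
    // Already inferred (or currently being inferred): do not start over.
    if !visited.insert(name) {
        return;
    }

    // "Pause" this definition and fully infer every callee first.
    if let Some(callees) = calls.get(name) {
        for &callee in callees {
            infer_depth_first(callee, calls, visited, order);
        }
    }

    // All callees are known at this point; this definition can be completed.
    order.push(name);
}

/// Walks definitions in source order, as the top-level loop would.
fn infer_module(definitions: &[&'static str], calls: &Calls) -> Vec<&'static str> {
    let mut visited = HashSet::new();
    let mut order = Vec::new();
    for &name in definitions {
        infer_depth_first(name, calls, &mut visited, &mut order);
    }
    order
}
```

With `list` calling `inner` and the definitions given in source order `["list", "inner"]`, `infer_module` completes `inner` before `list`, and the second visit to `inner` from the top-level loop is a no-op. This is the same order a topological sort would produce, obtained without a separate graph-building pass.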
📍 Add now-necessary type-annotation to 077
📍 Add new acceptance test illustrating need for fn call ordering
Closes #906