Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incongruous generics after type-checking. #922

Merged
merged 10 commits into from
May 6, 2024

Conversation

KtorZ
Copy link
Member

@KtorZ KtorZ commented May 5, 2024

  • 📍 Panic when encountering unknown generics.
    This should not happen; if it does, it's an error from the type-checker. So instead of silently swallowing the error and adopting a behavior which is only sometimes right, it is better to fail loudly and investigate.

  • 📍 Re-use generic id across builtin type-definitions.
    This was somehow wrong and corrected by codegen later on, but we should be re-using the same generic id across an entire definition if the variable refers to the same element.

  • 📍 Change pretty-printing of unbound variable to '?'
    Until now, we would pretty-print unbound variable the same way we would pretty-print generics. This turned out to be very confusing when debugging, as they have a quite different semantic and it helps to visualize unbound types in definitions.

  • 📍 Infer callee first in function call
    The current inferrence system walks expressions from "top to bottom".
    Starting from definitions higher in the source file, and down. When a
    call is encountered, we use the information known for the callee
    definition we have at the moment it is inferred.

    This causes interesting issues in the case where the callee doesn't
    have annotations and in only partially known. For example:

    pub fn list(fuzzer: Option<a>) -> Option<List<a>> {
      inner(fuzzer, [])
    }
    
    fn inner(fuzzer, xs) -> Option<List<b>> {
      when fuzzer is {
        None -> Some(xs)
        Some(x) -> Some([x, ..xs])
      }
    }
    

    In this small program, we infer list first and run into inner.
    Yet, the arguments for inner are not annotated, so since we haven't
    inferred inner yet, we will create two unbound variables.

    And naturally, we will link the type of [] to being of the same type
    as xs -- which is still unbound at this point. The return type of
    inner is given by the annotation, so all-in-all, the unification
    will work without ever having to commit to a type of [].

    It is only later, when inner is inferred, that we will generalise
    the unbound type of xs to a generic which the same as b in the
    annotation. At this point, [] is also typed with this same generic,
    which has a different id than a in list since it comes from
    another type definition.

    This is unfortunate and will cause issues down the line for the code
    generation. The problem doesn't occur when inner's arguments are
    properly annotated or, when inner is actually inferred first.

    Hence, I saw two possible avenues for fixing this problem:

    1. Detect the presence of 'uncongruous generics' in definitions after
      they've all been inferred, and raise a user error asking for more
      annotations.

    2. Infer definitions in dependency order, with definitions used in
      other inferred first.

    This commit does (2) (although it may still be a good idea to do (1)
    eventually) since it offers a much better user experience. One way to
    do (2) is to construct a dependency graph between function calls, and
    ensure perform a topological sort.

    Building such graph is, however, quite tricky as it requires walking
    through the AST while maintaining scope etc. which is more-or-less
    already what the inferrence step is doing; so it feels like double
    work.

    Thus instead, this commit tries to do a deep-first inferrence and
    "pause" inferrence of definitions when encountering a call to fully
    infer the callee first. To achieve this properly, we must ensure that
    we do not infer the same definition again, so we "remember" already
    inferred definitions in the environment now.

  • 📍 Add now-necessary type-annotation to 077

  • 📍 Add new acceptance test illustrating need for fn call ordering

Closes #906

@KtorZ KtorZ self-assigned this May 5, 2024
Comment on lines +40 to +42
if let Some(typed_fun) = environment.inferred_functions.get(&fun.name) {
return Ok(typed_fun.clone());
};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some memoization happening here, now-necessary since we might have already processed that definition. This is because:

  1. There might be multiple function calls to that function
  2. Simply because it's processed as part of the normal "infer_definition" loop, but has already been seen in a call from a previous definition.

Comment on lines +77 to +83
let hydrator = hydrators
.remove(name)
.unwrap_or_else(|| panic!("Could not find hydrator for fn {name}"));

let mut expr_typer = ExprTyper::new(environment, hydrators, lines, tracing);

expr_typer.hydrator = hydrator;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing is really changing in this function other than this bit. We now do pass the hydrators to the expression typer, to allow calling infer_function from within inferring an expression (e.g. a function body).

Comment on lines +137 to +139
environment
.inferred_functions
.insert(name.to_string(), inferred_fn.clone());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the last actual change, we store the inferred function once we have it.

@@ -83,7 +88,7 @@ impl UntypedModule {
for def in consts.into_iter().chain(not_consts) {
let definition = infer_definition(
def,
&name,
&module_name,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just renamed this because it was getting confusing down the line. name is a bit too ambiguous.

ungeneralised_function_used: false,
lines,
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just moved this one up and added hydrators.

@KtorZ KtorZ changed the title Fix uncongruous generics after type-checking. Fix incongruous generics after type-checking. May 5, 2024
@KtorZ KtorZ force-pushed the fix/uncongruous-generics branch 3 times, most recently from 10b94e9 to c5227a2 Compare May 5, 2024 17:23
@KtorZ KtorZ marked this pull request as ready for review May 5, 2024 17:24
@KtorZ KtorZ requested a review from a team as a code owner May 5, 2024 17:24
@MicroProofs MicroProofs force-pushed the fix/uncongruous-generics branch 2 times, most recently from 7e65df5 to 17df3a8 Compare May 6, 2024 15:48
KtorZ and others added 9 commits May 6, 2024 11:58
  This should not happen; if it does, it's an error from the type-checker. So instead of silently swallowing the error and adopting a behavior which is only _sometimes_ right, it is better to fail loudly and investigate.
  This was somehow wrong and corrected by codegen later on, but we should be re-using the same generic id across an entire definition if the variable refers to the same element.
  Until now, we would pretty-print unbound variable the same way we would pretty-print generics. This turned out to be very confusing when debugging, as they have a quite different semantic and it helps to visualize unbound types in definitions.
  The current inferrence system walks expressions from "top to bottom".
  Starting from definitions higher in the source file, and down. When a
  call is encountered, we use the information known for the callee
  definition we have at the moment it is inferred.

  This causes interesting issues in the case where the callee doesn't
  have annotations and in only partially known. For example:

  ```
  pub fn list(fuzzer: Option<a>) -> Option<List<a>> {
    inner(fuzzer, [])
  }

  fn inner(fuzzer, xs) -> Option<List<b>> {
    when fuzzer is {
      None -> Some(xs)
      Some(x) -> Some([x, ..xs])
    }
  }
  ```

  In this small program, we infer `list` first and run into `inner`.
  Yet, the arguments for `inner` are not annotated, so since we haven't
  inferred `inner` yet, we will create two unbound variables.

  And naturally, we will link the type of `[]` to being of the same type
  as `xs` -- which is still unbound at this point. The return type of
  `inner` is given by the annotation, so all-in-all, the unification
  will work without ever having to commit to a type of `[]`.

  It is only later, when `inner` is inferred, that we will generalise
  the unbound type of `xs` to a generic which the same as `b` in the
  annotation. At this point, `[]` is also typed with this same generic,
  which has a different id than `a` in `list` since it comes from
  another type definition.

  This is unfortunate and will cause issues down the line for the code
  generation. The problem doesn't occur when `inner`'s arguments are
  properly annotated or, when `inner` is actually inferred first.

  Hence, I saw two possible avenues for fixing this problem:

  1. Detect the presence of 'uncongruous generics' in definitions after
     they've all been inferred, and raise a user error asking for more
     annotations.

  2. Infer definitions in dependency order, with definitions used in
     other inferred first.

  This commit does (2) (although it may still be a good idea to do (1)
  eventually) since it offers a much better user experience. One way to
  do (2) is to construct a dependency graph between function calls, and
  ensure perform a topological sort.

  Building such graph is, however, quite tricky as it requires walking
  through the AST while maintaining scope etc. which is more-or-less
  already what the inferrence step is doing; so it feels like double
  work.

  Thus instead, this commit tries to do a deep-first inferrence and
  "pause" inferrence of definitions when encountering a call to fully
  infer the callee first. To achieve this properly, we must ensure that
  we do not infer the same definition again, so we "remember" already
  inferred definitions in the environment now.
@MicroProofs MicroProofs merged commit 30436a8 into main May 6, 2024
2 checks passed
@MicroProofs MicroProofs deleted the fix/uncongruous-generics branch May 6, 2024 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🚀 Released
Development

Successfully merging this pull request may close these issues.

aiken check coersion error
2 participants