Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental constraint solver #450

Merged
merged 85 commits into from
Dec 31, 2022
Merged

Conversation

erszcz
Copy link
Collaborator

@erszcz erszcz commented Sep 1, 2022

This rebases #284 by @josefs onto the current master branch. It also adds some polymorphic function tests borrowed from Bidirectional typing for Erlang to show the solver works - at least in some cases.

Constraint solving is enabled by using a Gradualizer option solve_constraints. Without the option the behaviour is unchanged. I think this approach will allow to merge this PR early, yet make the functionality opt-in while it's being polished.

I'm making this PR a draft for now, since it shows some suspicious reports when running a self-check:

ebin/gradualizer_db.beam: The type 0..1114111 is not a subtype of {_TyVar-576460752303417119, _TyVar-576460752303417087}
ebin/gradualizer_db.beam: The type [char()] is not a subtype of {_TyVar-576460752303415487, _TyVar-576460752303415455}

In general, the number of warnings on a self-check has grown significantly with this change. However, that was to be expected.

@erszcz
Copy link
Collaborator Author

erszcz commented Sep 1, 2022

Actually, I don't think map_type_var should pass - as the list element type is union of a map and an atom, I think the map pattern match might fail. I'll move it around as appropriate tomorrow.

@erszcz
Copy link
Collaborator Author

erszcz commented Nov 24, 2022

I've rebased this on top of the current master and force pushed.

Solving constraints is currently enabled by a command-line flag or an option set via an attribute. It's enabled for the self-check, which bumps the number of errors above 100 lines - that's why CI fails. It's not a lot more - 116 lines - but still more than the self-imposed threshold of 100.

I'm also gathering examples that would be useful as tests (currently the poly1.erl file apart from the tests already committed under test), which allows me to understand the places where the type checker approach fails (apart from a lot of places currently just dropping constraints). One such example is:

%% false negative, i.e. there's an undetected bug here
-spec i([binary() | integer()]) -> [integer()].
i(L) ->
    map(fun
            (I) when is_integer(I) -> I * 2;
            (B) when is_list(B) -> list_to_integer(B)
        end, L).

B should be constrained to a binary, but this inconsistency is not caught. It's clear issues arise when union types, type refinement, and constraint solving overlap.

@erszcz erszcz force-pushed the constraint_solver branch 3 times, most recently from 9e03120 to 84490d4 Compare December 3, 2022 14:19
@erszcz
Copy link
Collaborator Author

erszcz commented Dec 3, 2022

I think this is ready for review now. My rationale for merging this in is that on a feature branch it's subject to bit rot, whereas on master it will be regularly tested for compatibility with the rest of the project thanks to CI.

Main changes from #284:

  • a number of tests, all test files enabling constraint solving start with poly_ prefix for clarity
  • locating a constraint error is more convenient:
    • constraint error messages contain type variable names
    • type variable names are deterministic and descriptive - they start with the original type var name, e.g. A_... for a (A) -> boolean() function
    • constraint error messages report the location of the top-level function for which a constraint error was identified as that's the most precise location that's available to the type checker
    • see https://github.com/josefs/Gradualizer/actions/runs/3608900282/jobs/6081768978 ("Run self check") for an example output
  • number of self-check error lines < 100, i.e. we're below the self-imposed CI limit
  • constraint solving is opt-in, it requires a command-line --solve_constraints flag or an in-file Gradualizer solve_constraints attribute to be enabled

Constraint solving will certainly require more work to be fully functional, especially in the light of #488, but I think it's better to have whatever already works merged to master.

@josefs @zuiderkwast WDYT?

@erszcz erszcz marked this pull request as ready for review December 3, 2022 14:33
@erszcz erszcz changed the title Rebase constraint_solver Experimental constraint solver Dec 3, 2022
@erszcz
Copy link
Collaborator Author

erszcz commented Dec 28, 2022

I've fixed a few more self-check errors. Do you have any comments on this, @josefs @zuiderkwast? Especially any comments vetoing a merge into master? I'd like to rebase this onto master and merge after #499.

Copy link
Collaborator

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests look good. It's nice to see that the solver does something useful.

I've only skimmed thought the implementation. I think it's a good idea to merge this with constraint solving off by default, as you suggested.

Is the solver implemented as if subtyping is transitive? This is what Josef mentioned, but I haven't really spotted this part of the implementation. It would be nice to add some example of what the solver can't do if it assumes typing is transitive, which it actually isn't when any() is involved. (X :: Y and Y :: Z doesn't imply X :: Z if Y is any()).

src/typechecker.erl Show resolved Hide resolved
Comment on lines +21 to +26
-spec poly_2(fun((A) -> A)) -> {integer(), boolean()}.
poly_2(F) -> {F(42), F(false)}.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one shouldn't fail. Should it?

Copy link
Collaborator Author

@erszcz erszcz Dec 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel completely comfortable with this topic yet, but I think it depends on the future goals: if we ever want to be able to infer the type of a function like poly_2 (for example, when it's used as an inline fun), then we're limited to rank-2 polymorphism for which inference is decidable.

According to Bidirectional Typing for Erlang:

Erlang allows higher-ranked polymorphism where polymorphic types occur anywhere in the type signature. The Hindley-Milner type inference only supports the inference of rank-1 (prenex) polymorphism. Type inference for the higher-ranked polymorphism is undecidable, which is a significant limitation for the Hindley-Milner type inference. For example, the poly_2 function shown next has rank-2 polymorphism that ETC fails to type-check:

-spec poly_2(fun((A) -> A)) -> {integer(), boolean()}.
poly_2(F) -> {F(42), F(false)}.

Similarly, ETC fails for poly_fail/3 due to its rank-2 polymorphism.

-spec poly_fail(fun((A) -> A), boolean(), integer()) -> {boolean(), integer()}.
poly_fail(F, B, I) -> {F(I), F(B)}.

In contrast, Dialyzer here finds the return type mismatch error.

As we can see from this test, Gradualizer is also able to detect this error.

I'd leave it as is until anyone of us is comfortable enough with constraint solving and polymorphism to confidently move it to should_pass tests.

erszcz and others added 17 commits December 30, 2022 15:25
This is my first stab at the constraint solver. It is only very
lightly tested and is not hooked in to the type checker yet.
When checking toplevel functions, call the constraint solver to work
out if the existential types are correct.

Unimplemented:

* Applying the substitution on the remaining types.
* Check that the remaining constraints are compatible with the
  constraints declared in the spec.

I fixed a few bugs in the process.
A variable might not have any constraints and that's fine!
Keeping the constraints when checking unions requires having a union
of constraints and the current implementation doesn't support that. So
right now we just drop the constraints.

This means that we currently don't report as many type errors as one
might expect.
The current representation for constraints cannot handle union types
so the constraints are dropped for now.

This means that some type errors will not be caught.
The function instantiate_fun_type now returns constraints which
have to be propagated.

This was caught by dialyzer
…d_should_fail.erl to test/should_fail/poly_union_lower_bound_fail.erl
…st/known_problems/should_pass/poly_lists_map_constraints_should_pass.erl
…ould_pass.erl -> test/should_pass/poly_lists_map_constraints_pass.erl
Based on:

    -spec get_unassigned_fields(Fields, All) -> [atom()] when
          Fields :: [record_field() | typed_record_field()],
          All :: [typed_record_field()].
    get_unassigned_fields(Fields, All) ->
        FieldNames = lists:flatmap(fun
                                       (?record_field(Field)) -> [Field];
                                       (?typed_record_field(Field)) -> [Field];
                                       %% We might run into values like this, which don't match above:
                                       %% {record_field, _, {var,{12,11},'_'}, _}
                                       (_) -> []
                                   end, Fields),
        AllNames = lists:map(fun (?typed_record_field(Field)) -> Field end, All),
        AllNames -- FieldNames.

Which currently raises:

    ebin/typechecker.beam: Lower bound record_field() | typed_record_field()
        of type variable A_typechecker_3467_838 on line 2026 is not a subtype
        of {A_typechecker_3467_838_typechecker_1226_848,
            A_typechecker_3467_838_typechecker_1226_849,
            A_typechecker_3467_838_typechecker_1226_850}
…o test/should_pass/poly_union_lower_bound_pass.erl
Based on:

    -spec combine_clauses_argument_constraints([type()], constraints:t(), map(), env()) -> constraints:t().
    combine_clauses_argument_constraints(ArgsTys, Cs, CatchAllArgs, Env) ->
        ArgsTyVars = lists:flatmap(fun
                                       ({var, _, Var}) -> [Var];
                                       (_) -> []
                                   end, ArgsTys),
        ...

The catch-all pattern does not register a constraint, while the tuple pattern does.
Because of that the type checker fails matching any type() with the funs clauses and returns a warning,
even though any type() instance will be handled by the fun.
@erszcz
Copy link
Collaborator Author

erszcz commented Dec 31, 2022

I've merged #499 to master and rebased this PR on top of that.

The self-check has uncovered a number of issues with functions with multiple clauses registering too narrow constraints on type variables. I came up with the idea that an argument type var should be constrained with a union of types accepted by all of such a fun's clauses. Let's consider flatmap/2 and the following example:

  -spec flatmap(Fun, List1) -> List2
                   when
                       Fun :: fun((A) -> [B]),
                       List1 :: [A],
                       List2 :: [B].
-type t1() :: {_, _, _, _}.
-type t2() :: {_, _, _}.

-spec f([t1() | t2()]) -> ok.
f(Fields) ->
    lists:flatmap(fun
                      ({_, _, _, _}) -> [t1];
                      ({_, _, _}) -> [t2]
                  end, Fields),
    ok.

When constraining A we must take all of the inline fun's clauses into account, i.e. constrain A :: {_, _, _, _} | {_, _, _}. The previous implementation used a constraint obtained from the first matching clause, which lead to Fields element t1() | t2() not type checking.

Another aspect of that can be seen in the following example:

  -spec foldl(Fun, Acc, List) -> Acc
                 when
                     Fun :: fun((T, Acc) -> Acc),
                     List :: [T].
-spec g([t1() | t2()]) -> ok.
g(Fields) ->
    lists:foldl(fun
                    ({_, _, _, _}, Acc) -> [t1 | Acc];
                    (_, Acc) -> [default | Acc]
                end, [], Fields),
    ok.

The (_, Acc) clause did not register any constraint on the first arg (_). Now, though, the catch-all clause is reflected in the union constraint as any(), i.e. the full constraint is T :: {_, _, _, _} | any().

I was afraid the idea of a union upper bound constraint might be a bit fragile, but so far it seems to work quite well 👍

@erszcz erszcz merged commit 22a8e19 into josefs:master Dec 31, 2022
@erszcz erszcz mentioned this pull request Dec 31, 2022
erszcz added a commit to erszcz/erlang_ls that referenced this pull request Jun 16, 2023
This version brings approx. 30 new PRs. The highlights are:

- Improve map exhaustiveness checking [erlang-ls#524](josefs/Gradualizer#524) by @xxdavid
- Fix all remaining self-check errors [erlang-ls#521](josefs/Gradualizer#521) by @erszcz
- Fix intersection-typed function calls with union-typed arguments [erlang-ls#514](josefs/Gradualizer#514) by @erszcz
- Experimental constraint solver [erlang-ls#450](josefs/Gradualizer#450) by @erszcz
erszcz added a commit to erszcz/erlang_ls that referenced this pull request Jun 16, 2023
This version brings approx. 30 PRs. The highlights are:

- Improve map exhaustiveness checking [erlang-ls#524](josefs/Gradualizer#524) by @xxdavid
- Fix all remaining self-check errors [erlang-ls#521](josefs/Gradualizer#521) by @erszcz
- Fix intersection-typed function calls with union-typed arguments [erlang-ls#514](josefs/Gradualizer#514) by @erszcz
- Experimental constraint solver [erlang-ls#450](josefs/Gradualizer#450) by @erszcz
erszcz added a commit to erszcz/erlang_ls that referenced this pull request Jun 16, 2023
This version brings approx. 30 PRs. The highlights are:

- Improve map exhaustiveness checking [erlang-ls#524](josefs/Gradualizer#524) by @xxdavid
- Fix all remaining self-check errors [erlang-ls#521](josefs/Gradualizer#521) by @erszcz
- Fix intersection-typed function calls with union-typed arguments [erlang-ls#514](josefs/Gradualizer#514) by @erszcz
- Experimental constraint solver [erlang-ls#450](josefs/Gradualizer#450) by @erszcz
robertoaloi pushed a commit to erlang-ls/erlang_ls that referenced this pull request Jun 17, 2023
This version brings approx. 30 PRs. The highlights are:

- Improve map exhaustiveness checking [#524](josefs/Gradualizer#524) by @xxdavid
- Fix all remaining self-check errors [#521](josefs/Gradualizer#521) by @erszcz
- Fix intersection-typed function calls with union-typed arguments [#514](josefs/Gradualizer#514) by @erszcz
- Experimental constraint solver [#450](josefs/Gradualizer#450) by @erszcz
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants