Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve map exhaustiveness checking #524

Merged
merged 7 commits into from
May 26, 2023

Conversation

xxdavid
Copy link
Collaborator

@xxdavid xxdavid commented Mar 29, 2023

As I've promised in #513, I've looked at exhaustiveness checking for maps, and spent the past several weeks trying to figure out how the algorithms should work. This is like my third iteration of the implementation, the previous ones that I wrote were buggy, so there may be bugs even in this one (but I hope not).

I've tried to test it thoroughly and everything seems to work (except for two edge cases described in the commits). I've also tried to make booleans refinable (as in #513) and everything passes as well.

@erszcz @zuiderkwast What do you think?

In the CheckAssoc lambda function inside add_type_pat
for maps, we already create all associations with
map_field_exact, so there is no need to rewrite it.
This is probably a relic of some refactoring.

I am removing the function rewrite_map_assocs_to_exacts
as this was the only place where it was used.
Fixes the following test in typechecker_tests.erl:
    ?_assertEqual(?t(none()), type_pat(?e(#{"abc" := 42}), ?t(#{string() => integer()})))

I am not completely sure about this change, but I
think returning a map type containing a "none() := ..."
association does not make sense, and that we want
to return just none() in such cases.
This handles cases when you don't match on all of the fields
of the map. For instance, if you have the type #{x := a, y := b}
and the pattern #{x := a}. This pattern matches the value #{x := a}
but also #{x := a, y := b}. The resulting pattern type thus has
to be #{x := a, y => b}.

It also handles cases when you match only on part of the key type.
For instance, if you have the type #{x|y := a} and the pattern
is #{x := a}, it matches both #{x := a} and #{x := a, y := a},
and thus the pattern should be #{x := a, y => a}.

Unfortunately, this commit fixes stuff but it also breaks other
stuff. If you have a pattern with duplicate keys
(like #{x := a, x := a}), Gradualizer will now report a key
error, although this pattern is completely okay (given that the
type is for example #{x := a}). However, this is a really rare
use of pattern matching, and can always be rewritten (by just
dedupliting the keys in the pattern). Still, it should be fixed
in the future.
Copy link
Collaborator

@erszcz erszcz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is terrific work, @xxdavid! 🚀
And sorry for taking this long with the review.

Copy link
Collaborator

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a full review. I've just skimmed through it. It seems to solve a lot of things. 👍

src/typechecker.erl Outdated Show resolved Hide resolved
Comment on lines +7 to +11
% It has surfaced because
% {'type', anno(), 'map_field_assoc' | 'map_field_exact', [abstract_type()]}
% is not a subtype of
% {'type', anno(), 'map_field_assoc', [abstract_type()]}
% | {'type', anno(), 'map_field_exact', [abstract_type()]}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we try to solve this by normalizeing them to the same structure?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe we could. But I'm not sure what should be criterion/algorithm for merging. We probably agree that {t, a} | {t, b} should be normalized into {t, a|b}. But what about {t, a, c} | {t, b, d}, should it be merged into {t, a|b, c|d}? Probably not because this way, we would lose information. So what would be the criterion? Whether all the elements of the tuples but one are the same?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's not trivial in the generic case.

Maybe tuples that differ in only one element can be a good candidate, but it may be hard to detect them. We need some algorithm to group them. Maybe something similar for unions of records and maps where only one element differs. Let's leave it for another PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. 🙂

Copy link
Collaborator

@erszcz erszcz Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need some algorithm to group them.

I think we have to do the opposite, i.e. expand them. I've stumbled upon this in practice when dogfooding.

When we have a function with a lot of clauses, a clause that matches on a structure with an inner union needs an assertion from the programmer or an inner case expr to ensure the entire union is handled. This happens even if a previous clause handles a specific case of the structure with one of the union elements.

It's painful because it breaks exhaustiveness checking and requires a seemingly unnecessary annotation.

I think that if we expanded to a union of structures this would be refined and eliminated properly. However, it would come at a cost of having do deal with way bigger unions at the top level.

I'm writing from my phone, I'll follow up with specific examples from the codebase later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, but if we expand them, there a risk they will explode. (Too many elements.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the promised example from dogfooding that I think is related to this problem:

compat_ty({type, _, tuple, any}, {type, _, tuple, _Args}, Seen, _Env) ->
ret(Seen);
compat_ty({type, _, tuple, _Args}, {type, _, tuple, any}, Seen, _Env) ->
ret(Seen);
compat_ty({type, _, tuple, Args1}, {type, _, tuple, Args2}, Seen, Env) ->
%% We can assert because we match out `any' in previous clauses.
%% TODO: it would be perfect if Gradualizer could refine this type automatically in such a case
compat_tys(?assert_type(Args1, [type()]),
?assert_type(Args2, [type()]), Seen, Env);

If we do:

 compat_ty({type, _, tuple, Args1}, {type, _, tuple, Args2}, Seen, Env) ->
     %% We can assert because we match out `any' in previous clauses.
     %% TODO: it would be perfect if Gradualizer could refine this type automatically in such a case
-    compat_tys(?assert_type(Args1, [type()]),
+    compat_tys(Args1,
                ?assert_type(Args2, [type()]), Seen, Env);

we get the following self-check error:

ebin/typechecker.beam: The variable on line 413 at column 16 is expected to have type [type()] but it has type any | [abstract_type()]

compat_ty({type, _, tuple, Args1},
          {type, _, tuple, Args2}, Seen, Env) ->
    compat_tys(Args1,
               ^^^^^

This happens even though, as the comment says, the any element of that union is explicitly matched two clauses above.

Copy link
Collaborator

@erszcz erszcz Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moreover, Lambda Days 2023 took place yesterday and today, so I had a chance to talk with Guillaume Duboc, who works on a type system for Elixir. He pointed me to this paper, which suggests using a disjunctive normal form, i.e. a union of intersections, to represent types in the subtyping algorithm. This seems to match my intuition about expanding unions for the sake of subtyping and calculating the type_diff. I'll try experimenting with the idea.

@xxdavid
Copy link
Collaborator Author

xxdavid commented May 22, 2023

Thank you for your reviews! 🙂

Discovered by the work in the previous commit.
I've also moved in a similar case from known_problems/should_pass/map_union.erl.
@xxdavid xxdavid force-pushed the fix_map_exhaustiveness branch 3 times, most recently from 26ac4d1 to 59d1786 Compare May 22, 2023 16:29
The current algorithm was too simplistic and didn't work in many cases.
Most of the newly added golden tests are failing (producing wrong results)
on master.

Together with the rewrite of add_pats_types for maps, it should work
correctly for most cases. The only known problem is that maps with
non-singleton required keys are not checked for exhaustiveness. But
this should not cause much trouble because such maps aren't often
used in practice.
@erszcz
Copy link
Collaborator

erszcz commented May 26, 2023

Ok, it seems all comments are addressed, so I think it's ready to merge - I'm clicking the button.

@erszcz erszcz merged commit 7433863 into josefs:master May 26, 2023
erszcz added a commit to erszcz/erlang_ls that referenced this pull request Jun 16, 2023
This version brings approx. 30 new PRs. The highlights are:

- Improve map exhaustiveness checking [erlang-ls#524](josefs/Gradualizer#524) by @xxdavid
- Fix all remaining self-check errors [erlang-ls#521](josefs/Gradualizer#521) by @erszcz
- Fix intersection-typed function calls with union-typed arguments [erlang-ls#514](josefs/Gradualizer#514) by @erszcz
- Experimental constraint solver [erlang-ls#450](josefs/Gradualizer#450) by @erszcz
erszcz added a commit to erszcz/erlang_ls that referenced this pull request Jun 16, 2023
This version brings approx. 30 PRs. The highlights are:

- Improve map exhaustiveness checking [erlang-ls#524](josefs/Gradualizer#524) by @xxdavid
- Fix all remaining self-check errors [erlang-ls#521](josefs/Gradualizer#521) by @erszcz
- Fix intersection-typed function calls with union-typed arguments [erlang-ls#514](josefs/Gradualizer#514) by @erszcz
- Experimental constraint solver [erlang-ls#450](josefs/Gradualizer#450) by @erszcz
erszcz added a commit to erszcz/erlang_ls that referenced this pull request Jun 16, 2023
This version brings approx. 30 PRs. The highlights are:

- Improve map exhaustiveness checking [erlang-ls#524](josefs/Gradualizer#524) by @xxdavid
- Fix all remaining self-check errors [erlang-ls#521](josefs/Gradualizer#521) by @erszcz
- Fix intersection-typed function calls with union-typed arguments [erlang-ls#514](josefs/Gradualizer#514) by @erszcz
- Experimental constraint solver [erlang-ls#450](josefs/Gradualizer#450) by @erszcz
robertoaloi pushed a commit to erlang-ls/erlang_ls that referenced this pull request Jun 17, 2023
This version brings approx. 30 PRs. The highlights are:

- Improve map exhaustiveness checking [#524](josefs/Gradualizer#524) by @xxdavid
- Fix all remaining self-check errors [#521](josefs/Gradualizer#521) by @erszcz
- Fix intersection-typed function calls with union-typed arguments [#514](josefs/Gradualizer#514) by @erszcz
- Experimental constraint solver [#450](josefs/Gradualizer#450) by @erszcz
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants