Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List patterns: subsumption checking #53891

Merged
merged 72 commits into from
Sep 16, 2021

Conversation

alrz
Copy link
Member

@alrz alrz commented Jun 4, 2021

Test plan: #51289

@alrz alrz requested a review from a team as a code owner June 4, 2021 21:48
@alrz alrz force-pushed the list-patterns-04 branch from c953af7 to 7e24c46 Compare June 4, 2021 22:07
@jcouv jcouv self-assigned this Jun 5, 2021
@jcouv jcouv added this to the C# 10 milestone Jun 5, 2021
@alrz alrz marked this pull request as draft June 8, 2021 07:13
@alrz
Copy link
Member Author

alrz commented Jun 8, 2021

Based on the last LDM, we want to cover nested list patterns in slices - ignoring Slice actual return value (assuming it's correct) in which case we can just flatten the list and proceed as usual, meaning that {1, ..{2, ..p}, 3} will yield the same dag as {1, 2, ..p, 3} and therefore will produce the same codegen and subsumption results.

@CyrusNajmabadi
Copy link
Member

meaning that {1, ..{2, ..p}, 3} will yield the same dag as {1, 2, ..p, 3} and therefore will produce the same codegen and subsumption results.

@333fred @jcouv for clarification here. My understanding was that we would think of the above two as the same for purposes of subsumption checking. however, i don't think we stated or made a conclusion on if we'd expect the same codegen. For example, i could imagine different codegen for {1, ..{2, ..p}, 3} where we are actually operating on the result of the slice.

That said, perhaps havin ghte same codegen is the logic outcome of us effectively saying that we presume that if you implement hte list shape that you are 'well behaved' and we can make these sorts of assumptions. Just wanted to ensure that this was what both of you intended.

Thanks!

@alrz
Copy link
Member Author

alrz commented Jun 8, 2021

My understanding was that we would think of the above two as the same for purposes of subsumption checking.

I guess that was indeed the intent.

But when we're going to assume that the subslice can subsume outer subpatterns, that means we expect values to be the same in which case it would be correct to flatten the list (otherwise the subsumption error is incorrect in the first place).

We're treating properties in this manner: again, we assume the implementation is idempotent, that applies to both subsumption checking as well as codegen. The two are actually very much in sync as to what counts as subsumption and what the codegen looks like.

@CyrusNajmabadi
Copy link
Member

@alrz Yup. That all makes 100% sense to me. I just want to triple check with @333fred and @jcouv that we're all ok with that being hte natural fallout here. I didn't consider it at the time, but it does seem reasonable to me. I just want ot make sure we're all in agreement here. Thanks!

@jcouv
Copy link
Member

jcouv commented Jun 8, 2021

@alrz I agree that the "well-behaved" assumption would allow us to do some codegen optimizations.
That would be consistent although we didn't explicitly discuss that in LDM. We'd probably want to bring it up if we want to explore seriously.
Such codegen optimizations seem secondary to addressing the exhaustiveness/subsumption analysis (ie. they would be allowed but not necessary).

What scenarios might we want to optimize?

  • scenarios where the slice is captured (for positional or Type or designation pattern) cannot be optimized, because we don't generally know how to cook up a slice even if we've already fetched all the items.
  • scenarios where the slice isn't captured (like {1, ..{2, ..p}, 3}) aren't worth optimizing in my opinion. I would keep the "natural" codegen implied by the code (ie. calling Slice) and leave flattening the list patterns up to the user.

In reply to: 856929120


In reply to: 856929120

@alrz

This comment has been minimized.

@alrz alrz force-pushed the list-patterns-04 branch 3 times, most recently from 4704ce9 to 6eb6db8 Compare July 2, 2021 16:58
@jaredpar jaredpar modified the milestones: C# 10, 17.0 Jul 13, 2021
@alrz alrz force-pushed the list-patterns-04 branch from 9f06161 to eb4fa3a Compare July 14, 2021 13:18
@alrz alrz force-pushed the list-patterns-04 branch from eb4fa3a to 8ddbf28 Compare July 21, 2021 12:01
@alrz alrz marked this pull request as ready for review July 21, 2021 12:27
@alrz alrz requested review from jcouv and a team and removed request for a team July 21, 2021 12:28
@agocke agocke self-assigned this Sep 9, 2021
@alrz
Copy link
Member Author

alrz commented Sep 9, 2021

Nice. https://github1s.com/dotnet/roslyn/pull/53891 If it only could add review comments..

Copy link
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks (iteration 67) with two nits to double-check with Fred: confirm semantics for new parameters of CheckConsistentDecision and naming of the "assignment" bound node.

@jcouv
Copy link
Member

jcouv commented Sep 10, 2021

FWIW, I just noticed that we have some tests that directly verify the DAG. Just wanted to mention it as this could be useful for this feature (it's easier to verify that IL). For example, DecisionDag_Dump_SwitchExpression.

@agocke
Copy link
Member

agocke commented Sep 13, 2021

Sorry, I can't promise I'll have time to fully review this. It looks like there's a lot of additional code in the branch already that's pretty important to the behavior, but looking through the existing branch code will take a lot more time.

: !decisionPermitsTrueOther
? AndSequence.Create(Not.Create(AndSequence.Create(precondition, sideeffect)), other)
: AndSequence.Create(OrSequence.Create(Not.Create(precondition), sideeffect), other);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a design review with the dev team on this feature. The goal was mostly education (refresh the team based on what Fred and I now understand about DAG design and explain the changes from this PR) and also allow feedback.
@AlekseyTs raised an interesting idea which I'd like to bounce with you. This is not meant to block this PR, but in case it could spark your imagination with a potentially simpler design.

Instead of injecting preconditions and sideeffects, what if this Filter method had the following structure:
// step 1: if the length is known to have a single value (**), then rewrite the current test in light of that knowledge
// step 2: existing logic CheckConsistentDecision followed by simple ternaries (as it existed prior to this PR)

In step 1: if in the current state we know that Length==1 (by looking at remaining values), and the current test/evaluation is regarding an indexer access, then we replace the current test. For example t3 == 42 would be rewritten as t2 == 42. Basically, from this point on in the DAG we'd be dealing with normalized temps (no longer dealing with t3, and dealing with t2 instead).

We'd need to think more about how to deal with remaining values in that design. Conceptually a similar approach might work (if the length is known to have a specific value in the current state, then remaining values of certain temps can be merged).

(**) in writing this summary, I realized that identifying states where length gets narrowed to a single value isn't easy. If the selected test is Length == k then this is easy. But the selected test could be Length >= k which could narrow the set of remaining values to only k. So we'd have to compute remaining values first to see if Length was narrowed to a single value...

Copy link
Member Author

@alrz alrz Sep 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rewrite the current test in light of that knowledge

we'd have to compute remaining values first

Since we've been through many MANY flaws in the current design already (thanks for that btw!) I think we can easily poke holes in alternative approaches. IIRC we did consider both of these (which are interdependent) but I was convinced it's going to get too subtle. There's just many questions to answer before we even get close to a rough sketch. Here's some examples:

  • What happens if RemainingTests is not linear? A precomputed length value may not be always the same for all indexer evaluations e.g. when we have or combinators which will cause the dag to split. That would require a lot of plumbing to get right (remember those could come from an explicit Length test in a property pattern so there's infinite possibilities).

  • Rewriting tests would have the same problem, plus we may still need the original input for subsequent tests so we should keep track of both somewhere. When you say "from this point on in the dag" that means we need to adjust DagState which has its own way of handling we didn't need to touch in this PR.

  • At a given state, what temps need to be merged exactly? It's possible that the rewrite depends on multiple length values and we can't depend on a direct value test for the reasons mentioned above.

The current approach is formulated inside the decision dag itself, rather than a pre-processing step + maintaining a bunch of additional states to cover all corner cases. So I believe this has made as simple as possible, if we go any further there's a high chance that we could miss something. (edit: I'd be absolutely curious to see this implemented differently, it just took me three months to wrap my head around this one!)

Copy link
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks (iteration 71)

Copy link
Member

@333fred 333fred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (commit 71). We can look at possible simplifications as a followup.

@jcouv
Copy link
Member

jcouv commented Sep 16, 2021

Not sure why license/cla CI leg is showing "Expected — Waiting for status to be reported". It's preventing the merge.
@jaredpar Any idea?

@jaredpar
Copy link
Member

jaredpar commented Sep 16, 2021

Seems like a one off problem. It's working fine in other PRs. We can merge over that if it's the only problem.

@jcouv jcouv enabled auto-merge (squash) September 16, 2021 05:18
@jcouv jcouv merged commit ffe7d77 into dotnet:features/list-patterns Sep 16, 2021
@RikkiGibson
Copy link
Contributor

🎉🎉🎉

@jcouv
Copy link
Member

jcouv commented Sep 16, 2021

Thanks @alrz for pushing through. This was an especially interesting and challenging change. I think the result is pretty fantastic, as it achieves the maximum exhaustiveness/subsumption enforcement that we were hoping for 🎉
:shipit:

{
return this == obj ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return this == obj ||

Here and below it looks like this change removed the reference equality optimization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just moved to the base; Equals does its own and then calls into the overrides.
For direct IsEquivalent usages it's not important because it's rather unlikely to have identical nodes where we use it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just moved to the base; Equals does its own and then calls into the overrides.

Because of this change we no longer bypass the following checks when the same instance is compared.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

        public bool Equals(BoundDagEvaluation other)
        {
            return this == other ||
                this.IsEquivalentTo(other) &&
                this.Input.Equals(other.Input);
        }

I think this implementation does just that if I'm not mistaken.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this implementation does just that if I'm not mistaken.

I am not sure what implementation you are looking at, I am commenting on this implementation:

        public override bool IsEquivalentTo(BoundDagEvaluation obj)
        {
            return base.IsEquivalentTo(obj) &&
                // base.IsEquivalentTo checks the kind field, so the following cast is safe
                this.Index == ((BoundDagIndexEvaluation)obj).Index;
        }

I think, the checks after base.IsEquivalentTo(obj) are performed for the same instance. Whereas before the change we would bypass all checks.

Copy link
Member Author

@alrz alrz Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about BoundDagEvaluation.Equals which is an existing code path with reference check intact.

A I said earlier, for direct IsEquivalentTo usages (where we indeed don't have a reference check) most of the time we've already determined that the two are not the same instance, e.g. Equals just returned false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just revert the check if you feel it's going to make a visible difference. I have no objection on that.

Copy link
Contributor

@AlekseyTs AlekseyTs Oct 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just revert the check if you feel it's going to make a visible difference. I have no objection on that.

I prefer that we don't remove optimazation just because we assume that APIs are going to be called in certain order. That doesn't feel like the right approoach in the long term.

// Ignore input source as it will be matched in the subsequent iterations.
if (s1Input.IsEquivalentTo(s2Input) &&
// We don't want to pair two indices within the same pattern.
s1LengthTemp.Syntax != s2LengthTemp.Syntax)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s1LengthTemp.Syntax != s2LengthTemp.Syntax

An interpretation of bound nodes shouldn't depend on syntax they are associated with. I think we should find another way to perform this check.

Copy link
Contributor

@AlekseyTs AlekseyTs Oct 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Syntax in bound nodes is just a pointer to related location in source, it is not meant to convey any semantical information. In general, we shouldn't even make assumptions what syntax node a bound node points to. Binding should keep working the same way even when all bound nodes point to the same syntax node, from a dummy tree for example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants