Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved pattern compilation #11993

Merged
merged 16 commits into from
Mar 15, 2022
Merged

Improved pattern compilation #11993

merged 16 commits into from
Mar 15, 2022

Conversation

dsyme
Copy link
Contributor

@dsyme dsyme commented Aug 18, 2021

This contains two related improvements to how we approach pattern-match compilation

Improved pattern match compilation for type tests and null tests

Using multiple "columns" of type tests and null tests was generating exponential amounts of code, e.g. see #12687

This is fixed by moving to a "column" of type tests and doing the work to properly determine when the success/failure of earlier elements of the column inform us about later patterns. For example:

let TestTwoColumnsOfTypeTestsWithSealedTypes(x: obj, y: obj) =
    match x, y with
    | :? string, :? string -> 1
    | :? int, :? int -> 2
    | :? bool, :? bool -> 3
    | :? float, :? float -> 4
    | :? char, :? char -> 5
    | _ -> 6

Here the success of :? string tells us that all the other (sealed) type tests will fail. Likewise consider this:

type A() = class end

type B() =
    inherit A()

let TestOneColumnOfTypeTestsWithUnSealedClassTypes_Redundant2(x: obj) =
    match x with
    | :? A -> 1
    | :? B -> 2 // expect - never matched 
    | _ -> 3

Here the failure of :? A tells us that :? B will fail. Similar information can be deduced from null tests and type tests on interfaces.

This removes a cause of exponential code generation, and for examples like #12687 the code goes down from, say 17MB of IL in one method to 3K - you can make this as dramatic as you like by adding extra clauses.

This also has the benefit that many new cases of "rule never matched" are detected.

There is also an improvement with generating more "fast" type tests that use the isinst instruction rather than a helper. Also the IL code sequence isinst; ldnull; cgt.un; brtrue/brfalse is improved to isinst; brtrue/brfalse

Improved pattern match compilation with when

While looking at #11981 I went back and reconsidered out approach for "problematic pattern matching clauses" that can lead to substantial code generation. For example, consider code like this:

type U = 
    | A1  of int 
    | A2 of int
    | A3 of int
    | A4 of int
    | A5 of int

let TestMatchWithWhen u  =
    match u with
    | A1 x when x < 4 -> 1000
    | A2 x when x < 4 -> 2000
    | A3 x when x < 4 -> 3000
    | A4 x when x < 4 -> 4000
    | A1 x -> x
    | _ -> 5000

Previously, we were doing these clauses "one by one", because the presence of the "when" pattern led us to consider the clause set "potentially-problematic", and so the complexity of pattern matching was reduced by doing it clause-by-clause. This means we fetched and tested the tag again and again, instead of doing it in a switch. This PR uses a new way of detecting potentially-problematic clauses.

After this PR, the clauses above are considered "all together" and a switch is now emitted. This is because the first thing tested in pattern matching is the Tag of U, that is, we use a switch to test down the "column" A1/A2/A3/A4, all of which correspond to the same "investigation". After this investigation, the pattern logic reduces to a set of conditionals (for the when clauses) and, crucially, the failure branch of a when clause just proceeds on to th next when clause (and the success branch goes to the necessary target). That is, the pattern logic is linear and without duplication or problematic expansion.

The IL optimized code size for the above example reduces from 181 to 173. Is that important? Not in itself. Does this help performance? I'm not certain. It feels better, the code looks improved.

This comes with a gain - we now have improved detection of unused patterns! This can give rise to additional warnings (which may mean it needs a --langversion switch?). For example,

type U = 
    | A1  of int 
    | A2 of int

let TestMatchWithWhen u  =
    match u with
    | A1 x when x < 4 -> 1000
    | A2 x when x < 4 -> 2000
    | A1 x -> x
    | A2 x -> x
    | _ -> 5000

Previously this didn't give an "unused match" warning. Now it does. This is because the presence of the initial when clauses don't disable unused clause warning errors.

There were about 4 places in our codebase where this detected unused pattern match clauses. Two of these were in the TypeProvider SDK.

Specification of pattern clause grouping

A note from the code is copied below.

// Three pattern constructs can cause significant code expansion in various combinations
//   - Partial active patterns
//   - Disjunctive patterns
//   - Pattern clauses with 'when'
//
// Partial active patterns that are not the "last" thing in a clause,
// combined with subsequent clauses, can cause significant code expansion
// because they are decided on one by one. Each failure path expands out the subsequent
// clause logic (with the active pattern contributing no reduction of those subsequent
// clauses).  Each success path expands out any subsequent logic in the clause plus
// subsequent clause logic.
//
//    | ActivePat1, ActivePat2 -> ...
//    | more-logic
//
// goes to
//     switch (ActivePat1)
//        switch (ActivePat2)
//           --> tgt1
//           --> more-logic
//     --> more-logic
//
// When a partial active pattern is used in the last meaningful position the clause is
// not problematic, e.g.
//
//    | ActivePat1, ActivePat2 -> ...
//    | more-logic
//
// So when generating code we take clauses up until the first one containing
// a partial pattern.  This can lead to sub-standard code generation
// but has long been the technique we use to avoid blow-up of pattern matching.
//
// Disjunctive patterns combined with 'when' clauses can also cause signficant code
// expansion. In particular this leads to multiple copies of 'when' expressions (even for one clause)
// and each failure path of those 'when' will then continue on the expand any remaining
// pattern logic in subsequent clauses. So when generating code we take clauses up
// until the first one containing a disjunctive pattern with a 'when' clause.
//
// Disjunction will still cause significant expansion, e.g. 
//    (A | B), (C | D) ->
// is immediately expanded out to four frontiers each with two investigation points.
//    A, C -> ...
//    A, D -> ...
//    B, C -> ...
//    B, D -> ...
//
// Of course, some decision-logic expansion here is expected. Further, for unions, integers, characters, enums etc.
// the column-based matching on A/B and C/D eliminates these relatively efficiently, e.g. to
//    one-switch-on-A/B 
//    on each path, one switch on C/D
// So disjunction alone isn't considered problematic, but in combination with 'when' patterns

@dsyme dsyme changed the title Experiment: improved pattern analysis and codegen for problematic disjunctive/active patterns [WIP] improved pattern analysis and codegen for type tests and problematic disjunctive/active patterns Mar 3, 2022
@dsyme dsyme changed the title [WIP] improved pattern analysis and codegen for type tests and problematic disjunctive/active patterns Improved pattern compilation Mar 3, 2022
@dsyme
Copy link
Contributor Author

dsyme commented Mar 3, 2022

This is ready for review @vzarytovskii @KevinRansom @TIHan

@KevinRansom KevinRansom merged commit 597446a into dotnet:main Mar 15, 2022
pirrmann pushed a commit to pirrmann/fsharp that referenced this pull request Mar 22, 2022
* update

* fix test case

* column-based type tests

* column-based type tests

* update test

* improved type matching analysis

* improve diagnostics

* fix codegen

* fix baselines

* fix baselines

* update baselines and improve isinst codegen

* missing file

* update baselines

Co-authored-by: Don Syme <donsyme@fastmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants