Improved pattern compilation #11993

dsyme · 2021-08-18T23:41:54Z

This contains two related improvements to how we approach pattern-match compilation.

Improved pattern match compilation for type tests and null tests

Using multiple "columns" of type tests and null tests was generating exponential amounts of code; see, e.g., #12687.

This is fixed by moving to a single "column" of type tests and doing the work to properly determine when the success or failure of earlier elements of the column tells us something about later patterns. For example:

```fsharp
let TestTwoColumnsOfTypeTestsWithSealedTypes(x: obj, y: obj) =
    match x, y with
    | :? string, :? string -> 1
    | :? int, :? int -> 2
    | :? bool, :? bool -> 3
    | :? float, :? float -> 4
    | :? char, :? char -> 5
    | _ -> 6
```

Here the success of `:? string` tells us that all the other (sealed) type tests will fail. Likewise, consider this:

```fsharp
type A() = class end

type B() =
    inherit A()

let TestOneColumnOfTypeTestsWithUnSealedClassTypes_Redundant2(x: obj) =
    match x with
    | :? A -> 1
    | :? B -> 2 // expect - never matched
    | _ -> 3
```

Here the failure of `:? A` tells us that `:? B` will also fail, since every `B` is an `A`. Similar information can be deduced from null tests and type tests on interfaces.
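Both deductions can be phrased in terms of `System.Type`. The following is a minimal sketch under simplifying assumptions, not the compiler's implementation: `successImpliesFailure` and `failureImpliesFailure` are hypothetical helpers, and details such as generic variance and enums are ignored.

```fsharp
open System

/// Hypothetical helper. If `:? succeeded` has already succeeded and `succeeded` is
/// sealed, the runtime type of the value is exactly `succeeded`, so a later test
/// `:? later` must fail unless `later` is `succeeded` or one of its supertypes/interfaces.
let successImpliesFailure (succeeded: Type) (later: Type) =
    succeeded.IsSealed && not (later.IsAssignableFrom(succeeded))

/// Hypothetical helper. If `:? failed` has already failed, a later test `:? later`
/// on a subtype (or implementing type) of `failed` must also fail, because any value
/// passing the later test would have passed the earlier one.
let failureImpliesFailure (failed: Type) (later: Type) =
    failed.IsAssignableFrom(later)

type A() = class end

type B() =
    inherit A()

// First example: in a column of sealed types, any success rules out every other test.
let sealedColumn = [ typeof<string>; typeof<int>; typeof<bool>; typeof<float>; typeof<char> ]

let allMutuallyExclusive =
    [ for t1 in sealedColumn do
        for t2 in sealedColumn do
            if t1 <> t2 then yield successImpliesFailure t1 t2 ]
    |> List.forall id

// Second example: `:? B` after a failed `:? A` can never succeed.
let bIsRedundant = failureImpliesFailure typeof<A> typeof<B>
```

The interface case falls out of the same rule: `failureImpliesFailure typeof<IComparable> typeof<string>` is true, since every string is `IComparable`.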

This removes a cause of exponential code generation: for examples like #12687 the code goes down from, say, 17MB of IL in one method to 3K, and you can make the difference as dramatic as you like by adding extra clauses.

This also has the benefit that many new cases of "rule never matched" are detected.

There is also an improvement: more type tests are now generated as "fast" type tests that use the `isinst` instruction rather than a helper. In addition, the IL code sequence `isinst; ldnull; cgt.un; brtrue/brfalse` is simplified to `isinst; brtrue/brfalse`.
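The `isinst` simplification can be pictured as a peephole rewrite. This sketch uses a hypothetical, heavily simplified instruction type (`Instr`) and a hypothetical `simplify` function rather than the compiler's real IL representation. The reasoning: `isinst` leaves either `null` or a non-null reference on the stack, and `brtrue`/`brfalse` already branch on non-null/null, so the `ldnull; cgt.un` comparison adds nothing.

```fsharp
/// Hypothetical, heavily simplified IL instruction model (not the compiler's real IL types).
type Instr =
    | Isinst of typeName: string
    | Ldnull
    | CgtUn
    | Brtrue of label: string
    | Brfalse of label: string
    | Other of string

/// Peephole rewrite: `isinst T; ldnull; cgt.un; brtrue/brfalse L` becomes
/// `isinst T; brtrue/brfalse L`, because the branch already tests for null.
let rec simplify (code: Instr list) =
    match code with
    | Isinst t :: Ldnull :: CgtUn :: ((Brtrue _ | Brfalse _) as branch) :: rest ->
        Isinst t :: branch :: simplify rest
    | instr :: rest -> instr :: simplify rest
    | [] -> []

let before = [ Isinst "System.String"; Ldnull; CgtUn; Brtrue "L1"; Other "ret" ]
let after = simplify before
// after = [ Isinst "System.String"; Brtrue "L1"; Other "ret" ]
```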

Improved pattern match compilation with `when`

While looking at #11981, I went back and reconsidered our approach to "problematic pattern matching clauses" that can lead to substantial code generation. For example, consider code like this:

```fsharp
type U =
    | A1 of int
    | A2 of int
    | A3 of int
    | A4 of int
    | A5 of int

let TestMatchWithWhen u =
    match u with
    | A1 x when x < 4 -> 1000
    | A2 x when x < 4 -> 2000
    | A3 x when x < 4 -> 3000
    | A4 x when x < 4 -> 4000
    | A1 x -> x
    | _ -> 5000
```

Previously, we compiled these clauses "one by one": the presence of the `when` guards led us to consider the clause set "potentially problematic", so the complexity of pattern matching was reduced by compiling it clause by clause. This meant we fetched and tested the tag again and again instead of using a switch. This PR uses a new way of detecting potentially problematic clauses.

After this PR, the clauses above are considered "all together" and a switch is now emitted. This is because the first thing tested is the `Tag` of `U`: we use a switch to test down the "column" `A1`/`A2`/`A3`/`A4`, all of which correspond to the same "investigation". After this investigation, the pattern logic reduces to a set of conditionals (for the `when` clauses) and, crucially, the failure branch of a `when` clause just proceeds to the next `when` clause (and the success branch goes to the necessary target). That is, the pattern logic is linear, without duplication or problematic expansion.

The optimized IL code size for the above example drops from 181 to 173. Is that important? Not in itself. Does this help performance? I'm not certain. It feels better, and the code looks improved.
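As a rough illustration (a hand-written F# equivalent, not the compiler's actual output), the decision logic for `TestMatchWithWhen` after this PR behaves like one switch on the tag followed by a linear chain of guards per case, where a failed guard falls through to the next clause that can still apply to that case. `TestMatchWithWhenAsSwitch` is a hypothetical name used only for comparison.

```fsharp
type U =
    | A1 of int
    | A2 of int
    | A3 of int
    | A4 of int
    | A5 of int

// The original match, as written in the text.
let TestMatchWithWhen u =
    match u with
    | A1 x when x < 4 -> 1000
    | A2 x when x < 4 -> 2000
    | A3 x when x < 4 -> 3000
    | A4 x when x < 4 -> 4000
    | A1 x -> x
    | _ -> 5000

// Hypothetical hand-written equivalent: one switch on the tag, then for each case
// the guards in order; a failed guard continues with the next clause for that case.
let TestMatchWithWhenAsSwitch u =
    match u with
    | A1 x -> if x < 4 then 1000 else x
    | A2 x -> if x < 4 then 2000 else 5000
    | A3 x -> if x < 4 then 3000 else 5000
    | A4 x -> if x < 4 then 4000 else 5000
    | A5 _ -> 5000
```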

This comes with a further gain: we now have improved detection of unused patterns! This can give rise to additional warnings (which may mean it needs a `--langversion` switch?). For example:

```fsharp
type U =
    | A1 of int
    | A2 of int

let TestMatchWithWhen u =
    match u with
    | A1 x when x < 4 -> 1000
    | A2 x when x < 4 -> 2000
    | A1 x -> x
    | A2 x -> x
    | _ -> 5000
```

Previously this didn't give an "unused match" warning. Now it does: `A1 x` and `A2 x` together cover every case, so the final `| _ -> 5000` can never be reached, and the initial `when` clauses no longer disable the unused-clause warning.
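The reachability reasoning can be sketched with a deliberately simplified model in which each clause either names one union case or is a wildcard, optionally with a guard. This is not the compiler's algorithm; `Pat`, `Clause` and `unreachableClauses` are hypothetical. The point it illustrates is that a guarded clause covers nothing (its guard might fail) but no longer stops the analysis of the clauses after it.

```fsharp
/// Hypothetical, simplified pattern model: one union case, or a wildcard.
type Pat =
    | Case of string
    | Wild

/// Hypothetical clause model: a pattern plus whether it carries a `when` guard.
type Clause = { Pattern: Pat; HasGuard: bool }

/// Returns the indexes of clauses that can never be matched. A clause is unreachable
/// if every case it can match is already covered by earlier unguarded clauses.
/// Guarded clauses add nothing to the covered set, because their guard may fail.
let unreachableClauses (allCases: string list) (clauses: Clause list) =
    let everyCase = Set.ofList allCases
    let step (covered: Set<string>, unreachable: int list) (index: int, clause: Clause) =
        let matched =
            match clause.Pattern with
            | Case name -> Set.singleton name
            | Wild -> everyCase
        let unreachable =
            if Set.isSubset matched covered then index :: unreachable else unreachable
        let covered =
            if clause.HasGuard then covered else Set.union covered matched
        (covered, unreachable)
    clauses
    |> List.indexed
    |> List.fold step (Set.empty, [])
    |> snd
    |> List.rev

// The second TestMatchWithWhen example from the text.
let exampleClauses =
    [ { Pattern = Case "A1"; HasGuard = true }   // | A1 x when x < 4 -> 1000
      { Pattern = Case "A2"; HasGuard = true }   // | A2 x when x < 4 -> 2000
      { Pattern = Case "A1"; HasGuard = false }  // | A1 x -> x
      { Pattern = Case "A2"; HasGuard = false }  // | A2 x -> x
      { Pattern = Wild; HasGuard = false } ]     // | _ -> 5000

let unusedInExample = unreachableClauses [ "A1"; "A2" ] exampleClauses  // [4]
```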

There were about 4 places in our codebase where this detected unused pattern match clauses. Two of these were in the TypeProvider SDK.

Specification of pattern clause grouping

A note from the code is copied below.

```fsharp
// Three pattern constructs can cause significant code expansion in various combinations
//   - Partial active patterns
//   - Disjunctive patterns
//   - Pattern clauses with 'when'
//
// Partial active patterns that are not the "last" thing in a clause,
// combined with subsequent clauses, can cause significant code expansion
// because they are decided on one by one. Each failure path expands out the subsequent
// clause logic (with the active pattern contributing no reduction of those subsequent
// clauses).  Each success path expands out any subsequent logic in the clause plus
// subsequent clause logic.
//
//    | ActivePat1, ActivePat2 -> ...
//    | more-logic
//
// goes to
//     switch (ActivePat1)
//        switch (ActivePat2)
//           --> tgt1
//           --> more-logic
//     --> more-logic
//
// When a partial active pattern is used in the last meaningful position the clause is
// not problematic, e.g.
//
//    | ActivePat1, ActivePat2 -> ...
//    | more-logic
//
// So when generating code we take clauses up until the first one containing
// a partial pattern.  This can lead to sub-standard code generation
// but has long been the technique we use to avoid blow-up of pattern matching.
//
// Disjunctive patterns combined with 'when' clauses can also cause significant code
// expansion. In particular this leads to multiple copies of 'when' expressions (even for one clause)
// and each failure path of those 'when' will then continue on to expand any remaining
// pattern logic in subsequent clauses. So when generating code we take clauses up
// until the first one containing a disjunctive pattern with a 'when' clause.
//
// Disjunction will still cause significant expansion, e.g.
//    (A | B), (C | D) ->
// is immediately expanded out to four frontiers each with two investigation points.
//    A, C -> ...
//    A, D -> ...
//    B, C -> ...
//    B, D -> ...
//
// Of course, some decision-logic expansion here is expected. Further, for unions, integers, characters, enums etc.
// the column-based matching on A/B and C/D eliminates these relatively efficiently, e.g. to
//    one-switch-on-A/B
//    on each path, one switch on C/D
// So disjunction alone isn't considered problematic, but in combination with 'when' patterns it is.
```
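The grouping rule in the note can be sketched as follows. The record fields, `isProblematic` and `groupClauses` are hypothetical, and the sketch assumes (from "we take clauses up until the first one containing...") that the first problematic clause closes the current batch, with the remaining clauses compiled as a separate batch reached on failure.

```fsharp
/// Hypothetical, simplified clause summary: only the features relevant to grouping.
type ClauseInfo =
    { Name: string
      HasPartialActivePattern: bool   // a partial active pattern not in the last meaningful position
      HasDisjunction: bool
      HasWhen: bool }

/// Problematic per the note: a (non-final) partial active pattern, or a
/// disjunctive pattern combined with a 'when' guard. Disjunction alone is fine.
let isProblematic c =
    c.HasPartialActivePattern || (c.HasDisjunction && c.HasWhen)

/// Split clauses into batches compiled together; each batch ends at the
/// first problematic clause (assumption), and the rest form later batches.
let rec groupClauses (clauses: ClauseInfo list) =
    match List.tryFindIndex isProblematic clauses with
    | None -> if List.isEmpty clauses then [] else [ clauses ]
    | Some i ->
        let batch, rest = List.splitAt (i + 1) clauses
        batch :: groupClauses rest

let clause name partialAp disjunction guarded =
    { Name = name; HasPartialActivePattern = partialAp; HasDisjunction = disjunction; HasWhen = guarded }

let example =
    [ clause "c1" false false false
      clause "c2" false true true     // disjunction + 'when': problematic
      clause "c3" false true false    // disjunction alone: not problematic
      clause "c4" true false false ]  // partial active pattern: problematic

let batches = groupClauses example |> List.map (List.map (fun c -> c.Name))
// batches = [ [ "c1"; "c2" ]; [ "c3"; "c4" ] ]
```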

tests/service/PatternMatchCompilationTests.fs

dsyme · 2022-03-03T23:35:19Z

This is ready for review @vzarytovskii @KevinRansom @TIHan

* update
* fix test case
* column-based type tests
* column-based type tests
* update test
* improved type matching analysis
* improve diagnostics
* fix codegen
* fix baselines
* fix baselines
* update baselines and improve isinst codegen
* missing file
* update baselines

Co-authored-by: Don Syme <donsyme@fastmail.com>

Don Syme added 2 commits August 19, 2021 00:45

update

54006b2

update

28a4af5

dsyme force-pushed the pm1 branch from 518f10f to 28a4af5 August 18, 2021 23:47

fix test case

c99d220

Happypig375 suggested changes Aug 22, 2021

tests/service/PatternMatchCompilationTests.fs (outdated, resolved)

dsyme mentioned this pull request Feb 4, 2022

Large Pattern match results in InvalidProgramException runtime exception #12687 (closed)

dsyme and others added 7 commits February 23, 2022 18:22

Merge branch 'main' of https://github.com/dotnet/fsharp into pm1

85578d4

column-based type tests

6cd9bce

column-based type tests

2a79dd4

update test

4dc1adf

improved type matching analysis

1c956a7

improve diagnostics

14b7577

Merge branch 'main' of https://github.com/dotnet/fsharp into pm1

273e032

dsyme changed the title ~~Experiment: improved pattern analysis and codegen for problematic disjunctive/active patterns~~ [WIP] improved pattern analysis and codegen for type tests and problematic disjunctive/active patterns Mar 3, 2022

dsyme added 5 commits March 3, 2022 13:10

fix codegen

7f0cb49

fix baselines

1d0b29c

fix baselines

e7fe62b

update baselines and improve isinst codegen

e1edd3c

missing file

0066e06

dsyme changed the title ~~[WIP] improved pattern analysis and codegen for type tests and problematic disjunctive/active patterns~~ Improved pattern compilation Mar 3, 2022

update baselines

36d0f18

dsyme mentioned this pull request Mar 3, 2022

Crash in VS for files containing large pattern matches #11783 (closed)

vzarytovskii approved these changes Mar 4, 2022

KevinRansom approved these changes Mar 8, 2022

KevinRansom merged commit 597446a into dotnet:main Mar 15, 2022

vzarytovskii mentioned this pull request May 23, 2022

Regression in dotnet SDK 6.0.300: optimizing match with type checks #13175 (closed)

alrz mentioned this pull request Jan 9, 2023

Missing warning for unreachable arm in the presence of an intermediate when clause #14573 (closed)
