Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compiling causes OutOfMemoryException and IDE unresponsive/crash when pattern matching against record with many nested DU fields #4691

Open
cmeeren opened this issue Apr 6, 2018 · 17 comments
Labels
Area-Compiler-PatternMatching pattern compilation, active patterns, performance, codegen Feature Improvement Theme-Performance
Milestone

Comments

@cmeeren
Copy link
Contributor

cmeeren commented Apr 6, 2018

I expecience OutOfMemoryException when compiling code that pattern matches many cases on a record with many fields that are nested DU (e.g. int option option as shown in the repro below, but I originally experienced it with other DU types). Furthermore, the IDE/Intellisense becomes unresponsive and may crash.

Repro steps

Test.zip

Either try the test solution, or simply create a new .NET Standard library (others may produce the error too, but I haven't tested), paste the following code and compile/observe IDE behavior.

module Test

type ManyFields =
  {
    Field1: int option option
    Field2: int option option
    Field3: int option option
    Field4: int option option
    Field5: int option option
    Field6: int option option
    Field7: int option option
    Field8: int option option
    Field9: int option option
    Field10: int option option
    Field11: int option option
    Field12: int option option
    Field13: int option option
    Field14: int option option
    Field15: int option option
    Field16: int option option
    Field17: int option option
    Field18: int option option
    Field19: int option option
    Field20: int option option
  }

let getFirstField manyFields =
  match manyFields with
  | { Field1 = Some (Some i) } -> Some i
  | { Field2 = Some (Some i) } -> Some i
  | { Field3 = Some (Some i) } -> Some i
  | { Field4 = Some (Some i) } -> Some i
  | { Field5 = Some (Some i) } -> Some i
  | { Field6 = Some (Some i) } -> Some i
  | { Field7 = Some (Some i) } -> Some i
  | { Field8 = Some (Some i) } -> Some i
  | { Field9 = Some (Some i) } -> Some i
  | { Field10 = Some (Some i) } -> Some i
  | { Field11 = Some (Some i) } -> Some i
  | { Field12 = Some (Some i) } -> Some i
  | { Field13 = Some (Some i) } -> Some i
  | { Field14 = Some (Some i) } -> Some i
  | { Field15 = Some (Some i) } -> Some i
  | { Field16 = Some (Some i) } -> Some i
  | { Field17 = Some (Some i) } -> Some i
  | { Field18 = Some (Some i) } -> Some i
  | { Field19 = Some (Some i) } -> Some i
  | { Field20 = Some (Some i) } -> Some i
  | _ -> None

Expected behavior

The code compiles successfully.

Actual behavior

Compilation fails with the following output (the number of exceptions may vary):

1>------ Rebuild All started: Project: Test, Configuration: Debug Any CPU ------
1>error FS0193 : internal error : Exception of type 'System.OutOfMemoryException' was thrown.
1>error FS0193 : internal error : Exception of type 'System.OutOfMemoryException' was thrown.
1>Done building project "Test.fsproj" -- FAILED.
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

Known workarounds

  • Break up the pattern matching if possible. On my machine it seems that 18 cases are fine, but 19+ will cause the error. For example, the following works fine:
let getFirstField manyFields =
  match manyFields with
  | { Field1 = Some (Some i) } -> Some i
  | { Field2 = Some (Some i) } -> Some i
  | { Field3 = Some (Some i) } -> Some i
  | { Field4 = Some (Some i) } -> Some i
  | { Field5 = Some (Some i) } -> Some i
  | { Field6 = Some (Some i) } -> Some i
  | { Field7 = Some (Some i) } -> Some i
  | { Field8 = Some (Some i) } -> Some i
  | { Field9 = Some (Some i) } -> Some i
  | { Field10 = Some (Some i) } -> Some i
  | _ -> match manyFields with
      | { Field11 = Some (Some i) } -> Some i
      | { Field12 = Some (Some i) } -> Some i
      | { Field13 = Some (Some i) } -> Some i
      | { Field14 = Some (Some i) } -> Some i
      | { Field15 = Some (Some i) } -> Some i
      | { Field16 = Some (Some i) } -> Some i
      | { Field17 = Some (Some i) } -> Some i
      | { Field18 = Some (Some i) } -> Some i
      | { Field19 = Some (Some i) } -> Some i
      | { Field20 = Some (Some i) } -> Some i
      | _ -> None
  • Find a completely different way to do control flow
  • Find a different way to design your code

Related information

This SO answer by Tomas Petricek looks relevant. If it's the same underlying problem, then it has been around since at least 2012. About time it's fixed :)

Versions:

  • VS 15.6.5
  • VF# Tools 10.1 for F# 4.1 - 15.6.0.0
@realvictorprm
Copy link
Contributor

realvictorprm commented Apr 6, 2018

@cmeeren Must it be an int option option?

(Is it reproducable with a simple int option too?)

@cmeeren
Copy link
Contributor Author

cmeeren commented Apr 6, 2018

Not reproducible with int option. Furthermore, if you match against Some i instead of Some (Some i), it also works fine.

This all seems to fit with the SO answer I linked to, which commented that when pattern matching, the compiler sometimes produces a decision tree that grows exponentially with the number of cases.

@7sharp9
Copy link
Contributor

7sharp9 commented Apr 6, 2018 via email

@abelbraaksma
Copy link
Contributor

abelbraaksma commented Apr 6, 2018

What happens here, if you look at the compiled code, is that it gets compiled into nested if-statements for situations that are too difficult for the compiler to rewrite more efficiently (currently).

First I ran some numbers with your code as example (but Vitaliy's code from the SO question probably has the same characteristic), scenario: just this code in a single .NET Framework project, compiling for Release Build, but I assume .NET Standard and .NET Core would give the same results:

module Bloated =

    type ManyFields =
      {
        Field1: int option option   // 10kB
        Field2: int option option
        Field3: int option option
        Field4: int option option   // 12kB
        Field5: int option option
        Field6: int option option
        Field7: int option option
        Field8: int option option   // 24kB
        Field9: int option option   // 36kB
        Field10: int option option  // 60kB
        Field11: int option option  // 106kB
        Field12: int option option  // 200kB
        Field13: int option option  // 387kB, 2s
        Field14: int option option  // 761kB, 3.5s
        Field15: int option option  // 1509kB, 7s
        Field16: int option option  // 3005kB, 16s
        Field17: int option option  // 6MB, 35s -- ~800MB
        Field18: int option option  // 12MB, 1m -- ~1200MB
        Field19: int option option  // probably 24MB, -- after ~1500MB BOOM! CRASH!
      }

    let getFirstField manyFields =
      match manyFields with
      | { Field1 = Some (Some i) } -> Some i
      | { Field2 = Some (Some i) } -> Some i
      | { Field3 = Some (Some i) } -> Some i
      | { Field4 = Some (Some i) } -> Some i
      | { Field5 = Some (Some i) } -> Some i
      | { Field6 = Some (Some i) } -> Some i
      | { Field7 = Some (Some i) } -> Some i
      | { Field8 = Some (Some i) } -> Some i
      | { Field9 = Some (Some i) } -> Some i
      | { Field10 = Some (Some i) } -> Some i
      | { Field11 = Some (Some i) } -> Some i
      | { Field12 = Some (Some i) } -> Some i
      | { Field13 = Some (Some i) } -> Some i
      | { Field14 = Some (Some i) } -> Some i
      | { Field15 = Some (Some i) } -> Some i
      | { Field16 = Some (Some i) } -> Some i
      | { Field17 = Some (Some i) } -> Some i
      | { Field18 = Some (Some i) } -> Some i
      | { Field19 = Some (Some i) } -> Some i
      | _ -> None

Since I have 48GB available, the OutOfMemoryException must be caused by some 32-bit or other limit. After each change, VS's memory also spiked, but then calmed down after a minute or so to a steady 1GB.

The numbers clearly show a correlation between build time, size and memory usage, each growing exponentially. This suggests that the same algorithm is at fault here that @tpetricek analyzed in that SO question. If I take a look at the decompiled C# code for only 3 fields (otherwise it gets unreadable), it quickly becomes clear what's happening:

    public static FSharpOption<int> getFirstField(ManyFields manyFields)
    {
        FSharpOption<FSharpOption<int>> option;
        int num;
        FSharpOption<FSharpOption<int>> option2;
        if (manyFields.Field1@ == null)
    {
            if (manyFields.Field2@ != null)
        {
                option = manyFields.Field2@;
                if (option.get_Value() != null)
                {
                    num = option.get_Value().get_Value();
                    goto Label_0050;
                }
                if (manyFields.Field3@ == null)
            {
                    goto Label_0081;
                }
                option2 = manyFields.Field3@;
                if (option2.get_Value() == null)
                {
                    goto Label_0081;
                }
                num = option2.get_Value().get_Value();
            }
        else
        {
                if (manyFields.Field3@ == null)
            {
                    goto Label_0081;
                }
                option = manyFields.Field3@;
                if (option.get_Value() == null)
                {
                    goto Label_0081;
                }
                num = option.get_Value().get_Value();
            }
            goto Label_007A;
        }
        option = manyFields.Field1@;
        if (option.get_Value() != null)
        {
            return FSharpOption<int>.Some(option.get_Value().get_Value());
        }
        if (manyFields.Field2@ == null)
    {
            if (manyFields.Field3@ == null)
        {
                goto Label_0081;
            }
            option2 = manyFields.Field3@;
            if (option2.get_Value() == null)
            {
                goto Label_0081;
            }
            num = option2.get_Value().get_Value();
            goto Label_007A;
        }
        option2 = manyFields.Field2@;
        if (option2.get_Value() == null)
        {
            if (manyFields.Field3@ == null)
        {
                goto Label_0081;
            }
            FSharpOption<FSharpOption<int>> option3 = manyFields.Field3@;
            if (option3.get_Value() == null)
            {
                goto Label_0081;
            }
            num = option3.get_Value().get_Value();
            goto Label_007A;
        }
        num = option2.get_Value().get_Value();
        Label_0050:
        return FSharpOption<int>.Some(num);
        Label_007A:
        return FSharpOption<int>.Some(num);
        Label_0081:
        return null;
    }

In other words, it looks like the compiler cannot ascertain that if a first match succeeds partially, that other parts need no tests anymore, hence the recursive repetition of all cases under each main if-branch.

In Vitaliy's question there was something to say for the way the compiler treats this, since it was about interface matching, and any class could have multiple interfaces (which doesn't mean it cannot be optimized, but it is harder). But in this case, that shouldn't be necessary at all.

I remembered this situation and I have some places where I have put in some exclamation marks "Do not simplify this code from nested patterns, it creates bloated IL!!!". Though it never occurred to me that the OOM is related.

Furthermore, we sometimes have mysterious crashes of Visual Studio simply disappearing, or FCS crashing behind the scenes. It is very well possible that some of these issues are related to this.

One more musing: solving this from exponentially growing code into linearly growing code may have a big impact on the size of the binaries, including FSharp.Core.dll, the speed of the compiler and a host of other things. However, considering that this hasn't been done in the past 8 years (it's been reported in 2010 for the first time), suggests that this isn't as trivial to fix as it may seem.

@7sharp9
Copy link
Contributor

7sharp9 commented Apr 6, 2018 via email

@cmeeren
Copy link
Contributor Author

cmeeren commented Apr 11, 2018

One more musing: solving this from exponentially growing code into linearly growing code may have a big impact on the size of the binaries, including FSharp.Core.dll, the speed of the compiler and a host of other things. However, considering that this hasn't been done in the past 8 years (it's been reported in 2010 for the first time), suggests that this isn't as trivial to fix as it may seem.

It might also suggest that it simply hasn't been prioritized for whichever reason. (I wouldn't know, of course, but it's a possibility.)

@cartermp
Copy link
Contributor

I can't speak towards this issue since it may be related to the initial design of the compiler, but there are all sorts of little weird things that can happen in compilers which lead to OOM or other bad behavior.

For example, getting clever with generics is a good way to hose the C# compiler:

class Class<A, B, C, D, E, F>
{
    class Inner : Class<Inner, Inner, Inner, Inner, Inner, Inner>
    {
        Inner.Inner.Inner.Inner.Inner.Inner.Inner.Inner.Inner inner;
    }
}

(If this works just fine for you, just add some more generics)

The C# compiler also solves the 3-SAT problem for overload resolution, so if you're clever with lambdas you may be in for a surprise as well. Haskell and other ML languages also have awful worst-case scenarios for type inference that you can run into from time to time. Swift can also stop compiling after a while on very trivial code and say the expression is too complex.

@abelbraaksma
Copy link
Contributor

@cartermp, thanks for mentioning the 3-SAT problem, reading up on that gives some hints on how this problem can be problematic.

Though I don't think it is unsolvable, because DU matching does not necessarily need to be compiled this way. For instance, it could be compiled as a list of if...elif...elif...else... clauses. I can imagine that for instance the current approach is good for any match expression up until 4-5 branches, but after that it becomes unwieldy and a different approach should be used.

As an optimization to the if...elif...elif...else approach, a grouping can be done if the first part of the match is a DU itself, a boolean or another known limited set, which could first be compiled in a switch (i.e., table lookup). This, btw, happens already in the general case, but not in slightly more complex cases.

@dsyme dsyme added Tenet-Performance Bug Impact-Medium (Internal MS Team use only) Describes an issue with moderate impact on existing code. labels Apr 12, 2018
@dsyme
Copy link
Contributor

dsyme commented Apr 12, 2018

@cmeeren Thanks for the report. There are a couple of ways that pattern matching can cause slow compilation, but this looks like one we should be able to address

@cmeeren
Copy link
Contributor Author

cmeeren commented Apr 12, 2018

@dsyme very glad to hear it. Especially if @abelbraaksma is correct in that

solving this from exponentially growing code into linearly growing code may have a big impact on the size of the binaries, including FSharp.Core.dll, the speed of the compiler and a host of other things.

@abelbraaksma
Copy link
Contributor

Especially if @abelbraaksma is correct in that

I like to think I am ;). The code above can easily be tested, just check the compiled binaries (and watch the clock and memory usage by the compiler & VS).

@dsyme
Copy link
Contributor

dsyme commented Apr 13, 2018

solving this from exponentially growing code into linearly growing code may have a big impact on the size of the binaries, including FSharp.Core.dll, the speed of the compiler and a host of other things.

The majority of pattern matching code compiles well. I suspect the problem here is something combination of things, I suspect that implicit extra record fields may be being expanded and bound in the record patterns, even if they are wildcards, though that shouldn't ultimately cause the extra code.

@abelbraaksma
Copy link
Contributor

abelbraaksma commented Jun 19, 2018

The majority of pattern matching code compiles well.

Issue #5212 suggests that exponential growth can happen really easy and diesn't require record fields. Looking at the IL I don't think this is caused by implicit record fields per se.

It basically looks as if all permutations are being created by the compiler, while that isn't necessarily required (technically, it could compile to a list of elif instead). For small patterns this may hardly be a problem, though still suboptimal, but with (slightly) larger patterns this can lead to quickly increasing binary sizes, and with it longer JIT times.

@cartermp
Copy link
Contributor

Another thing to keep in mind here is that this can basically bring any editor to a crawl. Here's VS on my laptop (granted, via Parallels)

Screen Shot 2019-05-14 at 20 35 42

This is just from pasting in the text in the OP. VS just goes unresponsive on and off seemingly indefinitely.

@TIHan
Copy link
Contributor

TIHan commented May 15, 2019

In a matter of seconds, I can get VS to reach OOM levels. Ran a perf trace on it, these are the allocations. Yea, this is a bug in pattern matching.

image

@TIHan TIHan self-assigned this Sep 26, 2019
@dsyme dsyme unassigned TIHan Mar 30, 2022
@dsyme dsyme added Feature Improvement Area-Compiler-PatternMatching pattern compilation, active patterns, performance, codegen and removed Bug Impact-Medium (Internal MS Team use only) Describes an issue with moderate impact on existing code. Area-Compiler labels Mar 30, 2022
@vzarytovskii vzarytovskii moved this to Not Planned in F# Compiler and Tooling Jun 17, 2022
@psfinaki
Copy link
Member

psfinaki commented May 16, 2024

The example code didn't cause SO on my machine but it still took ages to compile and made the machine unusable for some time.

On the other hand, given all the notes in this discussion (e.g. that it needs this much cases or that it needs to be not options but option options) - I wonder how acute this is. Probably still worth looking at it at least to see if there are some glaring misses in the pattern matching impl.

@vzarytovskii
Copy link
Member

I think I've fixed the SO some time ago, but yes, it is slow, which is kinda expected (not desired though).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compiler-PatternMatching pattern compilation, active patterns, performance, codegen Feature Improvement Theme-Performance
Projects
None yet
Development

No branches or pull requests

10 participants