reduce allocations #1207

vasily-kirichenko · 2016-05-19T17:59:57Z

Compiling FSharp.Configuration project:

Before

After

~8.5% fewer allocations. Compilation time has not changed.

smoothdeveloper · 2016-05-19T18:08:22Z

src/fsharp/tast.fs

+
+    if x.IsResolved && y.IsResolved && not compilingFslib then
+        x.ResolvedTarget === y.ResolvedTarget 
+    else


style only: elif to avoid nesting

forki · 2016-05-19T18:55:31Z

That is pretty cool.

vasily-kirichenko · 2016-05-19T19:04:29Z

I'm not sure it helps even in long running scenarios like FCS/VFPT. It seems allocations do not cause any performance problems.

KevinRansom · 2016-05-19T19:18:31Z

@vasily-kirichenko, @dsyme: we are going to implement struct tuples in the next release; Don has already submitted a prototype PR.

Do you think that struct tuples would give an equivalent improvement? I am particularly concerned about manually unwinding idiomatic code such as the pattern matching, and its replacement with complex if/then/else expressions.

What do you think, would struct tuples solve some of these allocation issues?

Kevin

forki · 2016-05-19T19:31:10Z

(unrelated to perf:)

If you look at the nested pattern match in https://github.com/Microsoft/visualfsharp/pull/1207/files#diff-5a2b2c121409423e80d58b7ffaccd472L4401 - I think we can argue about whether that really was more readable. I'm not saying it was not better, but it also wasn't exactly beautiful ;-)

smoothdeveloper · 2016-05-19T19:59:17Z

Agree with @forki on the if/elif now kind of looking more readable, mostly due to the | _ with nested match at same level which I'm not fond of.

The rest of changes do look idiomatic (non tupled arguments).

Style-wise, for involved conditionals, I tend to be very pedantic and put each expression on its own line with the operator at the beginning (a win especially when the expression is long).

If the compiler is not eliminating inline tuples in match expressions, it would be worth having that optimized away; in many cases, tuples are just used as local sugar (I have no idea what the compiler does in the optimization phase, so sorry if this is an obvious question).

tpetricek · 2016-05-19T20:18:27Z

I think this PR also improves readability - the conditionals using if make more sense to me than the original pattern matching on pairs of Booleans.

dsyme · 2016-05-19T20:35:49Z

src/utils/prim-lexing.fs

@@ -201,7 +201,7 @@ namespace Internal.Utilities.Text.Lexing
        let numUnicodeCategories = 30 
        let numLowUnicodeChars = 128 
        let numSpecificUnicodeChars = (trans.[0].Length - 1 - numLowUnicodeChars - numUnicodeCategories)/2
-        let lookupUnicodeCharacters (state,inp) = 
+        let lookupUnicodeCharacters state inp =


This change is harmless but AFAICS doesn't alter the representation or calls of the function? e.g. for

type C() =
    let f (x,y) = x + y
    member a.M(b,c) = f (b,c)

we get

.method assembly hidebysig instance int32 f(int32 x, int32 y) cil managed

@vasily-kirichenko Could you remove the changes in prim-lexing.fs please? I'm pretty sure they don't remove any allocations. (If they do then let's discuss further, there must be something I'm missing). Thanks!!
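The point about tupled versus curried parameters can be checked with reflection. The following is a minimal sketch, not code from the PR: the module `Demo`, its `Marker` type, and the functions `addTupled` and `addCurried` are hypothetical names, and module-level functions are used here (rather than a class-local `let` as in the example above) only because they are easy to look up by name.

```fsharp
module Demo =
    // Marker type used only to locate the compiled module class via reflection.
    type Marker = class end

    // Tupled parameter syntax, like the original lookupUnicodeCharacters (state,inp).
    let addTupled (x: int, y: int) = x + y

    // Curried parameter syntax, like the proposed lookupUnicodeCharacters state inp.
    let addCurried (x: int) (y: int) = x + y

// The F# module is compiled to a static class that contains Marker as a nested type.
let moduleType = typeof<Demo.Marker>.DeclaringType

// Number of parameters on the compiled .NET method with the given name.
let arity (name: string) = moduleType.GetMethod(name).GetParameters().Length

let tupledArity = arity "addTupled"
let curriedArity = arity "addCurried"

printfn "addTupled: %d parameters, addCurried: %d parameters" tupledArity curriedArity
```

If both counts come out the same, the two declaration styles share one compiled signature, which is consistent with the observation that switching between them does not by itself remove a tuple allocation at direct call sites.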

KevinRansom · 2016-05-19T20:40:58Z

Well I would agree that the original author went out of his way to make the code unreadable. (comments in the middle of expressions, idiosyncratic indenting, nested pattern matching.)

My real question though is would struct tuples eliminate the need to go through the code eliminating tuples and replacing them with separate arguments?

dsyme · 2016-05-19T20:42:54Z

I'm not sure about this PR. We need to see actual perf benefit.

We can make more and more things structs, and the risk is that they are just getting copied around a whole lot (at possibly extra cost). It's very difficult to work out the amount of struct copying being done by looking at the code unless we are sure of the storage location of the struct (e.g. in an array).

Heap allocations have the advantage that passing the value around is relatively cheap (one word).

In this case, the TokenTup struct is now very, very, very big. I can't even count the number of words.

PositionTuple is already quite big and is a struct
LexbufState is already quite big and is a struct
So TokenTup is the sum of these and more

TokenTup is now so large that it's possible that this actually slows down the lexer. So we need to see concrete performance benefits - not just reduced allocations - to know if this is a good change. Running repeated lexings of a file should help determine where the threshold for useful struct size is.
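One rough way to look for such a threshold is a micro-benchmark that passes a wide struct by value and compares it with passing a heap-allocated record by reference. The sketch below is only an illustration under stated assumptions: `WideStruct` and `WideRecord` are hypothetical 8-field types, not the real `TokenTup`, `[<Struct>]` records need F# 4.1 or later, and timings from a loop like this vary with the JIT and machine, so they are no substitute for repeated lexing of a real file.

```fsharp
open System.Diagnostics
open System.Runtime.CompilerServices

// Hypothetical payloads of eight int64 fields each; the real TokenTup layout is not reproduced here.
[<Struct>]
type WideStruct =
    { S1: int64; S2: int64; S3: int64; S4: int64
      S5: int64; S6: int64; S7: int64; S8: int64 }

type WideRecord =
    { R1: int64; R2: int64; R3: int64; R4: int64
      R5: int64; R6: int64; R7: int64; R8: int64 }

// NoInlining asks the compiler and JIT not to inline, so an argument is passed on every call.
[<MethodImpl(MethodImplOptions.NoInlining)>]
let sumStruct (s: WideStruct) = s.S1 + s.S8

[<MethodImpl(MethodImplOptions.NoInlining)>]
let sumRecord (r: WideRecord) = r.R1 + r.R8

let iterations = 10000000

// Runs the work once, prints the elapsed time, and returns the computed total.
let time (label: string) (work: unit -> int64) =
    let sw = Stopwatch.StartNew()
    let total = work ()
    sw.Stop()
    printfn "%s: %d ms" label sw.ElapsedMilliseconds
    total

let structTotal =
    time "struct passed by value" (fun () ->
        let s = { S1 = 1L; S2 = 2L; S3 = 3L; S4 = 4L; S5 = 5L; S6 = 6L; S7 = 7L; S8 = 8L }
        let mutable acc = 0L
        for _ in 1 .. iterations do
            acc <- acc + sumStruct s
        acc)

let recordTotal =
    time "record passed by reference" (fun () ->
        let r = { R1 = 1L; R2 = 2L; R3 = 3L; R4 = 4L; R5 = 5L; R6 = 6L; R7 = 7L; R8 = 8L }
        let mutable acc = 0L
        for _ in 1 .. iterations do
            acc <- acc + sumRecord r
        acc)
```

Both loops compute the same total, so any difference in the printed times comes from how the argument is passed rather than from the arithmetic.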

dsyme · 2016-05-19T20:47:04Z

To be clear, the changes in this function are good, and they are the ones removing the 20MB of tuple allocation.

We should definitely accept this part of the change. It's the other changes in the Lexfilter I'm not sure of.

dsyme · 2016-05-19T20:59:17Z

@KevinRansom Struct tuples would be of only marginal use here - they would allow a more local change by using match (struct (x.IsResolved, y.IsResolved)) with ... but that would only make the code worse. And while it would have removed the allocation it would equally cause more copying.
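For readers unfamiliar with the syntax being discussed, here is a minimal sketch of the three forms side by side. `FakeRef` and its fields are hypothetical stand-ins, not the real definitions in tast.fs, and struct tuples require F# 4.1 or later. Whether the reference tuple in the first form is actually allocated depends on what the optimizer can prove, so the sketch only shows that the three forms compute the same result.

```fsharp
// Hypothetical stand-in for a compiler reference type; not the real tast.fs definition.
type FakeRef = { IsResolved: bool; Target: string }

// Original style: match on a (reference) tuple of two booleans.
let eqTupleMatch (x: FakeRef) (y: FakeRef) =
    match x.IsResolved, y.IsResolved with
    | true, true -> x.Target = y.Target
    | _ -> false

// Struct tuple style: same shape, but the tuple is a value type.
let eqStructTuple (x: FakeRef) (y: FakeRef) =
    match struct (x.IsResolved, y.IsResolved) with
    | struct (true, true) -> x.Target = y.Target
    | _ -> false

// Style used in this PR: plain conditionals, no tuple at all.
let eqIfElse (x: FakeRef) (y: FakeRef) =
    if x.IsResolved && y.IsResolved then x.Target = y.Target
    else false

// All four combinations of the two flags.
let samples =
    [ for r1 in [ true; false ] do
        for r2 in [ true; false ] do
            yield { IsResolved = r1; Target = "a" }, { IsResolved = r2; Target = "a" } ]

// True when the three implementations agree on every sample.
let allAgree =
    samples
    |> List.forall (fun (x, y) ->
        let expected = eqTupleMatch x y
        eqStructTuple x y = expected && eqIfElse x y = expected)

printfn "all three forms agree: %b" allAgree
```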

dsyme · 2016-05-19T21:00:31Z

Here's the test failure on Jenkins, I'm not sure what's causing it.

1) Failed : FSharp-Tests-Typecheck+Sigs.sigs
Error running command 'D:\j\workspace\release_ci_pa---866fd2c3\tests\fsharpqa\testenv\bin\diff.exe' with args 'neg10.err neg10.bsl normalize' in directory 'D:\j\workspace\release_ci_pa---866fd2c3\tests\fsharp\typecheck\sigs'. ERRORLEVEL 1 ERRORLEVEL 1
at NUnitConf.checkTestResult(Result`2 result) in D:\j\workspace\release_ci_pa---866fd2c3\tests\fsharp\nunitConf.fs:line 14

KevinRansom · 2016-05-19T22:07:14Z

@dsyme it's not clear to me that a struct tuple would cause any additional copying at that pattern match; after all, the values are not used beyond the actual pattern match. I do agree that the original code would not be improved by adding struct ( ), as it was quite hideous code anyway.

I'm not arguing against making the change, but I'm not certain that manually unlinking tuples and rewriting pattern matches throughout the compiler is the way to go. If there are things about tuples and patterns that are inefficient we should fix them ... somehow.

dsyme · 2016-05-20T09:23:38Z

@vasily-kirichenko Could you resubmit (or reopen this & update) the part of this change that's an incontrovertible improvement, i.e. this bit? https://github.com/Microsoft/visualfsharp/pull/1207/files#diff-5a2b2c121409423e80d58b7ffaccd472L4401 . Please :) Thanks!

dsyme · 2016-05-20T09:23:59Z

I'll reopen this for tracking, as we definitely want to take the part mentioned above.

dsyme · 2016-05-20T09:27:35Z

@KevinRansom F# does a pretty good job of eliminating tuples - but I agree, in this sort of code below we should do it automatically

match expr1, expr2 with 
| true, true -> ...
| _ ->

If expr1 and expr2 are known not to have side effects then we get good code for this. However in this particular case expr1 and expr2 both include a property access, which is not known to be side-effect-free, so not all optimizations kick in.
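A small sketch of why side effects matter here: a tuple match evaluates both expressions before matching, while a hand-written `&&` short-circuits, so the two forms are only interchangeable when the second expression has no observable effect. `Probe` is a hypothetical class whose property getter counts its own reads; it is not part of the compiler.

```fsharp
// Hypothetical type whose property getter has a visible side effect (it counts reads).
type Probe() =
    let mutable reads = 0
    member p.Reads = reads
    member p.Flag =
        reads <- reads + 1
        false

// Tuple match: both property getters run before any pattern is tested.
let viaTupleMatch (a: Probe) (b: Probe) =
    match a.Flag, b.Flag with
    | true, true -> "both"
    | _ -> "other"

// Short-circuit form: b.Flag is skipped when a.Flag is false.
let viaShortCircuit (a: Probe) (b: Probe) =
    if a.Flag && b.Flag then "both" else "other"

let a1, b1 = Probe(), Probe()
let tupleResult = viaTupleMatch a1 b1
let tupleReads = a1.Reads + b1.Reads

let a2, b2 = Probe(), Probe()
let shortCircuitResult = viaShortCircuit a2 b2
let shortCircuitReads = a2.Reads + b2.Reads

printfn "tuple match: %s after %d reads" tupleResult tupleReads
printfn "short circuit: %s after %d reads" shortCircuitResult shortCircuitReads
```

The results are the same but the number of getter calls differs, which is the kind of observable difference an optimizer has to rule out before it can rewrite one form into the other.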

forki · 2016-05-20T13:17:38Z

src/fsharp/tast.fs

-    match x.IsLocalRef,y.IsLocalRef with 
-    | false, false when 
+
+    if x.IsResolved && y.IsResolved && not compilingFslib then


The same thing is happening a couple of lines down in primValRefEq, and that method shows up in the hot path when I compile Paket.Core.

Ah ok, then yes, that should also be fixed.

We have this fix included as well.

vasily-kirichenko · 2016-05-21T07:02:11Z

I removed everything but tuples elimination.

Reduce allocations futher

forki · 2016-05-21T10:21:35Z

Just looked at the reported error

that is interesting.

The code in question is:

module EnumOfString_FSharp_1_0_bug_1743 = begin
    type IA<'a> =
        interface 
            abstract X : unit -> 'a
        end

    type IB<'a,'b> =
        interface 
            inherit IA<'a>
            inherit IA<'b>
        end

    let x = { new IB<_,_> with X() = failwith "" } // this is the reported line
end

forki · 2016-05-21T10:24:02Z

src/fsharp/MethodOverrides.fs

-
-                        match dispatchSlots  |> List.filter (fun (RequiredSlot(dispatchSlot,_)) -> OverrideImplementsDispatchSlot g amap m dispatchSlot overrideBy) with 
-                        | [] -> 
+                        if dispatchSlots |> List.exists (fun (RequiredSlot(dispatchSlot,_)) -> OverrideImplementsDispatchSlot g amap m dispatchSlot overrideBy) then


I think this is wrong! There should be a |> not.

Can we bind this to a variable? The if condition is super long.
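The inversion being pointed out can be sketched in isolation. In the snippet below, `implements`, the integer slots, and the returned strings are hypothetical placeholders for `OverrideImplementsDispatchSlot` and the real error-reporting branches; only the shape of the rewrite is taken from the diff. Matching `List.filter` against `[]` selects the case where nothing matches, so the `List.exists` version needs a negation to pick the same branch.

```fsharp
// Original shape: the [] branch runs when no slot satisfies the predicate.
let checkViaFilter (implements: int -> bool) (slots: int list) =
    match slots |> List.filter implements with
    | [] -> "no slot implemented"
    | _ -> "some slot implemented"

// Rewritten shape: List.exists avoids building the filtered list,
// but the condition must be negated to reach the same branch.
let checkViaExists (implements: int -> bool) (slots: int list) =
    // Binding the condition to a name keeps the if line short.
    let anyImplemented = slots |> List.exists implements
    if not anyImplemented then "no slot implemented"
    else "some slot implemented"

// Hypothetical predicate standing in for the real dispatch-slot check.
let isEven n = n % 2 = 0

let cases = [ []; [ 1; 3 ]; [ 1; 2 ]; [ 2; 4 ] ]

// True when both versions pick the same branch for every case.
let sameBranches =
    cases |> List.forall (fun slots -> checkViaFilter isEven slots = checkViaExists isEven slots)

printfn "filter and exists versions agree: %b" sameBranches
```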

forki · 2016-05-21T10:27:50Z

@vasily-kirichenko can you please merge vasily-kirichenko#2 - I think that will solve it

Fix bug in error reporting

forki · 2016-05-21T11:13:03Z

yay that helped.

vasily-kirichenko · 2016-05-21T14:14:51Z

I also refactored CheckDispatchSlotsAreImplemented, it was a mess.

dsyme · 2016-05-21T21:06:37Z

src/fsharp/MethodOverrides.fs

-        let res = ref true
-        let fail exn = (res := false ; if showMissingMethodsAndRaiseErrors then errorR exn)
+        let mutable res = true
+        let fail exn = (res <- false; if showMissingMethodsAndRaiseErrors then errorR exn)

        // Index the availPriorOverrides and overrides by name
        let availPriorOverridesKeyed = availPriorOverrides |> NameMultiMap.initBy (fun ov -> ov.LogicalName)
        let overridesKeyed = overrides |> NameMultiMap.initBy (fun ov -> ov.LogicalName)



@vasily-kirichenko Is this part below cleanup or a performance improvement? If the former, let's put it in a separate PR? If the latter, then please look for a way to minimize the diff, e.g. by locally using 2-space indentation so old/new lines match exactly, or some other technique. Thanks!

dsyme · 2016-05-21T21:08:09Z

@vasily-kirichenko This is looking good - just a couple of new comments above.

dsyme · 2016-05-21T21:10:27Z

@vasily-kirichenko @forki There's still a test failure - could you reduce the diff in the changes to CheckDispatchSlotsAreImplemented please so we can see what's changed? Or perhaps put those changes in another PR (since they seem to be causing failures)

vasily-kirichenko · 2016-05-22T10:54:23Z

all removed, all fixed


dsyme · 2016-05-23T15:45:20Z

@vasily-kirichenko Thanks. Please put 828dfab in a separate PR?

dsyme · 2016-05-23T15:46:08Z

@vasily-kirichenko Could you remove the changes in prim-lexing.fs please? I'm pretty sure they don't remove any allocations. (If they do then let's discuss further, there must be something I'm missing). Thanks!!

vasily-kirichenko · 2016-05-23T17:05:41Z

I tried to rebase with squash, tried to push it as a new branch, but GitHub does not seem to pick up the changes. I've given up on all this. Frankly, this whole allocations story causes only trouble, with literally no performance improvements.

smoothdeveloper · 2016-05-23T17:17:20Z

@dsyme would it be possible to get a set of guidelines on benchmarking code changes in the compiler, and also on benchmarking pieces of the compiler in isolation (kind of like one would do in a unit test setup)?

I spent significant time yesterday trying to work on #343 and trying to benchmark runs of fsi.exe, but the variation in runtime and the overhead (benchmarking the whole roundtrip rather than the code change itself) made my attempt a waste of effort.

I'm sure all the great F# hackers who contributed to the compiler have a wealth of knowledge they could maybe share in an informal way in a wiki page or markdown file in this repository to help others following in their footsteps.

We also have people on the Slack channel sharing a few hints about how to use third-party tools (like dotTrace) or what to test against, but this is only an ad hoc and "not as informed as ideally" kind of support.

Also, it would be amazing to have benchmarking environment for fsharp, see what the chaps at xamarin have done:

http://open.xamarin.com/benchmarker/front-end/

Also @ Mozilla: https://arewefastyet.com/

forki · 2016-05-23T17:28:46Z

Vasily, GitHub is having serious trouble right now and all kinds of things are failing.

vasily-kirichenko · 2016-05-23T17:50:30Z

OK, I've managed to open a new PR; it looks like all the good changes are there.

#1215

smoothdeveloper · 2016-05-23T18:44:10Z

I like the title of the new PR btw, seems pretty accurate :)

msftclas added the cla-already-signed label May 19, 2016

smoothdeveloper reviewed May 19, 2016

vasily-kirichenko force-pushed the optimizations branch from 92133d3 to f95ad73 Compare May 19, 2016 18:48

dsyme reviewed May 19, 2016

vasily-kirichenko closed this May 20, 2016

dsyme reopened this May 20, 2016

msftclas added the cla-already-signed label May 20, 2016

forki reviewed May 20, 2016

vasily-kirichenko force-pushed the optimizations branch from f95ad73 to 678d463 Compare May 21, 2016 07:00

vasily-kirichenko force-pushed the optimizations branch from 678d463 to a23a52c Compare May 21, 2016 07:14

vasily-kirichenko and others added 3 commits May 21, 2016 10:14

reduce allocations

a23a52c

Remove tuples in primValRefEq

32673e4

Merge pull request #1 from forki/optimizations

5757164

Reduce allocations futher

forki reviewed May 21, 2016

Fix bug in error reporting

c6be69d

Merge pull request #2 from forki/tuplealloc

2f9a55b

Fix bug in error reporting

refactor CheckDispatchSlotsAreImplemented

453c95d

dsyme reviewed May 21, 2016

vasily-kirichenko added 2 commits May 22, 2016 13:51

rollback refactor CheckDispatchSlotsAreImplemented

955a726

fix condition

417e75d

change fsc.exe GCLatencyMode to LowLatency as it showed better performance than Batch

828dfab

vasily-kirichenko closed this May 23, 2016

vasily-kirichenko mentioned this pull request May 23, 2016

Reduce memory allocations #1215

Merged
