
GetMemberTypeInFSharpForm indirectly produces a large amount of Tuple instantiations, causing UI delays in VS #5938

Closed
davkean opened this issue Nov 20, 2018 · 24 comments

Comments

@davkean
Member

davkean commented Nov 20, 2018

Trace: https://developercommunity.visualstudio.com/content/problem/245786/slow-f-editing-experience-up-to-a-minute-until-typ.html

GC Rollup By Generation

| Gen | Count | MaxPause | MaxPeak MB | Max Alloc MB/sec | TotalPause | TotalAlloc MB | Alloc MB/MSec GC | Survived MB/MSec GC | MeanPause | Induced |
|-----|-------|----------|------------|------------------|------------|---------------|------------------|---------------------|-----------|---------|
| ALL | 5196 | 1,652.0 | 744.6 | 8,424.125 | 40,245.7 | 29,166.1 | 0.7 | 1.026 | 7.7 | 18 |
| 0 | 3869 | 886.0 | 744.6 | 8,424.125 | 14,865.1 | 20,456.4 | 0.5 | 0.083 | 3.8 | 0 |
| 1 | 1286 | 53.5 | 744.6 | 3,284.108 | 17,473.7 | 8,524.0 | 0.6 | 0.746 | 13.6 | 0 |
| 2 | 41 | 1,652.0 | 744.6 | 502.652 | 7,906.8 | 185.7 | 0.3 | 7.355 | 192.8 | 18 |

This method indirectly causes a large amount of allocations; in the above trace we're talking about 6 GB:

[image: allocation profile]

These are made up almost entirely of Tuples or lists of Tuples:

[image: type breakdown of the allocations]

I don't know enough about this method to know whether this needs to be broken up further into smaller, more actionable tasks.

@forki
Contributor

forki commented Nov 20, 2018

From a first look, GetTopTauTypeInFSharpForm looks OK.
Would it make sense to use value tuples? They would still be in lists, so I guess not!?

@cartermp cartermp added this to the 16.0 milestone Nov 20, 2018
@TIHan
Contributor

TIHan commented Nov 22, 2018

To note, just from experience: I have tried changing tuples like these to value tuples, and the results were minimal. Copying value tuples can be more expensive, so it doesn't really help much. The benefit is that it relieves GC pressure, but it's hard to decide: more copying with less GC pressure, or no copying with more GC pressure? It depends on the scenario.
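To make the trade-off being discussed concrete, here is a minimal illustration (not compiler code) of the two tuple forms in F#. A reference tuple is a heap object the GC must track; a struct tuple lives inline and is copied on every assignment.

```fsharp
// Reference tuple: a heap-allocated System.Tuple the GC has to collect.
let refPair : int * int = (1, 2)

// Struct tuple: no GC object, but copied by value when passed around.
let structPair : struct (int * int) = struct (1, 2)

// Both forms pattern-match the same way:
let sumRef (a, b) = a + b
let sumStruct (struct (a, b)) = a + b

printfn "%d %d" (sumRef refPair) (sumStruct structPair)
```

Note that, as pointed out above, storing struct tuples in an F# list still allocates one cons cell per element, so at best this removes one of the two allocations per item rather than all of them.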

@davkean
Member Author

davkean commented Nov 22, 2018

I would evaluate why we are allocating this much data in such a short period. This is 6 GB (only 20% of the trace!) of data over a couple of minutes. We must be missing caches, not caching enough, or we need a better algorithm.

For comparison, I'm looking at a trace of a solution load with 780 C# projects plus the 30 seconds after load; it allocates 2 GB in total over that entire period, and I'm filing bugs to remove 30 MB here and 50 MB there.

@forki
Contributor

forki commented Nov 22, 2018 via email

@gerardtoconnor

@davkean There are a lot of allocations due to heavy use of linked lists, tuples, and immutable AVL tree (map) construction throughout (including DUs); in general, an overuse of temporary (bulky) data structures. This is consistent throughout, so it would be a lot of work to change the fundamental design by introducing more algorithms and mutable state/structures. Your analysis will hopefully uncover some memory bugs that can be easily fixed or improved, as we clearly need to reduce the heavy memory bloat.

One thing that comes to mind about these tuple lists is that they could be converted to enumerators, avoiding loads of temporary lists, so that only a few persisted structures need to be kept in memory.

@cartermp
Contributor

We'll need to spend more time looking at this particular trace to see if there's an easy win here or not. For example, the function in the issue title just directly uses the results of [this function](https://github.com/Microsoft/visualfsharp/blob/master/src/fsharp/TastOps.fs#L1566), so maybe there's more investigative work involved.

One of the goals of doing this analysis work is identifying if there is an easy win (e.g., #5941) vs. something that is fundamental to the design.

@gerardtoconnor

I can look into doing a PR for this if you like. If you do not want drastic changes, then I can change the usages of GetTopValTypeInFSharpForm, splitting it out into multiple more specialised functions; at the moment we use it about 8 times like a sledgehammer on drawing pins (e.g., we transform full lists of args etc. into new lists of stuff … just to see if the list is empty!?). Given these are called multiple times in different places, potentially with the same value, it might be a candidate for caching, but we should profile that first, as the calls may be unique and a cache would just clog memory up worse.

@gerardtoconnor

gerardtoconnor commented Nov 23, 2018

An example of my proposal is as follows: for each method calling the GetTopValTypeInFSharpForm / GetTopTauTypeInFSharpForm functions (and similar), here IsCompiledAsStaticProperty, I inline the calls and short-circuit the logic to achieve the same result. So the current definition of

```fsharp
let IsCompiledAsStaticProperty g (v:Val) =
    match v.ValReprInfo with
    | Some valReprInfoValue ->
         match GetTopValTypeInFSharpForm g valReprInfoValue v.Type v.Range with
         | [], [], _, _ when not v.IsMember -> true
         | _ -> false
    | _ -> false
```

becomes my new refactored version:

```fsharp
let IsCompiledAsStaticProperty g (v:Val) =
    if v.IsMember then
        false
    else
        match v.ValReprInfo with
        | Some (ValReprInfo (ntps, argInfos, _)) ->
            match ntps, argInfos with
            | [], [] -> isForallTy g v.Type |> not
            | _ -> false
        | _ -> false
```

Although it looks the same, it computes and allocates significantly less. I traced how the lists map, and this achieves the same result; all the mappings inside GetTopValTypeInFSharpForm are not needed in this instance. I can refactor and reduce each of the usages of these functions so that the extra work for data that is never needed is not computed.

@cartermp
Contributor

@gerardtoconnor We'd certainly welcome help! I think the main thing is establishing a good perf test. A good first start could be to use this: https://github.com/Microsoft/visualfsharp/blob/master/tests/scripts/compiler-perf.fsx

@gerardtoconnor

@cartermp are the compiler perf tests not already operational with this script? I was going to do the PR and include "[CompilerPerf]" in the title so that I (we) could then run a comparison against master to verify it had produced an improvement.

My example above turned out to be the simplest case: given there were no list reductions, only mappings, we could match empty on the source lists without mapping. The rest are trickier, and it seems we may need a caching/algorithm strategy. The ValReprInfo type may be a candidate for structural tweaks with a cache graph; I will investigate and let you know if I find anything useful.

@cartermp
Contributor

Sorry, I should have clarified: we'll likely want to verify memory usage in a scenario that stresses these code paths. I don't think the existing scripts do this, but that area would be where to add one.

@davkean
Member Author

davkean commented Nov 28, 2018

Before we look at making stuff faster, we should look to see whether we have a miss in an existing cache, or whether we are missing caches in general. I did some quick math to show how serious this is and why this situation isn't caused by VS being low on memory; rather, this situation is causing the low-memory condition.

Based on the data above, 326 MB/s (326 KB/ms) was surviving collection. This is either because allocation is faster than the GC can collect (I have no idea of the limits here) or because the data is still being rooted (looking at the dump will probably confirm this). At that rate, it would take approximately 10 seconds before VS runs out of memory. The GC is absolutely forced to collect because of the sheer volume of allocations from this code path and the other code paths in that trace. This is a crazy amount of data, and it smells like we are calculating the same things over and over again.
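A quick back-of-envelope check of that 10-second figure. The address-space budget below is an assumption on my part (a 32-bit VS process with roughly 3 GB usable), not a number from the trace:

```fsharp
// Back-of-envelope check of the exhaustion estimate above.
// Assumption: devenv.exe is 32-bit with ~3 GB of usable address space.
let survivedMBperSec = 326.0               // survival rate quoted from the trace
let usableBudgetMB = 3.0 * 1024.0          // assumed address-space budget
let secondsToExhaustion = usableBudgetMB / survivedMBperSec
printfn "%.1f s to exhaustion" secondsToExhaustion   // on the order of 10 seconds
```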

@davkean
Member Author

davkean commented Nov 28, 2018

Looking at the GC heap size, it's significantly smaller than what we would expect: 700 MB versus the up-to-3 GB we see in other traces. This is definitely exacerbating the issue, but it's not the only cause; the allocation pattern is still huge. I suspect that in this case, and in another trace I saw from 15.7, we have some sort of native leak on top of this. Maybe it's tied to a managed object lifetime, so we can see allocations matching it.

The other thing is that this trace is from 15.7; given some of the changes since then, I'm not sure we can trust its data. I'd like to get traces against a newer version that match the above.

@gerardtoconnor

@davkean are you referring to CPU caching or computation caching? Either way, the compiler is using deeply nested recursive functions, some of which are not tail-call optimised, as well as mapping from list to list to list and blowing up allocations. I'm guessing the interim lists are rooted in temporary stack frames (this might explain the blow-up, then cleardown). There is no caching strategy I can see on these paths; GetMemberTypeInFSharpForm (which uses GetTopValTypeInFSharpForm) is called from 5 different places, with likely duplicated lookups on each call for the same member type, in line with your theory that things are being calculated and recalculated over and over again.

@gerardtoconnor

@cartermp ok, I will wait until these are in place before doing anything further … but from looking over the code, there is no caching, and there is abusive over-mapping of lists and allocations (linked lists allocate roughly 2x more than an array). The compiler functions really should be redesigned to use streaming functions, going from 'T list -> 'U list to 'T -> 'U ValueOption or something similar, so that nested mapping/reduction operations do not need interim temporary list structures and can instead be chained from source structure to target structure, item by item.
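A small sketch of the streaming style being proposed, using hypothetical function names rather than actual compiler code. The list-based version materialises two interim lists just to answer a yes/no question; the item-by-item version answers it with no interim structures at all:

```fsharp
// Hypothetical example: does any argument, once transformed, exceed a bound?
let hasLargeArgListBased (args: int list) =
    args
    |> List.map (fun a -> a * 2)       // allocates a full intermediate list
    |> List.filter (fun a -> a > 10)   // allocates another one
    |> List.isEmpty
    |> not

// Same question, streamed: stops at the first hit, zero interim lists.
let hasLargeArgStreaming (args: int list) =
    args |> List.exists (fun a -> a * 2 > 10)
```

The two agree on every input, but the streaming form does no allocation and can short-circuit, which is the shape of saving being suggested for these code paths.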

@gerardtoconnor

While doing my testing, I found the recursive type-stripping function was re-querying the same types over and over again. Constraint-resolver resolution prevents caching of Vars/Funs, but caching everything else works and passes the tests. I used a ConditionalWeakTable for the TType cache mapping. I want to sequence/stream the workflow away from lists and tuples if I can, so I'm still investigating that before I do the PR, but in my build of the compiler it retrieves from the cache 300k times (30k for a small app), which shows how wasteful the repeated recursive calls were.

Calling from the cache speeds things up, but as I said, I would like to try removing the heavy list usage in this workflow, as it is ballooning the memory. We may need to change ArgInfos to arrays instead of lists to compact the data usage, provided there are no mutability issues. Will test later.
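For readers unfamiliar with the type, here is a minimal sketch of the caching pattern described above, with illustrative names (`Node`, `expensiveStrip`) that are not from the actual PR. `ConditionalWeakTable` holds its keys weakly, so a cached entry dies together with the node it belongs to instead of rooting it forever:

```fsharp
open System.Runtime.CompilerServices

// Illustrative stand-in for a compiler AST node (reference type, as the
// table requires).
type Node(name: string) =
    member _.Name = name

let mutable computeCount = 0

// Stand-in for the expensive recursive type-stripping work.
let expensiveStrip (n: Node) =
    computeCount <- computeCount + 1
    n.Name.ToUpperInvariant()

let cache = ConditionalWeakTable<Node, string>()

// GetValue runs the callback only on a cache miss.
let stripCached (n: Node) =
    cache.GetValue(n, fun node -> expensiveStrip node)

let node = Node "tau"
let a = stripCached node   // computes once
let b = stripCached node   // served from the cache; computeCount stays at 1
```

The design point is that, unlike a `Dictionary`-based memo table, this never extends the lifetime of the keys, which matters when caching per-`TType` results inside a long-lived compiler process.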

@forki
Contributor

forki commented Dec 13, 2018 via email

@abelbraaksma
Contributor

@gerardtoconnor, the original trace was mine, and while the project is closed source, if you need help testing I can try certain fixes and redo the trace to see if it helps. Also, I can share the code privately (it's a relatively straightforward project structure with no weird msbuild steps; it should load and compile out of the box); ping me if you think this would be helpful in any way.

I'm still having daily issues with VS getting sluggish, though subjectively there's been some improvement over the last couple of versions of the F# stack. Noticeable because I need to restart VS a bit less than once an hour now ;).

@gerardtoconnor

@abelbraaksma I have the caching code already done; I can prioritise this if needed. Are you able to pull down a branch, if I post one, to see whether it's improved? Also, you mentioned you posted the trace; I couldn't see it, can you share a link?

@abelbraaksma
Contributor

Most traces that @davkean posted were originally posted by me on the Visual Studio forum linked at the top of this post, including this one, so I think you already have them and are actually working with them ;). I think the original traces are only visible to Microsoft employees; I could send them privately if that's helpful.

@abelbraaksma
Contributor

And yes, I assume I should be able to pull a branch. It's been a while since I did it, but it can't be that hard, and I assume the process is still mostly the same.

@gerardtoconnor

The details above are enough for now; I'm registering big CPU times on these functions just from editing the compiler. Will try to have the PR up tomorrow.

@gerardtoconnor

I have added basic caching on this path in two places in PR #6202, and the tests pass. I had to roll back most of the changes, as there were failing tests I just didn't have time to investigate and fix.

Given this issue relates to memory bloat during VS usage, how do we want to test?

@dsyme
Contributor

dsyme commented Mar 31, 2022

I'm closing out this specific performance tracking issue, as a lot of work was done on it back in 2019.

@dsyme dsyme closed this as completed Mar 31, 2022