Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sequentialize GetAllUsesOfAllSymolsInFile #10357

Merged
merged 11 commits into from
Nov 6, 2020

Conversation

cartermp
Copy link
Contributor

@cartermp cartermp commented Oct 30, 2020

Some cleanup + a slightly different way to get symbols yielded some minor improvements as per the benchmark that comes with these changes.

This sequentializes GetAllUsesOfAllSymbols. Moving to a sequence instead of a generated array appears to give, overall, some improvements as measured by a benchmark in the three primary routines in Visual Studio that use this API.

CPU time:

Routine Current mean time New mean time Result
Simplify Names 29,670.7 us 30,076.3 us 1.37% slower
Unused Opens 1,408.4 us 1,320.1 us 4.52% faster
Unused Declarations 730.9 us 250.3 us 65.7% faster

Memory Usage:

Routine Current memory usage New memory usage Result
Simplify Names 16263.42 KB 16085.83 1.1% less memory
Unused Opens 903.31 KB 876.64 KB 3% less memory
Unused Declarations 749.99 KB 450.82 KB 40% less memory

Since Unused Opens and Unused Declarations run by default after every time we check a document (which is all the damn time), this should improve things a bit. Simplify Names is slightly slower, but the memory usage is also slightly better, and it's off by default. Frankly, Simplify Names needs to be re-architected anyways.

One thing that became apparent to me when writing a first pass of this with GetAllUsesOfAllSymbolsInFileByPredicate is that these kinds of symbol operations aren't working with the right data. Every time this is called we traverse a big set of TcSymbolUseData [] [] and construct FSharpSymbols each time. The FSharpSymbol construction is expensive and we need to do it for each item until we get to filter it out. The API could change to work off of the Item type, but that would be more complicated and only serve to further push Name Resolution data structures out further into the service layers, which I'd personally like to avoid.

I wonder if it would be better to keep around a pool of FSharpSymbols constructed from the TcSymbolUseData instead. That way subsequent calls to this method (and others like it) won't unnecessarily re-create FSharpSymbols all the time. I imagine it might be tricky keeping that resident in memory and "smart" so that we're not re-creating everything all the time, though.

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.19042
Intel Core i7-6700K CPU 4.00GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
  [Host]     : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit LegacyJIT-v4.8.4250.0 DEBUG
  DefaultJob : .NET Framework 4.7.2 (CLR 4.0.30319.42000), 64bit RyuJIT-v4.8.4250.0

Current

Method Mean Error StdDev Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
SimplifyNames 29,670.7 us 376.91 us 334.12 us 3906.2500 125.0000 - 16263.42 KB
UnusedOpens 1,408.4 us 17.34 us 15.37 us 193.3594 56.6406 - 903.31 KB
UnusedDeclarations 730.9 us 16.90 us 26.31 us 169.9219 48.8281 - 749.99 KB

Modified

Method Mean Error StdDev Median Gen 0/1k Op Gen 1/1k Op Gen 2/1k Op Allocated Memory/Op
SimplifyNames 30,076.3 us 561.798 us 498.019 us 29,994.2 us 3906.2500 - - 16085.83 KB
UnusedOpens 1,344.6 us 25.686 us 24.027 us 1,337.6 us 212.8906 1.9531 - 876.64 KB
UnusedDeclarations 250.3 us 4.997 us 9.864 us 245.5 us 109.8633 - - 450.82 KB

@cartermp cartermp requested review from dsyme and TIHan October 30, 2020 06:55
Copy link
Contributor

@dsyme dsyme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly coding style but I do think we should look at just making GetAllUsesOfAllSymbolsInFile return a seq { ... } as that is more compositional with separation of concerns. I can help draft that if you like.

Great to see this being driven by benchmarking

Copy link
Contributor

@dsyme dsyme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Sorry meant to mark that as request changes)

@cartermp cartermp changed the title Minor perf improvements to ServiceAnalysis functionality Perf improvements to ServiceAnalysis functionality Oct 30, 2020
@cartermp cartermp changed the title Perf improvements to ServiceAnalysis functionality Sequentialize GetAllUsesOfAllSymols Oct 30, 2020
@cartermp
Copy link
Contributor Author

@dsyme this is ready for review again. Sequentializing GetAllUsesOfAllSymbols offers a substantial perf benefit for finding all unused declarations, a marginal benefit for finding all unused opens, and a marginal detriment to name simplification (which frankly needs a complete re-think anyways). Memory usage is also down across the board.

@cartermp cartermp requested a review from KevinRansom October 31, 2020 00:17
@cartermp
Copy link
Contributor Author

cartermp commented Oct 31, 2020

Ergh, I regressed unused opens. Investigating... -- done

@cartermp cartermp requested a review from dsyme October 31, 2020 00:45
@cartermp cartermp changed the title Sequentialize GetAllUsesOfAllSymols Sequentialize GetAllUsesOfAllSymolsInFile Oct 31, 2020
@cartermp
Copy link
Contributor Author

cartermp commented Nov 2, 2020

FYI @auduchinok and @baronfel as an FYI, this is a breaking change to FCS

@auduchinok
Copy link
Member

auduchinok commented Nov 5, 2020

@cartermp Thanks for letting me know!

Related thing: I'm going to revisit this PR about scoping captured types, and the next goal for me there is to try to do the same for symbols, so range-based chunks could be processed independently.

Copy link
Contributor

@TIHan TIHan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice improvement, awesome

// For the rest of symbols we pick only those which are the first part of a long ident, because it's they which are
// contained in opened namespaces / modules. For example, we pick `IO` from long ident `IO.File.OpenWrite` because
// it's `open System` which really brings it into scope.
let partialName = QuickParse.GetPartialLongNameEx (getSourceLineStr su.RangeAlternate.StartLine, su.RangeAlternate.EndColumn - 1)
List.isEmpty partialName.QualifyingIdents)
|> Array.ofSeq
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess ideally the consumption points where we consume these very long and expeensive sequences should always use a cancellation token, e.g. a new thing |> Array.toSeqCancellable token that drives the iterator in a loop and checks the cancellation at each step.

It normally doesn't matter but for these sequences I think it might?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the operation isn't that time-consuming. It's the for loop that follows that is wild. That one gets all declaration items for each symbol as it processes them, which is the bulk of CPU time.

Doing a Array.toSeqCancellable token could maybe speed things up but it wouldn't do much I think.

Copy link
Contributor

@dsyme dsyme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, fantastic work!. Just that one comment, I don't think it's a blocking issue.

@cartermp cartermp merged commit b0fac9d into dotnet:main Nov 6, 2020
@cartermp cartermp deleted the analysis-updates branch November 6, 2020 03:23
@dsyme
Copy link
Contributor

dsyme commented Nov 6, 2020

I wonder if it would be better to keep around a pool of FSharpSymbols constructed from the TcSymbolUseData instead. That way subsequent calls to this method (and others like it) won't unnecessarily re-create FSharpSymbols all the time. I imagine it might be tricky keeping that resident in memory and "smart" so that we're not re-creating everything all the time, though.

The main problem with this is that it breaks the current layering in the compiler - where FSharpSymbol is used for "public" views and Item etc. is used internally

This is a common problem when trying to reduce allocations - the objects for exterior wrappers need to be pushed down a layer.

An alternative is to simply use the exterior wrappers within more and more of your codebase. We could gradually move to doing that, pushing "symbols.fs/fsi" down to just after TypedTree.fs and then using them. They are however a little heavier as they capture a number of things (TcGlobals etc.) that normally get passed around or picked up from some other context

@auduchinok
Copy link
Member

auduchinok commented Nov 6, 2020

The main problem with this is that it breaks the current layering in the compiler - where FSharpSymbol is used for "public" views and Item etc. is used internally

As someone who uses FSharpSymbol all the time (unlike, say, FSharpExpr that we don't use at all), I often wish I was able to go down the level, since we're not only accessing symbols but also getting lot of info about each such symbol in many places, making even more objects we don't need right afterwards (consider getting curried parameter groups on mfv when all we need is groups count in some place). I'm OK with it being more difficult to work with and possibly changing often, but we got used to it after working with SyntaxTree over the last few years. It's a good tradeoff for being able to get some symbol info efficiently. FSharpSymbol sometimes also feels limited compared to what the compiler has internally.

I'd also be happy to use the internal helpers for reasoning about code instead of relatively few ones that work with FSharpSymbol.

Making it possible to work with internals would probably make it easier to find places to contribute to, since it'd make it exposed to more people than only those who work on the compiler internals often.

nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Feb 23, 2021
nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Jan 26, 2022
nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Jan 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants