Sequentialize GetAllUsesOfAllSymbolsInFile #10357
Mainly coding style, but I do think we should look at just making `GetAllUsesOfAllSymbolsInFile` return a `seq { ... }`, as that is more compositional with separation of concerns. I can help draft that if you like.
Great to see this being driven by benchmarking
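To illustrate the `seq { ... }` suggestion, here is a minimal sketch of why a lazy sequence composes better than an eagerly built array. The `SymbolUse` record, `makeSymbolUse`, `getAllUsesArray` and `getAllUsesSeq` are hypothetical stand-ins invented for this example; they are not the FCS types or the actual `GetAllUsesOfAllSymbolsInFile` implementation.

```fsharp
// Hypothetical stand-in for an expensive-to-construct symbol use.
type SymbolUse = { Name: string; Line: int }

// Counts how many items were actually constructed.
let constructed = ref 0

let makeSymbolUse (name: string) (line: int) =
    constructed.Value <- constructed.Value + 1
    { Name = name; Line = line }

// Eager version: every item is built before the caller sees any of them.
let getAllUsesArray (names: string[]) : SymbolUse[] =
    names |> Array.mapi (fun i n -> makeSymbolUse n (i + 1))

// Lazy version: items are built one at a time, only as the caller pulls them.
let getAllUsesSeq (names: string[]) : seq<SymbolUse> =
    seq {
        for i in 0 .. names.Length - 1 do
            yield makeSymbolUse names.[i] (i + 1)
    }

let names = [| "System"; "IO"; "File"; "OpenWrite" |]

// The array version constructs all four items even though we stop at "IO".
constructed.Value <- 0
let firstEager = getAllUsesArray names |> Array.find (fun su -> su.Name = "IO")
let eagerCount = constructed.Value

// The sequence version stops constructing as soon as the match is found.
constructed.Value <- 0
let firstLazy = getAllUsesSeq names |> Seq.find (fun su -> su.Name = "IO")
let lazyCount = constructed.Value
```

In this sketch the eager path constructs all four items while the lazy path constructs only two, which is the kind of saving a consumer gets when it filters or short-circuits. Whether the real API sees a comparable benefit depends on how its callers consume the result.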
(Sorry meant to mark that as request changes)
@dsyme this is ready for review again. Sequentializing
FYI @auduchinok and @baronfel: this is a breaking change to FCS.
This looks like a nice improvement, awesome
// For the rest of symbols we pick only those which are the first part of a long ident, because it's they which are
// contained in opened namespaces / modules. For example, we pick `IO` from long ident `IO.File.OpenWrite` because
// it's `open System` which really brings it into scope.
let partialName = QuickParse.GetPartialLongNameEx (getSourceLineStr su.RangeAlternate.StartLine, su.RangeAlternate.EndColumn - 1)
List.isEmpty partialName.QualifyingIdents)
|> Array.ofSeq
I guess ideally the points where we consume these very long and expensive sequences should always use a cancellation token, e.g. a new thing, `|> Array.toSeqCancellable token`, that drives the iterator in a loop and checks for cancellation at each step.
It normally doesn't matter, but for these sequences I think it might?
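A rough sketch of what such a helper could look like follows. `toSeqCancellable` is a hypothetical function written for this example (nothing by that name exists in FSharp.Core), and the infinite source merely stands in for a long, expensive sequence of symbol uses.

```fsharp
open System
open System.Threading

// Hypothetical helper: re-yields the source, checking the token before each element.
let toSeqCancellable (token: CancellationToken) (source: seq<'T>) : seq<'T> =
    seq {
        for item in source do
            token.ThrowIfCancellationRequested()
            yield item
    }

let cts = new CancellationTokenSource()

// Stand-in for a long and expensive sequence.
let source = Seq.initInfinite id

let consumed = ResizeArray<int>()

// Consume until cancellation is requested; the wrapper throws on the next step.
let wasCancelled =
    try
        for x in toSeqCancellable cts.Token source do
            consumed.Add x
            if x = 2 then cts.Cancel()
        false
    with :? OperationCanceledException -> true
```

The check happens once per element, so the consumer stops within one step of the token being cancelled. As noted in the reply, this only helps when pulling items is itself the slow part; work done per item by the consumer would need its own cancellation checks.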
In this case, the operation isn't that time-consuming. It's the `for` loop that follows that is wild. That one gets all declaration items for each symbol as it processes them, which is the bulk of CPU time.
Doing an `Array.toSeqCancellable token` could maybe speed things up, but it wouldn't do much I think.
Looks good, fantastic work! Just that one comment; I don't think it's a blocking issue.
The main problem with this is that it breaks the current layering in the compiler, where FSharpSymbol is used for "public" views and Item etc. is used internally. This is a common problem when trying to reduce allocations: the objects for exterior wrappers need to be pushed down a layer. An alternative is to simply use the exterior wrappers within more and more of your codebase. We could gradually move to doing that, pushing "symbols.fs/fsi" down to just after TypedTree.fs and then using them. They are, however, a little heavier, as they capture a number of things (TcGlobals etc.) that normally get passed around or picked up from some other context.
As someone who uses I'd also be happy to use the internal helpers for reasoning about code instead of relatively few ones that work with Making it possible to work with internals would probably make it easier to find places to contribute to, since it'd make it exposed to more people than only those who work on the compiler internals often.
Some cleanup + a slightly different way to get symbols yielded some minor improvements as per the benchmark that comes with these changes.
This sequentializes `GetAllUsesOfAllSymbols`. Moving to a sequence instead of a generated array appears to give, overall, some improvements as measured by a benchmark in the three primary routines in Visual Studio that use this API.
CPU time:
Memory Usage:
Since Unused Opens and Unused Declarations run by default after every time we check a document (which is all the damn time), this should improve things a bit. Simplify Names is slightly slower, but the memory usage is also slightly better, and it's off by default. Frankly, Simplify Names needs to be re-architected anyways.
One thing that became apparent to me when writing a first pass of this with `GetAllUsesOfAllSymbolsInFileByPredicate` is that these kinds of symbol operations aren't working with the right data. Every time this is called we traverse a big set of `TcSymbolUseData [] []` and construct `FSharpSymbol`s each time. The `FSharpSymbol` construction is expensive and we need to do it for each item until we get to filter it out. The API could change to work off of the `Item` type, but that would be more complicated and only serve to further push Name Resolution data structures out further into the service layers, which I'd personally like to avoid.
I wonder if it would be better to keep around a pool of `FSharpSymbol`s constructed from the `TcSymbolUseData` instead. That way subsequent calls to this method (and others like it) won't unnecessarily re-create `FSharpSymbol`s all the time. I imagine it might be tricky keeping that resident in memory and "smart" so that we're not re-creating everything all the time, though.
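The pooling idea could be sketched roughly as below. Everything here is hypothetical and invented for illustration: `RawUse` stands in for the raw symbol-use data, `Symbol` for the expensive public wrapper, and `SymbolPool` is not an existing FCS type. The sketch also assumes each raw item has a stable integer key, which the real data may not provide, and it does nothing about invalidation or thread safety, which is where the tricky part mentioned above would live.

```fsharp
open System.Collections.Generic

// Hypothetical stand-ins for the raw data and the expensive wrapper.
type RawUse = { Id: int; Name: string }
type Symbol = { DisplayName: string }

// Caches wrappers by key so repeated calls do not rebuild them.
type SymbolPool(construct: RawUse -> Symbol) =
    let cache = Dictionary<int, Symbol>()
    let mutable constructions = 0

    // Number of times the expensive constructor actually ran.
    member this.Constructions = constructions

    // Returns the cached wrapper, constructing it only on first request.
    member this.Get(raw: RawUse) : Symbol =
        match cache.TryGetValue raw.Id with
        | true, symbol -> symbol
        | false, _ ->
            let symbol = construct raw
            constructions <- constructions + 1
            cache.[raw.Id] <- symbol
            symbol

let pool = SymbolPool(fun raw -> { DisplayName = raw.Name })
let raws = [| { Id = 1; Name = "IO" }; { Id = 2; Name = "File" } |]

// Two passes over the same data, as repeated analyzer runs would do.
let firstPass = raws |> Array.map (fun raw -> pool.Get raw)
let secondPass = raws |> Array.map (fun raw -> pool.Get raw)
let totalConstructions = pool.Constructions
```

In this sketch the second pass reuses the wrappers built by the first, so the constructor runs once per distinct key rather than once per call. The trade-off is that the pool keeps those objects resident in memory until something clears it.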