
x/tools/go/types/objectpath: frequent use of scope.Names burns significant cycles #51017

Closed
amscanne opened this issue Feb 4, 2022 · 16 comments
Labels: FrozenDueToAge, NeedsDecision, Performance
Milestone: Backlog

Comments

@amscanne (Contributor)

amscanne commented Feb 4, 2022

What version of Go are you using (go version)?

go1.18-pre10

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

Linux, amd64. Not dependent on OS or architecture, however.

What did you do?

The objectpath.For function relies on frequent calls to scope.Names(), which sorts all names on every call. For large-scale analysis trees, this can consume a significant portion of cycles (30-50% in my case), and the result can easily be memoized. This could be done either by objectpath as a consumer of the API, or within the *types.Scope object itself.

What did you expect to see?

Fewer cycles spent sorting.

What did you see instead?

Lots of cycles spent sorting.
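
For illustration, a minimal caller-side sketch of the memoization suggested above; the package and helper names here are hypothetical, not part of go/types or objectpath:

```go
// Hypothetical caller-side cache: each *types.Scope is sorted at most
// once, instead of on every call to Scope.Names.
package scopecache

import (
	"go/types"
	"sync"
)

var (
	mu    sync.Mutex
	cache = make(map[*types.Scope][]string)
)

// Names returns the sorted names of s, computing them only on the
// first call for a given scope.
func Names(s *types.Scope) []string {
	mu.Lock()
	defer mu.Unlock()
	if names, ok := cache[s]; ok {
		return names
	}
	names := s.Names() // allocates and sorts; now done once per scope
	cache[s] = names
	return names
}
```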

@gopherbot (Contributor)

Change https://golang.org/cl/383075 mentions this issue: go/types: memoize scope.Names() result

@findleyr (Contributor)

findleyr commented Feb 4, 2022

Thanks for filing.

Need to think about this. I'll just note that in the profile linked on that CL, there is also a significant amount of time spent in Scope.Lookup. But objectpath just needs to walk all the objects, and doesn't care about sorting. I wonder if we should be considering an entirely new Scope API... Or maybe that is further evidence that it would be better for us to build a layer in objectpath that can significantly optimize the calculation of paths. Compare x/tools/go/ast/inspector.
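
For reference, this is the amortization pattern that x/tools/go/ast/inspector uses: one up-front pass over the ASTs, after which many queries are cheap. It is shown only to illustrate the kind of API being suggested for objectpath:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"

	"golang.org/x/tools/go/ast/inspector"
)

func main() {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "x.go", "package x; func A() {}; func B() {}", 0)
	if err != nil {
		panic(err)
	}

	// Pay the traversal cost once...
	in := inspector.New([]*ast.File{f})

	// ...then serve any number of queries cheaply from the precomputed index.
	n := 0
	in.Preorder([]ast.Node{(*ast.FuncDecl)(nil)}, func(ast.Node) { n++ })
	fmt.Println("func decls:", n) // prints: func decls: 2
}
```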

@cherrymui cherrymui added Performance NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Feb 8, 2022
@cherrymui cherrymui added this to the Backlog milestone Feb 8, 2022
@gopherbot (Contributor)

Change https://go.dev/cl/404514 mentions this issue: go/types/objectpath: implement fast path for concrete methods

@gopherbot (Contributor)

Change https://go.dev/cl/405354 mentions this issue: go/types/objectpath: cache repeated queries (WiP)

gopherbot pushed a commit to golang/tools that referenced this issue May 13, 2022
Don't search for a matching method on all types in the package scope
when we can trivially determine the type's name. This avoids both a
linear search over the scope and the currently rather costly call to
scope.Names itself.

Updates golang/go#51017

Change-Id: I614f4b1b149d6b42728d3f22f5e47e8ff87a5392
Reviewed-on: https://go-review.googlesource.com/c/tools/+/404514
Run-TryBot: Alan Donovan <adonovan@google.com>
gopls-CI: kokoro <noreply+kokoro@google.com>
Run-TryBot: Dominik Honnef <dominik@honnef.co>
Reviewed-by: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@findleyr findleyr added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 9, 2022
@findleyr findleyr modified the milestones: Backlog, Go1.20 Sep 9, 2022
@findleyr (Contributor)

findleyr commented Sep 9, 2022

@griesemer @alandonovan what do you think about this change? https://go.dev/cl/383075 appears to be stuck in limbo, and it would be good to make a decision for 1.20.

Being conservative, I am inclined to say that changing scope.Names to return a memoized slice is a breaking change, may increase steady-state memory, and is inconsistent with the rest of the go/types API, which goes to lengths to avoid exposing internal data (consider Struct.NumFields, etc.). If the problem is mostly isolated to usage of objectpath, we could expose a batch or memoizing API.

But I don't feel strongly. If it is most expedient to save sorted names on the scope, in reality it is probably harmless.

@adonovan (Member)

adonovan commented Sep 9, 2022

The actual code review hasn't addressed all the comments (e.g. re: atomicity) required for +2, but perhaps that's just because the author doesn't know if the design is worth pursuing.

I agree that if the problem is really just in objectpath, then that's a safer place for the fix: perhaps objectpath could compute the paths for the entire package more cheaply in a single pass, then serve a large number of queries on that---like the AST inspector. Even if we don't implement all that at once, the same API could take the first step of memoizing scope.Names() across multiple calls, but it seems likely there are even bigger potential wins if we take that approach.

@findleyr (Contributor)

findleyr commented Sep 9, 2022

> The actual code review hasn't addressed all the comments (e.g. re: atomicity) required for +2, but perhaps that's just because the author doesn't know if the design is worth pursuing.

I assumed as much. We did not resolve the viability of the approach in that CL.

> I agree that if the problem is really just in objectpath, then that's a safer place for the fix: perhaps objectpath could compute the paths for the entire package more cheaply in a single pass, then serve a large number of queries on that---like the AST inspector. Even if we don't implement all that at once, the same API could take the first step of memoizing scope.Names() across multiple calls, but it seems likely there are even bigger potential wins if we take that approach.

Yes, that is approximately the API I had in mind, and the AST inspector is a good example. Agree that there are probably other potential optimizations in that approach.

@timothy-king (Contributor)

https://go-review.git.corp.google.com/c/tools/+/405354 (WiP for memoizing) was stuck due to not having a clear performance justification yet.

@findleyr (Contributor)

@griesemer and I don't think we want to make this change in go/types, and would prefer improving the objectpath API. Even if we pre-sorted, we'd still want to make a copy and so would lose some of the benefit of memoizing. Moving out of the Go1.20 milestone.

@findleyr findleyr modified the milestones: Go1.20, Backlog Nov 10, 2022
@findleyr findleyr changed the title go/types, x/tools/go/types/objectpath: frequent use of scope.Names burns significant cycles x/tools/go/types/objectpath: frequent use of scope.Names burns significant cycles Nov 10, 2022
@mvdan (Member)

mvdan commented Dec 20, 2022

I wrote an overly simplistic form of objectpath.For a while ago, which you can see here: https://github.com/burrowers/garble/blob/98466a1f64dfc9a3b9d4bb49dcdb2b3a5ff5f245/main.go#L1533

I tried to switch to objectpath, as it's clearly more complete and I think it might fix two bugs I've been struggling with. However, re-running go test promptly crashed my computer by running out of memory :) That was, of course, due to all the Scope.Names() calls.

In my use case, the tool calls objectpath.For for almost every object reachable from a package. I know that's perhaps the worst-case scenario, but I don't think I could avoid the majority of those calls. I essentially need to record and retrieve a "fact" about every exported name in a package: whether or not its name was changed.

I don't think I can use objectpath right now, but I'd be more than happy to help review or test any improvement or new API in objectpath itself. Memoization or a batch API are likely to make a big difference.
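
For context, a sketch of the usage pattern described above (one objectpath.For call per exported package-level object), which is what makes the per-call cost of Scope.Names compound; the recordPaths helper and package name are hypothetical, not garble's actual code:

```go
package factscan

import (
	"go/types"

	"golang.org/x/tools/go/types/objectpath"
)

// recordPaths computes an object path for every exported package-level
// object. Each objectpath.For call may re-sort the package scope
// internally, which is what makes this pattern so expensive today.
func recordPaths(pkg *types.Package) map[objectpath.Path]types.Object {
	facts := make(map[objectpath.Path]types.Object)
	scope := pkg.Scope()
	for _, name := range scope.Names() { // one sort here is expected...
		obj := scope.Lookup(name)
		if !obj.Exported() {
			continue
		}
		p, err := objectpath.For(obj) // ...but each call may sort again
		if err != nil {
			continue // some objects have no path
		}
		facts[p] = obj
	}
	return facts
}
```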

@adonovan (Member)

adonovan commented Dec 20, 2022

I'd approve a CL that added new API to the objectpath package that allowed it to memoize lookups, e.g. NewEncoder(pkg).For(obj). Internally, the Encoder could assert that obj.pkg = enc.pkg. The first time a For call falls through to the need for Scope.Names ("step 4"), the encoder could call Names once (using sync.Once if concurrency is required) then sort the slice into two parts, a slice of TypeName and one of non-TypeName, and retain these two. Then subsequent calls could skip the Names call, its allocation and sort, all the name-based Lookups and type assertions. Would that significantly improve performance in your case? Care to measure it?
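
A minimal sketch of the encoder shape described above; the package, field, and method names are illustrative only, not the actual objectpath implementation:

```go
// Hypothetical sketch of the encoder described above; not the actual
// objectpath implementation.
package objectpathenc

import (
	"go/types"
	"sync"
)

// Encoder amortizes the Scope.Names cost across many For calls for
// objects belonging to the same package.
type Encoder struct {
	pkg       *types.Package
	once      sync.Once         // populate the partitions at most once
	typeNames []*types.TypeName // package-level named types
	others    []types.Object    // remaining package-level objects
}

func NewEncoder(pkg *types.Package) *Encoder {
	return &Encoder{pkg: pkg}
}

// scopeObjects partitions the package scope on first use, so later For
// calls ("step 4") skip the repeated Names/Lookup/sort work.
func (enc *Encoder) scopeObjects() ([]*types.TypeName, []types.Object) {
	enc.once.Do(func() {
		scope := enc.pkg.Scope()
		for _, name := range scope.Names() { // single allocation and sort
			obj := scope.Lookup(name)
			if tn, ok := obj.(*types.TypeName); ok {
				enc.typeNames = append(enc.typeNames, tn)
			} else {
				enc.others = append(enc.others, obj)
			}
		}
	})
	return enc.typeNames, enc.others
}
```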

@findleyr (Contributor)

I just ran into this while working on the new incremental scalability work. I observed that building our xrefs Index takes ~50% as long as type checking (6s vs 12s in my benchmark), largely due to calls to scope.Names.

So we're going to need a new API for gopls to use, anyway. We can experiment with it internally first, but should be sure to file a proposal once we're satisfied.

@findleyr (Contributor)

Another place that burned significant cycles was objectpath.canonicalize, which could also be optimized with a stateful encoder.
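
A hypothetical extension of the same stateful-encoder idea, caching each named type's sorted methods so the canonicalize/namedMethods work is done once per type rather than once per query; the methodCache type is illustrative only:

```go
// Hypothetical extension of a stateful encoder: cache each named
// type's sorted methods so they are computed once per type.
package objectpathenc

import (
	"go/types"
	"sort"
)

type methodCache struct {
	methods map[*types.Named][]*types.Func
}

// namedMethods returns the methods of named, sorted by Id, computing
// and caching the sorted slice on first use.
func (c *methodCache) namedMethods(named *types.Named) []*types.Func {
	if c.methods == nil {
		c.methods = make(map[*types.Named][]*types.Func)
	}
	if ms, ok := c.methods[named]; ok {
		return ms
	}
	ms := make([]*types.Func, 0, named.NumMethods())
	for i := 0; i < named.NumMethods(); i++ {
		ms = append(ms, named.Method(i))
	}
	sort.Slice(ms, func(i, j int) bool { return ms[i].Id() < ms[j].Id() })
	c.methods[named] = ms
	return ms
}
```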

@gopherbot (Contributor)

Change https://go.dev/cl/470678 mentions this issue: go/types/objectpath: add Encoder type, to amortize Scope.Names

@gopherbot (Contributor)

Change https://go.dev/cl/470679 mentions this issue: go/types/objectpath: add encoder type, to amortize Scope.Names

@gopherbot (Contributor)

Change https://go.dev/cl/466975 mentions this issue: gopls/internal/lsp/cache: make type-checking incremental

gopherbot pushed a commit to golang/tools that referenced this issue Feb 25, 2023
This change adds a new encoder type with a For method that is
functionally equivalent to the old For function but avoids
the surprising cost of repeated calls to Scope.Names and
canonicalize (now called namedMethods), both of which
allocate and sort a slice.

The former canonicalize function was previously applied to
interface types too, but this was costly and unnecessary
as they are already sorted.

The public API change will have to wait for proposal 58668
to be accepted and thus fix issue 51017; this change uses
the linkname hack to expose the function to x/tools to fix a
pressing bug in gopls.

See golang/go#58668
Updates golang/go#51017

Change-Id: Id3e8726517d31371b9376b0e47e320f8b6c5e085
Reviewed-on: https://go-review.googlesource.com/c/tools/+/470679
TryBot-Result: Gopher Robot <gobot@golang.org>
gopls-CI: kokoro <noreply+kokoro@google.com>
Run-TryBot: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
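
A minimal usage sketch of the encoder-style API described in that CL, assuming it is eventually exported as objectpath.NewEncoder with an Encoder.For method (per proposal 58668):

```go
// Hypothetical usage; assumes the exported Encoder API.
package example

import (
	"fmt"
	"go/types"

	"golang.org/x/tools/go/types/objectpath"
)

// printPaths reuses one Encoder for all queries against a package, so
// the Scope.Names sort is paid at most once instead of per object.
func printPaths(pkg *types.Package, objs []types.Object) {
	enc := objectpath.NewEncoder(pkg)
	for _, obj := range objs {
		if p, err := enc.For(obj); err == nil {
			fmt.Println(obj.Name(), p)
		}
	}
}
```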
gopherbot pushed a commit to golang/tools that referenced this issue Mar 3, 2023
In this CL type-checked packages are made entirely independent of each
other, and package export data and indexes are stored in a file cache.
As a result, gopls uses significantly less memory, and (with a warm
cache) starts significantly faster. Other benchmarks have regressed
slightly due to the additional I/O and export data loading, but not
significantly so, and we have some ideas for how to further narrow or
even close the performance gap.

In the benchmarks below, based on the x/tools repository, we can see
that in-use memory was reduced by 88%, and startup time with a warm
cache by 65% (this is the best case where nothing has changed). Other
benchmarks regressed by 10-50%, much of which can be addressed by
improvements to the objectpath package (golang/go#51017), and by making
package data serialization asynchronous to type-checking.

Notably, we observe larger regressions in implementations, references,
and rename because the index implementations (by Alan Donovan) preceded
this change to type-checking, and so these benchmark statistics compare
in-memory index performance to on-disk index performance. Again, we can
optimize these if necessary by keeping certain index information in
memory, or by decoding more selectively.

name                              old in_use_bytes  new in_use_bytes  delta
InitialWorkspaceLoad/tools-12            432M ± 2%          50M ± 2%    -88.54%  (p=0.000 n=10+10)

name                              old time/op       new time/op       delta
StructCompletion/tools-12              27.2ms ± 5%       31.8ms ± 9%    +16.99%  (p=0.000 n=9+9)
ImportCompletion/tools-12              2.07ms ± 8%       2.21ms ± 6%     +6.64%  (p=0.004 n=9+9)
SliceCompletion/tools-12               29.0ms ± 5%       32.7ms ± 5%    +12.78%  (p=0.000 n=10+9)
FuncDeepCompletion/tools-12            39.6ms ± 6%       39.3ms ± 3%       ~     (p=0.853 n=10+10)
CompletionFollowingEdit/tools-12       72.7ms ± 7%      108.1ms ± 7%    +48.59%  (p=0.000 n=9+9)
Definition/tools-12                     525µs ± 6%        601µs ± 2%    +14.33%  (p=0.000 n=9+10)
DidChange/tools-12                     6.17ms ± 7%       6.77ms ± 2%     +9.64%  (p=0.000 n=10+10)
Hover/tools-12                         2.11ms ± 5%       2.61ms ± 3%    +23.87%  (p=0.000 n=10+10)
Implementations/tools-12               4.04ms ± 3%      60.19ms ± 3%  +1389.77%  (p=0.000 n=9+10)
InitialWorkspaceLoad/tools-12           3.84s ± 4%        1.33s ± 2%    -65.47%  (p=0.000 n=10+9)
References/tools-12                    9.72ms ± 6%      24.28ms ± 6%   +149.83%  (p=0.000 n=10+10)
Rename/tools-12                         121ms ± 8%        168ms ±12%    +38.92%  (p=0.000 n=10+10)
WorkspaceSymbols/tools-12              14.4ms ± 6%       15.6ms ± 3%     +8.76%  (p=0.000 n=9+10)

This CL is one step closer to the end* of a long journey to reduce
memory usage and statefulness in gopls, so that it can be more
performant and reliable.

Specifically, this CL implements a new type-checking pass that loads and
stores export data, cross references, serialized diagnostics, and method
set indexes in the file system. Concurrent type-checking passes may
share in-progress work, but after type-checking only active packages are
kept in memory. Consequently, there can be no global relationship
between type-checked packages. The work to break any dependence on
global relationships was done over a long time leading up to this CL.

In order to approach the previous type-checking performance, the
following new optimizations are made:
 - the global FileSet is completely removed: repeatedly importing from
   export data resulted in a tremendous amount of unnecessary token.File
   information, and so FileSets had to be scoped to packages
 - files are parsed as a batch and stored in the LRU cache implemented
   in the preceding CL
 - type-checking is also turned into a batch process, so that
   overlapping nodes in the package graph may be shared during large
   type-checking operations such as the initial workspace load

This new execution model enables several simplifications:
 - We no longer need to trim the AST before type-checking:
   TypeCheckMode and ParseExported are gone.
 - We no longer need to do careful bookkeeping around parsed files: all
   parsing uses the LRU parse cache.
 - It is no longer necessary to estimate cache heap usage in debug
   information.

There is still much more to do. This new model for gopls's execution
requires significant testing and experimentation. There may be new bugs
in the complicated new algorithms that enable this change, or bugs
related to the new reliance on export data (this may be the first time
export data for packages with type errors is significantly exercised).
There may be new environments where the new execution model does not
have the same beneficial effect. (On the other hand, there may be
some where it has an even more beneficial effect, such as resource
limited environments like dev containers.) There are also a lot of new
opportunities for optimization now that we are no longer tied to a rigid
structure of in-memory data.

*Furthermore, the following planned work is simply not done yet:
 - Implement precise pruning based on "deep" hash of imports.
 - Rewrite unimported completion, now that we no longer have cached
   import paths.

For golang/go#57987

Change-Id: Iedfc16656f79e314be448b892b710b9e63f72551
Reviewed-on: https://go-review.googlesource.com/c/tools/+/466975
Run-TryBot: Robert Findley <rfindley@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopls-CI: kokoro <noreply+kokoro@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
@golang golang locked and limited conversation to collaborators Apr 7, 2024