-
Notifications
You must be signed in to change notification settings - Fork 17.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/tools/go/types/objectpath: frequent use of scope.Names burns significant cycles #51017
Comments
Change https://golang.org/cl/383075 mentions this issue: |
Thanks for filing. Need to think about this. I'll just note that in the profile linked on that CL, there is also significant amount of time spent in |
Change https://go.dev/cl/404514 mentions this issue: |
Change https://go.dev/cl/405354 mentions this issue: |
Don't search for a matching method on all types in the package scope when we can trivially determine the type's name. This avoids both a linear search over the scope, as well as the currently rather costly call to scope.Names itself. Updates golang/go#51017 Change-Id: I614f4b1b149d6b42728d3f22f5e47e8ff87a5392 Reviewed-on: https://go-review.googlesource.com/c/tools/+/404514 Run-TryBot: Alan Donovan <adonovan@google.com> gopls-CI: kokoro <noreply+kokoro@google.com> Run-TryBot: Dominik Honnef <dominik@honnef.co> Reviewed-by: Alan Donovan <adonovan@google.com> Reviewed-by: Robert Findley <rfindley@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>
@griesemer @alandonovan what do you think about this change? https://go.dev/cl/383075 appears to be stuck in limbo, and it would be good to make a decision for 1.20. Being conservative, I am inclined to say that changing scope.Names to return a memoized slice is a breaking change, may increase steady-state memory, and is inconsistent with the rest of the go/types API which goes to lengths to avoid exposing internal data (consider But I don't feel strongly. If it is most expedient to save sorted names on the scope, in reality it is probably harmless. |
The actual code review hasn't addressed all the comments (e.g. re: atomicity) required for +2, but perhaps that's just because the author doesn't know if the design is worth pursuing. I agree that if the problem is really just in objectpath, then that's a safer place for the fix: perhaps objectpath could compute the paths for the entire package more cheaply in a single pass, then serve a large number of queries on that---like the AST inspector. Even if we don't implement all that at once, the same API could take the first step of memoizing scope.Names() across multiple calls, but it seems likely there are even bigger potential wins if we take that approach. |
I assumed as much. We did not resolve the viability of the approach in that CL.
Yes, that is approximately the API I had in mind, and the AST inspector is a good example. Agree that there are probably other potential optimizations in that approach. |
https://go-review.git.corp.google.com/c/tools/+/405354 (WiP for memoizing) was stuck due to not having a clear performance justification yet. |
@griesemer and I don't think we want to make this change in go/types, and would prefer improving the objectpath API. Even if we pre-sorted, we'd still want to make a copy and so would lose some of the benefit of memoizing. Moving out of the Go1.20 milestone. |
I had an overly simplistic form of I tried to switch to objectpath, as it's clearly more complete and I think it might fix two bugs I've been struggling with. However, re-running In my use case, the tool will be calling I don't think I can use objectpath right now, but I'd be more than happy to help review or test any improvement or new API in objectpath itself. Memoization or a batch API are likely to make a big difference. |
I'd approve a CL that added new API to the objectpath package that allowed it to memoize lookups, e.g. NewEncoder(pkg).For(obj). Internally, the Encoder could assert that obj.pkg = enc.pkg. The first time a For call falls through to the need for Scope.Names ("step 4"), the encoder could call Names once (using sync.Once if concurrency is required) then sort the slice into two parts, a slice of TypeName and one of non-TypeName, and retain these two. Then subsequent calls could skip the Names call, its allocation and sort, all the name-based Lookups and type assertions. Would that significantly improve performance in your case? Care to measure it? |
I just ran into this while working on the new incremental scalability work. I observed that building our xrefs Index takes ~50% as long as type checking (6s vs 12s in my benchmark), largely due to calls to scope.Names. So we're going to need a new API for gopls to use, anyway. We can experiment with it internally first, but should be sure to file a proposal once we're satisfied. |
Another place that burned significant cycles was |
Change https://go.dev/cl/470678 mentions this issue: |
Change https://go.dev/cl/470679 mentions this issue: |
Change https://go.dev/cl/466975 mentions this issue: |
This change adds a new encoder type with For method, that is functionally equivalent to the old For function but avoids the surprising cost of repeated calls to Scope.Names and canonicalize (now called namedMethods), both of which allocate and sorts a slice. The former canonicalize function was previously applied to interface types too, but this was costly and unnecessary as they are already sorted. The public API change will have to wait for proposal 58668 to be accepted and thus fix issue 51017; this change uses the linkname hack to expose the function to x/tools to fix a pressing bug in gopls. See golang/go#58668 Updates golang/go#51017 Change-Id: Id3e8726517d31371b9376b0e47e320f8b6c5e085 Reviewed-on: https://go-review.googlesource.com/c/tools/+/470679 TryBot-Result: Gopher Robot <gobot@golang.org> gopls-CI: kokoro <noreply+kokoro@google.com> Run-TryBot: Alan Donovan <adonovan@google.com> Reviewed-by: Robert Findley <rfindley@google.com>
In this CL type-checked packages are made entirely independent of each other, and package export data and indexes are stored in a file cache. As a result, gopls uses significantly less memory, and (with a warm cache) starts significantly faster. Other benchmarks have regressed slightly due to the additional I/O and export data loading, but not significantly so, and we have some ideas for how to further narrow or even close the performance gap. In the benchmarks below, based on the x/tools repository, we can see that in-use memory was reduced by 88%, and startup time with a warm cache by 65% (this is the best case where nothing has changed). Other benchmarks regressed by 10-50%, much of which can be addressed by improvements to the objectpath package (golang/go#51017), and by making package data serialization asynchronous to type-checking. Notably, we observe larger regressions in implementations, references, and rename because the index implementations (by Alan Donovan) preceded this change to type-checking, and so these benchmark statistics compare in-memory index performance to on-disk index performance. Again, we can optimize these if necessary by keeping certain index information in memory, or by decoding more selectively. name old in_use_bytes new in_use_bytes delta InitialWorkspaceLoad/tools-12 432M ± 2% 50M ± 2% -88.54% (p=0.000 n=10+10) name old time/op new time/op delta StructCompletion/tools-12 27.2ms ± 5% 31.8ms ± 9% +16.99% (p=0.000 n=9+9) ImportCompletion/tools-12 2.07ms ± 8% 2.21ms ± 6% +6.64% (p=0.004 n=9+9) SliceCompletion/tools-12 29.0ms ± 5% 32.7ms ± 5% +12.78% (p=0.000 n=10+9) FuncDeepCompletion/tools-12 39.6ms ± 6% 39.3ms ± 3% ~ (p=0.853 n=10+10) CompletionFollowingEdit/tools-12 72.7ms ± 7% 108.1ms ± 7% +48.59% (p=0.000 n=9+9) Definition/tools-12 525µs ± 6% 601µs ± 2% +14.33% (p=0.000 n=9+10) DidChange/tools-12 6.17ms ± 7% 6.77ms ± 2% +9.64% (p=0.000 n=10+10) Hover/tools-12 2.11ms ± 5% 2.61ms ± 3% +23.87% (p=0.000 n=10+10) Implementations/tools-12 4.04ms ± 3% 60.19ms ± 3% +1389.77% (p=0.000 n=9+10) InitialWorkspaceLoad/tools-12 3.84s ± 4% 1.33s ± 2% -65.47% (p=0.000 n=10+9) References/tools-12 9.72ms ± 6% 24.28ms ± 6% +149.83% (p=0.000 n=10+10) Rename/tools-12 121ms ± 8% 168ms ±12% +38.92% (p=0.000 n=10+10) WorkspaceSymbols/tools-12 14.4ms ± 6% 15.6ms ± 3% +8.76% (p=0.000 n=9+10) This CL is one step closer to the end* of a long journey to reduce memory usage and statefulness in gopls, so that it can be more performant and reliable. Specifically, this CL implements a new type-checking pass that loads and stores export data, cross references, serialized diagnostics, and method set indexes in the file system. Concurrent type-checking passes may share in-progress work, but after type-checking only active packages are kept in memory. Consequently, there can be no global relationship between type-checked packages. The work to break any dependence on global relationships was done over a long time leading up to this CL. In order to approach the previous type-checking performance, the following new optimizations are made: - the global FileSet is completely removed: repeatedly importing from export data resulted in a tremendous amount of unnecessary token.File information, and so FileSets had to be scoped to packages - files are parsed as a batch and stored in the LRU cache implemented in the preceding CL - type-checking is also turned into a batch process, so that overlapping nodes in the package graph may be shared during large type-checking operations such as the initial workspace load This new execution model enables several simplifications: - We no longer need to trim the AST before type-checking: TypeCheckMode and ParseExported are gone. - We no longer need to do careful bookkeeping around parsed files: all parsing uses the LRU parse cache. - It is no longer necessary to estimate cache heap usage in debug information. There is still much more to do. This new model for gopls's execution requires significant testing and experimentation. There may be new bugs in the complicated new algorithms that enable this change, or bugs related to the new reliance on export data (this may be the first time export data for packages with type errors is significantly exercised). There may be new environments where the new execution model does not have the same beneficial effect. (On the other hand, there may be some where it has an even more beneficial effect, such as resource limited environments like dev containers.) There are also a lot of new opportunities for optimization now that we are no longer tied to a rigid structure of in-memory data. *Furthermore, the following planned work is simply not done yet: - Implement precise pruning based on "deep" hash of imports. - Rewrite unimported completion, now that we no longer have cached import paths. For golang/go#57987 Change-Id: Iedfc16656f79e314be448b892b710b9e63f72551 Reviewed-on: https://go-review.googlesource.com/c/tools/+/466975 Run-TryBot: Robert Findley <rfindley@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> gopls-CI: kokoro <noreply+kokoro@google.com> Reviewed-by: Alan Donovan <adonovan@google.com>
In this CL type-checked packages are made entirely independent of each other, and package export data and indexes are stored in a file cache. As a result, gopls uses significantly less memory, and (with a warm cache) starts significantly faster. Other benchmarks have regressed slightly due to the additional I/O and export data loading, but not significantly so, and we have some ideas for how to further narrow or even close the performance gap. In the benchmarks below, based on the x/tools repository, we can see that in-use memory was reduced by 88%, and startup time with a warm cache by 65% (this is the best case where nothing has changed). Other benchmarks regressed by 10-50%, much of which can be addressed by improvements to the objectpath package (golang/go#51017), and by making package data serialization asynchronous to type-checking. Notably, we observe larger regressions in implementations, references, and rename because the index implementations (by Alan Donovan) preceded this change to type-checking, and so these benchmark statistics compare in-memory index performance to on-disk index performance. Again, we can optimize these if necessary by keeping certain index information in memory, or by decoding more selectively. name old in_use_bytes new in_use_bytes delta InitialWorkspaceLoad/tools-12 432M ± 2% 50M ± 2% -88.54% (p=0.000 n=10+10) name old time/op new time/op delta StructCompletion/tools-12 27.2ms ± 5% 31.8ms ± 9% +16.99% (p=0.000 n=9+9) ImportCompletion/tools-12 2.07ms ± 8% 2.21ms ± 6% +6.64% (p=0.004 n=9+9) SliceCompletion/tools-12 29.0ms ± 5% 32.7ms ± 5% +12.78% (p=0.000 n=10+9) FuncDeepCompletion/tools-12 39.6ms ± 6% 39.3ms ± 3% ~ (p=0.853 n=10+10) CompletionFollowingEdit/tools-12 72.7ms ± 7% 108.1ms ± 7% +48.59% (p=0.000 n=9+9) Definition/tools-12 525µs ± 6% 601µs ± 2% +14.33% (p=0.000 n=9+10) DidChange/tools-12 6.17ms ± 7% 6.77ms ± 2% +9.64% (p=0.000 n=10+10) Hover/tools-12 2.11ms ± 5% 2.61ms ± 3% +23.87% (p=0.000 n=10+10) Implementations/tools-12 4.04ms ± 3% 60.19ms ± 3% +1389.77% (p=0.000 n=9+10) InitialWorkspaceLoad/tools-12 3.84s ± 4% 1.33s ± 2% -65.47% (p=0.000 n=10+9) References/tools-12 9.72ms ± 6% 24.28ms ± 6% +149.83% (p=0.000 n=10+10) Rename/tools-12 121ms ± 8% 168ms ±12% +38.92% (p=0.000 n=10+10) WorkspaceSymbols/tools-12 14.4ms ± 6% 15.6ms ± 3% +8.76% (p=0.000 n=9+10) This CL is one step closer to the end* of a long journey to reduce memory usage and statefulness in gopls, so that it can be more performant and reliable. Specifically, this CL implements a new type-checking pass that loads and stores export data, cross references, serialized diagnostics, and method set indexes in the file system. Concurrent type-checking passes may share in-progress work, but after type-checking only active packages are kept in memory. Consequently, there can be no global relationship between type-checked packages. The work to break any dependence on global relationships was done over a long time leading up to this CL. In order to approach the previous type-checking performance, the following new optimizations are made: - the global FileSet is completely removed: repeatedly importing from export data resulted in a tremendous amount of unnecessary token.File information, and so FileSets had to be scoped to packages - files are parsed as a batch and stored in the LRU cache implemented in the preceding CL - type-checking is also turned into a batch process, so that overlapping nodes in the package graph may be shared during large type-checking operations such as the initial workspace load This new execution model enables several simplifications: - We no longer need to trim the AST before type-checking: TypeCheckMode and ParseExported are gone. - We no longer need to do careful bookkeeping around parsed files: all parsing uses the LRU parse cache. - It is no longer necessary to estimate cache heap usage in debug information. There is still much more to do. This new model for gopls's execution requires significant testing and experimentation. There may be new bugs in the complicated new algorithms that enable this change, or bugs related to the new reliance on export data (this may be the first time export data for packages with type errors is significantly exercised). There may be new environments where the new execution model does not have the same beneficial effect. (On the other hand, there may be some where it has an even more beneficial effect, such as resource limited environments like dev containers.) There are also a lot of new opportunities for optimization now that we are no longer tied to a rigid structure of in-memory data. *Furthermore, the following planned work is simply not done yet: - Implement precise pruning based on "deep" hash of imports. - Rewrite unimported completion, now that we no longer have cached import paths. For golang/go#57987 Change-Id: Iedfc16656f79e314be448b892b710b9e63f72551 Reviewed-on: https://go-review.googlesource.com/c/tools/+/466975 Run-TryBot: Robert Findley <rfindley@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> gopls-CI: kokoro <noreply+kokoro@google.com> Reviewed-by: Alan Donovan <adonovan@google.com>
What version of Go are you using (
go version
)?go1.18-pre10
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (
go env
)?Linux, amd64. Not dependent on OS or architecture, however.
What did you do?
The objectpath.For method relies on frequent calls to scope.Names(), which always performs a sort of all names. For large-scale analysis trees, this can consume a significant portion of cycles (30-50% in my case) and can easily be memoized. This can either be done by objectpath as a consumer of the API, or within the *types.Scope object.
What did you expect to see?
Fewer cycles spent sorting.
What did you see instead?
Lots of cycles spent sorting.
The text was updated successfully, but these errors were encountered: