x/tools/go/packages: Load is inefficient due to heavy contention #70094
I'm not sure what's causing this, but I can do some debugging to try to figure out where it's happening. Does it happen on all platforms or only some? I tried running the tests on golang/tools@70e82e013 (the commit before the fix for #70078 was submitted), running a single test case at a time and logging the time spent in the go list driver, but it all seemed to be in the 70-100ms range for me.
I'm using a Mac, whose file system is slower than Linux's. Try again with this patch, which adds 8-way parallelism and logs the time of each go list command:

```diff
diff --git a/go/ssa/builder_test.go b/go/ssa/builder_test.go
index a38186dfa..8368bcc9b 100644
--- a/go/ssa/builder_test.go
+++ b/go/ssa/builder_test.go
@@ -701,6 +701,10 @@ func TestTypeparamTest(t *testing.T) {
continue // Consider standalone go files.
}
t.Run(entry.Name(), func(t *testing.T) {
+ t.Parallel()
+ iolimit <- struct{}{} // acquire a token
+ defer func() { <-iolimit }() // release a token
+
src, err := fs.ReadFile(fsys, entry.Name())
if err != nil {
t.Fatal(err)
@@ -718,6 +722,8 @@ func TestTypeparamTest(t *testing.T) {
}
}
+var iolimit = make(chan struct{}, 8) // concurrency limiting counting semaphore
+
// TestOrderOfOperations ensures order of operations are as intended.
func TestOrderOfOperations(t *testing.T) {
// Testing for the order of operations within an expression is done
diff --git a/go/ssa/testutil_test.go b/go/ssa/testutil_test.go
index 58680b282..e03935fa7 100644
--- a/go/ssa/testutil_test.go
+++ b/go/ssa/testutil_test.go
@@ -11,9 +11,11 @@ import (
"go/parser"
"go/token"
"io/fs"
+ "log"
"os"
"testing"
"testing/fstest"
+ "time"
"golang.org/x/tools/go/packages"
"golang.org/x/tools/go/ssa"
@@ -89,7 +91,9 @@ func loadPackages(t testing.TB, src fs.FS, patterns ...string) []*packages.Packa
"GOWORK=off",
"GOPROXY=off"),
}
+ t0 := time.Now()
pkgs, err := packages.Load(cfg, patterns...)
+ log.Printf("loadPackages %v = %v", patterns, time.Since(t0))
if err != nil {
t.Fatal(err)
}
```
Typical results:
The results are less clearly bimodal than the other day, though.
FYI @adonovan @matloob, I tried Alan's approach on my Ubuntu 20.04 machine; 'go list' has relatively low variance.

Test code:

```go
var iolimits = make(chan struct{}, 8)

func TestGoListSpeed(t *testing.T) {
	for i := 0; i < 1_000; i++ {
		t.Run("", func(t *testing.T) {
			t.Parallel()
			iolimits <- struct{}{}
			defer func() {
				<-iolimits
			}()
			cmd := exec.Command("go", "list", "std")
			t0 := time.Now()
			err := cmd.Run()
			t.Log(time.Since(t0))
			if err != nil {
				t.Fatal(err)
			}
		})
	}
}
```

'packages.Load' vs 'go list': add these lines on top of Alan's diff:

```diff
@@ -89,13 +92,24 @@ func loadPackages(t testing.TB, src fs.FS, patterns ...string) []*packages.Packa
if packages.PrintErrors(pkgs) > 0 {
t.Fatal("there were errors")
}
+ cmd := exec.Command("go", append([]string{"list"}, patterns...)...)
+ cmd.Dir = dir
+ t1 := time.Now()
+ err = cmd.Run()
+ if err != nil {
+ t.Fatal(err)
+ }
+ log.Printf("go list %v = %v", patterns, time.Since(t1))
```
Thanks for the data. I cannot now remember whether, in my original observation of the problem in #70078, I was measuring the "go list" execution directly or the packages.Load call (as in my diff above). I will measure just the "go list" executions within packages.Load again to confirm whether the effect is real or just an artifact of using packages.Load. (BTW, by "killing time" I meant something more specific: logic that intentionally sleeps for a second to avoid clock quantization problems. I vaguely recall we added this to the go command.) I'm not surprised that packages.Load blocks for a long time in Lock and Wait, as it starts many goroutines that all immediately block waiting for their predecessors. It would reduce contention to build the waits-for graph and enqueue a task only once its predecessors are complete, as we do for gopls' analysis. I should certainly fix that in any case.
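For illustration, here is a minimal, self-contained sketch of that dependency-counting "enqueue" scheme (hypothetical code, not the actual x/tools or gopls implementation): each node records how many dependencies remain unfinished, and a node joins the work queue only when that count reaches zero, so no goroutine ever blocks waiting for a predecessor.

```go
package main

import "sync"

// node is a hypothetical DAG node for this sketch.
type node struct {
	deps    []*node // direct dependencies
	waiters []*node // inverse edges: nodes that depend on this one
	pending int     // number of unfinished dependencies
	do      func()  // the node's work
}

// run executes every node's work in dependency order using a fixed
// worker pool, enqueueing a node only when its dependencies are done.
func run(nodes []*node) {
	queue := make(chan *node, len(nodes)) // each node is enqueued exactly once
	for _, n := range nodes {
		n.pending = len(n.deps)
		for _, d := range n.deps {
			d.waiters = append(d.waiters, n)
		}
		if n.pending == 0 {
			queue <- n // leaf: ready immediately
		}
	}

	var mu sync.Mutex
	var wg sync.WaitGroup
	wg.Add(len(nodes))
	for i := 0; i < 8; i++ { // arbitrary worker count for the sketch
		go func() {
			for n := range queue {
				n.do()
				mu.Lock()
				for _, w := range n.waiters {
					w.pending--
					if w.pending == 0 {
						queue <- w // last dependency done: now ready
					}
				}
				mu.Unlock()
				wg.Done()
			}
		}()
	}
	wg.Wait()
	close(queue) // lets the workers exit
}

func main() {
	// A two-node chain: a depends on b, so b runs first.
	b := &node{do: func() { println("b") }}
	a := &node{deps: []*node{b}, do: func() { println("a") }}
	run([]*node{a, b})
}
```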
Measuring again with this additional patch to print raw go list times:
I observe these statistics:
It does indeed look like packages.Load is to blame. Thanks again for the suggestion.
@adonovan understood, thanks for the update.
Would you mind if I add enqueue functionality to packages.Load? I wrote a draft that passes the existing tests, but it needs some refinement and tidying up, which I will do tomorrow. By the way, the design in gopls is a really good example.
Change https://go.dev/cl/624815 mentions this issue:

Change https://go.dev/cl/624816 mentions this issue:
Thanks for the offer. I had already put together a quick implementation before I saw your note, but unfortunately it doesn't deliver tangible gains yet. I've shared it in CL 624816. If you'd like to take it over, be my guest.
Thanks. It is potentially much more efficient than the naive approach, but the naive approach (one goroutine per edge) is just so easy to write that I use it a lot, probably too much. At some point I'd like to try to extract the "enqueue" approach into a library, something like:

```go
package parallel

// A Graph holds the state for a parallel fold over a directed acyclic graph.
type Graph[N Node] struct { /* ...leaves... */ }

// AddEdge records a dependency between two nodes.
func (g *Graph[N]) AddEdge(from, to N)

// Do executes each Node's Do function in parallel postorder.
func (g *Graph[N]) Do(ctx context.Context) error { /* ...enqueue, etc... */ }

// A Node is an application data type representing a graph node.
// The concrete type of node must embed a NodeState.
type Node interface {
	// Do does the computation for a node.
	// The node may assume its direct dependencies are already done.
	Do(context.Context) error

	// State returns the state of the node.
	State() *NodeState
}

// A NodeState holds the batch state of a node. Embed it within your node data type.
type NodeState[N Node] struct { /* ...opaque: succs, preds, counters... */ }

// Call Finished when all the AddEdge calls for this node have been made.
// (This enqueues the initial leaf nodes.)
func (s *NodeState[N]) Finished() { /* ... */ }
```
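A hypothetical client of that sketched API might look like this (none of these names exist in x/tools, and the sketch's own gaps, such as how State() is satisfied, are left unresolved):

```go
// pkgNode is a hypothetical client node type; embedding NodeState is
// assumed to satisfy the sketched Node interface.
type pkgNode struct {
	parallel.NodeState[*pkgNode] // embedded opaque batch state
	path string
}

// Do type-checks n.path; the framework guarantees that n's direct
// dependencies completed first.
func (n *pkgNode) Do(ctx context.Context) error { return nil }

func typeCheckAll(ctx context.Context, a, b *pkgNode) error {
	var g parallel.Graph[*pkgNode]
	g.AddEdge(a, b)  // a depends on b
	a.Finished()     // all of a's edges have been added
	b.Finished()     // b has none; it becomes an initial leaf
	return g.Do(ctx) // runs b.Do, then a.Do
}
```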
Your approach, which resolves the leaves of the DAG and enqueues them, is better than mine (xieyuschen/tools@14066ba), which spawns more goroutines and resolves the imports inside the spawned goroutines.

I cherry-picked your change onto golang/tools@70e82e013 and ran the test with your diff again; the data fluctuates much less and there are no second-long load operations anymore. I have seen your commit message, but I don't know how to improve it further :(
Details:
- Break out of overlay loop when a match is found. This is a (theoretical) optimization.
- Document optimization opportunities: one TODO, and one not worth doing.
- Clarify some comments.
- Use errgroup instead of WaitGroup (more concise).

No intended behavior changes.

Updates golang/go#70094

Change-Id: I27e34adb1176b65b4fae60e7d67bb4d2f4b19a14
Reviewed-on: https://go-review.googlesource.com/c/tools/+/624815
Reviewed-by: Robert Findley <rfindley@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Alan Donovan <adonovan@google.com>
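The errgroup-for-WaitGroup swap mentioned in this commit message is a general pattern; a minimal illustrative sketch (not the CL's actual code) shows why it is more concise: errgroup combines waiting with first-error propagation, which a bare WaitGroup would need extra plumbing for.

```go
package main

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// processAll runs process on each item concurrently and returns the
// first error encountered, if any; names here are invented for the sketch.
func processAll(ctx context.Context, items []string, process func(context.Context, string) error) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, item := range items {
		item := item // pre-Go 1.22 loop-variable capture
		g.Go(func() error { return process(ctx, item) })
	}
	return g.Wait() // waits for all goroutines; returns the first non-nil error
}
```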
This CL changes the concurrent recursion over the import graph so that instead of creating one goroutine per import edge (the naive approach formerly used by loadRecursive), it creates only one per package. We follow the approach used by 'enqueue' in gopls/internal/cache/analysis.go: we construct the inverse import graph and keep a count of unfinished successors (direct dependencies). Only when this count drops to zero do we add a package to the queue.

Subtle: the import graph must now be materialized not just when NeedImports is set, but when Syntax or Types are needed too.

Also, we add a CPU-limiting semaphore around calls to the parser and type-checker. This seems to improve BenchmarkNetHTTP by around 10% in elapsed time.

Before: BenchmarkNetHTTP-8  19  550622678 ns/op  17.12 MB/s  484348024 B/op  4951363 allocs/op
After:  BenchmarkNetHTTP-8  24  499117113 ns/op  18.89 MB/s  484340136 B/op  4949985 allocs/op

Updates golang/go#70094

Change-Id: I54bf17b16238844401aba4ec5a7f6485da5e73fb
Reviewed-on: https://go-review.googlesource.com/c/tools/+/624816
Commit-Queue: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
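The "CPU-limiting semaphore" mentioned in this commit message follows the same counting-semaphore pattern as the iolimit channel earlier in the thread, sized to the CPU count rather than to a fixed I/O budget. A minimal sketch of the pattern (hypothetical wrapper; the real change is in CL 624816):

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"runtime"
)

// cpuLimit bounds CPU-bound work (parsing, type-checking) so that many
// concurrent package loads cannot oversubscribe the available CPUs.
var cpuLimit = make(chan struct{}, runtime.GOMAXPROCS(0))

// parseFileLimited is a hypothetical wrapper illustrating the pattern.
func parseFileLimited(fset *token.FileSet, filename string) (*ast.File, error) {
	cpuLimit <- struct{}{}        // acquire a token
	defer func() { <-cpuLimit }() // release it when parsing is done
	return parser.ParseFile(fset, filename, nil, parser.ParseComments)
}
```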
See #70078 (comment): for the same workload, the go list command sometimes runs very quickly (70ms) and other times waits for a conspicuously round number (1.0s) of elapsed time before it exits. I vaguely recall that this was intentional behavior to prevent file system time quantization problems. Is it always necessary? If so, when and why? Are there ways to avoid it?
@matloob @samthanawalla