x/tools/go/packages: Load is inefficient due to heavy contention #70094
I'm not sure what's causing this, but I can do some debugging to try to figure out where it's happening. Does it happen on all platforms or only some? I tried running the tests on golang/tools@70e82e013 (the commit before the fix for #70078 was submitted), running a single test case at a time and logging the time spent in the go list driver, but it all seemed to be in the 70-100ms range for me.
I'm using a Mac, whose file system is slower than Linux's. Try again with this patch, which adds 8-way parallelism and logs the time of each go list command:

```diff
diff --git a/go/ssa/builder_test.go b/go/ssa/builder_test.go
index a38186dfa..8368bcc9b 100644
--- a/go/ssa/builder_test.go
+++ b/go/ssa/builder_test.go
@@ -701,6 +701,10 @@ func TestTypeparamTest(t *testing.T) {
continue // Consider standalone go files.
}
t.Run(entry.Name(), func(t *testing.T) {
+ t.Parallel()
+ iolimit <- struct{}{} // acquire a token
+ defer func() { <-iolimit }() // release a token
+
src, err := fs.ReadFile(fsys, entry.Name())
if err != nil {
t.Fatal(err)
@@ -718,6 +722,8 @@ func TestTypeparamTest(t *testing.T) {
}
}
+var iolimit = make(chan struct{}, 8) // concurrency limiting counting semaphore
+
// TestOrderOfOperations ensures order of operations are as intended.
func TestOrderOfOperations(t *testing.T) {
// Testing for the order of operations within an expression is done
diff --git a/go/ssa/testutil_test.go b/go/ssa/testutil_test.go
index 58680b282..e03935fa7 100644
--- a/go/ssa/testutil_test.go
+++ b/go/ssa/testutil_test.go
@@ -11,9 +11,11 @@ import (
"go/parser"
"go/token"
"io/fs"
+ "log"
"os"
"testing"
"testing/fstest"
+ "time"
"golang.org/x/tools/go/packages"
"golang.org/x/tools/go/ssa"
@@ -89,7 +91,9 @@ func loadPackages(t testing.TB, src fs.FS, patterns ...string) []*packages.Packa
"GOWORK=off",
"GOPROXY=off"),
}
+ t0 := time.Now()
pkgs, err := packages.Load(cfg, patterns...)
+ log.Printf("loadPackages %v = %v", patterns, time.Since(t0))
if err != nil {
t.Fatal(err)
}
```
Typical results:
The results are less clearly bimodal than the other day, though.
FYI @adonovan @matloob, I tried Alan's approach on my Ubuntu 20.04 machine; 'go list' has relatively low variance.

Test code:

```go
var iolimits = make(chan struct{}, 8)

func TestGoListSpeed(t *testing.T) {
	for i := 0; i < 1_000; i++ {
		t.Run("", func(t *testing.T) {
			t.Parallel()
			iolimits <- struct{}{}
			defer func() {
				<-iolimits
			}()
			cmd := exec.Command("go", "list", "std")
			t0 := time.Now()
			err := cmd.Run()
			t.Log(time.Since(t0))
			if err != nil {
				t.Fatal(err)
			}
		})
	}
}
```

'packages.Load' vs 'go list': add these lines on top of Alan's diff:

```diff
@@ -89,13 +92,24 @@ func loadPackages(t testing.TB, src fs.FS, patterns ...string) []*packages.Packa
if packages.PrintErrors(pkgs) > 0 {
t.Fatal("there were errors")
}
+ cmd := exec.Command("go", append([]string{"list"}, patterns...)...)
+ cmd.Dir = dir
+ t1 := time.Now()
+ err = cmd.Run()
+ if err != nil {
+ t.Fatal(err)
+ }
+ log.Printf("go list %v = %v", patterns, time.Since(t1))
```
Thanks for the data. I cannot now remember whether, in my original observation of the problem in #70078, I was measuring the "go list" execution directly or the packages.Load call (as in my diff above). I will measure just the "go list" executions within packages.Load again to confirm whether the effect is real or just an artifact of using packages.Load. (BTW, by "killing time" I meant something more specific: logic that intentionally sleeps for a second to avoid clock quantization problems. I vaguely recall we added this to the go command.) I'm not surprised that packages.Load blocks for a long time in Lock and Wait, as it starts many goroutines that all immediately block waiting for their predecessors. It would reduce contention to build the waits-for graph and enqueue a task only once its predecessors are complete, as we do for gopls' analysis. I should certainly fix that in any case.
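For illustration, here is a minimal, self-contained sketch of that dependency-counting "enqueue" scheme (hypothetical code, not the actual x/tools or gopls implementation): each node records how many dependencies remain unfinished, and a node joins the work queue only when that count reaches zero, so no goroutine ever blocks waiting for a predecessor.

```go
package main

import "sync"

// node is a hypothetical DAG node for this sketch.
type node struct {
	deps    []*node // direct dependencies
	waiters []*node // inverse edges: nodes that depend on this one
	pending int     // number of unfinished dependencies
	do      func()  // the node's work
}

// run executes every node's work in dependency order using a fixed
// worker pool, enqueueing a node only when its dependencies are done.
func run(nodes []*node) {
	queue := make(chan *node, len(nodes)) // each node is enqueued exactly once
	for _, n := range nodes {
		n.pending = len(n.deps)
		for _, d := range n.deps {
			d.waiters = append(d.waiters, n)
		}
		if n.pending == 0 {
			queue <- n // leaf: ready immediately
		}
	}

	var mu sync.Mutex
	var wg sync.WaitGroup
	wg.Add(len(nodes))
	for i := 0; i < 8; i++ { // arbitrary worker count for the sketch
		go func() {
			for n := range queue {
				n.do()
				mu.Lock()
				for _, w := range n.waiters {
					w.pending--
					if w.pending == 0 {
						queue <- w // last dependency done: now ready
					}
				}
				mu.Unlock()
				wg.Done()
			}
		}()
	}
	wg.Wait()
	close(queue) // lets the workers exit
}

func main() {
	// A two-node chain: a depends on b, so b runs first.
	b := &node{do: func() { println("b") }}
	a := &node{deps: []*node{b}, do: func() { println("a") }}
	run([]*node{a, b})
}
```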
Measuring again with this additional patch to print raw go list times:
I observe these statistics:
It does indeed look like packages.Load is to blame. Thanks again for the suggestion.
@adonovan understood, thanks for the update.
Would you mind if I add enqueue functionality to packages.Load? I wrote a draft that passes the existing tests, but it needs some refinement and tidying up, which I will do tomorrow. By the way, the design in gopls is a really good example.
Change https://go.dev/cl/624815 mentions this issue:

Change https://go.dev/cl/624816 mentions this issue:
Thanks for the offer. I had already put together a quick implementation before I saw your note, but unfortunately it doesn't deliver tangible gains yet. I've shared it in CL 624816. If you'd like to take it over, be my guest.
Thanks. It is potentially much more efficient than the naive approach, but the naive approach (one goroutine per edge) is just so easy to write that I use it a lot, probably too much. At some point I'd like to try to extract the "enqueue" approach into a library, something like:

```go
package parallel

// A Graph holds the state for a parallel fold over a directed acyclic graph.
type Graph[N Node] struct { /* ...leaves... */ }

// AddEdge records a dependency between two nodes.
func (g *Graph[N]) AddEdge(from, to N)

// Do executes each Node's Do function in parallel postorder.
func (g *Graph[N]) Do(ctx context.Context) error { /* ...enqueue, etc... */ }

// A Node is an application data type representing a graph node.
// The concrete type of node must embed a NodeState.
type Node interface {
	// Do does the computation for a node.
	// The node may assume its direct dependencies are already done.
	Do(context.Context) error

	// State returns the state of the node.
	State() *NodeState
}

// A NodeState holds the batch state of a node. Embed it within your node data type.
type NodeState[N Node] struct { /* ...opaque: succs, preds, counters... */ }

// Call Finished when all the AddEdge calls for this node have been made.
// (This enqueues the initial leaf nodes.)
func (s *NodeState[N]) Finished() { /* ... */ }
```
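A hypothetical client of that sketched API might look like this (none of these names exist in x/tools, and the sketch's own gaps, such as how State() is satisfied, are left unresolved):

```go
// pkgNode is a hypothetical client node type; embedding NodeState is
// assumed to satisfy the sketched Node interface.
type pkgNode struct {
	parallel.NodeState[*pkgNode] // embedded opaque batch state
	path string
}

// Do type-checks n.path; the framework guarantees that n's direct
// dependencies completed first.
func (n *pkgNode) Do(ctx context.Context) error { return nil }

func typeCheckAll(ctx context.Context, a, b *pkgNode) error {
	var g parallel.Graph[*pkgNode]
	g.AddEdge(a, b)  // a depends on b
	a.Finished()     // all of a's edges have been added
	b.Finished()     // b has none; it becomes an initial leaf
	return g.Do(ctx) // runs b.Do, then a.Do
}
```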
Your approach, which resolves the leaves of the DAG and enqueues them, is better than mine (xieyuschen/tools@14066ba), which spawns more goroutines and resolves the imports inside the spawned goroutines.

I cherry-picked your change onto golang/tools@70e82e013 and ran the test with your diff again; the data fluctuates much less and there are no second-long load operations anymore. I have seen your commit message, but I don't know how to improve it further :(
Details:
- Break out of overlay loop when a match is found. This is a (theoretical) optimization.
- Document optimization opportunities: one TODO, and one not worth doing.
- Clarify some comments.
- Use errgroup instead of WaitGroup (more concise).

No intended behavior changes.

Updates golang/go#70094

Change-Id: I27e34adb1176b65b4fae60e7d67bb4d2f4b19a14
Reviewed-on: https://go-review.googlesource.com/c/tools/+/624815
Reviewed-by: Robert Findley <rfindley@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Alan Donovan <adonovan@google.com>
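The errgroup-for-WaitGroup swap mentioned in this commit message is a general pattern; a minimal illustrative sketch (not the CL's actual code) shows why it is more concise: errgroup combines waiting with first-error propagation, which a bare WaitGroup would need extra plumbing for.

```go
package main

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// processAll runs process on each item concurrently and returns the
// first error encountered, if any; names here are invented for the sketch.
func processAll(ctx context.Context, items []string, process func(context.Context, string) error) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, item := range items {
		item := item // pre-Go 1.22 loop-variable capture
		g.Go(func() error { return process(ctx, item) })
	}
	return g.Wait() // waits for all goroutines; returns the first non-nil error
}
```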
This CL changes the concurrent recursion over the import graph so that instead of creating one goroutine per import edge (the naive approach formerly used by loadRecursive), it creates only one per package. We follow the approach used by 'enqueue' in gopls/internal/cache/analysis.go: we construct the inverse import graph and keep a count of unfinished successors (direct dependencies). Only when this count drops to zero do we add a package to the queue.

Subtle: the import graph must now be materialized not just when NeedImports is set, but when Syntax or Types are needed too.

Also, we add a CPU-limiting semaphore around calls to the parser and type-checker. This seems to improve BenchmarkNetHTTP by around 10% in elapsed time.

Before: BenchmarkNetHTTP-8  19  550622678 ns/op  17.12 MB/s  484348024 B/op  4951363 allocs/op
After:  BenchmarkNetHTTP-8  24  499117113 ns/op  18.89 MB/s  484340136 B/op  4949985 allocs/op

Updates golang/go#70094

Change-Id: I54bf17b16238844401aba4ec5a7f6485da5e73fb
Reviewed-on: https://go-review.googlesource.com/c/tools/+/624816
Commit-Queue: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
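The "CPU-limiting semaphore" mentioned in this commit message follows the same counting-semaphore pattern as the iolimit channel earlier in the thread, sized to the CPU count rather than to a fixed I/O budget. A minimal sketch of the pattern (hypothetical wrapper; the real change is in CL 624816):

```go
package main

import (
	"go/ast"
	"go/parser"
	"go/token"
	"runtime"
)

// cpuLimit bounds CPU-bound work (parsing, type-checking) so that many
// concurrent package loads cannot oversubscribe the available CPUs.
var cpuLimit = make(chan struct{}, runtime.GOMAXPROCS(0))

// parseFileLimited is a hypothetical wrapper illustrating the pattern.
func parseFileLimited(fset *token.FileSet, filename string) (*ast.File, error) {
	cpuLimit <- struct{}{}        // acquire a token
	defer func() { <-cpuLimit }() // release it when parsing is done
	return parser.ParseFile(fset, filename, nil, parser.ParseComments)
}
```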
See #70078 (comment): for the same workload, the go list command sometimes runs very quickly (70ms) and other times waits for a conspicuously round number (1.0s) of elapsed time before it exits. I vaguely recall that this was intentional behavior to prevent file system time quantization problems. Is it always necessary? If so, when and why? Are there ways to avoid it?
@matloob @samthanawalla