Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/go/packages: Load is inefficient due to heavy contention #70094

Open
adonovan opened this issue Oct 29, 2024 · 11 comments
Open

x/tools/go/packages: Load is inefficient due to heavy contention #70094

adonovan opened this issue Oct 29, 2024 · 11 comments
Assignees
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@adonovan
Copy link
Member

adonovan commented Oct 29, 2024

See #70078 (comment): for the same workload, the go list command sometimes runs very quickly (70ms) and other times waits for a conspicuously round number (1.0s) of elapsed time before it exits. I vaguely recall that this was intentional behavior to prevent file system time quantization problems. Is it always necessary? If so, when and why? Are there ways to avoid it?

@matloob @samthanawalla

@cagedmantis cagedmantis added GoCommand cmd/go NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Oct 31, 2024
@matloob
Copy link
Contributor

matloob commented Nov 1, 2024

I'm not sure what's causing this, but I can do some debugging to try to figure out where this is happening.

Is this happening on all platforms or only some? I tried running the tests on golang/tools@70e82e013 (the commit before the fix for #70078 was submitted) running a single test case at a time and logging the time spent in the go list driver but it all seemed to be in the 70-100ms range for me.

@adonovan
Copy link
Member Author

adonovan commented Nov 1, 2024

I'm using a Mac, whose file system is slower than Linux. Try again with this patch, which adds 8-way parallelism, and logs the time of each go list command:

diff --git a/go/ssa/builder_test.go b/go/ssa/builder_test.go
index a38186dfa..8368bcc9b 100644
--- a/go/ssa/builder_test.go
+++ b/go/ssa/builder_test.go
@@ -701,6 +701,10 @@ func TestTypeparamTest(t *testing.T) {
                        continue // Consider standalone go files.
                }
                t.Run(entry.Name(), func(t *testing.T) {
+                       t.Parallel()
+                       iolimit <- struct{}{}        // acquire a token
+                       defer func() { <-iolimit }() // release a token
+
                        src, err := fs.ReadFile(fsys, entry.Name())
                        if err != nil {
                                t.Fatal(err)
@@ -718,6 +722,8 @@ func TestTypeparamTest(t *testing.T) {
        }
 }
 
+var iolimit = make(chan struct{}, 8) // concurrency limiting counting semaphore
+
 // TestOrderOfOperations ensures order of operations are as intended.
 func TestOrderOfOperations(t *testing.T) {
        // Testing for the order of operations within an expression is done

diff --git a/go/ssa/testutil_test.go b/go/ssa/testutil_test.go
index 58680b282..e03935fa7 100644
--- a/go/ssa/testutil_test.go
+++ b/go/ssa/testutil_test.go
@@ -11,9 +11,11 @@ import (
        "go/parser"
        "go/token"
        "io/fs"
+       "log"
        "os"
        "testing"
        "testing/fstest"
+       "time"
 
        "golang.org/x/tools/go/packages"
        "golang.org/x/tools/go/ssa"
@@ -89,7 +91,9 @@ func loadPackages(t testing.TB, src fs.FS, patterns ...string) []*packages.Packa
                        "GOWORK=off",
                        "GOPROXY=off"),
        }
+       t0 := time.Now()
        pkgs, err := packages.Load(cfg, patterns...)
+       log.Printf("loadPackages %v = %v", patterns, time.Since(t0))
        if err != nil {
                t.Fatal(err)
        }
macOS$ go test -v  ./go/ssa/ -run=TestTypeparamTest

Typical results:

...
=== CONT  TestTypeparamTest/issue50147.go
    builder_test.go:718: Input: issue50147.go
2024/11/01 18:03:36 loadPackages [main] = 2.892153834s
2024/11/01 18:03:36 loadPackages [p] = 94.128958ms
=== CONT  TestTypeparamTest/issue50121b.go
    builder_test.go:718: Input: issue50121b.go
=== CONT  TestTypeparamTest/issue50121.go
    builder_test.go:718: Input: issue50121.go
2024/11/01 18:03:36 loadPackages [ignored] = 126.790666ms
=== CONT  TestTypeparamTest/issue50109b.go
    builder_test.go:718: Input: issue50109b.go
2024/11/01 18:03:36 loadPackages [ignored] = 188.077708ms
=== CONT  TestTypeparamTest/issue50109.go
    builder_test.go:718: Input: issue50109.go
2024/11/01 18:03:37 loadPackages [main] = 168.551041ms
=== CONT  TestTypeparamTest/issue50002.go
    builder_test.go:718: Input: issue50002.go
2024/11/01 18:03:37 loadPackages [main] = 2.9195375s
=== CONT  TestTypeparamTest/issue49893.go
    builder_test.go:718: Input: issue49893.go
2024/11/01 18:03:37 loadPackages [main] = 2.511128708s
=== CONT  TestTypeparamTest/issue51836.go
    builder_test.go:718: Input: issue51836.go
2024/11/01 18:03:37 loadPackages [ignored] = 96.019ms
=== CONT  TestTypeparamTest/issue49875.go
    builder_test.go:718: Input: issue49875.go
2024/11/01 18:03:37 loadPackages [ignored] = 97.62825ms
=== CONT  TestTypeparamTest/issue51832.go
    builder_test.go:718: Input: issue51832.go
2024/11/01 18:03:37 loadPackages [p] = 116.755166ms
=== CONT  TestTypeparamTest/issue49667.go
    builder_test.go:718: Input: issue49667.go
2024/11/01 18:03:37 loadPackages [main] = 2.507203417s
2024/11/01 18:03:37 loadPackages [main] = 141.188791ms
=== CONT  TestTypeparamTest/issue49659b.go
    builder_test.go:718: Input: issue49659b.go
2024/11/01 18:03:37 loadPackages [ignored] = 126.474333ms
=== CONT  TestTypeparamTest/issue51765.go
    builder_test.go:718: Input: issue51765.go
=== CONT  TestTypeparamTest/issue49659.go
    builder_test.go:718: Input: issue49659.go
2024/11/01 18:03:38 loadPackages [p] = 175.450167ms
...

The results are less clearly bimodal than the other day, though.

@xieyuschen
Copy link
Contributor

xieyuschen commented Nov 3, 2024

FYI @adonovan @matloob , I tried alan's approach in my ubuntu 20.04 with kernel 6.8.0-48-generic. It's more likely the packages.Load kills time instead of go.

'go list' has a relative low variance

=== CONT  TestGoListSpeed/#285
=== NAME  TestGoListSpeed/#293
    main_test.go:22: 931.67855ms
=== CONT  TestGoListSpeed/#284
=== NAME  TestGoListSpeed/#291
    main_test.go:22: 909.82343ms
=== CONT  TestGoListSpeed/#283
=== NAME  TestGoListSpeed/#290
    main_test.go:22: 959.312539ms
=== CONT  TestGoListSpeed/#282
=== NAME  TestGoListSpeed/#289
    main_test.go:22: 839.518395ms
=== CONT  TestGoListSpeed/#281
=== NAME  TestGoListSpeed/#288
    main_test.go:22: 962.895681ms
=== CONT  TestGoListSpeed/#280
=== NAME  TestGoListSpeed/#287
    main_test.go:22: 911.338394ms
=== CONT  TestGoListSpeed/#279
=== NAME  TestGoListSpeed/#285
    main_test.go:22: 820.664181ms
Click to see the test code
var iolimits = make(chan struct{}, 8)

func TestGoListSpeed(t *testing.T) {
	for i := 0; i < 1_000; i++ {
		t.Run("", func(t *testing.T) {
			t.Parallel()
			iolimits <- struct{}{}
			defer func() {
				<-iolimits
			}()
			cmd := exec.Command("go", "list", "std")
			t0 := time.Now()
			err := cmd.Run()
			t.Log(time.Since(t0))
			if err != nil {
				t.Fatal(err)
			}
		})
	}
}

'packages.Load' vs 'go list'

Add new lines on the top of alan's diff:

@@ -89,13 +92,24 @@ func loadPackages(t testing.TB, src fs.FS, patterns ...string) []*packages.Packa
        if packages.PrintErrors(pkgs) > 0 {
                t.Fatal("there were errors")
        }
+       cmd := exec.Command("go", append([]string{"list"}, patterns...)...)
+       cmd.Dir = dir
+       t1 := time.Now()
+       err = cmd.Run()
+       if err != nil {
+               t.Fatal(err)
+       }
+       log.Printf("go list %v = %v", patterns, time.Since(t1))
=== CONT  TestTypeparamTest/issue51522a.go
    builder_test.go:717: Input: issue51522a.go
2024/11/03 17:57:09 loadPackages [main] = 291.218732ms
2024/11/03 17:57:09 loadPackages [main] = 329.429631ms
2024/11/03 17:57:09 go list [main] = 106.651425ms

=== CONT  TestTypeparamTest/issue51521.go
    builder_test.go:717: Input: issue51521.go
2024/11/03 17:57:09 loadPackages [main] = 360.640229ms
2024/11/03 17:57:09 go list [main] = 37.53786ms
2024/11/03 17:57:10 loadPackages [main] = 314.162804ms
2024/11/03 17:57:10 go list [main] = 82.010451ms

=== CONT  TestTypeparamTest/issue51355.go
    builder_test.go:717: Input: issue51355.go
2024/11/03 17:57:11 loadPackages [main] = 331.424779ms
2024/11/03 17:57:11 go list [main] = 59.669024ms

=== CONT  TestTypeparamTest/ifaceconv.go
    builder_test.go:717: Input: ifaceconv.go
2024/11/03 17:57:11 loadPackages [main] = 3.683294694s
2024/11/03 17:57:11 go list [main] = 101.648646ms

=== CONT  TestTypeparamTest/issue48645a.go
    builder_test.go:717: Input: issue48645a.go
2024/11/03 17:57:11 loadPackages [main] = 3.485840972s
2024/11/03 17:57:11 loadPackages [main] = 3.142638313s
2024/11/03 17:57:11 go list [main] = 83.376119ms

@xieyuschen
Copy link
Contributor

I found the synchronization sync.(*WaitGroup).Wait and sync.(*Mutex).Lock inside packages.(*loader).loadRecursive takes a lot of time.

go test -v  ./go/ssa/ -run=TestTypeparamTest -trace trace.out
go tool trace trace.out

block

@adonovan
Copy link
Member Author

adonovan commented Nov 3, 2024

Thanks for the data. I cannot now remember whether in my original observation of the problem in #70078 I was measuring the "go list" execution directly, or the packages.Load call (as in my diff above). I will measure again just the "go list" executions within packages.Load to confirm whether the effect is real or just an artifact of using packages.Load. (BTW, by "killing time" I meant something more specific: logic to intentionally sleep for a second to avoid clock quantization problems. I vaguely recall we added this to the go command.)

I'm not surprised that the packages.Load should block for a long time in Lock and Wait as it starts many goroutines that all immediately block waiting for their predecessors. It would reduce contention to build the waits-for graph and only enqueue a task once it predecessors are complete, like we do for gopls' analysis). I should certainly fix that in any case.

@adonovan
Copy link
Member Author

adonovan commented Nov 3, 2024

Measuring again with this additional patch to print raw go list times:

diff --git a/internal/gocommand/invoke.go b/internal/gocommand/invoke.go
index e333efc87..c536f60eb 100644
--- a/internal/gocommand/invoke.go
+++ b/internal/gocommand/invoke.go
@@ -376,6 +376,7 @@ func runCmdContext(ctx context.Context, cmd *exec.Cmd) (err error) {
        go func() {
                resChan <- cmd.Wait()
+               log.Printf("Command %s took %v", cmd, time.Since(startTime).Microseconds())
        }()

I observe these statistics:

		50%	75%	90%	95%	99%
packages.Load	197ms	1.26s	2.15s	2.37s	2.84s
go list		157ms	180ms	215ms	228ms	252ms

It does indeed like like packages.Load is to blame. Thanks again for the suggestion.

@adonovan adonovan changed the title cmd/go: go list kills time until 1.0s has elapsed x/tools/go/packages: Load is inefficient due to heavy contention Nov 3, 2024
@adonovan adonovan removed the GoCommand cmd/go label Nov 3, 2024
@adonovan adonovan self-assigned this Nov 3, 2024
@gopherbot gopherbot added this to the Unreleased milestone Nov 3, 2024
@xieyuschen
Copy link
Contributor

@adonovan understood, thanks for the update.

It would reduce contention to build the waits-for graph and only enqueue a task once it predecessors are complete, like we do for gopls' analysis). I should certainly fix that in any case.

Would you mind if I add an enqueue functionality to package.Load? I wrote a draft and could pass the existing tests, but some refinements and tidy up are needed and I will do it tomorrow. By the way, the design in gopls is really a good example.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/624815 mentions this issue: go/packages: minor cleanups to loader.parseFiles

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/624816 mentions this issue: go/packages: create fewer goroutines

@adonovan
Copy link
Member Author

adonovan commented Nov 3, 2024

Would you mind if I add an enqueue functionality to package.Load? I wrote a draft and could pass the existing tests, but some refinements and tidy up are needed and I will do it tomorrow.

Thanks for the offer. I had already put together a quick implementation before I saw your note, but unfortunately it doesn't deliver tangible gains yet. I've shared it in CL 624816. If you'd like to take it over, be my guest.

By the way, the design in gopls is really a good example.

Thanks. It is potentially much more efficient than the naive approach, but the naive approach (one goroutine per edge) is just so easy to write that I use it a lot--too much. At some point I'd like to try to extract the "enqueue" approach into a library, something like:

package parallel

// A Graph holds the state for a parallel fold over a directed acyclic graph.
type Graph struct { ...leaves... }

// AddEdge records a dependency between two nodes.
func (*Graph[N Node]) AddEdge(from, to N)

// Do executes each Node's Do function in parallel postorder.
func (*Graph) Do(context.Context) error { ... enqueue, etc ... }

// A Node is an application data type representing a graph node.
// The concrete type of  node must embed a NodeState.
type Node interface {
    // Do does the computation for a node.
    // The node may assume its direct dependencies are already done.
    Do(context.Context) error

    // State returns the state of the node.
    State() *NodeState
}

// A NodeState holds the batch state of a node. Embed it within your node data type.
type NodeState[N Node] struct { ... opaque:  succs, preds, counters ... }

// Call Finished when all the AddEdge calls for this node have been made.
// (This enqueues the initial leaf nodes.)
func (s *State[N]) Finished() { ...}

@xieyuschen
Copy link
Contributor

Your approach that resolves the leaves of DAG and enqueue them is better than mine(xieyuschen/tools@14066ba), which spawns more go routines and resolves the imports inside the spawned go routines.

unfortunately it doesn't deliver tangible gains yet

I tried to cherry-pick your change at golang/tools@70e82e013 and run test with your diff again, the data fluctuation reduces a lot and there is no second-level load operations anymore. I have seen your commit message, but I don't know how to improve it further:(

gopherbot pushed a commit to golang/tools that referenced this issue Nov 4, 2024
Details:
- Break out of overlay loop when a match is found.
  This is a (theoretical) optimization.
- Document optimization opportunities: one TODO,
  and one not worth doing.
- Clarify some comments.
- Use errgroup instead of WaitGroup (more concise).

No intended behavior changes.

Updates golang/go#70094

Change-Id: I27e34adb1176b65b4fae60e7d67bb4d2f4b19a14
Reviewed-on: https://go-review.googlesource.com/c/tools/+/624815
Reviewed-by: Robert Findley <rfindley@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Alan Donovan <adonovan@google.com>
gopherbot pushed a commit to golang/tools that referenced this issue Nov 5, 2024
This CL changes the concurrent recursion over the import
graph so that instead of creating one goroutine per import
edge (the naive approach formerly used by loadRecursive),
it creates only one per package.

We follow the approach used by 'enqueue' in
gopls/internal/cache/analysis.go: we construct the
inverse import graph, and keep a count of unfinished
successors (direct dependencies). Only when this
count drops to zero do we add a package to the
queue.

Subtle: the import graph must now be materalized not
just when NeedImports, but when Syntax or Types are
needed too.

Also, we add a CPU-limiting semaphore around calls to
the parser and type-checker.

This seems to improve the BenchmarkNetHTTP by around 10% elapsed time.

Before:
BenchmarkNetHTTP-8   	      19	 550622678 ns/op	  17.12 MB/s	484348024 B/op	 4951363 allocs/op
After:
BenchmarkNetHTTP-8   	      24	 499117113 ns/op	  18.89 MB/s	484340136 B/op	 4949985 allocs/op

Updates golang/go#70094

Change-Id: I54bf17b16238844401aba4ec5a7f6485da5e73fb
Reviewed-on: https://go-review.googlesource.com/c/tools/+/624816
Commit-Queue: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants