[internal] Don't download Go third-party dependencies multiple times #13352

Eric-Arellano · 2021-10-25T21:46:20Z

Turns out that #13339 didn't actually work: we are redownloading the same modules several times with Go! Downloads happen when:

1. determining `GoModInfo` (once per `go.mod`)
2. `AllDownloadedModules` (once per `go.mod`)
3. determining metadata for each third-party package (once per third-party module)
4. determining metadata for each first-party package (once per first-party package/directory)

This PR fixes it so that we only download modules a single time, once per `go.mod`.

To fix this, we stop treating third-party modules like first-party modules, i.e. we stop `cd`-ing into each module's downloaded directory and running `go list` directly in it with its own `go.mod` and `go.sum`. That approach required the chroot to contain all of the module's transitive dependencies, and it also caused issues like #13138. Instead, the much simpler thing to do is run `go list '...'` to do all third-party analysis in a single swoop. That gets us all the analysis we need.
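A minimal sketch of the single-pass analysis, assuming the process runs with `GOPATH` pointing at the directory where every module was downloaded (the PR's code comment mentions `GOPATH=$(pwd)/gopath`). `go list -json` prints one JSON object per package back to back rather than a JSON array, so the output must be decoded as a stream of values. The helpers `go_list_argv`, `go_list_env`, and `parse_go_list_json` are hypothetical, not Pants APIs; the PR itself streams the output with `ijson.items(..., multiple_values=True)`, and this sketch swaps in the standard library's `json.JSONDecoder.raw_decode`.

```python
import json
import os
from typing import Any, Dict, List


def go_list_argv() -> List[str]:
    # Hypothetical helper. Per the review discussion, `...` (not `all`)
    # is used so that a single invocation covers every package.
    return ["go", "list", "-json", "..."]


def go_list_env(workdir: str) -> Dict[str, str]:
    # Hypothetical helper: modules are assumed to live under <workdir>/gopath.
    return {"GOPATH": os.path.join(workdir, "gopath")}


def parse_go_list_json(stdout: str) -> List[Dict[str, Any]]:
    """Decode the concatenated JSON objects printed by `go list -json`."""
    decoder = json.JSONDecoder()
    packages: List[Dict[str, Any]] = []
    idx = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip the
        # newlines between consecutive objects ourselves.
        while idx < len(stdout) and stdout[idx].isspace():
            idx += 1
        if idx >= len(stdout):
            break
        obj, idx = decoder.raw_decode(stdout, idx)
        packages.append(obj)
    return packages


sample = (
    '{"ImportPath": "github.com/a/b", "Dir": "/gopath/pkg/mod/github.com/a/b@v1.0.0"}\n'
    '{"ImportPath": "fmt", "Standard": true}\n'
)
for pkg in parse_go_list_json(sample):
    print(pkg["ImportPath"])
```

Decoding incrementally keeps a single process's output usable even though it is not one JSON document, which is what makes the "one `go list` per `go.mod`" design practical.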

We also extract the relevant `.go` files from the downloaded `GOPATH`, i.e. from all the downloaded modules. For compilation, all we need is the `.go` files plus the metadata we calculated earlier; compilation doesn't need access to anything else, like other packages.
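A sketch of how the files to extract could be selected, assuming each package object carries the standard `go list -json` fields `Dir` (the package's absolute directory) and `GoFiles` (the `.go` file names in that directory). The helper name is hypothetical and this is not the PR's code; it only illustrates mapping package metadata to paths inside `GOPATH`.

```python
import os
from typing import Any, Dict, List


def go_files_relative_to_gopath(pkg: Dict[str, Any], gopath: str) -> List[str]:
    """Return a package's `.go` files as paths relative to `gopath` (hypothetical helper)."""
    pkg_dir = pkg.get("Dir")
    if pkg_dir is None:
        # Fail loudly rather than silently extracting nothing.
        raise ValueError(f"No `Dir` for package {pkg.get('ImportPath')!r}")
    rel_dir = os.path.relpath(pkg_dir, gopath)
    if rel_dir == os.pardir or rel_dir.startswith(os.pardir + os.sep):
        raise ValueError(f"{pkg_dir} is outside of GOPATH {gopath}")
    # `go list -json` omits empty fields, so `GoFiles` may be absent.
    return [os.path.join(rel_dir, name) for name in pkg.get("GoFiles", [])]


pkg = {
    "ImportPath": "github.com/a/b",
    "Dir": "/gp/pkg/mod/github.com/a/b@v1.0.0",
    "GoFiles": ["b.go", "util.go"],
}
print(go_files_relative_to_gopath(pkg, "/gp"))
```

The resulting relative paths are what a digest-subset request would need; everything else in `GOPATH` can stay out of the compile sandbox.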

For first-party analysis, we copy the whole `GOPATH` into the chroot. (This is really slow! We need something like #12716 to fix this.)

## Benchmark

Running in https://github.com/toolchainlabs/remote-api-tools.

Before:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     36.467 s ±  0.603 s    [User: 41.109 s, System: 38.095 s]
  Range (min … max):   35.518 s … 37.137 s    5 runs
```

Fixing only third-party analysis:

```
❯ hyperfine -r 5 --show-output './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     29.880 s ±  0.901 s    [User: 29.564 s, System: 15.281 s]
  Range (min … max):   28.835 s … 31.312 s    5 runs
```

Fixing everything:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     26.633 s ±  2.283 s    [User: 24.115 s, System: 30.453 s]
  Range (min … max):   24.570 s … 30.037 s    5 runs
```

[ci skip-rust]
[ci skip-build-wheels]


tdyas · 2021-10-27T13:25:10Z

src/python/pants/backend/go/util_rules/third_party_pkg.py

```diff
+        "-json",
+        # This matches all packages. `all` only matches first-party packages and complains that
+        # there are no `.go` files.
+        "...",
```


I assume this will need to be changed?

Specifically the use of `"..."`.

Why? It's very intentional to be using `...`, as we discussed over DM. `all` isn't doing what we want.

Based on that discussion, I thought we were going to go with running `go list -m` and the analysis of each package in a single `Process`?

No, I don't think there's any benefit to doing that. I found in testing that `go list path/to/pkg/...` still has the exact same failure with Helm that we're seeing when directly running `go list '...'`.
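For readers unfamiliar with the pattern syntax: per `go help packages`, each `...` in an import-path pattern matches any string (including the empty string and strings containing slashes), and as a special case `x/...` also matches `x` itself. The simplified Python model below (a hypothetical helper that ignores Go's extra rules for `vendor` directories) shows why `...` on its own matches any import path, while `path/to/pkg/...` is scoped to one subtree.

```python
import re


def match_go_pattern(pattern: str, import_path: str) -> bool:
    """Simplified model of Go's `...` wildcard matching for import paths."""
    # Escape regex metacharacters, then turn each `...` into "any string".
    regex = re.escape(pattern).replace(r"\.\.\.", ".*")
    # Special case: a trailing `/...` also matches the prefix itself.
    if regex.endswith("/.*"):
        regex = regex[: -len("/.*")] + "(/.*)?"
    return re.fullmatch(regex, import_path) is not None


print(match_go_pattern("...", "github.com/a/b"))       # True
print(match_go_pattern("github.com/a/...", "github.com/a"))  # True
print(match_go_pattern("github.com/a/...", "github.com/ab"))  # False
```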

tdyas · 2021-10-27T13:25:46Z

src/python/pants/backend/go/util_rules/third_party_pkg.py

```diff
+    all_digest_subset_gets = []
+    all_pkg_info_kwargs = []
+    for pkg_json in ijson.items(list_result.stdout, "", multiple_values=True):
+        if "Standard" in pkg_json:
```


This also needs to check whether `Standard` has the value `true`.

In practice, the key only shows up if it's set to `True`. But sure, can fix for completeness.

Can fix in followups, as this is mostly a style thing and I want to kick off the cherry-pick.
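A minimal sketch of the stricter check suggested in review, assuming each `pkg_json` is the decoded dict for one package. `go list -json` omits `false` booleans, so in practice key presence and value agree, but checking the value does not depend on that. Both function names are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def is_standard_library(pkg_json: Dict[str, Any]) -> bool:
    # Check the value, not just key presence: {"Standard": false} is not stdlib.
    return pkg_json.get("Standard") is True


def without_standard_library(pkgs: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop standard-library packages from decoded `go list -json` results."""
    return [pkg for pkg in pkgs if not is_standard_library(pkg)]


pkgs = [
    {"ImportPath": "fmt", "Standard": True},
    {"ImportPath": "github.com/a/b"},
    {"ImportPath": "github.com/c/d", "Standard": False},
]
print([p["ImportPath"] for p in without_standard_library(pkgs)])
```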

stuhood

Thanks!

stuhood · 2021-10-27T19:26:26Z

src/python/pants/backend/go/util_rules/first_party_pkg_test.py

```diff
-                go 1.17
+                go 1.16
```


Is the version change intended?

Unfortunately, yeah :/ Go 1.17 made a change so that `go.mod` is expected to list all indirect (transitive) dependencies with a `require` clause, whereas 1.16 only listed direct deps.

When we were using `go mod download all`, it didn't complain if `go.mod` left off some indirect dependencies. Now, `go list ...` does complain, but with a very unhelpful message saying "Invalid go.mod!" and no explanation of why it's invalid.

Their advice is to run `go mod tidy` to update your `go.mod`, but that works by analyzing your first-party code to see what is used: it removes any third-party deps you don't actually consume. Our integration tests have no first-party code, so `go mod tidy` removes everything. I think this relates to #13136.

I gave up trying to get the magical incantation for this test to work with Go 1.17+. I spent a lot of time yesterday trying to figure it out 😅

One of my remaining TODOs for the Go milestone is to try to improve our error messages so that it's easier to figure this out. I'd like to revert this test to 1.17 as part of that.
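One possible building block for a clearer error message, sketched under assumptions: read the `go` directive from `go.mod` and detect whether the Go 1.17+ behavior described above (indirect dependencies listed with `require`) applies. The function names and the regex-based parsing are hypothetical simplifications; a real implementation would more robustly use `go mod edit -json`, which prints `go.mod` as JSON.

```python
import re
from typing import Optional, Tuple

# Matches a directive line like `go 1.17` or `go 1.21.0` at the start of a line.
_GO_DIRECTIVE = re.compile(r"^[ \t]*go[ \t]+(\d+)\.(\d+)", re.MULTILINE)


def go_directive_version(go_mod: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the `go` directive, or None if absent."""
    match = _GO_DIRECTIVE.search(go_mod)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def uses_go_117_module_rules(go_mod: str) -> bool:
    """True if `go.mod` declares Go 1.17 or newer."""
    version = go_directive_version(go_mod)
    return version is not None and version >= (1, 17)


example = "module example.com/foo\n\ngo 1.16\n\nrequire github.com/a/b v1.0.0\n"
print(go_directive_version(example), uses_go_117_module_rules(example))
```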

stuhood · 2021-10-27T19:27:52Z

src/python/pants/backend/go/util_rules/third_party_pkg.py

```diff
+async def download_and_analyze_third_party_packages(
+    request: AllThirdPartyPackagesRequest,
+) -> AllThirdPartyPackages:
+    # NB: We download all modules to GOPATH=$(pwd)/gopath. Running `go list ...` from $(pwd) would
```


👍

Love the comments in here.

…(Cherry-pick of #13352) (#13378)

Eric-Arellano added 2 commits October 26, 2021 15:16

[internal] Make explicit that we're redownloading Go modules multiple times

a3f31eb

Redesign third-party dependency to not keep downloading the same modules

a2646a2

Eric-Arellano force-pushed the actually-disable-proxy branch from 94bea5b to a2646a2 Compare October 26, 2021 23:25

Eric-Arellano added 2 commits October 26, 2021 20:26

Better error message if Dir is not defined

8860a2a


Add regression test for pantsbuild#13138

5271ea6


Eric-Arellano changed the title ~~[internal] Make explicit that we're redownloading Go modules multiple times~~ [internal] Don't redownload Go third-party dependencies when analyzing each third-party module Oct 27, 2021

Eric-Arellano marked this pull request as ready for review October 27, 2021 04:31

Eric-Arellano requested review from tdyas and stuhood October 27, 2021 04:31

Eric-Arellano mentioned this pull request Oct 27, 2021

[wip] Fix Go not building third-party modules with a replace directive #13349

Closed

Eric-Arellano added 3 commits October 26, 2021 21:52

Clean up some incomplete things

a05f687


Store GOPATH digest on AllThirdPartyPackages

e01c8a7


Fix first-party package to not download too

a380172


Eric-Arellano changed the title ~~[internal] Don't redownload Go third-party dependencies when analyzing each third-party module~~ [internal] Don't download Go third-party dependencies multiple times Oct 27, 2021

tdyas reviewed Oct 27, 2021


tdyas approved these changes Oct 27, 2021


stuhood approved these changes Oct 27, 2021


Eric-Arellano merged commit 07a308e into pantsbuild:main Oct 27, 2021

Eric-Arellano deleted the actually-disable-proxy branch October 27, 2021 19:38

This was referenced Oct 27, 2021

go: failure to handle an import path replacement directive #13138

Closed

Go: avoid go.sum changes invalidating all processes using external modules #13093

Closed

Eric-Arellano mentioned this pull request Jan 4, 2022

Go: consider redesigning third-party metadata to only be what's used #14071

Closed
