-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[internal] Don't download Go third-party dependencies multiple times #13352
[internal] Don't download Go third-party dependencies multiple times #13352
Conversation
… times # Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
94bea5b
to
a2646a2
Compare
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended. [ci skip-rust] # Building wheels and fs_util will be skipped. Delete if not intended. [ci skip-build-wheels]
"-json", | ||
# This matches all packages. `all` only matches first-party packages and complains that | ||
# there are no `.go` files. | ||
"...", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this will need to be changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifically the use of "..."
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? It's very intentional to be using ...
as we discussed over DM. all
isn't doing what we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on that discussion, I thought we were going to go with running go list -m
and the analysis of each package in a single Process
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't think there's any benefit to doing that. I found in testing that go list path/to/pkg/...
still has the exact same failure with Helm that we're seeing with directly running go list '...'
all_digest_subset_gets = [] | ||
all_pkg_info_kwargs = [] | ||
for pkg_json in ijson.items(list_result.stdout, "", multiple_values=True): | ||
if "Standard" in pkg_json: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also needs to check whether Standard
has the value true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, the key only shows up if it's set to True
. But sure, can fix for completeness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can fix in followups, as this is mostly a style thing and I want to kick off the cherry-pick
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
go 1.17 | ||
go 1.16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the version change intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, yeah :/ Go 1.17 made a change that the go.mod
expects you to list all indirect (transitive) dependencies in go.mod
with a require
clause, whereas 1.16 only listed direct deps.
When we were using go mod download all
, it didn't complain if go.mod
left off some indirect dependencies. Now, go list ...
does complain, but with a very unhelpful message saying "Invalid go.mod!" and no instructions why it's invalid.
Their advice is to run go mod tidy
to update your go.mod
, but that works by analyzing your first-party code to see what is used, as it removes any third-party deps you don't actually consume. Our integration tests have no first-party code, so go mod tidy
removes it all. Relates to #13136 I think.
I gave up trying to get the magical incantation for this test to work with Go 1.17+. I spent a lot of time yesterday trying to figure it out 😅
One of my remaining TODOs for the Go milestone is to try to improve our error messages so that it's easier to figure this out. I'd like to revert this test to 1.17 as part of that.
async def download_and_analyze_third_party_packages( | ||
request: AllThirdPartyPackagesRequest, | ||
) -> AllThirdPartyPackages: | ||
# NB: We download all modules to GOPATH=$(pwd)/gopath. Running `go list ...` from $(pwd) would |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Love the comments in here.
…antsbuild#13352) Turns out that pantsbuild#13339 didn't actually work - we are redownloading the same modules several times with Go! Downloads happen when: 1. determining `GoModInfo` (once per `go.mod`) 2. `AllDownloadedModules` (once per `go.mod`) 3. Determining metadata for each third-party package (once per third-party module) 4. Determining metadata for each first-party package (once per first-party package/directory) This PR fixes it so that we only download modules a single time, once per `go.mod`. To fix this, we stop treating third-party modules like first-party modules, i.e. we stop `cd`-ing into its downloaded directory and running `go list` directly in it, using its own `go.mod` and `go.sum`. That requires that the chroot has all of the module's transitive dependencies present, and it also resulted in issues like pantsbuild#13138. Instead, the much simpler thing to do is run `go list '...'` to do all third-party analysis in a single swoop. That gets us all the analysis we need. We also extract the relevant `.go` files from all of the downloaded `GOPATH`, i.e. all the downloaded modules. For compilation, all we need is the `.go` files + the metadata we had earlier calculated. Compilation doesn't need access to anything else like other package's. For first-party analysis, we copy the whole `GOPATH` into the chroot. (This is really slow! We need something like pantsbuild#12716 to fix this.) ## Benchmark Running in https://github.com/toolchainlabs/remote-api-tools. Before: ``` ❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::' Time (mean ± σ): 36.467 s ± 0.603 s [User: 41.109 s, System: 38.095 s] Range (min … max): 35.518 s … 37.137 s 5 runs ``` Fixing only third-party analysis: ``` ❯ hyperfine -r 5 --show-output './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::' Time (mean ± σ): 29.880 s ± 0.901 s [User: 29.564 s, System: 15.281 s] Range (min … max): 28.835 s … 31.312 s 5 runs ``` Fixing everything: ``` ❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::' Time (mean ± σ): 26.633 s ± 2.283 s [User: 24.115 s, System: 30.453 s] Range (min … max): 24.570 s … 30.037 s 5 runs ``` [ci skip-rust] [ci skip-build-wheels]
…(Cherry-pick of #13352) (#13378) Turns out that #13339 didn't actually work - we are redownloading the same modules several times with Go! Downloads happen when: 1. determining `GoModInfo` (once per `go.mod`) 2. `AllDownloadedModules` (once per `go.mod`) 3. Determining metadata for each third-party package (once per third-party module) 4. Determining metadata for each first-party package (once per first-party package/directory) This PR fixes it so that we only download modules a single time, once per `go.mod`. To fix this, we stop treating third-party modules like first-party modules, i.e. we stop `cd`-ing into its downloaded directory and running `go list` directly in it, using its own `go.mod` and `go.sum`. That requires that the chroot has all of the module's transitive dependencies present, and it also resulted in issues like #13138. Instead, the much simpler thing to do is run `go list '...'` to do all third-party analysis in a single swoop. That gets us all the analysis we need. We also extract the relevant `.go` files from all of the downloaded `GOPATH`, i.e. all the downloaded modules. For compilation, all we need is the `.go` files + the metadata we had earlier calculated. Compilation doesn't need access to anything else like other package's. For first-party analysis, we copy the whole `GOPATH` into the chroot. (This is really slow! We need something like #12716 to fix this.) ## Benchmark Running in https://github.com/toolchainlabs/remote-api-tools. Before: ``` ❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::' Time (mean ± σ): 36.467 s ± 0.603 s [User: 41.109 s, System: 38.095 s] Range (min … max): 35.518 s … 37.137 s 5 runs ``` Fixing only third-party analysis: ``` ❯ hyperfine -r 5 --show-output './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::' Time (mean ± σ): 29.880 s ± 0.901 s [User: 29.564 s, System: 15.281 s] Range (min … max): 28.835 s … 31.312 s 5 runs ``` Fixing everything: ``` ❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::' Time (mean ± σ): 26.633 s ± 2.283 s [User: 24.115 s, System: 30.453 s] Range (min … max): 24.570 s … 30.037 s 5 runs ``` [ci skip-rust] [ci skip-build-wheels]
Turns out that #13339 didn't actually work - we are redownloading the same modules several times with Go! Downloads happen when:
GoModInfo
(once pergo.mod
)AllDownloadedModules
(once pergo.mod
)This PR fixes it so that we only download modules a single time, once per
go.mod
.To fix this, we stop treating third-party modules like first-party modules, i.e. we stop
cd
-ing into its downloaded directory and runninggo list
directly in it, using its owngo.mod
andgo.sum
. That requires that the chroot has all of the module's transitive dependencies present, and it also resulted in issues like #13138. Instead, the much simpler thing to do is rungo list '...'
to do all third-party analysis in a single swoop. That gets us all the analysis we need.We also extract the relevant
.go
files from all of the downloadedGOPATH
, i.e. all the downloaded modules. For compilation, all we need is the.go
files + the metadata we had earlier calculated. Compilation doesn't need access to anything else like other package's.For first-party analysis, we copy the whole
GOPATH
into the chroot. (This is really slow! We need something like #12716 to fix this.)Benchmark
Running in https://github.com/toolchainlabs/remote-api-tools.
Before:
Fixing only third-party analysis:
Fixing everything:
[ci skip-rust]
[ci skip-build-wheels]