Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[internal] Don't download Go third-party dependencies multiple times #13352

Merged
merged 7 commits into from
Oct 27, 2021

Conversation

Eric-Arellano
Copy link
Contributor

@Eric-Arellano Eric-Arellano commented Oct 25, 2021

Turns out that #13339 didn't actually work - we are redownloading the same modules several times with Go! Downloads happen when:

  1. determining GoModInfo (once per go.mod)
  2. AllDownloadedModules (once per go.mod)
  3. Determining metadata for each third-party package (once per third-party module)
  4. Determining metadata for each first-party package (once per first-party package/directory)

This PR fixes it so that we only download modules a single time, once per go.mod.

To fix this, we stop treating third-party modules like first-party modules, i.e. we stop cd-ing into its downloaded directory and running go list directly in it, using its own go.mod and go.sum. That requires that the chroot has all of the module's transitive dependencies present, and it also resulted in issues like #13138. Instead, the much simpler thing to do is run go list '...' to do all third-party analysis in a single swoop. That gets us all the analysis we need.

We also extract the relevant .go files from all of the downloaded GOPATH, i.e. all the downloaded modules. For compilation, all we need is the .go files + the metadata we had earlier calculated. Compilation doesn't need access to anything else like other package's.

For first-party analysis, we copy the whole GOPATH into the chroot. (This is really slow! We need something like #12716 to fix this.)

Benchmark

Running in https://github.com/toolchainlabs/remote-api-tools.

Before:

❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     36.467 s ±  0.603 s    [User: 41.109 s, System: 38.095 s]
  Range (min … max):   35.518 s … 37.137 s    5 runs

Fixing only third-party analysis:

❯ hyperfine -r 5 --show-output './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     29.880 s ±  0.901 s    [User: 29.564 s, System: 15.281 s]
  Range (min … max):   28.835 s … 31.312 s    5 runs

Fixing everything:

❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     26.633 s ±  2.283 s    [User: 24.115 s, System: 30.453 s]
  Range (min … max):   24.570 s … 30.037 s    5 runs

[ci skip-rust]
[ci skip-build-wheels]

… times

# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@Eric-Arellano Eric-Arellano changed the title [internal] Make explicit that we're redownloading Go modules multiple times [internal] Don't redownload Go third-party dependencies when analyzing each third-party module Oct 27, 2021
@Eric-Arellano Eric-Arellano marked this pull request as ready for review October 27, 2021 04:31
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
# Rust tests and lints will be skipped. Delete if not intended.
[ci skip-rust]

# Building wheels and fs_util will be skipped. Delete if not intended.
[ci skip-build-wheels]
@Eric-Arellano Eric-Arellano changed the title [internal] Don't redownload Go third-party dependencies when analyzing each third-party module [internal] Don't download Go third-party dependencies multiple times Oct 27, 2021
"-json",
# This matches all packages. `all` only matches first-party packages and complains that
# there are no `.go` files.
"...",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this will need to be changed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically the use of "...".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? It's very intentional to be using ... as we discussed over DM. all isn't doing what we want.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on that discussion, I thought we were going to go with running go list -m and the analysis of each package in a single Process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think there's any benefit to doing that. I found in testing that go list path/to/pkg/... still has the exact same failure with Helm that we're seeing with directly running go list '...'

all_digest_subset_gets = []
all_pkg_info_kwargs = []
for pkg_json in ijson.items(list_result.stdout, "", multiple_values=True):
if "Standard" in pkg_json:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs to check whether Standard has the value true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, the key only shows up if it's set to True. But sure, can fix for completeness

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can fix in followups, as this is mostly a style thing and I want to kick off the cherry-pick

Copy link
Member

@stuhood stuhood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

go 1.17
go 1.16
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the version change intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, yeah :/ Go 1.17 made a change that the go.mod expects you to list all indirect (transitive) dependencies in go.mod with a require clause, whereas 1.16 only listed direct deps.

When we were using go mod download all, it didn't complain if go.mod left off some indirect dependencies. Now, go list ... does complain, but with a very unhelpful message saying "Invalid go.mod!" and no instructions why it's invalid.

Their advice is to run go mod tidy to update your go.mod, but that works by analyzing your first-party code to see what is used, as it removes any third-party deps you don't actually consume. Our integration tests have no first-party code, so go mod tidy removes it all. Relates to #13136 I think.

I gave up trying to get the magical incantation for this test to work with Go 1.17+. I spent a lot of time yesterday trying to figure it out 😅

One of my remaining TODOs for the Go milestone is to try to improve our error messages so that it's easier to figure this out. I'd like to revert this test to 1.17 as part of that.

async def download_and_analyze_third_party_packages(
request: AllThirdPartyPackagesRequest,
) -> AllThirdPartyPackages:
# NB: We download all modules to GOPATH=$(pwd)/gopath. Running `go list ...` from $(pwd) would
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Love the comments in here.

@Eric-Arellano Eric-Arellano merged commit 07a308e into pantsbuild:main Oct 27, 2021
@Eric-Arellano Eric-Arellano deleted the actually-disable-proxy branch October 27, 2021 19:38
Eric-Arellano added a commit to Eric-Arellano/pants that referenced this pull request Oct 27, 2021
…antsbuild#13352)

Turns out that pantsbuild#13339 didn't actually work - we are redownloading the same modules several times with Go! Downloads happen when:

1. determining `GoModInfo` (once per `go.mod`)
2. `AllDownloadedModules` (once per `go.mod`)
3.  Determining metadata for each third-party package (once per third-party module)
4. Determining metadata for each first-party package (once per first-party package/directory)

This PR fixes it so that we only download modules a single time, once per `go.mod`.

To fix this, we stop treating third-party modules like first-party modules, i.e. we stop `cd`-ing into its downloaded directory and running `go list` directly in it, using its own `go.mod` and `go.sum`. That requires that the chroot has all of the module's transitive dependencies present, and it also resulted in issues like pantsbuild#13138. Instead, the much simpler thing to do is run `go list '...'` to do all third-party analysis in a single swoop. That gets us all the analysis we need.

We also extract the relevant `.go` files from all of the downloaded `GOPATH`, i.e. all the downloaded modules. For compilation, all we need is the `.go` files + the metadata we had earlier calculated. Compilation doesn't need access to anything else like other package's.

For first-party analysis, we copy the whole `GOPATH` into the chroot. (This is really slow! We need something like pantsbuild#12716 to fix this.)

## Benchmark

Running in https://github.com/toolchainlabs/remote-api-tools.

Before:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     36.467 s ±  0.603 s    [User: 41.109 s, System: 38.095 s]
  Range (min … max):   35.518 s … 37.137 s    5 runs
```

Fixing only third-party analysis:

```
❯ hyperfine -r 5 --show-output './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     29.880 s ±  0.901 s    [User: 29.564 s, System: 15.281 s]
  Range (min … max):   28.835 s … 31.312 s    5 runs
```

Fixing everything:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     26.633 s ±  2.283 s    [User: 24.115 s, System: 30.453 s]
  Range (min … max):   24.570 s … 30.037 s    5 runs
```

[ci skip-rust]
[ci skip-build-wheels]
Eric-Arellano added a commit that referenced this pull request Oct 27, 2021
…(Cherry-pick of #13352) (#13378)

Turns out that #13339 didn't actually work - we are redownloading the same modules several times with Go! Downloads happen when:

1. determining `GoModInfo` (once per `go.mod`)
2. `AllDownloadedModules` (once per `go.mod`)
3.  Determining metadata for each third-party package (once per third-party module)
4. Determining metadata for each first-party package (once per first-party package/directory)

This PR fixes it so that we only download modules a single time, once per `go.mod`.

To fix this, we stop treating third-party modules like first-party modules, i.e. we stop `cd`-ing into its downloaded directory and running `go list` directly in it, using its own `go.mod` and `go.sum`. That requires that the chroot has all of the module's transitive dependencies present, and it also resulted in issues like #13138. Instead, the much simpler thing to do is run `go list '...'` to do all third-party analysis in a single swoop. That gets us all the analysis we need.

We also extract the relevant `.go` files from all of the downloaded `GOPATH`, i.e. all the downloaded modules. For compilation, all we need is the `.go` files + the metadata we had earlier calculated. Compilation doesn't need access to anything else like other package's.

For first-party analysis, we copy the whole `GOPATH` into the chroot. (This is really slow! We need something like #12716 to fix this.)

## Benchmark

Running in https://github.com/toolchainlabs/remote-api-tools.

Before:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     36.467 s ±  0.603 s    [User: 41.109 s, System: 38.095 s]
  Range (min … max):   35.518 s … 37.137 s    5 runs
```

Fixing only third-party analysis:

```
❯ hyperfine -r 5 --show-output './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     29.880 s ±  0.901 s    [User: 29.564 s, System: 15.281 s]
  Range (min … max):   28.835 s … 31.312 s    5 runs
```

Fixing everything:

```
❯ hyperfine -r 5 './pants_from_sources --no-process-execution-local-cache --no-pantsd package ::'
  Time (mean ± σ):     26.633 s ±  2.283 s    [User: 24.115 s, System: 30.453 s]
  Range (min … max):   24.570 s … 30.037 s    5 runs
```

[ci skip-rust]
[ci skip-build-wheels]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants