Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/dist: include system metadata in build logs #50146

Closed
prattmic opened this issue Dec 13, 2021 · 13 comments
Closed

cmd/dist: include system metadata in build logs #50146

prattmic opened this issue Dec 13, 2021 · 13 comments
Labels
Builders x/build issues (builders, bots, dashboards) early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@prattmic
Copy link
Member

We've had two issues very recently (#49209, #46272) that only occurred on specific CPU models, and a third (#49065) that occurred only on upgrade to a new kernel version. All of these took quite a while to narrow down to the specific property.

Neither of these properties are indicated anywhere in the build logs, so manual reproduction (or new builders) were required to see if there was any correlation. I'd like to include the CPU model, OS version, and perhaps additional future metadata in build output so that we have a chance to notice such correlations in the future.

This is also something I think we'd want included in structured build output (#37486).

I have a prototype MVP implementation of this in https://golang.org/cl/371474 and https://golang.org/cl/371475. These add a "test" to cmd/dist which prints the metadata. This avoids bootstrapping issues with cmd/dist, but isn't the cleanest.

This approach may be problematic w.r.t. test sharding. This test will (I think?) run only on a single shard, but other shards may not have the same CPU and/or OS version. Even if we run on each shard, we'd need to know which tests ran on which shard when correlating failures.

Alternatives either @bcmills or I have thought of:

  • Add this as a helper in internal/testenv and manually print from TestMain in particularly important packages. This has less coverage, but we could instrument the most crashy packages (like runtime).
  • Add this to testing itself, always printing the metadata on a GOEXPERIMENT?
  • Integrate fully into cmd/dist with bootstrap toolchain support to print every time. We still need a way to connect the dots between shards. Maybe this is already possible? I'm not very familiar with our test sharding.

cc @golang/release @bcmills @aclements

@prattmic prattmic added Builders x/build issues (builders, bots, dashboards) NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Dec 13, 2021
@prattmic prattmic added this to the Backlog milestone Dec 13, 2021
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/371475 mentions this issue: cmd/dist: log OS version when testing

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/371474 mentions this issue: cmd/dist: log CPU model when testing

@prattmic
Copy link
Member Author

I took a look at the sharding code in the coordinator. Summary:

The last bullet means that it should be safe to have each dist execution print the metadata at startup. Since logs don't interleave, that metadata will be correct for all following tests until the next metadata block.

This will look a bit ugly in the output, breaking the illusion that the log looks like a single unified run, but at least it will be accurate.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/372536 mentions this issue: cmd/coodinator: add tests for parseOutputAndBanner

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/372537 mentions this issue: cmd/coordinator: parse "header" instead of "banner"

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/372538 mentions this issue: cmd/coordinator: collapse duplicate execution headers

gopherbot pushed a commit to golang/build that referenced this issue Dec 16, 2021
For golang/go#50146.

Change-Id: I44804be98159d9b1f3ba24b16c7a7fdc8fae97c5
Reviewed-on: https://go-review.googlesource.com/c/build/+/372536
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Alex Rakoczy <alex@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Jan 5, 2022
In preparation of supporting a longer test header, change
parseOutputAndBanner to parseOutputAndHeader which returns the full
human-readable header.

The only behavior change is that output with only a banner and no
trailing newline will still return the banner as part of the header.

For golang/go#50146

Change-Id: Ie34fe3891e9bc29a3da79fddacd4829a89d2dbfb
Reviewed-on: https://go-review.googlesource.com/c/build/+/372537
Reviewed-by: Alex Rakoczy <alex@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopherbot pushed a commit to golang/build that referenced this issue Jan 10, 2022
Following CL 371474, cmd/dist may include a "Test execution environment"
header which describes the test execution environment.

When sharding tests, we want to surface the proper execution environment
for each test, which may vary from shard to shard.

We could simply print the metadata for each shard in its output, as
shard results are printed in an atomic block. However, shards are pretty
small and that adds quite a bit of noise to logs.

Instead, treat the metadata block like the test header banners: as long
as the metadata doesn't change, we don't need to print it again.

On the other hand, if the metadata changes, we do print the test header
banner again. This isn't strictly necessary, it just serves to improve
readability by ensuring that tests are always immediately preceeded by
their banner rather than metadata (in the case that metadata changes in
the middle of a header block).

This CL should be submitted and deployed before CL 371474.

For golang/go#50146.

Change-Id: Ifca30f7f31237fd8cd0fcd801d198d9c341f695e
Reviewed-on: https://go-review.googlesource.com/c/build/+/372538
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Alex Rakoczy <alex@golang.org>
gopherbot pushed a commit that referenced this issue Jan 14, 2022
Knowing whether test failures are correlated with specific CPU models on
has proven useful on several issues. Log it for prior to testing so it
is always available.

internal/sysinfo provides the CPU model, but it is not available in the
bootstrap toolchain, so we can't access this unconditionally in
cmd/dist. Instead use a build-tagged file, as the final version of
cmd/dist will use the final toolchain.

The addition of new data to the beginning of cmd/dist output will break
x/build/cmd/coordinator's banner parsing, leaving extra lines in the log
output, though information will not be lost.
https://golang.org/cl/372538 fixes up the coordinator and should be
submitted and deployed before this CL is submitted.

For #46272.
For #49209.
For #50146.

Change-Id: I515d2ec58e4c0034b76bf624ecaab38f16146074
Reviewed-on: https://go-review.googlesource.com/c/go/+/371474
Trust: Benny Siegert <bsiegert@gmail.com>
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/378584 mentions this issue: cmd: go mod tidy

@prattmic prattmic reopened this Jan 14, 2022
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/378586 mentions this issue: Revert "cmd/dist: log OS version when testing"

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/378587 mentions this issue: Revert "cmd/dist: log CPU model when testing"

gopherbot pushed a commit that referenced this issue Jan 14, 2022
Fails TestAllDependencies.

This reverts CL 371475.

For #50146.

Change-Id: I1feccd772d92d80e57c6327a4ac7b8af753a8c05
Reviewed-on: https://go-review.googlesource.com/c/go/+/378586
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue Jan 14, 2022
Fails TestRepeatBootstrap.

This reverts CL 371474.

For #50146.

Change-Id: Ie4adda4e0229e153471301ca00fe2c1c694b4b2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/378587
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/378589 mentions this issue: cmd/dist: log CPU model when testing

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/378590 mentions this issue: cmd/dist: log OS version when testing

@dmitshur
Copy link
Contributor

@prattmic Have you thought about what Go release this should target? Since the re-apply CL is ready, I'll tentatively put it into 1.19 milestone with early-in-cycle label, but please feel free to update further as needed.

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Feb 16, 2022
@dmitshur dmitshur modified the milestones: Backlog, Go1.19 Feb 16, 2022
@dmitshur dmitshur added the early-in-cycle A change that should be done early in the 3 month dev cycle. label Feb 16, 2022
gopherbot pushed a commit that referenced this issue Mar 8, 2022
Knowing whether test failures are correlated with specific CPU models on
has proven useful on several issues. Log it for prior to testing so it
is always available.

internal/sysinfo provides the CPU model, but it is not available in the
bootstrap toolchain, so we can't access this in cmd/dist. Instead use a
separate binary which cmd/dist will only build once testing begins.

The addition of new data to the beginning of cmd/dist output will break
x/build/cmd/coordinator's banner parsing, leaving extra lines in the log
output, though information will not be lost.
https://golang.org/cl/372538 fixes up the coordinator and should be
submitted and deployed before this CL is submitted.

This is a redo of CL 371474. It switches back to the original approach
of using a separate binary, as the bootstap toolchain won't allow
cmd/dist to import internal packages.

For #46272.
For #49209.
For #50146.

Change-Id: I906bbda987902a2120c5183290a4e89a2440de58
Reviewed-on: https://go-review.googlesource.com/c/go/+/378589
Reviewed-by: Austin Clements <austin@google.com>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/411061 mentions this issue: internal/task: filter out cmd/internal/metadata

gopherbot pushed a commit to golang/build that referenced this issue Jun 9, 2022
We don't want to include the metadata command in the release. Exclude
it.

For golang/go#50146.

Change-Id: I0e7c8b94afe07ffd8396d7bac1c039c972a1b855
Reviewed-on: https://go-review.googlesource.com/c/build/+/411061
Reviewed-by: Alex Rakoczy <alex@golang.org>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
jproberts pushed a commit to jproberts/go that referenced this issue Jun 21, 2022
Knowing whether test failures are correlated with specific CPU models on
has proven useful on several issues. Log it for prior to testing so it
is always available.

internal/sysinfo provides the CPU model, but it is not available in the
bootstrap toolchain, so we can't access this unconditionally in
cmd/dist. Instead use a build-tagged file, as the final version of
cmd/dist will use the final toolchain.

The addition of new data to the beginning of cmd/dist output will break
x/build/cmd/coordinator's banner parsing, leaving extra lines in the log
output, though information will not be lost.
https://golang.org/cl/372538 fixes up the coordinator and should be
submitted and deployed before this CL is submitted.

For golang#46272.
For golang#49209.
For golang#50146.

Change-Id: I515d2ec58e4c0034b76bf624ecaab38f16146074
Reviewed-on: https://go-review.googlesource.com/c/go/+/371474
Trust: Benny Siegert <bsiegert@gmail.com>
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
jproberts pushed a commit to jproberts/go that referenced this issue Jun 21, 2022
As a follow-up to https://golang.org/cl/371474, add the OS version to
the metadata printed for each test.

Fixes golang#50146.

Change-Id: I3b7e47983d0e85feebce8e424881b931882d53bf
Reviewed-on: https://go-review.googlesource.com/c/go/+/371475
Reviewed-by: Bryan Mills <bcmills@google.com>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
jproberts pushed a commit to jproberts/go that referenced this issue Jun 21, 2022
Fails TestAllDependencies.

This reverts CL 371475.

For golang#50146.

Change-Id: I1feccd772d92d80e57c6327a4ac7b8af753a8c05
Reviewed-on: https://go-review.googlesource.com/c/go/+/378586
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
jproberts pushed a commit to jproberts/go that referenced this issue Jun 21, 2022
Fails TestRepeatBootstrap.

This reverts CL 371474.

For golang#50146.

Change-Id: Ie4adda4e0229e153471301ca00fe2c1c694b4b2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/378587
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Jun 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Builders x/build issues (builders, bots, dashboards) early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

3 participants