-
Notifications
You must be signed in to change notification settings - Fork 17.8k
TestFailures
If you notice a failure in a test in the Go project, what should you do?
The goal of writing (and running) tests for a Go package is to learn about the behavior of that package and its dependencies.
A test failure for a Go package may give us information about:
- implementation defects in the package or its dependencies,
- mistaken assumptions about the package API,
- bugs in the test itself (such as invalid assumptions about timing),
- unexpectedly high resource requirements (such as RAM or CPU time), or
- bugs in the underlying platform or defects in the test infrastructure (which may need to be escalated or worked around).
In some cases, the cause of a test failure might not be clear: it might be caused by more than one of the above conditions. Much like repeating a scientific experiment, allowing a test to fail multiple times can sometimes provide more information from the specific pattern of failures.
However, if a test fails without telling us anything new, then the test is not fulfilling its purpose.
A test failure is typically noticed from:
- the Go build dashboard, especially during builder triage;
- TryBot or SlowBot failures on a pending change;
- running
go test
on a specific package or packages, either working within a Go project repository or as part of (say)go test all
in a user's own module; - or running
all.bash
orall.bat
when installing from source or testing a contributed change.
Once we have noticed a failure, we need to triage it. The goal of triage is to identify:
- Is the information from the failure new?
- Who is best equipped to analyze the new information from the failure?
Start by searching the open issues for key details from the failure, such as the name of the failing test and/or other distinctive fragments of the error text (such as error codes).
If you find an existing issue, first check the issue discussion to see whether the failure has already been fixed, and whether the information from your failure contributes relevant new information. If so, comment on it with details:
- describe what Go version you were testing (
go version
) - how and where you were running the test, such as:
-
go env
output - your machine and OS configuration
- your network configuration and status
-
- whether or how often you are able to reproduce the failure.
Ideally, attach or link to the full test logs (possibly in a <details>
block).
If you don't find an existing issue, file a new one.
Paste in enough of the test log for future reporters to be able to search for
the issue, including the name of the test and any distinctive parts of its log
output. (If the test log is long — such as if it contains a large goroutine
dump — consider posting a shorter excerpt and/or enclosing the complete failure
message in a <details>
block.)
You can use the
fetchlogs
and
greplogs
tools to
search for similar failures in the build dashboard:
# download recent logs
fetchlogs -n 1024 -repo all
# search logs for some regexp describing the failure message
greplogs -l -e $FAILURE_REGEXP
If the failure appears to be specific to a package, consult https://dev.golang.org/owners to find the maintainers of that package and mention them on the issue. (If no owners are listed for the package, check the recent history of the package at https://cs.opensource.google/go and/or escalate to someone who can help to identify a relevant owner — and consider updating the owners table with what you learn!)
If the failure appears to be specific to a GOOS
or GOARCH
label the issue
with the corresponding GOOS and/or GOARCH labels, and mention the relevant
subteam(s) of
@golang/port-maintainers
on the issue.
If the failure appears to affect at least one
first class port,
add the issue to the current release milestone and label it
release-blocker
.
Otherwise, add the issue to the Backlog
milestone.
If the failure appears to be specific to a builder (such as a network
connectivity issue, or a platform bug requiring a system update), consult
x/build/dashboard/builders.go
to find the maintainer for that builder and mention them on the issue.
(For builders without an explicit maintainer listed, instead mention
the @golang/release team.)
Once an issue has been filed for a test failure, the relevant package, port, and/or builder maintainers should examine the information gleaned from the test failure and decide how to address it, by doing one or more of:
- revert a change to the code or the test infrastructure believed to have introduced the problem,
- fix (or apply a workaround for) the root cause of the failure,
- collect more information, by subscribing to issue updates and/or running more tests,
- report an underlying defect in a dependency, platform, or test infrastructure, and/or
-
deprioritize the failure, by skipping the failure on affected platforms (or
marking those platforms as broken) and then moving the issue to a future or
Backlog
milestone and/or removing therelease-blocker
label.
When a maintainer decides to deprioritize a test failure, they determine that
additional failures of the test will not provide useful new information. At
that point, the test no longer fulfills its purpose, and the maintainer should
suppress the failure — typically by adding a call to
testenv.SkipFlaky
or
t.Skipf
.
When we add a call to testenv.SkipFlaky
, our goal is to eliminate failure
modes that do not provide new information while still preserving as much of the
value of the test as is feasible.
-
If the observed failure is only one of several possible failure modes for the test, skip the test only for that failure mode.
- For example, if the error is always something specific like
syscall.ECONNRESET
, useerrors.Is
to check for that specific error.
- For example, if the error is always something specific like
-
If the failure is believed to affect all versions of a particular
GOOS
and/orGOARCH
, or the affected versions cannot be identified, check againstruntime.GOOS
and/orruntime.GOARCH
and skip only the affected platform. -
If the failure is due to a bug on a specific version of a platform, skip the test based on
testenv.Builder
or theGO_BUILDER_NAME
environment variable. (If the test fails for external Go users, they have the option to upgrade to an unaffected version of the platform — and they probably ought to see the test failure to find out that the bug exists!)- Also consider adding another environment variable that users and contributors can set to acknowledge the bug and suppress the failure.
A platform bug or bug in a core package (such as os
, net
, or runtime
) may
impact so many tests that the failures are not feasible to skip, or may manifest
as a failure at compile or link time. If such a bug occurs, the options are more
limited: if we cannot revert a change or fix or work around the root cause, and
don't need to collect more information, we can only deprioritize the failure by
marking the entire builder or platform as broken.
To mark a builder as broken, edit its configuration in
x/build/dashboard/builders.go
to add an issue in the KnownIssue
field; note that builders with known issues
will generally be skipped during dashboard triage.
A broken builder for a
first class port
should have its known issue(s) labeled release-blocker
, pending a decision
to either fix the builder or drop support for the affected version of the
platform.
If all of the builders for a secondary port are broken, the port itself may be considered broken. Discussion #53060 aims to resolve the question of how broken secondary ports should be handled.