testing/synctest: new package for testing concurrent code

Current proposal status: https://github.com/golang/go/issues/67434#issuecomment-2565780150

----

This is a proposal for a new package to aid in testing concurrent code.

```go
// Package synctest provides support for testing concurrent code.
package synctest

// Run executes f in a new goroutine.
//
// The new goroutine and any goroutines transitively started by it form a group.
// Run waits for all goroutines in the group to exit before returning.
//
// Goroutines in the group use a synthetic time implementation.
// The initial time is midnight UTC 2000-01-01.
// Time advances when every goroutine is idle.
// If every goroutine is idle and there are no timers scheduled,
// Run panics.
func Run(f func())

// Wait blocks until every goroutine within the current group is idle.
//
// A goroutine is idle if it is blocked on a channel operation,
// mutex operation,
// time.Sleep,
// a select with no cases,
// or is the goroutine calling Wait.
//
// A goroutine blocked on an I/O operation, such as a read from a network connection,
// is not idle. Tests which operate on a net.Conn or similar type should use an
// in-memory implementation rather than a real network connection.
//
// The caller of Wait must be in a goroutine created by Run,
// or a goroutine transitively started by Run.
// If it is not, Wait panics.
func Wait()
```

This package has two main features:

1. It permits using a fake clock to test code which uses timers. The test can control the passage of time as observed by the code under test.
2. It permits a test to wait until an asynchronous operation has completed.

As an example, let us say we are testing an expiring concurrent cache:

```go
type Cache[K comparable, V any] struct{}

// NewCache creates a new cache with the given expiry.
// f is called to create new items as necessary.
func NewCache[K comparable, V any](expiry time.Duration, f func(K) V) *Cache[K, V] {}

// Get returns the cache entry for key, creating it if necessary.
func (c *Cache[K, V]) Get(key K) V {}
```
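
The proposal gives only the API of this cache. To make the discussion concrete, here is a minimal sketch of one way it might be implemented. The struct fields, the use of `time.AfterFunc`, the choice to expire an entry a fixed duration after it is created, and the `demoCache` helper are all assumptions made for illustration, not part of the proposal.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Cache is a hypothetical expiring cache. Only its API comes from the
// proposal; the fields below are assumptions.
type Cache[K comparable, V any] struct {
	mu      sync.Mutex
	expiry  time.Duration
	create  func(K) V
	entries map[K]V
}

// NewCache creates a new cache with the given expiry.
// f is called to create new items as necessary.
func NewCache[K comparable, V any](expiry time.Duration, f func(K) V) *Cache[K, V] {
	return &Cache[K, V]{expiry: expiry, create: f, entries: make(map[K]V)}
}

// Get returns the cache entry for key, creating it if necessary.
func (c *Cache[K, V]) Get(key K) V {
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.entries[key]; ok {
		return v
	}
	v := c.create(key)
	c.entries[key] = v
	// Assumption: the entry expires a fixed duration after creation.
	// The timer callback runs later, in its own goroutine.
	time.AfterFunc(c.expiry, func() {
		c.mu.Lock()
		defer c.mu.Unlock()
		delete(c.entries, key)
	})
	return v
}

// demoCache returns a cache whose entries record how many times the
// creation function has been called (hypothetical helper).
func demoCache() *Cache[string, string] {
	count := 0
	return NewCache(time.Hour, func(key string) string {
		count++
		return fmt.Sprintf("%v:%v", key, count)
	})
}

func main() {
	c := demoCache()
	fmt.Println(c.Get("k")) // creates the entry
	fmt.Println(c.Get("k")) // well before expiry, so the same entry
}
```

The expiry runs on a timer goroutine that the caller never sees, which is what makes a cache like this awkward to test against the real clock.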

A naive test for this cache might look something like this:

```go
func TestCacheEntryExpires(t *testing.T) {
	count := 0
	c := NewCache(2 * time.Second, func(key string) string {
		count++
		return fmt.Sprintf("%v:%v", key, count)
	})

	// Get an entry from the cache.
	if got, want := c.Get("k"), "k:1"; got != want {
		t.Errorf("c.Get(k) = %q, want %q", got, want)
	}

	// Verify that we get the same entry when accessing it before the expiry.
	time.Sleep(1 * time.Second)
	if got, want := c.Get("k"), "k:1"; got != want {
		t.Errorf("c.Get(k) = %q, want %q", got, want)
	}

	// Wait for the entry to expire and verify that we now get a new one.
	time.Sleep(3 * time.Second)
	if got, want := c.Get("k"), "k:2"; got != want {
		t.Errorf("c.Get(k) = %q, want %q", got, want)
	}
}
```

This test has a couple of problems. It's slow, taking four seconds to execute. And it's flaky, because it assumes the cache entry will not have expired one second before its deadline and will have expired one second after. While computers are fast, it is not uncommon for an overloaded CI system to pause execution of a program for longer than a second.

We can make the test less flaky by making it slower, or we can make the test faster at the expense of making it flakier, but we can't make it fast and reliable using this approach.

We can design our Cache type to be more testable. We can inject a fake clock to give us control over time in tests. When advancing the fake clock, we will need some mechanism to ensure that any timers that fire have executed before progressing the test. These changes come at the expense of additional code complexity: We can no longer use time.Timer, but must use a testable wrapper. Background goroutines need additional synchronization points.
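
To illustrate the cost of this approach, here is a rough sketch of what an injected fake clock could look like. The `Clock` interface, `FakeClock`, `Advance`, and `runDemo` are hypothetical names invented for this example. The sketch sidesteps the synchronization problem by running due timers on the calling goroutine, which only works when nothing else runs concurrently; code with background goroutines would need more machinery than this.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Clock is a hypothetical interface that the code under test would
// accept instead of calling the time package directly.
type Clock interface {
	Now() time.Time
	AfterFunc(d time.Duration, f func())
}

// fakeTimer is a callback scheduled for a point in fake time.
type fakeTimer struct {
	when time.Time
	f    func()
}

// FakeClock is a manually advanced Clock for tests.
// It is not safe for concurrent use; a real one would need locking.
type FakeClock struct {
	now    time.Time
	timers []fakeTimer
}

var _ Clock = (*FakeClock)(nil)

func NewFakeClock() *FakeClock {
	return &FakeClock{now: time.Date(2000, 1, 1, 0, 0, 0, 0, time.UTC)}
}

func (c *FakeClock) Now() time.Time { return c.now }

func (c *FakeClock) AfterFunc(d time.Duration, f func()) {
	c.timers = append(c.timers, fakeTimer{when: c.now.Add(d), f: f})
}

// Advance moves the clock forward by d and runs every timer that comes
// due, in time order, on the calling goroutine. When Advance returns,
// all of those callbacks have finished.
func (c *FakeClock) Advance(d time.Duration) {
	end := c.now.Add(d)
	for {
		// Re-sort each round: a callback may have scheduled new timers.
		sort.SliceStable(c.timers, func(i, j int) bool {
			return c.timers[i].when.Before(c.timers[j].when)
		})
		if len(c.timers) == 0 || c.timers[0].when.After(end) {
			break
		}
		t := c.timers[0]
		c.timers = c.timers[1:]
		c.now = t.when
		t.f()
	}
	c.now = end
}

// runDemo schedules a two-second expiry and reports whether it has
// fired after one second and after four seconds of fake time.
func runDemo() (expiredAtOne, expiredAtFour bool) {
	clock := NewFakeClock()
	expired := false
	clock.AfterFunc(2*time.Second, func() { expired = true })

	clock.Advance(1 * time.Second)
	expiredAtOne = expired
	clock.Advance(3 * time.Second)
	expiredAtFour = expired
	return expiredAtOne, expiredAtFour
}

func main() {
	atOne, atFour := runDemo()
	fmt.Println(atOne, atFour)
}
```

Every component that touches time has to accept a `Clock` like this one, which is the additional complexity described above.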

The synctest package simplifies all of this. Using synctest, we can write:

```go
func TestCacheEntryExpires(t *testing.T) {
	synctest.Run(func() {
		count := 0
		c := NewCache(2 * time.Second, func(key string) string {
			count++
			return fmt.Sprintf("%v:%v", key, count)
		})

		// Get an entry from the cache.
		if got, want := c.Get("k"), "k:1"; got != want {
			t.Errorf("c.Get(k) = %q, want %q", got, want)
		}

		// Verify that we get the same entry when accessing it before the expiry.
		time.Sleep(1 * time.Second)
		synctest.Wait()
		if got, want := c.Get("k"), "k:1"; got != want {
			t.Errorf("c.Get(k) = %q, want %q", got, want)
		}

		// Wait for the entry to expire and verify that we now get a new one.
		time.Sleep(3 * time.Second)
		synctest.Wait()
		if got, want := c.Get("k"), "k:2"; got != want {
			t.Errorf("c.Get(k) = %q, want %q", got, want)
		}
	})
}
```

This is identical to the naive test above, wrapped in synctest.Run and with the addition of two calls to synctest.Wait. However:

1. This test is not slow. The time.Sleep calls use a fake clock, and execute immediately.
2. This test is not flaky. Each synctest.Wait call ensures that all background goroutines have idled or exited before the test proceeds.
3. This test requires no additional instrumentation of the code under test. It can use standard time package timers, and it does not need to provide any mechanism for tests to synchronize with it.

A limitation of the synctest.Wait function is that it does not recognize goroutines blocked on network or other I/O operations as idle. While the scheduler can identify a goroutine blocked on I/O, it cannot distinguish between a goroutine that is genuinely blocked and one which is about to receive data from a kernel network buffer. For example, if a test creates a loopback TCP connection, starts a goroutine reading from one side of the connection, and then writes to the other, the read goroutine may remain in I/O wait for a brief time before the kernel indicates that the connection has become readable. If synctest.Wait considered a goroutine in I/O wait to be idle, this would cause nondeterminism in cases such as this.

Tests which use synctest with network connections or other external data sources should use a fake implementation with deterministic behavior. For net.Conn, net.Pipe can create a suitable in-memory connection.
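
As a small illustration of the net.Pipe suggestion, the sketch below sends a message across an in-memory connection with no kernel sockets involved. The `echoOnce` helper is a hypothetical name for this example, and it assumes a non-empty message.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// echoOnce writes msg to one end of an in-memory connection and returns
// what is read from the other end. msg is assumed to be non-empty.
func echoOnce(msg string) (string, error) {
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	// net.Pipe is unbuffered: a Write blocks until the peer reads the
	// data, so the write has to happen in another goroutine.
	errc := make(chan error, 1)
	go func() {
		_, err := client.Write([]byte(msg))
		errc <- err
	}()

	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(server, buf); err != nil {
		return "", err
	}
	if err := <-errc; err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := echoOnce("hello")
	fmt.Println(got, err)
}
```

Both ends are ordinary net.Conn values, so code written against net.Conn can use them without modification.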

This proposal is based in part on experience with tests in the golang.org/x/net/http2 package. Tests of an HTTP client or server often involve multiple interacting goroutines and timers. For example, a client request may involve goroutines writing to the server, reading from the server, and reading from the request body; as well as timers covering various stages of the request process. The combination of fake clocks and an operation which waits for all goroutines in the test to stabilize has proven effective.


@gabyhelp's overview of this issue: https://github.com/golang/go/issues/67434#issuecomment-2593973640
