Skip to content

Commit

Permalink
increase time allowance on StopAndCancel test
Browse files Browse the repository at this point in the history
When `StopAndCancel` is called, the Client should ~immediately cancel
the context used by all active jobs. While this is indeed how the code
works, the unfortunate reality of GitHub Actions free workers is that
performance is highly variable.

The test is structured to measure the time from right before
`StopAndCancel` is called until _after_ the client has finished shutting
down. This includes the time for the Client to cancel the context, then
have the workers receive that cancellation and return; importantly, it
also includes the time it takes for the Completer to mark any jobs as
errored so that they can be retried. As we have seen in the past, the
duration of this simple single-row update query can be vary from sub-ms
to tens of milliseconds, even on a fast Apple Silicon machine. On
GitHub Actions free workers, this whole sequence can sometimes take just
over 100ms and cause the test to fail.

I considered some options for restructuring this test. In particular, we
could separately measure the duration between calling `StopAndCancel`
and the worker returning. This too could be subject to variable runtime
performance, but it would allow us to set a smaller limit for the
context cancellation while allowing a longer duration for the overall
process including the UPDATE. Another option is to wait for the
Completer to emit its Failed event once the query finishes, which would
allow us to break up the intervals into more discrete chunks.

For now I felt this would add too much complexity and additional
concurrency to the test, but I think if it fails again we should pursue
another option like this.

Fixes #65.
  • Loading branch information
bgentry committed Nov 26, 2023
1 parent 724cefc commit 3d77ca1
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions client_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -538,7 +538,7 @@ func Test_Client_StopAndCancel(t *testing.T) {
stopCtx, stopCancel := context.WithTimeout(ctx, time.Second)
t.Cleanup(stopCancel)

stopStart := time.Now()
stopStartedAt := time.Now()

err = client.StopAndCancel(stopCtx)
require.NoError(t, err)
Expand All @@ -556,7 +556,7 @@ func Test_Client_StopAndCancel(t *testing.T) {
// TODO: client stop seems to take a widely variable amount of time,
// between 1ms and >50ms, due to the JobComplete query taking that long.
// Investigate and solve that if we can, or consider reworking this test.
require.WithinDuration(t, time.Now(), stopStart, 100*time.Millisecond)
require.WithinDuration(t, time.Now(), stopStartedAt, 200*time.Millisecond)
})
}

Expand Down

0 comments on commit 3d77ca1

Please sign in to comment.