increase time allowance on StopAndCancel test

When `StopAndCancel` is called, the Client should ~immediately cancel the context used by all active jobs. While this is indeed how the code works, the unfortunate reality of GitHub Actions free workers is that performance is highly variable. The test is structured to measure the time from right before `StopAndCancel` is called until _after_ the client has finished shutting down. This includes the time for the Client to cancel the context, then have the workers receive that cancellation and return; importantly, it also includes the time it takes for the Completer to mark any jobs as errored so that they can be retried. As we have seen in the past, the duration of this simple single-row update query can be vary from sub-ms to tens of milliseconds, even on a fast Apple Silicon machine. On GitHub Actions free workers, this whole sequence can sometimes take just over 100ms and cause the test to fail. I considered some options for restructuring this test. In particular, we could separately measure the duration between calling `StopAndCancel` and the worker returning. This too could be subject to variable runtime performance, but it would allow us to set a smaller limit for the context cancellation while allowing a longer duration for the overall process including the UPDATE. Another option is to wait for the Completer to emit its Failed event once the query finishes, which would allow us to break up the intervals into more discrete chunks. For now I felt this would add too much complexity and additional concurrency to the test, but I think if it fails again we should pursue another option like this. Fixes #65.
riverqueue · Nov 26, 2023 · 3d77ca1 · 3d77ca1
1 parent 724cefc
commit 3d77ca1
Showing 1 changed file with 2 additions and 2 deletions.
diff --git a/client_test.go b/client_test.go
@@ -538,7 +538,7 @@ func Test_Client_StopAndCancel(t *testing.T) {
 		stopCtx, stopCancel := context.WithTimeout(ctx, time.Second)
 		t.Cleanup(stopCancel)
 
-		stopStart := time.Now()
+		stopStartedAt := time.Now()
 
 		err = client.StopAndCancel(stopCtx)
 		require.NoError(t, err)
@@ -556,7 +556,7 @@ func Test_Client_StopAndCancel(t *testing.T) {
 		// TODO: client stop seems to take a widely variable amount of time,
 		// between 1ms and >50ms, due to the JobComplete query taking that long.
 		// Investigate and solve that if we can, or consider reworking this test.
-		require.WithinDuration(t, time.Now(), stopStart, 100*time.Millisecond)
+		require.WithinDuration(t, time.Now(), stopStartedAt, 200*time.Millisecond)
 	})
 }