Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flaky StopAndCancel test in CI #65

Closed
bgentry opened this issue Nov 23, 2023 · 1 comment · Fixed by #72
Closed

flaky StopAndCancel test in CI #65

bgentry opened this issue Nov 23, 2023 · 1 comment · Fixed by #72
Assignees
Labels
bug Something isn't working

Comments

@bgentry
Copy link
Contributor

bgentry commented Nov 23, 2023

GitHub Actions continues to surprise us with timing issues @brandur:

client_test.go:546: Waiting on job to be done
        client_test.go:559: 
            	Error Trace:	/home/runner/work/river/river/client_test.go:559
            	Error:      	Max difference between 2023-11-23 18:02:54.632264664 +0000 UTC m=+0.706377072 and 2023-11-23 18:02:54.531149022 +0000 UTC m=+0.6052614[30](https://github.com/riverqueue/river/actions/runs/6973035679/job/18976333126#step:8:31) allowed is 100ms, but difference was 101.115642ms
            	Test:       	Test_Client_StopAndCancel/jobs_in_progress,_only_completing_when_context_is_canceled
@bgentry bgentry added the bug Something isn't working label Nov 23, 2023
@brandur
Copy link
Contributor

brandur commented Nov 23, 2023

1 ms over budget. Seems like the tolerance on that might be a little too tight.

@bgentry bgentry self-assigned this Nov 26, 2023
bgentry added a commit that referenced this issue Nov 26, 2023
When `StopAndCancel` is called, the Client should ~immediately cancel
the context used by all active jobs. While this is indeed how the code
works, the unfortunate reality of GitHub Actions free workers is that
things can be highly variable.

The test is structured to measure the time from right before
`StopAndCancel` is called until _after_ the client has finished shutting
down. This includes the time for the Client to cancel the context, then
have the workers receive that cancellation and return; importantly, it
also includes the time it takes for the Completer to mark any jobs as
errored so that they can be retried. As we have seen in the past, the
duration of this simple single-row update query can be highly variable.
On GitHub Actions free workers, this whole sequence can sometimes take
just over 100ms and cause the test to fail.

I considered some options for restructuring this test. In particular, we
could separately measure the duration between calling `StopAndCancel`
and the worker returning. This too could be subject to variable runtime
performance, but it would allow us to set a smaller limit for the
context cancellation while allowing a longer duration for the overall
process including the UPDATE. Another option is to wait for the
Completer to emit its Failed event once the query finishes, which would
allow us to break up the intervals into more discrete chunks.

For now I felt this would add too much complexity and additional
concurrency to the test, but I think if it fails again we should pursue
another option like this.

Fixes #65.
bgentry added a commit that referenced this issue Nov 26, 2023
When `StopAndCancel` is called, the Client should ~immediately cancel
the context used by all active jobs. While this is indeed how the code
works, the unfortunate reality of GitHub Actions free workers is that
performance is highly variable.

The test is structured to measure the time from right before
`StopAndCancel` is called until _after_ the client has finished shutting
down. This includes the time for the Client to cancel the context, then
have the workers receive that cancellation and return; importantly, it
also includes the time it takes for the Completer to mark any jobs as
errored so that they can be retried. As we have seen in the past, the
duration of this simple single-row update query can be vary from sub-ms
to tens of milliseconds, even on a fast Apple Silicon machine. On
GitHub Actions free workers, this whole sequence can sometimes take just
over 100ms and cause the test to fail.

I considered some options for restructuring this test. In particular, we
could separately measure the duration between calling `StopAndCancel`
and the worker returning. This too could be subject to variable runtime
performance, but it would allow us to set a smaller limit for the
context cancellation while allowing a longer duration for the overall
process including the UPDATE. Another option is to wait for the
Completer to emit its Failed event once the query finishes, which would
allow us to break up the intervals into more discrete chunks.

For now I felt this would add too much complexity and additional
concurrency to the test, but I think if it fails again we should pursue
another option like this.

Fixes #65.
bgentry added a commit that referenced this issue Nov 26, 2023
When `StopAndCancel` is called, the Client should ~immediately cancel
the context used by all active jobs. While this is indeed how the code
works, the unfortunate reality of GitHub Actions free workers is that
performance is highly variable.

The test is structured to measure the time from right before
`StopAndCancel` is called until _after_ the client has finished shutting
down. This includes the time for the Client to cancel the context, then
have the workers receive that cancellation and return; importantly, it
also includes the time it takes for the Completer to mark any jobs as
errored so that they can be retried. As we have seen in the past, the
duration of this simple single-row update query can be vary from sub-ms
to tens of milliseconds, even on a fast Apple Silicon machine. On
GitHub Actions free workers, this whole sequence can sometimes take just
over 100ms and cause the test to fail.

I considered some options for restructuring this test. In particular, we
could separately measure the duration between calling `StopAndCancel`
and the worker returning. This too could be subject to variable runtime
performance, but it would allow us to set a smaller limit for the
context cancellation while allowing a longer duration for the overall
process including the UPDATE. Another option is to wait for the
Completer to emit its Failed event once the query finishes, which would
allow us to break up the intervals into more discrete chunks.

For now I felt this would add too much complexity and additional
concurrency to the test, but I think if it fails again we should pursue
another option like this.

Fixes #65.
brandur added a commit that referenced this issue Dec 9, 2023
Background: While trying to figure out how to safely release [1]
(which includes a new `riverdriver` submodule), I threw myself through a
loop and realized that it'd be practically impossible to do without a
whole bunch of manual finnagling on the release tags/lint settings in
the repo, or breaking the master build, or most likely, both.

This got me thinking: maybe we should, maybe we _should_ be using
`replace` directives to make this less painful/brittle, which were
originally removed in #35 because using `replace` makes it impossible to
use `go install ...@latest`.

A change that we made recently was the addition of a new Go migration
API in #65 (`rivermigrate`), and a side effect of that change is that
the API being used by the River CLI became entirely public (previously,
it'd depended on packages in River's `internal`). By extension, that
means it's now possible to make the River CLI its own Go module,
something that was infeasible before because of the use of `internal`.

So my thinking currently: maybe we should go back to trying to see if we
can make `replace` in use by most of River's core packages, and keep the
River CLI as its own module without `replace` so that it can still be
installed with `go install ...@latest`. I'm not entirely sure we won't
run into another problem, but it might be the easiest thing in the
meantime. As the River CLI expands, we'll need to make sure to only use
public APIs, but that might not be the worst thing anyway -- we could
adopt the philosophy that any function the CLI accesses should also be
accessible by the Go API.

Here, notably we still use a `replace`, which I'm finding that I need to
have a passing build for now, and which I think will have to temporarily
stay until we cut a new release. Trying to build the new submodule
without results in this error that I was unable to otherwise find a way
around:

    $ go test .
    ambiguous import: found package github.com/riverqueue/river/cmd/river in multiple modules:
            github.com/riverqueue/river v0.0.12 (/Users/brandur/Documents/go/pkg/mod/github.com/riverqueue/river@v0.0.12/cmd/river)
            github.com/riverqueue/river/cmd/river (/Users/brandur/Documents/projects/river/cmd/river)

[1] #98
brandur added a commit that referenced this issue Dec 9, 2023
Background: While trying to figure out how to safely release [1]
(which includes a new `riverdriver` submodule), I threw myself through a
loop and realized that it'd be practically impossible to do without a
whole bunch of manual finnagling on the release tags/lint settings in
the repo, or breaking the master build, or most likely, both.

This got me thinking: maybe we should, maybe we _should_ be using
`replace` directives to make this less painful/brittle, which were
originally removed in #35 because using `replace` makes it impossible to
use `go install ...@latest`.

A change that we made recently was the addition of a new Go migration
API in #65 (`rivermigrate`), and a side effect of that change is that
the API being used by the River CLI became entirely public (previously,
it'd depended on packages in River's `internal`). By extension, that
means it's now possible to make the River CLI its own Go module,
something that was infeasible before because of the use of `internal`.

So my thinking currently: maybe we should go back to trying to see if we
can make `replace` in use by most of River's core packages, and keep the
River CLI as its own module without `replace` so that it can still be
installed with `go install ...@latest`. I'm not entirely sure we won't
run into another problem, but it might be the easiest thing in the
meantime. As the River CLI expands, we'll need to make sure to only use
public APIs, but that might not be the worst thing anyway -- we could
adopt the philosophy that any function the CLI accesses should also be
accessible by the Go API.

Here, notably we still use a `replace`, which I'm finding that I need to
have a passing build for now, and which I think will have to temporarily
stay until we cut a new release. Trying to build the new submodule
without results in this error that I was unable to otherwise find a way
around:

    $ go test .
    ambiguous import: found package github.com/riverqueue/river/cmd/river in multiple modules:
            github.com/riverqueue/river v0.0.12 (/Users/brandur/Documents/go/pkg/mod/github.com/riverqueue/river@v0.0.12/cmd/river)
            github.com/riverqueue/river/cmd/river (/Users/brandur/Documents/projects/river/cmd/river)

[1] #98
brandur added a commit that referenced this issue Dec 9, 2023
Background: While trying to figure out how to safely release [1]
(which includes a new `riverdriver` submodule), I threw myself through a
loop and realized that it'd be practically impossible to do without a
whole bunch of manual finnagling on the release tags/lint settings in
the repo, or breaking the master build, or most likely, both.

This got me thinking: maybe we should, maybe we _should_ be using
`replace` directives to make this less painful/brittle, which were
originally removed in #35 because using `replace` makes it impossible to
use `go install ...@latest`.

A change that we made recently was the addition of a new Go migration
API in #65 (`rivermigrate`), and a side effect of that change is that
the API being used by the River CLI became entirely public (previously,
it'd depended on packages in River's `internal`). By extension, that
means it's now possible to make the River CLI its own Go module,
something that was infeasible before because of the use of `internal`.

So my thinking currently: maybe we should go back to trying to see if we
can make `replace` in use by most of River's core packages, and keep the
River CLI as its own module without `replace` so that it can still be
installed with `go install ...@latest`. I'm not entirely sure we won't
run into another problem, but it might be the easiest thing in the
meantime. As the River CLI expands, we'll need to make sure to only use
public APIs, but that might not be the worst thing anyway -- we could
adopt the philosophy that any function the CLI accesses should also be
accessible by the Go API.

Here, notably we still use a `replace`, which I'm finding that I need to
have a passing build for now, and which I think will have to temporarily
stay until we cut a new release. Trying to build the new submodule
without results in this error that I was unable to otherwise find a way
around:

    $ go test .
    ambiguous import: found package github.com/riverqueue/river/cmd/river in multiple modules:
            github.com/riverqueue/river v0.0.12 (/Users/brandur/Documents/go/pkg/mod/github.com/riverqueue/river@v0.0.12/cmd/river)
            github.com/riverqueue/river/cmd/river (/Users/brandur/Documents/projects/river/cmd/river)

[1] #98
brandur added a commit that referenced this issue Dec 9, 2023
Background: While trying to figure out how to safely release [1]
(which includes a new `riverdriver` submodule), I threw myself through a
loop and realized that it'd be practically impossible to do without a
whole bunch of manual finnagling on the release tags/lint settings in
the repo, or breaking the master build, or most likely, both.

This got me thinking: maybe we should, maybe we _should_ be using
`replace` directives to make this less painful/brittle, which were
originally removed in #35 because using `replace` makes it impossible to
use `go install ...@latest`.

A change that we made recently was the addition of a new Go migration
API in #65 (`rivermigrate`), and a side effect of that change is that
the API being used by the River CLI became entirely public (previously,
it'd depended on packages in River's `internal`). By extension, that
means it's now possible to make the River CLI its own Go module,
something that was infeasible before because of the use of `internal`.

So my thinking currently: maybe we should go back to trying to see if we
can make `replace` in use by most of River's core packages, and keep the
River CLI as its own module without `replace` so that it can still be
installed with `go install ...@latest`. I'm not entirely sure we won't
run into another problem, but it might be the easiest thing in the
meantime. As the River CLI expands, we'll need to make sure to only use
public APIs, but that might not be the worst thing anyway -- we could
adopt the philosophy that any function the CLI accesses should also be
accessible by the Go API.

Here, notably we still use a `replace`, which I'm finding that I need to
have a passing build for now, and which I think will have to temporarily
stay until we cut a new release. Trying to build the new submodule
without results in this error that I was unable to otherwise find a way
around:

    $ go test .
    ambiguous import: found package github.com/riverqueue/river/cmd/river in multiple modules:
            github.com/riverqueue/river v0.0.12 (/Users/brandur/Documents/go/pkg/mod/github.com/riverqueue/river@v0.0.12/cmd/river)
            github.com/riverqueue/river/cmd/river (/Users/brandur/Documents/projects/river/cmd/river)

[1] #98
brandur added a commit that referenced this issue Dec 13, 2023
Background: While trying to figure out how to safely release [1]
(which includes a new `riverdriver` submodule), I threw myself through a
loop and realized that it'd be practically impossible to do without a
whole bunch of manual finnagling on the release tags/lint settings in
the repo, or breaking the master build, or most likely, both.

This got me thinking: maybe we should, maybe we _should_ be using
`replace` directives to make this less painful/brittle, which were
originally removed in #35 because using `replace` makes it impossible to
use `go install ...@latest`.

A change that we made recently was the addition of a new Go migration
API in #65 (`rivermigrate`), and a side effect of that change is that
the API being used by the River CLI became entirely public (previously,
it'd depended on packages in River's `internal`). By extension, that
means it's now possible to make the River CLI its own Go module,
something that was infeasible before because of the use of `internal`.

So my thinking currently: maybe we should go back to trying to see if we
can make `replace` in use by most of River's core packages, and keep the
River CLI as its own module without `replace` so that it can still be
installed with `go install ...@latest`. I'm not entirely sure we won't
run into another problem, but it might be the easiest thing in the
meantime. As the River CLI expands, we'll need to make sure to only use
public APIs, but that might not be the worst thing anyway -- we could
adopt the philosophy that any function the CLI accesses should also be
accessible by the Go API.

Here, notably we still use a `replace`, which I'm finding that I need to
have a passing build for now, and which I think will have to temporarily
stay until we cut a new release. Trying to build the new submodule
without results in this error that I was unable to otherwise find a way
around:

    $ go test .
    ambiguous import: found package github.com/riverqueue/river/cmd/river in multiple modules:
            github.com/riverqueue/river v0.0.12 (/Users/brandur/Documents/go/pkg/mod/github.com/riverqueue/river@v0.0.12/cmd/river)
            github.com/riverqueue/river/cmd/river (/Users/brandur/Documents/projects/river/cmd/river)

[1] #98
brandur added a commit that referenced this issue Dec 13, 2023
Background: While trying to figure out how to safely release [1]
(which includes a new `riverdriver` submodule), I threw myself through a
loop and realized that it'd be practically impossible to do without a
whole bunch of manual finnagling on the release tags/lint settings in
the repo, or breaking the master build, or most likely, both.

This got me thinking: maybe we should, maybe we _should_ be using
`replace` directives to make this less painful/brittle, which were
originally removed in #35 because using `replace` makes it impossible to
use `go install ...@latest`.

A change that we made recently was the addition of a new Go migration
API in #65 (`rivermigrate`), and a side effect of that change is that
the API being used by the River CLI became entirely public (previously,
it'd depended on packages in River's `internal`). By extension, that
means it's now possible to make the River CLI its own Go module,
something that was infeasible before because of the use of `internal`.

So my thinking currently: maybe we should go back to trying to see if we
can make `replace` in use by most of River's core packages, and keep the
River CLI as its own module without `replace` so that it can still be
installed with `go install ...@latest`. I'm not entirely sure we won't
run into another problem, but it might be the easiest thing in the
meantime. As the River CLI expands, we'll need to make sure to only use
public APIs, but that might not be the worst thing anyway -- we could
adopt the philosophy that any function the CLI accesses should also be
accessible by the Go API.

Here, notably we still use a `replace`, which I'm finding that I need to
have a passing build for now, and which I think will have to temporarily
stay until we cut a new release. Trying to build the new submodule
without results in this error that I was unable to otherwise find a way
around:

    $ go test .
    ambiguous import: found package github.com/riverqueue/river/cmd/river in multiple modules:
            github.com/riverqueue/river v0.0.12 (/Users/brandur/Documents/go/pkg/mod/github.com/riverqueue/river@v0.0.12/cmd/river)
            github.com/riverqueue/river/cmd/river (/Users/brandur/Documents/projects/river/cmd/river)

[1] #98
brandur added a commit that referenced this issue Dec 13, 2023
Background: While trying to figure out how to safely release [1]
(which includes a new `riverdriver` submodule), I threw myself through a
loop and realized that it'd be practically impossible to do without a
whole bunch of manual finnagling on the release tags/lint settings in
the repo, or breaking the master build, or most likely, both.

This got me thinking: maybe we should, maybe we _should_ be using
`replace` directives to make this less painful/brittle, which were
originally removed in #35 because using `replace` makes it impossible to
use `go install ...@latest`.

A change that we made recently was the addition of a new Go migration
API in #65 (`rivermigrate`), and a side effect of that change is that
the API being used by the River CLI became entirely public (previously,
it'd depended on packages in River's `internal`). By extension, that
means it's now possible to make the River CLI its own Go module,
something that was infeasible before because of the use of `internal`.

So my thinking currently: maybe we should go back to trying to see if we
can make `replace` in use by most of River's core packages, and keep the
River CLI as its own module without `replace` so that it can still be
installed with `go install ...@latest`. I'm not entirely sure we won't
run into another problem, but it might be the easiest thing in the
meantime. As the River CLI expands, we'll need to make sure to only use
public APIs, but that might not be the worst thing anyway -- we could
adopt the philosophy that any function the CLI accesses should also be
accessible by the Go API.

Here, notably we still use a `replace`, which I'm finding that I need to
have a passing build for now, and which I think will have to temporarily
stay until we cut a new release. Trying to build the new submodule
without results in this error that I was unable to otherwise find a way
around:

    $ go test .
    ambiguous import: found package github.com/riverqueue/river/cmd/river in multiple modules:
            github.com/riverqueue/river v0.0.12 (/Users/brandur/Documents/go/pkg/mod/github.com/riverqueue/river@v0.0.12/cmd/river)
            github.com/riverqueue/river/cmd/river (/Users/brandur/Documents/projects/river/cmd/river)

[1] #98
brandur added a commit that referenced this issue Dec 13, 2023
Background: While trying to figure out how to safely release [1]
(which includes a new `riverdriver` submodule), I threw myself through a
loop and realized that it'd be practically impossible to do without a
whole bunch of manual finnagling on the release tags/lint settings in
the repo, or breaking the master build, or most likely, both.

This got me thinking: maybe we should, maybe we _should_ be using
`replace` directives to make this less painful/brittle, which were
originally removed in #35 because using `replace` makes it impossible to
use `go install ...@latest`.

A change that we made recently was the addition of a new Go migration
API in #65 (`rivermigrate`), and a side effect of that change is that
the API being used by the River CLI became entirely public (previously,
it'd depended on packages in River's `internal`). By extension, that
means it's now possible to make the River CLI its own Go module,
something that was infeasible before because of the use of `internal`.

So my thinking currently: maybe we should go back to trying to see if we
can make `replace` in use by most of River's core packages, and keep the
River CLI as its own module without `replace` so that it can still be
installed with `go install ...@latest`. I'm not entirely sure we won't
run into another problem, but it might be the easiest thing in the
meantime. As the River CLI expands, we'll need to make sure to only use
public APIs, but that might not be the worst thing anyway -- we could
adopt the philosophy that any function the CLI accesses should also be
accessible by the Go API.

Here, notably we still use a `replace`, which I'm finding that I need to
have a passing build for now, and which I think will have to temporarily
stay until we cut a new release. Trying to build the new submodule
without results in this error that I was unable to otherwise find a way
around:

    $ go test .
    ambiguous import: found package github.com/riverqueue/river/cmd/river in multiple modules:
            github.com/riverqueue/river v0.0.12 (/Users/brandur/Documents/go/pkg/mod/github.com/riverqueue/river@v0.0.12/cmd/river)
            github.com/riverqueue/river/cmd/river (/Users/brandur/Documents/projects/river/cmd/river)

[1] #98
brandur added a commit that referenced this issue Dec 13, 2023
Background: While trying to figure out how to safely release [1]
(which includes a new `riverdriver` submodule), I threw myself through a
loop and realized that it'd be practically impossible to do without a
whole bunch of manual finnagling on the release tags/lint settings in
the repo, or breaking the master build, or most likely, both.

This got me thinking: maybe we should, maybe we _should_ be using
`replace` directives to make this less painful/brittle, which were
originally removed in #35 because using `replace` makes it impossible to
use `go install ...@latest`.

A change that we made recently was the addition of a new Go migration
API in #65 (`rivermigrate`), and a side effect of that change is that
the API being used by the River CLI became entirely public (previously,
it'd depended on packages in River's `internal`). By extension, that
means it's now possible to make the River CLI its own Go module,
something that was infeasible before because of the use of `internal`.

So my thinking currently: maybe we should go back to trying to see if we
can make `replace` in use by most of River's core packages, and keep the
River CLI as its own module without `replace` so that it can still be
installed with `go install ...@latest`. I'm not entirely sure we won't
run into another problem, but it might be the easiest thing in the
meantime. As the River CLI expands, we'll need to make sure to only use
public APIs, but that might not be the worst thing anyway -- we could
adopt the philosophy that any function the CLI accesses should also be
accessible by the Go API.

Here, notably we still use a `replace`, which I'm finding that I need to
have a passing build for now, and which I think will have to temporarily
stay until we cut a new release. Trying to build the new submodule
without results in this error that I was unable to otherwise find a way
around:

    $ go test .
    ambiguous import: found package github.com/riverqueue/river/cmd/river in multiple modules:
            github.com/riverqueue/river v0.0.12 (/Users/brandur/Documents/go/pkg/mod/github.com/riverqueue/river@v0.0.12/cmd/river)
            github.com/riverqueue/river/cmd/river (/Users/brandur/Documents/projects/river/cmd/river)

[1] #98
brandur added a commit that referenced this issue Dec 13, 2023
Background: While trying to figure out how to safely release [1]
(which includes a new `riverdriver` submodule), I threw myself through a
loop and realized that it'd be practically impossible to do without a
whole bunch of manual finnagling on the release tags/lint settings in
the repo, or breaking the master build, or most likely, both.

This got me thinking: maybe we should, maybe we _should_ be using
`replace` directives to make this less painful/brittle, which were
originally removed in #35 because using `replace` makes it impossible to
use `go install ...@latest`.

A change that we made recently was the addition of a new Go migration
API in #65 (`rivermigrate`), and a side effect of that change is that
the API being used by the River CLI became entirely public (previously,
it'd depended on packages in River's `internal`). By extension, that
means it's now possible to make the River CLI its own Go module,
something that was infeasible before because of the use of `internal`.

So my thinking currently: maybe we should go back to trying to see if we
can make `replace` in use by most of River's core packages, and keep the
River CLI as its own module without `replace` so that it can still be
installed with `go install ...@latest`. I'm not entirely sure we won't
run into another problem, but it might be the easiest thing in the
meantime. As the River CLI expands, we'll need to make sure to only use
public APIs, but that might not be the worst thing anyway -- we could
adopt the philosophy that any function the CLI accesses should also be
accessible by the Go API.

Here, notably we still use a `replace`, which I'm finding that I need to
have a passing build for now, and which I think will have to temporarily
stay until we cut a new release. Trying to build the new submodule
without results in this error that I was unable to otherwise find a way
around:

    $ go test .
    ambiguous import: found package github.com/riverqueue/river/cmd/river in multiple modules:
            github.com/riverqueue/river v0.0.12 (/Users/brandur/Documents/go/pkg/mod/github.com/riverqueue/river@v0.0.12/cmd/river)
            github.com/riverqueue/river/cmd/river (/Users/brandur/Documents/projects/river/cmd/river)

[1] #98
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants