-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new errext.AbortReason
type, move and use old RunStatus
type only for k6 cloud code
#2810
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 left minor comment
e0e7e45
to
0039acb
Compare
62eb4f9
to
b456ed7
Compare
Codecov Report
@@ Coverage Diff @@
## master #2810 +/- ##
==========================================
+ Coverage 76.87% 76.88% +0.01%
==========================================
Files 217 218 +1
Lines 16619 16650 +31
==========================================
+ Hits 12776 12802 +26
- Misses 3039 3041 +2
- Partials 804 807 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving RunStatus
to cloudapi
makes sense, but I have some suggestions and questions about the other changes.
output/cloud/output.go
Outdated
case errext.AbortedByThresholdsAfterTestEnd: | ||
// The test finished normally, it wasn't prematurely aborted. Such | ||
// failures are tracked by the restult_status, not the run_status | ||
// (so called "tainted" in some places of the API here). | ||
return cloudapi.RunStatusFinished |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's strange to have AbortedByThresholdsAfterTestEnd
be an AbortReason
. Like the comment says, the test wasn't prematurely aborted.
So I would remove it from AbortReason
, and simply pass nil
to this method, which would return RunStatusFinished
. I haven't traced the calls back to see how feasible this is, but it would be more intuitive, without having to explain anything here.
This comment also raises the questions: "what's the difference between result_status
and run_status
?", and "what does 'tainted' mean?". Also, s/restult/result
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I improved the comments in be3aa5f, which hopefully clears up some things.
The TLDR versions is that we have a mismatch between the k6 OSS and cloud. k6 OSS needs to have an error here, so it can exit with a non-0 exit code, but the k6 cloud tracks that error differently and needs the normal RunStatusFinished
here 😞 Though I am also not super happy with the AbortedByThresholdsAfterTestEnd
name, so if anyone has any better suggestions, please comment here:
Line 13 in be3aa5f
AbortedByThresholdsAfterTestEnd // TODO: rename? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, but the comment was the least of my concerns here 😅 While a long explanation of this mismatch helps, I'd prefer if we could avoid it.
My main objection is that if thresholds fail after the test ends, the test wasn't aborted. So it's not so much about the naming of AbortedByThresholdsAfterTestEnd
, but that it's wrong and confusing, and that it shouldn't be an AbortReason
at all. The only reason we use it, is so we can easily catch it here, but we can avoid this if we detect this scenario in a different way.
Maybe use a different error altogether? I'm not sure what the best approach would be, but the current situation is confusing even with the long explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but we can avoid this if we detect this scenario in a different way.
The problem is that this will require a very, very substantial refactoring, just for this one single minor mismatch... 😞 And we'd essentially end up needing two different mechanisms for reporting the error to the output, or something like that 🤔
But overall it is a very fair point 🤔 AbortReason
is not that great of a name for other reasons, now that I think about it some more, especially if we want to use it for #2810 (comment) 🤔 What do you think about TestRunError
as the type name and these as the values:
// TestRunError is used to attach extra information for the cause of a test run failure or interruption.
type TestRunError uint8
const (
TestRunAbortedByUser TestRunError = iota + 1
TestRunAbortedByBreachedThresholds
TestRunThresholdsWereBreached // at the end of the test
TestRunAbortedByScriptError
TestRunAbortedByScriptAbort
TestRunAbortedByTimeout
)
// IsTestRunError is a wrapper around an error with an attached TestRunError.
type IsTestRunError interface {
error
TestRunError() TestRunError
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Yeah, TestRunError
is a better name than AbortReason
, but it seems like we're just prefixing TestRun
to make it more generic so we can fit TestRunThresholdsWereBreached
😄
The cloud
output already knows if thresholds were breached with this "tainted" check. It could arguably even avoid this check, since the Engine
(or whatever we end up moving this to in this series of PRs) already knows this from engine.IsTainted()
, and the output could just query it. In the meantime, could it infer the RunStatus
from this?
I feel like we're abusing errors as a way to pass information to outputs, and making global changes that are only useful for the cloud
output. This could be avoided if we had an API for outputs to query this and other information instead, which is related to the discussion below about StopWithTestError()
.
If you think this would be a much larger refactor, then let's use TestRunError
for now, but we should discuss and improve this soon-ish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it will be quite a big refactor, and it will be difficult to make an API that doesn't have timing issues. This querying API will only work once the test has finished and the thresholds have been crunched. But the output will only know that this has happened once its Stop
method has been called. So my thinking was - why not pass that information to the Stop method directly anyway? And, thus, WithStopWithTestError
was born 😅
Even my original idea from #2430, to have outputs know that they need to stop when their Samples
channel is closed, is not very workable... 😞 After we finish the test and run out of samples, we first need to crunch the thresholds and only then stop the rest of the outputs... So yeah, this is complicated and I'd very much like to leave solving it for after we have gotten rid of the Engine
, since that will simplify things tremendously
err = errext.WithAbortReasonIfNone( | ||
errext.WithExitCodeIfNone(errors.New("test run aborted by signal"), exitcodes.ExternalAbort), | ||
errext.AbortedByUser, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wrapping is a bit convoluted. It seems we're only wrapping exit code errors in aborted errors because it's convenient to return a single error here. But conceptually, these errors aren't related in this way. If anything, maybe there should be a mapping from AbortReason
to ExitCode
, which can happen outside of core
. After all, why should the Engine
be concerned with process exit codes?
I'm not sure I have a concrete improvement suggestion here, but this seems fishy in several ways. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point! Now that this PR adds the new AbortReason
type that can accurately represent the internal reason for why a k6 test run failed, we can use it for external mappings. The PR does the mapping from AbortReason
to cloudapi.RunStatus
, but I don't see a reason why we couldn't also have a mapping to ExitCode
. We'd probably need to add more AbortReason
values, but that should be fine 🤔 In any case, I'd prefer to handle this in a separate PR, so I added this #2804 (comment)
// WithStopWithTestError allows output to receive the error value that the test | ||
// finished with. It could be nil, if the test finished nominally. | ||
// | ||
// If this interface is implemented by the output, StopWithError() will be | ||
// called instead of Stop(). | ||
// | ||
// TODO: refactor the main interface to use this method instead of Stop()? Or | ||
// something else along the lines of https://github.com/grafana/k6/issues/2430 ? | ||
type WithStopWithTestError interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the proposal in #2430 that outputs should stop when the samples
channel is closed would be cleaner, but we would still need a way for them to query the test error, as suggested here.
So WDYT if we add a method to output.Params
for this purpose? This would avoid this deprecation of Stop()
and eventual breaking change, and only outputs that need it can call it. Eventually, we can deprecate and remove Stop()
in favor of the proposal in #2430, but that would be a single breaking change, instead of the two in this case.
An alternative to changing output.Params
could be for outputs to subscribe to error events, using the event system suggested for JS extensions in #2432, but I think that's still a long ways off.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have a breaking change here, this is why I added an extra optional interface. All of the existing outputs and output extensions that use the old Stop()
method will continue to work. They will just continue not knowing if the test finished with an error or not, which most outputs don't care about anyway.
I am softly against adding a control surface to output.Params
, it is counter-intuitive. But I am open to discussion and willing to be convinced otherwise, when we decide to tackle #2430. For now, this solution allows us to delay any breaking changes until we do so, exactly so we can do them only once and not constantly break extensions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're avoiding a breaking change here by doing this optional expansion of the Output
interface, which is IMO a code smell. If we decide to remove Stop()
in #2430, then we'll also need to remove StopWithTestError()
, hence two eventual breaking changes. I'd rather we avoided adding future work, and reconsider how outputs can query test errors. Essentially, doing this in a way that brings us closer to #2430, not farther.
Adding a method to output.Params
was a suggestion, but I also agree not a very intuitive one. Maybe via some other public method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which is IMO a code smell
Why? I'd consider it a fairly idiomatic Go thing, the stdlib does it all the time 😕
If we decide to remove Stop() in #2430, then we'll also need to remove StopWithTestError(), hence two eventual breaking changes.
It is likely that when we tackle #2430, we will refactor all of the existing interfaces, and that won't be 6 different breaking changes, it will be one big change 😅 And even so, it's likely it will be somewhat graceful, since we will likely have a Go shim to the old Output
APIs that will work (with a warning) for a few k6 versions.
In any case, we still don't have a clear idea of how exactly we will do #2430, and when exactly we will do it, or even if we will actually do it... So I don't want to block this PR and its follow-ups until we have that figured out. And, FWIW, figuring out how the new Outputs will look like should come after we have removed the Engine
in #2815, since it changes things in this area a lot, but it also depends on this PR first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, let's leave this for later then.
Why? I'd consider it a fairly idiomatic Go thing, the stdlib does it all the time
I don't mean in the general sense. Composing interfaces is sometimes clean and convenient. But in this case it means that outputs don't just implement output.Output
, but some outputs also implement WithStopWithTestError
, which besides being awkwardly named, also requires runtime type checks wherever this additional functionality is required. And we're doing all this just to avoid a breaking change, which could be prevented by rethinking how this is done, and getting us closer to #2430.
0039acb
to
73f67fa
Compare
8c360a1
to
4077da3
Compare
4077da3
to
bcceeb3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test-tip
is not happy with the latest push
I guess the go-tip is unhappy with the |
Output | ||
SetRunStatus(latestStatus lib.RunStatus) | ||
StopWithTestError(testRunErr error) error // nil testRunErr means error-free test run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StopWithTestError(testRunErr error) error // nil testRunErr means error-free test run | |
StopWithTestError(ctx context.Context, testRunErr error) error // nil testRunErr means error-free test run |
I would use this opportunity to introduce a graceful stop. I'm ok with just changing the API here and addressing the graceful stop from the output manager in a dedicated PR after the removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, the convention so far has been to always wait for outputs to finish stopping 🤔 that is, unless a user hits Ctrl+C twice to trigger the hard stop with os.Exit()
, k6 will always wait for outputs to finish... which makes some sense to me and I am not sure we should change it, since we then will need some way to configure this timeout
so, I can add a context here, but we probably won't ever use it... similar to what I said in #2810 (comment), my vote would be to add a note in #2430 and we can discuss it when we completely refactor the outputs 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, the convention so far has been to always wait for outputs to finish stopping
It doesn't sound to be aligned with the comment here: #2430 (comment)
This will allow us to set a soft limit for how long outputs are allowed to stop.
I'm ok with moving this as a discussion for 2430
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, @na-- I wait for the rebase
The run_status constants are something very specific to the k6 cloud, they shouldn't be a part of the main packages.
RunStatus is very specific to the k6 cloud and doesn't map perfectly to how k6 OSS sees the causes of prematurely stopped tests. So, this commit restricts the usage of RunStatus only to the cloudapi/ package and to the `k6 cloud` command handling in `cmd/cloud.go`. It does that by introducing a new type, `errext.AbortReason`, and a way to attach this type to test run errors. This allows us a cleaner error propagation to the outputs, and every output can map these generic values to its own internal ones however it wants. It is not necessary for that mapping to be exactly 1:1 and, indeed, the `errext.AbortReason`:`cloudapi.RunStatus` mapping in cloudapi/ is not 1:1.
be3aa5f
to
2955dcc
Compare
Rebased |
This is built on top of #2807 and is a part of #1889 and part of #2430
RunStatus
is very specific to the k6 cloud and doesn't map perfectly to how k6 OSS sees the causes of prematurely stopped tests. So, this PR restricts the usage ofRunStatus
only to thecloudapi/
package and to thek6 cloud
command handling incmd/cloud.go
.It does that by introducing a new type,
errext.AbortReason
, and a way to attach this type to test run errors. This allows us a cleaner error propagation to the outputs, and every output can map these generic values to its own internal ones however it wants. It is not necessary for that mapping to be exactly 1:1 and, indeed, as you can see, theerrext.AbortReason
:cloudapi.RunStatus
mapping incloudapi/
is not 1:1.