Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new errext.AbortReason type, move and use old RunStatus type only for k6 cloud code #2810

Merged
merged 2 commits into from
Jan 24, 2023

Conversation

na--
Copy link
Member

@na-- na-- commented Dec 7, 2022

This is built on top of #2807 and is a part of #1889 and part of #2430

RunStatus is very specific to the k6 cloud and doesn't map perfectly to how k6 OSS sees the causes of prematurely stopped tests. So, this PR restricts the usage of RunStatus only to the cloudapi/ package and to the k6 cloud command handling in cmd/cloud.go.

It does that by introducing a new type, errext.AbortReason, and a way to attach this type to test run errors. This allows us a cleaner error propagation to the outputs, and every output can map these generic values to its own internal ones however it wants. It is not necessary for that mapping to be exactly 1:1 and, indeed, as you can see, the errext.AbortReason:cloudapi.RunStatus mapping in cloudapi/ is not 1:1.

Copy link
Contributor

@olegbespalov olegbespalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 left minor comment

output/cloud/output.go Outdated Show resolved Hide resolved
@na-- na-- force-pushed the fix-aborted-by-user-exit-codes branch from e0e7e45 to 0039acb Compare December 14, 2022 11:52
@na-- na-- force-pushed the refactor-run-status branch from 62eb4f9 to b456ed7 Compare December 14, 2022 12:00
@codecov-commenter
Copy link

codecov-commenter commented Dec 14, 2022

Codecov Report

Merging #2810 (6d515f5) into master (41472d1) will increase coverage by 0.01%.
The diff coverage is 88.05%.

❗ Current head 6d515f5 differs from pull request most recent head 2955dcc. Consider uploading reports for the commit 2955dcc to get more accurate results

@@            Coverage Diff             @@
##           master    #2810      +/-   ##
==========================================
+ Coverage   76.87%   76.88%   +0.01%     
==========================================
  Files         217      218       +1     
  Lines       16619    16650      +31     
==========================================
+ Hits        12776    12802      +26     
- Misses       3039     3041       +2     
- Partials      804      807       +3     
Flag Coverage Δ
ubuntu 76.76% <88.05%> (-0.04%) ⬇️
windows 76.63% <88.05%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
js/timeout_error.go 52.94% <0.00%> (-3.31%) ⬇️
lib/testutils/mockoutput/mockoutput.go 41.66% <ø> (-4.49%) ⬇️
output/manager.go 79.16% <70.00%> (+4.16%) ⬆️
errext/abort_reason.go 75.00% <75.00%> (ø)
output/cloud/output.go 85.59% <91.30%> (+0.18%) ⬆️
cloudapi/api.go 52.08% <100.00%> (ø)
cmd/cloud.go 71.42% <100.00%> (ø)
cmd/run.go 72.22% <100.00%> (+0.52%) ⬆️
core/engine.go 94.33% <100.00%> (-0.18%) ⬇️
errext/interrupt_error.go 71.42% <100.00%> (+4.76%) ⬆️
... and 6 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@na-- na-- requested a review from olegbespalov December 14, 2022 12:01
olegbespalov
olegbespalov previously approved these changes Dec 14, 2022
Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving RunStatus to cloudapi makes sense, but I have some suggestions and questions about the other changes.

Comment on lines 314 to 332
case errext.AbortedByThresholdsAfterTestEnd:
// The test finished normally, it wasn't prematurely aborted. Such
// failures are tracked by the restult_status, not the run_status
// (so called "tainted" in some places of the API here).
return cloudapi.RunStatusFinished
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's strange to have AbortedByThresholdsAfterTestEnd be an AbortReason. Like the comment says, the test wasn't prematurely aborted.

So I would remove it from AbortReason, and simply pass nil to this method, which would return RunStatusFinished. I haven't traced the calls back to see how feasible this is, but it would be more intuitive, without having to explain anything here.

This comment also raises the questions: "what's the difference between result_status and run_status?", and "what does 'tainted' mean?". Also, s/restult/result.

Copy link
Member Author

@na-- na-- Jan 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I improved the comments in be3aa5f, which hopefully clears up some things.

The TLDR versions is that we have a mismatch between the k6 OSS and cloud. k6 OSS needs to have an error here, so it can exit with a non-0 exit code, but the k6 cloud tracks that error differently and needs the normal RunStatusFinished here 😞 Though I am also not super happy with the AbortedByThresholdsAfterTestEnd name, so if anyone has any better suggestions, please comment here:

AbortedByThresholdsAfterTestEnd // TODO: rename?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but the comment was the least of my concerns here 😅 While a long explanation of this mismatch helps, I'd prefer if we could avoid it.

My main objection is that if thresholds fail after the test ends, the test wasn't aborted. So it's not so much about the naming of AbortedByThresholdsAfterTestEnd, but that it's wrong and confusing, and that it shouldn't be an AbortReason at all. The only reason we use it, is so we can easily catch it here, but we can avoid this if we detect this scenario in a different way.

Maybe use a different error altogether? I'm not sure what the best approach would be, but the current situation is confusing even with the long explanation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we can avoid this if we detect this scenario in a different way.

The problem is that this will require a very, very substantial refactoring, just for this one single minor mismatch... 😞 And we'd essentially end up needing two different mechanisms for reporting the error to the output, or something like that 🤔

But overall it is a very fair point 🤔 AbortReason is not that great of a name for other reasons, now that I think about it some more, especially if we want to use it for #2810 (comment) 🤔 What do you think about TestRunError as the type name and these as the values:

// TestRunError is used to attach extra information for the cause of a test run failure or interruption.
type TestRunError uint8

const (
	TestRunAbortedByUser TestRunError = iota + 1
	TestRunAbortedByBreachedThresholds
	TestRunThresholdsWereBreached // at the end of the test
	TestRunAbortedByScriptError
	TestRunAbortedByScriptAbort
	TestRunAbortedByTimeout
)

// IsTestRunError is a wrapper around an error with an attached TestRunError.
type IsTestRunError interface {
	error
	TestRunError() TestRunError
}

Copy link
Contributor

@imiric imiric Jan 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Yeah, TestRunError is a better name than AbortReason, but it seems like we're just prefixing TestRun to make it more generic so we can fit TestRunThresholdsWereBreached 😄

The cloud output already knows if thresholds were breached with this "tainted" check. It could arguably even avoid this check, since the Engine (or whatever we end up moving this to in this series of PRs) already knows this from engine.IsTainted(), and the output could just query it. In the meantime, could it infer the RunStatus from this?

I feel like we're abusing errors as a way to pass information to outputs, and making global changes that are only useful for the cloud output. This could be avoided if we had an API for outputs to query this and other information instead, which is related to the discussion below about StopWithTestError().

If you think this would be a much larger refactor, then let's use TestRunError for now, but we should discuss and improve this soon-ish.

Copy link
Member Author

@na-- na-- Jan 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it will be quite a big refactor, and it will be difficult to make an API that doesn't have timing issues. This querying API will only work once the test has finished and the thresholds have been crunched. But the output will only know that this has happened once its Stop method has been called. So my thinking was - why not pass that information to the Stop method directly anyway? And, thus, WithStopWithTestError was born 😅

Even my original idea from #2430, to have outputs know that they need to stop when their Samples channel is closed, is not very workable... 😞 After we finish the test and run out of samples, we first need to crunch the thresholds and only then stop the rest of the outputs... So yeah, this is complicated and I'd very much like to leave solving it for after we have gotten rid of the Engine, since that will simplify things tremendously

Comment on lines +192 to +195
err = errext.WithAbortReasonIfNone(
errext.WithExitCodeIfNone(errors.New("test run aborted by signal"), exitcodes.ExternalAbort),
errext.AbortedByUser,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wrapping is a bit convoluted. It seems we're only wrapping exit code errors in aborted errors because it's convenient to return a single error here. But conceptually, these errors aren't related in this way. If anything, maybe there should be a mapping from AbortReason to ExitCode, which can happen outside of core. After all, why should the Engine be concerned with process exit codes?

I'm not sure I have a concrete improvement suggestion here, but this seems fishy in several ways. 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point! Now that this PR adds the new AbortReason type that can accurately represent the internal reason for why a k6 test run failed, we can use it for external mappings. The PR does the mapping from AbortReason to cloudapi.RunStatus, but I don't see a reason why we couldn't also have a mapping to ExitCode. We'd probably need to add more AbortReason values, but that should be fine 🤔 In any case, I'd prefer to handle this in a separate PR, so I added this #2804 (comment)

Comment on lines +78 to +86
// WithStopWithTestError allows output to receive the error value that the test
// finished with. It could be nil, if the test finished nominally.
//
// If this interface is implemented by the output, StopWithError() will be
// called instead of Stop().
//
// TODO: refactor the main interface to use this method instead of Stop()? Or
// something else along the lines of https://github.com/grafana/k6/issues/2430 ?
type WithStopWithTestError interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the proposal in #2430 that outputs should stop when the samples channel is closed would be cleaner, but we would still need a way for them to query the test error, as suggested here.

So WDYT if we add a method to output.Params for this purpose? This would avoid this deprecation of Stop() and eventual breaking change, and only outputs that need it can call it. Eventually, we can deprecate and remove Stop() in favor of the proposal in #2430, but that would be a single breaking change, instead of the two in this case.

An alternative to changing output.Params could be for outputs to subscribe to error events, using the event system suggested for JS extensions in #2432, but I think that's still a long ways off.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a breaking change here, this is why I added an extra optional interface. All of the existing outputs and output extensions that use the old Stop() method will continue to work. They will just continue not knowing if the test finished with an error or not, which most outputs don't care about anyway.

I am softly against adding a control surface to output.Params, it is counter-intuitive. But I am open to discussion and willing to be convinced otherwise, when we decide to tackle #2430. For now, this solution allows us to delay any breaking changes until we do so, exactly so we can do them only once and not constantly break extensions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're avoiding a breaking change here by doing this optional expansion of the Output interface, which is IMO a code smell. If we decide to remove Stop() in #2430, then we'll also need to remove StopWithTestError(), hence two eventual breaking changes. I'd rather we avoided adding future work, and reconsider how outputs can query test errors. Essentially, doing this in a way that brings us closer to #2430, not farther.

Adding a method to output.Params was a suggestion, but I also agree not a very intuitive one. Maybe via some other public method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which is IMO a code smell

Why? I'd consider it a fairly idiomatic Go thing, the stdlib does it all the time 😕

If we decide to remove Stop() in #2430, then we'll also need to remove StopWithTestError(), hence two eventual breaking changes.

It is likely that when we tackle #2430, we will refactor all of the existing interfaces, and that won't be 6 different breaking changes, it will be one big change 😅 And even so, it's likely it will be somewhat graceful, since we will likely have a Go shim to the old Output APIs that will work (with a warning) for a few k6 versions.

In any case, we still don't have a clear idea of how exactly we will do #2430, and when exactly we will do it, or even if we will actually do it... So I don't want to block this PR and its follow-ups until we have that figured out. And, FWIW, figuring out how the new Outputs will look like should come after we have removed the Engine in #2815, since it changes things in this area a lot, but it also depends on this PR first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's leave this for later then.

Why? I'd consider it a fairly idiomatic Go thing, the stdlib does it all the time

I don't mean in the general sense. Composing interfaces is sometimes clean and convenient. But in this case it means that outputs don't just implement output.Output, but some outputs also implement WithStopWithTestError, which besides being awkwardly named, also requires runtime type checks wherever this additional functionality is required. And we're doing all this just to avoid a breaking change, which could be prevented by rethinking how this is done, and getting us closer to #2430.

@na-- na-- force-pushed the fix-aborted-by-user-exit-codes branch from 0039acb to 73f67fa Compare January 11, 2023 14:01
Base automatically changed from fix-aborted-by-user-exit-codes to master January 11, 2023 14:22
@na-- na-- force-pushed the refactor-run-status branch 2 times, most recently from 8c360a1 to 4077da3 Compare January 16, 2023 13:24
@na-- na-- force-pushed the refactor-run-status branch from 4077da3 to bcceeb3 Compare January 20, 2023 18:15
Copy link
Contributor

@codebien codebien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test-tip is not happy with the latest push

@olegbespalov
Copy link
Contributor

I guess the go-tip is unhappy with the master branch golang/go@62a9948 😢

Output
SetRunStatus(latestStatus lib.RunStatus)
StopWithTestError(testRunErr error) error // nil testRunErr means error-free test run
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
StopWithTestError(testRunErr error) error // nil testRunErr means error-free test run
StopWithTestError(ctx context.Context, testRunErr error) error // nil testRunErr means error-free test run

I would use this opportunity to introduce a graceful stop. I'm ok with just changing the API here and addressing the graceful stop from the output manager in a dedicated PR after the removal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, the convention so far has been to always wait for outputs to finish stopping 🤔 that is, unless a user hits Ctrl+C twice to trigger the hard stop with os.Exit(), k6 will always wait for outputs to finish... which makes some sense to me and I am not sure we should change it, since we then will need some way to configure this timeout

so, I can add a context here, but we probably won't ever use it... similar to what I said in #2810 (comment), my vote would be to add a note in #2430 and we can discuss it when we completely refactor the outputs 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, the convention so far has been to always wait for outputs to finish stopping

It doesn't sound to be aligned with the comment here: #2430 (comment)

This will allow us to set a soft limit for how long outputs are allowed to stop.

I'm ok with moving this as a discussion for 2430

@na-- na-- requested a review from codebien January 23, 2023 12:03
imiric
imiric previously approved these changes Jan 23, 2023
Copy link
Contributor

@codebien codebien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, @na-- I wait for the rebase

na-- added 2 commits January 23, 2023 17:08
The run_status constants are something very specific to the k6 cloud, they shouldn't be a part of the main packages.
RunStatus is very specific to the k6 cloud and doesn't map perfectly to how k6 OSS sees the causes of prematurely stopped tests. So, this commit restricts the usage of RunStatus only to the cloudapi/ package and to the `k6 cloud` command handling in `cmd/cloud.go`.

It does that by introducing a new type, `errext.AbortReason`, and a way to attach this type to test run errors. This allows us a cleaner error propagation to the outputs, and every output can map these generic values to its own internal ones however it wants. It is not necessary for that mapping to be exactly 1:1 and, indeed, the `errext.AbortReason`:`cloudapi.RunStatus` mapping in cloudapi/ is not 1:1.
@na--
Copy link
Member Author

na-- commented Jan 23, 2023

Rebased

@na-- na-- requested review from codebien and imiric January 23, 2023 15:12
@na-- na-- merged commit a2a5b39 into master Jan 24, 2023
@na-- na-- deleted the refactor-run-status branch January 24, 2023 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants