admin: remove `throw` on unknown self test type in admin_server #21370
So the issue is that in a mixed-version cluster, the rpk command targets the older version admin server which doesn't support the requested command, so an error is returned?
It kinda seems like we should return an error in this case (at the very least, we could run the other tests and still return an error).
Another option might be to create a better behavior at the rpk layer.
Is this problem common? For example, if mixed version state is rare, we could request rpk to direct the request to a specific broker?
Regardless of which admin server in a cluster the request to run a So, if any nodes in a mixed cluster are of a version before
Do you think that running the tests that are valid (known) to a node, while ignoring the unknown test types and logging a statement about them, is valid behavior?
What behavior would you want to see? Could
Likely not common, but something that we discovered while preparing docs for |
I'm not sure what behavior makes the most sense. But silently dropping what was requested at the API level doesn't strike me as the most desirable. Is this issue being discussed somewhere, like a ticket capturing the issue?
No, it was brought to my attention in a Slack thread. |
skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51442#0190a8a9-1c5e-4108-859b-d2f781f4ee79: skipped ducktape retry in https://buildkite.com/redpanda/redpanda/builds/51442#0190a8aa-202c-4001-a234-988e48345821: |
Summary here from offline meeting:
f01f540 to e93a874
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51543#0190b846-d481-43bf-92cc-4a0849b07d5c ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/51543#0190b848-a629-41cf-96c9-f52e9e4ece43 |
CI failures are unrelated (and closed) |
Are there compatibility concerns with, say, older rpk talking to newer redpanda, if we are now returning a different style of error message that reports tests that weren't able to run for some reason?
Nope, we haven't changed anything about the |
new failures in https://buildkite.com/redpanda/redpanda/builds/51543#01910076-b628-4903-a98a-b30894ffab0a:
new failures in https://buildkite.com/redpanda/redpanda/builds/51543#01910108-3264-4ff3-83b0-8af934322a6b:
e93a874 to cc4c589
Force push to bump @andrwng, I have a solution for case 2 that I will include in a follow up PR (I won't want to backport it to previous versions beyond |
For logging purposes, when an unrecognized test type is requested through the `self_test_start_handler`, it will be abstracted by the `unknown_check` struct.
This commit removes the `throw` statement in `admin_server::self_test_start_handler()` and replaces it with a `push_back()` to `self_test_request::unknown_checks`. Eventually, these unknown checks will have a result displayed when `rpk cluster self-test status` is invoked.
For the self-test, any unrecognized tests will be appended to `start_test_request::unknown_checks`, so a future result from `rpk cluster self-test status` will return a message indicating an unknown test was skipped.
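The change described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the actual handler: `build_request`, the `known_checks` field, and the `supported` set are hypothetical stand-ins, and only the idea of pushing unrecognized types into `unknown_checks` instead of throwing comes from the commit message.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the real request types.
struct unknown_check {
    std::string test_type;
};

struct start_test_request {
    std::vector<std::string> known_checks;     // assumption: real code keeps per-test option structs
    std::vector<unknown_check> unknown_checks; // unrecognized types, reported later in status
};

// Hypothetical helper: `supported` models the test types this node's version understands.
start_test_request build_request(
  const std::vector<std::string>& requested,
  const std::set<std::string>& supported) {
    start_test_request req;
    for (const auto& type : requested) {
        if (supported.count(type) > 0) {
            req.known_checks.push_back(type);
        } else {
            // Before the change, this branch would throw and fail the whole request.
            req.unknown_checks.push_back(unknown_check{type});
        }
    }
    // If nothing runnable was requested, an error is still expected.
    if (req.known_checks.empty()) {
        throw std::invalid_argument("no known self-test types requested");
    }
    return req;
}
```

With `supported = {"disk", "network"}` (modelling an older node), a request for `disk` and `cloud` would keep `disk` and record `cloud` as unknown, while a request for only `pandatest` would still throw.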
Adds a test for the previous change, in which unknown test types are added to `start_test_request::unknown_checks` instead of resulting in a `throw`.
`test_type` was not being set in early exit cases for `cloudcheck`. This would result in longer than expected output (with `IOPS`, `THROUGHPUT`, `LATENCY` in `rpk cluster self-test status`).
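A rough sketch of why the missing field matters, under stated assumptions: the trimmed-down `self_test_result`, the `warning` field, and the `prints_perf_metrics` helper are hypothetical, and the formatter rule (performance metrics are printed unless a result is tagged `"cloud"`) is an assumption made for illustration only. It uses C++20 designated initializers, as the diff in this PR does.

```cpp
#include <cassert>
#include <string>

// Hypothetical, trimmed-down result type; the real struct carries more fields.
struct self_test_result {
    std::string name;
    std::string test_type;
    std::string warning;
};

// Early-exit path when cloud storage is disabled. Tagging the result with
// test_type lets downstream formatting recognize it as a cloud check.
self_test_result cloud_storage_disabled_result(const std::string& name) {
    return self_test_result{
      .name = name,
      .test_type = "cloud",
      .warning = "Cloud storage is not enabled.",
    };
}

// Assumed formatter rule: IOPS/THROUGHPUT/LATENCY are printed for anything
// that is not explicitly a cloud result, including results with no type set.
bool prints_perf_metrics(const self_test_result& r) {
    return r.test_type != "cloud";
}
```

Under that assumption, a result built without `test_type` would fall through to the performance-metrics layout, which matches the longer than expected output described above.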
cc4c589 to 147d5e9
Last force push to include some work that will require backporting. Future PR will build on this one. |
LGTM
#Attempt to run with an unknown test type "pandatest"
#and possibly unknown "cloud" test.
nit: would be good to be consistent about spacing, even in comments (#comment)
Will fix in follow up.
@@ -64,6 +65,7 @@ cloudcheck::run(cloudcheck_opts opts) {
"Cloud storage is not enabled, exiting cloud storage self-test.");
auto result = self_test_result{
.name = _opts.name,
.test_type = "cloud",
nit: I don't feel strongly that it needs to be in this PR, but it'd have been nice to see a test that ran the cloud check via RPK and parsed the output to assert that we don't print "IOPS" etc
Will add in follow up PR.
Also just noting that we chatted offline, and Willem has some follow-up changes to have the self test controller pass through unknown tests to workers, in case of mixed versions
/backport v24.2.x
/backport v24.1.x
Failed to create a backport PR to v24.1.x branch. I tried:
|
To help with compatibility for clusters with mixed versions of `redpanda`, the `throw` statement in `admin_server::self_test_start_handler` has been removed.

Previously, in a cluster with mixed versions including `24.1.x` and `24.2.x`, this `throw` would prevent the `self-test` from running, due to the addition of `cloudcheck` in `24.2.x`, resulting in a `Bad Request, 400` code.

We now `push_back()` unknown test types to `self_test_request::unknown_checks`, and include the unknown test in `rpk cluster self-test status` with a generic error message:

If only an unknown test is specified, the server will (expectedly) `throw` due to no tests being run.

Backports Required
Release Notes
Improvements
Allow `rpk cluster self-test start` to run, even in a cluster with mixed versions of `redpanda` (before and after `cloudcheck` addition in `24.2.x`).