-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
move testing pageserver libpq cmds to HTTP api #2429
Conversation
edee462
to
6fece44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question about the control_plane API was discussed, one more to consider: we already have a failpoints
feature:
Lines 6 to 12 in 40c845e
[features] | |
# It is simpler infra-wise to have failpoints enabled by default | |
# It shouldn't affect performance in any way because failpoints | |
# are not placed in hot code paths | |
default = ["failpoints"] | |
profiling = ["pprof"] | |
failpoints = ["fail/failpoints"] |
I think we should unite the new, testing HTTP API with that feature, so we could disable both failpoints and these endpoints in the final, relesae build later.
Since failpoints
is the default feature now, it should be simple to add a few cfg
on top of the new HTTP code?
pageserver/src/http/openapi_spec.yml
Outdated
@@ -23,6 +23,21 @@ paths: | |||
id: | |||
type: integer | |||
|
|||
/v1/failpoints: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder, if we should document this in a public spec: ideally, I would compile this code for test build only, since allowing somebody to query the http method and break the server is odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's fair. I wasn't sure, but figured that it was better to add it for now (at this stage of the PR) and remove later if we don't need it.
I wasn't sure whether the OpenAPI spec was used for validation of some kind (I've since checked, and I can't find anything using outside of the routes that return it). I'll remove the failpoints endpoint from the spec
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure whether the OpenAPI spec was used for validation of some kind
Both, actually.
For ad-hoc validation, it is used via
neon/pageserver/src/http/routes.rs
Line 678 in b8eb908
let mut router = attach_openapi_ui(endpoint::make_router(), spec, "/swagger.yml", "/v1/doc"); |
yet no tests check that it's ok.
And then (via currently implicit agreement that we keep it up-to-date), we have it https://github.com/neondatabase/cloud/blob/main/swagger/pageserver.yml and some client code generation happens around it in the project.
be8a14b
to
81d7d16
Compare
81d7d16
to
f790e88
Compare
Current status: Running the regression tests locally, AFAICT, there's exceptions expected in these two tests that aren't being triggered: |
f790e88
to
74dcee9
Compare
I like the idea of removing the functions if it's not compiled in (mostly; see footnote)1. However: I'm worried that just removing the handlers if not So perhaps replacing the handlers with implementations that just return an appropriate error? The other thing that I touched on briefly (albeit, not very clearly) in #2446 (comment) is that: (a) I think that if we're using a feature flag to turn on/off testing APIs, it should be something more general than Footnotes
|
I like that and I think we should add it for all new HTTP endpoints we've added.
That's the idea, maybe a bit implicit.
So, I think we're doing that, can add some But sure, features in general could be confusing, so not a fan of having many of them.
Absolutely, something like With #2446 merged, I have no objections on merging this as is, since the main HTTP backdoor is compiled out in the release-release builds now and the rest of the HTTP methods are protected by the token and are not very harmful. We should proceed with the feature work more, but it can be done later in a separate PR. |
Good idea; I'll do that.
Ah ok! I hadn't realized that piece of it. Makes sense.
per discussion on #2464, I'lll go with I think it's easiest to wait for #2464 to merge and rebase on top of it with the features stuff, so I'll do that. |
30decad
to
7a65bda
Compare
@SomeoneToIgnore Added the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like I've imagined it since the first review, thanks for making it happen!
bb18ab5
to
c166607
Compare
ref #2422 there's four commands of note here, one of which (`failpoint`) is to do with the pageserver as a whole, and the other three are per-timeline. Everything except `failpoint` is added to the pageserver's OpenAPI spec
Summary of changes: * Remove `failpoints` feature; use `testing` everywhere. * Change pageserver's `cfg_disabled!` to `testing_api!` * Remove other testing APIs from OpenAPI spec
c166607
to
c30c9b7
Compare
Going ahead and merging with flaky e2e tests still failing, per brief discussion w/ @hlinnaka. Tests are failing on teardown with a 404 from not finding the project - different tests each time. |
Fixes #2422.
Of the four commands moved (
do_gc
,compact
,checkpoint
andfailpoint
), the first three affect individual timelines and the last affects the entire pageserver. Because of this, I've added afailpoint
command toneon_local
in addition to the management API.One of the tests --
test_runner/regress/test_ancestor_branch.py::test_ancestor_branch
-- previously provided an unused LSN to itscompact
call. I've removed that, but maybe there's some behavior that should have been happening in the pageserver that wasn't.Also the new API methods have been added to the appropriate OpenAPI spec, but given that they exist solely for testing, I'm not certain that they should stay there.
Also also: this PR currently doesn't add any checks to return 4XX if we're not in test mode, which would be good to have.