move testing pageserver libpq cmds to HTTP api #2429

Merged
merged 6 commits into from
Sep 20, 2022

Conversation

sharnoff
Member

Fixes #2422.

Of the four commands moved (do_gc, compact, checkpoint and failpoint), the first three affect individual timelines and the last affects the entire pageserver. Because of this, I've added a failpoint command to neon_local in addition to the management API.

One of the tests -- test_runner/regress/test_ancestor_branch.py::test_ancestor_branch -- previously provided an unused LSN to its compact call. I've removed that, but maybe there's some behavior that should have been happening in the pageserver that wasn't.

Also the new API methods have been added to the appropriate OpenAPI spec, but given that they exist solely for testing, I'm not certain that they should stay there.

Also also: this PR currently doesn't add any checks to return 4XX if we're not in test mode, which would be good to have.

@sharnoff sharnoff force-pushed the pageserver-move-commands-libpq-to-http branch from edee462 to 6fece44 Compare September 12, 2022 23:32
Contributor

@SomeoneToIgnore SomeoneToIgnore left a comment

The question about the control_plane API was discussed; one more thing to consider: we already have a failpoints feature:

[features]
# It is simpler infra-wise to have failpoints enabled by default
# It shouldn't affect performance in any way because failpoints
# are not placed in hot code paths
default = ["failpoints"]
profiling = ["pprof"]
failpoints = ["fail/failpoints"]

I think we should unite the new, testing HTTP API with that feature, so we could disable both failpoints and these endpoints in the final, release build later.
Since failpoints is the default feature now, it should be simple to add a few cfg on top of the new HTTP code?
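
For illustration, a minimal sketch of what "a few cfg on top of the new HTTP code" might look like. The `routes` helper is hypothetical and is not the pageserver's router; `/v1/failpoints` is the path from the spec diff in this PR, and `/v1/status` is used here only as a stand-in for an always-on route.

```rust
/// Returns the HTTP paths this build would serve.
/// `routes` is a hypothetical helper, not the pageserver's actual router.
fn routes() -> Vec<&'static str> {
    // `mut` is unused when the feature is off, so silence that warning.
    #[allow(unused_mut)]
    let mut paths = vec!["/v1/status"];

    // Compiled in only when the crate is built with the `failpoints`
    // feature (which the Cargo.toml above enables by default).
    #[cfg(feature = "failpoints")]
    {
        paths.push("/v1/failpoints");
    }

    paths
}

fn main() {
    for path in routes() {
        println!("{}", path);
    }
}
```

Built without the feature, `/v1/failpoints` is simply never registered, so a request to it gets whatever the router returns for unknown paths.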

@@ -23,6 +23,21 @@ paths:
id:
type: integer

/v1/failpoints:
Contributor

I wonder if we should document this in a public spec: ideally, I would compile this code for test builds only, since allowing somebody to query the HTTP method and break the server is odd.

Member Author

@sharnoff sharnoff Sep 14, 2022

Yeah, that's fair. I wasn't sure, but figured that it was better to add it for now (at this stage of the PR) and remove later if we don't need it.

I wasn't sure whether the OpenAPI spec was used for validation of some kind (I've since checked, and I can't find anything using it outside of the routes that return it). I'll remove the failpoints endpoint from the spec.

Contributor

I wasn't sure whether the OpenAPI spec was used for validation of some kind

Both, actually.
For ad-hoc validation, it is used via

let mut router = attach_openapi_ui(endpoint::make_router(), spec, "/swagger.yml", "/v1/doc");

yet no tests check that it's ok.

And then (via a currently implicit agreement that we keep it up-to-date), we have it at https://github.com/neondatabase/cloud/blob/main/swagger/pageserver.yml and some client code generation happens around it in the project.

@sharnoff sharnoff force-pushed the pageserver-move-commands-libpq-to-http branch 2 times, most recently from be8a14b to 81d7d16 Compare September 14, 2022 00:28
@sharnoff sharnoff force-pushed the pageserver-move-commands-libpq-to-http branch from 81d7d16 to f790e88 Compare September 14, 2022 23:38
@sharnoff
Member Author

Current status: Running the regression tests locally, test_branch_creation_before_gc and test_branch_behind are failing and I'm not super sure why. There might be something going wrong with GC not being run?

AFAICT, there are exceptions expected in these two tests that aren't being triggered:

https://github.com/neondatabase/neon/blob/f790e88b01aff7558d3ee551eba5193ff6fe20c7/test_runner/regress/test_branch_and_gc.py#L166-L168

https://github.com/neondatabase/neon/blob/f790e88b01aff7558d3ee551eba5193ff6fe20c7/test_runner/regress/test_branch_behind.py#L119-L123

@sharnoff sharnoff force-pushed the pageserver-move-commands-libpq-to-http branch from f790e88 to 74dcee9 Compare September 15, 2022 21:10
@sharnoff
Member Author

sharnoff commented Sep 16, 2022

@SomeoneToIgnore

I think we should unite the new, testing HTTP API with that feature, so we could disable both failpoints and these endpoints in the final, release build later.
Since failpoints is the default feature now, it should be simple to add a few cfg on top of the new HTTP code?

I like the idea of removing the functions if it's not compiled in (mostly; see footnote 1). However:

I'm worried that just removing the handlers if not #[cfg(feature = "failpoints")] might result in confusing 404s if someone's missed a feature flag. I like the current message in failpoints_handler (it's just copy+pasted from the libpq failpoints handler):

https://github.com/neondatabase/neon/blob/74dcee9421412f1ac14566c0843c2191a383ffe0/pageserver/src/http/routes.rs#L666-L671

So perhaps replacing the handlers with implementations that just return an appropriate error?
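
A rough sketch of that alternative, under stated assumptions: `ApiError`, the handler signature, and the error wording are all made up for illustration and are not the code linked above. The handler always exists, and `cfg!` decides at compile time which branch is live.

```rust
/// Hypothetical error type standing in for the pageserver's HTTP API error.
#[derive(Debug, PartialEq)]
enum ApiError {
    BadRequest(String),
}

/// Always registered, so a build without the feature answers with an
/// explanatory error instead of a bare 404.
fn failpoints_handler(body: &str) -> Result<String, ApiError> {
    // `cfg!` expands to `true` or `false` depending on the enabled features.
    if !cfg!(feature = "failpoints") {
        return Err(ApiError::BadRequest(
            "cannot manage failpoints: built without the 'failpoints' feature".to_string(),
        ));
    }
    // Placeholder for the real work of configuring a failpoint.
    Ok(format!("configured failpoints: {}", body))
}

fn main() {
    // The request body here is an illustrative placeholder.
    match failpoints_handler("some_failpoint=return") {
        Ok(msg) => println!("200: {}", msg),
        Err(ApiError::BadRequest(msg)) => println!("400: {}", msg),
    }
}
```

Unlike removing the handler outright, both branches are type-checked in every build, which also helps with the concern about code rotting behind a flag.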

The other thing that I touched on briefly (albeit, not very clearly) in #2446 (comment) is that: (a) I think that if we're using a feature flag to turn on/off testing APIs, it should be something more general than failpoints, and (b) I'm not sure what the name for that should be, or if there's a name that's typically used.

Footnotes

  1. If we're adding nontrivial feature flags (i.e., not just enabling a dependency, like failpoints), we'll need to make sure that everything continues to compile with different sets of flags -- in practice, checking all permutations won't be necessary, but I think checking both with and without the testing APIs is probably worthwhile. Not saying that feature flags are necessarily to be avoided, just something to think about.

@SomeoneToIgnore
Contributor

So perhaps replacing the handlers with implementations that just return an appropriate error?

I like that and I think we should add it for all new HTTP endpoints we've added.
Seems that a declarative macro could be useful for the task here.
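
As a sketch of the declarative-macro idea: the macro below defines each handler twice, once for each build flavor. `testing_only_handler!`, the handler names, and the simplified `&str -> Result<String, String>` signature are hypothetical, and the feature is assumed to be called `testing`.

```rust
/// Hypothetical macro: emits the real handler when the `testing` feature is
/// on, and a stub that returns an explanatory error when it is off.
macro_rules! testing_only_handler {
    (fn $name:ident($arg:ident: &str) -> Result<String, String> $body:block) => {
        #[cfg(feature = "testing")]
        fn $name($arg: &str) -> Result<String, String> $body

        #[cfg(not(feature = "testing"))]
        fn $name(_: &str) -> Result<String, String> {
            Err(format!(
                "{} is unavailable: built without the 'testing' feature",
                stringify!($name)
            ))
        }
    };
}

testing_only_handler! {
    fn compact_handler(timeline: &str) -> Result<String, String> {
        Ok(format!("compacted {}", timeline))
    }
}

testing_only_handler! {
    fn checkpoint_handler(timeline: &str) -> Result<String, String> {
        Ok(format!("checkpointed {}", timeline))
    }
}

fn main() {
    println!("{:?}", compact_handler("timeline-1"));
    println!("{:?}", checkpoint_handler("timeline-1"));
}
```

The route table stays identical in both builds; only the handler bodies differ, so a missing feature flag shows up as a readable error rather than a 404.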

checking both with and without the testing APIs is probably worthwhile

That's the idea, maybe a bit implicit.
When building and testing a commit, we follow two separate compilation paths:

So, I think we're doing that, can add some curl tests into the Dockerfile, if really paranoid 🙂

But sure, features in general could be confusing, so not a fan of having many of them.

it should be something more general than failpoints

Absolutely, something like testing-api could work, but have no good idea about the name either.


With #2446 merged, I have no objections on merging this as is, since the main HTTP backdoor is compiled out in the release-release builds now and the rest of the HTTP methods are protected by the token and are not very harmful.

We should proceed with the feature work more, but it can be done later in a separate PR.

@sharnoff
Member Author

So perhaps replacing the handlers with implementations that just return an appropriate error?

I like that and I think we should add it for all new HTTP endpoints we've added.
Seems that a declarative macro could be useful for the task here.

Good idea; I'll do that.

That's the idea, maybe a bit implicit.
When building and testing a commit, we follow two separate compilation paths

Ah ok! I hadn't realized that piece of it. Makes sense.

something like testing-api could work, but have no good idea about the name either

Per discussion on #2464, I'll go with testing :)

I think it's easiest to wait for #2464 to merge and rebase on top of it with the features stuff, so I'll do that.

@sharnoff sharnoff force-pushed the pageserver-move-commands-libpq-to-http branch 2 times, most recently from 30decad to 7a65bda Compare September 16, 2022 19:49
@sharnoff
Member Author

@SomeoneToIgnore Added the testing feature as a separate commit, would appreciate a once-over to make sure it makes sense :)

Contributor

@SomeoneToIgnore SomeoneToIgnore left a comment

Looks like I've imagined it since the first review, thanks for making it happen!

@sharnoff sharnoff force-pushed the pageserver-move-commands-libpq-to-http branch 2 times, most recently from bb18ab5 to c166607 Compare September 20, 2022 15:16
ref #2422

There are four commands of note here, one of which (`failpoint`) is to do
with the pageserver as a whole, and the other three are per-timeline.

Everything except `failpoint` is added to the pageserver's OpenAPI spec.

Summary of changes:

 * Remove `failpoints` feature; use `testing` everywhere.
   * Change pageserver's `cfg_disabled!` to `testing_api!`
 * Remove other testing APIs from OpenAPI spec
@sharnoff sharnoff force-pushed the pageserver-move-commands-libpq-to-http branch from c166607 to c30c9b7 Compare September 20, 2022 17:34
@sharnoff
Member Author

Going ahead and merging with flaky e2e tests still failing, per brief discussion w/ @hlinnaka. Tests are failing on teardown with a 404 from not finding the project - different tests each time.

@sharnoff sharnoff merged commit 4a3b3ff into main Sep 20, 2022
@sharnoff sharnoff deleted the pageserver-move-commands-libpq-to-http branch September 20, 2022 18:28
Successfully merging this pull request may close these issues.

Move pageserver test commands from libpq to mgmt API
3 participants