Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/server/api: Add /healthz for load-balancer health checks #239

Closed
wants to merge 2 commits into from

Conversation

wking
Copy link
Member

@wking wking commented Dec 16, 2018

The server currently 404s the root path. From a master:

[core@ip-10-0-26-134 ~]$ curl -ik https://wking-api.devcluster.openshift.com:49500/
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Sun, 16 Dec 2018 06:30:14 GMT
Content-Length: 19

404 page not found

but we need a reliable response on the range 200-399 to satisfy our network load balancer health checks, which do not support configurable response status codes (see here and here, which are Terraform links, but they discuss an AWS restriction that is not Terraform-specific). This pull-request adds a /healthz endpoint which always 204s (when the server is alive to handle it).

Required for openshift/installer#924.

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Dec 16, 2018
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 16, 2018
@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 16, 2018
@wking
Copy link
Member Author

wking commented Dec 16, 2018

/hold

Using a muxer for this server seems overly complicated. Lets table this change until I have time to drop the muxer, unless other folks can think of a reason to keep it.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 16, 2018
@abhinavdahiya
Copy link
Contributor

/hold

Using a muxer for this server seems overly complicated. Lets table this change until I have time to drop the muxer, unless other folks can think of a reason to keep it.

Do you mind expanding on why to think it's overcomplicated.

@wking
Copy link
Member Author

wking commented Dec 16, 2018

Do you mind expanding on why to think it's overcomplicated.

Each sub-handler will be stwitching on Path anyway, and we don't have so many paths for the muxer's extra level of path switching to help much. And it's nice to only have to put things like #240's method guard in one spot (vs. repeating in each of the muxer's sub-handlers).

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 17, 2018
@wking
Copy link
Member Author

wking commented Dec 17, 2018

Rebased onto #242 with 47439f8 -> 9fc7b2b. This will collide with #240, but rebasing either will be easy, so I don't think it matters which lands first.

@cgwalters
Copy link
Member

LGTM
/assign abhinavdahiya

Copy link
Member

@ashcrow ashcrow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well.

@wking
Copy link
Member Author

wking commented Dec 17, 2018

With #242 closed, I've gone back to a separate sub-handler for /healthz and rebased onto #240 to get unit tests with 9fc7b2b -> 628e654.

return
}

if r.URL.Path == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this check capturing.. ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also the path is going to be set to /healthz based on setup here 628e654#diff-4bbf1ce644e1b41f1ed61ab858def528R46

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm trying to wrap my head around how muxers work :p. It looks like they pass the full URL through, and aren't stripping off the portion handled by the muxer's switch. Anyhow, don't bother looking at this PR until #240 has landed ;).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did some local testing and this sheds some light on Path reaginrding how the muxer works.
#242 (comment)

@openshift-ci-robot openshift-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 17, 2018
@wking
Copy link
Member Author

wking commented Dec 17, 2018

Ok, I've rebased this onto the new #246 with 628e654 -> 6964b05. #246 gets api_test.go running over the muxer instead of just the API/config handler.

@wking wking force-pushed the server-healthz branch 2 times, most recently from 4dc0a08 to 144b903 Compare December 17, 2018 23:21
@wking wking force-pushed the server-healthz branch 2 times, most recently from 0f57e46 to d3fa0ad Compare December 18, 2018 05:37
The previous API required consumers to construct a handler and pass
that handler back into a separate function to create a server based on
the handler.  With this commit, I've rerolled the API to return the
handler directly, and the new public API is just NewBootstrapServer
and NewClusterServer.  To serve both HTTP and HTTPS from the returned
server, I'm making shallow copies and altering Addr to set the
per-protocol ports in those copies.

I've also made some "server" -> "config" renames in the types that
have getConfig methods.  And APIHandler is now configHandler.  That
helps consolidate the /config/* handling under a consistent name, so
we have:

* An http.Server that is the server (e.g. as returned by
  NewClusterServer).
* That server's handler is a muxer, and one of the muxer handlers is
  the configHandler for /config/*.
* The configHandler structure holds a getConfig function, and we have
  bootstrapConfig and clusterConfig which can supply getConfig.  Using
  a function type for this avoids the need for an interface and mock
  getConfig type.

The defaultHandler gives us our usual empty 404s for requests that
don't match our other handlers.  Without it, we'd have returned Go's
default:

  404 page not found

body, and had unit-test failures like:

  --- FAIL: TestHandler (0.00s)
      --- FAIL: TestHandler/GET_/does-not-exist (0.00s)
          api_test.go:93: response body length 19 does not match Content-Length -1
The server currently 404s the root path.  From a master:

  [core@ip-10-0-26-134 ~]$ curl -ik https://wking-api.devcluster.openshift.com:49500/
  HTTP/1.1 404 Not Found
  Content-Type: text/plain; charset=utf-8
  X-Content-Type-Options: nosniff
  Date: Sun, 16 Dec 2018 06:30:14 GMT
  Content-Length: 19

  404 page not found

but we need a reliable response on the range 200-399 to satisfy our
network load balancer health checks, which do not support configurable
response status codes [1,2] (these are Terraform links, but they
discuss an AWS restriction that is not Terraform-specific).  This
commit adds a /healthz endpoint which always 204s (when the server is
alive to handle it).

[1]: hashicorp/terraform-provider-aws#2708 (comment)
[2]: https://github.com/terraform-providers/terraform-provider-aws/pull/2906/files#diff-375aea487c27a6ada86edfd817ba2401R612
@cgwalters
Copy link
Member

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 21, 2018
@abhinavdahiya
Copy link
Contributor

The PR includes commit 6cfa51c

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 21, 2018
@crawford
Copy link
Contributor

crawford commented Jan 4, 2019

@abhinavdahiya @wking What still needs to happen with this PR?

@wking
Copy link
Member Author

wking commented Jan 4, 2019

What still needs to happen with this PR?

We need to sort out the underlying #246, or figure out another way to get unit tests on the full muxed handler.

@abhinavdahiya
Copy link
Contributor

abhinavdahiya commented Jan 5, 2019

@abhinavdahiya @wking What still needs to happen with this PR?

this is what i would have expected it to look like without #246

@wking
Copy link
Member Author

wking commented Jan 5, 2019

this is what i would have expected it to look like without #246

That looks more complicated to me:

$ git show --oneline --stat 50f1439777
50f1439 pkg/server/api: Add /healthz for load-balancer health checks
 pkg/server/api.go      | 21 +++++++++++++
 pkg/server/api_test.go | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+)
$ git show --oneline --stat d0a7ae21b0
d0a7ae2 add healthz
 pkg/server/api.go      |  47 ++++++++--
 pkg/server/api_test.go | 243 ++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 269 insertions(+), 21 deletions(-)

But, shrug, you've put more work into this repo than me. Want to file a PR?

@abhinavdahiya
Copy link
Contributor

this is what i would have expected it to look like without #246

That looks more complicated to me:

Hmm, based on what the lines changed?

$ git show --oneline --stat 50f1439777
50f1439 pkg/server/api: Add /healthz for load-balancer health checks
 pkg/server/api.go      | 21 +++++++++++++
 pkg/server/api_test.go | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+)
$ git show --oneline --stat d0a7ae21b0
d0a7ae2 add healthz
 pkg/server/api.go      |  47 ++++++++--

this is because 50f1439 carries over defaultHandler from 6cfa51c

pkg/server/api_test.go | 243 ++++++++++++++++++++++++++++++++++++++++++++++---

this is because this has 2 types of tests

2 files changed, 269 insertions(+), 21 deletions(-)


But, _shrug_, you've put more work into this repo than me. Want to file a PR?

#267

@wking
Copy link
Member Author

wking commented Jan 5, 2019

Hmm, based on what the lines changed?

Yeah, and also the package API.

this is because 50f1439 carries over defaultHandler from 6cfa51c

But 6cfa51c as a whole is a net reduction of 40 lines.

this is because this has 2 types of tests

With comprehensive muxer tests, I don't think we need sub-handler tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants