Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/health: separate health and metrics services #2407

Merged
merged 1 commit into from
Apr 13, 2020
Merged

internal/health: separate health and metrics services #2407

merged 1 commit into from
Apr 13, 2020

Conversation

poidag-zz
Copy link
Contributor

Fixes #2380

Signed-off-by: Peter Grant pegrant@vmware.com

Allows for separating health/metric services.

@codecov
Copy link

codecov bot commented Apr 1, 2020

Codecov Report

Merging #2407 into master will decrease coverage by 0.93%.
The diff coverage is 2.94%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2407      +/-   ##
==========================================
- Coverage   78.09%   77.16%   -0.94%     
==========================================
  Files          62       65       +3     
  Lines        5292     5443     +151     
==========================================
+ Hits         4133     4200      +67     
- Misses       1070     1152      +82     
- Partials       89       91       +2
Impacted Files Coverage Δ
cmd/contour/serve.go 0% <0%> (ø) ⬆️
internal/health/health.go 0% <0%> (ø)
internal/metrics/metrics.go 89.42% <0%> (+4.1%) ⬆️
cmd/contour/servecontext.go 93.1% <100%> (+0.18%) ⬆️
internal/dag/annotations.go 97.72% <0%> (-2.28%) ⬇️
internal/envoy/listener.go 95.6% <0%> (-0.55%) ⬇️
internal/dag/cache.go 95.42% <0%> (-0.02%) ⬇️
cmd/contour/ingressstatus.go 0% <0%> (ø)
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d4fb68d...0d23115. Read the comment docs.

Copy link
Contributor

@jpeach jpeach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, http-address/http-port specifies where to tell Envoy
to start its HTTP listener, so we don't need to change that. The
stats-address/stats-port flags specify where the metrics and health
servers listen.

If we add new flags for heath-address/health-port, then the health service
can run on that, distinct from the metrics service. As @stevesloka noted
on the issue, if the new flag is not specified, Contour should implement
the previous, backwards compatible behavior. This means that we need
to have some special handling (or special defaults) on the new flags,
since the default case should be to put both services on the same address.

If metrics and health services are allowed to be started in different
configurations, that creates a structural problem for the internal
packages, which are designed to create a http.Server for every
internal service. One way to decouple this is to add a member function to
httpsvc.Service that accepts an interface representing a HTTP endpoint,
possibly something like this:

type Handler interface {
    Attach(*http.ServeMux)
}

Unfortunately, there's not really a good place to document this, but I
definitely think that it should be documented.

Finally, we should discuss whether we want to update the deployment YAML
to configure the services separately. I'm not sure what the trade-off
here is, but maybe we should enable this by default?

Finally finally, take a look at the contributing doc and the link I
send you on Slack about git commit messages. The former has the standard
outline that you should try to follow for Contour.

@jpeach jpeach added this to the 1.4.0 milestone Apr 1, 2020
@jpeach jpeach added area/metrics Issues or PRs related to exposing time series metrics. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Apr 1, 2020
@poidag-zz
Copy link
Contributor Author

Hi @jpeach

There seems to be some confusion as things currently are not named or mapped very well.

Currently, http-address/http-port specifies where to tell Envoy
to start its HTTP listener, so we don't need to change that. The
stats-address/stats-port flags specify where the metrics and health
servers listen.

http-address/http-port seem to only be used for Contours metrics. Envoy listeners are configured using envoy-service-http-address/envoy-service-http-port

My proposal is to retain the http-address/http-port as flags to maintain backwards compatibility. Should an end-user wish to separate health-checking from metrics. They are then able to pass health-port which will then create a separate listener using <http-address>:<health-port> to avoid port conflict.

An example of how this currently looks using defaults

> $ contour serve
> $ curl -s localhost:8000/metrics | grep contour_build_info                                                                                                                                                                          [±master ●]
# HELP contour_build_info Build information for Contour. Labels include the branch and git SHA that Contour was built from, and the Contour version.
# TYPE contour_build_info gauge
contour_build_info{branch="separate-ports",revision="8bacb1007a84050e6f8a15dd4a32440940bf5742",version="v1.2.0-93-g8bacb100"} 1
> $ curl localhost:8000/health                                                                                                                                                                                                        [±master ●]
OK
> $ curl localhost:8000/healthz                                                                                                                                                                                                       [±master ●]
OK

and then specifying a --health-port flag to the serve command

> $ contour serve --health-port 9999
> $
> $ curl -s localhost:8000/health
404 page not found
> $ curl -s localhost:8000/healthz
404 page not found
> $ curl -s localhost:8000/metrics | grep contour_build_info
# HELP contour_build_info Build information for Contour. Labels include the branch and git SHA that Contour was built from, and the Contour version.
# TYPE contour_build_info gauge
contour_build_info{branch="separate-ports",revision="8bacb1007a84050e6f8a15dd4a32440940bf5742",version="v1.2.0-93-g8bacb100"} 1
> $ curl -s localhost:9999/health
OK
> $ curl -s localhost:9999/healthz
OK
> $ curl -s localhost:9999/metrics | grep contour_build_info
> $ curl -I localhost:9999/metrics
HTTP/1.1 404 Not Found
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Thu, 02 Apr 2020 00:28:31 GMT
Content-Length: 19

> $

Unfortunately, there's not really a good place to document this, but I
definitely think that it should be documented.
Agreed. I have added site/_guides/health-checking.md and updated site/_guides/prometheus.md to reflect this. Let me know if this is autogenerated and I'm not doing it right. Or if there is a better place for this.

Finally, we should discuss whether we want to update the deployment YAML
to configure the services separately. I'm not sure what the trade-off
here is, but maybe we should enable this by default?
I think not updating the deployment YAML at this stage is ok (as to not disrupt what people expect to see) and just providing a little extra doc in the guides that this is possible will work.

Finally finally, take a look at the contributing doc and the link I
send you on Slack about git commit messages. The former has the standard
outline that you should try to follow for Contour.
Thanks, I will update everything on this PR and my other one after I have reviewed

@jpeach
Copy link
Contributor

jpeach commented Apr 2, 2020

Hi @jpeach

There seems to be some confusion as things currently are not named or mapped very well.

Currently, http-address/http-port specifies where to tell Envoy
to start its HTTP listener, so we don't need to change that. The
stats-address/stats-port flags specify where the metrics and health
servers listen.

http-address/http-port seem to only be used for Contours metrics. Envoy listeners are configured using envoy-service-http-address/envoy-service-http-port

Oh I see. Yes, I got it wrong there :-/

My proposal is to retain the http-address/http-port as flags to maintain backwards compatibility. Should an end-user wish to separate health-checking from metrics. They are then able to pass health-port which will then create a separate listener using <http-address>:<health-port> to avoid port conflict.

An example of how this currently looks using defaults

> $ contour serve
> $ curl -s localhost:8000/metrics | grep contour_build_info                                                                                                                                                                          [±master ●]
# HELP contour_build_info Build information for Contour. Labels include the branch and git SHA that Contour was built from, and the Contour version.
# TYPE contour_build_info gauge
contour_build_info{branch="separate-ports",revision="8bacb1007a84050e6f8a15dd4a32440940bf5742",version="v1.2.0-93-g8bacb100"} 1
> $ curl localhost:8000/health                                                                                                                                                                                                        [±master ●]

In this case, I was expecting that calling ListenAndServe would cause an error when the second server tried to bind the same address. If Go makes that work, then you are right, and we don't have to do any refactoring.

@jpeach
Copy link
Contributor

jpeach commented Apr 6, 2020

@pickledrick I took a better look at this, and I think that the approach we should use is to decouple metrics and health from httpsvc.Service. We can do this by introducing Handler exports for both packages. This function returns a http.Handler that can be directly attached to the service.

The result would look something like this:

	metricsvc := httpsvc.Service{
		Addr:        ctx.metricsAddr,
		Port:        ctx.metricsPort,
		FieldLogger: log.WithField("context", "metricsvc"),
		ServeMux:    http.ServeMux{},
	}

	metricsvc.ServeMux.Handle("/metrics", metrics.Handler(registry))

	if ctx.healthAddr == ctx.metricsAddr && ctx.healthPort == ctx.metricsPort {
		h := health.Handler(clients.ClientSet())
		metricsvc.ServeMux.Handle("/health", h)
		metricsvc.ServeMux.Handle("/healthz", h)
	}

	g.Add(metricsvc.Start)

	if ctx.healthAddr != ctx.metricsAddr || ctx.healthPort != ctx.metricsPort {
		healthsvc := httpsvc.Service{
			Addr:        ctx.healthAddr,
			Port:        ctx.healthPort,
			FieldLogger: log.WithField("context", "healthsvc"),
		}

		h := health.Handler(clients.ClientSet())
		healthsvc.ServeMux.Handle("/health", h)
		healthsvc.ServeMux.Handle("/healthz", h)

		g.Add(healthsvc.Start)
	}

This approach keeps the health and metrics packages separate, and also decouples them from how we set up HTTP serving. One issue that needs a bit of consideration is what log fields the HTTP service should use (since context could not be ambiguous).

Note that we should add --health-address for consistency with other address specifications.

@poidag-zz
Copy link
Contributor Author

Thanks @jpeach

wrt

One issue that needs a bit of consideration is what log fields the HTTP service should use (since context could not be ambiguous).

Just clarifying if you are referring to the health service or the metrics service?

@poidag-zz poidag-zz changed the title [WIP] internal/health: separate health and metrics services internal/health: separate health and metrics services Apr 8, 2020
Copy link
Contributor

@jpeach jpeach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A small set of minor issues.

internal/health/health.go Outdated Show resolved Hide resolved
internal/metrics/metrics.go Outdated Show resolved Hide resolved
internal/metrics/metrics.go Outdated Show resolved Hide resolved
site/_guides/health-checking.md Outdated Show resolved Hide resolved
site/_guides/prometheus.md Outdated Show resolved Hide resolved
cmd/contour/serve.go Outdated Show resolved Hide resolved
design/listener-design.md Outdated Show resolved Hide resolved
@jpeach
Copy link
Contributor

jpeach commented Apr 8, 2020

Thanks @jpeach

wrt

One issue that needs a bit of consideration is what log fields the HTTP service should use (since context could not be ambiguous).

Just clarifying if you are referring to the health service or the metrics service?

Looking at the latest changes, I think that what you have is fine.

@poidag-zz
Copy link
Contributor Author

Thanks @jpeach for your thoroughness.

@poidag-zz poidag-zz requested a review from jpeach April 9, 2020 00:56
Copy link
Contributor

@jpeach jpeach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is ready to go. Just need a small update to the health check doc.

site/_guides/health-checking.md Outdated Show resolved Hide resolved
Provides the ability to separate HTTP listeners for the health and metrics services.
Both host/port are configurable. Defaults are maintained to support backward compatability.

This Fixes #2380.

Signed-off-by: Peter Grant <pegrant@vmware.com>
@jpeach jpeach merged commit 88509b1 into projectcontour:master Apr 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/metrics Issues or PRs related to exposing time series metrics. release-note Denotes a PR that will be considered when it comes time to generate release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow separate listeners for metrics and healthchecks
2 participants