Bug 1950430: pkg/cvo/metrics: Drop HTTP, require HTTPS for metrics access #481

Merged
merged 2 commits into from
Apr 20, 2021

@wking (Member) commented Nov 18, 2020

We began serving metrics over HTTPS with 6132bc3 (#358), which also requested monitoring to scrape us over HTTPS. Now that that is all in place in 4.6, we no longer need to serve over HTTP in 4.7 and later. This commit pivots us to always serving over HTTPS.

Because we are no longer serving HTTP, move to requiring --serving-cert-file and --serving-key-file when --listen is non-empty. I'd like to drop the --listen default, to make it an explicit opt-in, but I don't want to lose metrics when folks update from 4.6 -> 4.7. With this commit we start setting --listen explicitly when we launch child CVOs, and in 4.8 we can drop:

ListenAddr: "0.0.0.0:9099",

from pkg/start. It's possible that the manifest for the incoming CVO is constructed from the incoming release image, in which case we may be able to drop the --listen default now.

I'm setting --listen empty in the bootstrap manifest, because we don't need to serve metrics then (it's long before we have Prometheus around to scrape us).
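The flag rule described above (certificate and key are required whenever --listen is non-empty, and an empty --listen disables serving) could be checked roughly as follows. This is a sketch under assumptions: the `Options` struct, its field names other than `ListenAddr`, and the `Validate` method are hypothetical stand-ins, not the actual pkg/start code.

```go
package main

import (
	"errors"
	"fmt"
)

// Options is a hypothetical stand-in for the CVO start options. Only the
// flag names in the comments come from the PR description.
type Options struct {
	ListenAddr      string // --listen; empty disables the metrics server
	ServingCertFile string // --serving-cert-file
	ServingKeyFile  string // --serving-key-file
}

// Validate requires a serving certificate and key whenever a listen
// address is set, because there is no longer an HTTP fallback.
func (o Options) Validate() error {
	if o.ListenAddr == "" {
		return nil // metrics disabled, e.g. during bootstrap
	}
	if o.ServingCertFile == "" {
		return errors.New("--listen was not set empty, so --serving-cert-file must be set")
	}
	if o.ServingKeyFile == "" {
		return errors.New("--listen was not set empty, so --serving-key-file must be set")
	}
	return nil
}

func main() {
	bootstrap := Options{} // --listen empty: nothing to validate
	fmt.Println("bootstrap:", bootstrap.Validate())

	incomplete := Options{ListenAddr: "0.0.0.0:9099"}
	fmt.Println("no cert:", incomplete.Validate())
}
```

Treating an empty listen address as "disabled" keeps the bootstrap case free of certificate requirements while making every serving configuration fail fast if TLS material is missing.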

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 18, 2020
@openshift-merge-robot
Contributor

@wking: The following tests failed, say /retest to rerun all failed tests:

Test name                      Commit   Details  Rerun command
ci/prow/gofmt                  5a1e4ae  link     /test gofmt
ci/prow/unit                   5a1e4ae  link     /test unit
ci/prow/images                 5a1e4ae  link     /test images
ci/prow/e2e-agnostic           5a1e4ae  link     /test e2e-agnostic
ci/prow/e2e-agnostic-upgrade   5a1e4ae  link     /test e2e-agnostic-upgrade
ci/prow/integration            5a1e4ae  link     /test integration
ci/prow/e2e-agnostic-operator  5a1e4ae  link     /test e2e-agnostic-operator
ci/prow/e2e-agnostic-operator 5a1e4ae link /test e2e-agnostic-operator

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 9, 2021
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 8, 2021
@wking wking changed the title pkg/cvo/metrics: Drop HTTP, require HTTPS for metrics access Bug 1950430: pkg/cvo/metrics: Drop HTTP, require HTTPS for metrics access Apr 16, 2021
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Apr 16, 2021
@openshift-ci-robot
Contributor

@wking: This pull request references Bugzilla bug 1950430, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.8.0) matches configured target release for branch (4.8.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @jianlinliu


@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Apr 16, 2021
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 16, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 16, 2021
@wking
Member Author

wking commented Apr 16, 2021

Rebased onto master with 5a1e4ae -> 8fd5ce4.

We began serving metrics over HTTPS with 6132bc3 (Bug 1809195: Send
CVO metrics over https, 2020-05-07, openshift#358), which also requested
monitoring to scrape us over HTTPS.  Now that that is all in place in
4.6, we no longer need to serve over HTTP in 4.7 and later.  This
commit pivots us to always serving over HTTPS.

Because we are no longer serving HTTP, move to requiring
--serving-cert-file and --serving-key-file when --listen is non-empty.
I'd like to drop the --listen default, to make it an explicit opt-in,
but I don't want to lose metrics when folks update from 4.6 -> 4.7.
With this commit we start setting --listen explicitly when we launch
child CVOs, and in 4.8 we can drop:

  ListenAddr: "0.0.0.0:9099",

from pkg/start.  It's possible that the manifest for the incoming CVO
is constructed from the incoming release image, in which case we may
be able to drop the --listen default now.

I'm not setting --listen in the bootstrap manifest, because we don't
need to serve metrics then (it's long before we have Prometheus around
to scrape us).

We removed our only consumer in the previous commit.  Generated with:

  $ go mod tidy
  $ go mod vendor
  $ git add -A pkg/cvo/metrics.go go.* vendor

using:

  $ go version
  go version go1.14.4 linux/arm64
@jottofar
Contributor

/lgtm

@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jottofar, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Apr 19, 2021
@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

3 similar comments

@wking
Member Author

wking commented Apr 20, 2021

Update/rollback was too slow:

INFO[2021-04-19T19:13:18Z] Running step e2e-agnostic-upgrade-openshift-e2e-test. 
{"component":"entrypoint","file":"prow/entrypoint/run.go:165","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 4h0m0s timeout","severity":"error","time":"2021-04-19T21:25:55Z"}
INFO[2021-04-19T21:25:55Z] Received signal.                              signal=interrupt

But the update/rollback did complete by the time we'd gathered, so:

/override ci/prow/e2e-agnostic-upgrade

@openshift-ci-robot
Contributor

@wking: Overrode contexts on behalf of wking: ci/prow/e2e-agnostic-upgrade

@openshift-merge-robot openshift-merge-robot merged commit 6fdd1e0 into openshift:master Apr 20, 2021
@openshift-ci-robot
Contributor

@wking: All pull requests linked via external trackers have merged:

Bugzilla bug 1950430 has been moved to the MODIFIED state.
