Bug 1852047: controller: Emit events #1962

cgwalters · 2020-07-30T13:07:57Z

A while ago I'd invested some time in tweaking the
node controller to have useful logs around what it's
doing; my first "point of contact" when looking at
upgrades was its pod logs. But...we lose most of
those on upgrade since the pod gets killed.

Add events to the node controller too.
Currently the MCD emits useful events which
can be queried afterwards (in our CI runs we
dump events.json).

With this we can create a "journal/history"
for upgrade/update events just by querying the
event stream.
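As a rough illustration of the "journal/history" idea, here is a minimal Python sketch that turns a dumped `events.json` into a time-ordered list. It assumes the dump is a Kubernetes `List` of v1 `Event` objects (the shape `oc get events -o json` produces) with second-precision timestamps. The `event_journal` helper, the sample reasons, and the `example-controller` component name are hypothetical and not taken from this PR.

```python
from datetime import datetime


def parse_ts(value):
    """Parse a second-precision RFC 3339 timestamp like 2020-07-30T13:07:57Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def event_journal(event_list, component=None):
    """Return (time, object, reason, message) tuples sorted by time.

    If `component` is given, keep only events whose source.component matches.
    """
    rows = []
    for ev in event_list.get("items", []):
        if component is not None:
            if ev.get("source", {}).get("component") != component:
                continue
        ts = ev.get("lastTimestamp") or ev.get("firstTimestamp")
        if not ts:
            continue  # no usable timestamp, skip rather than guess
        obj = ev.get("involvedObject", {})
        target = f"{obj.get('kind')}/{obj.get('name')}"
        rows.append((parse_ts(ts), target, ev.get("reason"), ev.get("message")))
    return sorted(rows, key=lambda row: row[0])


# Hypothetical sample data; real dumps carry many more fields.
SAMPLE = {
    "items": [
        {
            "lastTimestamp": "2020-07-30T13:10:00Z",
            "reason": "ExampleFinished",
            "message": "second",
            "involvedObject": {"kind": "Node", "name": "node-a"},
            "source": {"component": "example-controller"},
        },
        {
            "lastTimestamp": "2020-07-30T13:07:57Z",
            "reason": "ExampleStarted",
            "message": "first",
            "involvedObject": {"kind": "Node", "name": "node-a"},
            "source": {"component": "example-controller"},
        },
    ]
}

for ts, target, reason, message in event_journal(SAMPLE):
    print(ts.isoformat(), target, reason, message)
```

To run it against a real dump, load the file with `json.load` and pass the result to `event_journal`, optionally filtering on whatever component name the controller reports.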

cgwalters · 2020-07-30T13:10:32Z

Demo jq invocation over here - I plan to stick that some place better but need to decide where that is (this repo? some shared openshift repo?)

cgwalters · 2020-08-01T18:04:24Z

This one is passing tests and should help us a lot in tracing through upgrades in the future; can I get an lgtm?

sinnykumari

Looks sane, but I'll let someone with more familiarity lgtm.

cgwalters · 2020-08-03T20:36:15Z

This one also relates to https://bugzilla.redhat.com/show_bug.cgi?id=1852047

openshift-ci-robot · 2020-08-03T20:37:06Z

@cgwalters: This pull request references Bugzilla bug 1852047, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.6.0) matches configured target release for branch (4.6.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1852047: controller: Emit events

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

manifests/machineconfigcontroller/events-clusterrole.yaml

cgwalters · 2020-08-04T18:44:19Z

I think that patch permission may have been necessary; it looks like the run without it just ended up printing the events in the logs, which isn't useful.
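A plausible reason for this is that the event recorder updates repeated events with a patch instead of creating a new object each time; that is an assumption here, not something this thread confirms. Under that assumption, the following Python sketch checks whether a ClusterRole, loaded as a dict, grants both `create` and `patch` on core `events`. The `missing_event_verbs` helper and the sample role are hypothetical and do not reproduce the actual contents of `events-clusterrole.yaml`.

```python
REQUIRED_VERBS = {"create", "patch"}


def missing_event_verbs(cluster_role):
    """Return the required verbs that the role does not grant on core events."""
    granted = set()
    for rule in cluster_role.get("rules", []):
        groups = rule.get("apiGroups", [])
        resources = rule.get("resources", [])
        # "" is the core API group; "*" matches any group or resource.
        in_group = "" in groups or "*" in groups
        on_events = "events" in resources or "*" in resources
        if in_group and on_events:
            granted.update(rule.get("verbs", []))
    if "*" in granted:
        return set()
    return REQUIRED_VERBS - granted


# Hypothetical role that can create events but not patch them.
create_only = {
    "kind": "ClusterRole",
    "rules": [{"apiGroups": [""], "resources": ["events"], "verbs": ["create"]}],
}

print(sorted(missing_event_verbs(create_only)))
```

The check only looks at the core API group; a role that grants access through `events.k8s.io` would need a separate rule and a separate check.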

cgwalters · 2020-08-05T12:39:15Z

It'd be really useful to me to have these changes in master, so I can start gathering more "baseline" data across all the clusters being launched for 4.6.

kikisdeliveryservice

Is there something we should look at in the artifacts specifically to verify this?

cgwalters · 2020-08-05T18:13:36Z

Yep, same answer as #1977 (comment)

We had an event when we were starting an OS update, but nothing when it was completed - one could implicitly get that by looking at the next event, but that's a bit fragile. And since then we started doing a lot more stuff with the OS, so let's add an event emitted before and after all OS changes so we can consistently get e.g. timing information about it. Relates to openshift#1962 around getting better data about timing during upgrades.
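To show why a matched pair of events helps with timing, here is a minimal Python sketch that pairs each "before" event with the next "after" event on the same node and reports the elapsed seconds. The reason strings `ExampleChangeStarted` and `ExampleChangeFinished` are hypothetical placeholders, not the reasons the daemon actually emits, and the events are assumed to be v1 `Event` dicts with a second-precision `lastTimestamp`.

```python
from datetime import datetime


def parse_ts(value):
    """Parse a second-precision RFC 3339 timestamp like 2020-08-05T18:13:36Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def change_durations(events, start_reason, end_reason):
    """Return (node, seconds) for each start event followed by an end event."""
    open_starts = {}  # node name -> time of the most recent unmatched start
    durations = []
    for ev in sorted(events, key=lambda e: parse_ts(e["lastTimestamp"])):
        node = ev["involvedObject"]["name"]
        ts = parse_ts(ev["lastTimestamp"])
        if ev["reason"] == start_reason:
            open_starts[node] = ts
        elif ev["reason"] == end_reason and node in open_starts:
            elapsed = (ts - open_starts.pop(node)).total_seconds()
            durations.append((node, elapsed))
    return durations


# Hypothetical sample events for a single node.
sample_events = [
    {
        "lastTimestamp": "2020-08-05T13:02:30Z",
        "reason": "ExampleChangeFinished",
        "involvedObject": {"kind": "Node", "name": "node-a"},
    },
    {
        "lastTimestamp": "2020-08-05T13:00:00Z",
        "reason": "ExampleChangeStarted",
        "involvedObject": {"kind": "Node", "name": "node-a"},
    },
]

print(change_durations(sample_events, "ExampleChangeStarted", "ExampleChangeFinished"))
```

With only a start event, the same calculation would have to infer the end from whatever event happens to come next, which is the fragility described above.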

kikisdeliveryservice · 2020-08-14T00:44:08Z

/approve

/assign @runcom

cgwalters · 2020-08-17T13:49:49Z

We're nearing 3 weeks to get this pretty simple patch in...

cgwalters · 2020-08-18T12:37:40Z

I want to emphasize the value of getting this patch in soon - #1962 (comment)

sinnykumari · 2020-08-18T13:57:14Z

lgtm.
@runcom any final comment before we get it merged?

yuqi-zhang · 2020-08-26T20:33:49Z

/lgtm

openshift-ci-robot · 2020-08-26T20:34:06Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, kikisdeliveryservice, sinnykumari, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [cgwalters,kikisdeliveryservice,sinnykumari,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-bot · 2020-08-26T21:11:25Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-08-26T21:37:34Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-08-26T23:21:30Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-ci-robot · 2020-08-27T00:21:52Z

@cgwalters: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/okd-e2e-aws | `4e133e7` | link | `/test okd-e2e-aws` |
| ci/prow/e2e-aws-workers-rhel7 | `4e133e7` | link | `/test e2e-aws-workers-rhel7` |
| ci/prow/e2e-ovn-step-registry | `4e133e7` | link | `/test e2e-ovn-step-registry` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci-robot · 2020-08-27T00:23:20Z

@cgwalters: All pull requests linked via external trackers have merged:

Bugzilla bug 1852047 has been moved to the MODIFIED state.

In response to this:

Bug 1852047: controller: Emit events

openshift-ci-robot requested review from ericavonb and yuqi-zhang July 30, 2020 13:08

openshift-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Jul 30, 2020

cgwalters mentioned this pull request Jul 30, 2020

[WIP] Bug 1850057: update etcd followers first, use bfq on control plane #1946

Closed

cgwalters force-pushed the nodecontroller-events branch 2 times, most recently from d203e2d to 4ada695 Compare July 31, 2020 17:58

sinnykumari approved these changes Aug 3, 2020

cgwalters mentioned this pull request Aug 3, 2020

Bug 1852047: daemon: Add events before/after all OS changes #1977

Merged

kikisdeliveryservice changed the title ~~controller: Emit events~~ Bug 1852047: controller: Emit events Aug 3, 2020

openshift-ci-robot added the bugzilla/severity-high label (referenced Bugzilla bug's severity is high for the branch this PR is targeting) Aug 3, 2020

openshift-ci-robot added the bugzilla/valid-bug label (indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting) Aug 3, 2020

runcom reviewed Aug 3, 2020

manifests/machineconfigcontroller/events-clusterrole.yaml (resolved)

cgwalters force-pushed the nodecontroller-events branch from 4ada695 to 7162aab Compare August 4, 2020 13:55

cgwalters force-pushed the nodecontroller-events branch from 7162aab to 7961d7d Compare August 4, 2020 14:03

cgwalters mentioned this pull request Aug 4, 2020

Bug 1850057: Use bfq scheduler on control plane, idle I/O for rpm-ostreed #1957

Merged

cgwalters force-pushed the nodecontroller-events branch from 7961d7d to 4e133e7 Compare August 4, 2020 18:31

kikisdeliveryservice reviewed Aug 5, 2020

openshift-ci-robot assigned runcom Aug 14, 2020

openshift-ci-robot assigned yuqi-zhang Aug 26, 2020

openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) Aug 26, 2020

openshift-merge-robot merged commit ab32432 into openshift:master Aug 27, 2020
