Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1852047: controller: Emit events #1962

Merged

Conversation

cgwalters
Copy link
Member

A while ago I'd invested some time in tweaking the
node controller to have useful logs around what it's
doing; my first "point of contact" when looking at
upgrades was its pod logs. But...we lose most
those on upgrade since the pod gets killed.

Add events to the node controller too.
Currently the MCD emits useful events which
can be queried afterwards (in our CI runs we
dump events.json).

With this we can create a "journal/history"
for upgrade/update events just by querying the
event stream.

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 30, 2020
@cgwalters
Copy link
Member Author

Demo jq invocation over here - I plan to stick that some place better but need to decide where that is (this repo? some shared openshift repo?)

@cgwalters
Copy link
Member Author

This one is passing tests and should help us a lot trace through upgrades in the future, can I get a lgtm?

Copy link
Contributor

@sinnykumari sinnykumari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks sane but will let someone with more familiarity to lgtm

@cgwalters
Copy link
Member Author

This one also relates to https://bugzilla.redhat.com/show_bug.cgi?id=1852047

@kikisdeliveryservice kikisdeliveryservice changed the title controller: Emit events Bug 1852047: controller: Emit events Aug 3, 2020
@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. label Aug 3, 2020
@openshift-ci-robot
Copy link
Contributor

@cgwalters: This pull request references Bugzilla bug 1852047, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

Bug 1852047: controller: Emit events

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Aug 3, 2020
@cgwalters cgwalters force-pushed the nodecontroller-events branch from 4ada695 to 7162aab Compare August 4, 2020 13:55
@cgwalters cgwalters force-pushed the nodecontroller-events branch from 7162aab to 7961d7d Compare August 4, 2020 14:03
A while ago I'd invested some time in tweaking the
node controller to have useful logs around what it's
doing; my first "point of contact" when looking at
upgrades was its pod logs. But...we lose most
those on upgrade since the pod gets killed.

Add events to the node controller too.
Currently the MCD emits useful events which
can be queried afterwards (in our CI runs we
dump `events.json`).

With this we can create a "journal/history"
for upgrade/update events just by querying the
event stream.
@cgwalters
Copy link
Member Author

I think that patch permission may have been necessary, it looks like the run without it just ended up printing them in the logs which isn't useful.

@cgwalters
Copy link
Member Author

It'd be really useful to me to have these changes in master, so I can start gathering more "baseline" data across all the clusters being launched for 4.6.

Copy link
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something we should look at in the artifacts specifically to verify this?

@cgwalters
Copy link
Member Author

Yep, same answer as #1977 (comment)

cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Aug 6, 2020
We had an event when we were starting an OS update, but nothing
when it was completed - one could implicitly get that by looking
at the next event, but that's a bit fragile.

And since then we started doing a lot more stuff with the OS,
so let's add an event emitted before and after all OS changes
so we can consistently get e.g. timing information about it.

Relates to openshift#1962
around getting better data about timing during upgrades.
@kikisdeliveryservice
Copy link
Contributor

/approve

/assign @runcom

@cgwalters
Copy link
Member Author

We're nearing 3 weeks to get this pretty simple patch in...

@cgwalters
Copy link
Member Author

I want to emphasize the value of getting this patch in soon - #1962 (comment)

@sinnykumari
Copy link
Contributor

lgtm.
@runcom any final comment before we get it merged?

cgwalters added a commit to cgwalters/machine-config-operator that referenced this pull request Aug 18, 2020
We had an event when we were starting an OS update, but nothing
when it was completed - one could implicitly get that by looking
at the next event, but that's a bit fragile.

And since then we started doing a lot more stuff with the OS,
so let's add an event emitted before and after all OS changes
so we can consistently get e.g. timing information about it.

Relates to openshift#1962
around getting better data about timing during upgrades.
@yuqi-zhang
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 26, 2020
@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, kikisdeliveryservice, sinnykumari, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [cgwalters,kikisdeliveryservice,sinnykumari,yuqi-zhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

2 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot
Copy link
Contributor

@cgwalters: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/okd-e2e-aws 4e133e7 link /test okd-e2e-aws
ci/prow/e2e-aws-workers-rhel7 4e133e7 link /test e2e-aws-workers-rhel7
ci/prow/e2e-ovn-step-registry 4e133e7 link /test e2e-ovn-step-registry

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit ab32432 into openshift:master Aug 27, 2020
@openshift-ci-robot
Copy link
Contributor

@cgwalters: All pull requests linked via external trackers have merged:

Bugzilla bug 1852047 has been moved to the MODIFIED state.

In response to this:

Bug 1852047: controller: Emit events

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants