✨ Detect OOMkilled status for the operator itself #939

Merged Nov 20, 2023 (12 commits)

@czunker czunker commented Nov 16, 2023

This adds a Status Condition for the controller itself:

- affectedPods:
  - mondoo-operator-controller-manager-7594c45888-j5xf6
  lastTransitionTime: "2023-11-16T11:22:10Z"
  lastUpdateTime: "2023-11-16T11:22:10Z"
  memoryLimit: 15Mi
  message: Mondoo Operator controller is unavailable due to OOM
  reason: MondooOperatorUnvailable
  status: "True"
  type: MondooOperatorDegraded
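
As a rough illustration of how such a condition could be derived, here is a minimal sketch in Go. It is not the code from this PR. It relies on the fact that Kubernetes reports `OOMKilled` as the termination reason in a container's last termination state. The `pod`, `container`, and `condition` types and the `oomCondition` function are hypothetical, simplified stand-ins for the real Kubernetes and MondooAuditConfig types. The condition values are copied from the example above.

```go
package main

import "fmt"

// container is a hypothetical, simplified stand-in for the container
// spec and status fields an operator would read from the Kubernetes API.
type container struct {
	Name                  string
	MemoryLimit           string // e.g. "15Mi"
	LastTerminationReason string // "OOMKilled" when the kernel OOM killer ended the container
}

// pod is a hypothetical stand-in for a Kubernetes pod.
type pod struct {
	Name       string
	Containers []container
}

// condition mirrors the fields shown in the status example above.
type condition struct {
	Type         string
	Status       string
	Reason       string
	Message      string
	AffectedPods []string
	MemoryLimit  string
}

// oomCondition returns a degraded condition if any container of the
// given pods was last terminated with reason OOMKilled. The second
// return value is false when no OOM kill was observed.
func oomCondition(pods []pod) (condition, bool) {
	var affected []string
	limit := ""
	for _, p := range pods {
		for _, c := range p.Containers {
			if c.LastTerminationReason == "OOMKilled" {
				affected = append(affected, p.Name)
				limit = c.MemoryLimit
				break // list each pod only once
			}
		}
	}
	if len(affected) == 0 {
		return condition{}, false
	}
	return condition{
		Type:         "MondooOperatorDegraded",
		Status:       "True",
		Reason:       "MondooOperatorUnvailable", // spelled as in the status output above
		Message:      "Mondoo Operator controller is unavailable due to OOM",
		AffectedPods: affected,
		MemoryLimit:  limit,
	}, true
}

func main() {
	pods := []pod{{
		Name: "mondoo-operator-controller-manager-7594c45888-j5xf6",
		Containers: []container{
			{Name: "manager", MemoryLimit: "15Mi", LastTerminationReason: "OOMKilled"},
		},
	}}
	if cond, ok := oomCondition(pods); ok {
		fmt.Printf("%s=%s pods=%v limit=%s\n", cond.Type, cond.Status, cond.AffectedPods, cond.MemoryLimit)
	}
}
```

In a real controller, the pod list would come from the Kubernetes API, and the condition would be written to the MondooAuditConfig status rather than printed.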

czunker commented Nov 16, 2023

  • The revert of the status in the test does not work and needs a fix.

Hit kubernetes-sigs/controller-runtime#2362 in the tests, but the issue provides a workaround.

@czunker czunker changed the title Christian/detect operator oom ✨ Detect OOMkilled status for the operator itself Nov 16, 2023
Signed-off-by: Christian Zunker <christian@mondoo.com>
Try to detect OOMkilled status for the operator before it gets killed again.
Report the status upstream and add it to the MondooAuditConfig status.

Signed-off-by: Christian Zunker <christian@mondoo.com>
@czunker czunker force-pushed the christian/detect_operator_oom branch from d0f6429 to 49890ca on November 16, 2023 16:49
@@ -70,7 +70,6 @@ func (mr *MetricsReconciler) Start(ctx context.Context) error {
 }

 func (mr *MetricsReconciler) metricsLoop() {
-	mr.log.Info("Updating metrics")
Contributor Author

This one was filling up the logs. We don't have a debug level, so I removed it.

@czunker czunker marked this pull request as ready for review November 16, 2023 17:01
@czunker czunker force-pushed the christian/detect_operator_oom branch from 4c78466 to 07ac777 on November 16, 2023 17:41
@imilchev imilchev left a comment

can you add an integration test for this? should be an easily reproducible scenario.

@czunker czunker force-pushed the christian/detect_operator_oom branch from 07ac777 to c19a205 on November 20, 2023 05:00
czunker commented Nov 20, 2023

> can you add an integration test for this? should be an easily reproducible scenario.

Good call. The tests revealed some cases where the code wouldn't have updated the status correctly.

@czunker czunker force-pushed the christian/detect_operator_oom branch 2 times, most recently from 4a26977 to d5268ec on November 20, 2023 05:15
@czunker czunker force-pushed the christian/detect_operator_oom branch from d5268ec to 7ad6a23 on November 20, 2023 05:20
@chris-rock chris-rock left a comment

Thank you @czunker

@chris-rock chris-rock merged commit 4d7bf13 into main Nov 20, 2023
20 checks passed
@chris-rock chris-rock deleted the christian/detect_operator_oom branch November 20, 2023 12:15
@github-actions github-actions bot locked and limited conversation to collaborators Nov 20, 2023
3 participants