OCPBUGS-14057: Removes HAProxyDown critical alert exception. #28575

Open
wants to merge 1 commit into base: master

Conversation

miheer

@miheer miheer commented Feb 6, 2024

Removes HAProxyDown critical alert exception.
Ticket: https://issues.redhat.com/browse/OCPBUGS-14057

miheer changed the title from "Removes HAProxyDown critical alert exception." to "WIP: Removes HAProxyDown critical alert exception." on Feb 6, 2024
openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Feb 6, 2024
@miheer
Author

miheer commented Feb 6, 2024

/jira-refresh

Contributor

openshift-ci bot commented Feb 6, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: miheer
Once this PR has been reviewed and has the lgtm label, please assign slashpai for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

miheer changed the title from "WIP: Removes HAProxyDown critical alert exception." to "OCPBUGS-14057: WIP: Removes HAProxyDown critical alert exception." on Feb 6, 2024
openshift-ci-robot added the jira/severity-low label (referenced Jira bug's severity is low for the branch this PR is targeting) on Feb 6, 2024
openshift-ci bot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Feb 6, 2024
openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) and the jira/invalid-bug label (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) on Feb 6, 2024
@openshift-ci-robot

@miheer: This pull request references Jira Issue OCPBUGS-14057, which is invalid:

  • expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Removes HAProxyDown critical alert exception.
Ticket: https://issues.redhat.com/browse/OCPBUGS-14057

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@miheer
Author

miheer commented Feb 6, 2024

/jira refresh

@openshift-ci-robot

@miheer: This pull request references Jira Issue OCPBUGS-14057, which is invalid:

  • expected the bug to target the "4.16.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh


@jan--f
Contributor

jan--f commented Feb 6, 2024

/hold
until openshift/runbooks#166 is merged.

openshift-ci bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) on Feb 6, 2024
@Miciah
Contributor

Miciah commented Mar 18, 2024

/jira refresh

openshift-ci-robot added the jira/valid-bug label (indicates that a referenced Jira bug is valid for the branch this PR is targeting) and removed the jira/invalid-bug label (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) on Mar 18, 2024
@openshift-ci-robot

@Miciah: This pull request references Jira Issue OCPBUGS-14057, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi

In response to this:

/jira refresh


openshift-ci bot requested a review from ShudiLi on March 18, 2024 15:21
miheer changed the title from "OCPBUGS-14057: WIP: Removes HAProxyDown critical alert exception." to "OCPBUGS-14057: Removes HAProxyDown critical alert exception." on Mar 19, 2024
@candita
Contributor

candita commented Mar 27, 2024

/assign @Miciah
/assign

@ShudiLi
Member

ShudiLi commented May 15, 2024

Tested with 4.16.0-0.ci.test-2024-05-15-084545-ci-ln-x0xpjtt-latest. When HAProxy was down, the HAProxyDown alert was shown in the web console.

  1. Check the cluster version, then scale down the cluster-version-operator and the ingress-operator so that they do not revert the changes made to the router deployment in the next step:
    % oc get clusterversion
    NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.16.0-0.ci.test-2024-05-15-084545-ci-ln-x0xpjtt-latest   True        False         111m    Cluster version is 4.16.0-0.ci.test-2024-05-15-084545-ci-ln-x0xpjtt-latest

    % oc scale --replicas 0 -n openshift-cluster-version deployments/cluster-version-operator
    % oc scale --replicas 0 -n openshift-ingress-operator deployments ingress-operator

  2. Edit the router-default deployment to remove the livenessProbe and startupProbe checks.

  3. rsh into a router pod and kill the haproxy process:
    % oc -n openshift-ingress get pods
    NAME                              READY   STATUS    RESTARTS   AGE
    router-default-595f85875f-2j5jr   1/1     Running   0          73m
    router-default-595f85875f-8glrr   1/1     Running   0          73m

  4. Log in to the web console and go to Observe >> Alerting. The HAProxyDown alert is shown:

    Name
    HAProxyDown

    Description
    This alert fires when metrics report that HAProxy is down.

    Summary
    HAProxy is down

    Runbook
    https://github.com/openshift/runbooks/blob/master/alerts/HAProxyDown.md

    Labels
    prometheus=openshift-monitoring/k8s severity=critical alertname=HAProxyDown
    pod=router-default-595f85875f-8glrr
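
For anyone repeating this check, the manual steps in this comment could be scripted roughly as follows. This is only a sketch, not the exact commands used in the test. The `run` helper, the `DRY_RUN` variable, and the function names are hypothetical. It also assumes that the router is the first container in the pod template and that `pidof` exists in the router image. With `DRY_RUN=1` (the default) the commands are only printed, with shell quoting flattened, and nothing is changed on the cluster.

```shell
# Sketch of the reproduction steps. DRY_RUN=1 (default) prints the commands;
# set DRY_RUN=0 to execute them against a disposable test cluster.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Stop the operators so they do not reconcile the router deployment back.
scale_down_operators() {
  run oc scale --replicas 0 -n openshift-cluster-version deployments/cluster-version-operator
  run oc scale --replicas 0 -n openshift-ingress-operator deployments/ingress-operator
}

# Remove the probes so a dead haproxy does not trigger a container restart.
# Assumption: the router is container index 0 in the pod template.
remove_probes() {
  run oc -n openshift-ingress patch deployment/router-default --type=json -p \
    '[{"op":"remove","path":"/spec/template/spec/containers/0/livenessProbe"},{"op":"remove","path":"/spec/template/spec/containers/0/startupProbe"}]'
}

# Kill haproxy inside the given router pod.
# Assumption: pidof is available in the router image.
kill_haproxy() {
  run oc -n openshift-ingress exec "$1" -- sh -c 'kill $(pidof haproxy)'
}
```

A typical order would be `scale_down_operators`, then `remove_probes`, then `oc -n openshift-ingress get pods` to find the new pod names once the rollout finishes, then `kill_haproxy <pod>`, and finally a look at Observe >> Alerting for HAProxyDown.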

@ShudiLi
Member

ShudiLi commented May 15, 2024

/label qe-approved
thanks

openshift-ci bot added the qe-approved label (signifies that QE has signed off on this PR) on May 15, 2024
@openshift-ci-robot

@miheer: This pull request references Jira Issue OCPBUGS-14057, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.16.0) matches configured target version for branch (4.16.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @ShudiLi

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Removes HAProxyDown critical alert exception.
Ticket: https://issues.redhat.com/browse/OCPBUGS-14057


@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci bot added the lifecycle/stale label (denotes an issue or PR has remained open with no activity and has become stale) on Aug 14, 2024
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

openshift-ci bot added the lifecycle/rotten label (denotes an issue or PR that has aged beyond stale and will be auto-closed) and removed the lifecycle/stale label (denotes an issue or PR has remained open with no activity and has become stale) on Sep 13, 2024
Contributor

openshift-ci bot commented Oct 2, 2024

@miheer: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                                 Commit    Details   Required   Rerun command
ci/prow/e2e-gcp-ovn                       67f5bd8   link      true       /test e2e-gcp-ovn
ci/prow/e2e-aws-ovn-fips                  67f5bd8   link      true       /test e2e-aws-ovn-fips
ci/prow/e2e-openstack-ovn                 67f5bd8   link      false      /test e2e-openstack-ovn
ci/prow/e2e-aws-ovn-single-node           67f5bd8   link      false      /test e2e-aws-ovn-single-node
ci/prow/e2e-aws-ovn-serial                67f5bd8   link      true       /test e2e-aws-ovn-serial
ci/prow/e2e-aws-ovn-single-node-serial    67f5bd8   link      false      /test e2e-aws-ovn-single-node-serial
ci/prow/e2e-metal-ipi-sdn                 67f5bd8   link      false      /test e2e-metal-ipi-sdn
ci/prow/e2e-aws-ovn-single-node-upgrade   67f5bd8   link      false      /test e2e-aws-ovn-single-node-upgrade
ci/prow/e2e-aws-ovn-cgroupsv2             67f5bd8   link      false      /test e2e-aws-ovn-cgroupsv2
ci/prow/e2e-agnostic-ovn-cmd              67f5bd8   link      false      /test e2e-agnostic-ovn-cmd
ci/prow/e2e-gcp-ovn-builds                67f5bd8   link      true       /test e2e-gcp-ovn-builds

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
jira/severity-low: Referenced Jira bug's severity is low for the branch this PR is targeting.
jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
qe-approved: Signifies that QE has signed off on this PR
7 participants