CASMTRIAGE-7316 add additional check to the bgp test to check for the state of metallb-speaker #599

Merged (1 commit) on Sep 26, 2024

jacobsalmela

state of metallb-speaker

After checking with the SME, @spillerc-hpe, I learned that BGP may not be up when the neighbour test runs under any of these conditions:

- Bad or no credentials from vault
- The node is not yet up:
    - metallb-speaker isn't running on the node
    - BGP session has gone idle: [https://www.arubanetworks.com/techdocs/AOS-CX/10.07/HTML/5200-7858/Content/Chp_BGP/bgp-nei-sta27.htm](https://www.arubanetworks.com/techdocs/AOS-CX/10.07/HTML/5200-7858/Content/Chp_BGP/bgp-nei-sta27.htm)
- A misconfigured switch
    - The switch configuration can be wrong, but customers can also use BGP to connect to their own network, which can lead to false positives.

This PR aims to reduce the need to re-run this test. As observed in the
JIRA, a re-run often passes once the pods are running after the initial
failure.

The new output from this change is shown below:

```
ncn-w004:~ # /opt/cray/tests/install/ncn/scripts/check_bgp_neighbors_established.sh
Running in interactive mode
NAME      DATA   AGE
metallb   1      30d
Able to get metallb configmap from Kubernetes
metallb-speaker pods: false <------------------------
ERROR: metallb-speaker pods are not ready, which can cause this test to fail
FAIL
ncn-w004:~ # echo $?
16
```

It also reports when the pods are OK:

```
ncn-w004:~ # /opt/cray/tests/install/ncn/scripts/check_bgp_neighbors_established.sh
Running in interactive mode
NAME      DATA   AGE
metallb   1      30d
Able to get metallb configmap from Kubernetes
metallb-speaker pods: true <----------------------
```
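
For readers who want a feel for how a check like this might work, here is a minimal sketch. It is not the code added by this PR, which is not shown in this description. The function names `all_ready` and `report_speaker_state` are hypothetical, and the return value of 16 only mirrors the exit status in the sample output above; whether the real script ties that code to this specific condition is an assumption.

```shell
#!/bin/sh
# Sketch only: NOT the implementation in check_bgp_neighbors_established.sh.
# Function names are hypothetical.

# all_ready: reads whitespace-separated readiness flags ("true"/"false") on stdin.
# Prints "true" and returns 0 only if at least one flag was read and all are "true".
# Otherwise prints "false" and returns 1.
all_ready() {
  local flag seen
  seen=0
  for flag in $(cat); do
    seen=1
    if [ "$flag" != "true" ]; then
      echo "false"
      return 1
    fi
  done
  if [ "$seen" -eq 0 ]; then
    # No pods reported at all is treated as "not ready".
    echo "false"
    return 1
  fi
  echo "true"
}

# report_speaker_state: prints lines in the same shape as the sample output.
# The non-zero status (16) is an assumption copied from the sample output.
report_speaker_state() {
  local ready
  if ready="$(all_ready)"; then
    echo "metallb-speaker pods: ${ready}"
    return 0
  fi
  echo "metallb-speaker pods: ${ready}"
  echo "ERROR: metallb-speaker pods are not ready, which can cause this test to fail"
  return 16
}

# Possible usage (NAMESPACE and SELECTOR are placeholders, not values from this PR):
#   kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" \
#     -o jsonpath='{range .items[*]}{.status.containerStatuses[*].ready}{"\n"}{end}' \
#     | report_speaker_state
```

Keeping the decision logic separate from the `kubectl` call means it can be exercised with canned input, for example `printf 'true\nfalse\n' | report_speaker_state`, without needing a cluster.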

Signed-off-by: Jacob Salmela <jacob.salmela@hpe.com>
jacobsalmela self-assigned this Sep 25, 2024
jacobsalmela marked this pull request as ready for review September 25, 2024 17:57
jacobsalmela requested a review from a team as a code owner September 25, 2024 17:57
jacobsalmela merged commit 7337c2f into release/1.6 Sep 26, 2024 (3 checks passed)
jacobsalmela deleted the CASMTRIAGE-7316-1.6 branch September 26, 2024 14:09