Autofailover Stuck in Unable to Find Pod Status After graphd scale down #529
Labels
affects/v1.8 (this bug affects v1.8.x version)
process/fixed
severity/minor
type/bug (something is unexpected)
Please check the FAQ documentation before raising an issue
Describe the bug (required)
Snap reported that graphd failed to scale up and start new pods after the nebula autoscaler increased the number of graphd replicas from 2 to 4. The desired replica counts shown by kubectl describe for both the autoscaler and the nebula cluster are correct, but no new pods were started. Further investigation revealed the error
E1007 18:17:25.249973 1 nebula_cluster_controller.go:196] NebulaCluster [cb/cb] reconcile failed: rebuilt graphd pod [cb/cb-graphd-2] not found, skip
in the operator log, thrown during auto failover when checking the status of new pods. kubectl get pods also reveals only 2 graphd pods. This happened because graphd had previously been scaled down: the terminated pod cb-graphd-2 was left in the auto failover map, so subsequent reconciles failed while looking for a rebuilt pod that no longer exists.
Solution: remove the pod from the auto failover map when it is terminated.
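A minimal Go sketch of the idea behind the proposed fix, assuming a simple in-memory failover map keyed by pod name; the type and function names here (failoverEntry, pruneTerminated) are illustrative and are not the operator's actual identifiers:

package main

import (
	"fmt"
	"time"
)

// failoverEntry tracks a graphd pod that was marked for auto failover.
// Illustrative only; the real operator stores richer state.
type failoverEntry struct {
	PodName   string
	MarkedAt  time.Time
}

// pruneTerminated drops entries whose pod ordinal is no longer within the
// desired replica range, e.g. after graphd was scaled down from 4 to 2.
// This mirrors the proposed fix: terminated pods must leave the failover map
// so reconcile does not keep waiting for a "rebuilt" pod that will never exist.
func pruneTerminated(failoverMap map[string]failoverEntry, desiredReplicas int, podName func(ordinal int) string) {
	alive := make(map[string]struct{}, desiredReplicas)
	for i := 0; i < desiredReplicas; i++ {
		alive[podName(i)] = struct{}{}
	}
	for name := range failoverMap {
		if _, ok := alive[name]; !ok {
			delete(failoverMap, name)
		}
	}
}

func main() {
	// Simulate the reported scenario: cb-graphd-2 is still in the failover
	// map after the cluster was scaled down to 2 replicas.
	failoverMap := map[string]failoverEntry{
		"cb-graphd-2": {PodName: "cb-graphd-2", MarkedAt: time.Now()},
	}
	name := func(ordinal int) string { return fmt.Sprintf("cb-graphd-%d", ordinal) }

	pruneTerminated(failoverMap, 2, name)
	fmt.Println(failoverMap) // map[] -- the stale entry no longer blocks reconcile
}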
Related logs are attached below.
Snap-na-describe-output.txt
cb_nc.txt
controller-manager-logs.txt
Snap-nc-pods-output.txt
Your Environments (required)
How To Reproduce (required)
Steps to reproduce the behavior:
Expected behavior
Graphd should scale up and start new pods successfully.
Additional context
All related logs and the cluster config are attached.