Ignore discovery errors for metrics resources #2009
The Metrics API tends to be misconfigured, causing discovery errors which ultimately make syncs fail. This change makes Flux ignore those errors.
@@ -206,6 +206,11 @@ func (c *Cluster) getAllowedResourcesBySelector(selector string) (map[string]*ku
return nil, err
}
for gv, e := range discErr.Groups {
if strings.HasSuffix(gv.Group, "metrics.k8s.io") {
// The Metrics API tends to be misconfigured, causing errors.
// We just ignore them, since it doesn't make sense to sync metrics anyways.
@stefanprodan can you confirm whether this is true? I am not that familiar with the metrics API.
GKE comes with the metrics server, and it registers `metrics.k8s.io` at bootstrap. For EKS and AKS, where you would install the metrics server with Helm, I don't think it will be an issue, and for GKE ignoring it will not cause any problems as far as I can tell.
Since this is the third time we have had problems with the discovery API (before #1991 we had #1951 and before that #1855), I think this may happen again with other resources in the future. I am tempted to, instead of the proposed solution, ignore the partial list of resources affected by
@2opremio your suggestion makes more sense to me, as I have the feeling we will otherwise forever be patching specific resources.
After further thinking, all the errors we've had were in one way or another related to metrics. So I think it's worth giving it a try as-is (I am not too happy about ignoring errors unless we are sure about the impact). If it fails again with a different resource, I will be happy to ignore `discovery.ErrGroupDiscoveryFailed` altogether.
Is there a temporary fix for this? I've tried removing the HelmRelease for metrics-server, with no luck.
@sercanacar I don't know what causes the error, sorry. I would assume it's a problem with the metrics server, so I would suggest double-checking that it's uninstalled. @stefanprodan any suggestions?
@sercanacar you can upgrade to 1.12.2, which contains the fix.
Fixes #1991