some k8s components report unhealthy status after cluster bootstrap #64
Looks like it's hard-coded to expect that the scheduler + controller-manager are on the same host as the api-server: https://github.com/kubernetes/kubernetes/blob/04ce042ff9cfb32b2c776f755cc7abc886b8a441/pkg/master/master.go#L620-L623

We do not adhere to this assumption because the scheduler + controller-manager are deployments, which could land on different hosts (and do not use host networking). @sym3tri would you be able to inspect this information from another API endpoint? Maybe by inspecting pods in kube-system, or a specific set of pods via a label query? It seems like the componentstatuses endpoint is somewhat contentious as it stands.
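As a sketch of that pod-query alternative, assuming the control-plane pods carry identifying labels (the k8s-app values below are illustrative, not necessarily what bootkube's manifests actually use):

```sh
# List scheduler and controller-manager pods by label instead of relying on
# /componentstatuses (label values are illustrative):
kubectl --namespace=kube-system get pods -l k8s-app=kube-scheduler
kubectl --namespace=kube-system get pods -l k8s-app=kube-controller-manager

# The same query against the local apiserver directly:
curl '127.0.0.1:8080/api/v1/namespaces/kube-system/pods?labelSelector=k8s-app%3Dkube-scheduler'
```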
I have no love for the current componentstatuses endpoint. I don't remember whether it was all captured in the proposal, but I think we iterated towards a consensus on Karl's component registration proposal, which you cited. Someone would need to work on it.
I skimmed the proposal and I more or less agree that it's not exactly a pressing issue to have a single /componentstatuses API endpoint. I like the idea of fronting the health-check endpoints with a service (e.g. "scheduler-health.kube-system.cluster.local"). Then, if we wanted to drill down into how many of those pods are healthy, it's just a matter of querying the service itself. @sym3tri is this still blocking you for any reason? Would the health-check service endpoint be a reasonable end-goal? Or is directly querying the pods sufficient?
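A minimal sketch of that service-fronted idea, assuming the scheduler pods are labeled k8s-app=kube-scheduler and serve healthz on the default 10251 port (both are assumptions, not actual bootkube manifest values):

```sh
# Hypothetical Service fronting the scheduler pods; the name, selector,
# and port are illustrative.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: scheduler-health
  namespace: kube-system
spec:
  selector:
    k8s-app: kube-scheduler
  ports:
  - name: healthz
    port: 10251
    targetPort: 10251
EOF

# In-cluster callers could then check scheduler health through a stable DNS
# name (exact form depends on the cluster's DNS schema):
curl http://scheduler-health.kube-system.svc.cluster.local:10251/healthz

# Per-pod drill-down: the Service's endpoints list the pods backing it.
kubectl --namespace=kube-system get endpoints scheduler-health
```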
@aaronlevy Directly querying the pods puts a lot of burden on the caller. If we can have a fronting service, that would be ideal. Directly querying the pods is an OK workaround for the time being, but not a good long-term solution: we'd just be shifting the hardcoded services into our code, and there is no other way to query etcd health via the API.
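Until such a feature exists, etcd health can only be checked out-of-band against etcd itself rather than through the Kubernetes API. A minimal sketch, assuming a default local client endpoint:

```sh
# etcd serves its own health endpoint on the client port; the address and
# port here are illustrative defaults, not bootkube-specific values.
curl http://127.0.0.1:2379/health
# => {"health": "true"}
```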
Opened #85 to track that feature specifically.
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
@fejta-bot: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
The check from the original report:

```sh
curl 127.0.0.1:8080/api/v1/componentstatuses
```
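In the state this issue describes, the response reports the scheduler and controller-manager as unhealthy because the apiserver probes 127.0.0.1 for their healthz ports. The excerpt below is illustrative, reconstructed from the shape of the v1 ComponentStatus API rather than captured from the original report; exact messages vary by version:

```json
{
  "kind": "ComponentStatusList",
  "apiVersion": "v1",
  "items": [
    {
      "metadata": {"name": "scheduler"},
      "conditions": [{
        "type": "Healthy",
        "status": "False",
        "message": "Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connection refused"
      }]
    },
    {
      "metadata": {"name": "controller-manager"},
      "conditions": [{
        "type": "Healthy",
        "status": "False",
        "message": "Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connection refused"
      }]
    }
  ]
}
```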