NodeCSRApprover doesn't approve requests on GCP #781

Closed
xmudrii opened this issue Jun 23, 2020 · 3 comments · Fixed by #782 or #799
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/high sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management.

Comments

@xmudrii
Member

xmudrii commented Jun 23, 2020

The NodeCSRApprover controller is not approving CSRs for GCP worker nodes because the .status.Addresses field of the Machine object lacks the node's private IP address and DNS name.

The CSR includes both the private IP address and the DNS name. Since neither is present in the Machine object, validation fails and the CSR is not signed.
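
The validation described above can be sketched roughly as follows. This is a minimal illustration, not the machine-controller's actual code: `sansCovered` and the `machineAddresses` set are hypothetical names, the sample addresses are made up, and the sketch assumes the approver requires every DNS name and IP address in the CSR to appear among the Machine's addresses.

```go
package main

import "fmt"

// sansCovered reports whether every DNS name and IP address requested in a
// CSR is present in the set of addresses recorded on the Machine object.
// When validation fails, it also returns the first entry that is missing.
// Hypothetical helper, for illustration only.
func sansCovered(machineAddresses map[string]struct{}, csrDNSNames, csrIPs []string) (bool, string) {
	for _, name := range csrDNSNames {
		if _, ok := machineAddresses[name]; !ok {
			return false, name
		}
	}
	for _, ip := range csrIPs {
		if _, ok := machineAddresses[ip]; !ok {
			return false, ip
		}
	}
	return true, ""
}

func main() {
	// Assumed example: the Machine only lists an external IP, while the CSR
	// asks for an internal DNS name and a private IP.
	machineAddresses := map[string]struct{}{"203.0.113.10": {}}

	ok, missing := sansCovered(machineAddresses,
		[]string{"worker-1.internal.example"},
		[]string{"10.0.0.5"})
	fmt.Println(ok, missing)
}
```

With addresses missing from the Machine object, a check of this shape rejects the CSR even though the request itself is legitimate, which matches the behavior reported here.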

As a workaround until the issue is fixed, the operator can manually approve the CSR by following the instructions in the CSR docs.

As a fix, the following function should be extended to include the other address types.
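
One way to picture the proposed extension is a helper that gathers names and IPs from every address type instead of a subset. The sketch below is an assumption about the general shape of such a fix, not the function referenced in this issue: `collectAddresses`, `Address`, and `AddressType` are hypothetical local definitions that mirror the type/value layout of Kubernetes node addresses so the example stays self-contained.

```go
package main

import (
	"fmt"
	"net"
)

// AddressType and Address mirror the type/value shape of Kubernetes node
// addresses. They are redefined locally so the sketch has no dependencies.
type AddressType string

const (
	ExternalIP  AddressType = "ExternalIP"
	InternalIP  AddressType = "InternalIP"
	InternalDNS AddressType = "InternalDNS"
	ExternalDNS AddressType = "ExternalDNS"
	Hostname    AddressType = "Hostname"
)

type Address struct {
	Type    AddressType
	Address string
}

// collectAddresses splits a Machine's addresses into DNS names and IPs,
// accepting every address type rather than only some of them.
// Hypothetical helper, for illustration only.
func collectAddresses(addrs []Address) (dnsNames, ips []string) {
	for _, a := range addrs {
		switch a.Type {
		case InternalDNS, ExternalDNS, Hostname:
			dnsNames = append(dnsNames, a.Address)
		case InternalIP, ExternalIP:
			// Ignore values that do not parse as an IP address.
			if net.ParseIP(a.Address) != nil {
				ips = append(ips, a.Address)
			}
		}
	}
	return dnsNames, ips
}

func main() {
	// Made-up sample addresses.
	dnsNames, ips := collectAddresses([]Address{
		{Type: ExternalIP, Address: "203.0.113.10"},
		{Type: InternalIP, Address: "10.0.0.5"},
		{Type: InternalDNS, Address: "worker-1.internal.example"},
	})
	fmt.Println(dnsNames, ips)
}
```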

@xmudrii xmudrii added the sig/cluster-management Denotes a PR or issue as being assigned to SIG Cluster Management. label Jun 23, 2020
@kron4eg kron4eg added the kind/bug Categorizes issue or PR as related to a bug. label Jun 23, 2020
@xmudrii xmudrii self-assigned this Jun 23, 2020
@toschneck toschneck reopened this Jun 28, 2020
@toschneck
Member

Unfortunately, the bug is still present. I used the latest machine controller version, v1.14.3, and it seems the machine controller is still not approving the CSRs:

k get csr
NAME        AGE     REQUESTOR                                   CONDITION
csr-2grjt   3m20s   system:node:k1-pool-az-a-75cf6c85cf-h6bpn   Pending
csr-flqd4   6m32s   system:bootstrap:oszbfi                     Approved,Issued
csr-tz7tx   7m7s    system:bootstrap:oszbfi                     Approved,Issued
csr-wbq8n   9m45s   system:node:k1-control-plane-1              Approved,Issued
csr-z5sqm   3m35s   system:bootstrap:fqdtlp                     Approved,Issued

Attached you will find logs and current object states to determine the error. My guess is the following error:

I0628 17:31:35.991471       1 node_csr_approver.go:109] Skipping reconciling CSR 'csr-z5sqm' because CSR object is not valid: username must have the 'system:node:' prefix
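
The log message points at a check on the CSR's requestor username. A minimal sketch of such a check is shown below; `nodeNameFromUsername` is a hypothetical name and the logic is an assumption based only on the message text. Note that in the listing above, csr-z5sqm was requested by system:bootstrap:fqdtlp, so a check like this would skip it, while the Pending csr-2grjt does carry the system:node: prefix.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const nodeUserPrefix = "system:node:"

// nodeNameFromUsername extracts the node name from a CSR requestor username.
// It returns an error when the username lacks the 'system:node:' prefix.
// Hypothetical helper, for illustration only.
func nodeNameFromUsername(username string) (string, error) {
	if !strings.HasPrefix(username, nodeUserPrefix) {
		return "", errors.New("username must have the 'system:node:' prefix")
	}
	return strings.TrimPrefix(username, nodeUserPrefix), nil
}

func main() {
	// Requestors taken from the CSR listing in this comment.
	for _, u := range []string{
		"system:bootstrap:fqdtlp",
		"system:node:k1-pool-az-a-75cf6c85cf-h6bpn",
	} {
		name, err := nodeNameFromUsername(u)
		fmt.Printf("%s -> %q, err=%v\n", u, name, err)
	}
}
```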

machinecontroller.pod.log
machinecontroller.describe.log

csr.output.yaml.log

@xmudrii
Member Author

xmudrii commented Jun 30, 2020

The root of the issue is that the hostname (InternalDNSName) on the Machine object is empty for some reason. I haven't had time to investigate why this happens, but I'll look into it once I have more time.

@toschneck
Member

Can confirm, it seems to work now.
