"kops update cluster" panics while creating JWKS for OIDC #14174

Closed
seh opened this issue Aug 24, 2022 · 8 comments · Fixed by #14370
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@seh
Contributor

seh commented Aug 24, 2022

1. What kops version are you running?

Client version: 1.24.1 (git-v1.24.1)

2. What Kubernetes version are you running?

Starting with version 1.19.9, upgrading to version 1.21.14.

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

kops replace --filename=cluster.yaml
kops update cluster --yes

5. What happened after the commands executed?

It appears that kops update cluster panics while preparing to publish OIDC Discovery documents to an S3 bucket:

W0824 08:36:09.215431   12986 external_access.go:39] KubernetesAPIAccess is empty
I0824 08:36:10.848893   12986 executor.go:111] Tasks: 0 done / 393 total; 110 can run
I0824 08:36:11.805410   12986 executor.go:111] Tasks: 110 done / 393 total; 83 can run
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x108 pc=0x3ec3a7c]

goroutine 1531 [running]:
k8s.io/kops/pkg/model.(*OIDCKeys).Open(0xc0017a32a0?)
        k8s.io/kops/pkg/model/issuerdiscovery.go:134 +0x21c
k8s.io/kops/upup/pkg/fi.CopyResource({0x5c5c140, 0xc00104de60}, {0x5c61d40?, 0xc00108c7c8?})
        k8s.io/kops/upup/pkg/fi/resources.go:85 +0x72
k8s.io/kops/upup/pkg/fi.ResourceAsBytes({0x5c61d40, 0xc00108c7c8})
        k8s.io/kops/upup/pkg/fi/resources.go:112 +0x4c
k8s.io/kops/upup/pkg/fi/fitasks.(*ManagedFile).Render(0x5?, 0x0?, 0xc00168b740?, 0xc000ee9240, 0x2?)
        k8s.io/kops/upup/pkg/fi/fitasks/managedfile.go:154 +0x70
reflect.Value.call({0x4f99d60?, 0xc000ee9240?, 0x4?}, {0x537191a, 0x4}, {0xc000c28c60, 0x4, 0x5c91ab0?})
        reflect/value.go:556 +0x845
reflect.Value.Call({0x4f99d60?, 0xc000ee9240?, 0x53a8fcc?}, {0xc000c28c60, 0x4, 0x4})
        reflect/value.go:339 +0xbf
k8s.io/kops/upup/pkg/fi.(*Context).Render(0xc0011ec0a0, {0x5c633a0?, 0x0}, {0x5c633a0?, 0xc000ee9240}, {0x5c633a0?, 0xc00171f440})
        k8s.io/kops/upup/pkg/fi/context.go:225 +0xf2e
k8s.io/kops/upup/pkg/fi.DefaultDeltaRunMethod({0x5c633a0?, 0xc000ee9240}, 0xc0011ec0a0)
        k8s.io/kops/upup/pkg/fi/default_methods.go:82 +0x46c
k8s.io/kops/upup/pkg/fi/fitasks.(*ManagedFile).Run(0xc0008a2c18?, 0x0?)
        k8s.io/kops/upup/pkg/fi/fitasks/managedfile.go:109 +0x26
k8s.io/kops/upup/pkg/fi.(*executor).forkJoin.func1(0xc00146d8f0, 0x4)
        k8s.io/kops/upup/pkg/fi/executor.go:187 +0x1ea
created by k8s.io/kops/upup/pkg/fi.(*executor).forkJoin
        k8s.io/kops/upup/pkg/fi/executor.go:183 +0x86

It appears to be failing on line 134 of pkg/model/issuerdiscovery.go (per the stack trace above), while trying to access a public key in memory.

6. What did you expect to happen?

kops update cluster would publish all the OIDC Discovery documents to S3, and continue on with the rest of its tasks.

7. Please provide your cluster manifest.

cluster.yaml file
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: my-cluster.example.com
spec:
  additionalSans:
  - api.internal.my-cluster.example.com
  api:
    loadBalancer:
      additionalSecurityGroups:
      - sg-005e2b9c6ffed8582
      class: Network
      crossZoneLoadBalancing: true
      type: Public
  authorization:
    rbac: {}
  certManager:
    enabled: true
    managed: false
  cloudConfig:
    disableSecurityGroupIngress: true
  cloudProvider: aws
  clusterAutoscaler:
    balanceSimilarNodeGroups: true
    enabled: true
  configBase: s3://my-kops-state/my-cluster.example.com
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: extensive
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8082
      - name: ETCD_METRICS
        value: basic
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    featureGates:
      EphemeralContainers: "true"
  kubeDNS:
    provider: KubeDNS
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
    featureGates:
      EphemeralContainers: "true"
    kubeReserved:
      cpu: 750m
      memory: .75Gi
  kubernetesVersion: 1.21.14
  metricsServer:
    enabled: true
  networkCIDR: 10.3.0.0/16
  networkID: vpc-087cd3eb3bf613986
  networking:
    calico:
      bpfEnabled: true
      crossSubnet: true
      encapsulationMode: vxlan
      typhaReplicas: 3
  nonMasqueradeCIDR: 100.64.0.0/10
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://my-kops-oidc-discovery/my-cluster
    enableAWSOIDCProvider: true
  sshAccess:
  - 184.74.210.37/32
  - 184.74.210.38/32
  - 207.141.66.101/32
  - 207.141.66.99/32
  - 212.187.232.28/32
  - 212.187.232.29/32
  - 4.53.131.109/32
  - 4.53.131.110/32
  - 4.71.99.125/32
  - 4.71.99.126/32
  subnets:
  - cidr: 10.3.100.0/22
    id: subnet-0cd20dfb64345dede
    name: utility-us-east-2a
    type: Utility
    zone: us-east-2a
  - cidr: 10.3.104.0/22
    id: subnet-0657e2c2163960a79
    name: utility-us-east-2b
    type: Utility
    zone: us-east-2b
  - cidr: 10.3.108.0/22
    id: subnet-013e44ade2633a1b1
    name: utility-us-east-2c
    type: Utility
    zone: us-east-2c
  - cidr: 10.3.0.0/22
    egress: nat-06a85bf97c4a5b65d
    id: subnet-0ca2f5a3ab50e538e
    name: us-east-2a
    type: Private
    zone: us-east-2a
  - cidr: 10.3.4.0/22
    egress: nat-054d637847b63ea36
    id: subnet-047a72902591ebe60
    name: us-east-2b
    type: Private
    zone: us-east-2b
  - cidr: 10.3.8.0/22
    egress: nat-0df765ca07bb44f0f
    id: subnet-051d2325bcab67fa6
    name: us-east-2c
    type: Private
    zone: us-east-2c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private
  updatePolicy: external

8. Please run the commands with most verbose logging by adding the -v 10 flag.
Paste the logs into this report, or in a gist and provide the gist link here.

Here is the kops update cluster output at verbosity level ten, just before the failure:

I0824 08:54:41.457895   13698 executor.go:186] Executing task "MirrorSecrets/mirror-secrets": *fitasks.MirrorSecrets {"Name":"mirror-secrets","Lifecycle":"Sync","MirrorPath":{}}
I0824 08:54:41.461003   13698 request_logger.go:45] AWS request: ec2/DescribeSecurityGroups
I0824 08:54:41.461652   13698 request_logger.go:45] AWS request: iam/GetInstanceProfile
I0824 08:54:41.462143   13698 request_logger.go:45] AWS request: ec2/DescribeSubnets
I0824 08:54:41.462148   13698 request_logger.go:45] AWS request: iam/GetInstanceProfile
I0824 08:54:41.463820   13698 request_logger.go:45] AWS request: iam/ListAttachedRolePolicies
I0824 08:54:41.472058   13698 request_logger.go:45] AWS request: iam/GetRolePolicy
I0824 08:54:41.472136   13698 request_logger.go:45] AWS request: ec2/DescribeSubnets
I0824 08:54:41.472541   13698 s3fs.go:329] Reading file "s3://my-kops-oidc-discovery/my-cluster/openid/v1/jwks"
I0824 08:54:41.472820   13698 request_logger.go:45] AWS request: iam/GetRolePolicy
I0824 08:54:41.473050   13698 request_logger.go:45] AWS request: ec2/DescribeInternetGateways
I0824 08:54:41.473944   13698 request_logger.go:45] AWS request: ec2/DescribeSubnets
I0824 08:54:41.474061   13698 request_logger.go:45] AWS request: ec2/DescribeSecurityGroups
I0824 08:54:41.473806   13698 request_logger.go:45] AWS request: iam/ListAttachedRolePolicies
I0824 08:54:41.473914   13698 request_logger.go:45] AWS request: iam/GetRolePolicy
I0824 08:54:41.510755   13698 request_logger.go:45] AWS request: elasticloadbalancing/DescribeTargetGroups
panic: runtime error: invalid memory address or nil pointer dereference

Earlier, I see this pertinent log message:

I0824 08:54:41.459055   13698 executor.go:186] Executing task "ManagedFile/keys.json": *fitasks.ManagedFile {"Name":"keys.json","Lifecycle":"Sync","Base":"s3://my-kops-oidc-discovery/my-cluster","Location":"openid/v1/jwks","Contents":{"SigningKey":{"Name":"service-account","alternateNames":null,"Lifecycle":"Sync","Signer":null,"subject":"cn=service-account","issuer":"","type":"ca","oldFormat":false}},"Public":true}

Note that at present, the aforementioned S3 bucket exists, but there is no existing object with the path my-cluster/openid/v1/jwks.

9. Anything else we need to know?

I have been able to upgrade clusters and activate the "spec.serviceAccountIssuerDiscovery.enableAWSOIDCProvider" field's behavior successfully with earlier versions of kOps, which wrote the S3 object as necessary. This version of kOps appears to be failing before it can create this S3 object. kOps was able to create the my-cluster/.well-known/openid-configuration object in the same bucket.

See #13353 for what looks to be an earlier report of a similar defect.

See the prior discussion in the "kops-users" channel of the "Kubernetes" Slack workspace.

/kind bug

@k8s-ci-robot added the kind/bug label on Aug 24, 2022
@seh
Contributor Author

seh commented Aug 24, 2022

It turns out that the KeysetItem.Certificate field is nil in all but the last two items in my key set. I added some output to (*OIDCKeys).Open. It reports the following:

Number of keys in key set:  7
Key set item "6702426753028327577194087677": &{6702426753028327577194087677 <nil> <nil> 0xc001200b50}
  (ID: "6702426753028327577194087677", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc000fd92c0})
Key set item "6717351783746805535929340772": &{6717351783746805535929340772 <nil> <nil> 0xc001200b90}
  (ID: "6717351783746805535929340772", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc000fd9440})
Key set item "6724755564554290971271764485": &{6724755564554290971271764485 <nil> <nil> 0xc001200bd0}
  (ID: "6724755564554290971271764485", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc000fd9500})
Key set item "6725145319226802661715703465": &{6725145319226802661715703465 <nil> <nil> 0xc001200c10}
  (ID: "6725145319226802661715703465", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc000fd95c0})
Key set item "6727272810098431180443208693": &{6727272810098431180443208693 <nil> <nil> 0xc001200c50}
  (ID: "6727272810098431180443208693", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc001508180})
Key set item "6727329898571771312485446625": &{6727329898571771312485446625 <nil> 0xc000afc000 0xc001200d00}
  (ID: "6727329898571771312485446625", distrust timestamp <nil>, certificate: &{CN=kubernetes-master false 0xc00206c580 0xc001200cb0}, private key: &{0xc0015084e0})
Key set item "6906097667750333366645304518": &{6906097667750333366645304518 <nil> 0xc000afc120 0xc001200e90}
  (ID: "6906097667750333366645304518", distrust timestamp <nil>, certificate: &{CN=service-account true 0xc00206cb00 0xc001200e20}, private key: &{0xc001508660})

@seh
Contributor Author

seh commented Aug 24, 2022

If I add the following guard condition to (*OIDCKeys).Open, it looks like it will filter the key set items down to just those that contain a certificate for the common name "service-account":

		if item.Certificate == nil || item.Certificate.Subject.CommonName != "service-account" {
			continue
		}

Does that preserve all the items that this method was expecting to consume?

@seh changed the title from '"kops update cluster" panics while creating OIDC Discovery documents in S3' to '"kops update cluster" panics while creating JWKS for OIDC' on Aug 24, 2022
@seh
Contributor Author

seh commented Aug 24, 2022

Note that the kops get keypairs subcommand fails similarly, due to assuming that every key set item contains an X.509 certificate.

% kops get keypairs
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x100 pc=0x42a9788]

goroutine 1 [running]:
main.listKeypairs({0x5c80b38?, 0xc00066d620?}, {0x7dfba58, 0x0, 0x25?}, 0x0)
        k8s.io/kops/cmd/kops/get_keypairs.go:127 +0x2e8
main.RunGetKeypairs({0x5c7d260, 0xc0000520e8}, {0x5c61a80?, 0xc000c09080?}, {0x5c639c0?, 0xc00000e018?}, 0xc0008b6270)
        k8s.io/kops/cmd/kops/get_keypairs.go:174 +0xf8
main.NewCmdGetKeypairs.func3(0xc000e65680?, {0x7dfba58?, 0x0?, 0x0?})
        k8s.io/kops/cmd/kops/get_keypairs.go:78 +0x3e
github.com/spf13/cobra.(*Command).execute(0xc000e65680, {0x7dfba58, 0x0, 0x0})
        github.com/spf13/cobra@v1.5.0/command.go:872 +0x694
github.com/spf13/cobra.(*Command).ExecuteC(0x7da7c00)
        github.com/spf13/cobra@v1.5.0/command.go:990 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/cobra@v1.5.0/command.go:918
main.Execute()
        k8s.io/kops/cmd/kops/root.go:95 +0x5c
main.main()
        k8s.io/kops/cmd/kops/main.go:20 +0x17

@hakman
Member

hakman commented Aug 26, 2022

@seh Would you like to continue to iterate on the fix?

@seh
Contributor Author

seh commented Aug 26, 2022

> Would you like to continue to iterate on the fix?

Yes, though it would help to hear whether or not these entries that lack certificates are valid. Can kOps use them for anything? Should I ignore them as if they were distrusted?

@olemarkus
Member

Ignore them as distrusted, but list them and make them deletable, I would say.
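As a rough illustration of that suggestion, a listing routine could report certificate-less entries with a placeholder instead of dereferencing a nil certificate. This is only a sketch under assumptions: keysetItem, certInfo, and describe are hypothetical names, not the types or functions used by kops get keypairs.

```go
package main

import (
	"fmt"
	"time"
)

// certInfo and keysetItem are hypothetical, simplified types used only for
// this illustration.
type certInfo struct {
	Subject string
}

type keysetItem struct {
	ID                string
	DistrustTimestamp *time.Time
	Certificate       *certInfo // nil for entries that hold only a private key
}

// describe renders one listing row. An entry without a certificate is still
// listed (so that it can be identified and deleted), but it is reported as
// distrusted and its subject is replaced by a placeholder.
func describe(item keysetItem) string {
	subject := "<no certificate>"
	if item.Certificate != nil {
		subject = item.Certificate.Subject
	}
	status := "trusted"
	if item.Certificate == nil || item.DistrustTimestamp != nil {
		status = "distrusted"
	}
	return fmt.Sprintf("%s\t%s\t%s", item.ID, subject, status)
}

func main() {
	items := []keysetItem{
		{ID: "6702426753028327577194087677"},
		{ID: "6906097667750333366645304518", Certificate: &certInfo{Subject: "cn=service-account"}},
	}
	for _, item := range items {
		fmt.Println(describe(item))
	}
}
```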

@johngmyers
Member

I guess I didn't research far enough back in the history of the keystore code.

kOps can't use a private key without a certificate for anything unless/until it generates a corresponding certificate. (Though for service-account keypairs the only part of the certificate it uses is the public key.)

These days all code paths that create a key also create a corresponding certificate. I would agree that keys without certificates should be ignored as if distrusted.

@olemarkus
Member

As this is not a regression or something that breaks things for a lot of users, I removed the blocks-next label.

5 participants