Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Secrets Reencrypt command throws fatal error client timing out when there are 1000 basic secrets to reencrypt #11384

Closed
1 of 2 tasks
aganesh-suse opened this issue Nov 27, 2024 · 5 comments
Assignees

Comments

@aganesh-suse
Copy link

Build details:

Branch: release-1.31
Commit ID: e99a668

k3s -v
k3s version v1.31.3-rc1+k3s1 (e99a6684)
go version go1.22.8

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 24.04 LTS"

$ uname -m
x86_64

Cluster Configuration:

HA: 3 etcd, 2 cp and 1 agent node.

Describe the bug:

Etcd Config.yaml:

token: xxxx
disable-apiserver: true
disable-controller-manager: true
disable-scheduler: true
node-taint:
- node-role.kubernetes.io/etcd:NoExecute
cluster-init: true
write-kubeconfig-mode: "0644"
secrets-encryption: true
node-external-ip: 1.1.1.1
node-label:
- k3s-upgrade=server
debug: true

Control Plane Config.yaml:

token: secret
server: https://1.1.1.1:6443
disable-etcd: true
node-taint:
- node-role.kubernetes.io/control-plane:NoSchedule
write-kubeconfig-mode: "0644"
secrets-encryption: true
node-external-ip: 4.4.4.4
node-label:
- k3s-upgrade=server
debug: true

Testing Steps to Reproduce:

  1. Copy config.yaml
$ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  1. Install k3s
curl -sfL https://get.k3s.io | sudo INSTALL_K3S_COMMIT='01e6b5b8d3274b11e982345f5fc1630dcc6e3add' sh -s - server
  1. Verify Cluster Status:
kubectl get nodes -o wide
kubectl get pods -A
  1. Create 1000 basic secrets:
echo 'this is a file' > file.txt && for i in {1..1000}; do echo test$i >> file.txt; kubectl create secret generic test$i --from-file=file.txt; done
  1. Perform the secrets encryption operations: prepare/rotate/reencrypt (Reboot ALL nodes after every command - all etcd first then all cp then agent nodes)
$ sudo k3s secrets-encrypt prepare
$ sudo k3s secrets-encrypt rotate
$ sudo k3s secrets-encrypt reencrypt

Expected behavior:

Secrets encryption operations should be successful

Actual behavior:

Reencrypt operation times out with fatal error.

$ sudo /usr/local/bin/k3s secrets-encrypt reencrypt
time="2024-11-27T18:53:33Z" level=debug msg="Asset dir /var/lib/rancher/k3s/data/3345fdb78d4ac6f55d7d70b8ec401ed32d58d5af6b2e11412cd5a2d3c50ff3d1"
time="2024-11-27T18:53:33Z" level=debug msg="Running /var/lib/rancher/k3s/data/3345fdb78d4ac6f55d7d70b8ec401ed32d58d5af6b2e11412cd5a2d3c50ff3d1/bin/k3s-secrets-encrypt [/usr/local/bin/k3s secrets-encrypt reencrypt]"
time="2024-11-27T18:53:43Z" level=fatal msg="see server log for details: Put \"https://127.0.0.1:6443/v1-k3s/encrypt/config\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

The status shows reencrypt finished though. (Output shown - Before reboot of all nodes. So hashes dont match yet)

sudo /usr/local/bin/k3s secrets-encrypt status
DEBU[0000] Asset dir /var/lib/rancher/k3s/data/3345fdb78d4ac6f55d7d70b8ec401ed32d58d5af6b2e11412cd5a2d3c50ff3d1
DEBU[0000] Running /var/lib/rancher/k3s/data/3345fdb78d4ac6f55d7d70b8ec401ed32d58d5af6b2e11412cd5a2d3c50ff3d1/bin/k3s-secrets-encrypt [/usr/local/bin/k3s secrets-encrypt status]
Encryption Status: Enabled
Current Rotation Stage: reencrypt_finished
Server Encryption Hashes: hash does not match between ip-172-31-19-50 and ip-172-31-28-125

Active  Key Type  Name
------  --------  ----
 *      AES-CBC   aescbckey-2024-11-27T18:38:34Z

From the journal logs - it looks like the reencrypt action is actually successful - and this could be a client monitor that times out.

Nov 27 18:53:34 ip-172-31-28-125 k3s[8964]: time="2024-11-27T18:53:34Z" level=info msg="Creating secrets-reencrypt event broadcaster"
Nov 27 18:53:34 ip-172-31-28-125 k3s[8964]: I1127 18:53:34.667894    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 50 secrets"
Nov 27 18:53:35 ip-172-31-28-125 k3s[8964]: I1127 18:53:35.367470    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 100 secrets"
Nov 27 18:53:36 ip-172-31-28-125 k3s[8964]: I1127 18:53:36.146289    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 150 secrets"
Nov 27 18:53:36 ip-172-31-28-125 k3s[8964]: I1127 18:53:36.859301    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 200 secrets"
Nov 27 18:53:37 ip-172-31-28-125 k3s[8964]: I1127 18:53:37.525374    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 250 secrets"
Nov 27 18:53:38 ip-172-31-28-125 k3s[8964]: I1127 18:53:38.237599    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 300 secrets"
Nov 27 18:53:38 ip-172-31-28-125 k3s[8964]: I1127 18:53:38.925433    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 350 secrets"
Nov 27 18:53:39 ip-172-31-28-125 k3s[8964]: I1127 18:53:39.702567    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 400 secrets"
Nov 27 18:53:40 ip-172-31-28-125 k3s[8964]: I1127 18:53:40.535542    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 450 secrets"
Nov 27 18:53:41 ip-172-31-28-125 k3s[8964]: I1127 18:53:41.304616    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 500 secrets"
Nov 27 18:53:42 ip-172-31-28-125 k3s[8964]: I1127 18:53:42.022678    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 550 secrets"
Nov 27 18:53:42 ip-172-31-28-125 k3s[8964]: I1127 18:53:42.711861    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 600 secrets"
Nov 27 18:53:43 ip-172-31-28-125 k3s[8964]: I1127 18:53:43.430343    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 650 secrets"
Nov 27 18:53:44 ip-172-31-28-125 k3s[8964]: I1127 18:53:44.136434    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 700 secrets"
Nov 27 18:53:45 ip-172-31-28-125 k3s[8964]: I1127 18:53:45.038735    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 750 secrets"
Nov 27 18:53:46 ip-172-31-28-125 k3s[8964]: I1127 18:53:46.000470    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 800 secrets"
Nov 27 18:53:46 ip-172-31-28-125 k3s[8964]: I1127 18:53:46.659005    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 850 secrets"
Nov 27 18:53:47 ip-172-31-28-125 k3s[8964]: I1127 18:53:47.361779    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 900 secrets"
Nov 27 18:53:48 ip-172-31-28-125 k3s[8964]: I1127 18:53:48.228166    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 950 secrets"
Nov 27 18:53:48 ip-172-31-28-125 k3s[8964]: I1127 18:53:48.982351    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsProgress" message="reencrypted 1000 secrets"
Nov 27 18:53:49 ip-172-31-28-125 k3s[8964]: I1127 18:53:49.207722    8964 event.go:389] "Event occurred" object="ip-172-31-28-125" fieldPath="" kind="Node" apiVersion="" type="Normal" reason="SecretsUpdateComplete" message="reencrypted 1012 secrets"
@brandond
Copy link
Member

Yeah this is just a cosmetic thing - as you noted the re-encrypt does in fact finish.

Rather than having the client make a single request that doesn't return until the re-encrypt is done, it should probably make 1 request to initiate the re-encrypt operation. That request should return immediately, and then the client can poll with additional status requests until the operation is complete.

cc @dereknola

@dereknola
Copy link
Member

dereknola commented Nov 27, 2024

That's not how reencrypt works right now. We don't wait for it to finish. That's why we print reencryption started. It just makes a PUT request to start the process.

The request trigger a annotation to be written on the node, which the k3s secrets-encryption controller watches. The response header is sent immediately after the annotation is written on the server node.

@dereknola
Copy link
Member

Okay, when #10612 got merged, the logic around http replies got muddled. We should not be waiting for that function to finish, we should just move on now.

@dereknola
Copy link
Member

Not sure how we didn't hit this till now, but should be a simple fix. Since its a cosmetic, lets just target December.

@aganesh-suse
Copy link
Author

Closing based on release-1.31 results: #11437

@github-project-automation github-project-automation bot moved this from To Test to Done Issue in K3s Development Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done Issue
Development

No branches or pull requests

3 participants