kubelet does not reload recreated immutable configmap, unclear documentation #42359
/sig node
/kind documentation
/uncc /sig docs
/transfer website
I believe from the evidence here that we may be able to build a test case where, starting from a namespace that contains:
we can create a ConfigMap and a Pod that references the ConfigMap, and (because of what was previously in that namespace) see the old ConfigMap loaded into that Pod, even if the new ConfigMap is not marked as immutable. The period where that's possible might be quite short, though. This sounds more plausible; starting from a namespace that contains:
create a new ConfigMap (same name) and a new Pod. You may observe that the new Pod appears to have loaded the wrong data, from the old ConfigMap. We might need to specify that you should delete every object that could have had a reference to the old ConfigMap - potentially including cluster-scoped objects such as IngressClasses and ValidatingAdmissionPolicies. Advising people to create the replacement ConfigMap with a different name is not enough; we cannot assume that a different user is aware of the potential problems around reuse of the old name.
We might be able to detect in-use immutable ConfigMaps (within the control plane), and do something around finalizing them. That mechanism could eventually enable sending a warning when an immutable ConfigMap is deleted whilst still in use.
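No such mechanism exists today; a rough sketch of the shape it could take, with an invented finalizer name:

```shell
# Hypothetical: a control-plane controller could add a finalizer to an
# immutable ConfigMap while anything still references it, so a delete is
# held (and a warning could be surfaced) until the references go away.
# The finalizer name below is invented for illustration only.
kubectl patch configmap app-config --type=merge \
  -p '{"metadata":{"finalizers":["example.k8s.io/immutable-in-use"]}}'
```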
I recommend that SIG Architecture chip in here. /sig architecture |
/assign @pegasas |
Hi @sftim,
and
I create these two in the k8s cluster and echo the key:
After I delete the cm,
it keeps the same value:
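A minimal sketch of that kind of test, with invented names (not the original manifests):

```shell
# An immutable ConfigMap, plus a Pod that repeatedly echoes one of its keys.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config
immutable: true
data:
  key: "old-value"
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: echo
      image: busybox
      command: ["sh", "-c", "while true; do cat /config/key; sleep 10; done"]
      volumeMounts:
        - name: config
          mountPath: /config
  volumes:
    - name: config
      configMap:
        name: demo-config
EOF

# Delete and recreate the ConfigMap under the same name with new data:
kubectl delete configmap demo-config
kubectl create configmap demo-config --from-literal=key=new-value

# Per this thread, the Pod keeps logging "old-value": the kubelet serves
# the cached immutable object and never refetches it.
kubectl logs demo-pod --tail=1
```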
The motivation for immutable ConfigMaps/Secrets is to avoid watching them. Since they are immutable, you get what is in them and treat it as constant. You don't care about future changes to them; they are immutable, after all. If you want to watch them, especially for updates, you won't use an immutable ConfigMap/Secret. Protecting ConfigMaps/Secrets in use is another topic; see KEP 2840.
The test case explained in #42359 (comment) covers already-documented behavior; if that's the test case, I don't think we need to make a docs change. I thought you were concerned about ConfigMap removal and recreation with a new
Protecting a configmap from being deleted while it is in use is not sufficient: a configmap or secret name cannot be reused after it has been deleted until all kubelets have been restarted, or you risk a pod seeing old content. This likely also happens long after all pods using the old version are gone (create a deployment and an immutable cm, delete it, wait some time, redeploy with different configmap content), though I have not explicitly verified a full delete and longer timescales, only rolling updates with the same cm name.

The practical solution to this is to have immutable configmap and secret names contain a checksum of their content. That way it is almost impossible for wrong data to end up in a pod, even when the kubelet runs for a long time.

If you can fix the underlying issue, all the better, but as stated, the point of immutable is to reduce API requests and watches, so watching for deletes is not a sensible option. Possibly the kubelet could refetch the object when it is required, but only if it already exists as immutable in its caches.
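A minimal sketch of the checksum-in-the-name approach, with an invented file name and naming scheme:

```shell
# Derive the ConfigMap name from a short checksum of its content, so any
# content change produces a fresh name and stale kubelet caches are harmless.
CONFIG_FILE=app.properties                      # hypothetical config file
SUM=$(sha256sum "$CONFIG_FILE" | cut -c1-8)     # short content hash

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-${SUM}
immutable: true
data:
  app.properties: |
$(sed 's/^/    /' "$CONFIG_FILE")
EOF

# Point the workload at app-config-${SUM}; a content change now yields a
# new name, rolling the Deployment instead of mutating a cached object.
```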
Let's make sure we know what (replicable) problem we're helping people to avoid, before we triage this as accepted. |
OK, here is the reproducer I stated in the beginning, written out:
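A sketch consistent with that reproducer, using invented names and a busybox image (not the original manifests):

```shell
# Immutable ConfigMap mounted by a Deployment; then the ConfigMap is
# deleted and recreated under the same name with different content.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: repro-config
immutable: true
data:
  value: "one"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: repro
spec:
  replicas: 1
  selector:
    matchLabels: {app: repro}
  template:
    metadata:
      labels: {app: repro}
    spec:
      containers:
        - name: cat
          image: busybox
          command: ["sh", "-c", "cat /config/value; sleep 3600"]
          volumeMounts:
            - {name: config, mountPath: /config}
      volumes:
        - name: config
          configMap: {name: repro-config}
EOF

# Replace the ConfigMap (same name, new content) and roll the Deployment:
kubectl delete configmap repro-config
kubectl create configmap repro-config --from-literal=value=two
kubectl rollout restart deployment repro

# Per this issue, new pods landing on the same node may still print "one",
# served from the kubelet's cache of the old immutable object:
kubectl logs deployment/repro
```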
Restart the kubelet, or launch the pod on a node it was not on yet, and you get the new content.
A full delete of the deployment does seem to refresh the configmap, so the kubelet may have some mechanism to clean up unused configmaps; in that case, protecting against deletion while in use would work.
Any updates? Unless it can be trivially fixed and released fast, I just want the documentation to be updated to make the existing semantics of immutable objects clearer.
@juliantaylor the Kubernetes project is open source and the documentation is largely maintained by volunteers. Depending on how much you want a fix, you could:
Hi @juliantaylor,
You aim to add this to the docs as a supplement, right? I think your scenario may fit our design rules. What do you think?
I would suggest updating the following, from:
to:
I can open a docs PR if you agree this represents intended behavior, or won't be fixed soon and is thus worth documenting for the time being.
Yeah, indeed, it fits the behavior you mentioned.
I believe this is what @tengqm and @sftim were trying to clarify in the comments above.
I am only concerned about the existing documentation implying that deleting and recreating an immutable configmap/secret and restarting pods is safe to do, which it is not, due to the potential serving of stale content.

I'm also fine with updating it to say: do not ever delete and recreate an immutable secret or configmap, and removing the whole section about doing so and restarting pods. But in this case it should also be mentioned why, as it may not be immediately obvious to the reader that this is unsafe, and yet deletion is allowed by the API.

As for why one would use immutable configmaps: in large clusters with many large nodes and pods, the many configmap watches do put some load on the API. The majority of configmaps never or barely ever change (e.g. root CA certificates), and many deployments use versioned configmaps anyway (each change creates a new configmap).
Maybe this is better at discouraging changing immutable objects:
Possibly one could also clarify how long the deleted configmap/secret can be cached, which I assume is as long as something in the cluster references it (e.g. a still-running pod with the deleted cm mounted).
I think this is a good point for "clarify how long the deleted configmap/secret can be cached". |
Let's clarify that you should prefer to use a new name for your update to an immutable ConfigMap or Secret. If you want to use the same name, you have to first make sure that every existing reference to the old object (not just Pods, any reference) has been cleaned up. |
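A hedged sketch of checking for remaining Pod references before reusing a name (the ConfigMap name is invented; references from other kinds, such as IngressClasses or admission policies, would need their own checks):

```shell
# List Pods in all namespaces that still reference a ConfigMap named
# "old-config" via volumes, env, or envFrom.
kubectl get pods -A -o json | jq -r '
  .items[]
  | select(
      ( [.spec.volumes[]?.configMap.name]
      + [.spec.containers[].env[]?.valueFrom.configMapKeyRef.name]
      + [.spec.containers[].envFrom[]?.configMapRef.name] )
      | index("old-config")
    )
  | "\(.metadata.namespace)/\(.metadata.name)"'
```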
As of today, the official docs are not clear on this subject. In both the ConfigMap and Secrets concepts pages, it states:
This does not hint to the user that recreated pods may still get the old, stale value, at least when it is consumed via environment variables. The message as written actually suggests (IMO, clearly states) that when an immutable secret changes, it is sufficient to recreate it and the consuming pods, which is not correct.

It took us quite some time to figure this out, as we had been seeing inconsistent behavior when scheduling a pod across nodes (some nodes would see the new value, and others the old value). For an end user, this does not make any sense. From an administrator's perspective, it took some time to isolate the issue and the particular level in the stack (kubelet? container runtime? race conditions?) where the problem was originating. This also breaks the assumption that the state of the cluster is seen consistently across command-line users, kubelets, and other observers such as operators (we are using the ExternalSecrets operator, so that was one extra layer we had to debug and exclude from the loop to understand the problem).

IMO it should be worded about this way:
However, I would strongly prefer a solution that fixes the root issue and has the kubelet check the
Repeating #42359 (comment) for clarity: It may be worse than this, and you may also need to reboot nodes that have previously run Pods using the old ConfigMap, or go fix Kubernetes bugs, or whatever. However, we should nonetheless recommend a straightforward approach: stop using the name of a ConfigMap or Secret that was once marked as immutable; make a new one instead.
What happened?
Immutable configmaps and secrets are not watched by the kubelet. This also means the kubelet does not refresh the loaded content when they are deleted and recreated.
The documentation of immutable configmaps, though, is misleading about how this needs to be handled by pods:
https://kubernetes.io/docs/concepts/configuration/configmap/#configmap-immutable
This, to me, implies that deleting and recreating the configmap and restarting the pods using it is sufficient, but it is not, as the kubelet will still have the old version loaded and won't update it until it is restarted.
This behavior is more or less expected, but the documentation should be more explicit about this pitfall.
What did you expect to happen?
The documentation should state that immutable objects mounted into pods need to be deleted and recreated with a different name in order for the (recreated) pods to reliably see the new version.
How can we reproduce it (as minimally and precisely as possible)?
Start a deployment with an immutable configmap mounted. Change its content: delete it and recreate it with the same name, then restart the deployment.
Pods on that node will still have the old version until the kubelet is restarted and the pods are then restarted.
see #42359 (comment) for an example deployment
Anything else we need to know?
No response
Kubernetes version
Cloud provider
bare metal
OS version
No response
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)