kubelet does not reload recreated immutable configmap, unclear documentation #42359
/sig node
/kind documentation
/uncc /sig docs
/transfer website
I believe from the evidence here that we may be able to build a test case where, starting from a namespace that contains:
we can create a ConfigMap and a Pod that references the ConfigMap, and (because of what was previously in that namespace) see the old ConfigMap loaded into that Pod, even if the new ConfigMap is not marked as immutable. The period where that's possible might be quite short, though. This sounds more plausible; starting from a namespace that contains:
create a new ConfigMap (same name) and a new Pod. You may observe that the new Pod appears to have loaded the wrong data, from the old ConfigMap. We might need to specify that you should delete every object that could have had a reference to the old ConfigMap - potentially including cluster-scoped objects such as IngressClasses and ValidatingAdmissionPolicies. Advising people to create the replacement ConfigMap with a different name is not enough; we cannot assume that a different user is aware of the potential problems around reuse of the old name.
We might be able to detect in-use immutable ConfigMaps (within the control plane), and do something around finalizing them. That mechanism could eventually enable sending a warning when an immutable ConfigMap is deleted whilst still in use.
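No such mechanism exists today; a rough sketch of the shape it could take, with an invented finalizer name:

```shell
# Hypothetical: a control-plane controller could add a finalizer to an
# immutable ConfigMap while anything still references it, so a delete is
# held (and a warning could be surfaced) until the references go away.
# The finalizer name below is invented for illustration only.
kubectl patch configmap app-config --type=merge \
  -p '{"metadata":{"finalizers":["example.k8s.io/immutable-in-use"]}}'
```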
I recommend that SIG Architecture chip in here. /sig architecture |
/assign @pegasas |
Hi @sftim,
and
I create these two in the k8s cluster and echo the key:
After I delete the cm,
it keeps the same value:
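A minimal sketch of that kind of test, with invented names (not the original manifests):

```shell
# An immutable ConfigMap, plus a Pod that repeatedly echoes one of its keys.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config
immutable: true
data:
  key: "old-value"
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: echo
      image: busybox
      command: ["sh", "-c", "while true; do cat /config/key; sleep 10; done"]
      volumeMounts:
        - name: config
          mountPath: /config
  volumes:
    - name: config
      configMap:
        name: demo-config
EOF

# Delete and recreate the ConfigMap under the same name with new data:
kubectl delete configmap demo-config
kubectl create configmap demo-config --from-literal=key=new-value

# Per this thread, the Pod keeps logging "old-value": the kubelet serves
# the cached immutable object and never refetches it.
kubectl logs demo-pod --tail=1
```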
The motivation for immutable ConfigMaps/Secrets is to avoid watching them. Since they are immutable, you get what is in them and treat it as constant. You don't care about future changes to them; they are immutable, after all. If you want to watch them, especially for updates, you won't use an immutable ConfigMap/Secret. Protecting ConfigMaps/Secrets in use is another topic; see KEP 2840.
The test case explained in #42359 (comment) covers already-documented behavior; if that's the test case, I don't think we need to make a docs change. I thought you were concerned about ConfigMap removal and recreation with a new
Protecting a configmap from being deleted while it is in use is not sufficient: a configmap or secret name cannot be reused after it has been deleted until all kubelets have been restarted, or you risk a pod seeing old content. This likely also happens long after all pods using the old version are gone (create a deployment and an immutable cm, delete it, wait some time, redeploy with different configmap content), though I have not explicitly verified a full delete and longer timescales, only rolling updates with the same cm name.

The practical solution to this is to have immutable configmap and secret names contain a checksum of their content. That way it is almost impossible for wrong data to end up in a pod, even when the kubelet runs for a long time.

If you can fix the underlying issue, all the better, but as stated, the point of immutable is to reduce API requests and watches, so watching for deletes is not a sensible option. Possibly the kubelet could refetch the object when it is required, but only if it already exists as immutable in its caches.
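A minimal sketch of the checksum-in-the-name approach, with an invented file name and naming scheme:

```shell
# Derive the ConfigMap name from a short checksum of its content, so any
# content change produces a fresh name and stale kubelet caches are harmless.
CONFIG_FILE=app.properties                      # hypothetical config file
SUM=$(sha256sum "$CONFIG_FILE" | cut -c1-8)     # short content hash

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config-${SUM}
immutable: true
data:
  app.properties: |
$(sed 's/^/    /' "$CONFIG_FILE")
EOF

# Point the workload at app-config-${SUM}; a content change now yields a
# new name, rolling the Deployment instead of mutating a cached object.
```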
Let's make sure we know what (replicable) problem we're helping people to avoid, before we triage this as accepted. |
OK, here is the reproducer I stated in the beginning, written out:
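A sketch consistent with that reproducer, using invented names and a busybox image (not the original manifests):

```shell
# Immutable ConfigMap mounted by a Deployment; then the ConfigMap is
# deleted and recreated under the same name with different content.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: repro-config
immutable: true
data:
  value: "one"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: repro
spec:
  replicas: 1
  selector:
    matchLabels: {app: repro}
  template:
    metadata:
      labels: {app: repro}
    spec:
      containers:
        - name: cat
          image: busybox
          command: ["sh", "-c", "cat /config/value; sleep 3600"]
          volumeMounts:
            - {name: config, mountPath: /config}
      volumes:
        - name: config
          configMap: {name: repro-config}
EOF

# Replace the ConfigMap (same name, new content) and roll the Deployment:
kubectl delete configmap repro-config
kubectl create configmap repro-config --from-literal=value=two
kubectl rollout restart deployment repro

# Per this issue, new pods landing on the same node may still print "one",
# served from the kubelet's cache of the old immutable object:
kubectl logs deployment/repro
```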
Restart the kubelet, or launch the pod on a node it was not on yet, and you get the new content.
A full delete of the deployment does seem to refresh the configmap, so the kubelet may have some mechanism to clean up unused configmaps; in that case, protecting against deletion while in use would work.
Any updates? Unless it can be trivially fixed and released fast, I just want the documentation to be updated to make the existing semantics of immutable objects clearer.
@juliantaylor the Kubernetes project is open source and the documentation is largely maintained by volunteers. Depending on how much you want a fix, you could:
Hi @juliantaylor,
You aim to add this to the docs as a supplement, right? I think your scenario may fit our design rules. What do you think?
I would suggest updating the following, from:
to:
I can open a docs PR if you agree this represents intended behavior, or won't be fixed soon and is thus worth documenting for the time being.
Yeah, indeed, it fits the behavior you mentioned.
I believe this is what @tengqm and @sftim were trying to clarify in the comments above.
I am only concerned about the existing documentation implying that deleting and recreating an immutable configmap/secret and restarting pods is safe to do, which it is not, due to the potential serving of stale content.

I'm also fine with updating it to say: do not ever delete and recreate an immutable secret or configmap, and removing the whole section about doing so and restarting pods. But in this case it should also be mentioned why, as it may not be immediately obvious to the reader that this is unsafe, and yet deletion is allowed by the API.

As for why one would use immutable configmaps: in large clusters with many large nodes and pods, the many configmap watches do put some load on the API. The majority of configmaps never or barely ever change (e.g. root CA certificates), and many deployments use versioned configmaps anyway (each change creates a new configmap).
Maybe this is better at discouraging changing immutable objects:
Possibly one could also clarify how long the deleted configmap/secret can be cached, which I assume is as long as something in the cluster references it (e.g. a still-running pod with the deleted cm mounted).
I think this is a good point for "clarify how long the deleted configmap/secret can be cached". |
Let's clarify that you should prefer to use a new name for your update to an immutable ConfigMap or Secret. If you want to use the same name, you have to first make sure that every existing reference to the old object (not just Pods, any reference) has been cleaned up. |
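A hedged sketch of checking for remaining Pod references before reusing a name (the ConfigMap name is invented; references from other kinds, such as IngressClasses or admission policies, would need their own checks):

```shell
# List Pods in all namespaces that still reference a ConfigMap named
# "old-config" via volumes, env, or envFrom.
kubectl get pods -A -o json | jq -r '
  .items[]
  | select(
      ( [.spec.volumes[]?.configMap.name]
      + [.spec.containers[].env[]?.valueFrom.configMapKeyRef.name]
      + [.spec.containers[].envFrom[]?.configMapRef.name] )
      | index("old-config")
    )
  | "\(.metadata.namespace)/\(.metadata.name)"'
```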
As of today, the official docs are not clear on this subject. In both the ConfigMap and Secrets concepts pages, it states:
This does not hint to the user that recreated pods may still get the old, stale value, at least when it is consumed via environment variables. The message as written actually suggests (IMO, clearly states) that when an immutable secret changes, it is sufficient to recreate it and the consuming pods, which is not correct.

It took us quite some time to figure this out, as we had been seeing inconsistent behavior when scheduling a pod across nodes (some nodes would see the new value, and others the old value). For an end user, this does not make any sense. From an administrator's perspective, it took some time to isolate the issue and the particular level in the stack (kubelet? container runtime? race conditions?) where the problem was originating. This also breaks the assumption that the state of the cluster is seen consistently across command-line users, kubelets, and other observers such as operators (we are using the ExternalSecrets operator, so that was one extra layer we had to debug and exclude from the loop to understand the problem).

IMO it should be worded about this way:
However, I would strongly prefer a solution that fixes the root issue and has the kubelet check the
Repeating #42359 (comment) for clarity: It may be worse than this, and you may also need to reboot nodes that have previously run Pods using the old ConfigMap, or go fix Kubernetes bugs, or whatever. However, we should nonetheless recommend a straightforward approach: stop using the name of a ConfigMap or Secret that was once marked as immutable; make a new one instead.
What happened?
Immutable configmaps and secrets are not watched by the kubelet. This also means the kubelet does not refresh the loaded content when they are deleted and recreated.
The documentation of immutable configmaps, though, is misleading about how this needs to be handled by pods:
https://kubernetes.io/docs/concepts/configuration/configmap/#configmap-immutable
This, to me, implies that deleting and recreating the configmap and restarting the pods using it is sufficient, but it is not, as the kubelet will still have the old version loaded and won't update it until it is restarted.
This behavior is more or less expected, but the documentation should be more explicit about this pitfall.
What did you expect to happen?
The documentation should state that immutable objects mounted into pods need to be deleted and recreated with a different name in order for the (recreated) pods to reliably see the new version.
How can we reproduce it (as minimally and precisely as possible)?
Start a deployment with an immutable configmap mounted. Change its content: delete it and recreate it with the same name, then restart the deployment.
Pods on that node will still have the old version until the kubelet is restarted and the pods are then restarted.
see #42359 (comment) for an example deployment
Anything else we need to know?
No response
Kubernetes version
Cloud provider
bare metal
OS version
No response
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)