IPClaims missing for Machines #1354
What steps did you take and what happened:
It sometimes happens that a Metal3Machine appears without IPClaim after a rolling upgrade of a MachineDeployment.
Update
I think we have pinpointed the issue!
Code path:
If the new Metal3Data is created before the old IPClaim, it could take the old IPClaim and render the secret before the old IPClaim is deleted.
Reproduction criteria:
- `<bmh-name>-<ippool-name>`
- `<m3d-name>-<ippool-name>`

End of update
Second update
After trying to find out why/if the IPClaim was not deleted before/together with the Metal3Data, I have concluded that the IPClaim should always be deleted before the finalizers are removed from the Metal3Data. So how come we see this issue still?
I think the answer is caching. Everything except BMHs, ConfigMaps and Secrets is read through the cache, so we could simply get a stale, cached IPClaim 🤦
End of second update
Unfortunately we have not been able to reproduce the issue yet.
Here is what we know so far:
- `templateReference`: the reference is unique per MachineDeployment and kept the same when rolling to a new Metal3DataTemplate.

What did you expect to happen:
All Metal3Machines that are configured to use IPClaims should have the proper IPClaims created.
Otherwise there is risk of IP clash and all kinds of strange things.
Anything else you would like to add:
We are still investigating this issue and we are not 100 % sure that there is an issue in CAPM3, but it is our best lead so far.
I'm creating this issue now to keep track of findings and link to potentially relevant known bugs.
Environment:
- Kubernetes version (use `kubectl version`): v1.28.3

Discovered or related issues:
- Whether `templateReference` is needed at all.

/kind bug