Restart device plugin pod after policy apply #173
Until k8snetworkplumbingwg/sriov-network-device-plugin#276 is fixed, we need to restart the device plugin pod each time the SR-IOV Network Operator plugins are applied. This is needed because a plugin can change the number of VF resources even if the config has not changed.
What changes can a plugin make without the SR-IOV policy being updated?
I am not sure this change is needed. We have encountered some issues in our CI, but it may be an environment issue. For now, this should be on hold.
If my understanding is correct, this code won't be executed when the policy is not updated, because the generation number will be the same.
Correct, only a new generation of the policy will trigger the device plugin restart.
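The generation gating described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the operator's actual code: `nodeStateSyncer`, `lastAppliedGeneration`, `shouldSync`, and `markApplied` are hypothetical names, and the sketch assumes the daemon remembers the generation of the last node state it applied.

```go
package main

import "fmt"

// nodeStateSyncer is a hypothetical stand-in for the config daemon's state.
type nodeStateSyncer struct {
	// lastAppliedGeneration is the generation of the last node state that
	// was fully applied (assumption for illustration).
	lastAppliedGeneration int64
}

// shouldSync reports whether a node state with the given generation needs to
// be processed. An unchanged generation means the policy was not updated, so
// the sync handler, and with it the device plugin restart, is skipped.
func (s *nodeStateSyncer) shouldSync(generation int64) bool {
	return generation != s.lastAppliedGeneration
}

// markApplied records that the given generation has been fully applied.
func (s *nodeStateSyncer) markApplied(generation int64) {
	s.lastAppliedGeneration = generation
}

func main() {
	s := &nodeStateSyncer{lastAppliedGeneration: 3}
	fmt.Println(s.shouldSync(3)) // false: same generation, nothing to do
	fmt.Println(s.shouldSync(4)) // true: a new policy generation triggers a sync
	s.markApplied(4)
	fmt.Println(s.shouldSync(4)) // false: generation 4 is already applied
}
```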
Do you see a situation where devices are updated but the conditional checks on reqDrain and DpConfigVersion are not met?
@zshi-redhat yes, you can see the logs of the network-operator CI where this issue is reproduced: http://13.74.249.42/nic_operator_helm-ci/405/
@zshi-redhat we have a pretty specific use case: a custom SriovOperatorConfig with configDaemonNodeSelector, where node labels can be dynamically added or removed.
@zshi-redhat I have the following steps to reproduce the issue:
So my patch simply restarts the device plugin pod each time we make changes to the SR-IOV configuration.
@@ -519,14 +519,11 @@ func (dn *Daemon) nodeStateSyncHandler(generation int64) error {
}

// restart device plugin pod
if reqDrain || latestState.Spec.DpConfigVersion != dn.nodeState.Spec.DpConfigVersion {
@e0ne With your fix, the device plugin pod will be restarted in every reconcile loop. It will cause many unnecessary pod restarts.
The issue you mentioned only happens when numVFs is changed from 0 to a non-zero value, because we don't need to drain the node when VFs are freshly created:
if ifaceStatus.NumVfs != 0 {
So a better solution would be to add this condition as a trigger for the device plugin pod restart, instead of always triggering a restart.
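The reviewer's suggestion can be sketched as a single predicate that keeps the existing checks and adds the "VFs freshly created" case. This is a hedged illustration, not code from the operator: `interfaceVfs`, `vfsFreshlyCreated`, and `needDevicePluginRestart` are hypothetical names, and the PCI address in `main` is made-up example data. It assumes the daemon can compare each interface's VF count before and after the policy is applied.

```go
package main

import "fmt"

// interfaceVfs is a hypothetical summary of one SR-IOV interface: its PCI
// address and the number of VFs before and after the policy is applied.
type interfaceVfs struct {
	PciAddress string
	OldNumVfs  int
	NewNumVfs  int
}

// vfsFreshlyCreated reports whether any interface goes from 0 VFs to a
// non-zero count. No node drain is required in that case, so checking
// reqDrain alone would miss it.
func vfsFreshlyCreated(ifaces []interfaceVfs) bool {
	for _, i := range ifaces {
		if i.OldNumVfs == 0 && i.NewNumVfs > 0 {
			return true
		}
	}
	return false
}

// needDevicePluginRestart combines the existing conditions (drain required or
// device plugin config version changed) with the fresh-VF case.
func needDevicePluginRestart(reqDrain bool, oldDpVersion, newDpVersion string, ifaces []interfaceVfs) bool {
	return reqDrain || oldDpVersion != newDpVersion || vfsFreshlyCreated(ifaces)
}

func main() {
	// Hypothetical interface: VFs are created for the first time.
	ifaces := []interfaceVfs{{PciAddress: "0000:3b:00.0", OldNumVfs: 0, NewNumVfs: 8}}
	fmt.Println(needDevicePluginRestart(false, "v1", "v1", ifaces)) // true

	// VF count unchanged, no drain, same config version: no restart.
	ifaces[0].OldNumVfs = 8
	fmt.Println(needDevicePluginRestart(false, "v1", "v1", ifaces)) // false
}
```

Compared with restarting unconditionally, this keeps restarts limited to reconcile loops that actually change what the device plugin would discover.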
Actually, this code will be executed only if the node state is changed: https://github.com/k8snetworkplumbingwg/sriov-network-operator/blob/master/pkg/daemon/daemon.go#L438
@zshi-redhat This PR looks good to me. Do you have any other comments?
No, looks good to me.
We need to restart the device plugin pod after the node policy is applied
and all SR-IOV Network Config Daemon plugins have finished. E.g.:
the generic plugin applies the SR-IOV Network Node Policy and creates
VFs. This means we need to restart the device plugin pod so that it
finds the newly created devices.
Related k8snetworkplumbingwg/sriov-network-device-plugin#276