-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run Unbind/Bind VF in a serial manner. #30
Run Unbind/Bind VF in a serial manner. #30
Conversation
Missing testing on a real system |
dad1d62
to
4bf4694
Compare
Unbind/Bind VF as well as moving RDMA device to namespace causes rdma resources to be re-created. CNI may be invoked in parallel and kernel may provide a VF RDMA resources under a different name. As the mapping of RDMA resources is done in Device plugin prior to CNI invocation, it must not change during CNI invocation. We serialize the CNI's Add,Del operations by Creating/Locking/Unlocking a file, causing kernel to allocate RDMA resources the same name for a given VF. In the future, Systems should use udev PCI based RDMA device names, ensuring consistent RDMA resources names.
4bf4694
to
7de7717
Compare
@@ -82,6 +108,14 @@ func cmdAdd(args *skel.CmdArgs) error { | |||
} | |||
|
|||
sm := sriov.NewSriovManager() | |||
|
|||
// Lock CNI operation to serialize the operation | |||
lock, err := lockCNIExecution() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be noted, that while further optimizations where possible both here and in CmdDel (unlock eralier than on first error or end of execution)
it would have introduced (imo) un-needed code complexity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about catching signals to remove the lock?
Did you check what will happen if you put sleep of 5 min in the CNI? will kubernetes kill it?
@@ -230,6 +264,13 @@ func cmdDel(args *skel.CmdArgs) error { | |||
} | |||
defer netns.Close() | |||
|
|||
// Lock CNI operation to serialize the operation | |||
lock, err := lockCNIExecution() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is already locked from line 113
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, ill remove
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking at this again, i originally thought that i mistakenly locked twice. but this is not the case.
this lock is for cmdDel and on line 113 its for cmdAdd
please fix lint errors |
This is required to release the lock in case of a signal sent to CNI. Note: While operating systems are supposed to free a locked file if the PID of the locking process does not exist, they may not do so promptly.
205901f
to
f842f5f
Compare
forgot to push :\ it should pass now |
Unbind/Bind VF as well as moving RDMA device to namespace
causes rdma resources to be re-created.
CNI may be invoked in parallel and kernel may provide a VF
RDMA resources under a different name.
As the mapping of RDMA resources is done in Device plugin
prior to CNI invocation, it must not change during CNI invocation.
We serialize the CNI's Add,Del operations by Creating/Locking/Unlocking
a file, causing kernel to allocate RDMA resources the same name
for a given VF.
In the future, Systems should use udev PCI based RDMA device
names, ensuring consistent RDMA resources names.