-
Notifications
You must be signed in to change notification settings - Fork 40.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PV controller: don't delete PVs when PVC is not known yet #95909
PV controller: don't delete PVs when PVC is not known yet #95909
Conversation
Normally, the PV controller knows about the PVC that triggers the creation of a PV before it sees the PV, because the PV controller must set the volume.beta.kubernetes.io/storage-provisioner annotation that tells an external provisioner to create the PV. When restarting, the PV controller first syncs its caches, so that case is also covered. However, the creator of a PVC might decided to set that annotation itself to speed up volume creation. While unusual, it's not forbidden and thus part of the external Kubernetes API. Whether it makes sense depends on the intentions of the user. When that is done and there is heavy load, an external provisioner might see the PVC and create a PV before the PV controller sees the PVC. If the PV controller then encounters the PV before the PVC, it incorrectly concludes that the PV needs to be deleted instead of being bound. The same issue occurred earlier for external binding and the existing code for looking up a PVC in the cache or in the apiserver solves the issue also for volume provisioning, it just needs to be enabled also for PVs without the pv.kubernetes.io/bound-by-controller annotation.
@pohly: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the The Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsafrane does allowing users to set storage-provisioner annotation directly potentially break anything in external provisioner lib? I seem to recall some assumptions made there in the past.
@@ -577,9 +577,10 @@ func (ctrl *PersistentVolumeController) syncVolume(volume *v1.PersistentVolume) | |||
if err != nil { | |||
return err | |||
} | |||
if !found && metav1.HasAnnotation(volume.ObjectMeta, pvutil.AnnBoundByController) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a unit test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try.
What exactly was the situation with external binder? That variant also seems to lack a unit test, or at least https://github.com/kubernetes/kubernetes/pull/67062/files didn't add one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added more tests and modernized the testing a bit. See individual commits for details.
I'm not sure whether the situation with the external binder is covered, but the code no longer checks for that, so I'd say it's not needed.
I've not found a way to store the PVC in the informer cache, so the code now falls back all the way to a Get operation that then gets satisfied by the reactor.
I've verified manually that this happens and added a corresponding test case where the PVC really doesn't exist. I briefly checked whether it would be possible to validate that the Get happens resp. doesn't happen, but found no hook in the existing framework for such an extra check and would prefer to not make changes that affect other tests.
I also need to get back to some other work 😅
This makes it possible to run tests with -v=5 and thus actually get some output.
This makes it possible to run individual tests.
The main goal was to cover retrieval of a PVC from the apiserver when it isn't known yet. This is achieved by adding PVCs and (for the sake of completeness) PVs to the reactor, but not the controller, when a special annotation is set. The approach with a special annotation was chosen because it doesn't affect other tests. The other test cases were added while checking the existing tests because (at least at first glance) the situations seemed to be not covered.
I would like to backport this fix to older releases. Should that be done with or without the additional testing? |
for _, test := range tests { | ||
klog.V(4).Infof("starting test %q", test.name) | ||
|
||
doit := func(test controllerTest) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not this?
for _, test := range tests {
test := test
t.Run(test.name, func(t *testing.T) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I didn't want to re-indent the entire test code. As it stands now, "git blame" is still useful, whereas without the doit
function everything would get shifted once to the right.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jsafrane, pohly The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
Normally, the PV controller knows about the PVC that triggers the
creation of a PV before it sees the PV, because the PV controller must
set the volume.beta.kubernetes.io/storage-provisioner annotation that
tells an external provisioner to create the PV.
When restarting, the PV controller first syncs its caches, so that
case is also covered.
However, the creator of a PVC might decided to set that annotation
itself to speed up volume creation. While unusual, it's not forbidden
and thus part of the external Kubernetes API. Whether it makes sense
depends on the intentions of the user.
When that is done and there is heavy load, an external provisioner
might see the PVC and create a PV before the PV controller sees the
PVC. If the PV controller then encounters the PV before the PVC, it
incorrectly concludes that the PV needs to be deleted instead of being
bound.
The same issue occurred earlier for external binding and the existing
code for looking up a PVC in the cache or in the apiserver solves the
issue also for volume provisioning, it just needs to be enabled also
for PVs without the pv.kubernetes.io/bound-by-controller annotation.
Which issue(s) this PR fixes:
Fixes kubernetes-csi/external-provisioner#507
Does this PR introduce a user-facing change?: