If a cluster operator attempts to resize and modify a volume at the same time (by patching the PVC), either the first `ExpandVolume` or the first `ModifyVolume` RPC will be delayed by `retry-interval-start`.
This is because the external-resizer's modifyController and resizeController reconciliation loops both attempt to patch the same PVC to mark the operation as in progress. The loser of the race must restart its reconciliation loop after waiting `retry-interval-start`, with the error `can't patch status of PVC ebs-5935/pvc-d5jhc with Operation cannot be fulfilled on persistentvolumeclaims \"pvc-d5jhc\": the object has been modified; please apply your changes to the latest version and try again`.
The patch attempt happens in three places when modifying the volume: `markControllerModifyVolumeStatus`, `updateConditionBasedOnError`, and finally `markControllerModifyVolumeCompleted`; and in three places when resizing the volume: `markPVCAsFSResizeRequired`, `markPVCResizeInProgress`, and `markPVCResizeFinished`.
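To illustrate the conflict, here is a minimal sketch (assuming client-go; this is not the external-resizer's actual code) of a conditional PVC status patch: including the cached resourceVersion in the patch body makes the API server reject the patch with a conflict if the other controller has already modified the PVC, which is exactly the error in the logs below.

```go
// Minimal sketch (not the external-resizer's actual code): patch a PVC's status
// while asserting the cached resourceVersion, so a concurrent writer causes a
// 409 Conflict and the caller has to requeue and retry its reconciliation loop.
package sketch

import (
	"context"
	"encoding/json"
	"fmt"

	v1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

// markResizeInProgress marks the PVC as resizing via a conditional status patch.
func markResizeInProgress(ctx context.Context, client kubernetes.Interface, cached *v1.PersistentVolumeClaim) error {
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			// Including the cached resourceVersion makes the patch conditional:
			// the API server rejects it if another controller (e.g. the other
			// reconciliation loop) already patched the PVC.
			"resourceVersion": cached.ResourceVersion,
		},
		"status": map[string]interface{}{
			"conditions": []v1.PersistentVolumeClaimCondition{{
				Type:               v1.PersistentVolumeClaimResizing,
				Status:             v1.ConditionTrue,
				LastTransitionTime: metav1.Now(),
			}},
		},
	}
	data, err := json.Marshal(patch)
	if err != nil {
		return err
	}
	_, err = client.CoreV1().PersistentVolumeClaims(cached.Namespace).
		Patch(ctx, cached.Name, types.StrategicMergePatchType, data, metav1.PatchOptions{}, "status")
	if apierrors.IsConflict(err) {
		// Loser of the race: surface the conflict so the work queue requeues the
		// PVC and retries after retry-interval-start.
		return fmt.Errorf("can't patch status of PVC %s/%s: %w", cached.Namespace, cached.Name, err)
	}
	return err
}
```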
These are the full logs with a `retry-interval-start` of 4 seconds on the external-resizer.
I1212 16:57:30.697020 1 event.go:389] "Event occurred" object="ebs-5935/pvc-d5jhc" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="VolumeModify" message="external resizer is modifying volume pvc-d5jhc with vac ebs-volume-tester-45bxg"
E1212 16:57:30.704249 1 controller.go:314] "Error syncing PVC" err="marking pvc \"ebs-5935/pvc-d5jhc\" as resizing failed: Mark PVC \"ebs-5935/pvc-d5jhc\" as resize as in progress failed: can't patch status of PVC ebs-5935/pvc-d5jhc with Operation cannot be fulfilled on persistentvolumeclaims \"pvc-d5jhc\": the object has been modified; please apply your changes to the latest version and try again"
I1212 16:57:34.712984 1 event.go:389] "Event occurred" object="ebs-5935/pvc-d5jhc" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Resizing" message="External resizer is resizing volume pvc-69a28244-127b-4c61-81e1-edaaa8ae2e51"
I1212 16:57:36.172525 1 event.go:389] "Event occurred" object="ebs-5935/pvc-d5jhc" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="VolumeModifySuccessful" message="external resizer modified volume pvc-d5jhc with vac ebs-volume-tester-45bxg successfully "
E1212 16:57:37.325585 1 controller.go:314] "Error syncing PVC" err="resize volume \"pvc-69a28244-127b-4c61-81e1-edaaa8ae2e51\" by resizer \"ebs.csi.aws.com\" failed: rpc error: code = Internal desc = Could not resize volume \"vol-078c2da88041a73f0\": rpc error: code = Internal desc = Could not modify volume \"vol-078c2da88041a73f0\": volume \"vol-078c2da88041a73f0\" in OPTIMIZING state, cannot currently modify"
I1212 16:57:37.325781 1 event.go:389] "Event occurred" object="ebs-5935/pvc-d5jhc" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Warning" reason="VolumeResizeFailed" message="resize volume \"pvc-69a28244-127b-4c61-81e1-edaaa8ae2e51\" by resizer \"ebs.csi.aws.com\" failed: rpc error: code = Internal desc = Could not resize volume \"vol-078c2da88041a73f0\": rpc error: code = Internal desc = Could not modify volume \"vol-078c2da88041a73f0\": volume \"vol-078c2da88041a73f0\" in OPTIMIZING state, cannot currently modify"
I1212 16:57:37.333980 1 event.go:389] "Event occurred" object="ebs-5935/pvc-d5jhc" fieldPath="" kind="PersistentVolumeClaim" apiVersion="v1" type="Normal" reason="Resizing" message="External resizer is resizing volume pvc-69a28244-127b-4c61-81e1-edaaa8ae2e51"
E1212 16:57:39.965349 1 controller.go:314] "Error syncing PVC" err="resize volume \"pvc-69a28244-127b-4c61-81e1-edaaa8ae2e51\" by resizer \"ebs.csi.aws.com\" failed: rpc error: code = Internal desc = Could not resize volume \"vol-078c2da88041a73f0\": rpc error: code = Internal desc = Could not modify volume \"vol-078c2da88041a73f0\": volume \"vol-078c2da88041a73f0\" in OPTIMIZING state, cannot currently modify"
.....
This affects the EBS CSI Driver when `retry-interval-start` is large, because the driver attempts to coalesce the `ExpandVolume` and `ModifyVolume` RPCs into a single EC2 ModifyVolume call (due to AWS' 6-hour volume modification cooldown), which is not possible if one of these RPCs has to wait `retry-interval-start` because of this issue.
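For context, here is a hypothetical sketch of that coalescing using aws-sdk-go-v2 (the names and structure are assumptions, not the aws-ebs-csi-driver's actual implementation): a pending size change and pending attribute changes for the same volume are merged into one EC2 ModifyVolume call, so only a single modification window is consumed.

```go
// Hypothetical sketch of request coalescing (assumed names; not the
// aws-ebs-csi-driver's actual implementation): a pending size change from
// ExpandVolume and pending attribute changes from ModifyVolume are merged
// into a single EC2 ModifyVolume call, so only one modification (and one
// 6-hour cooldown window) is consumed per volume.
package sketch

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

// pendingModification accumulates changes requested by both RPCs for one volume.
type pendingModification struct {
	volumeID   string
	newSizeGiB *int32  // set by ExpandVolume
	volumeType *string // set by ModifyVolume (from the VolumeAttributesClass)
	iops       *int32  // set by ModifyVolume
}

// coalesce issues one EC2 ModifyVolume call covering every pending change.
func coalesce(ctx context.Context, client *ec2.Client, p pendingModification) error {
	input := &ec2.ModifyVolumeInput{VolumeId: aws.String(p.volumeID)}
	if p.newSizeGiB != nil {
		input.Size = p.newSizeGiB
	}
	if p.volumeType != nil {
		input.VolumeType = ec2types.VolumeType(*p.volumeType)
	}
	if p.iops != nil {
		input.Iops = p.iops
	}
	_, err := client.ModifyVolume(ctx, input)
	return err
}
```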
Happy to work on this issue if we agree it is worth solving.
As discussed in the Kubernetes CSI Implementation Team standup, we do not want to fetch a fresh PVC before each of these patches: the failure we would otherwise see is an indicator that the controller's view of the world is stale, so it is much safer to restart the reconciliation loop. We also do not want to set `addResourceVersionCheck` to false for these calls, as the conflict is a signal that we really do need to retry the reconciliation loop; that is intended behavior. The best step for the aws-ebs-csi-driver in this case, as discussed, is to keep `retry-interval-start` at 1 second to avoid this issue. Combining the resize controller and modify controller work queues was considered as a long-term solution, though this would complicate EBS CSI Driver request coalescing.
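For reference, a minimal sketch (assuming client-go's workqueue package; external-resizer's exact wiring may differ) of how a per-item exponential backoff makes the losing controller wait at least `retry-interval-start` before its next sync, which is why keeping it at 1 second bounds the delay:

```go
// Minimal sketch (assumed wiring; external-resizer's exact setup may differ):
// a rate-limited work queue whose per-item backoff starts at retry-interval-start
// and grows exponentially up to retry-interval-max, so the controller that loses
// the patch race waits at least retryIntervalStart before its next sync attempt.
package sketch

import (
	"time"

	"k8s.io/client-go/util/workqueue"
)

// newPVCQueue builds a queue with exponential per-item backoff.
func newPVCQueue(retryIntervalStart, retryIntervalMax time.Duration) workqueue.RateLimitingInterface {
	return workqueue.NewNamedRateLimitingQueue(
		workqueue.NewItemExponentialFailureRateLimiter(retryIntervalStart, retryIntervalMax),
		"pvc-sketch",
	)
}

// finishSync is the tail of a worker loop: requeue with backoff on failure,
// reset the backoff on success.
func finishSync(queue workqueue.RateLimitingInterface, key string, syncErr error) {
	defer queue.Done(key) // processing of this item is finished either way
	if syncErr != nil {
		// e.g. the conflicting status patch: retry after retryIntervalStart,
		// then 2x, 4x, ... capped at retryIntervalMax.
		queue.AddRateLimited(key)
		return
	}
	queue.Forget(key)
}
```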