Introduce RequestEviction feature for evicting pods in background (KEP-1397) #1466
Conversation
Force-pushed from 1c05dac to d022bf4.
Compared to the virt-handler logs produced by my local kind cluster, these are the logs produced by the upstream CI:
It could be that KubeVirt does not like the combination of running kind in a pod with dind enabled.
Force-pushed from 3210921 to 90a6b55.
/retest-required
@ingvagabund can you put the vendor changes into their own commit for review?
Force-pushed from 15856d3 to 8203821.
@damemi done
/retest-required
Force-pushed from 8203821 to edb0100.
@@ -210,13 +569,36 @@ func evictPod(ctx context.Context, client clientset.Interface, pod *v1.Pod, poli
 		},
 		DeleteOptions: deleteOptions,
 	}
-	err := client.PolicyV1().Evictions(eviction.Namespace).Evict(ctx, eviction)
+	err := pe.client.PolicyV1().Evictions(eviction.Namespace).Evict(ctx, eviction)
Have you considered adding a similar retry mechanism?
retriable := func(err error) bool {
return apierrors.IsTooManyRequests(err) || apierrors.IsInternalError(err) || apierrors.IsServerTimeout(err) || apierrors.IsUnexpectedServerError(err)
}
err = retry.OnError(retry.DefaultRetry, retriable, func() error {
return evictPod(ctx, pe.client, pod, pe.directEvict)
})
I believe we discussed this in the past, though I have not seen anyone report it as annoying. We can always update the code. Nevertheless, if the same errors keep being returned for some period of time, retrying will not help. If only a few evictions return these error codes, the descheduler will simply try to evict another pod. All the pods are still treated as cattle, so the impact is usually negligible.
Yes, the impact is not significant. For example, the rescheduling policy uses the method podutil.SortPodsBasedOnPriorityLowToHigh(removablePods)
to sort the pods. Lower-priority pods at the front may fail to be evicted due to network issues, resulting in the eviction of higher-priority pods later, even though all these pods are evictable.
You are right. This could be a suboptimal case: too many higher-priority pods getting evicted instead of lower-priority ones. Nevertheless, this deserves a separate PR/change. Retries are outside the scope of this one.
No obvious bugs were found; everything except the commented code looks good to me.
Force-pushed from edb0100 to 83d8ff9.
Force-pushed from 79eebf1 to 9b5eae3.
/retest-required
When the feature is enabled, each pod with the descheduler.alpha.kubernetes.io/request-evict-only annotation will have its eviction API error examined for a specific error code/reason and message. If matched, the eviction of such a pod will be interpreted as the initiation of an eviction in background.
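Roughly, that check could look like the sketch below. This is illustrative only: the helper name and the matched message are assumptions, and the real cause/message is defined by the eviction webhook of the owning controller (e.g. KubeVirt).

import (
	"strings"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// isEvictionRequestedInBackground is a hypothetical helper. A 429/TooManyRequests
// returned by the eviction subresource is the usual way a validating webhook
// signals "request accepted, the eviction will be performed later".
func isEvictionRequestedInBackground(err error) bool {
	if err == nil {
		return false
	}
	if !apierrors.IsTooManyRequests(err) {
		return false
	}
	// Placeholder: match the webhook-specific message/reason here.
	return strings.Contains(err.Error(), "<webhook-specific message>")
}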
Force-pushed from 9b5eae3 to a2b5211.
[APPROVALNOTIFIER] This PR is APPROVED. Approval requirements bypassed by manually added approval. This pull-request has been approved by:
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@@ -45,6 +50,10 @@ type DeschedulerServer struct {
	SecureServing *apiserveroptions.SecureServingOptionsWithLoopback
	DisableMetrics bool
	EnableHTTP2 bool
	// FeatureGates enabled by the user
Btw, the approach used upstream is via ComponentGlobalsRegistry. It might be good to consider whether it makes sense to introduce that here.
@@ -115,7 +116,13 @@ func NewDeschedulerCommand(out io.Writer) *cobra.Command {
 }

 func Run(ctx context.Context, rs *options.DeschedulerServer) error {
-	err := tracing.NewTracerProvider(ctx, rs.Tracing.CollectorEndpoint, rs.Tracing.TransportCert, rs.Tracing.ServiceName, rs.Tracing.ServiceNamespace, rs.Tracing.SampleRate, rs.Tracing.FallbackToNoOpProviderOnError)
+	err := features.DefaultMutableFeatureGate.SetFromMap(rs.FeatureGates)
the initializations would be better placed in the apply logic
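A rough sketch of what that could look like, assuming a hypothetical Apply step on DeschedulerServer; only rs.FeatureGates and features.DefaultMutableFeatureGate come from the diff, and the import path for the features package is assumed:

import (
	"fmt"

	"sigs.k8s.io/descheduler/pkg/features" // import path assumed
)

// Apply is a hypothetical options-apply step that owns the feature-gate setup
// instead of Run().
func (rs *DeschedulerServer) Apply() error {
	if err := features.DefaultMutableFeatureGate.SetFromMap(rs.FeatureGates); err != nil {
		return fmt.Errorf("setting feature gates: %w", err)
	}
	return nil
}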
	eventRecorder,
	podInformer,
	rs.DefaultFeatureGates,
Does it make sense to pass the feature gates as an argument everywhere, compared to using the global variable?
I was keeping in mind the case where the descheduling framework is run as a simulator. E.g. in case two or more framework instances are created in the same run/binary.
@@ -410,6 +417,7 @@ func RunDeschedulerStrategies(ctx context.Context, rs *options.DeschedulerServer

	sharedInformerFactory.Start(ctx.Done())
	sharedInformerFactory.WaitForCacheSync(ctx.Done())
	descheduler.podEvictor.WaitForHandlersSync(ctx)
Suggested change:
-	descheduler.podEvictor.WaitForHandlersSync(ctx)
+	descheduler.podEvictor.WaitForEventHandlersSync(ctx)
Handler is a bit too generic IMO
		return
	}
	// Ignore any pod that does not have eviction in progress
	if _, exists := newPod.Annotations[EvictionInProgressAnnotationKey]; !exists {
how can we get to this line and delete pods that lost their annotation, when we do
if newPod.Annotations == nil {
return
}
// Ignore pod's that are not subject to an eviction in background
if _, exists := newPod.Annotations[EvictionRequestAnnotationKey]; !exists {
return
}
just above?
type evictionRequestItem struct {
	pod     *v1.Pod
	assumed bool
can we please make the name of the field more verbose?
@@ -210,13 +592,36 @@ func evictPod(ctx context.Context, client clientset.Interface, pod *v1.Pod, poli
 		},
 		DeleteOptions: deleteOptions,
 	}
-	err := client.PolicyV1().Evictions(eviction.Namespace).Evict(ctx, eviction)
+	err := pe.client.PolicyV1().Evictions(eviction.Namespace).Evict(ctx, eviction)
+	if pe.featureGates.Enabled(features.EvictionsInBackground) {
Returning early on a nil err here could make the rest of the evaluation simpler.
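Something like this sketch (illustrative only; the surrounding function signature and the gated branch body are elided):

err := pe.client.PolicyV1().Evictions(eviction.Namespace).Evict(ctx, eviction)
if err == nil {
	// Successful eviction: nothing else to evaluate.
	return nil
}
if pe.featureGates.Enabled(features.EvictionsInBackground) {
	// Only the error case needs to be examined for the eviction-in-background signal.
}
return err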
@@ -314,7 +315,7 @@ func (d profileImpl) RunDeschedulePlugins(ctx context.Context, nodes []*v1.Node)
 		span.AddEvent("Plugin Execution Failed", trace.WithAttributes(attribute.String("err", status.Err.Error())))
 		errs = append(errs, fmt.Errorf("plugin %q finished with error: %v", pl.Name(), status.Err))
 	}
-	klog.V(1).InfoS("Total number of pods evicted", "extension point", "Deschedule", "evictedPods", d.podEvictor.TotalEvicted()-evicted)
+	klog.V(1).InfoS("Total number of evictions/requests", "extension point", "Deschedule", "evictedPods", d.podEvictor.TotalEvicted()-evicted, "evictionRequests", d.podEvictor.TotalEvictionRequests()-evictionRequests)
nit: it might be good to unify the logging and use keysAndValues everywhere
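For example, sticking to one key style for the structured arguments (illustrative only, based on the line above):

klog.V(1).InfoS("Total number of evictions/requests",
	"extensionPoint", "Deschedule",
	"evictedPods", d.podEvictor.TotalEvicted()-evicted,
	"evictionRequests", d.podEvictor.TotalEvictionRequests()-evictionRequests)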
@@ -108,6 +423,46 @@ func (pe *PodEvictor) SetClient(client clientset.Interface) {
	pe.client = client
}

func (pe *PodEvictor) evictionRequestsTotal() uint {
I forgot what the consensus was last time, but do we want to introduce new metrics for this feature? https://github.com/kubernetes-sigs/descheduler/blob/master/keps/1397-evictions-in-background/README.md#metrics
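If metrics are added, a counter roughly along these lines might fit the KEP's metrics section (the metric name, labels, and registration are assumptions, not part of this PR):

import "k8s.io/component-base/metrics"

// evictionRequestsTotal is a hypothetical counter; wiring it into the existing
// descheduler metrics registry is left out of this sketch.
var evictionRequestsTotal = metrics.NewCounterVec(
	&metrics.CounterOpts{
		Subsystem:      "descheduler",
		Name:           "eviction_requests_total",
		Help:           "Number of evictions in background requested, by result.",
		StabilityLevel: metrics.ALPHA,
	},
	[]string{"result"},
)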
ObjectMeta: metav1.ObjectMeta{
	Name: fmt.Sprintf("kubevirtvmi-%v", idx),
	Annotations: map[string]string{
		"descheduler.alpha.kubernetes.io/request-evict-only": "",
Just to confirm: the feature depends only on the key, and not on the value? It would be good to add a bit of documentation above EvictionRequestAnnotationKey and EvictionInProgressAnnotationKey.
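A possible shape for that documentation, confirming that only the presence of the key matters (the wording and the in-progress annotation value are illustrative; only request-evict-only appears in the diff):

const (
	// EvictionRequestAnnotationKey marks a pod whose eviction should be
	// delegated to its owning controller (eviction in background). Only the
	// presence of the key matters; the value is ignored.
	EvictionRequestAnnotationKey = "descheduler.alpha.kubernetes.io/request-evict-only"

	// EvictionInProgressAnnotationKey is expected to be set by the owning
	// controller while a background eviction is underway. The value of this
	// constant is a placeholder, not necessarily the one used in the PR.
	EvictionInProgressAnnotationKey = "descheduler.alpha.kubernetes.io/eviction-in-progress"
)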
Implementation of #1354.
When the feature is enabled, each pod with the descheduler.alpha.kubernetes.io/request-evict-only annotation will have its eviction API error examined for a specific error code/reason and message. If matched, the eviction of such a pod will be interpreted as the initiation of an eviction in background.
The first version of the feature does not provide any way to configure a timeout for pending evictions in background, nor an interval for how often pending evictions in background are checked for timeouts. Both are set to 10 minutes by default. The default values can be adjusted based on provided feedback.
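Expressed as code, the two fixed intervals described above would look roughly like this (the constant names are hypothetical; the feature currently exposes no knobs for them):

import "time"

const (
	// How long a pending eviction in background may stay assumed before it is
	// considered timed out.
	evictionRequestTimeout = 10 * time.Minute

	// How often pending evictions in background are re-checked for timeouts.
	evictionRequestCheckInterval = 10 * time.Minute
)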
TODO:
- EvictionsInBackground as a new feature gate

Refactorings before this PR can be merged: