1. Set the OpenTelemetry Collector replicas to 2 (or more).
2. Use a PodAntiAffinity rule, or request resources (CPU / memory) that are not available, so that not all of the Pods can be scheduled.
3. Although some Pods are still unschedulable, the target allocator allocates endpoints to them.
Shouldn't we wait for the pod to be scheduled before we allocate target endpoints to it?
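A minimal Go sketch of the suggested check, assuming a hypothetical, simplified `collectorPod` type rather than the real `k8s.io/api` Pod type or the target allocator's actual code. It relies on the fact that a Pod's `spec.nodeName` stays empty until the scheduler binds it to a node, so filtering on it excludes Pods stuck in Pending because of anti-affinity or unsatisfiable resource requests. `scheduledCollectors` is an illustrative name, not an existing function.

```go
package main

import "fmt"

// collectorPod is a hypothetical stand-in for a collector Pod.
// NodeName mirrors spec.nodeName, which is empty until the Pod is scheduled.
type collectorPod struct {
	Name     string
	NodeName string
}

// scheduledCollectors keeps only the collectors that are bound to a node,
// so unschedulable (Pending) Pods never receive target assignments.
func scheduledCollectors(pods []collectorPod) []collectorPod {
	var out []collectorPod
	for _, p := range pods {
		if p.NodeName != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []collectorPod{
		{Name: "collector-0", NodeName: "node-a"},
		{Name: "collector-1"}, // blocked by PodAntiAffinity or resource requests
	}
	for _, p := range scheduledCollectors(pods) {
		fmt.Println(p.Name)
	}
}
```

Running it prints only `collector-0`; the unscheduled replica is skipped until it lands on a node.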
sfc-gh-akrishnan changed the title on Oct 6, 2023, from
Bug: Target Allocator: Promotes a otel-collector pod in "pending state" to allocated
to
Bug: Target Allocator: Promotes a otel-collector pod in "pending state" to have targets allocated
Waiting is clearly right if the Pod in question is new, but much less clear if it's an existing Pod being rescheduled. Reassigning targets is a fairly expensive operation for the collectors themselves, as it flushes scrape caches, so we should avoid doing it carelessly. Maybe we should have a configurable timeout for existing Pods, so it's possible to control how long the allocator waits for a Pod before reassigning its targets?
#2528 didn't fix this; it just made the fix easier to implement. The main problem here is that it isn't clear to me what the behaviour should be. I think assigning targets to the following Pods works:
- Pods which are Ready
- Pods which were Ready less than X seconds ago and are now not Ready, but are also not Terminating
But I haven't completely thought this through. If anyone can think of any nasty edge cases for this problem, please speak up and let me know.