Bug: Target Allocator: Promotes a otel-collector pod in "pending state" to have targets allocated #2201

sfc-gh-akrishnan · 2023-10-06T20:51:31Z

How to reproduce:

Set the open-telemetry collector replicas to 2 (or more).
Use PodAntiAffinity, or request an unavailable resource (CPU / memory), so that not all of the pods are schedulable.
Although those pods are still unschedulable, the target allocator allocates endpoints to them.

Shouldn't we wait for the pod to be scheduled before we allocate target endpoints to it?
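
The check being asked for here could be sketched as a filter that skips any collector Pod not yet bound to a node. This is only an illustration: `collectorPod`, `isScheduled`, and `schedulablePods` are hypothetical names using a local stand-in type, not the target allocator's actual code or the Kubernetes client types. It relies on the fact that a Pod's `spec.nodeName` stays empty until the scheduler places it.

```go
package main

import "fmt"

// collectorPod is a hypothetical stand-in for the few Pod fields that matter
// here; it is not a type from the operator or from client-go.
type collectorPod struct {
	Name     string
	NodeName string // empty until the scheduler binds the Pod to a node
}

// isScheduled reports whether the Pod has been bound to a node.
func isScheduled(p collectorPod) bool {
	return p.NodeName != ""
}

// schedulablePods returns only the Pods that have been scheduled, i.e. the
// ones that could reasonably receive targets.
func schedulablePods(pods []collectorPod) []collectorPod {
	var out []collectorPod
	for _, p := range pods {
		if isScheduled(p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []collectorPod{
		{Name: "collector-0", NodeName: "node-a"},
		{Name: "collector-1"}, // Pending: anti-affinity keeps it unscheduled
	}
	for _, p := range schedulablePods(pods) {
		fmt.Println(p.Name)
	}
}
```

With the two Pods from the reproduction steps, only the scheduled one would remain in the set that receives targets.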

swiatekm · 2023-10-15T14:09:17Z

This is clear if the Pod in question is new, but much less clear if it's an existing Pod being rescheduled. Reassigning targets is a fairly expensive operation for the collectors themselves, as it flushes scrape caches, so we should avoid doing so carelessly. Maybe we should have a configurable timeout for existing Pods, so it's possible to control how long the allocator waits for a Pod before reassigning its targets?

swiatekm · 2024-05-01T14:51:38Z

#2528 didn't fix this, it just made the fix easier to implement. The main problem here is that it isn't clear to me what the behaviour should be. I think assigning targets to the following Pods would work:

Pods which are Ready
Pods which were ready less than X seconds ago, and are now not ready, but also not Terminating

But I haven't completely thought this through. If anyone can think of any nasty edge cases for this problem, please speak up and let me know.

jaronoff97 · 2024-05-01T14:55:21Z

sorry that was the auto-closer 😓

sfc-gh-akrishnan changed the title ~~Bug: Target Allocator: Promotes a otel-collector pod in "pending state" to allocated~~ Bug: Target Allocator: Promotes a otel-collector pod in "pending state" to have targets allocated Oct 6, 2023

pavolloffay added the area:target-allocator and bug labels Oct 9, 2023

pavolloffay mentioned this issue Oct 9, 2023

Bug: Target Allocator: Promotes a otel-collector pod in "pending state to #2200

Closed

swiatekm mentioned this issue Apr 30, 2024

[chore] Use informer to track collector Pods in target allocator #2528

Merged

swiatekm self-assigned this Apr 30, 2024

jaronoff97 closed this as completed in #2528 May 1, 2024

swiatekm reopened this May 1, 2024
