Bug: Target Allocator: Promotes a otel-collector pod in "pending state" to have targets allocated #2201

Open
sfc-gh-akrishnan opened this issue Oct 6, 2023 · 3 comments · Fixed by #2528
Labels: area:target-allocator, bug

Comments

@sfc-gh-akrishnan

sfc-gh-akrishnan commented Oct 6, 2023

How to reproduce:

  • Set the OpenTelemetry collector replicas to 2 (or more)
  • Use PodAntiAffinity, or request unavailable resources (CPU / memory), so that not all of the pods are schedulable
  • Although some pods are still unschedulable, the target allocator allocates endpoints to them

Shouldn't we wait for the pod to be scheduled before we allocate target endpoints to it?
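
To make the question concrete, here is a minimal sketch of the kind of filter being asked for. It is not the target allocator's actual code: the `Pod` struct and both function names are hypothetical stand-ins, and the only Kubernetes behaviour assumed is that a Pod's node name stays empty until the scheduler binds it to a node.

```go
package main

import "fmt"

// Pod is a hypothetical, trimmed-down stand-in for a Kubernetes Pod.
// It is NOT the real k8s.io/api type; it only carries what the sketch needs.
type Pod struct {
	Name     string
	Phase    string // e.g. "Pending" or "Running"
	NodeName string // stays empty until the scheduler binds the Pod to a node
}

// isScheduled reports whether the Pod has been bound to a node.
func isScheduled(p Pod) bool {
	return p.NodeName != ""
}

// scheduledCollectors keeps only the collector Pods that have been scheduled,
// so that unschedulable Pending Pods would not receive targets.
func scheduledCollectors(pods []Pod) []Pod {
	var out []Pod
	for _, p := range pods {
		if isScheduled(p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pods := []Pod{
		{Name: "collector-0", Phase: "Running", NodeName: "node-a"},
		{Name: "collector-1", Phase: "Pending"}, // unschedulable, no node yet
	}
	for _, p := range scheduledCollectors(pods) {
		fmt.Println("would allocate targets to", p.Name)
	}
}
```

With two replicas where one cannot be scheduled, only the scheduled collector would be considered for allocation under this sketch.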

@sfc-gh-akrishnan sfc-gh-akrishnan changed the title Bug: Target Allocator: Promotes a otel-collector pod in "pending state" to allocated Bug: Target Allocator: Promotes a otel-collector pod in "pending state" to have targets allocated Oct 6, 2023
@pavolloffay pavolloffay added the area:target-allocator and bug labels Oct 9, 2023
@swiatekm
Contributor

This is clear if the Pod in question is new, but much less clear if it's an existing Pod being rescheduled. Reassigning targets is a fairly expensive operation for the collectors themselves, as it flushes scrape caches, so we should avoid doing so carelessly. Maybe we should have a configurable timeout for existing Pods, so it's possible to control how long the allocator waits for a Pod before reassigning its targets?

@swiatekm
Contributor

swiatekm commented May 1, 2024

#2528 didn't fix this; it just made the fix easier to implement. The main problem here is that it isn't clear to me what the behaviour should be. I think assigning targets to the following Pods works:

  • Pods which are Ready
  • Pods which were ready less than X seconds ago, and are now not ready, but also not Terminating

But I haven't completely thought this through. If anyone can think of any nasty edge cases for this problem, please speak up and let me know.

@swiatekm swiatekm reopened this May 1, 2024
@jaronoff97
Contributor

sorry that was the auto-closer 😓
