Track number of tasks in executor as metric in scheduler job #29579
This ensures that the stats gauge for scheduler.tasks.running isn't always 0.
The PR that added the HA scheduler and removed the calculation of this metric, #10956, was merged two years ago, so I guess there is very little chance that @ashb remembers why it was removed. I just wonder whether, instead of validating whether this TI is queued or running, we could just use
My memory of this is hazy. I guess the main issue is how this will behave if you have more than one scheduler.
@Taragolis - it is also entirely possible that the right thing to do is just delete this metric and the docs for it, given that it has been wrong for so long.
That could be an option if we really do not know whether it is possible to get a more or less truthful value for this metric.
I think there should be a way to distinguish schedulers and have different metrics for them. But this is tricky with our approach, where we can literally just start running yet another scheduler (and the bad thing in this context is that we do not even know how many schedulers we are running). I'd be for dropping this metric altogether.
I agree. This looks tricky to get right, and meanwhile it has been broken for a long time. I would be for dropping it.
I am up for removing it. Separately, most (or all) of the scheduler metrics might suffer from the same problem when run in HA: each scheduler reports only its own number, without identifying which scheduler it came from. Adding a suffix of
Let's remove it then.
closes: #29578
I was unsure how to set things up in a test to cover this behavior, so any advice there would be welcome. This is based on how this worked before the incrementing of num_tasks_in_executor was deleted, but if there is a better way to do this I'm happy to update.
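For illustration only, here is a minimal sketch of the idea, not the actual change in this PR. It uses hypothetical stand-ins (FakeExecutor, RecordingStats, emit_running_gauge) rather than Airflow's real executor and Stats classes, and it assumes the gauge should count every task the executor currently holds, whether queued or running. It reports only one scheduler's own view, so it does not address running more than one scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class FakeExecutor:
    """Hypothetical stand-in for an executor; not Airflow's real class."""

    queued_tasks: dict = field(default_factory=dict)
    running: set = field(default_factory=set)


class RecordingStats:
    """Hypothetical stats client that remembers the last value per gauge."""

    def __init__(self):
        self.gauges = {}

    def gauge(self, name, value):
        self.gauges[name] = value


def emit_running_gauge(executor, stats):
    # Assumption: "tasks in executor" means queued plus running tasks
    # held by this one executor, as seen by this one scheduler.
    num_tasks_in_executor = len(executor.queued_tasks) + len(executor.running)
    stats.gauge("scheduler.tasks.running", num_tasks_in_executor)
    return num_tasks_in_executor
```

A test along these lines could build the fake executor with a known number of queued and running tasks, call the helper once, and assert on the recorded gauge value.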