@pdebelak
Contributor

This ensures that the stats gauge for scheduler.tasks.running isn't always 0.

closes: #29578

I was unsure how to set up a test for this behavior, so any advice there would be welcome. This is based on how the gauge worked before the incrementing of num_tasks_in_executor was deleted, but if there is a better way to do this I'm happy to update.

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Feb 16, 2023
@pdebelak pdebelak force-pushed the tasks-running-gauge-increment branch from 1a6a7ff to 15532a7 Compare February 16, 2023 18:58
@Taragolis
Contributor

The PR that added the HA scheduler and removed the calculation of this metric, #10956, was merged two years ago, so I guess there is a very small chance that @ashb remembers why it was removed.

I just wonder whether, instead of checking whether each TI is queued or running, we could use len(self.executor.queued_tasks) + len(self.executor.running) as the value for this metric. But I'm not familiar with SchedulerJob, so it would be better to hear the opinion of someone more confident in this area than me.
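
To make the suggestion concrete, here is a minimal sketch of deriving the gauge value from executor state. `FakeExecutor`, `RecordingStats`, and `emit_running_gauge` are hypothetical stand-ins written for illustration, not Airflow classes; only the `queued_tasks` and `running` attribute names come from the expression above, and the container types are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class FakeExecutor:
    """Hypothetical stand-in exposing only the two attributes named above."""

    queued_tasks: dict = field(default_factory=dict)  # assumed: key -> queued work
    running: set = field(default_factory=set)  # assumed: keys of running tasks


class RecordingStats:
    """Hypothetical stats client that remembers the last value per gauge."""

    def __init__(self):
        self.gauges = {}

    def gauge(self, name, value):
        self.gauges[name] = value


def emit_running_gauge(executor, stats):
    """Report queued plus running tasks held by this one executor."""
    value = len(executor.queued_tasks) + len(executor.running)
    stats.gauge("scheduler.tasks.running", value)
    return value
```

Note that this only counts what a single scheduler's executor holds, so with several schedulers each process would report its own partial number under the same metric name.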

@ashb
Member

ashb commented Feb 18, 2023

Memory of this is hazy - I guess the main issue is how you will behave if you have more than one scheduler

@pdebelak
Contributor Author

@Taragolis - it is also entirely possible that the right thing to do is just delete this metric and the docs for it, given that it has been wrong for so long.

@Taragolis
Contributor

That could be an option if we really do not know whether it is possible to get a more or less truthful value for this metric.

@potiuk
Member

potiuk commented Feb 20, 2023

> Memory of this is hazy - I guess the main issue is how you will behave if you have more than one scheduler

I think there should be a way to distinguish schedulers and have different metrics for them. But this is tricky with our approach where we can just literally start running yet-another-scheduler (and the bad thing in this context is that we do not even know how many schedulers we are running).

I'd be for dropping this metric altogether.

@pierrejeambrun
Member

pierrejeambrun commented Feb 24, 2023

I agree. This looks tricky to get right, and meanwhile it's been broken for a long time. I would be for dropping it.

@kaxil
Member

kaxil commented Mar 8, 2023

I will be up for removing it.

Separately, most (or all) of the scheduler metrics might suffer from the same problem when run in HA: they report only a number without identifying which scheduler it came from. Adding a suffix of scheduler_job_id to those metrics is a potential solution.
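
A rough sketch of the suffix idea, assuming a simple gauge client that keeps the last value written per metric name. `RecordingStats`, `suffixed`, and the job ids 101 and 102 are hypothetical and exist only for illustration; this is not how Airflow currently names its metrics.

```python
class RecordingStats:
    """Hypothetical stats client that remembers the last value per gauge."""

    def __init__(self):
        self.gauges = {}

    def gauge(self, name, value):
        self.gauges[name] = value


def suffixed(metric, scheduler_job_id):
    """Append the scheduler's job id so each scheduler gets its own series."""
    return f"{metric}.{scheduler_job_id}"


stats = RecordingStats()
# Two schedulers (made-up job ids) reporting their own counts.
for job_id, running in ((101, 4), (102, 7)):
    stats.gauge(suffixed("scheduler.tasks.running", job_id), running)
```

Without the suffix, the second write would overwrite the first under a last-write-wins gauge. The trade-off is one extra metric series per scheduler job.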

@potiuk
Member

potiuk commented Mar 10, 2023

> I will be up for removing it.
>
> Separately, most (or all) of the scheduler metrics might suffer from the same problem when run in HA: they report only a number without identifying which scheduler it came from. Adding a suffix of scheduler_job_id to those metrics is a potential solution.

Let's remove it then.

Development

Successfully merging this pull request may close these issues.

scheduler.tasks.running metric is always 0
