Subsystem metrics for task manager #12235

Merged
merged 1 commit into from
Jun 14, 2022

Conversation


@fosterseth fosterseth commented May 13, 2022

SUMMARY

Adds a handful of subsystem metrics for the task manager. It tracks data from the last-run task manager cycle only. The metrics are saved only at the very end of the task manager schedule() call. A consequence of this is that useful task manager data may be quickly overwritten before Prometheus can scrape the endpoint.

Imagine this timeline of events:

1. Prometheus scrapes /api/v2/metrics
2. queue a bunch of jobs
3. task manager runs and processes all these jobs (maybe takes a long time to run)
4. another task manager is scheduled and processes no jobs (takes ~0 seconds to run)
5. Prometheus scrapes /api/v2/metrics

Prometheus would have missed the first task manager cycle that did a lot of processing, and would only "see" the data from the task manager cycle that ran 0 jobs.

To combat this, the task manager will track each time it records metrics. If the metrics were last written less than 15s ago (settings.SUBSYSTEM_METRICS_TASK_MANAGER_RECORD_INTERVAL), the task manager will not record metrics to Redis. This gives Prometheus enough time to scrape the metrics endpoint and capture a snapshot of the task manager run.
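
The record-interval check described above could look roughly like the sketch below. This is not the code from this PR: the class name `ThrottledRecorder`, the in-memory `store` standing in for Redis, the injectable `clock`, and the metric name `tasks_started` are all assumptions made for illustration. Only the 15s interval comes from the description.

```python
import time

# Hypothetical stand-in for settings.SUBSYSTEM_METRICS_TASK_MANAGER_RECORD_INTERVAL
RECORD_INTERVAL_SECONDS = 15


class ThrottledRecorder:
    """Writes a metrics snapshot only if the previous write is old enough."""

    def __init__(self, interval=RECORD_INTERVAL_SECONDS, clock=time.monotonic):
        self.interval = interval
        self.clock = clock          # injectable so the behavior can be tested
        self.last_recorded = None   # time of the last successful write
        self.store = {}             # in-memory stand-in for Redis (assumption)

    def record(self, snapshot):
        """Return True if the snapshot was written, False if it was skipped."""
        now = self.clock()
        too_soon = (
            self.last_recorded is not None
            and now - self.last_recorded < self.interval
        )
        if too_soon:
            # Keep the earlier snapshot so a scraper still has time to see it.
            return False
        self.store = dict(snapshot)
        self.last_recorded = now
        return True


# Replay the timeline from the description with a fake clock.
fake_now = [0.0]
recorder = ThrottledRecorder(interval=15, clock=lambda: fake_now[0])

busy_written = recorder.record({"tasks_started": 40})   # long, busy cycle
fake_now[0] = 1.0
idle_written = recorder.record({"tasks_started": 0})    # idle cycle 1s later
```

With this shape, the idle cycle that runs one second after the busy one is skipped, so the busy cycle's numbers stay in the store until the interval has elapsed. The trade-off is that any cycle finishing inside the interval is never reported.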

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME
  • API
AWX VERSION
awx: 21.0.1.dev103+gff74e538cd

@fosterseth fosterseth force-pushed the subsystem_metrics_task_manager branch from ff74e53 to ce0b28e May 30, 2022 19:24
@fosterseth fosterseth changed the title from "[wip] Subsystem metrics for task manager" to "Subsystem metrics for task manager" May 31, 2022
@fosterseth fosterseth force-pushed the subsystem_metrics_task_manager branch from 3e649e6 to 55d0c54 May 31, 2022 15:10
@fosterseth fosterseth force-pushed the subsystem_metrics_task_manager branch 2 times, most recently from e262f6a to 26d4e36 June 9, 2022 18:23

kdelee commented Jun 13, 2022

@fosterseth @AlanCoding what's between this and merging?


def record_aggregate_metrics_and_exit(self, *args):
    self.record_aggregate_metrics()
    sys.exit(1)

What does this sys.exit(1) do to the transaction? I think it rolls it back, right? I assume that would be the same as the current behavior.
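
One way to reason about the question above: `sys.exit(1)` raises `SystemExit`, which propagates like any other exception, so a context manager wrapping the work sees it in `__exit__`. The sketch below uses a hypothetical `FakeTransaction` class, not Django's or AWX's real transaction handling, to show that a "commit on clean exit, roll back on exception" manager would take the rollback path. Whether the task manager's actual transaction behaves this way is an assumption that would need to be checked against the real code.

```python
import sys


class FakeTransaction:
    """Hypothetical stand-in for a database transaction context manager."""

    def __init__(self):
        self.outcome = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Commit on a clean exit, roll back if any exception is propagating.
        self.outcome = "commit" if exc_type is None else "rollback"
        return False  # do not swallow the exception


def run_and_exit(txn):
    with txn:
        sys.exit(1)  # raises SystemExit(1) inside the transaction block


txn = FakeTransaction()
try:
    run_and_exit(txn)
except SystemExit as exit_signal:
    # Caught here only so the sketch can inspect the result afterwards.
    exit_code = exit_signal.code
```

Here `txn.outcome` ends up as "rollback" and the exit code is 1, because `SystemExit` reaches `__exit__` before the process terminates.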

@fosterseth fosterseth force-pushed the subsystem_metrics_task_manager branch from 0046c4d to 2f82b75 June 14, 2022 15:00
@fosterseth fosterseth merged commit 30c060c into ansible:devel Jun 14, 2022