Slow-running rules from one tenant can cause the PrometheusRules API to time out for all tenants #5745

Closed
@emanlodovice

Describe the bug
Currently the manager's SyncRuleGroups and GetRules methods share the same lock. If SyncRuleGroups becomes slow for one tenant, every GetRules call has to wait a long time to acquire the lock, so the rules API can time out for all tenants.

SyncRuleGroups can become slow when it updates a rule group that contains slow-running rules, because the group waits for the in-flight rule evaluation to finish before it stops.

https://github.com/prometheus/prometheus/blob/main/rules/group.go#L249
https://github.com/prometheus/prometheus/blob/main/rules/group.go#L426-L430
