Camel K operator monitoring #1762

Merged (25 commits) on Nov 10, 2020

@astefanutti astefanutti commented Oct 14, 2020

This PR:

  • Exposes the metrics container port in the operator deployment
  • Registers metrics for the relevant operator SLIs to the /metrics endpoint exposed by the operator:
    • Resources reconciliation duration histogram
    • Build duration histogram
    • Build queue duration histogram
    • Build recovery attempts histogram
    • Time to first integration readiness histogram
  • Adds a monitoring option to the kamel install command, so that kamel install --monitoring=true:
    • Creates a default PodMonitor resource targeting the operator metrics endpoint
    • Creates a default PrometheusRule resource with alerting rules for:
      • Aggregated reconciliation request duration SLO
      • Aggregated reconciliation request failure SLO
      • Build duration SLOs (warning for 1% > 2m, critical for 1% > 5m)
      • Build failure and error SLOs (warning for failures, critical for errors)
      • Build queue duration SLOs (warning for 1% > 1m, critical for 1% > 5m)
  • Adds an option to configure the metrics endpoint port
  • Exposes a health endpoint with a liveness probe
    • Note that no readiness probe is added, as it would conflict with leader election during a rolling update of the operator. Besides, readiness isn't very relevant for the Camel K operator anyway.

Fixes #1267.

TODO:

The metrics should make it possible to discriminate between user and platform errors (#1633).

Release Note

feat: Camel K operator monitoring

@astefanutti astefanutti added this to the 1.3.0 milestone Oct 14, 2020
@astefanutti astefanutti marked this pull request as ready for review November 3, 2020 10:11
@astefanutti commented

Let's merge this and create a follow-up PR for the documentation.

@astefanutti astefanutti merged commit 6afbdc2 into apache:master Nov 10, 2020
Successfully merging this pull request may close these issues.

Expose operator related metrics