Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add metrics about Matrix API calls #68

Merged
merged 8 commits into from
Nov 28, 2021

Conversation

jaller94
Copy link
Contributor

@jaller94 jaller94 commented Nov 15, 2021

This is an alternative to #67.

This adds the following Prometheus metrics:

  • bridge_matrix_api_calls Counter: Number of API calls made
  • bridge_matrix_api_calls_failed Counter: Number of API calls which failed

The names and types have been chosen to match those of https://github.com/matrix-org/matrix-appservice-bridge.

Incomplete list of things which could be improved

  • Add a meaningful method labels to other parts of the API, not just events.py.

Differences to PR #67

  • This covers all Matrix API calls, not just those in events.js, however, all others have an empty string as the method label.
  • This counts retries as separate API calls. Add metrics about Matrix API events #67 considers retries to be part of the first call.
  • Removed bridge_matrix_request_seconds s it was implemented differently from the NodeJS bridges.

@Half-Shot
Copy link
Contributor

N.B. I think matrix_request_seconds is bridge_matrix_request_seconds because the bridge lib prefixes "bridge".

@tulir tulir self-requested a review November 17, 2021 16:01
mautrix/api.py Outdated
"The number of Matrix client API calls made", ("method",))
API_CALLS_FAILED = Counter("bridge_matrix_api_calls_failed",
"The number of Matrix client API calls which failed", ("method",))
MATRIX_REQUEST_SECONDS = Histogram("bridge_matrix_request_seconds",
Copy link
Contributor

@Half-Shot Half-Shot Nov 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practise I think we got this wrong. The IRC/Slack bridges use this name to mean requests coming from matrix and the outcome (e.g. did we fail to send a message to IRC, how long did it take).

We found this out when the change got deployed and alerts kept firing because we wern't doing this enough.

I'd propose that we move this Histogram to the event handling portion of the library and report the ourtcome of a processed AS event from a transaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only incorrect metric is bridge_matrix_request_seconds, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the others look fine!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed bridge_matrix_request_seconds from this PR. I may come in another PR.

@jaller94 jaller94 requested a review from tulir November 22, 2021 16:01
@tulir tulir merged commit c1fd50b into mautrix:master Nov 28, 2021
@jaller94 jaller94 deleted the j94/call-metrics-alternative branch November 29, 2021 10:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants