[feature] Prometheus metrics implementation #1218

Closed
3 of 5 tasks
tsmethurst opened this issue Dec 6, 2022 · 13 comments
Labels: future tech, performance

Comments

@tsmethurst (Contributor) commented Dec 6, 2022:

We should look into instrumenting GoToSocial with prometheus metrics gathering, so that admins can glean more insight into the performance and load of their instance.

We should start with just one or two useful metrics and play around with it, rather than worrying about implementing everything at once.

We should be wary of performance impact, and make this toggleable so that if it's off it doesn't cause any memory/cpu overhead.

This should likely be 'off' by default, since most people won't need or want it.
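
A minimal sketch of what such a toggle might look like with plain client_golang, assuming a hypothetical `metrics-enabled` config value (the package, function name, and collectors here are illustrative, not GoToSocial's actual code):

```go
package metrics

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Handler returns a /metrics handler when metrics are enabled, or nil when
// they are disabled, in which case no collectors are registered and nothing
// is gathered at all. "enabled" stands in for a hypothetical
// `metrics-enabled` config value.
func Handler(enabled bool) http.Handler {
	if !enabled {
		return nil // off by default: zero gathering overhead
	}

	// A private registry (instead of the global default) keeps the toggle
	// in full control of what actually gets collected.
	reg := prometheus.NewRegistry()
	reg.MustRegister(
		collectors.NewGoCollector(),
		collectors.NewProcessCollector(prometheus.ProcessCollectorOpts{}),
	)
	return promhttp.HandlerFor(reg, promhttp.HandlerOpts{})
}
```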

Here's a little checklist to get us going:

@tsmethurst added the future tech and performance labels on Dec 6, 2022
@NyaaaWhatsUpDoc (Member) commented:

Another checkmark worth adding (so it's not forgotten): minimize metrics overhead, allow disabling

@tsmethurst (Contributor, Author) commented:

> Another checkmark worth adding (so it's not forgotten): minimize metrics overhead, allow disabling

Added :)

@daenney (Member) commented Dec 9, 2022:

One interesting thing in Prometheus that's currently in preview in 2.40.x is the new native/sparse histograms. The current histograms have a slight usability issue in that the buckets need to be defined upfront and you have to hope you got them right. In the case of GTS that might also be hard, because instances of different sizes, or running on smol devices, might produce fairly different measurements for things like HTTP response latency. The native type doesn't suffer from this, making it much more useful in practice.
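
For comparison, a sketch of the two definitions in client_golang (native histogram support landed around v1.14 behind the `NativeHistogramBucketFactor` option; metric names here are made up, and the Prometheus server would need `--enable-feature=native-histograms`):

```go
package main

import "github.com/prometheus/client_golang/prometheus"

func main() {
	// Classic histogram: buckets have to be guessed upfront, and values that
	// fit a big instance may fit a small one badly (or vice versa).
	classic := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds_classic", // illustrative name
		Help:    "Request latency with fixed buckets.",
		Buckets: []float64{0.005, 0.01, 0.05, 0.1, 0.5, 1, 5},
	})

	// Native (sparse) histogram: buckets are derived automatically from a
	// growth factor, so the same definition adapts to very different latency
	// profiles on big and small instances alike.
	native := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:                        "http_request_duration_seconds_native",
		Help:                        "Request latency as a native histogram.",
		NativeHistogramBucketFactor: 1.1, // ~10% relative bucket width
	})

	classic.Observe(0.042)
	native.Observe(0.042)
}
```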

@tsmethurst (Contributor, Author) commented:

If we implement opentelemetry tracing (#1230), it seems we can also use a prometheus exporter for that, to allow prometheus to pull the opentelemetry metrics: https://github.com/open-telemetry/opentelemetry-go/tree/main/exporters/prometheus
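
A rough sketch of that wiring, assuming the `go.opentelemetry.io/otel/exporters/prometheus` exporter and the OTel metric SDK (the exact setup would depend on whatever the #1230 tracing work ends up using):

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/prometheus"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

func main() {
	// The exporter acts as the OTel SDK's metric reader and registers with
	// the default Prometheus registerer unless configured otherwise.
	exporter, err := prometheus.New()
	if err != nil {
		log.Fatal(err)
	}
	otel.SetMeterProvider(sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter)))

	// Anything recorded through OTel instruments now shows up here for
	// Prometheus to scrape.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9091", nil)) // port is illustrative
}
```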

@LittleFox94 (Contributor) commented:

Useful metrics I planned to add but just didn't get around to yet (a rough sketch of a few of these follows after the list):

  • the usual runtime metrics (goroutines, memory, ... standard Go collectors)
  • number of users
  • number of posts
  • number of instances federated with
  • the usual HTTP request metrics (count by status code, latency, request/response body size, ...)
    • split between server-to-server / server-to-client
    • s2s split between if requesting or replying
  • database metrics, especially latency (as bun already warns about that)
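
A sketch of a couple of these with plain client_golang; the count callbacks and metric names below are hypothetical, not real GtS functions or finalized names:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
)

// RegisterInstanceMetrics wires up the standard Go collectors plus a few
// instance-level gauges. countUsers, countPosts and countFederatedInstances
// are hypothetical callbacks into storage.
func RegisterInstanceMetrics(reg *prometheus.Registry, countUsers, countPosts, countFederatedInstances func() float64) {
	// "the usual runtime metrics (goroutines, memory, ...)"
	reg.MustRegister(collectors.NewGoCollector())

	// Gauges evaluated lazily at scrape time, so they cost nothing between scrapes.
	reg.MustRegister(prometheus.NewGaugeFunc(prometheus.GaugeOpts{
		Name: "gotosocial_users_total", // illustrative name
		Help: "Number of users on this instance.",
	}, countUsers))
	reg.MustRegister(prometheus.NewGaugeFunc(prometheus.GaugeOpts{
		Name: "gotosocial_statuses_total",
		Help: "Number of posts on this instance.",
	}, countPosts))
	reg.MustRegister(prometheus.NewGaugeFunc(prometheus.GaugeOpts{
		Name: "gotosocial_federated_instances_total",
		Help: "Number of instances federated with.",
	}, countFederatedInstances))
}
```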

@LittleFox94 (Contributor) commented:

Re the requested "Minimize metrics overhead, allow disabling" @NyaaaWhatsUpDoc: I think it's better to filter that at the Prometheus level? Most applications I come across don't have something like that, and for them you'd have to filter in Prometheus anyway, making that the better source of truth. Does this make sense, what do you think?

@NyaaaWhatsUpDoc (Member) commented:

My point is to not have Prometheus metrics gathered in the binary, and ideally not even compiled into the binary, if they're not needed. So: no overhead if not enabled.

@LittleFox94 (Contributor) commented:

Ah it's about not even gathering them, reducing load in GtS itself - got it, thanks!

@raspbeguy commented:

I don't know if Mastodon exports OpenMetrics, but if it does it would be wise to follow their metric naming.

@Tsuribori (Contributor) commented:

#1623 opens up the following method:

> If we implement opentelemetry tracing (#1230), it seems we can also use a prometheus exporter for that, to allow prometheus to pull the opentelemetry metrics: https://github.com/open-telemetry/opentelemetry-go/tree/main/exporters/prometheus

"tracing" and "metrics" should probably be unified under "observability" regarding build flags etc.?

@LittleFox94 (Contributor) commented:

> #1623 opens up the following method:
>
> > If we implement opentelemetry tracing (#1230), it seems we can also use a prometheus exporter for that, to allow prometheus to pull the opentelemetry metrics: https://github.com/open-telemetry/opentelemetry-go/tree/main/exporters/prometheus
>
> "tracing" and "metrics" should probably be unified under "observability" regarding build flags etc.?

Real proper tracing is something I'd not use; metrics, on the other hand, I'd use heavily - so turning them on and off separately makes sense to me.
But I'd be fine with all of that being in the binary all the time and it being a runtime config (config file/CLI flags) instead of build flags.
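
A sketch of what separate runtime toggles might look like, with made-up config field names standing in for whatever the real config file keys / CLI flags end up being (the tracing init is left as a stub, since that belongs to #1230):

```go
package observability

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/collectors"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Config stands in for GtS's real configuration; both field names are made
// up for illustration.
type Config struct {
	MetricsEnabled bool
	TracingEnabled bool
}

// Setup keeps both features compiled into the binary but activates them
// independently at runtime, so metrics can be on while tracing stays off.
func Setup(cfg Config, mux *http.ServeMux) {
	if cfg.MetricsEnabled {
		reg := prometheus.NewRegistry()
		reg.MustRegister(collectors.NewGoCollector())
		mux.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{}))
	}
	if cfg.TracingEnabled {
		initTracing()
	}
}

// initTracing is intentionally a stub here; the actual tracing setup is the
// subject of #1230.
func initTracing() {}
```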

@Tsuribori (Contributor) commented:

I could work on this; getting the runtime metrics and Gin HTTP metrics was rather straightforward.

@tsmethurst (Contributor, Author) commented:

Closed by #2334

We might want to add additional metrics later, but we can make smaller issues for that. For now, we record Go, Gin, and Bundb metrics with this thing, which is plenty to get started with!
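
For reference, a rough sketch of how those three sources might hang together with the OTel metric SDK and the Prometheus exporter. This is an assumption-heavy outline, not the code from #2334: the Gin middleware is hand-rolled, and the `bunotel` option, metric name, and scope name are illustrative.

```go
package metrics

import (
	"time"

	"github.com/gin-gonic/gin"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"github.com/uptrace/bun"
	"github.com/uptrace/bun/extra/bunotel"
	"go.opentelemetry.io/contrib/instrumentation/runtime"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/prometheus"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)

// Setup wires Go runtime, Gin HTTP and bun database metrics into an OTel
// meter provider backed by the Prometheus exporter, then exposes /metrics.
func Setup(engine *gin.Engine, db *bun.DB) error {
	exporter, err := prometheus.New()
	if err != nil {
		return err
	}
	otel.SetMeterProvider(sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter)))

	// Go runtime metrics (goroutines, GC, memory, ...).
	if err := runtime.Start(); err != nil {
		return err
	}

	// Gin HTTP metrics: a hand-rolled middleware recording request latency.
	meter := otel.Meter("gotosocial") // instrumentation scope name is illustrative
	latency, err := meter.Float64Histogram("http.server.request.duration")
	if err != nil {
		return err
	}
	engine.Use(func(c *gin.Context) {
		start := time.Now()
		c.Next()
		latency.Record(c.Request.Context(), time.Since(start).Seconds())
	})

	// Bun database metrics via the bunotel query hook.
	db.AddQueryHook(bunotel.NewQueryHook(bunotel.WithDBName("gotosocial")))

	// Expose everything for Prometheus to scrape.
	engine.GET("/metrics", gin.WrapH(promhttp.Handler()))
	return nil
}
```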
