As a Pages Operator, I want a prometheus #178

Open
soutenniza opened this issue May 16, 2022 · 8 comments

Comments

@soutenniza
Contributor

soutenniza commented May 16, 2022

In order to have better alerting, Corwin and friends want Pages Prometheus

Acceptance Criteria

  • Prometheus-Pages Staging
  • Prometheus-Pages Production

Security considerations

This will improve security because alerts!

Implementation sketch

@bengerman13
Contributor

@davemcorwin can you spell out better what parts you want Pages to own and what parts you want the platform to own?
e.g. are we standing this up and you're configuring the metrics shipping and alerts or...?
Also - is prometheus a requirement, or would alerting from the logging stack work?

@davemcorwin

  1. I don't know that much about Prometheus, but I would guess that we just want to be able to send metrics and configure alerts, I doubt that we would need to manage/administer the application and would prefer y'all to do it. I understand that Prometheus' primary means of collecting metrics is "pull" rather than "push" but I don't think we have a use case for that right now.
  2. I don't know how to "alert from the logging stack".
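For readers less familiar with the "pull" model mentioned in item 1: the application keeps its metrics in memory and renders them as plain text, and the Prometheus server fetches that text on a schedule. The following is only a minimal sketch of that idea using the standard library; the `Counter` class and the metric name `pages_builds_total` are hypothetical and not taken from any Pages code.

```python
from collections import defaultdict


class Counter:
    """Tiny in-memory counter that renders Prometheus-style text output.

    Hypothetical illustration only; a real service would normally use an
    existing client library instead of hand-rolling this.
    """

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        # Maps a sorted tuple of (label, value) pairs to the running total.
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        key = tuple(sorted(labels.items()))
        self._values[key] += amount

    def render(self):
        """Return the text a scraper would receive from a metrics endpoint."""
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    builds = Counter("pages_builds_total", "Number of site builds (assumed metric).")
    builds.inc(status="success")
    builds.inc(status="error")
    print(builds.render(), end="")
```

In the pull model the application only has to expose this text over HTTP; where the scraper runs and who configures the alerts is the ownership question raised above.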

@bengerman13
Contributor

Alerting from the logging stack is a potential benefit of our switch to OpenSearch. There's still a lot to be defined on that portion, but it's feasible that users may be able to define alerts based on cf logs (which include container-level metrics and router logs).
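To make "alerting from logs" concrete, here is a rough sketch of the kind of rule that could be evaluated over a window of router log entries. It is an assumption for illustration only: the entry shape (a dict with a `status` field), the function names, and the 5% threshold are all hypothetical, and nothing here reflects how the OpenSearch-based alerting will actually be defined.

```python
def error_ratio(entries):
    """Fraction of entries in the window whose HTTP status is 5xx."""
    total = len(entries)
    if total == 0:
        return 0.0
    errors = sum(1 for entry in entries if 500 <= entry["status"] <= 599)
    return errors / total


def should_alert(entries, threshold=0.05):
    """Return True when the 5xx ratio in the window exceeds the threshold."""
    return error_ratio(entries) > threshold


if __name__ == "__main__":
    # Hypothetical window of already-parsed router log entries.
    window = [{"status": 200}] * 9 + [{"status": 502}]
    print(should_alert(window))
```

A rule like this works only on what already appears in the logs; custom application metrics would have to be written out as log lines first.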

@davemcorwin

I don't know enough about the differences between these two options to say for sure. I imagine that for anything custom we could either send to Prometheus or spit out something in the logs that could be used for alerts. The upside of Prometheus is that some systems have libraries for exporting metrics to Prometheus that could drop in, like for our background job library.

@davemcorwin changed the title from "As a Pages Opeartor, I want a prometheus" to "As a Pages Operator, I want a prometheus" on May 16, 2022
@bengerman13
Contributor

Yeah, this definitely sounds more like what Prometheus is for.
The catch is that the way we currently deploy Prometheus does not have a good role-based permissions model, so getting a Prometheus set up for Pages folks will take some research and work, and probably an SCR for one or both of us.
If the OpenSearch solution works, it's work we might already be doing, and it would be valuable for all our users.

@davemcorwin

hmmmm, ok, my thought was this wouldn't require an SCR, since after our SCR we are actually part of cloud.gov.

@bengerman13
Contributor

It might not, but I think it will. We'll need to define a new authentication mechanism for this (since we don't have proper RBAC on the Prometheus stack) and a new alert path, both of which seem likely to me to trigger an SCR.

@davemcorwin

OK, maybe we can touch base about it. I don't understand why those changes would be necessary or SCR-worthy.
