
Expose prometheus with basic auth #1091

Merged 18 commits · Mar 18, 2022

Conversation

@yuvipanda (Member)

This is the beginning of implementing idea 1 from the
list @GeorgianaElena made in
#328 (comment).

We have one prometheus running per cluster, but manage many clusters.
A single grafana that can connect to all such prometheus clusters
will help with monitoring as well as reporting. So we need to expose
it as securely as possible to the external world, as it can contain
private information.

In this case, we're using https + basic auth provided by
nginx-ingress
(https://kubernetes.github.io/ingress-nginx/examples/auth/basic/)
to safely expose prometheus to the outside world. We can then
use a grafana that knows these usernames / passwords to access each
prometheus instance. Each cluster needs its own username / password
(generated with pwgen 64 1), so users in one cluster cannot access
the prometheus of another cluster.

Ref #328
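
As a quick illustration of the end result (the hostname and credentials below are placeholders, not values from this PR), querying the exposed prometheus then requires the per-cluster credentials:

```bash
# Hypothetical hostname and credentials - substitute the per-cluster
# values from the encrypted support secrets.
# Without -u, nginx-ingress responds with 401 Unauthorized; with valid
# credentials, prometheus serves its query API as usual.
curl -u "<username>:<password>" \
  "https://prometheus.<cluster-domain>/api/v1/query?query=up"
```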

@yuvipanda (Member, Author) commented Mar 12, 2022

I'm manually generating and checking in the username / password for each prometheus instead of autogenerating it in the deployer, as I sense that's the preferred approach now based on conversations with @consideRatio and @sgibson91. It keeps the config explicit.

TODO:

  • Do this for all clusters
  • Document this as part of setting up the cluster

/cc @GeorgianaElena as well.

Except meom-ige and farallon, which don't yet have support
clusters set up
@yuvipanda (Member, Author)

I used a tiny helper script to generate these encrypted secret files:

```bash
#!/bin/bash
# Generate a per-cluster basic auth username / password for prometheus,
# write it to the cluster's support secrets file, and encrypt it in
# place with sops.
set -euo pipefail

F="${1}/enc-support.secret.yaml"
echo "prometheusAuth:" > "$F"
echo "  username: $(pwgen 64 1)" >> "$F"
echo "  password: $(pwgen 64 1)" >> "$F"
sops -i -e "$F"
echo "Done $F"
```

Then ran `find . -type d | xargs -L1 ./gen.bash` to run it for all directories inside `config/clusters`.

However, meom-ige and farallon still don't have a support cluster, so this can't be used there yet.

@yuvipanda (Member, Author)

Note - I've already run deploy-support for all these :)

run `pre-commit run -a`
@consideRatio (Contributor) commented Mar 12, 2022

Hmmm, the support chart could also do the trick the jupyterhub chart does and generate the secrets into a k8s Secret - if prometheus could accept reading the credentials from there.

It may not be what makes sense; I'm just floating the idea to ensure it's considered as an option before we rule it out. Someplace, the grafana server would still need access to those secrets for the other clusters, so it would probably be relevant to keep them available centrally rather than probing multiple clusters whenever they are needed.

@yuvipanda (Member, Author)

@consideRatio yeah, that's the other option - but that would require giving the centralized grafana kubernetes API access to every single cluster, which I would very much prefer not to do.

@consideRatio (Contributor) left a comment

This is a nice beginning!

My understanding summarized

  1. We assume the prometheus subdomain for each cluster points to that cluster's nginx-ingress controller.
  2. Letting the support chart create a k8s Secret to hold the username/password for prometheus access, set via support chart values.
  3. Letting the support chart's dependent prometheus Helm chart create a k8s Ingress resource that references the created k8s Secret with username/password. This reference is an annotation that an nginx-ingress controller understands: we wish to require basic authentication using the username/password provided in the k8s Secret (see the sketch after the action points below).
  4. Later we manually (?) configure a central Grafana instance through its web UI to have many data sources, one for each cluster. Following this, dashboards there can inspect any given cluster, or some dashboards can summarize the state of all clusters.

A few action points

  • Add a k8s Secret template in the support chart
  • Ensure the relevant new sub-domain names get CNAME entries in the DNS servers we control; I figure they should point to the primary domain name we use, which in turn points to the ingress controller's IP.
  • Point out a specific Grafana instance to be the central instance we use (the one in the 2i2c cluster, right?)
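
For concreteness, here is a minimal sketch of the mechanism in points 2-3, following the nginx-ingress basic auth example linked above. All resource names and the hostname are hypothetical, not the actual support chart templates; nginx-ingress expects the Secret to carry an htpasswd-formatted `auth` key.

```yaml
# Hypothetical names throughout; the real templates live in the support chart.
apiVersion: v1
kind: Secret
metadata:
  name: prometheus-basic-auth
type: Opaque
stringData:
  # htpasswd-formatted "<username>:<hashed-password>", e.g. generated with:
  #   htpasswd -nb <username> <password>
  auth: <htpasswd output>
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  annotations:
    # The annotations nginx-ingress understands for basic auth:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: prometheus-basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication required"
spec:
  rules:
    - host: prometheus.<cluster-domain>
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: support-prometheus-server  # hypothetical service name
                port:
                  number: 80
```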

Review threads (resolved): deployer/cluster.py · helm-charts/support/values.schema.yaml · helm-charts/support/values.yaml
yuvipanda and others added 5 commits March 14, 2022 13:42
Also actually add the secrets file
Co-authored-by: Erik Sundell <erik.i.sundell@gmail.com>
Co-authored-by: Erik Sundell <erik.i.sundell@gmail.com>
@consideRatio's reasoning in 2i2c-org#1091 (comment)
is pretty flawless.

Co-authored-by: Erik Sundell <erik.i.sundell@gmail.com>
@consideRatio (Contributor) left a comment

This LGTM - if needed, let's iterate from here so we don't slow down too much!

I think the key point now in relation to this PR is that it comes with the creation of additional domain names etc., and those aspects may need documentation updates.

@yuvipanda (Member, Author)

I've added docs on what needs to happen with DNS, and I think it's already happened for all the clusters mentioned here.

@consideRatio (Contributor) left a comment

With these comments resolved, this LGTM! Nice work on this @yuvipanda!

Review threads (resolved): docs/howto/operate/grafana.md · helm-charts/support/values.yaml · deployer/cluster.py
@yuvipanda (Member, Author)

I think what's left here is to automate adding all of these as data sources to one centralized grafana.

@yuvipanda (Member, Author)

I think this is ready to go!

@yuvipanda (Member, Author)

Still need to write code that'll populate the central grafana with datasources, but let's get this merged in the meantime.
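
For reference, a rough sketch of what that automation could look like, using Grafana's datasource HTTP API. The URLs, token, and names here are placeholders; this is not the deployer code that eventually landed:

```bash
# Hypothetical values - substitute the central grafana URL / API token and
# the per-cluster prometheus credentials from the encrypted support secrets.
GRAFANA_URL="https://grafana.<central-domain>"
GRAFANA_TOKEN="<grafana-api-token>"

# Register one prometheus-type datasource per cluster, authenticating to
# the exposed prometheus with its per-cluster basic auth credentials.
curl -X POST "${GRAFANA_URL}/api/datasources" \
  -H "Authorization: Bearer ${GRAFANA_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "<cluster-name>",
    "type": "prometheus",
    "access": "proxy",
    "url": "https://prometheus.<cluster-domain>",
    "basicAuth": true,
    "basicAuthUser": "<prometheus-username>",
    "secureJsonData": {"basicAuthPassword": "<prometheus-password>"}
  }'
```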

@damianavila (Contributor)

> Still need to write code that'll populate central grafana with datasources

I think the upcoming work is actually described here: #328 (comment), let me know if you disagree.

Btw, thanks for all this work, @yuvipanda!!
