Monitoring of cluster stats via grafana and prometheus #39

Closed
costrouc opened this issue May 1, 2020 · 15 comments · Fixed by #733
Labels
type: enhancement 💅🏼 New feature or request

Comments

@costrouc
Member

costrouc commented May 1, 2020

No description provided.

@costrouc costrouc transferred this issue from Quansight/qhub-ops Aug 18, 2020
@laisbsc
Contributor

laisbsc commented Feb 9, 2021

According to @costrouc : this is a fun issue!

It is useful for demos and a nice-to-have, but it's not clear when the team will be able to implement it.

@costrouc
Member Author

Acceptance Criteria:

  • grafana and prometheus deployment in Qhub
  • pre-made dashboards in grafana for viewing:
    • jupyterhub health
    • kubernetes pods
    • traefik health and traffic

@costrouc costrouc added this to the Release v0.4.0 milestone Jun 17, 2021
@costrouc costrouc changed the title Add monitoring to cluster via grafana and prometheus Monitoring of cluster stats via grafana and prometheus Jun 17, 2021
@Adam-D-Lewis Adam-D-Lewis self-assigned this Jun 18, 2021
@Adam-D-Lewis
Member

Adam-D-Lewis commented Jun 18, 2021

I'll try to work on this if no one else is planning on working on it at the moment. I would add documentation to the acceptance criteria as well.

Acceptance Criteria:

  • grafana and prometheus deployment in Qhub
  • pre-made dashboards in grafana for viewing:
    • jupyterhub health
    • kubernetes pods
    • traefik health and traffic
  • documentation detailing how to access prometheus and grafana.

@costrouc
Member Author

@Adam-D-Lewis this would be really useful for the 0.4 release, and it would be awesome if you worked on it. We've needed this feature for a while!

@Adam-D-Lewis
Member

To update my progress, I was able to deploy the grafana helm chart manually via helm install ..., but I'm still working on setting up some traefik routes to access grafana.

@Adam-D-Lewis
Member

Adam-D-Lewis commented Jun 24, 2021

To provide further updates: with Chris's help I was able to access grafana at grafana.myqhub.qhub.dev, but not at myqhub.qhub.dev/monitoring/. Chris advised that we could either strip the /monitoring prefix with a middleware or let grafana know its baseUrl, similar to how it's done at https://github.com/Quansight/qhub-hpc/blob/main/templates/grafana.ini.j2#L54. The preference is for the latter, which needs more research on my end to see whether the helm chart exposes the grafana.ini somehow.
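
As a rough sketch of the second option (telling grafana its base URL), the snippet below renders a `[server]` section for grafana.ini with Python's `configparser`. `root_url` and `serve_from_sub_path` are existing Grafana settings, but the helper name, the hostname, and the chosen values are assumptions for illustration only; whether the helm chart accepts the config in this form is exactly the open question above.

```python
import configparser
import io


def render_grafana_server_ini(domain: str, prefix: str,
                              serve_from_sub_path: bool = False) -> str:
    """Render a grafana.ini [server] section for grafana behind a path prefix.

    Hypothetical helper for illustration; it is not part of QHub.
    """
    # Normalize the prefix so "monitoring", "/monitoring" and "/monitoring/"
    # all produce the same URL.
    prefix = "/" + prefix.strip("/")
    # interpolation=None keeps configparser from expanding '%' in values.
    config = configparser.ConfigParser(interpolation=None)
    config["server"] = {
        "domain": domain,
        "root_url": f"https://{domain}{prefix}/",
        "serve_from_sub_path": str(serve_from_sub_path).lower(),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Assumed hostname and prefix, taken from the URLs mentioned in this thread.
    print(render_grafana_server_ini("myqhub.qhub.dev", "/monitoring"))
```

Setting `serve_from_sub_path` to true is the variant to try when the proxy forwards the prefix unchanged; leaving it false assumes something in front of grafana removes the prefix first.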

@brl0
Contributor

brl0 commented Jun 24, 2021

Hey @Adam-D-Lewis, hope all is well. :)

I am definitely +1 on this idea and am looking forward to trying it out.

I don't know if this is particularly helpful, but I thought it was an interesting project and at least tangentially related: https://github.com/yuvipanda/jupyterhub-grafana

This project provides some standard Grafana Dashboards as Code

@brl0
Contributor

brl0 commented Jun 25, 2021

Incidentally, it looks like the project I mentioned above just got moved to the jupyterhub organization:
https://github.com/jupyterhub/grafana-dashboards

@costrouc
Member Author

@brl0 yep, we saw that and certainly want to use it! We'll also be tying in additional stats from other services, e.g. traefik.

@Adam-D-Lewis
Member

Adam-D-Lewis commented Jun 28, 2021

Thanks for the references @brl0, hoping we can tie those in eventually. I'm working through some Traefik/grafana routing issues at the moment.

@Adam-D-Lewis
Member

Adam-D-Lewis commented Jun 28, 2021

I've specified the baseUrl in the grafana config, but I'm getting an infinite loop of redirects when navigating to https://github-actions.qhub.dev/monitoring/login (local deployment) rather than the grafana login page being loaded. I'm continuing to work on this.

@brl0
Contributor

brl0 commented Jun 29, 2021

Another resource that may or may not be helpful... I stumbled across NeuroHackademy's JupyterHub which deploys prometheus and grafana via helm dependencies.

@Adam-D-Lewis
Member

Adam-D-Lewis commented Jul 13, 2021

I'm still hitting the infinite loop of redirects in grafana mentioned in my last comment. I want to try a traefik ingress instead of a traefik ingressroute and see if that helps. If not, I may need to open an issue on grafana's repo and use the subdomain approach (monitoring.myqhub.qhub.dev) to access grafana instead of prefix paths (myqhub.qhub.dev/monitoring).
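
To make the two routing approaches concrete, here is a minimal sketch that builds the Traefik v2 router rule string for each one. The `Host` and `PathPrefix` matchers are standard Traefik v2 rule syntax; the function names and hostnames are hypothetical and only mirror the examples in this thread.

```python
def subdomain_rule(base_domain: str, subdomain: str) -> str:
    """Rule for the subdomain approach, e.g. monitoring.myqhub.qhub.dev."""
    return f"Host(`{subdomain}.{base_domain}`)"


def path_prefix_rule(base_domain: str, prefix: str) -> str:
    """Rule for the prefix-path approach, e.g. myqhub.qhub.dev/monitoring."""
    # Normalize so "monitoring" and "/monitoring/" give the same rule.
    prefix = "/" + prefix.strip("/")
    return f"Host(`{base_domain}`) && PathPrefix(`{prefix}`)"


if __name__ == "__main__":
    print(subdomain_rule("myqhub.qhub.dev", "monitoring"))
    print(path_prefix_rule("myqhub.qhub.dev", "/monitoring"))
```

With the subdomain rule grafana is served from the root path, so it needs no knowledge of a prefix; with the prefix rule, either the proxy or grafana has to account for `/monitoring`.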

@Adam-D-Lewis
Member

Finally got past the infinite loop of redirects. I had to add a middleware to strip the path prefix, and then it worked.
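
For readers unfamiliar with the fix, the sketch below shows roughly what a strip-prefix middleware looks like as a Traefik v2 `Middleware` resource, built as a Python dict, plus a simplified model of what it does to a request path. The resource name, namespace, and both function names are assumptions for illustration; the actual manifest in the linked pull request may differ.

```python
import json


def strip_prefix_middleware(name: str, namespace: str, prefix: str) -> dict:
    """Build a Traefik v2 Middleware manifest that strips a path prefix.

    The name and namespace are placeholders, not values taken from QHub.
    """
    return {
        "apiVersion": "traefik.containo.us/v1alpha1",
        "kind": "Middleware",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"stripPrefix": {"prefixes": [prefix]}},
    }


def strip_prefix(path: str, prefix: str) -> str:
    """Simplified model of the path the backend sees after stripping.

    Only whole path segments are matched here; Traefik's own matching
    may differ in edge cases.
    """
    if path == prefix or path.startswith(prefix + "/"):
        # An empty remainder becomes the root path.
        return path[len(prefix):] or "/"
    return path


if __name__ == "__main__":
    manifest = strip_prefix_middleware("grafana-stripprefix", "dev", "/monitoring")
    print(json.dumps(manifest, indent=2))
    print(strip_prefix("/monitoring/login", "/monitoring"))
```

The idea is that grafana receives `/login` rather than `/monitoring/login`, so the prefix is only used in the external URL.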

@Adam-D-Lewis Adam-D-Lewis linked a pull request Jul 14, 2021 that will close this issue