Add a Grafana plot to monitor disk usage on the home directory #1119
Comments
This may be a useful issue to move upstream? https://github.com/jupyterhub/grafana-dashboards
@sgibson91 yep, definitely agree that an upstream improvement would be the best place for this, if we know that it is generalizable to many deployments. I have updated the top comment with a ref to https://github.com/jupyterhub/grafana-dashboards
This is a bit complicated on Azure or AWS since we're using a managed storage service (AzureFile or EFS), and we'll need an exporter specifically for those that Prometheus can scrape.
@yuvipanda aren't we also planning to move to Google's managed filesystem service?
Update: I opened a PR upstream to add three dashboards from https://grafana.com/grafana/dashboards/11454 that track some PVC stats. I've deployed them to the 2i2c Grafana. I know it's not exactly disk usage, but it gives an intuition about what is going on and doesn't care about storage type.
I think it provides useful information. The second one is a per-user usage rate, correct?
These all seem useful to me as well - I am curious what "daily usage" means though, does it mean "data written to disk"?
As I mentioned, the graphs are ported from https://grafana.com/grafana/dashboards/11454 and are not my creation, but the query that generates that graph is […]. I didn't find any official docs for […]. So, from what I understand, the daily usage graph shows the daily rate of bytes used in a particular volume. And yes, I believe each of the prod (home-nfs) there corresponds to a user. And since the home-nfs PVs have a retain policy, they don't ever get destroyed.
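To make the "daily usage rate" reading concrete, here is a minimal sketch of how such a rate could be derived from two samples of a volume's used bytes. This is only an illustration of the interpretation above, not the dashboard's actual query; the `Sample` type and `daily_usage_rate` function are hypothetical names made up for this example.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Sample:
    """One observation of a volume (hypothetical type for illustration)."""
    timestamp: float  # seconds since some fixed reference point
    used_bytes: int   # bytes in use on the volume at that time


def daily_usage_rate(first: Sample, last: Sample) -> float:
    """Average change in used bytes per day between two samples.

    Positive means the volume is filling up, negative means it is shrinking.
    """
    elapsed = last.timestamp - first.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (last.used_bytes - first.used_bytes) * SECONDS_PER_DAY / elapsed


if __name__ == "__main__":
    # 500 bytes added over half a day extrapolates to 1000 bytes per day.
    print(daily_usage_rate(Sample(0, 1_000), Sample(43_200, 1_500)))
```

Under this reading, a volume whose usage never changes would show a rate of zero, regardless of how full it is.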
This is done!
I think this is the proper ref (for a future reader): #1992
Background and proposal
In #1081 we had a hub outage because the cluster had run out of disk space, causing user launches to fail.
Running out of disk space is a common concern for our hubs, and we should set up a Grafana plot so that we can monitor and potentially send alerts when disk space is low.
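As a rough sketch of the kind of check an alert would encode, the snippet below computes how full the filesystem behind a path is and compares it to a threshold. It assumes a locally mounted filesystem and uses only Python's standard library; the function names and the 90% default threshold are hypothetical choices for illustration, not something this project has settled on.

```python
import shutil


def disk_usage_fraction(path: str) -> float:
    """Fraction (0.0 to 1.0) of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some virtual filesystems report no capacity; treat them as empty.
        return 0.0
    return usage.used / usage.total


def is_low_on_space(path: str, threshold: float = 0.9) -> bool:
    """True when usage is at or above `threshold` (assumed default: 90%)."""
    return disk_usage_fraction(path) >= threshold


if __name__ == "__main__":
    path = "."
    print(f"{path}: {disk_usage_fraction(path):.1%} used, "
          f"low on space: {is_low_on_space(path)}")
```

A check like this reflects what the mounted filesystem reports, so it may not be meaningful for managed storage services, which is why an exporter that Prometheus can scrape comes up in the comments.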
Implementation guide and constraints
There are two related issues here, and we may want to solve them independently if need be:
Updates and ongoing work
No response