Add a Grafana plot to monitor disk usage on the home directory #1119

Closed
choldgraf opened this issue Mar 15, 2022 · 10 comments

Comments

@choldgraf
Member

choldgraf commented Mar 15, 2022

Background and proposal

In #1081 we had a hub outage because the cluster had run out of disk space, causing user launches to fail.

Running out of disk space is a common concern for our hubs, and we should set up a Grafana plot so that we can monitor and potentially send alerts when disk space is low.
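As a rough illustration of the kind of check an alert would encode, here is a minimal sketch in Python. The 10% threshold and the function names are assumptions made for this example, not anything agreed in this issue, and a real alert would more likely live in Grafana or Prometheus alerting rules than in a script.

```python
import shutil

# Hypothetical threshold: flag the disk when less than 10% of it is free.
MIN_FREE_FRACTION = 0.10


def is_disk_low(total_bytes: int, free_bytes: int,
                min_free_fraction: float = MIN_FREE_FRACTION) -> bool:
    """Return True when the free fraction of the disk is below the threshold."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes < min_free_fraction


def check_path(path: str) -> bool:
    """Check the filesystem that contains `path` using the standard library."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return is_disk_low(usage.total, usage.free)
```

Calling `check_path` on the directory where home directories are mounted would report whether that filesystem is below the assumed threshold.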

Implementation guide and constraints

There are two related issues here, and we may want to solve them independently if need be:

Updates and ongoing work

No response

@sgibson91
Member

This may be a useful issue to move upstream? https://github.com/jupyterhub/grafana-dashboards

@choldgraf
Member Author

choldgraf commented Mar 16, 2022

@sgibson91 yep definitely agree that an upstream improvement would be the best place for this, if we know that it is generalizable to many deployments. Have updated top comment with a ref to https://github.com/jupyterhub/grafana-dashboards

@yuvipanda
Member

This is a bit complicated on Azure or AWS since we're using a managed storage service (AzureFile or EFS), and we'll need an exporter specifically for those that Prometheus can scrape.
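To make "an exporter that Prometheus can scrape" concrete, here is a minimal sketch using only the Python standard library. It serves one gauge in the Prometheus text exposition format on `/metrics`. The metric name, mount path, and port are hypothetical, and `shutil.disk_usage` on a mounted path is only a stand-in: for AzureFile or EFS the used-bytes figure would presumably have to come from the cloud provider's own API, which is not shown here.

```python
import shutil
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical names for illustration; none of these come from the issue.
METRIC_NAME = "home_directory_used_bytes"
HOME_MOUNT = "/"


def render_metrics(path: str, used_bytes: int) -> str:
    """Render a single gauge in the Prometheus text exposition format."""
    return (
        f"# HELP {METRIC_NAME} Bytes used on the home directory volume.\n"
        f"# TYPE {METRIC_NAME} gauge\n"
        f'{METRIC_NAME}{{path="{path}"}} {used_bytes}\n'
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Answer GET /metrics with the current usage of HOME_MOUNT."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        used = shutil.disk_usage(HOME_MOUNT).used
        body = render_metrics(HOME_MOUNT, used).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Serve metrics until interrupted (the port is an arbitrary choice)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Running `serve()` and pointing a Prometheus scrape job at the chosen port would then expose the gauge for use in a Grafana panel.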

@choldgraf
Member Author

choldgraf commented Mar 23, 2022

@yuvipanda aren't we also planning to move to Google's managed filesystem service?

@GeorgianaElena
Member

GeorgianaElena commented May 13, 2022

Update: I opened a PR upstream to add three dashboards from https://grafana.com/grafana/dashboards/11454 that track some PVC stats. I've deployed them to the 2i2c Grafana.

[Screenshot of the deployed dashboards, 2022-05-13]

I know it's not exactly disk usage, but it gives an intuition about what is going on and doesn't care about storage type.
Curious if you find it useful.

@damianavila
Contributor

> Curious if you find it useful.

I think it provides useful information. The second one is a per-user usage rate, correct?

@choldgraf
Member Author

These all seem useful to me as well - I am curious what "daily usage" means though, does it mean "data written to disk"?

@GeorgianaElena
Member

As I mentioned, the graphs are ported from https://grafana.com/grafana/dashboards/11454 and not my creation, but the query that generates that graph is rate(kubelet_volume_stats_used_bytes[1d]).

Didn't find any official docs for kubelet_volume_stats_used_bytes other than these.

So, from what I understand, the daily usage graph shows the rate of change in bytes used for a particular volume, computed over a one-day window. And yes, I believe each of the prod (home-nfs) entries there corresponds to a user. Since the home-nfs PVs have a retain policy, they never get destroyed.
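On what "daily usage" means: PromQL's rate() returns a per-second average over the range window, so with a [1d] window the value is in bytes per second averaged over the last day, not bytes per day. Here is a simplified sketch of that arithmetic in Python. It is an approximation only: it ignores the counter-reset handling and window-boundary extrapolation that Prometheus applies, and the sample values are made up. Note also that rate() is intended for counters, while kubelet_volume_stats_used_bytes is, as far as I know, a gauge, for which the Prometheus docs point to deriv() or delta() instead.

```python
SECONDS_PER_DAY = 86_400


def average_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second change between the first and last (time, value) sample.

    A simplified stand-in for rate(): no counter-reset handling and no
    extrapolation to the edges of the window.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


# Made-up samples: (unix seconds, used bytes) spanning one day.
samples = [(0, 1_000_000), (43_200, 1_400_000), (86_400, 1_864_000)]

per_second = average_rate(samples)         # bytes per second
per_day = per_second * SECONDS_PER_DAY     # bytes per day
```

Multiplying the per-second figure by 86,400 is what converts it into net growth in bytes over the day, which is probably the number most people expect from a "daily usage" label.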

@consideRatio
Contributor

This is done!

@damianavila
Contributor

damianavila commented Feb 15, 2023

> This is done!

I think this is the proper ref (for a future reader): #1992


6 participants