[docs] setting up grafana and prometheus #31129
Signed-off-by: Alan Guo <aguo@anyscale.com>
doc/source/cluster/running-applications/monitoring-and-observability.rst
on the head node of the cluster. However, in order to view the :ref:`Dashboard <ray-dashboard>` metrics on your local
machine, you must configure the Dashboard UI to embed the metrics graphs via a public address for the Grafana instance.

The `RAY_GRAFANA_HOST` env var can be set when launching Ray to configure how the Dashboard UI embeds the metrics.
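For illustration only (this is not part of the PR diff), here is a minimal Python sketch of launching a head node with `RAY_GRAFANA_HOST` set. The helper name `ray_head_launch` and the example address are hypothetical; `ray start --head` is the usual Ray CLI command for starting a head node, and the value should be whatever address your browser can use to reach Grafana.

```python
import os


def ray_head_launch(grafana_host, base_env=None):
    """Build the command and environment for starting a Ray head node.

    grafana_host is an assumption supplied by the caller: an address for the
    Grafana instance that is reachable from the machine viewing the Dashboard.
    """
    # Copy the environment so the caller's (or the process's) env is untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["RAY_GRAFANA_HOST"] = grafana_host
    command = ["ray", "start", "--head"]
    return command, env


# Example usage (commented out because it would actually start Ray):
# import subprocess
# command, env = ray_head_launch("http://grafana.example.com:3000")
# subprocess.run(command, env=env, check=True)
```

Building the environment separately from running the command keeps the sketch easy to inspect before anything is launched.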
Can you add this example to the ray-metrics page and just have a link here instead?
Looks good. Nit: I think we should spell out "env vars" as "environment variables" and be consistent about capitalizing Ray, Prometheus, Grafana, and IP.
Thanks!
The overall structure is a bit confusing to me. I can think of 4 JTBDs related to this part of the documentation. Here are the difficulties I ran into trying to complete each one:
- Learn about metrics.
  - The metrics page jumps to Prometheus first. I may not know what "metrics" refers to in Ray's context.
- Learn how to collect metrics via Prometheus.
  - It's easy to understand how to set it up locally by following the documentation. However, it's still not very straightforward how to set it up on a cluster. First, I want to know where to run it. This is in the metrics page ("Alternate Prometheus host location"). Then I want to know how to scrape the metrics, which is in the "cluster monitoring" page.
- Learn how to set up Grafana.
  - Same issues as Prometheus. I want to know where to run Grafana first. Then I need to know how to configure it to visualize the Prometheus metrics in Grafana.
- Learn how to view the embedded metrics in the Ray Dashboard.
  - The Ray Dashboard metrics page says "It requires that prometheus and grafana is running for your cluster" and sends me to the metrics page. However, it's not clear from the metrics page what setup of Prometheus and Grafana is required for the metrics to show up.
Here are the suggested changes to the structure:
Cluster monitoring
- Ray Dashboard
- Ray CLI
- Prometheus metrics
  - Just a short intro paragraph with a link to the metrics page for more details
Metrics
- A short intro (keep the current one).
- System metrics
- Application-level metrics
- Prometheus
  - A short intro
  - Run Prometheus (locally, on a head node, or outside of the Ray cluster)
  - Auto-discovering metrics endpoints
  - Manually discovering metrics endpoints
  - Customize Prometheus export port
- Grafana
  - A short intro
  - Run Grafana and view graphs for Ray (locally, on a head node, or outside of the Ray cluster)
  - Embed Grafana graphs in Ray Dashboard
    - (Prometheus running)
    - (Grafana is able to access Prometheus)
    - (Ray is able to access Grafana)
Ray Dashboard
- Metrics view
  - "It requires that prometheus and grafana is running for your cluster." + link to "Embed Grafana graphs in Ray Dashboard"
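The three prerequisites listed under "Embed Grafana graphs in Ray Dashboard" are all reachability checks, so they could be sketched roughly as below. This is a hedged illustration, not something from the PR: the helper names are hypothetical, and the base URLs are assumptions you would replace with your own. `/-/healthy` and `/api/health` are the health endpoints that Prometheus and Grafana expose, and 9090 and 3000 are their default ports.

```python
from urllib.error import URLError
from urllib.request import urlopen

# Health endpoints served by Prometheus and Grafana.
HEALTH_PATHS = {"prometheus": "/-/healthy", "grafana": "/api/health"}


def health_url(service, base_url):
    """Return the health-check URL for a service given its base URL."""
    return base_url.rstrip("/") + HEALTH_PATHS[service]


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP GET to url answers with status 200."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx HTTP error.
        return False


# Example usage (addresses are placeholders for your own deployment):
# print(is_reachable(health_url("prometheus", "http://localhost:9090")))
# print(is_reachable(health_url("grafana", "http://localhost:3000")))
```

Where the check runs matters: a result only tells you about reachability from that machine, so the Prometheus check is most meaningful from the Grafana host, and the Grafana check from wherever the Ray Dashboard is viewed.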
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
unstale. And cc @alanwguo
Signed-off-by: Alan Guo <aguo@anyscale.com>
I think this makes sense, but I don't think I have time to make these changes by Friday. I also think we need to redo all the dashboard docs at once to really restructure them well, and that would require pairing with at least @rkooo567, who is doing parallel doc changes. I made some updates to include more info on setting up Grafana on a cluster, but I think we should consider a rewrite for 2.4.
by setting the `RAY_PROMETHEUS_HOST` env var when launching ray. The env var takes in the address to access Prometheus.

Alternate Grafana host location
This is duplicated?
SGTM!
Looks awesome. Last few comments.
* [docs] setting up grafana and prometheus (#31129)
* Apply suggestions from code review
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Alan Guo <aguo@aguo.software>
---------
Signed-off-by: Alan Guo <aguo@aguo.software>
Co-authored-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: elliottower <elliot@elliottower.com>
Signed-off-by: Alan Guo <aguo@anyscale.com>
Why are these changes needed?
Many users have struggled to set up Prometheus and Grafana. At a minimum, we should do a better job of pointing people to how to set this up for remote Ray clusters and of letting users know about the configuration options they can set.
Related issue number
Checks
- Signed off every commit (`git commit -s`) in this PR.
- Ran `scripts/format.sh` to lint the changes in this PR.