Document changes to Monitor dashboards for 3.16 (#53126)

Co-authored-by: Isaac Brown <101839405+isaacmbrown@users.noreply.github.com>
github · Nov 22, 2024 · 1e24ec2
1 parent f748cba
commit 1e24ec2

Showing 14 changed files with 97 additions and 58 deletions.
diff --git a/assets/images/enterprise/management-console/monitor-dash-link-old.png b/assets/images/enterprise/management-console/monitor-dash-link-old.png
diff --git a/assets/images/enterprise/management-console/monitor-dash-link.png b/assets/images/enterprise/management-console/monitor-dash-link.png
diff --git a/assets/images/enterprise/management-console/monitor-dash-navigation.png b/assets/images/enterprise/management-console/monitor-dash-navigation.png
diff --git a/...migrating-your-enterprise-to-the-container-registry-from-the-docker-registry.md b/...migrating-your-enterprise-to-the-container-registry-from-the-docker-registry.md
@@ -50,7 +50,7 @@ During the migration, the CPU and memory usage for your instance will increase.
 
 After the migration, storage pressure on your instance will increase due to the duplication of image files in the Docker registry and the {% data variables.product.prodname_container_registry %}. A future release of {% data variables.product.product_name %} will remove the duplicated files when all migrations are complete.
 
-For more information about monitoring the performance and storage of {% data variables.location.product_location %}, see "[AUTOTITLE](/admin/enterprise-management/monitoring-your-appliance/accessing-the-monitor-dashboard)."
+For more information about monitoring the performance and storage of {% data variables.location.product_location %}, see "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/about-the-monitor-dashboards)."
 
 ### Starting a migration
 

diff --git a/content/admin/guides.md b/content/admin/guides.md
@@ -64,7 +64,7 @@ includeGuides:
   - /admin/configuring-settings/hardening-security-for-your-enterprise/troubleshooting-tls-errors
   - /admin/configuring-settings/configuring-network-settings/using-github-enterprise-server-with-a-load-balancer
   - /admin/monitoring-and-managing-your-instance/configuring-high-availability/about-high-availability-configuration
-  - /admin/monitoring-and-managing-your-instance/monitoring-your-instance/accessing-the-monitor-dashboard
+  - /admin/monitoring-and-managing-your-instance/monitoring-your-instance/about-the-monitor-dashboards
   - /admin/monitoring-and-managing-your-instance/configuring-high-availability/creating-a-high-availability-replica
   - /admin/monitoring-and-managing-your-instance/configuring-clustering/differences-between-clustering-and-high-availability-ha
   - /admin/upgrading-your-instance/preparing-to-upgrade/enabling-automatic-update-checks

diff --git a/...ation-and-troubleshooting/troubleshooting-github-actions-for-your-enterprise.md b/...ation-and-troubleshooting/troubleshooting-github-actions-for-your-enterprise.md
@@ -72,7 +72,7 @@ You may be hitting the CPU or memory limits if you notice that jobs are not star
 
 ### 1. Check the overall CPU and memory usage in the management console
 
-Access the management console and use the monitor dashboard to inspect the overall CPU and memory graphs under "System Health". For more information, see "[AUTOTITLE](/admin/enterprise-management/monitoring-your-appliance/accessing-the-monitor-dashboard)."
+Access the management console and use the monitor dashboard to inspect the overall CPU and memory graphs under "System Health". For more information, see "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/about-the-monitor-dashboards)."
 
 If the overall "System Health" CPU usage is close to 100%, or there is no free memory left, then {% data variables.location.product_location %} is running at capacity and needs to be scaled up. For more information, see "[AUTOTITLE](/admin/enterprise-management/updating-the-virtual-machine-and-physical-resources/increasing-cpu-or-memory-resources)."
 

diff --git a/...stance/accessing-the-monitor-dashboard.md → ...-instance/about-the-monitor-dashboards.md b/...stance/accessing-the-monitor-dashboard.md → ...-instance/about-the-monitor-dashboards.md
@@ -1,13 +1,15 @@
 ---
-title: Accessing the monitor dashboard
-intro: '{% data variables.product.prodname_ghe_server %} includes a web-based monitoring dashboard that displays historical data about your {% data variables.product.prodname_ghe_server %} appliance, such as CPU and storage usage, application and authentication response times, and general system health.'
+title: 'About the monitor {% ifversion ghes > 3.15 %}dashboards{% else %}dashboard{% endif %}'
+allowTitleToDifferFromFilename: true
+intro: 'View historical data for details like CPU and storage usage, application and authentication response times, and general system health.'
 redirect_from:
   - /enterprise/admin/installation/accessing-the-monitor-dashboard
   - /enterprise/admin/enterprise-management/accessing-the-monitor-dashboard
   - /admin/enterprise-management/accessing-the-monitor-dashboard
   - /admin/enterprise-management/monitoring-your-appliance/accessing-the-monitor-dashboard
   - /admin/monitoring-managing-and-updating-your-instance/monitoring-your-appliance/accessing-the-monitor-dashboard
   - /admin/monitoring-managing-and-updating-your-instance/monitoring-your-instance/accessing-the-monitor-dashboard
+  - /admin/monitoring-and-managing-your-instance/monitoring-your-instance/accessing-the-monitor-dashboard
 versions:
   ghes: '*'
 type: how_to
@@ -17,17 +19,40 @@ topics:
   - Infrastructure
   - Monitoring
   - Performance
-shortTitle: Access the monitor dashboard
+shortTitle: About the monitor {% ifversion ghes > 3.15 %}dashboards{% else %}dashboard{% endif %}
 ---
-## Accessing the monitor dashboard
+## Accessing the monitor {% ifversion ghes > 3.15 %}dashboards{% else %}dashboard{% endif %}
 
 {% data reusables.enterprise_site_admin_settings.access-settings %}
 {% data reusables.enterprise_site_admin_settings.management-console %}
 1. In the top navigation bar, click **Monitor**.
 
-   ![Screenshot of the header of the {% data variables.enterprise.management_console %}. A tab, labeled "Monitor", is highlighted with an orange outline.](/assets/images/enterprise/management-console/monitor-dash-link.png)
+   ![Screenshot of the header of the {% data variables.enterprise.management_console %}. A tab, labeled "Monitor", is highlighted with an orange outline.](/assets/images/enterprise/management-console/{% ifversion ghes > 3.15 %}monitor-dash-link.png{% else %}monitor-dash-link-old.png{% endif %})
 
 1. In HA and cluster environments you can switch between nodes using the dropdown and clicking on a different hostname.
+{% ifversion ghes > 3.15 %}
+
+## Using the monitor dashboards
+
+The dashboards visualize metrics that can help you troubleshoot performance issues and better understand how your {% data variables.product.prodname_ghe_server %} appliance is being used. The data behind the graphs is gathered by the `collectd` service and sampled every 10 seconds.
+
+Within the pre-built dashboards you can find various sections grouping graphs of different types of system resources. Use the links on the page to navigate between the dashboards.
+
+![Screenshot of the {% data variables.enterprise.management_console %} header. The dashboard navigation links provided at the top right are highlighted in orange.](/assets/images/enterprise/management-console/monitor-dash-navigation.png)
+
+### "Operational Health" dashboard
+
+This is the default dashboard displayed on the "Monitor" page. It visualizes key metrics that help you to get a quick overview of the health of your {% data variables.product.prodname_ghe_server %} appliance.
+
+### "System & Application Insights" dashboard
+
+On this more detailed dashboard you can get further insights into all aspects of the services that are running on your appliance.
+
+## Creating new dashboards
+
+To build your own dashboards and alerts, you must forward the data to an external instance by enabling `collectd` forwarding. For more information, see "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/configuring-collectd-for-your-instance)."
+
+{% else %}
 
 ## Using the monitor dashboard
 
@@ -36,12 +61,26 @@ The page visualizes metrics which can be useful for troubleshooting performance
 Within the pre-built dashboard you can find various sections grouping graphs of different types of system resources.
 
 Building your own dashboard and alerts requires the data to be forwarded to an external instance, by enabling `collectd` forwarding. For more information, see "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/configuring-collectd-for-your-instance)."
+{% endif %}
 
-## About the metrics on the monitor dashboard
+## About the metrics on the monitor dashboards
 
-### System health
+### System Health
 
 The system health graphs provide a general overview of services and system resource utilization. The CPU, memory, and load average graphs are useful for identifying trends or times where provisioned resource saturation has occurred. For more information, see "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/recommended-alert-thresholds)."
+{% ifversion ghes > 3.15 %}
+
+### Application Health
+
+These graphs include key metrics for the resource utilization of services that power {% data variables.product.prodname_ghe_server %}. They help visualize ongoing issues while processing requests.
+
+* **Nomad jobs**: The CPU and memory usage of individual services. {% data variables.product.prodname_ghe_server %} utilizes Nomad internally as the workload orchestrator.
+* **Response code**: The number of responses by status code returned across {% data variables.product.prodname_ghe_server %} services.
+* **Response time**: The response time of web requests at the 90th percentile, in milliseconds.
+* **Active workers**: The number of busy web workers per {% data variables.product.prodname_ghe_server %} application.
+* **Queued requests**: The number of web requests queued per {% data variables.product.prodname_ghe_server %} application. This panel is expected to display "No data" when no requests are queued.
+* **ElasticSearch Cluster Health**: The health status of the Elasticsearch cluster, based on the state of its primary and replica shards. This cluster powers {% data variables.product.prodname_ghe_server %} search.
+{% endif %}
 
 ### Processes
 
@@ -65,7 +104,7 @@ The **App request/response** section looks at the rate of requests, how quickly
 
 ### Actions
 
-The graphs break down different metrics about {% data variables.product.prodname_actions %} on {% data variables.location.product_location %} including an overview of {% data variables.product.prodname_actions %} services web requests.
+The graphs break down different metrics about {% data variables.product.prodname_actions %} on {% data variables.location.product_location %}, including an overview of {% data variables.product.prodname_actions %} services web requests{% ifversion ghes > 3.15 %} and MSSQL database transaction log size{% endif %}.
 
 ### Background jobs
 

diff --git a/...our-instance/monitoring-your-instance/configuring-collectd-for-your-instance.md b/...our-instance/monitoring-your-instance/configuring-collectd-for-your-instance.md
@@ -26,7 +26,7 @@ topics:
 
 `collectd` is a service that runs on {% data variables.location.product_location %} to gather and provide metrics about the system's performance. Common metrics that `collectd` gathers include CPU utilization, memory and disk consumption, network interface traffic and errors, and a system's overall load. You can also forward the data to another `collectd` server. For more information, see the [collectd wiki](https://github.com/collectd/collectd/wiki).
 
-Your instance uses metrics from `collectd` to display graphs in the {% data variables.enterprise.management_console %}'s monitor dashboard. For more information, see "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/accessing-the-monitor-dashboard)."
+Your instance uses metrics from `collectd` to display graphs in the {% data variables.enterprise.management_console %}'s monitor dashboard. For more information, see "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/about-the-monitor-dashboards)."
 
 You can review a list of the metrics that `collectd` gathers on {% data variables.location.product_location %}. For more information, see "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/collectd-metrics-for-github-enterprise-server)."
 

diff --git a/...t/admin/monitoring-and-managing-your-instance/monitoring-your-instance/index.md b/...t/admin/monitoring-and-managing-your-instance/monitoring-your-instance/index.md
@@ -14,7 +14,7 @@ versions:
 topics:
   - Enterprise
 children:
-  - /accessing-the-monitor-dashboard
+  - /about-the-monitor-dashboards
   - /recommended-alert-thresholds
   - /setting-up-external-monitoring
   - /configuring-collectd-for-your-instance

diff --git a/...managing-your-instance/monitoring-your-instance/recommended-alert-thresholds.md b/...managing-your-instance/monitoring-your-instance/recommended-alert-thresholds.md
@@ -24,7 +24,7 @@ shortTitle: Recommended alert thresholds
 
 ## About recommended alert thresholds
 
-You can configure external monitoring systems to alert you to storage, CPU, and memory usage that may cause problems with {% data variables.location.product_location %}. For more information, see "[AUTOTITLE](/admin/enterprise-management/monitoring-your-appliance/setting-up-external-monitoring)" and "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/accessing-the-monitor-dashboard)."
+You can configure external monitoring systems to alert you to storage, CPU, and memory usage that may cause problems with {% data variables.location.product_location %}. For more information, see "[AUTOTITLE](/admin/enterprise-management/monitoring-your-appliance/setting-up-external-monitoring)" and "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/about-the-monitor-dashboards)."
 
 ## Monitoring storage
 

diff --git a/...stance/monitoring-your-instance/troubleshooting-resource-allocation-problems.md b/...stance/monitoring-your-instance/troubleshooting-resource-allocation-problems.md
@@ -36,7 +36,7 @@ For system-critical issues, and prior to making modifications to your appliance,
 * CPU of your instance is under-provisioned for your workload.
 * Upgrading to a new {% data variables.product.prodname_ghe_server %} release often increases CPU and memory usage due to new features. Additionally, post-upgrade migration or reconciliation background jobs can temporarily degrade performance until they complete.
 * Elevated requests against Git or API. Increased requests to Git or API can occur due to various factors, such as excessive repository cloning, CI/CD processes, or unintentional usage by API scripts or new workloads.
-* Increased number of [GitHub Actions jobs](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/accessing-the-monitor-dashboard#actions).
+* Increased number of [GitHub Actions jobs](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/about-the-monitor-dashboards#actions).
 * Elevated number of Git commands executed against a large repository.
 
 ### Recommendations
@@ -60,10 +60,10 @@ For system-critical issues, and prior to making modifications to your appliance,
 ### Recommendations
 
 * Memory of your instance is under-provisioned for your workload, data volume, given usage over time may exceed the [minimum recommended requirements](/admin/installing-your-enterprise-server/setting-up-a-github-enterprise-server-instance/installing-github-enterprise-server-on-aws#minimum-recommended-requirements).
-* Within the Nomad graphs, identify services with out of memory trends which are often followed by free memory trends after they get restarted. For more information, see "[AUTOTITLE](/enterprise-server@3.14/admin/monitoring-and-managing-your-instance/monitoring-your-instance/accessing-the-monitor-dashboard#appliance-specific-system-services)."
+* Within the Nomad graphs, identify services with out of memory trends which are often followed by free memory trends after they get restarted. For more information, see "[AUTOTITLE](/enterprise-server@3.14/admin/monitoring-and-managing-your-instance/monitoring-your-instance/about-the-monitor-dashboards#appliance-specific-system-services)."
 * Check logs for processes going out of memory by running `rg -z 'kernel: Out of memory: Killed process' /var/log/syslog*` (for this, first log in to the administrative shell using SSH - see "[AUTOTITLE](/enterprise-server@3.14/admin/administering-your-instance/administering-your-instance-from-the-command-line/accessing-the-administrative-shell-ssh).")
 * Ensure the correct ratio of memory to CPU services is met (at least `6.5:1`).
-* Check the amount of tasks queued for background processing - see "[AUTOTITLE](/enterprise-server@3.14/admin/monitoring-and-managing-your-instance/monitoring-your-instance/accessing-the-monitor-dashboard#background-jobs)."
+* Check the amount of tasks queued for background processing - see "[AUTOTITLE](/enterprise-server@3.14/admin/monitoring-and-managing-your-instance/monitoring-your-instance/about-the-monitor-dashboards#background-jobs)."
 
 ## Low disk space availability
 
@@ -101,7 +101,7 @@ Keep in mind that the root storage volume is split into two equally-sized partit
 * Check the database logs for slow queries in `/var/log/github/exceptions.log` (for this, first log in to the administrative shell using SSH - see "[AUTOTITLE](/enterprise-server@3.14/admin/administering-your-instance/administering-your-instance-from-the-command-line/accessing-the-administrative-shell-ssh)"), for example by checking for Top 10 slow requests by URL: `grep SlowRequest github-logs/exceptions.log | jq '.url' | sort | uniq -c | sort -rn | head`.
 * Check the **Queued requests** graph for certain workers and consider adjusting their active worker count.
 * Increase the storage disks to ones with higher IOPS/throughput.
-* Check the amount of tasks queued for background processing - see "[AUTOTITLE](/enterprise-server@3.14/admin/monitoring-and-managing-your-instance/monitoring-your-instance/accessing-the-monitor-dashboard#background-jobs)."
+* Check the amount of tasks queued for background processing - see "[AUTOTITLE](/enterprise-server@3.14/admin/monitoring-and-managing-your-instance/monitoring-your-instance/about-the-monitor-dashboards#background-jobs)."
 
 ## Elevated error rates
 
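The slow-request check quoted in the recommendations above (`grep SlowRequest github-logs/exceptions.log | jq '.url' | sort | uniq -c | sort -rn | head`) can also be sketched in Python, for example when `jq` is not available. This is only an illustration. It assumes that each line of `exceptions.log` is a single JSON object with a `url` field, as the `jq '.url'` filter implies. The function name `top_slow_urls` is hypothetical and not part of any GitHub tooling.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def top_slow_urls(lines: Iterable[str], limit: int = 10) -> List[Tuple[str, int]]:
    """Count URLs on log lines that mention SlowRequest, most frequent first.

    Assumption: every relevant line is one JSON object with a "url" key.
    Lines that are not valid JSON or have no string "url" are skipped
    (unlike jq, which would print null for a missing key).
    """
    counts = Counter()
    for line in lines:
        # Same coarse filter as `grep SlowRequest`.
        if "SlowRequest" not in line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        url = record.get("url") if isinstance(record, dict) else None
        if isinstance(url, str):
            counts[url] += 1
    # Equivalent of `sort | uniq -c | sort -rn | head` (head prints 10 lines).
    return counts.most_common(limit)
```

To use it, pass an open file object for the log, such as `top_slow_urls(open(path))`, where `path` points at wherever `exceptions.log` was extracted. The result is a list of `(url, count)` pairs rather than the text columns that `uniq -c` prints.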

diff --git a/...nstance/troubleshooting-upgrades/known-issues-with-upgrades-to-your-instance.md b/...nstance/troubleshooting-upgrades/known-issues-with-upgrades-to-your-instance.md
@@ -42,7 +42,7 @@ Collect the baseline data before upgrading to {% data variables.product.prodname
 
 You may not be able to simulate the load that your instance experiences in a production environment. However, it's useful if you can collect baseline data while simulating patterns of usage from your production environment on the staging instance.
 
-1. Browse to your instance's monitor dashboard. For more information, see "[AUTOTITLE](/admin/enterprise-management/monitoring-your-appliance/accessing-the-monitor-dashboard)."
+1. Browse to your instance's monitor dashboard. For more information, see "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/about-the-monitor-dashboards)."
 1. From the monitor dashboard, monitor relevant graphs.
 
    * Under "Processes", monitor the graphs for "I/O operations (Read IOPS)" and "I/O operations (Write IOPS)", filtering for `mysqld`. These graphs display I/O operations for all of the node's services.
@@ -52,7 +52,7 @@ You may not be able to simulate the load that your instance experiences in a pro
 
 After the upgrade to {% data variables.product.prodname_ghe_server %} 3.9, review the instance's I/O utilization. {% data variables.product.company_short %} recommends that you upgrade a staging instance of {% data variables.product.prodname_ghe_server %} running 3.7 or 3.8 that includes restored data from your production instance, or that you restore data from your production instance to a new staging instance running 3.9. For more information, see "[AUTOTITLE](/admin/installation/setting-up-a-github-enterprise-server-instance/setting-up-a-staging-instance)" and "[AUTOTITLE](/admin/configuration/configuring-your-enterprise/configuring-backups-on-your-appliance)."
 
-1. Browse to your instance's monitor dashboard. For more information, see "[AUTOTITLE](/admin/enterprise-management/monitoring-your-appliance/accessing-the-monitor-dashboard)."
+1. Browse to your instance's monitor dashboard. For more information, see "[AUTOTITLE](/admin/monitoring-and-managing-your-instance/monitoring-your-instance/about-the-monitor-dashboards)."
 1. From the monitor dashboard, monitor relevant graphs.
 
    * Under "Processes", monitor the graphs for "I/O operations (Read IOPS)" and "I/O operations (Write IOPS)", filtering for `mysqld`. These graphs display I/O operations for all of the node's services.