OCPBUGS-6832: feat(recent_metrics) adds openshift_apps_deploymentconfigs_strategy_total (openshift#726)

* feat(recent_metrics) adds openshift_apps_deploymentconfigs_strategy_total

* chore(docs): update gathered data

* fix(status/controller): fix status error for UnknownError
Ricardo Lüders committed Feb 20, 2023
1 parent 359130c commit 135dcfb
Showing 5 changed files with 93 additions and 47 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -10,3 +10,6 @@ devspace.yaml

cover.out
_output

# MacOS
.DS_Store
65 changes: 42 additions & 23 deletions docs/gathered-data.md
@@ -548,32 +548,51 @@ Response see:

## MostRecentMetrics

gathers cluster Federated Monitoring metrics.
Collects cluster Federated Monitoring metrics.

The GET REST query to URL /federate
Gathered metrics:
virt_platform
etcd_object_counts
cluster_installer
vsphere_node_hw_version_total
namespace CPU and memory usage
console_helm_installs_total
console_helm_upgrades_total
console_helm_uninstalls_total
followed by at most 1000 lines of ALERTS metric

* Location in archive: config/metrics
* See: docs/insights-archive-sample/config/metrics
* Id in config: clusterconfig/metrics
* Since version:
- "etcd_object_counts": 4.3+
- "cluster_installer": 4.3+
- "ALERTS": 4.3+
- "namespace:container_cpu_usage_seconds_total:sum_rate": 4.5+
- "namespace:container_memory_usage_bytes:sum": 4.5+
- "virt_platform metric": 4.6.34+, 4.7.16+, 4.8+
- "vsphere_node_hw_version_total": 4.7.11+, 4.8+
- "console_helm_installs_total": 4.11+
- `virt_platform`
- `etcd_object_counts`
- `cluster_installer`
- `vsphere_node_hw_version_total`
- namespace CPU and memory usage
- `console_helm_installs_total`
- `console_helm_upgrades_total`
- `console_helm_uninstalls_total`
- openshift_apps_deploymentconfigs_strategy_total
- followed by at most 1000 lines of `ALERTS` metric

### API Reference
None

### Sample data
- docs/insights-archive-sample/config/metrics

### Location in archive
- `config/metrics`

### Config ID
`clusterconfig/metrics`

### Released version
- 4.3.0

### Backported versions
None

### Changes
- `etcd_object_counts` introduced in version 4.3+
- `cluster_installer` introduced in version 4.3+
- `ALERTS` introduced in version 4.3+
- `namespace:container_cpu_usage_seconds_total:sum_rate` introduced in version 4.5+
- `namespace:container_memory_usage_bytes:sum` introduced in version 4.5+
- `virt_platform metric` introduced in version 4.6.34+, 4.7.16+, 4.8+
- `vsphere_node_hw_version_total` introduced in version 4.7.11+, 4.8+
- `console_helm_installs_total` introduced in version 4.11+
- `console_helm_upgrades_total` introduced in version 4.12+
- `console_helm_uninstalls_total` introduced in version 4.12+
- `openshift_apps_deploymentconfigs_strategy_total` introduced in version 4.13+


## MutatingWebhookConfigurations
4 changes: 4 additions & 0 deletions docs/insights-archive-sample/config/metrics
@@ -444,6 +444,10 @@ namespace:container_memory_usage_bytes:sum{namespace="openshift-network-operator
namespace:container_memory_usage_bytes:sum{namespace="openshift-config-operator",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-1"} 5.3141504e+07 1612793316662
namespace:container_memory_usage_bytes:sum{namespace="openshift-controller-manager",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-1"} 1.24739584e+08 1612793316662
namespace:container_memory_usage_bytes:sum{namespace="openshift-etcd-operator",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-1"} 1.19230464e+08 1612793316662
# TYPE openshift_apps_deploymentconfigs_strategy_total untyped
openshift_apps_deploymentconfigs_strategy_total{container="controller-manager",endpoint="https",instance="10.129.0.48:8443",job="controller-manager",namespace="openshift-controller-manager",pod="controller-manager-5b548f5447-dq5wx",service="controller-manager",type="custom",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 0 1675166408404
openshift_apps_deploymentconfigs_strategy_total{container="controller-manager",endpoint="https",instance="10.129.0.48:8443",job="controller-manager",namespace="openshift-controller-manager",pod="controller-manager-5b548f5447-dq5wx",service="controller-manager",type="recreate",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 0 1675166408404
openshift_apps_deploymentconfigs_strategy_total{container="controller-manager",endpoint="https",instance="10.129.0.48:8443",job="controller-manager",namespace="openshift-controller-manager",pod="controller-manager-5b548f5447-dq5wx",service="controller-manager",type="rolling",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 0 1675166408404
# TYPE vsphere_node_hw_version_total untyped
vsphere_node_hw_version_total{container="vsphere-problem-detector-operator",endpoint="vsphere-metrics",hw_version="vmx-13",instance="10.128.0.25:8444",job="vsphere-problem-detector-metrics",namespace="openshift-cluster-storage-operator",pod="vsphere-problem-detector-operator-7f746856d4-78lnn",service="vsphere-problem-detector-metrics",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 6 1619708403480
# TYPE virt_platform untyped
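
The added sample lines can be read back with a few lines of Go. The following is a minimal sketch and not part of this commit: `strategyCounts` is a hypothetical helper, the input values in `main` are illustrative rather than taken from the archive, and the label parsing assumes that label values contain no commas or escaped quotes. That holds for the sample above but not for the Prometheus text format in general.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const metricName = "openshift_apps_deploymentconfigs_strategy_total"

// strategyCounts is a hypothetical helper: it sums the sample values of
// metricName per "type" label and ignores every other line.
func strategyCounts(lines []string) (map[string]float64, error) {
	counts := make(map[string]float64)
	for _, line := range lines {
		// Skip comments ("# TYPE ...") and other metrics.
		if !strings.HasPrefix(line, metricName+"{") {
			continue
		}
		end := strings.LastIndex(line, "}")
		if end < 0 {
			return nil, fmt.Errorf("malformed sample: %q", line)
		}
		labels := line[len(metricName)+1 : end]
		// After the label set come the value and an optional timestamp.
		fields := strings.Fields(line[end+1:])
		if len(fields) == 0 {
			return nil, fmt.Errorf("missing value: %q", line)
		}
		value, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			return nil, err
		}
		strategy := ""
		// Naive split: assumes no commas inside label values.
		for _, pair := range strings.Split(labels, ",") {
			if strings.HasPrefix(pair, `type="`) {
				strategy = strings.TrimSuffix(strings.TrimPrefix(pair, `type="`), `"`)
			}
		}
		counts[strategy] += value
	}
	return counts, nil
}

func main() {
	// Illustrative input, shortened from the shape of the sample lines.
	lines := []string{
		"# TYPE openshift_apps_deploymentconfigs_strategy_total untyped",
		`openshift_apps_deploymentconfigs_strategy_total{namespace="openshift-controller-manager",type="rolling"} 3 1675166408404`,
		`openshift_apps_deploymentconfigs_strategy_total{namespace="openshift-controller-manager",type="recreate"} 1 1675166408404`,
	}
	counts, err := strategyCounts(lines)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts["rolling"], counts["recreate"])
}
```

A full parser for the exposition format would be the safer choice for anything beyond a quick look at the archive.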
4 changes: 2 additions & 2 deletions pkg/controller/status/controller.go
@@ -442,11 +442,11 @@ func handleControllerStatusError(errs []string, errorReason string) (reason, mes
sort.Strings(errs)
message = fmt.Sprintf("There are multiple errors blocking progress:\n* %s", strings.Join(errs, "\n* "))
} else if len(errs) == 1 {
message = errs[0]
reason = errorReason
if len(errorReason) == 0 {
reason = "UnknownError"
}
message = errs[0]
reason = errorReason
}
return reason, message
}
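
Because the +/- markers are not visible in this hunk, the intended statement order is easy to misread. The sketch below isolates the single-error branch under the assumption that the fix assigns `reason = errorReason` first and only then falls back to "UnknownError". `statusForSingleError` is a hypothetical name used for illustration, not a function from the repository.

```go
package main

import "fmt"

// statusForSingleError is a hypothetical stand-in for the single-error
// branch of handleControllerStatusError. It assumes the fixed ordering:
// copy the caller's reason first, then apply the fallback.
func statusForSingleError(errs []string, errorReason string) (reason, message string) {
	if len(errs) != 1 {
		// Other branches are out of scope for this sketch.
		return "", ""
	}
	message = errs[0]
	reason = errorReason
	// If the fallback ran before the assignment above, an empty
	// errorReason would overwrite "UnknownError" with "".
	if len(errorReason) == 0 {
		reason = "UnknownError"
	}
	return reason, message
}

func main() {
	reason, message := statusForSingleError([]string{"upload failed"}, "")
	fmt.Printf("reason=%q message=%q\n", reason, message)
}
```

With the assignment placed after the fallback check, an empty `errorReason` would leave `reason` empty, which matches the "fix status error for UnknownError" item in the commit message.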
64 changes: 42 additions & 22 deletions pkg/gatherers/clusterconfig/recent_metrics.go
@@ -21,32 +21,51 @@ const (
metricsAlertsLinesLimit = 1000
)

// GatherMostRecentMetrics gathers cluster Federated Monitoring metrics.
// GatherMostRecentMetrics Collects cluster Federated Monitoring metrics.
//
// The GET REST query to URL /federate
// Gathered metrics:
// virt_platform
// etcd_object_counts
// cluster_installer
// vsphere_node_hw_version_total
// namespace CPU and memory usage
// console_helm_installs_total
// console_helm_upgrades_total
// console_helm_uninstalls_total
// followed by at most 1000 lines of ALERTS metric
// - `virt_platform`
// - `etcd_object_counts`
// - `cluster_installer`
// - `vsphere_node_hw_version_total`
// - namespace CPU and memory usage
// - `console_helm_installs_total`
// - `console_helm_upgrades_total`
// - `console_helm_uninstalls_total`
// - openshift_apps_deploymentconfigs_strategy_total
// - followed by at most 1000 lines of `ALERTS` metric
//
// * Location in archive: config/metrics
// * See: docs/insights-archive-sample/config/metrics
// * Id in config: clusterconfig/metrics
// * Since version:
// - "etcd_object_counts": 4.3+
// - "cluster_installer": 4.3+
// - "ALERTS": 4.3+
// - "namespace:container_cpu_usage_seconds_total:sum_rate": 4.5+
// - "namespace:container_memory_usage_bytes:sum": 4.5+
// - "virt_platform metric": 4.6.34+, 4.7.16+, 4.8+
// - "vsphere_node_hw_version_total": 4.7.11+, 4.8+
// - "console_helm_installs_total": 4.11+
// ### API Reference
// None
//
// ### Sample data
// - docs/insights-archive-sample/config/metrics
//
// ### Location in archive
// - `config/metrics`
//
// ### Config ID
// `clusterconfig/metrics`
//
// ### Released version
// - 4.3.0
//
// ### Backported versions
// None
//
// ### Changes
// - `etcd_object_counts` introduced in version 4.3+
// - `cluster_installer` introduced in version 4.3+
// - `ALERTS` introduced in version 4.3+
// - `namespace:container_cpu_usage_seconds_total:sum_rate` introduced in version 4.5+
// - `namespace:container_memory_usage_bytes:sum` introduced in version 4.5+
// - `virt_platform metric` introduced in version 4.6.34+, 4.7.16+, 4.8+
// - `vsphere_node_hw_version_total` introduced in version 4.7.11+, 4.8+
// - `console_helm_installs_total` introduced in version 4.11+
// - `console_helm_upgrades_total` introduced in version 4.12+
// - `console_helm_uninstalls_total` introduced in version 4.12+
// - `openshift_apps_deploymentconfigs_strategy_total` introduced in version 4.13+
func (g *Gatherer) GatherMostRecentMetrics(ctx context.Context) ([]record.Record, []error) {
metricsRESTClient, err := rest.RESTClientFor(g.metricsGatherKubeConfig)
if err != nil {
@@ -68,6 +87,7 @@ func gatherMostRecentMetrics(ctx context.Context, metricsClient rest.Interface)
Param("match[]", "console_helm_installs_total").
Param("match[]", "console_helm_upgrades_total").
Param("match[]", "console_helm_uninstalls_total").
Param("match[]", "openshift_apps_deploymentconfigs_strategy_total").
DoRaw(ctx)
if err != nil {
klog.Errorf("Unable to retrieve most recent metrics: %v", err)
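
Each `Param("match[]", ...)` call in the hunk above adds one series selector to the `/federate` request. The sketch below shows the query string that results from repeating the `match[]` key. It uses only the standard library `net/url` package rather than the client-go request builder from the diff, and `federateQuery` is a hypothetical helper introduced for illustration.

```go
package main

import (
	"fmt"
	"net/url"
)

// federateQuery is a hypothetical helper that encodes one "match[]"
// parameter per selector, preserving the order of the selectors.
func federateQuery(selectors []string) string {
	values := url.Values{}
	for _, s := range selectors {
		values.Add("match[]", s)
	}
	// Encode percent-escapes the brackets in the key.
	return values.Encode()
}

func main() {
	query := federateQuery([]string{
		"console_helm_uninstalls_total",
		"openshift_apps_deploymentconfigs_strategy_total",
	})
	fmt.Println("/federate?" + query)
}
```

Adding a metric to the gatherer therefore amounts to one more repeated `match[]` value, which is why the code change in this file is a single line.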
