[release-4.10] OCPBUGS-6835: feat(recent_metrics) adds openshift_apps_deploymentconfigs_strategy_total #740

Merged
6 changes: 5 additions & 1 deletion .gitignore
@@ -8,4 +8,8 @@ main
.dockerignore
devspace.yaml

cover.out
cover.out
_output

# MacOS
.DS_Store
65 changes: 42 additions & 23 deletions docs/gathered-data.md
@@ -510,32 +510,51 @@ Response see:

## MostRecentMetrics

gathers cluster Federated Monitoring metrics.
Collects cluster Federated Monitoring metrics.

The GET REST query to URL /federate
Gathered metrics:
virt_platform
etcd_object_counts
cluster_installer
vsphere_node_hw_version_total
namespace CPU and memory usage
console_helm_installs_total
console_helm_upgrades_total
console_helm_uninstalls_total
followed by at most 1000 lines of ALERTS metric

* Location in archive: config/metrics
* See: docs/insights-archive-sample/config/metrics
* Id in config: metrics
* Since version:
- "etcd_object_counts": 4.3+
- "cluster_installer": 4.3+
- "ALERTS": 4.3+
- "namespace:container_cpu_usage_seconds_total:sum_rate": 4.5+
- "namespace:container_memory_usage_bytes:sum": 4.5+
- "virt_platform metric": 4.6.34+, 4.7.16+, 4.8+
- "vsphere_node_hw_version_total": 4.7.11+, 4.8+
- "console_helm_installs_total": 4.10+
- `virt_platform`
- `etcd_object_counts`
- `cluster_installer`
- `vsphere_node_hw_version_total`
- namespace CPU and memory usage
- `console_helm_installs_total`
- `console_helm_upgrades_total`
- `console_helm_uninstalls_total`
- `openshift_apps_deploymentconfigs_strategy_total`
- followed by at most 1000 lines of `ALERTS` metric

### API Reference
None

### Sample data
- docs/insights-archive-sample/config/metrics

### Location in archive
- `config/metrics`

### Config ID
`clusterconfig/metrics`

### Released version
- 4.3.0

### Backported versions
None

### Changes
- `etcd_object_counts` introduced in version 4.3+
- `cluster_installer` introduced in version 4.3+
- `ALERTS` introduced in version 4.3+
- `namespace:container_cpu_usage_seconds_total:sum_rate` introduced in version 4.5+
- `namespace:container_memory_usage_bytes:sum` introduced in version 4.5+
- `virt_platform metric` introduced in version 4.6.34+, 4.7.16+, 4.8+
- `vsphere_node_hw_version_total` introduced in version 4.7.11+, 4.8+
- `console_helm_installs_total` introduced in version 4.11+
- `console_helm_upgrades_total` introduced in version 4.12+
- `console_helm_uninstalls_total` introduced in version 4.12+
- `openshift_apps_deploymentconfigs_strategy_total` introduced in version 4.13+
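
The metrics listed above are requested from `/federate` with one `match[]` parameter per metric. As a minimal sketch of how such a query string is assembled, using only the Go standard library: the helper name `federateQuery` is hypothetical and is not part of this PR, which builds the request through the Kubernetes REST client instead.

```go
package main

import (
	"fmt"
	"net/url"
)

// federateQuery is a hypothetical helper (not from the PR). It builds the
// path and query string of a /federate request, adding one match[]
// parameter per metric selector, in the order given.
func federateQuery(selectors []string) string {
	q := url.Values{}
	for _, s := range selectors {
		q.Add("match[]", s)
	}
	// Encode percent-escapes the brackets in "match[]".
	return "/federate?" + q.Encode()
}

func main() {
	fmt.Println(federateQuery([]string{
		"console_helm_uninstalls_total",
		"openshift_apps_deploymentconfigs_strategy_total",
	}))
}
```

Repeating the same key keeps each selector as a separate `match[]` value, which is how the endpoint receives several metrics in a single GET.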


## MutatingWebhookConfigurations
4 changes: 4 additions & 0 deletions docs/insights-archive-sample/config/metrics
@@ -444,6 +444,10 @@ namespace:container_memory_usage_bytes:sum{namespace="openshift-network-operator
namespace:container_memory_usage_bytes:sum{namespace="openshift-config-operator",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-1"} 5.3141504e+07 1612793316662
namespace:container_memory_usage_bytes:sum{namespace="openshift-controller-manager",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-1"} 1.24739584e+08 1612793316662
namespace:container_memory_usage_bytes:sum{namespace="openshift-etcd-operator",instance="",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-1"} 1.19230464e+08 1612793316662
# TYPE openshift_apps_deploymentconfigs_strategy_total untyped
openshift_apps_deploymentconfigs_strategy_total{container="controller-manager",endpoint="https",instance="10.129.0.48:8443",job="controller-manager",namespace="openshift-controller-manager",pod="controller-manager-5b548f5447-dq5wx",service="controller-manager",type="custom",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 0 1675166408404
openshift_apps_deploymentconfigs_strategy_total{container="controller-manager",endpoint="https",instance="10.129.0.48:8443",job="controller-manager",namespace="openshift-controller-manager",pod="controller-manager-5b548f5447-dq5wx",service="controller-manager",type="recreate",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 0 1675166408404
openshift_apps_deploymentconfigs_strategy_total{container="controller-manager",endpoint="https",instance="10.129.0.48:8443",job="controller-manager",namespace="openshift-controller-manager",pod="controller-manager-5b548f5447-dq5wx",service="controller-manager",type="rolling",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 0 1675166408404
# TYPE vsphere_node_hw_version_total untyped
vsphere_node_hw_version_total{container="vsphere-problem-detector-operator",endpoint="vsphere-metrics",hw_version="vmx-13",instance="10.128.0.25:8444",job="vsphere-problem-detector-metrics",namespace="openshift-cluster-storage-operator",pod="vsphere-problem-detector-operator-7f746856d4-78lnn",service="vsphere-problem-detector-metrics",prometheus="openshift-monitoring/k8s",prometheus_replica="prometheus-k8s-0"} 6 1619708403480
# TYPE virt_platform untyped
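
Each sample line added above carries the deployment strategy in its `type` label (`custom`, `recreate`, `rolling`), followed by the value and a timestamp. A rough sketch of reading that label and value back out of one line is shown here. The function `strategySample` is hypothetical, and the regular expression assumes that label values contain no escaped quotes or braces; a real consumer would more likely use a Prometheus exposition-format parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// typeLabel matches a label named exactly "type" (preceded by "{" or ",").
var typeLabel = regexp.MustCompile(`[{,]type="([^"]*)"`)

// strategySample is a hypothetical helper (not from the PR). It returns the
// `type` label and the sample value of one
// openshift_apps_deploymentconfigs_strategy_total line.
func strategySample(line string) (strategy, value string, ok bool) {
	const prefix = "openshift_apps_deploymentconfigs_strategy_total{"
	if !strings.HasPrefix(line, prefix) {
		return "", "", false // "# TYPE" comments and other metrics are skipped
	}
	m := typeLabel.FindStringSubmatch(line)
	end := strings.LastIndex(line, "}")
	if m == nil || end < 0 {
		return "", "", false
	}
	// After the closing brace come the value and the timestamp.
	fields := strings.Fields(line[end+1:])
	if len(fields) == 0 {
		return "", "", false
	}
	return m[1], fields[0], true
}

func main() {
	// Shortened version of a sample line, with most labels omitted.
	line := `openshift_apps_deploymentconfigs_strategy_total{namespace="openshift-controller-manager",type="rolling",prometheus="openshift-monitoring/k8s"} 0 1675166408404`
	strategy, value, ok := strategySample(line)
	fmt.Println(strategy, value, ok)
}
```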
4 changes: 2 additions & 2 deletions pkg/controller/status/controller.go
@@ -407,11 +407,11 @@ func handleControllerStatusError(errs []string, errorReason string) (reason, mes
sort.Strings(errs)
message = fmt.Sprintf("There are multiple errors blocking progress:\n* %s", strings.Join(errs, "\n* "))
} else if len(errs) == 1 {
message = errs[0]
reason = errorReason
if len(errorReason) == 0 {
reason = "UnknownError"
}
message = errs[0]
reason = errorReason
}
return reason, message
}
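
Because the flattened hunk above drops the +/- markers, the final statement order is hard to read off. The following is a standalone sketch of the reason/message selection, not the PR's code. It assumes the intent is that an empty `errorReason` falls back to `"UnknownError"`. The name `statusReasonAndMessage` and the `"MultipleFailures"` reason are hypothetical, since the hunk does not show that branch's reason.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// statusReasonAndMessage is a hypothetical stand-in for the function in the
// hunk. It picks a reason and message from a list of error strings.
func statusReasonAndMessage(errs []string, errorReason string) (reason, message string) {
	switch {
	case len(errs) > 1:
		reason = "MultipleFailures" // assumption: not visible in the hunk
		sorted := append([]string(nil), errs...) // copy, so the caller's slice is untouched
		sort.Strings(sorted)
		message = fmt.Sprintf("There are multiple errors blocking progress:\n* %s",
			strings.Join(sorted, "\n* "))
	case len(errs) == 1:
		message = errs[0]
		reason = errorReason
		// The fallback runs after the assignment so it cannot be overwritten.
		if reason == "" {
			reason = "UnknownError"
		}
	}
	return reason, message
}

func main() {
	fmt.Println(statusReasonAndMessage([]string{"boom"}, ""))
}
```

In this sketch the order of the two assignments matters: setting `reason = errorReason` after the fallback would replace `"UnknownError"` with an empty string.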
64 changes: 42 additions & 22 deletions pkg/gatherers/clusterconfig/recent_metrics.go
@@ -21,32 +21,51 @@ const (
metricsAlertsLinesLimit = 1000
)

// GatherMostRecentMetrics gathers cluster Federated Monitoring metrics.
// GatherMostRecentMetrics Collects cluster Federated Monitoring metrics.
//
// The GET REST query to URL /federate
// Gathered metrics:
// virt_platform
// etcd_object_counts
// cluster_installer
// vsphere_node_hw_version_total
// namespace CPU and memory usage
// console_helm_installs_total
// console_helm_upgrades_total
// console_helm_uninstalls_total
// followed by at most 1000 lines of ALERTS metric
// - `virt_platform`
// - `etcd_object_counts`
// - `cluster_installer`
// - `vsphere_node_hw_version_total`
// - namespace CPU and memory usage
// - `console_helm_installs_total`
// - `console_helm_upgrades_total`
// - `console_helm_uninstalls_total`
// - `openshift_apps_deploymentconfigs_strategy_total`
// - followed by at most 1000 lines of `ALERTS` metric
//
// * Location in archive: config/metrics
// * See: docs/insights-archive-sample/config/metrics
// * Id in config: metrics
// * Since version:
// - "etcd_object_counts": 4.3+
// - "cluster_installer": 4.3+
// - "ALERTS": 4.3+
// - "namespace:container_cpu_usage_seconds_total:sum_rate": 4.5+
// - "namespace:container_memory_usage_bytes:sum": 4.5+
// - "virt_platform metric": 4.6.34+, 4.7.16+, 4.8+
// - "vsphere_node_hw_version_total": 4.7.11+, 4.8+
// - "console_helm_installs_total": 4.10+
// ### API Reference
// None
//
// ### Sample data
// - docs/insights-archive-sample/config/metrics
//
// ### Location in archive
// - `config/metrics`
//
// ### Config ID
// `clusterconfig/metrics`
//
// ### Released version
// - 4.3.0
//
// ### Backported versions
// None
//
// ### Changes
// - `etcd_object_counts` introduced in version 4.3+
// - `cluster_installer` introduced in version 4.3+
// - `ALERTS` introduced in version 4.3+
// - `namespace:container_cpu_usage_seconds_total:sum_rate` introduced in version 4.5+
// - `namespace:container_memory_usage_bytes:sum` introduced in version 4.5+
// - `virt_platform metric` introduced in version 4.6.34+, 4.7.16+, 4.8+
// - `vsphere_node_hw_version_total` introduced in version 4.7.11+, 4.8+
// - `console_helm_installs_total` introduced in version 4.11+
// - `console_helm_upgrades_total` introduced in version 4.12+
// - `console_helm_uninstalls_total` introduced in version 4.12+
// - `openshift_apps_deploymentconfigs_strategy_total` introduced in version 4.13+
func (g *Gatherer) GatherMostRecentMetrics(ctx context.Context) ([]record.Record, []error) {
metricsRESTClient, err := rest.RESTClientFor(g.metricsGatherKubeConfig)
if err != nil {
@@ -68,6 +87,7 @@ func gatherMostRecentMetrics(ctx context.Context, metricsClient rest.Interface)
Param("match[]", "console_helm_installs_total").
Param("match[]", "console_helm_upgrades_total").
Param("match[]", "console_helm_uninstalls_total").
Param("match[]", "openshift_apps_deploymentconfigs_strategy_total").
DoRaw(ctx)
if err != nil {
klog.Errorf("Unable to retrieve most recent metrics: %v", err)