enabled metrics-collector for ICL
reggeenr authored and qu1queee committed Dec 16, 2024
1 parent a148d1f commit 37a6677
Showing 13 changed files with 665 additions and 8 deletions.
81 changes: 79 additions & 2 deletions metrics-collector/README.md
@@ -2,6 +2,8 @@

Code Engine job that demonstrates how to collect resource metrics (CPU, memory and disk usage) of running Code Engine apps, jobs, and builds

![Dashboard overview](./images/icl-dashboard-overview.png)

## Installation

### Capture metrics every n seconds
@@ -17,11 +19,11 @@ $ ibmcloud ce job create \
--wait
```

* Submit a daemon job that collects metrics in an endless loop. The daemon job queries the Metrics API every 10 seconds
* Submit a daemon job that collects metrics in an endless loop. The daemon job queries the Metrics API every 30 seconds
```
$ ibmcloud ce jobrun submit \
--job metrics-collector \
--env INTERVAL=10
--env INTERVAL=30
```


@@ -57,6 +59,81 @@ One can use the environment variable `COLLECT_DISKUSAGE=true` to also collect th

Once your IBM Cloud Code Engine project has detected a corresponding IBM Cloud Logs instance, which is configured to receive platform logs, you can consume the resource metrics in IBM Cloud Logs. Use the filter `metric:instance-resources` to filter for log lines that print resource metrics for each detected IBM Cloud Code Engine instance that is running in a project.

### Custom dashboard

Follow the steps below to create a custom dashboard in your IBM Cloud Logs instance, to gain insights into resource consumption metrics.

![Dashboard overview](./images/icl-dashboard-overview.png)

**Setup instructions:**

* Navigate to the "Custom dashboards" view, hover over the "New" button, and click "Import dashboard"

![New dashboard](./images/icl-dashboard-new.png)

* In the "Import" modal, select the file [./setup/dashboard-code_engine_resource_consumption_metrics.json](./setup/dashboard-code_engine_resource_consumption_metrics.json) located in this repository, and click "Import"

![Import modal](./images/icl-dashboard-import.png)

* Confirm the import by clicking "Import" again

![Import confirmation](./images/icl-dashboard-import-confirm.png)


### Logs view

Follow the steps below to create a Logs view in your IBM Cloud Logs instance that lets you drill into individual instance-resources log lines.

![Logs overview](./images/icl-logs-view-overview.png)

**Setup instructions:**

* Show only the log lines that carry resource metrics by applying the following query
```
app:"codeengine" AND message.metric:"instance-resources"
```

![Query](./images/icl-logs-view-query.png)

* In the left bar, click "Add Filter" and add the following filters
* `Application`
* `App`
* `Label.Project`
* `Message.Component_name`

![Filters](./images/icl-logs-view-filters.png)

* In the top-right corner, click on "Columns" and configure the following columns:
* `Timestamp`
* `label.Project`
* `message.component_type`
* `message.component_name`
* `message.message`
* `Text`

![Columns](./images/icl-logs-view-columns.png)

* Once applied, adjust the column widths as needed

* In the top-right corner, select `1-line` as view mode

![View](./images/icl-logs-view-mode.png)

* The graph title reads "**Count** all grouped by **Severity**". Click `Severity` and select `message.component_name` instead. Then select `Max` as the aggregation metric and `message.memory.usage` as the aggregation field

![Graph](./images/icl-logs-view-graph.png)

* Save the view

![Save](./images/icl-logs-view-save.png)

* Use the custom logs view to drill into individual instance-resources log lines

![Logs overview](./images/icl-logs-view-overview.png)


## IBM Log Analysis setup (deprecated)

### Log lines

Along with a human-readable message, like `Captured metrics of app instance 'load-generator-00001-deployment-677d5b7754-ktcf6': 3m vCPU, 109 MB memory, 50 MB ephemeral storage`, each log line carries specific resource utilization details in a structured form, so that advanced filters can be applied to them.
Binary file added metrics-collector/images/icl-dashboard-import.png
Binary file added metrics-collector/images/icl-dashboard-new.png
Binary file added metrics-collector/images/icl-logs-view-graph.png
Binary file added metrics-collector/images/icl-logs-view-save.png
19 changes: 13 additions & 6 deletions metrics-collector/main.go
@@ -34,9 +34,12 @@ func main() {
}

// If the 'INTERVAL' env var is set then sleep for that many seconds
sleepDuration := 10
sleepDuration := 30
if t := os.Getenv("INTERVAL"); t != "" {
sleepDuration, _ = strconv.Atoi(t)
if sleepDuration < 30 {
sleepDuration = 30
}
}

// In daemon mode, collect resource metrics in an endless loop
@@ -111,10 +114,10 @@ func collectInstanceMetrics() {

// fetches all pods
pods := getAllPods(coreClientset, namespace, config)

// fetch all pod metrics
podMetrics := getAllPodMetrics(namespace, config)

var wg sync.WaitGroup

for _, metric := range *podMetrics {
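
The loop body is cut off in this view. As a rough sketch of how a `sync.WaitGroup` is commonly combined with such a loop, the example below processes each metric in its own goroutine and waits for all of them to finish. The type `podMetric`, the function `collectAll` and the per-goroutine fan-out are assumptions for illustration; the actual loop body in `main.go` may differ.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// podMetric is a hypothetical stand-in for one entry of podMetrics.
type podMetric struct {
	Name     string
	MemoryMB int
}

// collectAll runs process for every metric concurrently and returns the
// results once all goroutines have finished.
func collectAll(metrics []podMetric, process func(podMetric) string) []string {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results []string
	)
	for _, metric := range metrics {
		wg.Add(1)
		go func(m podMetric) {
			defer wg.Done()
			line := process(m)
			// Guard the shared slice against concurrent appends.
			mu.Lock()
			results = append(results, line)
			mu.Unlock()
		}(metric)
	}
	wg.Wait()
	// Sort so that the output order does not depend on goroutine scheduling.
	sort.Strings(results)
	return results
}

func main() {
	metrics := []podMetric{
		{Name: "app-a", MemoryMB: 109},
		{Name: "job-b", MemoryMB: 64},
	}
	lines := collectAll(metrics, func(m podMetric) string {
		return fmt.Sprintf("%s: %d MB memory", m.Name, m.MemoryMB)
	})
	for _, line := range lines {
		fmt.Println(line)
	}
}
```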
@@ -258,7 +261,7 @@ func getAllPods(coreClientset *kubernetes.Clientset, namespace string, config *r

// Helper function to retrieve all pods from the Kube API
func obtainDiskUsage(coreClientset *kubernetes.Clientset, namespace string, pod string, container string, config *rest.Config) float64 {

// per default, we do not collect disk space statistics
if os.Getenv("COLLECT_DISKUSAGE") != "true" {
return 0
@@ -304,12 +307,16 @@ func obtainDiskUsage(coreClientset *kubernetes.Clientset, namespace string, pod

// Render captured system error messages, in case the stdout stream did not receive any valid content
if err != nil {
fmt.Println("obtainDiskUsage of pod:" + pod + "/container:" + container + " failed with a stream err - " + err.Error() + " - stderr: '" + errBuf.String() + "'")
if err.Error() == "Internal error occurred: failed calling webhook \"validating.webhook.pod-exec-auth-check.codeengine.cloud.ibm.com\": failed to call webhook: Post \"https://validating-webhook-serving.ibm-cfn-system.svc:443/validate/pod-exec?timeout=5s\": EOF" {
// Do nothing and silently ignore this issue as it is most likely related to pod terminations
} else {
fmt.Println("obtainDiskUsage of pod:" + pod + "/container:" + container + " failed with a stream err - " + err.Error() + " - stderr: '" + errBuf.String() + "'")
}
}

return float64(0)
}

// Parse the output "4000 /" by splitting the words
diskUsageOutput := strings.Fields(strings.TrimSuffix(diskUsageOutputStr, "\n"))
if len(diskUsageOutput) > 2 {