feat: More performance metrics #4823
Conversation
Signed-off-by: Simon Behar <simbeh7@gmail.com>
I did something similar to this recently, but I took a slightly different approach. I created a new queue:
metrics.NewWorkQueueWithMetrics(workqueue.NewNamedRateLimitingQueue(&fixedItemIntervalRateLimiter{}, "workflow_queue"), "workflow_queue")
I don't think there's much in it mind you. Just a thought.
@simster7 I've got a few metrics we might want to do at some point:
I did a PoC for these, but 3 and 4 are a bit complex and I don't know if we need both.
Sure, I'll look into this.
K8s API metrics: not sure it should be our responsibility to emit K8s metrics. Shouldn't K8s have an endpoint for these already? K8s emits metrics already: https://kubernetes.io/docs/concepts/cluster-administration/system-metrics/

Count of workflow pods by phase (similar impl to count of workflows): we currently have only one metric, for active pods (i.e. phase == Running). I had actually first set the metric for all phases, but that would require changing our Pod informer from looking only at not-completed pods to looking at all pods. If we think this is worth it, I can work to ensure that changing the nature of the informer doesn't have any unintended consequences.

Count of workflows that actually have running pods (by adding a condition): this seems to be tied more to the refactor mentioned in #4824. This should perhaps be changed with that.
I think some tests are probably in order for this PR, please.
workflow/metrics/metrics.go
@@ -205,6 +213,31 @@ func (m *Metrics) CronWorkflowSubmissionError() {
	m.errors[ErrorCauseCronWorkflowSubmissionError].Inc()
}

func (m *Metrics) WorkerBusy(workerType string) {
	m.mutex.Lock()
I think these mutexes may create contention in the VM - doesn't the gauge already have these?
Sorry, I'm not too sure what you mean. These mutexes are needed to prevent concurrency issues between writing/updating metrics and metrics being scraped. While individual metrics may be concurrency-safe, the collection of metrics and the metrics scraper are not without these locks. They are present in all the other functions in this file.
Can you take a look at https://github.com/argoproj/argo/pull/4811/files#diff-2721f0228996fd51c5f9a8db168a353b398d051154c82cebb03f03ddd1ee0574? I think it gives a way to do the workers busy metric very safely.
Interesting. @jessesuen are you aware of this?
I think pods that are pending are useful. I think you can do all incomplete pods.
This is harder and I'm not 100% sure this is a valuable metric. Let's hold off on it.
Some minor comments - maybe when you merge, give a more specific commit message than the PR currently has?
return map[v1.PodPhase]prometheus.Gauge{
	v1.PodPending: prometheus.NewGauge(getOptsByPhase(v1.PodPending)),
	v1.PodRunning: prometheus.NewGauge(getOptsByPhase(v1.PodRunning)),
	//v1.PodSucceeded: prometheus.NewGauge(getOptsByPhase(v1.PodSucceeded)),
delete comments?
Closes: #4800
Signed-off-by: Simon Behar <simbeh7@gmail.com>
Checklist: