additional metrics #579

Merged · 3 commits · Mar 29, 2021

Conversation

@pohly (Contributor) commented Mar 10, 2021

What type of PR is this?
/kind feature

What this PR does / why we need it:

As pointed out in the KEP review for storage capacity tracking, admins need more insight into the operation of the sidecar than just the CSI calls. This PR adds new metrics specifically for storage capacity tracking, and also enables gathering of metrics that already existed but just weren't reported (process and Go runtime, work queues, leader election).

Note to reviewer

The commit messages have examples of what the new data looks like. I'm not sure how we want to document this. IMHO we should first decide on central documentation (= kubernetes-csi/docs#339), then add something to each repo as needed.

Does this PR introduce a user-facing change?:

new metrics data (storage capacity tracking, process and Go runtime, work queues, leaderelection)

@k8s-ci-robot added the release-note, kind/feature, and cncf-cla: yes labels on Mar 10, 2021
@k8s-ci-robot requested review from lpabon and saad-ali on March 10, 2021 14:18
@k8s-ci-robot added the size/XL label on Mar 10, 2021
@pohly force-pushed the storage-capacity-metrics branch 2 times, most recently from 8b96197 to cf1f5af on March 10, 2021 18:23
@k8s-ci-robot added the size/XXL label and removed the size/XL label on Mar 10, 2021
@pohly force-pushed the storage-capacity-metrics branch 2 times, most recently from 24e5a70 to 306d40d on March 10, 2021 18:24
@verult (Contributor) commented Mar 16, 2021

/assign

@k8s-ci-robot added the needs-rebase label on Mar 18, 2021
@verult (Contributor) left a comment

Mostly minor suggestions

}
return nil
}

// TestCapacityController checks that the controller handles the initial state and
Contributor

nit: Update function name

Contributor Author

Done.

metricsManager := metrics.NewCSIMetricsManager("" /* driverName */)
metricsManager := metrics.NewCSIMetricsManagerWithOptions("", /* driverName */
// Will be provided via default gatherer.
metrics.WithProcessStartTime(false),
Contributor

How does this affect existing CSI cumulative metrics that rely on process start time, like csi_sidecar_operations_seconds for example?

Contributor Author

The process start time metric is registered in the legacy registry. By gathering from there, we still get the same value. If we enabled it here, we would have the same metric from two different places.

metricsManager := metrics.NewCSIMetricsManagerWithOptions("", /* driverName */
// Will be provided via default gatherer.
metrics.WithProcessStartTime(false),
metrics.WithSubsystem(metrics.SubsystemSidecar),
Contributor

Is this just to make the default explicit?

Contributor Author

Yes.

@@ -227,18 +237,18 @@ func main() {
}
}

// Prepare http endpoint for metrics + leader election healthz
// Prepare http endpoint for metrics + leader election healthz + other metrics
Contributor

nit: "other metrics" is redundant since "metrics" is mentioned in the beginning

Contributor Author

Removed.

mux.Handle(*metricsPath,
promhttp.InstrumentMetricHandler(
reg,
promhttp.HandlerFor(gatherers, promhttp.HandlerOpts{})))
Contributor

RegisterToServer() includes a HandlerOpt of ErrorHandling: metrics.ContinueOnError. Is that needed here?

Contributor Author

I don't think our gatherers ever return an error, so this shouldn't make a difference.

Even if they did, I'd prefer to get a clear indication that something was broken instead of continuing with incomplete data. But I don't feel strongly about this and could add it.

Contributor

Ok I also don't have a strong preference :)
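For readers following the thread, here is a minimal sketch of the wiring under discussion: one HTTP handler that serves both the sidecar's own registry and the default legacy registry, with the optional ContinueOnError behavior. Variable names, the metrics path, and the port are illustrative, not the exact PR code.

    package main

    import (
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
        "k8s.io/component-base/metrics/legacyregistry"
    )

    func main() {
        // Stand-in for the registry that holds the sidecar's own metrics
        // (the csi-lib-utils metrics manager registry in the PR).
        reg := prometheus.NewRegistry()

        // Merge the sidecar registry with the default legacy registry, which
        // holds process, Go runtime, workqueue and leader election metrics.
        gatherers := prometheus.Gatherers{reg, legacyregistry.DefaultGatherer}

        mux := http.NewServeMux()
        mux.Handle("/metrics",
            promhttp.InstrumentMetricHandler(
                reg,
                promhttp.HandlerFor(gatherers, promhttp.HandlerOpts{
                    // Optional: serve whatever could be gathered instead of
                    // failing the whole scrape when one gatherer errors out.
                    ErrorHandling: promhttp.ContinueOnError,
                })))

        _ = http.ListenAndServe(":8080", mux)
    }

With the default ErrorHandling (HTTPErrorOnError), a gatherer error turns into an HTTP 500, which matches the "clear indication that something was broken" preference above.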

@@ -96,6 +123,9 @@ func TestController(t *testing.T) {
modify func(ctx context.Context, clientSet *fakeclientset.Clientset, expected []testCapacity) (modifiedExpected []testCapacity, err error)
capacityChange func(ctx context.Context, storage *mockCapacity, expected []testCapacity) (modifiedExpected []testCapacity)
topologyChange func(ctx context.Context, topology *topology.Mock, expected []testCapacity) (modifiedExpected []testCapacity)

expectedObjectsPrepared objects
expectedObjectsProcessed objects
Contributor

Is it the case that in expectedObjectsProcessed, goal always == current and obsolete always == 0? If so, it might be useful to either add this here as a comment, or just use a single int as the expectation representing both the expected goal and current.

Contributor Author

Indeed, using objects as the type here is unnecessarily complex. I've switched to int.

)
objectsObsoleteDesc = metrics.NewDesc(
"csistoragecapacities_obsolete",
"Number of CSIStorageCapacity objects that exist and will be deleted automatically.",
Contributor

Maybe mention in the description that modified objects are not considered obsolete. I wasn't sure from the current descriptions alone what the behavior of these metrics is with updated objects.

Contributor Author

There's no such thing as a "modified object" 😁

There are objects that are already scheduled for a refresh, which then may or may not lead to them being updated. The length of the work queue is an indicator for that.

I added "Objects that exist and may need an update are not considered obsolete and therefore not included in this value." as clarification. But I don't think it actually says something that wasn't already in that existing description.

@@ -115,6 +140,8 @@ type CSICapacityClient interface {
}

// NewController creates a new controller for CSIStorageCapacity objects.
// It implements metrics.StableCollector and thus can be registered in
Contributor

nit: Consider having var _ metrics.StableCollector = &Controller{} instead of or in addition to this comment

Contributor Author

I added the type assertion. That's always useful and not adding it was an oversight.

But since the assertion wouldn't show up in generated documentation, I kept the comment.
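For readers unfamiliar with the idiom, this is the compile-time guard being discussed; a sketch with a stand-in Controller type (the real controller lives in the external-provisioner's capacity package and implements the Describe/Collect methods itself):

    package capacity

    import (
        "k8s.io/component-base/metrics"
    )

    // Controller is a stand-in for the PR's capacity controller type. The
    // embedded BaseStableCollector supplies the StableCollector plumbing.
    type Controller struct {
        metrics.BaseStableCollector
        // ... controller state ...
    }

    // Compile-time check: the build fails here if *Controller ever stops
    // satisfying metrics.StableCollector; there is no runtime cost.
    var _ metrics.StableCollector = &Controller{}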

@@ -358,8 +385,11 @@ func (c *Controller) addWorkItem(segment *topology.Segment, sc *storagev1.Storag
storageClassName: sc.Name,
}
// Ensure that we have an entry for it...
capacity := c.capacities[item]
c.capacities[item] = capacity
_, found := c.capacities[item]
Contributor

Just curious: do the two versions do the same thing and the newer version just makes it more explicit, or are there actual functional differences?

Contributor Author

This change here alone is not functionally identical: the new line only checks for the capacity without adding it.

But directly underneath it we set the entry to nil if not found, and together with that it is functionally equivalent because capacity := c.capacities[item] will also be nil when not found.
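A tiny, self-contained illustration of the Go map behavior this relies on (hypothetical types, not the PR's code): indexing a missing key yields the zero value, so both variants leave the map in the same state.

    package main

    import "fmt"

    // Hypothetical stand-ins for the controller's types, just to show the
    // map semantics.
    type workItem struct{ name string }
    type capacityEntry struct{}

    func main() {
        item := workItem{name: "example"}

        // Old version: read (zero value, i.e. nil, if missing) and write back.
        a := map[workItem]*capacityEntry{}
        capacity := a[item] // nil, because the key does not exist yet
        a[item] = capacity  // entry now exists with a nil value

        // New version: explicit existence check, then create with nil.
        b := map[workItem]*capacityEntry{}
        if _, found := b[item]; !found {
            b[item] = nil
        }

        // Both maps end up with one entry whose value is nil.
        fmt.Println(a[item] == nil, b[item] == nil, len(a) == len(b)) // true true true
    }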


func (c *Controller) isObsolete(capacity *storagev1alpha1.CSIStorageCapacity) bool {
for item, _ := range c.capacities {
if item.storageClassName == capacity.StorageClassName &&
Contributor

nit: define a workItem.equals() method, which can be easily identified and reused by future devs

Contributor Author

Yes, that is useful. I already had this comparison in two places.
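A sketch of what such a helper might look like; the field names and types here are illustrative stand-ins, not the PR's actual workItem definition.

    package main

    import "fmt"

    // Illustrative only: a work item identified by storage class and a
    // topology segment string, plus the object fields it is compared against.
    type workItem struct {
        storageClassName string
        segment          string
    }

    type capacityObject struct {
        StorageClassName string
        Segment          string
    }

    // equals centralizes the comparison so callers such as isObsolete and the
    // update path do not duplicate the field-by-field check.
    func (w workItem) equals(capacity *capacityObject) bool {
        return w.storageClassName == capacity.StorageClassName &&
            w.segment == capacity.Segment
    }

    func main() {
        item := workItem{storageClassName: "fast", segment: "zone-a"}
        obj := &capacityObject{StorageClassName: "fast", Segment: "zone-a"}
        fmt.Println(item.equals(obj)) // true
    }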

pohly added 3 commits March 23, 2021 15:34
The external-provisioner has several named work queues, all of which can
provide metrics data. We just need to
- initialize gathering through the default legacy registry by importing
  the helper package for that
- configure the mux such that the default legacy registry is also included

Leader election can be enabled the same way.

The default legacy registry also includes information about processes
and the Go runtime.

Here is some example data without leader election (wasn't enabled):

   # HELP csi_sidecar_operations_seconds [ALPHA] Container Storage Interface operation duration with gRPC error code status total
   # TYPE csi_sidecar_operations_seconds histogram
   ...
   csi_sidecar_operations_seconds_sum{driver_name="hostpath.csi.k8s.io",grpc_status_code="OK",method_name="/csi.v1.Controller/ControllerGetCapabilities"} 0.001358078
   csi_sidecar_operations_seconds_count{driver_name="hostpath.csi.k8s.io",grpc_status_code="OK",method_name="/csi.v1.Controller/ControllerGetCapabilities"} 1
   ...
   csi_sidecar_operations_seconds_sum{driver_name="hostpath.csi.k8s.io",grpc_status_code="OK",method_name="/csi.v1.Controller/GetCapacity"} 0.025071011
   csi_sidecar_operations_seconds_count{driver_name="hostpath.csi.k8s.io",grpc_status_code="OK",method_name="/csi.v1.Controller/GetCapacity"} 25
   ...
   csi_sidecar_operations_seconds_sum{driver_name="hostpath.csi.k8s.io",grpc_status_code="OK",method_name="/csi.v1.Identity/GetPluginCapabilities"} 0.001320474
   csi_sidecar_operations_seconds_count{driver_name="hostpath.csi.k8s.io",grpc_status_code="OK",method_name="/csi.v1.Identity/GetPluginCapabilities"} 1
   ...
   csi_sidecar_operations_seconds_sum{driver_name="hostpath.csi.k8s.io",grpc_status_code="OK",method_name="/csi.v1.Node/NodeGetInfo"} 0.000994998
   csi_sidecar_operations_seconds_count{driver_name="hostpath.csi.k8s.io",grpc_status_code="OK",method_name="/csi.v1.Node/NodeGetInfo"} 1
   ...
   csi_sidecar_operations_seconds_sum{driver_name="unknown-driver",grpc_status_code="OK",method_name="/csi.v1.Identity/GetPluginInfo"} 0.000752024
   csi_sidecar_operations_seconds_count{driver_name="unknown-driver",grpc_status_code="OK",method_name="/csi.v1.Identity/GetPluginInfo"} 1
   ...
   csi_sidecar_operations_seconds_sum{driver_name="unknown-driver",grpc_status_code="OK",method_name="/csi.v1.Identity/Probe"} 0.002666977
   csi_sidecar_operations_seconds_count{driver_name="unknown-driver",grpc_status_code="OK",method_name="/csi.v1.Identity/Probe"} 1
   # HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
   # TYPE go_gc_duration_seconds summary
   go_gc_duration_seconds{quantile="0"} 7.5671e-05
   go_gc_duration_seconds{quantile="0.25"} 9.2559e-05
   go_gc_duration_seconds{quantile="0.5"} 9.9967e-05
   go_gc_duration_seconds{quantile="0.75"} 0.000135593
   go_gc_duration_seconds{quantile="1"} 0.000194901
   go_gc_duration_seconds_sum 0.0012424
   go_gc_duration_seconds_count 11
   # HELP go_goroutines Number of goroutines that currently exist.
   # TYPE go_goroutines gauge
   go_goroutines 113
   # HELP go_info Information about the Go environment.
   # TYPE go_info gauge
   go_info{version="go1.16"} 1
   # HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
   # TYPE go_memstats_alloc_bytes gauge
   go_memstats_alloc_bytes 4.669544e+06
   # HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
   # TYPE go_memstats_alloc_bytes_total counter
   go_memstats_alloc_bytes_total 1.5654752e+07
   # HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
   # TYPE go_memstats_buck_hash_sys_bytes gauge
   go_memstats_buck_hash_sys_bytes 1.45326e+06
   # HELP go_memstats_frees_total Total number of frees.
   # TYPE go_memstats_frees_total counter
   go_memstats_frees_total 108160
   # HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
   # TYPE go_memstats_gc_cpu_fraction gauge
   go_memstats_gc_cpu_fraction 6.767118038487869e-06
   # HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
   # TYPE go_memstats_gc_sys_bytes gauge
   go_memstats_gc_sys_bytes 5.536336e+06
   # HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
   # TYPE go_memstats_heap_alloc_bytes gauge
   go_memstats_heap_alloc_bytes 4.669544e+06
   # HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
   # TYPE go_memstats_heap_idle_bytes gauge
   go_memstats_heap_idle_bytes 5.7565184e+07
   # HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
   # TYPE go_memstats_heap_inuse_bytes gauge
   go_memstats_heap_inuse_bytes 7.512064e+06
   # HELP go_memstats_heap_objects Number of allocated objects.
   # TYPE go_memstats_heap_objects gauge
   go_memstats_heap_objects 25904
   # HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
   # TYPE go_memstats_heap_released_bytes gauge
   go_memstats_heap_released_bytes 5.6336384e+07
   # HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
   # TYPE go_memstats_heap_sys_bytes gauge
   go_memstats_heap_sys_bytes 6.5077248e+07
   # HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
   # TYPE go_memstats_last_gc_time_seconds gauge
   go_memstats_last_gc_time_seconds 1.6152977257743602e+09
   # HELP go_memstats_lookups_total Total number of pointer lookups.
   # TYPE go_memstats_lookups_total counter
   go_memstats_lookups_total 0
   # HELP go_memstats_mallocs_total Total number of mallocs.
   # TYPE go_memstats_mallocs_total counter
   go_memstats_mallocs_total 134064
   # HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
   # TYPE go_memstats_mcache_inuse_bytes gauge
   go_memstats_mcache_inuse_bytes 43200
   # HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
   # TYPE go_memstats_mcache_sys_bytes gauge
   go_memstats_mcache_sys_bytes 49152
   # HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
   # TYPE go_memstats_mspan_inuse_bytes gauge
   go_memstats_mspan_inuse_bytes 298928
   # HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
   # TYPE go_memstats_mspan_sys_bytes gauge
   go_memstats_mspan_sys_bytes 311296
   # HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
   # TYPE go_memstats_next_gc_bytes gauge
   go_memstats_next_gc_bytes 9.140384e+06
   # HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
   # TYPE go_memstats_other_sys_bytes gauge
   go_memstats_other_sys_bytes 3.742956e+06
   # HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
   # TYPE go_memstats_stack_inuse_bytes gauge
   go_memstats_stack_inuse_bytes 2.031616e+06
   # HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
   # TYPE go_memstats_stack_sys_bytes gauge
   go_memstats_stack_sys_bytes 2.031616e+06
   # HELP go_memstats_sys_bytes Number of bytes obtained from system.
   # TYPE go_memstats_sys_bytes gauge
   go_memstats_sys_bytes 7.8201864e+07
   # HELP go_threads Number of OS threads created.
   # TYPE go_threads gauge
   go_threads 31
   # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
   # TYPE process_cpu_seconds_total counter
   process_cpu_seconds_total 1.7
   # HELP process_max_fds Maximum number of open file descriptors.
   # TYPE process_max_fds gauge
   process_max_fds 1.048576e+06
   # HELP process_open_fds Number of open file descriptors.
   # TYPE process_open_fds gauge
   process_open_fds 11
   # HELP process_resident_memory_bytes Resident memory size in bytes.
   # TYPE process_resident_memory_bytes gauge
   process_resident_memory_bytes 4.8300032e+07
   # HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
   # TYPE process_start_time_seconds gauge
   process_start_time_seconds 1.6152970493e+09
   # HELP process_virtual_memory_bytes Virtual memory size in bytes.
   # TYPE process_virtual_memory_bytes gauge
   process_virtual_memory_bytes 7.61479168e+08
   # HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
   # TYPE process_virtual_memory_max_bytes gauge
   process_virtual_memory_max_bytes -1
   # HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
   # TYPE promhttp_metric_handler_requests_in_flight gauge
   promhttp_metric_handler_requests_in_flight 1
   # HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
   # TYPE promhttp_metric_handler_requests_total counter
   promhttp_metric_handler_requests_total{code="200"} 11
   promhttp_metric_handler_requests_total{code="500"} 0
   promhttp_metric_handler_requests_total{code="503"} 0
   # HELP workqueue_adds_total [ALPHA] Total number of adds handled by workqueue
   # TYPE workqueue_adds_total counter
   workqueue_adds_total{name="claims"} 0
   workqueue_adds_total{name="csistoragecapacity"} 25
   workqueue_adds_total{name="unsavedpvs"} 0
   workqueue_adds_total{name="volumes"} 0
   # HELP workqueue_depth [ALPHA] Current depth of workqueue
   # TYPE workqueue_depth gauge
   workqueue_depth{name="claims"} 0
   workqueue_depth{name="csistoragecapacity"} 0
   workqueue_depth{name="unsavedpvs"} 0
   workqueue_depth{name="volumes"} 0
   # HELP workqueue_longest_running_processor_seconds [ALPHA] How many seconds has the longest running processor for workqueue been running.
   # TYPE workqueue_longest_running_processor_seconds gauge
   workqueue_longest_running_processor_seconds{name="claims"} 0
   workqueue_longest_running_processor_seconds{name="csistoragecapacity"} 0
   workqueue_longest_running_processor_seconds{name="unsavedpvs"} 0
   workqueue_longest_running_processor_seconds{name="volumes"} 0
   # HELP workqueue_queue_duration_seconds [ALPHA] How long in seconds an item stays in workqueue before being requested.
   # TYPE workqueue_queue_duration_seconds histogram
   ...
   workqueue_queue_duration_seconds_sum{name="claims"} 0
   workqueue_queue_duration_seconds_count{name="claims"} 0
   ...
   workqueue_queue_duration_seconds_sum{name="csistoragecapacity"} 0.229351696
   workqueue_queue_duration_seconds_count{name="csistoragecapacity"} 25
   ...
   workqueue_queue_duration_seconds_sum{name="unsavedpvs"} 0
   workqueue_queue_duration_seconds_count{name="unsavedpvs"} 0
   ...
   workqueue_queue_duration_seconds_sum{name="volumes"} 0
   workqueue_queue_duration_seconds_count{name="volumes"} 0
   # HELP workqueue_retries_total [ALPHA] Total number of retries handled by workqueue
   # TYPE workqueue_retries_total counter
   workqueue_retries_total{name="claims"} 0
   workqueue_retries_total{name="csistoragecapacity"} 0
   workqueue_retries_total{name="unsavedpvs"} 0
   workqueue_retries_total{name="volumes"} 0
   # HELP workqueue_unfinished_work_seconds [ALPHA] How many seconds of work has done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases.
   # TYPE workqueue_unfinished_work_seconds gauge
   workqueue_unfinished_work_seconds{name="claims"} 0
   workqueue_unfinished_work_seconds{name="csistoragecapacity"} 0
   workqueue_unfinished_work_seconds{name="unsavedpvs"} 0
   workqueue_unfinished_work_seconds{name="volumes"} 0
   # HELP workqueue_work_duration_seconds [ALPHA] How long in seconds processing an item from workqueue takes.
   # TYPE workqueue_work_duration_seconds histogram
   ...
   workqueue_work_duration_seconds_sum{name="claims"} 0
   workqueue_work_duration_seconds_count{name="claims"} 0
   ...
   workqueue_work_duration_seconds_sum{name="csistoragecapacity"} 0.04584993
   workqueue_work_duration_seconds_count{name="csistoragecapacity"} 25
   ...
   workqueue_work_duration_seconds_sum{name="unsavedpvs"} 0
   workqueue_work_duration_seconds_count{name="unsavedpvs"} 0
   ...
   workqueue_work_duration_seconds_sum{name="volumes"} 0
   workqueue_work_duration_seconds_count{name="volumes"} 0
This adds gauges that describe how many objects the
external-provisioner manages, how many of those already exist, and how
many are to be deleted. Example:

  # HELP csistoragecapacities_desired_current [ALPHA] Number of CSIStorageCapacity objects that exist and are supposed to be managed automatically.
  # TYPE csistoragecapacities_desired_current gauge
  csistoragecapacities_desired_current 2
  # HELP csistoragecapacities_desired_goal [ALPHA] Number of CSIStorageCapacity objects that are supposed to be managed automatically.
  # TYPE csistoragecapacities_desired_goal gauge
  csistoragecapacities_desired_goal 2
  # HELP csistoragecapacities_obsolete [ALPHA] Number of CSIStorageCapacity objects that exist and will be deleted automatically.
  # TYPE csistoragecapacities_obsolete gauge
  csistoragecapacities_obsolete 0
The comparison of a workItem against a CSIStorageCapacity object was done
in two places. A helper function makes the code easier to read.
@pohly force-pushed the storage-capacity-metrics branch from 306d40d to 2b79a45 on March 23, 2021 14:35
@k8s-ci-robot removed the needs-rebase label on Mar 23, 2021
@pohly (Contributor Author) commented Mar 23, 2021

@verult: updated and rebased - please take another look.

@verult (Contributor) left a comment

/lgtm

/assign @msau42
for approval

@@ -96,6 +123,9 @@ func TestController(t *testing.T) {
modify func(ctx context.Context, clientSet *fakeclientset.Clientset, expected []testCapacity) (modifiedExpected []testCapacity, err error)
capacityChange func(ctx context.Context, storage *mockCapacity, expected []testCapacity) (modifiedExpected []testCapacity)
topologyChange func(ctx context.Context, topology *topology.Mock, expected []testCapacity) (modifiedExpected []testCapacity)

expectedObjectsPrepared objects
expectedTotalProcessed int64
Contributor

nit: name it something like expectedTotalFinal. "Total Processed" could also imply the number of objects processed

Contributor Author

"Processed" refers to the step when the check runs (= after processing the work queue). If the tests ever get extended to do more work, the "final" might become misleading. I therefore prefer "Processed".

)
objectsObsoleteDesc = metrics.NewDesc(
"csistoragecapacities_obsolete",
"Number of CSIStorageCapacity objects that exist and will be deleted automatically. Objects that exist and may need an update are not considered obsolete and therefore not included in this value.",
Contributor

Your comment makes sense to me and I don't have a strong preference either way on whether to add that part of the description.

Contributor Author

@msau42 I'll leave it as-is unless you have some other preference.

Contributor Author

The comment was the one in #579 (comment)

@k8s-ci-robot added the lgtm label on Mar 23, 2021
@msau42 (Collaborator) commented Mar 25, 2021

/assign @Jiawei0227
Actually, I think this is the PR for metrics.

@pohly (Contributor Author) commented Mar 25, 2021

/assign @Jiawei0227

Somehow this previously had no effect?

@Jiawei0227 (Contributor) left a comment

/lgtm

I am not familiar with the storage capacity feature, so I think @verult had more context. From the metrics perspective, honestly, k8s metrics are already a mess... I think until that is figured out, this change is okay from my point of view.

But honestly I think we need to do something with csi-lib-utils to make the whole thing of adding metrics to sidecars more elegant. The purpose of csi-lib-utils is to provide the central metrics registry for these sidecars. But I don't know if that would be too much of a change, or I guess that can be a separate issue...

// https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/component-base/metrics/prometheus/clientgo/leaderelection/metrics.go
//
// It also happens to include Go runtime and process metrics:
// https://github.com/kubernetes/kubernetes/blob/9780d88cb6a4b5b067256ecb4abf56892093ee87/staging/src/k8s.io/component-base/metrics/legacyregistry/registry.go#L46-L49
Contributor

I once asked for component-base to include the whole set of runtime and process metrics here: kubernetes/kubernetes#95152

But they decided not to include them because of stability concerns.

I guess that only leaves us the option to either manually add those metrics or use the legacyregistry...? Don't really know how this will end up... Overall the metrics situation in k8s just feels messy to me.

@pohly (Contributor Author) Mar 29, 2021

> I once asked for component-base to include the whole set of runtime and process metrics here: kubernetes/kubernetes#95152
>
> But they decided not to include them because of stability concerns.

Hmm, isn't https://github.com/kubernetes/kubernetes/blob/9780d88cb6a4b5b067256ecb4abf56892093ee87/staging/src/k8s.io/component-base/metrics/legacyregistry/registry.go#L46-L49 doing what you asked for?

Or do you want them to be added elsewhere?

I have not found documentation about the role this metrics/legacyregistry package plays, or is meant to play. I just figured out that in order to get access to workqueue metrics, I have to gather from that legacyregistry.
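A condensed sketch of that wiring, assuming the component-base package layout referenced above: the blank imports have init() side effects that install metric providers which register into the legacy registry, and the /metrics handler then only needs to gather from legacyregistry.DefaultGatherer (as in the earlier mux sketch).

    package main

    import (
        "fmt"

        "k8s.io/component-base/metrics/legacyregistry"
        _ "k8s.io/component-base/metrics/prometheus/clientgo/leaderelection" // leader election metrics provider
        _ "k8s.io/component-base/metrics/prometheus/workqueue"               // workqueue metrics provider
    )

    func main() {
        // Process and Go runtime metrics are registered by the legacy
        // registry itself; workqueue and leader election metrics show up
        // once those components create their queues and locks.
        families, err := legacyregistry.DefaultGatherer.Gather()
        if err != nil {
            panic(err)
        }
        for _, mf := range families {
            fmt.Println(mf.GetName())
        }
    }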

@Jiawei0227 (Contributor) Mar 29, 2021

Yes, the code you reference does what I asked for. But it lives under this legacyregistry, which implies it is not something we want to use moving forward. So I was asking them to have something in the new registry.

I guess my main concern is that we are relying on legacyregistry a lot, and I am afraid that if it gets deprecated and removed in some later k8s release, we will have more work to do here. But this seems like a bigger issue across the whole of k8s.

@@ -458,6 +468,32 @@ func main() {
*capacityPollInterval,
*capacityImmediateBinding,
)
legacyregistry.CustomMustRegister(capacityController)
Contributor

We can change the csi-lib-utils metricsManager constructor function to accept a KubeRegistry instead of creating our own, I guess?

@@ -107,6 +115,28 @@ var (
Factor: 1.1,
Steps: 10,
}

objectsGoalDesc = metrics.NewDesc(
"csistoragecapacities_desired_goal",
Contributor

nit: csistoragecapacities seems very long; do you think it makes sense to break it up a little, like csi_storage_capacities_...? Totally up to you.

Contributor Author

I prefer csistoragecapacities for consistency with kubectl get csistoragecapacities.
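To round out the thread, here is a condensed sketch of a StableCollector exposing gauges with these names. The metric names and help strings mirror the commit message; the collector structure and the getter functions are illustrative stand-ins, not the PR's actual controller code.

    package main

    import (
        "k8s.io/component-base/metrics"
        "k8s.io/component-base/metrics/legacyregistry"
    )

    // Descriptions mirroring the gauges from the commit message (ALPHA stability).
    var (
        objectsGoalDesc = metrics.NewDesc(
            "csistoragecapacities_desired_goal",
            "Number of CSIStorageCapacity objects that are supposed to be managed automatically.",
            nil, nil, metrics.ALPHA, "")
        objectsCurrentDesc = metrics.NewDesc(
            "csistoragecapacities_desired_current",
            "Number of CSIStorageCapacity objects that exist and are supposed to be managed automatically.",
            nil, nil, metrics.ALPHA, "")
        objectsObsoleteDesc = metrics.NewDesc(
            "csistoragecapacities_obsolete",
            "Number of CSIStorageCapacity objects that exist and will be deleted automatically.",
            nil, nil, metrics.ALPHA, "")
    )

    // capacityCollector is a stand-in for the real controller: it implements
    // metrics.StableCollector and reports three gauges on each scrape.
    type capacityCollector struct {
        metrics.BaseStableCollector
        goal, current, obsolete func() int
    }

    var _ metrics.StableCollector = &capacityCollector{}

    func (c *capacityCollector) DescribeWithStability(ch chan<- *metrics.Desc) {
        ch <- objectsGoalDesc
        ch <- objectsCurrentDesc
        ch <- objectsObsoleteDesc
    }

    func (c *capacityCollector) CollectWithStability(ch chan<- metrics.Metric) {
        ch <- metrics.NewLazyConstMetric(objectsGoalDesc, metrics.GaugeValue, float64(c.goal()))
        ch <- metrics.NewLazyConstMetric(objectsCurrentDesc, metrics.GaugeValue, float64(c.current()))
        ch <- metrics.NewLazyConstMetric(objectsObsoleteDesc, metrics.GaugeValue, float64(c.obsolete()))
    }

    func main() {
        c := &capacityCollector{
            goal:     func() int { return 2 },
            current:  func() int { return 2 },
            obsolete: func() int { return 0 },
        }
        // Same registration call as the PR uses for the capacity controller.
        legacyregistry.CustomMustRegister(c)
    }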

@pohly (Contributor Author) commented Mar 29, 2021

> I am not familiar with the storage capacity feature, so I think @verult had more context. From the metrics perspective, honestly, k8s metrics are already a mess...

I think your opinion about the metrics changes was what Michelle was after, so thanks for taking a look. FWIW, I tend to agree with your assessment of the overall situation in k8s metrics.

> But honestly I think we need to do something with csi-lib-utils to make the whole thing of adding metrics to sidecars more elegant. The purpose of csi-lib-utils is to provide the central metrics registry for these sidecars. But I don't know if that would be too much of a change, or I guess that can be a separate issue...

Separate issue, please.

@pohly (Contributor Author) commented Mar 29, 2021

@msau42: both @verult and @Jiawei0227 are okay with the current state, please approve.

@msau42 (Collaborator) commented Mar 29, 2021

/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, pohly, verult

@k8s-ci-robot added the approved label on Mar 29, 2021
@k8s-ci-robot merged commit e11ccd6 into kubernetes-csi:master on Mar 29, 2021