-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Task/use etcd metrics endpoint #11280
Task/use etcd metrics endpoint #11280
Conversation
I need to wrap my head around the unit tests and docs. |
@odacremolbap Happy to help, let me know where :-) |
Received some out-of-band feedback on metricset naming:
From a user perspective:
At this moment, we are exposing
3 first for V2, last one for V3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking this! I have added some comments, mainly about fields and default configs.
"1.024": 6, | ||
"2.048": 6, | ||
"4.096": 6, | ||
"8.192": 6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having dots in field names is problematic as they are going to be stored as objects in Elasticsearch, scale them to milliseconds (take a look to this conversation and this PR).
// Disk | ||
"etcd_mvcc_db_total_size_in_bytes": prometheus.Metric("disk.mvcc_db_total_size_in_bytes"), | ||
"etcd_disk_wal_fsync_duration_seconds": prometheus.Metric("disk.wal_fsync_duration_seconds"), | ||
"etcd_disk_backend_commit_duration_seconds": prometheus.Metric("disk.backend_commit_duration_seconds"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To scale it to milliseconds it'd be something like
"etcd_disk_backend_commit_duration_seconds": prometheus.Metric("disk.backend_commit_duration_seconds"), | |
"etcd_disk_backend_commit_duration_seconds": prometheus.Metric("disk.backend_commit_duration_seconds", prometheus.OpMultiplyBuckets(1000))), |
description: > | ||
Write ahead logs latency sum | ||
|
||
- name: disk.backend_commit_duration_seconds.bucket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a wildcard is needed here (and the same for other histograms)
- name: disk.backend_commit_duration_seconds.bucket | |
- name: disk.backend_commit_duration_seconds.bucket.* |
if err := mbtest.WriteEventsReporterV2(f, t, ""); err != nil { | ||
t.Fatal("write", err) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WriteEventsReporterV2
already checks for errors and non empty events, so this method should be just:
func TestData(t *testing.T) {
compose.EnsureUp(t, "etcd")
f := mbtest.NewReportingMetricSetV2(t, getConfig())
if err := mbtest.WriteEventsReporterV2(f, t, ""); err != nil {
t.Fatal("write", err)
}
}
grpc_server_handled_total{grpc_code="Internal",grpc_method="UserGrantRole",grpc_service="etcdserverpb.Auth",grpc_type="unary"} 0 | ||
grpc_server_handled_total{grpc_code="Internal",grpc_method="UserList",grpc_service="etcdserverpb.Auth",grpc_type="unary"} 0 | ||
grpc_server_handled_total{grpc_code="Internal",grpc_method="UserRevokeRole",grpc_service="etcdserverpb.Auth",grpc_type="unary"} 0 | ||
grpc_server_handled_total{grp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be probably nice to expose some of these metrics on methods, but lets leave it for future changes.
Regarding v2 vs v3 storages. I think we should continue enabling the old metricsets by default, and at some moment we would enable also this new one so we have all covered by default. I saw that some operations on v2 storage also affect the metrics exposed by the new endpoint, maybe at some moment the new endpoint covers both storages, not sure about their plan on this. If that is the case, maybe we can deprecate the old metricsets at some point, but not before metricbeat 8. |
I think there are 2 users: Admin that deploys Metricbeat and he should know if we uses v2 or v3. From a data consumer perspective it should not matter which one is used. My understanding so far is that v3 is mostly a superset of v2. So if a user upgrades from v2 to v3, the resulting events should still look the same but have more data inside. The above assume that we like the current data structure. If we don't I'm also ok with introducing a new better data structure for v3. |
Regarding module/metricsets: current schema is
The main thing to improve is noticing users which etcd version matches which metrics. Current V2 metricsets ( Choices:
|
As mentioned before, I prefer not to mention V3 in the final doc as for the consumer it should not matter, so just One other idea triggered by your comment that it's a huge migration from A to B and it's unlikely that both will be used at the same time. What happens if not use the |
generally speaking adding version to metric names is not a good idea, I agree with you, no point on creating metricsets for fooV1 and a new set for fooV2. The problem with etcd is that they sort of bundle 2 products in 1 binary. As an example, here is how you use the client for V2 for any etcd release:
and here is how you use V3
Admins, devs, users, anyone dealing with etcd must know beforehand if they are targeting V2 or V3. Internally the command will be redirected to a different set of functions based on the environment variable. So, if i understood it correctly, that proposal would be, keeping current V2
I'm ok with that since it avoids using extremely generic term We might not add any V3 reference to metrics, but as an etcd user in my past life, I would be confused to see If you feel that our best move is setting those names, and clarifying at docs, I'm ok with that, I guess there is no perfect solution since this problem is upstream etcd. |
As discussed out of band with @ruflin We will add a field to metricsets that indicate the API version used when retrieving the metrics. Although users will still need to get to the docs to check which metricset applies to what apiVersion, this solution has a number of advantages:
@jsoriano feedback? |
Ok to add an |
I haven't included the updated JSON until we sort out why |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good
"etcd_network_client_grpc_sent_bytes_total": prometheus.Metric("network.client_grpc_sent.bytes"), | ||
"etcd_network_client_grpc_received_bytes_total": prometheus.Metric("network.client_grpc_received.bytes"), | ||
}, | ||
ExtraFields: map[string]string{"apiVersion": "3"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use camel case for field names.
ExtraFields: map[string]string{"apiVersion": "3"}, | |
ExtraFields: map[string]string{"api_version": "3"}, |
r.Event(mb.Event{ | ||
MetricSetFields: event, | ||
Namespace: mapping.Namespace, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I think this should be moved to its own PR, with a note in the developers changelog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on having a separate PR. I think @sayden will also be happy to see this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It LGTM, just one more thing I have just thought about, could you add this metricset to the list of ones tested here?
} | ||
|
||
func TestData(t *testing.T) { | ||
compose.EnsureUp(t, "etcd") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we already have the new test setup for this, we don't need this method I think.
Add etcd metrics endpoint for Etcd V3 as a new metricset
Fixes #9093