metrics: export go runtime metrics #87823
Labels
A-observability-inf
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-observability
Comments
irfansharif
added
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-observability-inf
labels
Sep 12, 2022
irfansharif
added a commit
to irfansharif/cockroach
that referenced
this issue
Sep 15, 2022
And record data into CRDB's internal time-series database. Informs cockroachdb#82743 and cockroachdb#87823.

To export scheduling latencies to prometheus, we choose an exponential bucketing scheme with base multiple of 1.1, and the output range bounded to [50us, 100ms). This makes for ~70 buckets. It's worth noting that the default histogram buckets used in Go are not fit for our purposes. If we care about improving it, we could consider patching the runtime.

bucket[  0] width=0s boundary=[-Inf, 0s)
bucket[  1] width=1ns boundary=[0s, 1ns)
bucket[  2] width=1ns boundary=[1ns, 2ns)
bucket[  3] width=1ns boundary=[2ns, 3ns)
bucket[  4] width=1ns boundary=[3ns, 4ns)
...
bucket[270] width=16.384µs boundary=[737.28µs, 753.664µs)
bucket[271] width=16.384µs boundary=[753.664µs, 770.048µs)
bucket[272] width=278.528µs boundary=[770.048µs, 1.048576ms)
bucket[273] width=32.768µs boundary=[1.048576ms, 1.081344ms)
bucket[274] width=32.768µs boundary=[1.081344ms, 1.114112ms)
...
bucket[717] width=1h13m18.046511104s boundary=[53h45m14.046488576s, 54h58m32.09299968s)
bucket[718] width=1h13m18.046511104s boundary=[54h58m32.09299968s, 56h11m50.139510784s)
bucket[719] width=1h13m18.046511104s boundary=[56h11m50.139510784s, 57h25m8.186021888s)
bucket[720] width=57h25m8.186021888s boundary=[57h25m8.186021888s, +Inf)

Release note: None
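For readers unfamiliar with exponential bucketing, here is a minimal sketch of the idea in the commit message: boundaries that start at 50us and grow by a factor of 1.1 until they reach 100ms. The function name `expBuckets` and its parameters are hypothetical and are not taken from the referenced commit. The count it prints need not match the "~70" quoted above, since the real change may align its boundaries with the runtime's own buckets (an assumption; the commit message does not say).

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// expBuckets returns boundaries that start at lo and grow geometrically by
// factor, stopping before hi, so the covered range is [lo, hi).
// Hypothetical helper for illustration only.
func expBuckets(lo, hi time.Duration, factor float64) []time.Duration {
	var bounds []time.Duration
	for b := float64(lo); b < float64(hi); b *= factor {
		bounds = append(bounds, time.Duration(math.Round(b)))
	}
	return bounds
}

func main() {
	bounds := expBuckets(50*time.Microsecond, 100*time.Millisecond, 1.1)
	fmt.Printf("%d boundaries\n", len(bounds))
	// Print the first few boundaries to show the geometric growth.
	for i, b := range bounds[:5] {
		fmt.Printf("bucket[%2d] boundary=%s\n", i, b)
	}
}
```

A fixed growth factor keeps the relative error per bucket roughly constant across the whole range, which is why a few dozen buckets can span more than three orders of magnitude.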
irfansharif
added a commit
to irfansharif/cockroach
that referenced
this issue
Sep 16, 2022
irfansharif
added a commit
to irfansharif/cockroach
that referenced
this issue
Sep 21, 2022
craig bot
pushed a commit
that referenced
this issue
Sep 21, 2022
86277: eventpb: add storage event types r=jbowens,sumeerbhola a=nicktrav

Add the `StoreStats` event type, a per-store event emitted to the `TELEMETRY` logging channel. This event type will be computed from the Pebble metrics for each store.

Emit a `StoreStats` event periodically, by default, once per hour, per store.

Touches #85589.

Release note: None.

Release justification: low risk, high benefit changes to existing functionality.

87142: workload/mixed-version/schemachanger: re-enable mixed version workload r=fqazi a=fqazi

Fixes: #58489 #87477

Previously the mixed version schema changer workload was disabled because of the lack of version gates. These changes will do the following:

- Start reporting errors on this workload again.
- Disable trigrams in a mixed version state.
- Disable the insert part of the workload in a mixed version state (there is an optimizer on 22.1 that can cause some of the queries to fail)

Release justification: low risk only extends test coverage

87883: schedulerlatency: export Go scheduling latency metric r=irfansharif a=irfansharif

And record data into CRDB's internal time-series database. Informs #82743 and #87823.

To export scheduling latencies to prometheus, we choose an exponential bucketing scheme with base multiple of 1.1, and the output range bounded to [50us, 100ms). This makes for ~70 buckets. It's worth noting that the default histogram buckets used in Go are not fit for our purposes. If we care about improving it, we could consider patching the runtime.

```
bucket[  0] width=0s boundary=[-Inf, 0s)
bucket[  1] width=1ns boundary=[0s, 1ns)
bucket[  2] width=1ns boundary=[1ns, 2ns)
bucket[  3] width=1ns boundary=[2ns, 3ns)
bucket[  4] width=1ns boundary=[3ns, 4ns)
...
bucket[270] width=16.384µs boundary=[737.28µs, 753.664µs)
bucket[271] width=16.384µs boundary=[753.664µs, 770.048µs)
bucket[272] width=278.528µs boundary=[770.048µs, 1.048576ms)
bucket[273] width=32.768µs boundary=[1.048576ms, 1.081344ms)
bucket[274] width=32.768µs boundary=[1.081344ms, 1.114112ms)
...
bucket[717] width=1h13m18.046511104s boundary=[53h45m14.046488576s, 54h58m32.09299968s)
bucket[718] width=1h13m18.046511104s boundary=[54h58m32.09299968s, 56h11m50.139510784s)
bucket[719] width=1h13m18.046511104s boundary=[56h11m50.139510784s, 57h25m8.186021888s)
bucket[720] width=57h25m8.186021888s boundary=[57h25m8.186021888s, +Inf)
```

Release note: None

Release justification: observability-only PR, low-risk high-benefit; would help understand admission control out in the wild

88179: ui/cluster-ui: fix no most recent stmt for active txns r=xinhaoz a=xinhaoz

Fixes #87738

Previously, active txns could have an empty 'Most Recent Statement' column, even if their executed statement count was non-zero. This was due to the most recent query text being populated by the active stmt, which could be empty at the time of querying. This commit populates the last statement text for a txn even when it is not currently executing a query.

This commit also removes the `isFullScan` field from active txn pages, as we cannot fill this field out without all stmts in the txn.

Release note (ui change): Full scan field is removed from active txn details page.

Release note (bug fix): active txns with non-zero executed statement count now always have populated stmt text, even when no stmt is being executed.

88334: kvserver: align Raft recv/send queue sizes r=erikgrinaker a=pavelkalinnikov

Fixes #87465

Release justification: performance fix

Release note: Made sending and receiving Raft queue sizes match. Previously the receiver could unnecessarily drop messages in situations when the sending queue is bigger than the receiving one.

Co-authored-by: Nick Travers <travers@cockroachlabs.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Xin Hao Zhang <xzhang@cockroachlabs.com>
Co-authored-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
#87883 introduces a |
blathers-crl bot
pushed a commit
that referenced
this issue
Sep 21, 2022
Resolved by #118875, which introduced the framework for exporting Go runtime metrics.
Is your feature request related to a problem? Please describe.
The Go runtime exports several metrics about the Go scheduler, garbage collector, and heap that would be useful to export directly: https://pkg.go.dev/runtime/metrics#hdr-Supported_metrics. Newer versions of the Prometheus Go client (https://github.com/prometheus/client_golang) support exporting these metrics in a form Prometheus can scrape. It may take some work to munge this data into a form writable to CRDB's internal tsdb. Exporting these metrics would help better diagnose the behavior of the Go GC, scheduler, etc.
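To make the request concrete, here is a minimal sketch of reading everything the runtime exposes through the standard `runtime/metrics` package and flattening the scalar values into name/value pairs. The helper name `scalarRuntimeMetrics` is hypothetical, and the flat `map[string]float64` is only an assumed stand-in for whatever shape CRDB's tsdb actually needs. Histogram-valued metrics are skipped here because they need separate bucket handling.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// scalarRuntimeMetrics samples every metric the running Go version supports
// and returns the scalar ones (uint64 and float64 kinds) keyed by name.
// Hypothetical helper for illustration; histogram metrics are skipped.
func scalarRuntimeMetrics() map[string]float64 {
	descs := metrics.All()
	samples := make([]metrics.Sample, len(descs))
	for i, d := range descs {
		samples[i].Name = d.Name
	}
	metrics.Read(samples)

	out := make(map[string]float64, len(samples))
	for _, s := range samples {
		switch s.Value.Kind() {
		case metrics.KindUint64:
			out[s.Name] = float64(s.Value.Uint64())
		case metrics.KindFloat64:
			out[s.Name] = s.Value.Float64()
		}
	}
	return out
}

func main() {
	m := scalarRuntimeMetrics()
	fmt.Printf("%d scalar metrics\n", len(m))
	fmt.Printf("/sched/goroutines:goroutines = %v\n", m["/sched/goroutines:goroutines"])
}
```

Enumerating with `metrics.All()` rather than hard-coding names means the set of exported metrics follows the Go version the binary is built with.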
Additional context
We're exporting `/sched/latencies:seconds` as part of #82743. This issue covers everything else.
Jira issue: CRDB-19553
Epic CRDB-34227
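As an illustration of the histogram-valued case, the sketch below reads `/sched/latencies:seconds` with `runtime/metrics` and derives an upper bound for a quantile from the bucket counts. The `quantile` helper is hypothetical and is not how CRDB computes percentiles. It relies only on the documented layout of `metrics.Float64Histogram`, where `Counts[i]` covers the range from `Buckets[i]` up to `Buckets[i+1]`.

```go
package main

import (
	"fmt"
	"math"
	"runtime/metrics"
)

// quantile returns the upper boundary of the first bucket at which the
// cumulative count reaches fraction q (0 < q <= 1) of all observations.
// buckets must have one more element than counts. It returns NaN when the
// histogram is empty. Hypothetical helper for illustration only.
func quantile(counts []uint64, buckets []float64, q float64) float64 {
	var total uint64
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return math.NaN()
	}
	target := uint64(math.Ceil(q * float64(total)))
	var cum uint64
	for i, c := range counts {
		cum += c
		if cum >= target {
			return buckets[i+1]
		}
	}
	return buckets[len(buckets)-1]
}

func main() {
	s := []metrics.Sample{{Name: "/sched/latencies:seconds"}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64Histogram {
		fmt.Println("metric not supported by this Go version")
		return
	}
	h := s[0].Value.Float64Histogram()
	fmt.Printf("%d buckets, p99 <= %gs\n", len(h.Counts), quantile(h.Counts, h.Buckets, 0.99))
}
```

The result is only an upper bound: the true quantile lies somewhere inside the returned bucket, so its precision depends on the bucket widths.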