Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: export go runtime metrics #87823

Closed
irfansharif opened this issue Sep 12, 2022 · 2 comments
Closed

metrics: export go runtime metrics #87823

irfansharif opened this issue Sep 12, 2022 · 2 comments
Labels
A-observability-inf C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability

Comments

@irfansharif
Copy link
Contributor

irfansharif commented Sep 12, 2022

Is your feature request related to a problem? Please describe.

The Go runtime exports several metrics about the Go scheduler, garbage collector, and heap that would be useful to export directly: https://pkg.go.dev/runtime/metrics#hdr-Supported_metrics. Newer versions of the prometheus Go client (https://github.com/prometheus/client_golang) support exporting this into a form prometheus can scrape. It may be a bit of work to munge this data into a form writable to CRDB's internal tsdb. This would help better diagnose behavior of the Go GC, scheduler, etc.

Additional context

We're exporting /sched/latencies:seconds as part of #82743. This issue covers everything else.

Jira issue: CRDB-19553

Epic CRDB-34227

@irfansharif irfansharif added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-observability-inf labels Sep 12, 2022
irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 15, 2022
And record data into CRDB's internal time-series database. Informs
\cockroachdb#82743 and cockroachdb#87823. To export scheduling latencies to prometheus, we
choose an exponential bucketing scheme with base multiple of 1.1, and
the output range bounded to [50us, 100ms). This makes for ~70 buckets.
It's worth noting that the default histogram buckets used in Go are
not fit for our purposes. If we care about improving it, we could
consider patching the runtime.

  bucket[  0] width=0s boundary=[-Inf, 0s)
  bucket[  1] width=1ns boundary=[0s, 1ns)
  bucket[  2] width=1ns boundary=[1ns, 2ns)
  bucket[  3] width=1ns boundary=[2ns, 3ns)
  bucket[  4] width=1ns boundary=[3ns, 4ns)
  ...
  bucket[270] width=16.384µs boundary=[737.28µs, 753.664µs)
  bucket[271] width=16.384µs boundary=[753.664µs, 770.048µs)
  bucket[272] width=278.528µs boundary=[770.048µs, 1.048576ms)
  bucket[273] width=32.768µs boundary=[1.048576ms, 1.081344ms)
  bucket[274] width=32.768µs boundary=[1.081344ms, 1.114112ms)
  ...
  bucket[717] width=1h13m18.046511104s boundary=[53h45m14.046488576s, 54h58m32.09299968s)
  bucket[718] width=1h13m18.046511104s boundary=[54h58m32.09299968s, 56h11m50.139510784s)
  bucket[719] width=1h13m18.046511104s boundary=[56h11m50.139510784s, 57h25m8.186021888s)
  bucket[720] width=57h25m8.186021888s boundary=[57h25m8.186021888s, +Inf)

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 15, 2022
And record data into CRDB's internal time-series database. Informs
\cockroachdb#82743 and cockroachdb#87823. To export scheduling latencies to prometheus, we
choose an exponential bucketing scheme with base multiple of 1.1, and
the output range bounded to [50us, 100ms). This makes for ~70 buckets.
It's worth noting that the default histogram buckets used in Go are
not fit for our purposes. If we care about improving it, we could
consider patching the runtime.

  bucket[  0] width=0s boundary=[-Inf, 0s)
  bucket[  1] width=1ns boundary=[0s, 1ns)
  bucket[  2] width=1ns boundary=[1ns, 2ns)
  bucket[  3] width=1ns boundary=[2ns, 3ns)
  bucket[  4] width=1ns boundary=[3ns, 4ns)
  ...
  bucket[270] width=16.384µs boundary=[737.28µs, 753.664µs)
  bucket[271] width=16.384µs boundary=[753.664µs, 770.048µs)
  bucket[272] width=278.528µs boundary=[770.048µs, 1.048576ms)
  bucket[273] width=32.768µs boundary=[1.048576ms, 1.081344ms)
  bucket[274] width=32.768µs boundary=[1.081344ms, 1.114112ms)
  ...
  bucket[717] width=1h13m18.046511104s boundary=[53h45m14.046488576s, 54h58m32.09299968s)
  bucket[718] width=1h13m18.046511104s boundary=[54h58m32.09299968s, 56h11m50.139510784s)
  bucket[719] width=1h13m18.046511104s boundary=[56h11m50.139510784s, 57h25m8.186021888s)
  bucket[720] width=57h25m8.186021888s boundary=[57h25m8.186021888s, +Inf)

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 16, 2022
And record data into CRDB's internal time-series database. Informs
\cockroachdb#82743 and cockroachdb#87823. To export scheduling latencies to prometheus, we
choose an exponential bucketing scheme with base multiple of 1.1, and
the output range bounded to [50us, 100ms). This makes for ~70 buckets.
It's worth noting that the default histogram buckets used in Go are
not fit for our purposes. If we care about improving it, we could
consider patching the runtime.

  bucket[  0] width=0s boundary=[-Inf, 0s)
  bucket[  1] width=1ns boundary=[0s, 1ns)
  bucket[  2] width=1ns boundary=[1ns, 2ns)
  bucket[  3] width=1ns boundary=[2ns, 3ns)
  bucket[  4] width=1ns boundary=[3ns, 4ns)
  ...
  bucket[270] width=16.384µs boundary=[737.28µs, 753.664µs)
  bucket[271] width=16.384µs boundary=[753.664µs, 770.048µs)
  bucket[272] width=278.528µs boundary=[770.048µs, 1.048576ms)
  bucket[273] width=32.768µs boundary=[1.048576ms, 1.081344ms)
  bucket[274] width=32.768µs boundary=[1.081344ms, 1.114112ms)
  ...
  bucket[717] width=1h13m18.046511104s boundary=[53h45m14.046488576s, 54h58m32.09299968s)
  bucket[718] width=1h13m18.046511104s boundary=[54h58m32.09299968s, 56h11m50.139510784s)
  bucket[719] width=1h13m18.046511104s boundary=[56h11m50.139510784s, 57h25m8.186021888s)
  bucket[720] width=57h25m8.186021888s boundary=[57h25m8.186021888s, +Inf)

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this issue Sep 21, 2022
And record data into CRDB's internal time-series database. Informs
\cockroachdb#82743 and cockroachdb#87823. To export scheduling latencies to prometheus, we
choose an exponential bucketing scheme with base multiple of 1.1, and
the output range bounded to [50us, 100ms). This makes for ~70 buckets.
It's worth noting that the default histogram buckets used in Go are
not fit for our purposes. If we care about improving it, we could
consider patching the runtime.

  bucket[  0] width=0s boundary=[-Inf, 0s)
  bucket[  1] width=1ns boundary=[0s, 1ns)
  bucket[  2] width=1ns boundary=[1ns, 2ns)
  bucket[  3] width=1ns boundary=[2ns, 3ns)
  bucket[  4] width=1ns boundary=[3ns, 4ns)
  ...
  bucket[270] width=16.384µs boundary=[737.28µs, 753.664µs)
  bucket[271] width=16.384µs boundary=[753.664µs, 770.048µs)
  bucket[272] width=278.528µs boundary=[770.048µs, 1.048576ms)
  bucket[273] width=32.768µs boundary=[1.048576ms, 1.081344ms)
  bucket[274] width=32.768µs boundary=[1.081344ms, 1.114112ms)
  ...
  bucket[717] width=1h13m18.046511104s boundary=[53h45m14.046488576s, 54h58m32.09299968s)
  bucket[718] width=1h13m18.046511104s boundary=[54h58m32.09299968s, 56h11m50.139510784s)
  bucket[719] width=1h13m18.046511104s boundary=[56h11m50.139510784s, 57h25m8.186021888s)
  bucket[720] width=57h25m8.186021888s boundary=[57h25m8.186021888s, +Inf)

Release note: None
craig bot pushed a commit that referenced this issue Sep 21, 2022
86277: eventpb: add storage event types r=jbowens,sumeerbhola a=nicktrav

Add the `StoreStats` event type, a per-store event emitted to the
`TELEMETRY` logging channel. This event type will be computed from the
Pebble metrics for each store.

Emit a `StoreStats` event periodically, by default, once per hour, per
store.

Touches #85589.

Release note: None.

Release justification: low risk, high benefit changes to existing
functionality.

87142: workload/mixed-version/schemachanger: re-enable mixed version workload r=fqazi a=fqazi

Fixes: #58489 #87477

Previously the mixed version schema changer workload was disabled because of the lack of version gates. These changes will do the following:

- Start reporting errors on this workload again.
- Disable trigrams in a mixed version state.
- Disable the insert part of the workload in a mixed version state (there is an optimizer on 22.1 that can cause some of the queries to fail)

Release justification: low risk only extends test coverage

87883: schedulerlatency: export Go scheduling latency metric r=irfansharif a=irfansharif

And record data into CRDB's internal time-series database. Informs
\#82743 and #87823. To export scheduling latencies to prometheus, we
choose an exponential bucketing scheme with base multiple of 1.1, and
the output range bounded to [50us, 100ms). This makes for ~70 buckets.
It's worth noting that the default histogram buckets used in Go are
not fit for our purposes. If we care about improving it, we could
consider patching the runtime.

```
  bucket[  0] width=0s boundary=[-Inf, 0s)
  bucket[  1] width=1ns boundary=[0s, 1ns)
  bucket[  2] width=1ns boundary=[1ns, 2ns)
  bucket[  3] width=1ns boundary=[2ns, 3ns)
  bucket[  4] width=1ns boundary=[3ns, 4ns)
  ...
  bucket[270] width=16.384µs boundary=[737.28µs, 753.664µs)
  bucket[271] width=16.384µs boundary=[753.664µs, 770.048µs)
  bucket[272] width=278.528µs boundary=[770.048µs, 1.048576ms)
  bucket[273] width=32.768µs boundary=[1.048576ms, 1.081344ms)
  bucket[274] width=32.768µs boundary=[1.081344ms, 1.114112ms)
  ...
  bucket[717] width=1h13m18.046511104s boundary=[53h45m14.046488576s, 54h58m32.09299968s)
  bucket[718] width=1h13m18.046511104s boundary=[54h58m32.09299968s, 56h11m50.139510784s)
  bucket[719] width=1h13m18.046511104s boundary=[56h11m50.139510784s, 57h25m8.186021888s)
  bucket[720] width=57h25m8.186021888s boundary=[57h25m8.186021888s, +Inf)
```

Release note: None
Release justification: observability-only PR, low-risk high-benefit; would help understand admission control out in the wild

88179: ui/cluster-ui: fix no most recent stmt for active txns r=xinhaoz a=xinhaoz

Fixes #87738

Previously, active txns could have an empty 'Most Recent Statement' column, even if their executed statement count was non-zero. This was due to the most recent query text being populated by the active stmt, which could be empty at the time of querying. This commit populates the last statement text for a txn even when it is not currently executing a query.

This commit also removes the `isFullScan` field from active txn pages, as we cannot fill this field out without all stmts in the txn.

Release note (ui change): Full scan field is removed from active txn details page.

Release note (bug fix): active txns with non-zero
executed statement count now always have populated stmt text, even when no stmt is being executed.

88334: kvserver: align Raft recv/send queue sizes r=erikgrinaker a=pavelkalinnikov

Fixes #87465

Release justification: performance fix
Release note: Made sending and receiving Raft queue sizes match. Previously the receiver could unnecessarily drop messages in situations when the sending queue is bigger than the receiving one.

Co-authored-by: Nick Travers <travers@cockroachlabs.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Xin Hao Zhang <xzhang@cockroachlabs.com>
Co-authored-by: Pavel Kalinnikov <pavel@cockroachlabs.com>
@irfansharif
Copy link
Contributor Author

#87883 introduces a runtimeHistogram type that's a bit improperly placed in the schedulerlatency package. It's used to export scheduling latencies but can be used as is for other histograms from the runtime. I'll leave that extraction to the obs-inf folks.

blathers-crl bot pushed a commit that referenced this issue Sep 21, 2022
And record data into CRDB's internal time-series database. Informs
\#82743 and #87823. To export scheduling latencies to prometheus, we
choose an exponential bucketing scheme with base multiple of 1.1, and
the output range bounded to [50us, 100ms). This makes for ~70 buckets.
It's worth noting that the default histogram buckets used in Go are
not fit for our purposes. If we care about improving it, we could
consider patching the runtime.

  bucket[  0] width=0s boundary=[-Inf, 0s)
  bucket[  1] width=1ns boundary=[0s, 1ns)
  bucket[  2] width=1ns boundary=[1ns, 2ns)
  bucket[  3] width=1ns boundary=[2ns, 3ns)
  bucket[  4] width=1ns boundary=[3ns, 4ns)
  ...
  bucket[270] width=16.384µs boundary=[737.28µs, 753.664µs)
  bucket[271] width=16.384µs boundary=[753.664µs, 770.048µs)
  bucket[272] width=278.528µs boundary=[770.048µs, 1.048576ms)
  bucket[273] width=32.768µs boundary=[1.048576ms, 1.081344ms)
  bucket[274] width=32.768µs boundary=[1.081344ms, 1.114112ms)
  ...
  bucket[717] width=1h13m18.046511104s boundary=[53h45m14.046488576s, 54h58m32.09299968s)
  bucket[718] width=1h13m18.046511104s boundary=[54h58m32.09299968s, 56h11m50.139510784s)
  bucket[719] width=1h13m18.046511104s boundary=[56h11m50.139510784s, 57h25m8.186021888s)
  bucket[720] width=57h25m8.186021888s boundary=[57h25m8.186021888s, +Inf)

Release note: None
@lyang24 lyang24 self-assigned this Jan 26, 2024
@nvanbenschoten
Copy link
Member

Resolved by #118875, which introduced the framework for exporting go runtime metrics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-observability-inf C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-observability
Projects
None yet
Development

No branches or pull requests

3 participants