Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: consider aggregating iterator stats #95790

Closed
jbowens opened this issue Jan 24, 2023 · 3 comments · Fixed by #99726
Closed

storage: consider aggregating iterator stats #95790

jbowens opened this issue Jan 24, 2023 · 3 comments · Fixed by #99726
Labels
A-kv-observability A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs T-storage Storage Team

Comments

@jbowens
Copy link
Collaborator

jbowens commented Jan 24, 2023

Pebble Iterators expose iterator statistics. These statistics are included in SQL traces, and can be helpful in determining where reads are spending time. We can consider aggregating these iterator statistics across all of a node's iterators, to surface time series metrics. These metrics might be helpful in diagnosing problems where slow/expensive reads are substantial enough to affect node health.

Jira issue: CRDB-23742

@jbowens jbowens added A-storage Relating to our storage engine (Pebble) on-disk storage. A-kv-observability T-storage Storage Team labels Jan 24, 2023
@blathers-crl
Copy link

blathers-crl bot commented Jan 24, 2023

Hi @jbowens, please add a C-ategory label to your issue. Check out the label system docs.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@jbowens jbowens added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jan 24, 2023
@tbg tbg added the O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs label Mar 23, 2023
@tbg
Copy link
Member

tbg commented Mar 23, 2023

In addition to time series, we should also aggregate per-replica counters (all iterator usage in CRDB is tied to a replica), which allows us to find "hot ranges" over more dimensions (like skipped tombstones).

This shouldn't be step one since it requires annotating all iterators that we create with a RangeID, which can be cumbersome.

@tbg
Copy link
Member

tbg commented Mar 24, 2023

In https://github.com/cockroachlabs/support/issues/2182#issuecomment-1482809892, we had nodes exhibit high disk read throughput, and I knew there were changefeed restarts. But it's difficult to determine whether these explained the disk read bandwidth or not.

This is related to both this issue and #65414. If operations that can consume significant amounts of disk bandwidth had their own metrics that would feed off the stats of the iterators they use, we'd be in a much better position to "explain" observed disk throughput.

Probably capturing the entire IteratorStats for each operation is overkill, but some basic numbers like bytes read from disk make sense. "Operations" I'm thinking about here are "evaluating request of type X" or "rangefeed catch-up scan" or "snapshot streaming" or "consistency check" and so on.

jbowens added a commit to jbowens/cockroach that referenced this issue Mar 28, 2023
Aggregate the iterator stats across all of an engine's iterators. Expose seven
new timeseries metrics for visibility into the behavior of storage engine iterators:

  - storage.iterator.block-load.bytes
  - storage.iterator.block-load.cached-bytes
  - storage.iterator.block-load.read-duration
  - storage.iterator.external.seeks
  - storage.iterator.external.steps
  - storage.iterator.internal.seeks
  - storage.iterator.internal.steps

Close cockroachdb#95790.
Epic: None
Release note (ops change): Introduces seven new timeseries metrics for better
visibility into the behavior of storage engine iterators and their internals.
jbowens added a commit to jbowens/cockroach that referenced this issue Mar 28, 2023
Aggregate the iterator stats across all of an engine's iterators. Expose seven
new timeseries metrics for visibility into the behavior of storage engine iterators:

  - storage.iterator.block-load.bytes
  - storage.iterator.block-load.cached-bytes
  - storage.iterator.block-load.read-duration
  - storage.iterator.external.seeks
  - storage.iterator.external.steps
  - storage.iterator.internal.seeks
  - storage.iterator.internal.steps

Close cockroachdb#95790.
Epic: None
Release note (ops change): Introduces seven new timeseries metrics for better
visibility into the behavior of storage engine iterators and their internals.
jbowens added a commit to jbowens/cockroach that referenced this issue Mar 28, 2023
Aggregate the iterator stats across all of an engine's iterators. Expose seven
new timeseries metrics for visibility into the behavior of storage engine iterators:

  - storage.iterator.block-load.bytes
  - storage.iterator.block-load.cached-bytes
  - storage.iterator.block-load.read-duration
  - storage.iterator.external.seeks
  - storage.iterator.external.steps
  - storage.iterator.internal.seeks
  - storage.iterator.internal.steps

Close cockroachdb#95790.
Epic: None
Release note (ops change): Introduces seven new timeseries metrics for better
visibility into the behavior of storage engine iterators and their internals.
jbowens added a commit to jbowens/cockroach that referenced this issue Mar 29, 2023
Aggregate the iterator stats across all of an engine's iterators. Expose seven
new timeseries metrics for visibility into the behavior of storage engine iterators:

  - storage.iterator.block-load.bytes
  - storage.iterator.block-load.cached-bytes
  - storage.iterator.block-load.read-duration
  - storage.iterator.external.seeks
  - storage.iterator.external.steps
  - storage.iterator.internal.seeks
  - storage.iterator.internal.steps

Close cockroachdb#95790.
Epic: None
Release note (ops change): Introduces seven new timeseries metrics for better
visibility into the behavior of storage engine iterators and their internals.
jbowens added a commit to jbowens/cockroach that referenced this issue Mar 30, 2023
Aggregate the iterator stats across all of an engine's iterators. Expose seven
new timeseries metrics for visibility into the behavior of storage engine iterators:

  - storage.iterator.block-load.bytes
  - storage.iterator.block-load.cached-bytes
  - storage.iterator.block-load.read-duration
  - storage.iterator.external.seeks
  - storage.iterator.external.steps
  - storage.iterator.internal.seeks
  - storage.iterator.internal.steps

Close cockroachdb#95790.
Epic: None
Release note (ops change): Introduces seven new timeseries metrics for better
visibility into the behavior of storage engine iterators and their internals.
craig bot pushed a commit that referenced this issue Apr 3, 2023
99726: storage: aggregate iterator stats r=jbowens a=jbowens

Aggregate the iterator stats across all of an engine's iterators. Expose seven
new timeseries metrics for visibility into the behavior of storage engine iterators:

  - storage.iterator.block-load.bytes
  - storage.iterator.block-load.cached-bytes
  - storage.iterator.block-load.read-duration
  - storage.iterator.external.seeks
  - storage.iterator.external.steps
  - storage.iterator.internal.seeks
  - storage.iterator.internal.steps

Close #95790.
Epic: None
Release note (ops change): Introduces seven new timeseries metrics for better
visibility into the behavior of storage engine iterators and their internals.

Co-authored-by: Jackson Owens <jackson@cockroachlabs.com>
@craig craig bot closed this as completed in e1254a8 Apr 3, 2023
blathers-crl bot pushed a commit that referenced this issue Apr 3, 2023
Aggregate the iterator stats across all of an engine's iterators. Expose seven
new timeseries metrics for visibility into the behavior of storage engine iterators:

  - storage.iterator.block-load.bytes
  - storage.iterator.block-load.cached-bytes
  - storage.iterator.block-load.read-duration
  - storage.iterator.external.seeks
  - storage.iterator.external.steps
  - storage.iterator.internal.seeks
  - storage.iterator.internal.steps

Close #95790.
Epic: None
Release note (ops change): Introduces seven new timeseries metrics for better
visibility into the behavior of storage engine iterators and their internals.
@jbowens jbowens moved this to Done in [Deprecated] Storage Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-observability A-storage Relating to our storage engine (Pebble) on-disk storage. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs T-storage Storage Team
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants