Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: ensure stats are reported on idle replicas #30346

Merged

Conversation

teskje
Copy link
Contributor

@teskje teskje commented Nov 5, 2024

When a replica becomes idle (i.e. all its dataflows advance to the empty frontier), there might remain pending storage statistics that haven't been reported yet because the reporting interval hasn't been reached. When the worker thread then gets parked for an indefinite amount of time, those statistics also don't get reported for an indefinite amount of time.

This PR ensures that pending storage statistics always get reported by imposing a limit on the time a worker thread is allowed to park, ensuring it wakes up again when the stats collection interval has elapsed.

Relevant Slack thread.

Motivation

  • This PR adds a known-desirable feature.

Part of https://github.com/MaterializeInc/database-issues/issues/7020

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

When a replica becomes idle (i.e. all its dataflows advance to the empty
frontier), there might remain pending storage statistics that haven't
been reported yet because the reporting interval hasn't been reached.
When the worker thread then gets parked for an indefinite amount of
time, those statistics also don't get reported for an indefinite amount
of time.

This commit ensures that pending storage statistics always get reported
by imposing a limit on the time a worker thread is allowed to park,
ensuring it wakes up again when the stats collection interval has
elapsed.
@teskje teskje force-pushed the storage-ensure-statistics-reporting branch from 3292954 to 67a3f14 Compare November 7, 2024 10:51
@teskje teskje changed the title storage: fixes for reclocking to latest storage: ensure stats are reported on idle replicas Nov 7, 2024

let mut last_stats_time: Option<Instant> = None;
// The last time we reported statistics.
let mut last_stats_time = Instant::now();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a slight change to the previous behavior. I figured it's probably not important that we collect statistics immediately after the first worker step. If that's wrong I can change it back into an Option, just makes the code a bit more noisy.

@teskje teskje marked this pull request as ready for review November 7, 2024 13:23
@teskje teskje requested a review from a team as a code owner November 7, 2024 13:23
@teskje teskje requested a review from petrosagg November 7, 2024 13:23
@teskje
Copy link
Contributor Author

teskje commented Nov 11, 2024

TFTR!

@teskje teskje merged commit c793a46 into MaterializeInc:main Nov 11, 2024
221 of 226 checks passed
@teskje teskje deleted the storage-ensure-statistics-reporting branch November 11, 2024 09:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants