Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add time measuring metrics for file ingestion in PerfContext #13219

Closed
wants to merge 3 commits into from

Conversation

jowlyzhang
Copy link
Contributor

As titled. And also added some documentation for an approach to name perf context metrics that can help identify the starting PerfLevel that enables collecting it.

Test Plan
Unit test

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jowlyzhang jowlyzhang requested a review from anand1976 December 17, 2024 23:00
// These metrics usually have this pattern: "_[wait|delay]_*_[time|nanos]".
kEnableWait = 3,
// Starts enabling metrics that measure the end to end time of an operation.
// These metrics' names have keywords "time" or "nanos". Check other time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "time" or time unit e.g, nanos, micros.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I specifically used "nanos" because that's the unit used for for all the time measuring perf context:

uint64_t time_now() {
if (!use_cpu_time_) {
return clock_->NowNanos();
} else {
return clock_->CPUNanos();
}
}

opts.allow_blocking_flush = true;
ASSERT_OK(db_->IngestExternalFile({file1}, opts));
ASSERT_EQ(perf_context.file_ingestion_nanos, 0);
ASSERT_EQ(perf_context.file_ingestion_blocking_live_writes_nanos, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed your test ensure blocking live writes will happen so this proves your point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually haven't test it, let me add that part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added now, thanks!

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@jowlyzhang merged this pull request in c8bc2b6.

@jowlyzhang jowlyzhang deleted the ingestion_time branch December 20, 2024 17:43
facebook-github-bot pushed a commit that referenced this pull request Dec 20, 2024
Summary:
This test assertion was added in #13219. It checks the concurrent write thread's wait time is not longer than the file ingestion thread's write blocking time since the former entered the write thread after the blocking already started in the test. This test runs into flakiness like this:
```db/external_sst_file_basic_test.cc:300: Failure
Expected: (perf_context.file_ingestion_blocking_live_writes_nanos) > (write_thread_perf_context->write_thread_wait_nanos), actual: 166210 vs 279681
```
 In reality the write thread is yielding starting with a 1 micro period and then every 100 micros: https://github.com/facebook/rocksdb/blob/54b614de5bd3e26d332b85557d44bde86b2a2e87/db/write_thread.cc#L68-L70

So this 113 micros errors is within this margin
This fix the test with just removing this assertion. The other assertion `ASSERT_GT(write_thread_perf_context->write_thread_wait_nanos, 0)` should be sufficient for the test's purpose.

Pull Request resolved: #13241

Reviewed By: hx235

Differential Revision: D67526804

Pulled By: jowlyzhang

fbshipit-source-id: 23ee9771247e4c13444054a1e86ad9293902cb56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants