-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add time measuring metrics for file ingestion in PerfContext #13219
Conversation
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
// These metrics usually have this pattern: "_[wait|delay]_*_[time|nanos]". | ||
kEnableWait = 3, | ||
// Starts enabling metrics that measure the end to end time of an operation. | ||
// These metrics' names have keywords "time" or "nanos". Check other time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "time" or time unit e.g, nanos, micros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I specifically used "nanos" because that's the unit used for for all the time measuring perf context:
rocksdb/monitoring/perf_step_timer.h
Lines 60 to 66 in 1d919ac
uint64_t time_now() { | |
if (!use_cpu_time_) { | |
return clock_->NowNanos(); | |
} else { | |
return clock_->CPUNanos(); | |
} | |
} |
opts.allow_blocking_flush = true; | ||
ASSERT_OK(db_->IngestExternalFile({file1}, opts)); | ||
ASSERT_EQ(perf_context.file_ingestion_nanos, 0); | ||
ASSERT_EQ(perf_context.file_ingestion_blocking_live_writes_nanos, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed your test ensure blocking live writes will happen so this proves your point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually haven't test it, let me add that part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added now, thanks!
9a6da9d
to
560d254
Compare
@jowlyzhang has updated the pull request. You must reimport the pull request before landing. |
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jowlyzhang has updated the pull request. You must reimport the pull request before landing. |
@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@jowlyzhang merged this pull request in c8bc2b6. |
Summary: This test assertion was added in #13219. It checks the concurrent write thread's wait time is not longer than the file ingestion thread's write blocking time since the former entered the write thread after the blocking already started in the test. This test runs into flakiness like this: ```db/external_sst_file_basic_test.cc:300: Failure Expected: (perf_context.file_ingestion_blocking_live_writes_nanos) > (write_thread_perf_context->write_thread_wait_nanos), actual: 166210 vs 279681 ``` In reality the write thread is yielding starting with a 1 micro period and then every 100 micros: https://github.com/facebook/rocksdb/blob/54b614de5bd3e26d332b85557d44bde86b2a2e87/db/write_thread.cc#L68-L70 So this 113 micros errors is within this margin This fix the test with just removing this assertion. The other assertion `ASSERT_GT(write_thread_perf_context->write_thread_wait_nanos, 0)` should be sufficient for the test's purpose. Pull Request resolved: #13241 Reviewed By: hx235 Differential Revision: D67526804 Pulled By: jowlyzhang fbshipit-source-id: 23ee9771247e4c13444054a1e86ad9293902cb56
As titled. And also added some documentation for an approach to name perf context metrics that can help identify the starting
PerfLevel
that enables collecting it.Test Plan
Unit test