-
Notifications
You must be signed in to change notification settings - Fork 409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Storages: small refine of MergedTask. #8512
Conversation
/run-all-tests |
/run-integration-test |
bool allStreamsFinished() const | ||
{ | ||
return std::all_of(units.begin(), units.end(), [](const auto & u) { return u.isFinished(); }); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this cause performance regression when there are multiple MergedUnit
s? Previously we only compare two size_t, now we need to iterate all units to get the result.
tiflash/dbms/src/Storages/DeltaMerge/ReadThread/SegmentReader.cpp
Lines 90 to 111 in 939532a
int read_count = 0; | |
while (!merged_task->allStreamsFinished() && !isStop()) | |
{ | |
auto c = merged_task->readBlock(); | |
read_count += c; | |
if (c <= 0) | |
{ | |
break; | |
} | |
} | |
if (read_count <= 0) | |
{ | |
LOG_DEBUG(log, "All finished, merged_task=<{}> read_count={}", merged_task->toString(), read_count); | |
} | |
// If `merged_task` is pushed back to `MergedTaskPool`, it can be accessed by another read thread if it is scheduled. | |
// So do not push back to `MergedTaskPool` when exception happened since current read thread can still access to this `merged_task` object and set exception message to it. | |
// If exception happens, `merged_task` will be released by `shared_ptr` automatically. | |
if (!merged_task->allStreamsFinished()) | |
{ | |
SegmentReadTaskScheduler::instance().pushMergedTask(merged_task); | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally, size of units
is small, such as 1, 2, or 3... In such cases, no performance hurted.
In extreme cases, its size is 100.
Currently, 100 is the limitation of the total number of merged segments, not for each MergedTask.
I think we should limit the size of each MergedTask
too, such 5, because segments of a MergedTask
is read sequentially. (This will be considered in another PR.)
tiflash/dbms/src/Storages/DeltaMerge/ReadThread/SegmentReadTaskScheduler.cpp
Lines 189 to 202 in 939532a
if (MergedTask::getPassiveMergedSegments() < 100 || target->second.size() == 1) | |
{ | |
result = *target; | |
merging_segments.erase(target); | |
} | |
else | |
{ | |
result = std::pair{target->first, std::vector<uint64_t>(1, pool->pool_id)}; | |
auto itr = std::find(target->second.begin(), target->second.end(), pool->pool_id); | |
*itr = target->second | |
.back(); // SegmentReadTaskPool::scheduleSegment ensures `pool->poolId` must exists in `target`. | |
target->second.resize(target->second.size() - 1); | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, keeping finished_count
could be more effective in all scenarios. So I have added finished_count
back.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: JaySon-Huang, Lloyd-Pottiger The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
[LGTM Timeline notifier]Timeline:
|
/run-all-tests |
What problem does this PR solve?
Issue Number: ref #6834
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note