-
Notifications
You must be signed in to change notification settings - Fork 409
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bug that mpp_task is not move to cancel thread due to llvm SOO #5637
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review by filling The full list of commands accepted by this bot can be found here. Reviewer can indicate their review by submitting an approval review. |
Co-authored-by: Liqi Geng <gengliqiii@gmail.com>
// can be moved to cancel thread. Meanwhile, also save the moved wrap in a vector to guarantee that even if cancel functors | ||
// fail to move due to some other issues, it still does not destruct inside the loop | ||
cancel_functors.push_back(MPPTaskCancelFunctor(std::move(current_task), reason)); | ||
thread_manager->schedule(false, "CancelMPPTask", std::move(cancel_functors[cancel_functors.size() - 1])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, take a consideration if any exception thrown during the for loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Others LGTM.
Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>
Signed-off-by: xufei <xufeixw@mail.ustc.edu.cn>
/merge |
@windtalker: It seems you want to merge this PR, I will help you trigger all the tests: /run-all-tests You only need to trigger If you have any questions about the PR merge process, please refer to pr process. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
This pull request has been accepted and is ready to merge. Commit hash: 98fcda0
|
/run-integration-test |
Coverage for changed files
Coverage summary
full coverage report (for internal network access only) |
In response to a cherrypick label: new pull request created: #5641. |
public: | ||
MPPTaskPtr task; | ||
String reason; | ||
MPPTaskCancelFunctor(const MPPTaskCancelFunctor & other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not declare it as delete
?
@@ -97,14 +125,32 @@ void MPPTaskManager::cancelMPPQuery(UInt64 query_id, const String & reason) | |||
LOG_WARNING(log, fmt::format("Begin cancel query: {}", query_id)); | |||
FmtBuffer fmt_buf; | |||
fmt_buf.fmtAppend("Remaining task in query {} are: ", query_id); | |||
std::vector<std::function<void()>> cancel_functors; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guess we don't actually need this vector
?
// remaining in the current scope, as a workaround we add a wrap(MPPTaskCancelFunctor) here to make sure `current_task` | ||
// can be moved to cancel thread. Meanwhile, also save the moved wrap in a vector to guarantee that even if cancel functors | ||
// fail to move due to some other issues, it still does not destruct inside the loop | ||
cancel_functors.push_back(MPPTaskCancelFunctor(std::move(current_task), reason)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
depend on the behavior of std::function::move
is risky and hard to understand.
how about this?
thread_manager->schedule(false, "CancelMPPTask",
[functor = new MPPTaskCancelFunctor(std::move(current_task), reason)] {
std::unique_ptr(functor)->run(); // do not need `operator()`, just `run()`
});
IMO it's more stable than depending on some specific implementation of std::function
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, since this pr is merged, I will use another pr to refine it.
catch (...) | ||
{ | ||
thread_manager->wait(); | ||
throw; | ||
} | ||
|
||
thread_manager->wait(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually a finally:
finally {
thread_manager->wait();
}
What problem does this PR solve?
Issue Number: ref #5095, close #5638
Problem Summary:
What is changed and how it works?
After #5361, in
MPPTaskManager::cancelMPPQuery
, we try to move mpp task to the cancel thread so each mpp task can be destructed parallelly instead of be destructed one by one incancelMPPQuery
thread.However, due to SOO in llvm(llvm/llvm-project#32472), this move actaully failed, and there is a chance that mpp task is destructed inside the follow cancel loop:
tiflash/dbms/src/Flash/Mpp/MPPTaskManager.cpp
Lines 102 to 108 in 8404e65
Destruct MPP inside the cancel loop means the task will be destructed before all mpp tasks are cancelled, this may cause some dead lock issues since there are some dependency between mpp tasks(especially in the case of local tunnel).
This pr fix this issue by
Check List
Tests
run some random failpoint tests
Side effects
Documentation
Release note