-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition in Driver::runInternal #1219
Closed
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
facebook-github-bot
added
the
CLA Signed
This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
label
Mar 15, 2022
mbasmanova
changed the title
Fix race condition in Driver::runInternal
[WIP] Fix race condition in Driver::runInternal
Mar 15, 2022
mbasmanova
changed the title
[WIP] Fix race condition in Driver::runInternal
Fix race condition in Driver::runInternal
Mar 15, 2022
@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
mbasmanova
requested review from
majetideepak,
laithsakka,
oerling,
pedroerp and
spershin
March 15, 2022 23:34
ZJie1
pushed a commit
to ZJie1/velox
that referenced
this pull request
Mar 18, 2022
Summary: There was a race condition that caused ASAN heap-use-after-free in Driver::runInternal. It is possible for the Task to terminate (due to an error) while Driver is off-thread sitting in the executor's queue waiting to be scheduled to run on a thread. If this case, Task may have been terminated and operators have been destroyed just before Driver::runInternal executes. In this case `operators_[curOpIndex_]->stats().addRuntimeStat ("queuedWallNanos", ...);` call would try to access freed memory. The fix is to (1) streamline Driver by removing task_ member variable and use DriverCtx::task instead; (2) move logic for updating stats after the call to task->enter(). If task has terminated, enter() will return StopReason::kTerminate and no Driver will abort any further processing. ```AddressSanitizer: heap-use-after-free facebookincubator#6 0x7f01c29ae9b9 in facebook::velox::exec::OperatorStats::addRuntimeStat(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) velox/exec/Operator.h:144 facebookincubator#7 0x7f01c29afaf7 in facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>*) velox/exec/Driver.cpp:249 facebookincubator#8 0x7f01c29b4174 in facebook::velox::exec::Driver::run(std::shared_ptr<facebook::velox::exec::Driver>) velox/exec/Driver.cpp:415 facebookincubator#9 0x7f01c29cce5f in facebook::velox::exec::Driver::enqueue(std::shared_ptr<facebook::velox::exec::Driver>)::$_0::operator()() const velox/exec/Driver.cpp:161 ``` Pull Request resolved: facebookincubator#1219 Test Plan: $ buck test mode/dev-asan //presto_cpp/main/tests:presto_main_test -- --exact 'presto_cpp/main/tests:presto_main_test - TaskManagerTest.outOfQueryUserMemory' --run-disabled --stress-runs 1000 ``` Summary Pass: 1000 ListingSuccess: 1 Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1125900137408748 ``` Imported from GitHub, without a `Test Plan:` line. Reviewed By: spershin, oerling Differential Revision: D34904220 Pulled By: mbasmanova fbshipit-source-id: d4b88cf9e3ec2884c77371c37c571297a59728b6
This pull request has been reverted by cd1f5f3. |
shiyu-bytedance
pushed a commit
to shiyu-bytedance/velox-1
that referenced
this pull request
Aug 18, 2022
Summary: There was a race condition that caused ASAN heap-use-after-free in Driver::runInternal. It is possible for the Task to terminate (due to an error) while Driver is off-thread sitting in the executor's queue waiting to be scheduled to run on a thread. If this case, Task may have been terminated and operators have been destroyed just before Driver::runInternal executes. In this case `operators_[curOpIndex_]->stats().addRuntimeStat ("queuedWallNanos", ...);` call would try to access freed memory. The fix is to (1) streamline Driver by removing task_ member variable and use DriverCtx::task instead; (2) move logic for updating stats after the call to task->enter(). If task has terminated, enter() will return StopReason::kTerminate and no Driver will abort any further processing. ```AddressSanitizer: heap-use-after-free facebookincubator#6 0x7f01c29ae9b9 in facebook::velox::exec::OperatorStats::addRuntimeStat(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) velox/exec/Operator.h:144 facebookincubator#7 0x7f01c29afaf7 in facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>*) velox/exec/Driver.cpp:249 facebookincubator#8 0x7f01c29b4174 in facebook::velox::exec::Driver::run(std::shared_ptr<facebook::velox::exec::Driver>) velox/exec/Driver.cpp:415 facebookincubator#9 0x7f01c29cce5f in facebook::velox::exec::Driver::enqueue(std::shared_ptr<facebook::velox::exec::Driver>)::$_0::operator()() const velox/exec/Driver.cpp:161 ``` Pull Request resolved: facebookincubator#1219 Test Plan: $ buck test mode/dev-asan //presto_cpp/main/tests:presto_main_test -- --exact 'presto_cpp/main/tests:presto_main_test - TaskManagerTest.outOfQueryUserMemory' --run-disabled --stress-runs 1000 ``` Summary Pass: 1000 ListingSuccess: 1 Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1125900137408748 ``` Imported from GitHub, without a `Test Plan:` line. Reviewed By: spershin, oerling Differential Revision: D34904220 Pulled By: mbasmanova fbshipit-source-id: d4b88cf9e3ec2884c77371c37c571297a59728b6
marin-ma
pushed a commit
to marin-ma/velox-oap
that referenced
this pull request
Dec 15, 2023
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
CLA Signed
This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Reverted
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a race condition that caused ASAN heap-use-after-free in
Driver::runInternal. It is possible for the Task to terminate (due to an error)
while Driver is off-thread sitting in the executor's queue waiting to be
scheduled to run on a thread. If this case, Task may have been terminated and
operators have been destroyed just before Driver::runInternal executes. In this
case
operators_[curOpIndex_]->stats().addRuntimeStat ("queuedWallNanos", ...);
call would try to access freed memory.The fix is to (1) streamline Driver by removing task_ member variable and use
DriverCtx::task instead; (2) move logic for updating stats after the call to
task->enter(). If task has terminated, enter() will return
StopReason::kTerminate and no Driver will abort any further processing.