Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition in Driver::runInternal #1219

Closed
wants to merge 1 commit into from

Conversation

mbasmanova
Copy link
Contributor

@mbasmanova mbasmanova commented Mar 15, 2022

There was a race condition that caused ASAN heap-use-after-free in
Driver::runInternal. It is possible for the Task to terminate (due to an error)
while Driver is off-thread sitting in the executor's queue waiting to be
scheduled to run on a thread. If this case, Task may have been terminated and
operators have been destroyed just before Driver::runInternal executes. In this
case operators_[curOpIndex_]->stats().addRuntimeStat ("queuedWallNanos", ...); call would try to access freed memory.

The fix is to (1) streamline Driver by removing task_ member variable and use
DriverCtx::task instead; (2) move logic for updating stats after the call to
task->enter(). If task has terminated, enter() will return
StopReason::kTerminate and no Driver will abort any further processing.

SUMMARY: AddressSanitizer: heap-use-after-free

    #6 0x7f01c29ae9b9 in facebook::velox::exec::OperatorStats::addRuntimeStat(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) velox/exec/Operator.h:144
    #7 0x7f01c29afaf7 in facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>*) velox/exec/Driver.cpp:249
    #8 0x7f01c29b4174 in facebook::velox::exec::Driver::run(std::shared_ptr<facebook::velox::exec::Driver>) velox/exec/Driver.cpp:415
    #9 0x7f01c29cce5f in facebook::velox::exec::Driver::enqueue(std::shared_ptr<facebook::velox::exec::Driver>)::$_0::operator()() const velox/exec/Driver.cpp:161

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 15, 2022
@mbasmanova mbasmanova changed the title Fix race condition in Driver::runInternal [WIP] Fix race condition in Driver::runInternal Mar 15, 2022
@mbasmanova mbasmanova changed the title [WIP] Fix race condition in Driver::runInternal Fix race condition in Driver::runInternal Mar 15, 2022
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ZJie1 pushed a commit to ZJie1/velox that referenced this pull request Mar 18, 2022
Summary:
There was a race condition that caused ASAN heap-use-after-free in
Driver::runInternal. It is possible for the Task to terminate (due to an error)
while Driver is off-thread sitting in the executor's queue waiting to be
scheduled to run on a thread. If this case, Task may have been terminated and
operators have been destroyed just before Driver::runInternal executes. In this
case `operators_[curOpIndex_]->stats().addRuntimeStat
("queuedWallNanos", ...);` call would try to access freed memory.

The fix is to (1) streamline Driver by removing task_ member variable and use
DriverCtx::task instead; (2) move logic for updating stats after the call to
task->enter(). If task has terminated, enter() will return
StopReason::kTerminate and no Driver will abort any further processing.

```AddressSanitizer: heap-use-after-free

    facebookincubator#6 0x7f01c29ae9b9 in facebook::velox::exec::OperatorStats::addRuntimeStat(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) velox/exec/Operator.h:144
    facebookincubator#7 0x7f01c29afaf7 in facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>*) velox/exec/Driver.cpp:249
    facebookincubator#8 0x7f01c29b4174 in facebook::velox::exec::Driver::run(std::shared_ptr<facebook::velox::exec::Driver>) velox/exec/Driver.cpp:415
    facebookincubator#9 0x7f01c29cce5f in facebook::velox::exec::Driver::enqueue(std::shared_ptr<facebook::velox::exec::Driver>)::$_0::operator()() const velox/exec/Driver.cpp:161
```

Pull Request resolved: facebookincubator#1219

Test Plan:
$ buck test mode/dev-asan //presto_cpp/main/tests:presto_main_test -- --exact 'presto_cpp/main/tests:presto_main_test - TaskManagerTest.outOfQueryUserMemory' --run-disabled --stress-runs 1000

```
Summary
  Pass: 1000
  ListingSuccess: 1
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1125900137408748
```

Imported from GitHub, without a `Test Plan:` line.

Reviewed By: spershin, oerling

Differential Revision: D34904220

Pulled By: mbasmanova

fbshipit-source-id: d4b88cf9e3ec2884c77371c37c571297a59728b6
@facebook-github-bot
Copy link
Contributor

This pull request has been reverted by cd1f5f3.

shiyu-bytedance pushed a commit to shiyu-bytedance/velox-1 that referenced this pull request Aug 18, 2022
Summary:
There was a race condition that caused ASAN heap-use-after-free in
Driver::runInternal. It is possible for the Task to terminate (due to an error)
while Driver is off-thread sitting in the executor's queue waiting to be
scheduled to run on a thread. If this case, Task may have been terminated and
operators have been destroyed just before Driver::runInternal executes. In this
case `operators_[curOpIndex_]->stats().addRuntimeStat
("queuedWallNanos", ...);` call would try to access freed memory.

The fix is to (1) streamline Driver by removing task_ member variable and use
DriverCtx::task instead; (2) move logic for updating stats after the call to
task->enter(). If task has terminated, enter() will return
StopReason::kTerminate and no Driver will abort any further processing.

```AddressSanitizer: heap-use-after-free

    facebookincubator#6 0x7f01c29ae9b9 in facebook::velox::exec::OperatorStats::addRuntimeStat(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) velox/exec/Operator.h:144
    facebookincubator#7 0x7f01c29afaf7 in facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>*) velox/exec/Driver.cpp:249
    facebookincubator#8 0x7f01c29b4174 in facebook::velox::exec::Driver::run(std::shared_ptr<facebook::velox::exec::Driver>) velox/exec/Driver.cpp:415
    facebookincubator#9 0x7f01c29cce5f in facebook::velox::exec::Driver::enqueue(std::shared_ptr<facebook::velox::exec::Driver>)::$_0::operator()() const velox/exec/Driver.cpp:161
```

Pull Request resolved: facebookincubator#1219

Test Plan:
$ buck test mode/dev-asan //presto_cpp/main/tests:presto_main_test -- --exact 'presto_cpp/main/tests:presto_main_test - TaskManagerTest.outOfQueryUserMemory' --run-disabled --stress-runs 1000

```
Summary
  Pass: 1000
  ListingSuccess: 1
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1125900137408748
```

Imported from GitHub, without a `Test Plan:` line.

Reviewed By: spershin, oerling

Differential Revision: D34904220

Pulled By: mbasmanova

fbshipit-source-id: d4b88cf9e3ec2884c77371c37c571297a59728b6
marin-ma pushed a commit to marin-ma/velox-oap that referenced this pull request Dec 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Reverted
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants