Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Driver loop #9

Closed

Conversation

mbasmanova
Copy link
Contributor

No description provided.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 9, 2021
@facebook-github-bot
Copy link
Contributor

@mbasmanova has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mbasmanova
Copy link
Contributor Author

This is a wrong fix. The right fix is #10

@mbasmanova mbasmanova closed this Aug 9, 2021
winningsix pushed a commit to winningsix/velox-1 that referenced this pull request Sep 27, 2021
* Rename to reflect some changes in Doc

* Minor changes

* Add OmnisciDB plan translator

* Add to CMakeList file
winningsix pushed a commit to winningsix/velox-1 that referenced this pull request Sep 27, 2021
* Rename to reflect some changes in Doc

* Minor changes

* Add OmnisciDB plan translator

* Add to CMakeList file
@pedroerp pedroerp mentioned this pull request Nov 8, 2021
facebook-github-bot pushed a commit that referenced this pull request Mar 16, 2022
Summary:
There was a race condition that caused ASAN heap-use-after-free in
Driver::runInternal. It is possible for the Task to terminate (due to an error)
while Driver is off-thread sitting in the executor's queue waiting to be
scheduled to run on a thread. If this case, Task may have been terminated and
operators have been destroyed just before Driver::runInternal executes. In this
case `operators_[curOpIndex_]->stats().addRuntimeStat
("queuedWallNanos", ...);` call would try to access freed memory.

The fix is to (1) streamline Driver by removing task_ member variable and use
DriverCtx::task instead; (2) move logic for updating stats after the call to
task->enter(). If task has terminated, enter() will return
StopReason::kTerminate and no Driver will abort any further processing.

```AddressSanitizer: heap-use-after-free

    #6 0x7f01c29ae9b9 in facebook::velox::exec::OperatorStats::addRuntimeStat(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) velox/exec/Operator.h:144
    #7 0x7f01c29afaf7 in facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>*) velox/exec/Driver.cpp:249
    #8 0x7f01c29b4174 in facebook::velox::exec::Driver::run(std::shared_ptr<facebook::velox::exec::Driver>) velox/exec/Driver.cpp:415
    #9 0x7f01c29cce5f in facebook::velox::exec::Driver::enqueue(std::shared_ptr<facebook::velox::exec::Driver>)::$_0::operator()() const velox/exec/Driver.cpp:161
```

Pull Request resolved: #1219

Test Plan:
$ buck test mode/dev-asan //presto_cpp/main/tests:presto_main_test -- --exact 'presto_cpp/main/tests:presto_main_test - TaskManagerTest.outOfQueryUserMemory' --run-disabled --stress-runs 1000

```
Summary
  Pass: 1000
  ListingSuccess: 1
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1125900137408748
```

Imported from GitHub, without a `Test Plan:` line.

Reviewed By: spershin, oerling

Differential Revision: D34904220

Pulled By: mbasmanova

fbshipit-source-id: d4b88cf9e3ec2884c77371c37c571297a59728b6
ZJie1 pushed a commit to ZJie1/velox that referenced this pull request Mar 18, 2022
Summary:
There was a race condition that caused ASAN heap-use-after-free in
Driver::runInternal. It is possible for the Task to terminate (due to an error)
while Driver is off-thread sitting in the executor's queue waiting to be
scheduled to run on a thread. If this case, Task may have been terminated and
operators have been destroyed just before Driver::runInternal executes. In this
case `operators_[curOpIndex_]->stats().addRuntimeStat
("queuedWallNanos", ...);` call would try to access freed memory.

The fix is to (1) streamline Driver by removing task_ member variable and use
DriverCtx::task instead; (2) move logic for updating stats after the call to
task->enter(). If task has terminated, enter() will return
StopReason::kTerminate and no Driver will abort any further processing.

```AddressSanitizer: heap-use-after-free

    facebookincubator#6 0x7f01c29ae9b9 in facebook::velox::exec::OperatorStats::addRuntimeStat(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) velox/exec/Operator.h:144
    facebookincubator#7 0x7f01c29afaf7 in facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>*) velox/exec/Driver.cpp:249
    facebookincubator#8 0x7f01c29b4174 in facebook::velox::exec::Driver::run(std::shared_ptr<facebook::velox::exec::Driver>) velox/exec/Driver.cpp:415
    facebookincubator#9 0x7f01c29cce5f in facebook::velox::exec::Driver::enqueue(std::shared_ptr<facebook::velox::exec::Driver>)::$_0::operator()() const velox/exec/Driver.cpp:161
```

Pull Request resolved: facebookincubator#1219

Test Plan:
$ buck test mode/dev-asan //presto_cpp/main/tests:presto_main_test -- --exact 'presto_cpp/main/tests:presto_main_test - TaskManagerTest.outOfQueryUserMemory' --run-disabled --stress-runs 1000

```
Summary
  Pass: 1000
  ListingSuccess: 1
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1125900137408748
```

Imported from GitHub, without a `Test Plan:` line.

Reviewed By: spershin, oerling

Differential Revision: D34904220

Pulled By: mbasmanova

fbshipit-source-id: d4b88cf9e3ec2884c77371c37c571297a59728b6
rui-mo pushed a commit to rui-mo/velox that referenced this pull request May 12, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Jun 24, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Jun 30, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Jul 20, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Aug 2, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Aug 12, 2022
shiyu-bytedance pushed a commit to shiyu-bytedance/velox-1 that referenced this pull request Aug 18, 2022
Summary:
There was a race condition that caused ASAN heap-use-after-free in
Driver::runInternal. It is possible for the Task to terminate (due to an error)
while Driver is off-thread sitting in the executor's queue waiting to be
scheduled to run on a thread. If this case, Task may have been terminated and
operators have been destroyed just before Driver::runInternal executes. In this
case `operators_[curOpIndex_]->stats().addRuntimeStat
("queuedWallNanos", ...);` call would try to access freed memory.

The fix is to (1) streamline Driver by removing task_ member variable and use
DriverCtx::task instead; (2) move logic for updating stats after the call to
task->enter(). If task has terminated, enter() will return
StopReason::kTerminate and no Driver will abort any further processing.

```AddressSanitizer: heap-use-after-free

    facebookincubator#6 0x7f01c29ae9b9 in facebook::velox::exec::OperatorStats::addRuntimeStat(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, long) velox/exec/Operator.h:144
    facebookincubator#7 0x7f01c29afaf7 in facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>*) velox/exec/Driver.cpp:249
    facebookincubator#8 0x7f01c29b4174 in facebook::velox::exec::Driver::run(std::shared_ptr<facebook::velox::exec::Driver>) velox/exec/Driver.cpp:415
    facebookincubator#9 0x7f01c29cce5f in facebook::velox::exec::Driver::enqueue(std::shared_ptr<facebook::velox::exec::Driver>)::$_0::operator()() const velox/exec/Driver.cpp:161
```

Pull Request resolved: facebookincubator#1219

Test Plan:
$ buck test mode/dev-asan //presto_cpp/main/tests:presto_main_test -- --exact 'presto_cpp/main/tests:presto_main_test - TaskManagerTest.outOfQueryUserMemory' --run-disabled --stress-runs 1000

```
Summary
  Pass: 1000
  ListingSuccess: 1
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/1125900137408748
```

Imported from GitHub, without a `Test Plan:` line.

Reviewed By: spershin, oerling

Differential Revision: D34904220

Pulled By: mbasmanova

fbshipit-source-id: d4b88cf9e3ec2884c77371c37c571297a59728b6
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Aug 22, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Sep 7, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Sep 26, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Oct 26, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Nov 8, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Nov 8, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Nov 22, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Dec 15, 2022
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Jan 6, 2023
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Jan 12, 2023
PHILO-HE pushed a commit to PHILO-HE/velox that referenced this pull request Feb 3, 2023
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Feb 24, 2023
liujiayi771 pushed a commit to liujiayi771/velox that referenced this pull request Mar 3, 2023
liujiayi771 pushed a commit to liujiayi771/velox that referenced this pull request Mar 9, 2023
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Mar 22, 2023
liujiayi771 pushed a commit to liujiayi771/velox that referenced this pull request Mar 22, 2023
liujiayi771 pushed a commit to liujiayi771/velox that referenced this pull request Mar 23, 2023
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Mar 24, 2023
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Mar 27, 2023
liujiayi771 pushed a commit to liujiayi771/velox that referenced this pull request Apr 1, 2023
rui-mo pushed a commit to rui-mo/velox that referenced this pull request Apr 23, 2023
facebook-github-bot pushed a commit that referenced this pull request Aug 23, 2024
Summary:
The default algorithm used is MD5. However, MD5 is not supported with
fips and can cause a SIGSEGV. Set CRC32 instead which is a standard for
checksum computation and is not restricted by fips.
crc32 is also faster than md5.

Internally at IBM, we hit the following SIGSEGV

```
0x0000000000000000 in ?? ()
Missing separate debuginfos, use: dnf debuginfo-install openssl-fips-provider-3.0.7-2.el9.x86_64 xz-libs-5.2.5-8.el9_0.x86_64
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000000004e5f89b in Aws::Utils::Crypto::MD5OpenSSLImpl::Calculate(std::istream&) ()
#2  0x0000000004efd298 in Aws::Utils::Crypto::MD5::Calculate(std::istream&) ()
#3  0x0000000004ef71b9 in Aws::Utils::HashingUtils::CalculateMD5(std::iostream&) ()
#4  0x0000000004e8ebe8 in Aws::Client::AWSClient::AddChecksumToRequest(std::shared_ptr<Aws::Http::HttpRequest> const&, Aws::AmazonWebServiceRequest const&) const ()
#5  0x0000000004e8ed15 in Aws::Client::AWSClient::BuildHttpRequest(Aws::AmazonWebServiceRequest const&, std::shared_ptr<Aws::Http::HttpRequest> const&) const ()
#6  0x0000000004e977f9 in Aws::Client::AWSClient::AttemptOneRequest(std::shared_ptr<Aws::Http::HttpRequest> const&, Aws::AmazonWebServiceRequest const&, char const*, char const*, char const*) const ()
#7  0x0000000004e9e1c0 in Aws::Client::AWSClient::AttemptExhaustively(Aws::Http::URI const&, Aws::AmazonWebServiceRequest const&, Aws::Http::HttpMethod, char const*, char const*, char const*) const ()
#8  0x0000000004ea15e8 in Aws::Client::AWSXMLClient::MakeRequest(Aws::Http::URI const&, Aws::AmazonWebServiceRequest const&, Aws::Http::HttpMethod, char const*, char const*, char const*) const ()
#9  0x0000000004ea1f70 in Aws::Client::AWSXMLClient::MakeRequest(Aws::AmazonWebServiceRequest const&, Aws::Endpoint::AWSEndpoint const&, Aws::Http::HttpMethod, char const*, char const*, char const*) const ()
#10 0x0000000004de0933 in Aws::S3::S3Client::UploadPart(Aws::S3::Model::UploadPartRequest const&) const::{lambda()https://github.com/facebookincubator/velox/issues/1}::operator()() const ()
#11 0x0000000004de0b8c in std::_Function_handler<Aws::Utils::Outcome<Aws::S3::Model::UploadPartResult, Aws::S3::S3Error> (), Aws::S3::S3Client::UploadPart(Aws::S3::Model::UploadPartRequest const&) const::{lambda()https://github.com/facebookincubator/velox/issues/1}>::_M_invoke(std::_Any_data const&) ()
#12 0x0000000004e19317 in Aws::Utils::Outcome<Aws::S3::Model::UploadPartResult, Aws::S3::S3Error> smithy::components::tracing::TracingUtils::MakeCallWithTiming<Aws::Utils::Outcome<Aws::S3::Model::UploadPartResult, Aws::S3::S3Error> >(std::function<Aws::Utils::Outcome<Aws::S3::Model::UploadPartResult, Aws::S3::S3Error> ()>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, smithy::components::tracing::Meter const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#13 0x0000000004d7cdcf in Aws::S3::S3Client::UploadPart(Aws::S3::Model::UploadPartRequest const&) const ()
#14 0x0000000004ca4aa6 in facebook::velox::filesystems::S3WriteFile::Impl::uploadPart (this=0x7fffec2f09a0, part=..., isLast=true)
    at /root/velox/velox/connectors/hive/storage_adapters/s3fs/S3FileSystem.cpp:380
```

Pull Request resolved: #10801

Reviewed By: amitkdutta

Differential Revision: D61671574

Pulled By: kgpai

fbshipit-source-id: 34c7b777b3fde0659ef74c4fbfd93740fdfa3f7c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants