Skip to content

Commit

Permalink
Fix race between flush error recovery and db destruction (#12002)
Browse files Browse the repository at this point in the history
Summary:
**Context:**
DB destruction will wait for ongoing error recovery through `EndAutoRecovery()` and join the recovery thread: https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/db_impl/db_impl.cc#L525 -> https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L250 -> https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L808-L823

However, due to a race between flush error recovery and db destruction, recovery can actually start after such wait during the db shutdown. The consequence is that the recovery thread created as part of this recovery will not be properly joined upon its destruction as part the db destruction. It then crashes the program as below.

```
std::terminate()
std::default_delete<std::thread>::operator()(std::thread*) const
std::unique_ptr<std::thread, std::default_delete<std::thread>>::~unique_ptr()
rocksdb::ErrorHandler::~ErrorHandler() (rocksdb/db/error_handler.h:31)
rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725)
rocksdb::DBImpl::~DBImpl() (rocksdb/db/db_impl/db_impl.cc:725)
rocksdb::DBTestBase::Close() (rocksdb/db/db_test_util.cc:678)
```

**Summary:**
This PR fixed it by considering whether EndAutoRecovery() has been called before creating such thread. This fix is similar to how we currently [handle](https://github.com/facebook/rocksdb/blob/519f2a41fb76e5644c63e4e588addb3b88b36580/db/error_handler.cc#L688-L694) such case inside the created recovery thread.

Pull Request resolved: facebook/rocksdb#12002

Test Plan: A new UT repro-ed the crash before this fix and and pass after.

Reviewed By: ajkr

Differential Revision: D50586191

Pulled By: hx235

fbshipit-source-id: b372f6d7a94eadee4b9283b826cc5fb81779a093
  • Loading branch information
hx235 authored and rockeet committed Sep 1, 2024
1 parent 2b1af48 commit 97a2fa1
Show file tree
Hide file tree
Showing 3 changed files with 65 additions and 0 deletions.
8 changes: 8 additions & 0 deletions db/error_handler.cc
Original file line number Diff line number Diff line change
Expand Up @@ -638,6 +638,13 @@ const Status& ErrorHandler::StartRecoverFromRetryableBGIOError(
} else if (db_options_.max_bgerror_resume_count <= 0 || recovery_in_prog_) {
// Auto resume BG error is not enabled, directly return bg_error_.
return bg_error_;
} else if (end_recovery_) {
// Can temporarily release db mutex
EventHelpers::NotifyOnErrorRecoveryEnd(db_options_.listeners, bg_error_,
Status::ShutdownInProgress(),
db_mutex_);
db_mutex_->AssertHeld();
return bg_error_;
}
if (bg_error_stats_ != nullptr) {
RecordTick(bg_error_stats_.get(), ERROR_HANDLER_AUTORESUME_COUNT);
Expand Down Expand Up @@ -811,6 +818,7 @@ void ErrorHandler::EndAutoRecovery() {
old_recovery_thread->join();
db_mutex_->Lock();
}
TEST_SYNC_POINT("PostEndAutoRecovery");
return;
}

Expand Down
56 changes: 56 additions & 0 deletions db/error_handler_fs_test.cc
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,15 @@
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.

#include <memory>

#include "db/db_test_util.h"
#include "file/sst_file_manager_impl.h"
#include "port/stack_trace.h"
#include "rocksdb/io_status.h"
#include "rocksdb/sst_file_manager.h"
#include "test_util/sync_point.h"
#include "test_util/testharness.h"
#include "util/random.h"
#include "utilities/fault_injection_env.h"
#include "utilities/fault_injection_fs.h"
Expand Down Expand Up @@ -2473,6 +2476,59 @@ TEST_F(DBErrorHandlingFSTest, FLushWritRetryableErrorAbortRecovery) {
Destroy(options);
}

TEST_F(DBErrorHandlingFSTest, FlushErrorRecoveryRaceWithDBDestruction) {
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
std::shared_ptr<ErrorHandlerFSListener> listener =
std::make_shared<ErrorHandlerFSListener>();
options.listeners.emplace_back(listener);
DestroyAndReopen(options);
ASSERT_OK(Put("k1", "val"));

// Inject retryable flush error
bool error_set = false;
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeOutputValidation", [&](void*) {
if (error_set) {
return;
}
IOStatus st = IOStatus::IOError("Injected");
st.SetRetryable(true);
fault_fs_->SetFilesystemActive(false, st);
error_set = true;
});

port::Thread db_close_thread;
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeDeleteFile", [&](void*) {
// Clear retryable flush error injection
fault_fs_->SetFilesystemActive(true);

// Coerce race between ending auto recovery in db destruction and flush
// error recovery
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"PostEndAutoRecovery", "FlushJob::WriteLevel0Table"}});
db_close_thread = port::Thread([&] { Close(); });
});
SyncPoint::GetInstance()->EnableProcessing();

Status s = Flush();
ASSERT_NOK(s);

int placeholder = 1;
listener->WaitForRecovery(placeholder);
ASSERT_TRUE(listener->new_bg_error().IsShutdownInProgress());

// Prior to the fix, the db close will crash due to the recovery thread for
// flush error is not joined by the time of destruction.
db_close_thread.join();

SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->ClearAllCallBacks();
Destroy(options);
}

TEST_F(DBErrorHandlingFSTest, FlushReadError) {
std::shared_ptr<ErrorHandlerFSListener> listener =
std::make_shared<ErrorHandlerFSListener>();
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix a race between flush error recovery and db destruction that can lead to db crashing.

0 comments on commit 97a2fa1

Please sign in to comment.