Skip to content

Commit 46e4357

Browse files
xavierdfacebook-github-bot
authored andcommitted
inodes: perform maintenance in the overlay thread
Summary: During shutdown, the overlay background thread is terminated, and the overlay closed, but EdenServer timers are still running. One of which being the manageOverlay one which performs a maintenance on the Overlay. In some cases, shutdown and this timer are racing each other, causing the timer to run after the overlay has been closed, causing a use after free and crashing EdenFS. To solve this, we can either add locks around closing the overlay and the maintenance to guarantee that they do not race, or we can move the maintenance operation to a thread that is known to be running only when the overlay is opened. This diff takes the second approach. The bug manifest itself like so: I0712 01:16:48.253296 21620 EdenServiceHandler.cpp:3394] [000002517432E1E0] initiateShutdown() took 368 µs I0712 01:16:48.253838 24700 PrjfsChannel.cpp:1185] Stopping PrjfsChannel for: C:\\cygwin\\tmp\\eden_test.stop_at_ea.m3931hyz\\mounts\\main I0712 01:16:48.258533 19188 EdenServer.cpp:1624] mount point "C:\\cygwin\\tmp\\eden_test.stop_at_ea.m3931hyz\\mounts\\main" stopped V0712 01:16:48.258814 19188 EdenMount.cpp:851] beginning shutdown for EdenMount C:\\cygwin\\tmp\\eden_test.stop_at_ea.m3931hyz\\mounts\\main V0712 01:16:48.259895 19188 EdenMount.cpp:855] shutdown complete for EdenMount C:\\cygwin\\tmp\\eden_test.stop_at_ea.m3931hyz\\mounts\\main V0712 01:16:48.287378 19188 EdenMount.cpp:861] successfully closed overlay at C:\\cygwin\\tmp\\eden_test.stop_at_ea.m3931hyz\\mounts\\main Unhandled win32 exception code=0xC0000005. Fatal error detected at: #0 00007FF707BA3D81 (7a686d9) facebook::eden::SqliteDatabase::checkpoint Z:\shipit\eden\eden\fs\sqlite\SqliteDatabase.cpp:99 #1 00007FF7072D4090 facebook::eden::EdenServer::manageOverlay Z:\shipit\eden\eden\fs\service\EdenServer.cpp:2142 #2 00007FF7074765D0 facebook::eden::PeriodicTask::timeoutExpired Z:\shipit\eden\eden\fs\service\PeriodicTask.cpp:32 #3 00007FF707500F93 folly::HHWheelTimerBase<std::chrono::duration<__int64,std::ratio<1,1000> > >::timeoutExpired Z:\shipit\folly\folly\io\async\HHWheelTimer.cpp:286 #4 00007FF70757BB44 folly::AsyncTimeout::libeventCallback Z:\shipit\folly\folly\io\async\AsyncTimeout.cpp:174 #5 00007FF708E7AD45 (645b6fc) event_priority_set #6 00007FF708E7AA3A event_priority_set #7 00007FF708E77343 event_base_loop #8 00007FF707515FC5 folly::EventBase::loopMain Z:\shipit\folly\folly\io\async\EventBase.cpp:405 #9 00007FF707515C62 folly::EventBase::loopBody Z:\shipit\folly\folly\io\async\EventBase.cpp:326 #10 00007FF7072DC5EE facebook::eden::EdenServer::performCleanup Z:\shipit\eden\eden\fs\service\EdenServer.cpp:1212 #11 00007FF707219BED facebook::eden::runEdenMain Z:\shipit\eden\eden\fs\service\EdenMain.cpp:395 #12 00007FF7071C624A main Z:\shipit\eden\eden\fs\service\oss\main.cpp:23 #13 00007FF708E87A94 __scrt_common_main_seh d:\A01\_work\12\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288 #14 00007FFC96DC7034 BaseThreadInitThunk #15 00007FFC98C5CEC1 RtlUserThreadStart Reviewed By: fanzeyi Differential Revision: D37793444 fbshipit-source-id: cd33302789c2c7a29d566d5bac6e119eccf0a5f2
1 parent f36dd36 commit 46e4357

File tree

2 files changed

+25
-14
lines changed

2 files changed

+25
-14
lines changed

eden/fs/inodes/Overlay.cpp

Lines changed: 12 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -531,8 +531,16 @@ void Overlay::gcThread() noexcept {
531531

532532
void Overlay::handleGCRequest(GCRequest& request) {
533533
IORequest req{this};
534-
if (request.flush) {
535-
request.flush->setValue();
534+
535+
if (std::holds_alternative<GCRequest::MaintenanceRequest>(
536+
request.requestType)) {
537+
backingOverlay_->maintenance();
538+
return;
539+
}
540+
541+
if (auto* flush =
542+
std::get_if<GCRequest::FlushRequest>(&request.requestType)) {
543+
flush->setValue();
536544
return;
537545
}
538546

@@ -573,7 +581,7 @@ void Overlay::handleGCRequest(GCRequest& request) {
573581
}
574582
};
575583

576-
processDir(request.dir);
584+
processDir(std::get<overlay::OverlayDir>(request.requestType));
577585

578586
while (!queue.empty()) {
579587
auto ino = queue.front();
@@ -644,6 +652,6 @@ void Overlay::renameChild(
644652
}
645653

646654
void Overlay::maintenance() {
647-
backingOverlay_->maintenance();
655+
gcQueue_.lock()->queue.emplace_back(Overlay::GCRequest::MaintenanceRequest{});
648656
}
649657
} // namespace facebook::eden

eden/fs/inodes/Overlay.h

Lines changed: 13 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -267,24 +267,27 @@ class Overlay : public std::enable_shared_from_this<Overlay> {
267267
std::shared_ptr<StructuredLogger> logger);
268268

269269
/**
270-
* A request for the background GC thread. There are two types of requests:
271-
* recursively forget data underneath an given directory, or complete a
272-
* promise. The latter is used for synchronization with the GC thread,
273-
* primarily in unit tests.
270+
* A request for the background GC thread. There are three types of
271+
* requests: recursively forget data underneath a given directory, perform
272+
* some maintenance of the overlay or complete a promise. The latter is used
273+
* for synchronization with the GC thread, primarily in unit tests.
274274
*
275275
* If additional request types are added in the future, consider renaming to
276276
* AsyncRequest. However, recursive collection of forgotten inode numbers
277277
* is the only operation that can be made async while preserving our
278278
* durability goals.
279279
*/
280280
struct GCRequest {
281-
GCRequest() {}
282-
explicit GCRequest(overlay::OverlayDir&& d) : dir{std::move(d)} {}
283-
explicit GCRequest(folly::Promise<folly::Unit> p) : flush{std::move(p)} {}
281+
explicit GCRequest(overlay::OverlayDir&& d) : requestType{std::move(d)} {}
284282

285-
overlay::OverlayDir dir;
286-
// Iff set, this is a flush request.
287-
std::optional<folly::Promise<folly::Unit>> flush;
283+
using FlushRequest = folly::Promise<folly::Unit>;
284+
explicit GCRequest(FlushRequest p) : requestType{std::move(p)} {}
285+
286+
struct MaintenanceRequest {};
287+
explicit GCRequest(MaintenanceRequest req) : requestType{std::move(req)} {}
288+
289+
std::variant<MaintenanceRequest, overlay::OverlayDir, FlushRequest>
290+
requestType;
288291
};
289292

290293
struct GCQueue {

0 commit comments

Comments
 (0)