[lldb] Create dependent modules in parallel #114507
Conversation
Create dependent modules in parallel in Target::SetExecutableModule. This change was inspired by llvm#110646, which takes the same approach when attaching. Jason suggested we could use the same approach when you create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach scenario. I also ran this under TSan to confirm this doesn't introduce any races or deadlocks.
CC @DmT021
@llvm/pr-subscribers-lldb

Author: Jonas Devlieghere (JDevlieghere)

Full diff: https://github.com/llvm/llvm-project/pull/114507.diff

1 Files Affected:
diff --git a/lldb/source/Target/Target.cpp b/lldb/source/Target/Target.cpp
index 199efae8a728cc..ef5d38fc796b08 100644
--- a/lldb/source/Target/Target.cpp
+++ b/lldb/source/Target/Target.cpp
@@ -68,6 +68,7 @@
#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/SetVector.h"
+#include "llvm/Support/ThreadPool.h"
#include <memory>
#include <mutex>
@@ -1575,7 +1576,6 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
m_arch.GetSpec().GetTriple().getTriple());
}
- FileSpecList dependent_files;
ObjectFile *executable_objfile = executable_sp->GetObjectFile();
bool load_dependents = true;
switch (load_dependent_files) {
@@ -1591,10 +1591,14 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
}
if (executable_objfile && load_dependents) {
+ // FileSpecList is not thread safe and needs to be synchronized.
+ FileSpecList dependent_files;
+ std::mutex dependent_files_mutex;
+
+ // ModuleList is thread safe.
ModuleList added_modules;
- executable_objfile->GetDependentModules(dependent_files);
- for (uint32_t i = 0; i < dependent_files.GetSize(); i++) {
- FileSpec dependent_file_spec(dependent_files.GetFileSpecAtIndex(i));
+
+ auto GetDependentModules = [&](FileSpec dependent_file_spec) {
FileSpec platform_dependent_file_spec;
if (m_platform_sp)
m_platform_sp->GetFileWithUUID(dependent_file_spec, nullptr,
@@ -1608,9 +1612,28 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
if (image_module_sp) {
added_modules.AppendIfNeeded(image_module_sp, false);
ObjectFile *objfile = image_module_sp->GetObjectFile();
- if (objfile)
+ if (objfile) {
+ std::lock_guard<std::mutex> guard(dependent_files_mutex);
objfile->GetDependentModules(dependent_files);
+ }
+ }
+ };
+
+ executable_objfile->GetDependentModules(dependent_files);
+
+ llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
+ for (uint32_t i = 0; i < dependent_files.GetSize(); i++) {
+ // Process all currently known dependencies in parallel in the innermost
+ // loop. This may create newly discovered dependencies to be appended to
+ // dependent_files. We'll deal with these files during the next
+ // iteration of the outermost loop.
+ {
+ std::lock_guard<std::mutex> guard(dependent_files_mutex);
+ for (; i < dependent_files.GetSize(); i++)
+ task_group.async(GetDependentModules,
+ dependent_files.GetFileSpecAtIndex(i));
}
+ task_group.wait();
}
ModulesDidLoad(added_modules);
}
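To make the control flow easier to experiment with outside of LLDB, here is a minimal, self-contained sketch of the dispatch pattern from the diff above. Nothing in it is the LLDB API: std::async stands in for Debugger::GetThreadPool(), a std::vector<std::string> plus a mutex stands in for FileSpecList, and AppendDependents is a hypothetical stand-in for ObjectFile::GetDependentModules.

```cpp
#include <cstddef>
#include <future>
#include <mutex>
#include <string>
#include <vector>

// Shared, non-thread-safe list of discovered files, guarded by a mutex,
// mirroring the FileSpecList + std::mutex pair in the diff.
std::vector<std::string> dependent_files;
std::mutex dependent_files_mutex;

// Hypothetical stand-in for ObjectFile::GetDependentModules: appends the
// dependents of `file` to `out`. Each module depends on one child up to a
// fixed depth, so the walk terminates.
void AppendDependents(const std::string &file, std::vector<std::string> &out) {
  if (file.size() <= 8)
    out.push_back(file + "/b");
}

// Per-module task, corresponding to the GetDependentModules lambda in the
// diff: it holds the lock while appending into the shared list.
void ProcessOne(std::string file) {
  std::lock_guard<std::mutex> guard(dependent_files_mutex);
  AppendDependents(file, dependent_files);
}

int main() {
  dependent_files = {"exe"};
  for (std::size_t i = 0; i < dependent_files.size();) {
    std::vector<std::future<void>> task_group;
    {
      // Dispatch everything currently known while holding the lock. Newly
      // discovered dependencies grow the list and are picked up by the next
      // iteration of the outer loop.
      std::lock_guard<std::mutex> guard(dependent_files_mutex);
      for (; i < dependent_files.size(); ++i)
        task_group.push_back(
            std::async(std::launch::async, ProcessOne, dependent_files[i]));
    }
    // Like task_group.wait(): block until this batch finishes before
    // re-reading the list size.
    for (std::future<void> &f : task_group)
      f.wait();
  }
}
```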
✅ With the latest revision this PR passed the C/C++ code formatter.
lldb/source/Target/Target.cpp
Outdated
```diff
@@ -1608,9 +1612,28 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
     if (image_module_sp) {
       added_modules.AppendIfNeeded(image_module_sp, false);
       ObjectFile *objfile = image_module_sp->GetObjectFile();
-      if (objfile)
+      if (objfile) {
+        std::lock_guard<std::mutex> guard(dependent_files_mutex);
         objfile->GetDependentModules(dependent_files);
```
I wonder if this operation is heavy in any of the ObjectFile implementations. If it is, we may want to lock the mutex only when an actual append to the dependent_files happens.
Good question. For Mach-O this is a simple iteration over the load commands, but we might want to pass a temporary file list object to this method, and then acquire the lock and append its entries to dependent_files, if we didn't want to assume that.
Yeah, fair enough. Even the Mach-O one is doing some filesystem operations that really shouldn't be happening while holding the lock. I don't think we need to make every FileSpecList thread safe, and passing an optional mutex* was messy, so I went with the local copy as Jason suggested.
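A sketch of that follow-up, reusing the hypothetical names from the snippet after the diff: discovery fills a local list with no lock held, and the mutex is taken only long enough to splice the results in. This illustrates the idea, not the code that landed.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Declared in the earlier sketch.
extern std::vector<std::string> dependent_files;
extern std::mutex dependent_files_mutex;
void AppendDependents(const std::string &file, std::vector<std::string> &out);

// Revised task body: the (potentially filesystem-touching) discovery runs
// without the lock; the mutex guards only the final append.
void ProcessOne(std::string file) {
  std::vector<std::string> local;
  AppendDependents(file, local); // heavy work, unlocked
  std::lock_guard<std::mutex> guard(dependent_files_mutex);
  dependent_files.insert(dependent_files.end(), local.begin(), local.end());
}
```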
Just in case, did you check for a regression in performance after the latest patch?
Yeah I did. I was hoping for a slight speedup, but the difference was in the noise.
It took me a minute to understand how you were handling the locking of the dependent file list while iterating across it, but now I see it. You grab the lock at the beginning of the outer loop, then enqueue all of the not-yet-processed dependent binaries into asynchronous threads. They are all creating the Modules and then blocking to acquire the lock, to add their entries to the dependent files list. The inner loop finishes distributing things to the thread pool, releases its lock, and waits for all the just-started async jobs to complete. Then we are back in the outer loop which reads the size of the dependent file list.
Looks good to me, thanks for addressing this other bottleneck for launch startup.
In D148380 [1], Alex added locking to PathMappingLists. The current implementation runs the callback under the lock, which I don't believe is necessary. As far as I can tell, no users of the callback are relying on the list not having been modified until the callback is handled. This patch implements my suggestion to unlock the mutex before the callback. I also switched to a non-recursive mutex as I don't believe the recursive property is needed. To make the class fully thread safe, I did have to introduce another mutex to protect the callback members. The motivation for this change is llvm#114507. Specifically, Target::SetExecutableModule calls Target::GetOrCreateModule, which potentially performs path remapping, which has a callback to Target::SetExecutableModule. [1] https://reviews.llvm.org/D148380
The test failures were both caused by the locking in
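Here is a minimal sketch of the locking scheme that commit message describes, using a simplified stand-in class rather than the real PathMappingList API: the list mutex is released before the callback runs (so the callback can re-enter methods that lock the list), and a second mutex protects the callback members.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <vector>

class MappingList {
public:
  using Callback = std::function<void()>;

  void SetCallback(Callback cb) {
    std::lock_guard<std::mutex> guard(m_callback_mutex);
    m_callback = std::move(cb);
  }

  void Append(std::string path) {
    {
      std::lock_guard<std::mutex> guard(m_list_mutex);
      m_paths.push_back(std::move(path));
    }
    // Copy the callback under its own mutex, then invoke it with no locks
    // held, so it may safely call back into this class.
    Callback cb;
    {
      std::lock_guard<std::mutex> guard(m_callback_mutex);
      cb = m_callback;
    }
    if (cb)
      cb();
  }

private:
  std::mutex m_list_mutex;     // non-recursive: never held during the callback
  std::mutex m_callback_mutex; // protects the callback members
  std::vector<std::string> m_paths;
  Callback m_callback;
};
```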
```cpp
      {
        std::lock_guard<std::mutex> guard(dependent_files_mutex);
        for (; i < dependent_files.GetSize(); i++)
          task_group.async(GetDependentModules,
```
async inside a lock_guard looks dangerous. I assume you're sure this won't recurse into this function? :-)
It doesn't, but if that becomes a problem we can do the same trick that Jason suggested and iterate over a copy of the list here. For now that seems unnecessary but hopefully someone will see this comment if that ever changes :-)
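Should that ever change, the "iterate over a copy" trick could look roughly like this, again with hypothetical names (Enqueue stands in for task_group.async): snapshot the unprocessed tail under the lock, then dispatch with the lock released.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// From the earlier sketches.
extern std::vector<std::string> dependent_files;
extern std::mutex dependent_files_mutex;

// Hypothetical stand-in for task_group.async(GetDependentModules, ...).
void Enqueue(const std::string &file);

// Copy the not-yet-processed tail while holding the lock, then release it
// before enqueueing, so no task can run while the mutex is held.
std::size_t DispatchPending(std::size_t i) {
  std::vector<std::string> pending;
  {
    std::lock_guard<std::mutex> guard(dependent_files_mutex);
    pending.assign(dependent_files.begin() + i, dependent_files.end());
    i = dependent_files.size();
  }
  for (const std::string &file : pending)
    Enqueue(file);
  return i;
}
```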
LGTM.
This reverts commit b360dfd.
[lldb] Create dependent modules in parallel (llvm#114507)
Release note llvm#110646 and llvm#114507.