Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lldb] Create dependent modules in parallel #114507

Merged
merged 4 commits into from
Nov 2, 2024

Conversation

JDevlieghere
Copy link
Member

Create dependent modules in parallel in Target::SetExecutableModule. This change was inspired by #110646 which takes the same approach when attaching. Jason suggested we could use the same approach when you create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs

We see about a 2x speedup, which matches what Jason saw for the attach scenario. I also ran this under TSan to confirm this doesn't introduce any races or deadlocks.

Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by llvm#110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

  Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
    Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
    Range (min … max):    1.220 s …  1.229 s    10 runs

  Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
    Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
    Range (min … max):    3.211 s …  3.310 s    10 runs

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
@JDevlieghere
Copy link
Member Author

CC @DmT021

@llvmbot
Copy link
Member

llvmbot commented Nov 1, 2024

@llvm/pr-subscribers-lldb

Author: Jonas Devlieghere (JDevlieghere)

Changes

Create dependent modules in parallel in Target::SetExecutableModule. This change was inspired by #110646 which takes the same approach when attaching. Jason suggested we could use the same approach when you create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs

We see about a 2x speedup, which matches what Jason saw for the attach scenario. I also ran this under TSan to confirm this doesn't introduce any races or deadlocks.


Full diff: https://github.com/llvm/llvm-project/pull/114507.diff

1 Files Affected:

  • (modified) lldb/source/Target/Target.cpp (+28-5)
diff --git a/lldb/source/Target/Target.cpp b/lldb/source/Target/Target.cpp
index 199efae8a728cc..ef5d38fc796b08 100644
--- a/lldb/source/Target/Target.cpp
+++ b/lldb/source/Target/Target.cpp
@@ -68,6 +68,7 @@
 
 #include "llvm/ADT/ScopeExit.h"
 #include "llvm/ADT/SetVector.h"
+#include "llvm/Support/ThreadPool.h"
 
 #include <memory>
 #include <mutex>
@@ -1575,7 +1576,6 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
                m_arch.GetSpec().GetTriple().getTriple());
     }
 
-    FileSpecList dependent_files;
     ObjectFile *executable_objfile = executable_sp->GetObjectFile();
     bool load_dependents = true;
     switch (load_dependent_files) {
@@ -1591,10 +1591,14 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
     }
 
     if (executable_objfile && load_dependents) {
+      // FileSpecList is not thread safe and needs to be synchronized.
+      FileSpecList dependent_files;
+      std::mutex dependent_files_mutex;
+
+      // ModuleList is thread safe.
       ModuleList added_modules;
-      executable_objfile->GetDependentModules(dependent_files);
-      for (uint32_t i = 0; i < dependent_files.GetSize(); i++) {
-        FileSpec dependent_file_spec(dependent_files.GetFileSpecAtIndex(i));
+
+      auto GetDependentModules = [&](FileSpec dependent_file_spec) {
         FileSpec platform_dependent_file_spec;
         if (m_platform_sp)
           m_platform_sp->GetFileWithUUID(dependent_file_spec, nullptr,
@@ -1608,9 +1612,28 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
         if (image_module_sp) {
           added_modules.AppendIfNeeded(image_module_sp, false);
           ObjectFile *objfile = image_module_sp->GetObjectFile();
-          if (objfile)
+          if (objfile) {
+            std::lock_guard<std::mutex> guard(dependent_files_mutex);
             objfile->GetDependentModules(dependent_files);
+          }
+        }
+      };
+
+      executable_objfile->GetDependentModules(dependent_files);
+
+      llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
+      for (uint32_t i = 0; i < dependent_files.GetSize(); i++) {
+        // Process all currently known dependencies in parallel in the innermost
+        // loop. This may create newly discovered dependencies to be appended to
+        // dependent_files. We'll deal with these files during the next
+        // iteration of the outermost loop.
+        {
+          std::lock_guard<std::mutex> guard(dependent_files_mutex);
+          for (; i < dependent_files.GetSize(); i++)
+            task_group.async(GetDependentModules,
+                            dependent_files.GetFileSpecAtIndex(i));
         }
+        task_group.wait();
       }
       ModulesDidLoad(added_modules);
     }

Copy link

github-actions bot commented Nov 1, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@@ -1608,9 +1612,28 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
if (image_module_sp) {
added_modules.AppendIfNeeded(image_module_sp, false);
ObjectFile *objfile = image_module_sp->GetObjectFile();
if (objfile)
if (objfile) {
std::lock_guard<std::mutex> guard(dependent_files_mutex);
objfile->GetDependentModules(dependent_files);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this operation is heavy in any of the ObjectFile implementations. If it is we may want to lock the mutex only when an actual append to the dependent_files happens.

Copy link
Collaborator

@jasonmolenda jasonmolenda Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question. for mach-o this is a simple iteration over the load commands, but we might want to to pass a temporary file list object to this method, and then acquire the lock and append its entries to dependency_files, if we didn't want to assume that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah fair enough. Even the Mach-O one is doing some filesystem operations that really shouldn't be happening while holding the lock. I don't think we need to make every FileSpecList thread safe, and passing an optional mutex* was messy, so I went with the local copy as Jason suggested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case, did you check for a regression in performance after the latest patch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I did. I was hoping for a slight speedup, but the difference was in the noise.

Copy link
Collaborator

@jasonmolenda jasonmolenda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It took me a minute to understand how you were handling the locking of the dependent file list while iterating across it, but now I see it. You grab the lock at the beginning of the outer loop, then enqueue all of the not-yet-processed dependent binaries into asynchronous threads. They are all creating the Modules and then blocking to acquire the lock, to add their entries to the dependent files list. The inner loop finishes distributing things to the thread pool, releases its lock, and waits for all the just-started async jobs to complete. Then we are back in the outer loop which reads the size of the dependent file list.

@jasonmolenda
Copy link
Collaborator

Looks good to me, thanks for addressing this other bottleneck for launch startup.

JDevlieghere added a commit to JDevlieghere/llvm-project that referenced this pull request Nov 1, 2024
In D148380 [1], Alex added locking to PathMappingLists. The current
implementation runs the callback under the lock, which I don't believe
is necessary. As far as I can tell, no users of the callback are relying
on the list not having been modified until the callback is handled.

This patch implements my suggestion to unlock the mutex before the
callback. I also switched to a non-recursive mutex as I don't believe
the recursive property is needed. To make the class fully thread safe, I
did have to introduce another mutex to protect the callback members.

The motivation for this change is llvm#114507. Specifically,
Target::SetExecutableModule calls Target::GetOrCreateModule, which
potentially performs path remapping, which has a callback to
Target::SetExecutableModule.

[1] https://reviews.llvm.org/D148380
@JDevlieghere
Copy link
Member Author

The test failures were both caused by the locking in PathMappingList, which I'm addressing in #114576.

{
std::lock_guard<std::mutex> guard(dependent_files_mutex);
for (; i < dependent_files.GetSize(); i++)
task_group.async(GetDependentModules,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async inside a lock_guard looks dangerous. I assume you're sure this won't recurse into this function? :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't, but if that becomes a problem we can do the same trick that Jason suggested and iterate over a copy of the list here. For now that seems unnecessary but hopefully someone will see this comment if that ever changes :-)

JDevlieghere added a commit that referenced this pull request Nov 1, 2024
In [D148380](https://reviews.llvm.org/D148380), Alex added locking to
PathMappingLists. The current implementation runs the callback under the
lock, which I don't believe is necessary. As far as I can tell, no users
of the callback are relying on the list not having been modified until
the callback is handled.

This patch implements my suggestion to unlock the mutex before the
callback. I also switched to a non-recursive mutex as I don't believe
the recursive property is needed. To make the class fully thread safe, I
did have to introduce another mutex to protect the callback members.

The motivation for this change is #114507. Specifically,
Target::SetExecutableModule calls Target::GetOrCreateModule, which
potentially performs path remapping, which in turns has a callback to
Target::SetExecutableModule.
Copy link
Collaborator

@jasonmolenda jasonmolenda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@JDevlieghere JDevlieghere merged commit b360dfd into llvm:main Nov 2, 2024
7 checks passed
@JDevlieghere JDevlieghere deleted the parallel-dependents branch November 2, 2024 16:38
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
In [D148380](https://reviews.llvm.org/D148380), Alex added locking to
PathMappingLists. The current implementation runs the callback under the
lock, which I don't believe is necessary. As far as I can tell, no users
of the callback are relying on the list not having been modified until
the callback is handled.

This patch implements my suggestion to unlock the mutex before the
callback. I also switched to a non-recursive mutex as I don't believe
the recursive property is needed. To make the class fully thread safe, I
did have to introduce another mutex to protect the callback members.

The motivation for this change is llvm#114507. Specifically,
Target::SetExecutableModule calls Target::GetOrCreateModule, which
potentially performs path remapping, which in turns has a callback to
Target::SetExecutableModule.
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by llvm#110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
In [D148380](https://reviews.llvm.org/D148380), Alex added locking to
PathMappingLists. The current implementation runs the callback under the
lock, which I don't believe is necessary. As far as I can tell, no users
of the callback are relying on the list not having been modified until
the callback is handled.

This patch implements my suggestion to unlock the mutex before the
callback. I also switched to a non-recursive mutex as I don't believe
the recursive property is needed. To make the class fully thread safe, I
did have to introduce another mutex to protect the callback members.

The motivation for this change is llvm#114507. Specifically,
Target::SetExecutableModule calls Target::GetOrCreateModule, which
potentially performs path remapping, which in turns has a callback to
Target::SetExecutableModule.
smallp-o-p pushed a commit to smallp-o-p/llvm-project that referenced this pull request Nov 3, 2024
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by llvm#110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
JDevlieghere added a commit that referenced this pull request Nov 3, 2024
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
In [D148380](https://reviews.llvm.org/D148380), Alex added locking to
PathMappingLists. The current implementation runs the callback under the
lock, which I don't believe is necessary. As far as I can tell, no users
of the callback are relying on the list not having been modified until
the callback is handled.

This patch implements my suggestion to unlock the mutex before the
callback. I also switched to a non-recursive mutex as I don't believe
the recursive property is needed. To make the class fully thread safe, I
did have to introduce another mutex to protect the callback members.

The motivation for this change is llvm#114507. Specifically,
Target::SetExecutableModule calls Target::GetOrCreateModule, which
potentially performs path remapping, which in turns has a callback to
Target::SetExecutableModule.
NoumanAmir657 pushed a commit to NoumanAmir657/llvm-project that referenced this pull request Nov 4, 2024
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by llvm#110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
JDevlieghere added a commit that referenced this pull request Nov 4, 2024
In [D148380](https://reviews.llvm.org/D148380), Alex added locking to
PathMappingLists. The current implementation runs the callback under the
lock, which I don't believe is necessary. As far as I can tell, no users
of the callback are relying on the list not having been modified until
the callback is handled.

This patch implements my suggestion to unlock the mutex before the
callback. I also switched to a non-recursive mutex as I don't believe
the recursive property is needed. To make the class fully thread safe, I
did have to introduce another mutex to protect the callback members.

The motivation for this change is #114507. Specifically,
Target::SetExecutableModule calls Target::GetOrCreateModule, which
potentially performs path remapping, which in turns has a callback to
Target::SetExecutableModule.
JDevlieghere added a commit that referenced this pull request Nov 4, 2024
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by #110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Nov 4, 2024
In [D148380](https://reviews.llvm.org/D148380), Alex added locking to
PathMappingLists. The current implementation runs the callback under the
lock, which I don't believe is necessary. As far as I can tell, no users
of the callback are relying on the list not having been modified until
the callback is handled.

This patch implements my suggestion to unlock the mutex before the
callback. I also switched to a non-recursive mutex as I don't believe
the recursive property is needed. To make the class fully thread safe, I
did have to introduce another mutex to protect the callback members.

The motivation for this change is llvm#114507. Specifically,
Target::SetExecutableModule calls Target::GetOrCreateModule, which
potentially performs path remapping, which in turns has a callback to
Target::SetExecutableModule.

(cherry picked from commit 8f8e2b7)
JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Nov 4, 2024
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by llvm#110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.

(cherry picked from commit a57296a)
JDevlieghere added a commit to swiftlang/llvm-project that referenced this pull request Nov 5, 2024
[lldb] Create dependent modules in parallel (llvm#114507)
PhilippRados pushed a commit to PhilippRados/llvm-project that referenced this pull request Nov 6, 2024
PhilippRados pushed a commit to PhilippRados/llvm-project that referenced this pull request Nov 6, 2024
In [D148380](https://reviews.llvm.org/D148380), Alex added locking to
PathMappingLists. The current implementation runs the callback under the
lock, which I don't believe is necessary. As far as I can tell, no users
of the callback are relying on the list not having been modified until
the callback is handled.

This patch implements my suggestion to unlock the mutex before the
callback. I also switched to a non-recursive mutex as I don't believe
the recursive property is needed. To make the class fully thread safe, I
did have to introduce another mutex to protect the callback members.

The motivation for this change is llvm#114507. Specifically,
Target::SetExecutableModule calls Target::GetOrCreateModule, which
potentially performs path remapping, which in turns has a callback to
Target::SetExecutableModule.
PhilippRados pushed a commit to PhilippRados/llvm-project that referenced this pull request Nov 6, 2024
Create dependent modules in parallel in Target::SetExecutableModule.
This change was inspired by llvm#110646 which takes the same approach when
attaching. Jason suggested we could use the same approach when you
create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach
scenario. I also ran this under TSan to confirm this doesn't introduce
any races or deadlocks.
JDevlieghere added a commit to JDevlieghere/llvm-project that referenced this pull request Nov 19, 2024
JDevlieghere added a commit that referenced this pull request Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants