[lldb] Create dependent modules in parallel #114507
Conversation
Create dependent modules in parallel in Target::SetExecutableModule. This change was inspired by llvm#110646, which takes the same approach when attaching. Jason suggested we could use the same approach when you create a target in LLDB.

I used Slack for benchmarking, which loads 902 images.

```
Benchmark 1: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      1.225 s ±  0.003 s    [User: 3.977 s, System: 1.521 s]
  Range (min … max):    1.220 s …  1.229 s    10 runs

Benchmark 2: ./bin/lldb /Applications/Slack.app/Contents/MacOS/Slack
  Time (mean ± σ):      3.253 s ±  0.037 s    [User: 3.013 s, System: 0.248 s]
  Range (min … max):    3.211 s …  3.310 s    10 runs
```

We see about a 2x speedup, which matches what Jason saw for the attach scenario. I also ran this under TSan to confirm this doesn't introduce any races or deadlocks.
CC @DmT021
@llvm/pr-subscribers-lldb

Author: Jonas Devlieghere (JDevlieghere)

Full diff: https://github.com/llvm/llvm-project/pull/114507.diff

1 Files Affected:
diff --git a/lldb/source/Target/Target.cpp b/lldb/source/Target/Target.cpp
index 199efae8a728cc..ef5d38fc796b08 100644
--- a/lldb/source/Target/Target.cpp
+++ b/lldb/source/Target/Target.cpp
@@ -68,6 +68,7 @@
#include "llvm/ADT/ScopeExit.h"
#include "llvm/ADT/SetVector.h"
+#include "llvm/Support/ThreadPool.h"
#include <memory>
#include <mutex>
@@ -1575,7 +1576,6 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
m_arch.GetSpec().GetTriple().getTriple());
}
- FileSpecList dependent_files;
ObjectFile *executable_objfile = executable_sp->GetObjectFile();
bool load_dependents = true;
switch (load_dependent_files) {
@@ -1591,10 +1591,14 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
}
if (executable_objfile && load_dependents) {
+ // FileSpecList is not thread safe and needs to be synchronized.
+ FileSpecList dependent_files;
+ std::mutex dependent_files_mutex;
+
+ // ModuleList is thread safe.
ModuleList added_modules;
- executable_objfile->GetDependentModules(dependent_files);
- for (uint32_t i = 0; i < dependent_files.GetSize(); i++) {
- FileSpec dependent_file_spec(dependent_files.GetFileSpecAtIndex(i));
+
+ auto GetDependentModules = [&](FileSpec dependent_file_spec) {
FileSpec platform_dependent_file_spec;
if (m_platform_sp)
m_platform_sp->GetFileWithUUID(dependent_file_spec, nullptr,
@@ -1608,9 +1612,28 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
if (image_module_sp) {
added_modules.AppendIfNeeded(image_module_sp, false);
ObjectFile *objfile = image_module_sp->GetObjectFile();
- if (objfile)
+ if (objfile) {
+ std::lock_guard<std::mutex> guard(dependent_files_mutex);
objfile->GetDependentModules(dependent_files);
+ }
+ }
+ };
+
+ executable_objfile->GetDependentModules(dependent_files);
+
+ llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
+ for (uint32_t i = 0; i < dependent_files.GetSize(); i++) {
+ // Process all currently known dependencies in parallel in the innermost
+ // loop. This may create newly discovered dependencies to be appended to
+ // dependent_files. We'll deal with these files during the next
+ // iteration of the outermost loop.
+ {
+ std::lock_guard<std::mutex> guard(dependent_files_mutex);
+ for (; i < dependent_files.GetSize(); i++)
+ task_group.async(GetDependentModules,
+ dependent_files.GetFileSpecAtIndex(i));
}
+ task_group.wait();
}
ModulesDidLoad(added_modules);
}
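To make the control flow easier to experiment with outside of LLDB, here is a minimal, self-contained sketch of the dispatch pattern from the diff above. Nothing in it is the LLDB API: std::async stands in for Debugger::GetThreadPool(), a std::vector<std::string> plus a mutex stands in for FileSpecList, and AppendDependents is a hypothetical stand-in for ObjectFile::GetDependentModules.

```cpp
#include <cstddef>
#include <future>
#include <mutex>
#include <string>
#include <vector>

// Shared, non-thread-safe list of discovered files, guarded by a mutex,
// mirroring the FileSpecList + std::mutex pair in the diff.
std::vector<std::string> dependent_files;
std::mutex dependent_files_mutex;

// Hypothetical stand-in for ObjectFile::GetDependentModules: appends the
// dependents of `file` to `out`. Each module depends on one child up to a
// fixed depth, so the walk terminates.
void AppendDependents(const std::string &file, std::vector<std::string> &out) {
  if (file.size() <= 8)
    out.push_back(file + "/b");
}

// Per-module task, corresponding to the GetDependentModules lambda in the
// diff: it holds the lock while appending into the shared list.
void ProcessOne(std::string file) {
  std::lock_guard<std::mutex> guard(dependent_files_mutex);
  AppendDependents(file, dependent_files);
}

int main() {
  dependent_files = {"exe"};
  for (std::size_t i = 0; i < dependent_files.size();) {
    std::vector<std::future<void>> task_group;
    {
      // Dispatch everything currently known while holding the lock. Newly
      // discovered dependencies grow the list and are picked up by the next
      // iteration of the outer loop.
      std::lock_guard<std::mutex> guard(dependent_files_mutex);
      for (; i < dependent_files.size(); ++i)
        task_group.push_back(
            std::async(std::launch::async, ProcessOne, dependent_files[i]));
    }
    // Like task_group.wait(): block until this batch finishes before
    // re-reading the list size.
    for (std::future<void> &f : task_group)
      f.wait();
  }
}
```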
✅ With the latest revision this PR passed the C/C++ code formatter.
lldb/source/Target/Target.cpp
Outdated
```diff
@@ -1608,9 +1612,28 @@ void Target::SetExecutableModule(ModuleSP &executable_sp,
     if (image_module_sp) {
       added_modules.AppendIfNeeded(image_module_sp, false);
       ObjectFile *objfile = image_module_sp->GetObjectFile();
-      if (objfile)
+      if (objfile) {
+        std::lock_guard<std::mutex> guard(dependent_files_mutex);
         objfile->GetDependentModules(dependent_files);
```
I wonder if this operation is heavy in any of the ObjectFile implementations. If it is, we may want to lock the mutex only when an actual append to the dependent_files happens.
Good question. For Mach-O this is a simple iteration over the load commands, but we might want to pass a temporary file list object to this method, and then acquire the lock and append its entries to dependent_files, if we didn't want to assume that.
Yeah, fair enough. Even the Mach-O one is doing some filesystem operations that really shouldn't be happening while holding the lock. I don't think we need to make every FileSpecList thread safe, and passing an optional mutex* was messy, so I went with the local copy as Jason suggested.
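A sketch of that follow-up, reusing the hypothetical names from the snippet after the diff: discovery fills a local list with no lock held, and the mutex is taken only long enough to splice the results in. This illustrates the idea, not the code that landed.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Declared in the earlier sketch.
extern std::vector<std::string> dependent_files;
extern std::mutex dependent_files_mutex;
void AppendDependents(const std::string &file, std::vector<std::string> &out);

// Revised task body: the (potentially filesystem-touching) discovery runs
// without the lock; the mutex guards only the final append.
void ProcessOne(std::string file) {
  std::vector<std::string> local;
  AppendDependents(file, local); // heavy work, unlocked
  std::lock_guard<std::mutex> guard(dependent_files_mutex);
  dependent_files.insert(dependent_files.end(), local.begin(), local.end());
}
```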
Just in case, did you check for a regression in performance after the latest patch?
Yeah I did. I was hoping for a slight speedup, but the difference was in the noise.
It took me a minute to understand how you were handling the locking of the dependent file list while iterating across it, but now I see it. You grab the lock at the beginning of the outer loop, then enqueue all of the not-yet-processed dependent binaries into asynchronous threads. They are all creating the Modules and then blocking to acquire the lock, to add their entries to the dependent files list. The inner loop finishes distributing things to the thread pool, releases its lock, and waits for all the just-started async jobs to complete. Then we are back in the outer loop which reads the size of the dependent file list.
Looks good to me, thanks for addressing this other bottleneck for launch startup.
In D148380 [1], Alex added locking to PathMappingLists. The current implementation runs the callback under the lock, which I don't believe is necessary. As far as I can tell, no users of the callback are relying on the list not having been modified until the callback is handled. This patch implements my suggestion to unlock the mutex before the callback. I also switched to a non-recursive mutex as I don't believe the recursive property is needed. To make the class fully thread safe, I did have to introduce another mutex to protect the callback members. The motivation for this change is llvm#114507. Specifically, Target::SetExecutableModule calls Target::GetOrCreateModule, which potentially performs path remapping, which has a callback to Target::SetExecutableModule. [1] https://reviews.llvm.org/D148380
The test failures were both caused by the locking in
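Here is a minimal sketch of the locking scheme that commit message describes, using a simplified stand-in class rather than the real PathMappingList API: the list mutex is released before the callback runs (so the callback can re-enter methods that lock the list), and a second mutex protects the callback members.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <vector>

class MappingList {
public:
  using Callback = std::function<void()>;

  void SetCallback(Callback cb) {
    std::lock_guard<std::mutex> guard(m_callback_mutex);
    m_callback = std::move(cb);
  }

  void Append(std::string path) {
    {
      std::lock_guard<std::mutex> guard(m_list_mutex);
      m_paths.push_back(std::move(path));
    }
    // Copy the callback under its own mutex, then invoke it with no locks
    // held, so it may safely call back into this class.
    Callback cb;
    {
      std::lock_guard<std::mutex> guard(m_callback_mutex);
      cb = m_callback;
    }
    if (cb)
      cb();
  }

private:
  std::mutex m_list_mutex;     // non-recursive: never held during the callback
  std::mutex m_callback_mutex; // protects the callback members
  std::vector<std::string> m_paths;
  Callback m_callback;
};
```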
```cpp
      {
        std::lock_guard<std::mutex> guard(dependent_files_mutex);
        for (; i < dependent_files.GetSize(); i++)
          task_group.async(GetDependentModules,
```
async inside a lock_guard looks dangerous. I assume you're sure this won't recurse into this function? :-)
It doesn't, but if that becomes a problem we can do the same trick that Jason suggested and iterate over a copy of the list here. For now that seems unnecessary but hopefully someone will see this comment if that ever changes :-)
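Should that ever change, the "iterate over a copy" trick could look roughly like this, again with hypothetical names (Enqueue stands in for task_group.async): snapshot the unprocessed tail under the lock, then dispatch with the lock released.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// From the earlier sketches.
extern std::vector<std::string> dependent_files;
extern std::mutex dependent_files_mutex;

// Hypothetical stand-in for task_group.async(GetDependentModules, ...).
void Enqueue(const std::string &file);

// Copy the not-yet-processed tail while holding the lock, then release it
// before enqueueing, so no task can run while the mutex is held.
std::size_t DispatchPending(std::size_t i) {
  std::vector<std::string> pending;
  {
    std::lock_guard<std::mutex> guard(dependent_files_mutex);
    pending.assign(dependent_files.begin() + i, dependent_files.end());
    i = dependent_files.size();
  }
  for (const std::string &file : pending)
    Enqueue(file);
  return i;
}
```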
LGTM.
This reverts commit b360dfd.
[lldb] Create dependent modules in parallel (llvm#114507)
Release note llvm#110646 and llvm#114507.