Skip to content

Conversation

@Michael137
Copy link
Member

@Michael137 Michael137 commented Jul 15, 2025

LLDB currently attaches AsmLabels to FunctionDecls such that that the IRExecutionUnit can determine which mangled name to call (we can't rely on Clang deriving the correct mangled name to call because the debug-info AST doesn't contain all the info that would be encoded in the DWARF linkage names). However, we don't attach AsmLabels for structors because they have multiple variants and thus it's not clear which mangled name to use. In the RFC on fixing expression evaluation of abi-tagged structors we discussed encoding the structor variant into the AsmLabels. Specifically in this thread we discussed that the contents of the AsmLabel are completely under LLDB's control and we could make use of it to uniquely identify a function by encoding the exact module and DIE that the function is associated with (mangled names need not be enough since two identical mangled symbols may live in different modules). So if we already have a custom AsmLabel format, we can encode the structor variant in a follow-up (the current idea is to append the structor variant as a suffix to our custom AsmLabel when Clang emits the mangled name into the JITted IR). Then we would just have to teach the IRExecutionUnit to pick the correct structor variant DIE during symbol resolution. The draft of this is available here

This patch sets up the infrastructure for the custom AsmLabel format by encoding the module id, DIE id and mangled name in it.

Implementation

The flow is as follows:

  1. Create the label in DWARFASTParserClang. The format is: $__lldb_func:module_id:die_id:mangled_name
  2. When resolving external symbols in IRExecutionUnit, we parse this label and then do a lookup by DIE ID (or mangled name into the module if the encoded DIE is a declaration).

Depends on #151355

@Michael137 Michael137 requested a review from labath July 15, 2025 16:01
@github-actions
Copy link

github-actions bot commented Jul 15, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Comment on lines 270 to 277
// Module UID is only a Darwin concept (?)
// If UUID is not available encode as pointer.
// Maybe add character to signal whether this is a pointer
// or UUID. Or maybe if it's not hex that implies a UUID?
auto module_id = module_sp->GetUUID();
Module * module_ptr = nullptr;
if (!module_id.IsValid())
module_ptr = module_sp.get();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@labath I'm trying to clean up this prototype of your suggestion in my ABI-tagged structors RFC.

This function here is what encodes the label.

Something I wasn't sure about though was how we should encode the Module in the string. Since UIDs need not exist (e.g., on Linux IIUC?), was your original suggestion to simply print out the pointer? If so, how would we defend against the module getting unloaded under our feet? I might've misunderstood your idea though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(e.g., on Linux IIUC?)

It's more common there, but even on darwin you can create binaries without a UUID (ld --no_uuid). And even if there is one, there's no guarantee it's going to be truly unique.

was your original suggestion to simply print out the pointer? If so, how would we defend against the module getting unloaded under our feet? I might've misunderstood your idea though.

Yes, that was the idea, though I can't say I have given much thought about this aspect. Intuitively, I would expect this to be fine. Since the module pointer is only encoded in the AST inside that module, the fact that we got that pointer here means that the module was present at the beginning of the compilation process. The pointer is consumed after clang is done (which happens midway into compilation), and I'd hope that noone can unload a module between those two points.

That said, I admit this is risky, and I'm not pushing for this particular encoding. In particular, it may enable someone to crash lldb by creating a function with a funny name.

There are plenty of other ways to implement this detail though, the most obvious one being assigning a surrogate ID (an increasing integer) to each module upon creation (just like we do with debuggers).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are plenty of other ways to implement this detail though, the most obvious one being assigning a surrogate ID (an increasing integer) to each module upon creation (just like we do with debuggers).

Yea that seems like a good alternative

@Michael137 Michael137 force-pushed the lldb/reliable-function-call-labels branch 8 times, most recently from cfc23ca to dd5782e Compare July 19, 2025 21:55
@Michael137 Michael137 force-pushed the lldb/reliable-function-call-labels branch 6 times, most recently from 4c87014 to fd6b6e8 Compare July 21, 2025 09:01
@Michael137 Michael137 marked this pull request as ready for review July 21, 2025 11:08
@Michael137 Michael137 requested a review from JDevlieghere as a code owner July 21, 2025 11:08
@Michael137 Michael137 changed the title [DRAFT] [lldb][Expression] Encode Module and DIE UIDs into function AsmLabels [lldb][Expression] Encode Module and DIE UIDs into function AsmLabels Jul 21, 2025
@llvmbot llvmbot added the lldb label Jul 21, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 21, 2025

@llvm/pr-subscribers-lldb

Author: Michael Buch (Michael137)

Changes

LLDB currently attaches AsmLabels to FunctionDecls such that that the IRExecutionUnit can determine which mangled name to call (we can't rely on Clang deriving the correct mangled name to call because the debug-info AST doesn't contain all the info that would be encoded in the DWARF linkage names). However, we don't attach AsmLabels for structors because they have multiple variants and thus it's not clear which mangled name to use. In the RFC on fixing expression evaluation of abi-tagged structors we discussed encoding the structor variant into the AsmLabels. Specifically in this thread we discussed that the contents of the AsmLabel are completely under LLDB's control and we could make use of it to uniquely identify a function by encoding the exact module and DIE that the function is associated with (mangled names need not be enough since two identical mangled symbols may live in different modules). So if we already have a custom AsmLabel format, we can encode the structor variant in a follow-up (the current idea is to append the structor variant as a suffix to our custom AsmLabel when Clang emits the mangled name into the JITted IR). Then we would just have to teach the IRExecutionUnit to pick the correct structor variant DIE during symbol resolution.

This patch sets up the infrastructure for the custom AsmLabel format by encoding the module id, DIE id and mangled name in it.

Implementation

The flow is as follows:

  1. Create the label in DWARFASTParserClang. The format is: $__lldb_func:mangled_name:module_id:die_id
  2. When resolving external symbols in IRExecutionUnit, we parse this label and then do a lookup by mangled name into the module. The reason I'm not using the DIE ID here is that for C++ methods, we create the decl using the declaration DW_TAG_subprogram. So we would still need to do some kind of search for the definition.

I made the label creation an API on TypeSystemClang because the whole AsmLabel logic is Clang-specific, and in the future I want to stuff knowledge about constructor/destructor variants into the label, which is even more C++/Clang specific.

But happy to consider other architectures for this.


Patch is 34.44 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148877.diff

17 Files Affected:

  • (modified) lldb/include/lldb/Core/Module.h (+3-1)
  • (modified) lldb/include/lldb/Expression/Expression.h (+25)
  • (modified) lldb/include/lldb/Symbol/SymbolFile.h (+14)
  • (modified) lldb/include/lldb/Symbol/TypeSystem.h (+16)
  • (modified) lldb/source/Core/Module.cpp (+16-3)
  • (modified) lldb/source/Expression/Expression.cpp (+16)
  • (modified) lldb/source/Expression/IRExecutionUnit.cpp (+74)
  • (modified) lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionDeclMap.cpp (+1-1)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp (+26-17)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp (+41)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h (+3)
  • (modified) lldb/source/Plugins/SymbolFile/NativePDB/PdbAstBuilder.cpp (+3-3)
  • (modified) lldb/source/Plugins/SymbolFile/NativePDB/UdtRecordCompleter.cpp (+2-3)
  • (modified) lldb/source/Plugins/SymbolFile/PDB/PDBASTParser.cpp (+4-3)
  • (modified) lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp (+76-5)
  • (modified) lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.h (+9-2)
  • (modified) lldb/unittests/Symbol/TestTypeSystemClang.cpp (+6-6)
diff --git a/lldb/include/lldb/Core/Module.h b/lldb/include/lldb/Core/Module.h
index 8bb55c95773bc..3991a12997541 100644
--- a/lldb/include/lldb/Core/Module.h
+++ b/lldb/include/lldb/Core/Module.h
@@ -86,7 +86,8 @@ struct ModuleFunctionSearchOptions {
 ///
 /// The module will parse more detailed information as more queries are made.
 class Module : public std::enable_shared_from_this<Module>,
-               public SymbolContextScope {
+               public SymbolContextScope,
+               public UserID {
 public:
   class LookupInfo;
   // Static functions that can track the lifetime of module objects. This is
@@ -97,6 +98,7 @@ class Module : public std::enable_shared_from_this<Module>,
   // using the "--global" (-g for short).
   static size_t GetNumberAllocatedModules();
 
+  static Module *GetAllocatedModuleWithUID(lldb::user_id_t uid);
   static Module *GetAllocatedModuleAtIndex(size_t idx);
 
   static std::recursive_mutex &GetAllocationModuleCollectionMutex();
diff --git a/lldb/include/lldb/Expression/Expression.h b/lldb/include/lldb/Expression/Expression.h
index 8de9364436ccf..f32878c9bf876 100644
--- a/lldb/include/lldb/Expression/Expression.h
+++ b/lldb/include/lldb/Expression/Expression.h
@@ -96,6 +96,31 @@ class Expression {
                                  ///invalid.
 };
 
+/// Holds parsed information about a function call label that
+/// LLDB attaches as an AsmLabel to function AST nodes it parses
+/// from debug-info.
+///
+/// The format being:
+///
+///   <prefix>:<mangled name>:<module id>:<DIE id>
+///
+/// The label string needs to stay valid for the entire lifetime
+/// of this object.
+struct FunctionCallLabel {
+  llvm::StringRef m_lookup_name;
+  lldb::user_id_t m_module_id;
+
+  /// Mostly for debuggability.
+  lldb::user_id_t m_die_id;
+};
+
+/// LLDB attaches this prefix to mangled names of functions that it get called
+/// from JITted expressions.
+inline constexpr llvm::StringRef FunctionCallLabelPrefix = "$__lldb_func";
+
+bool consumeFunctionCallLabelPrefix(llvm::StringRef &name);
+bool hasFunctionCallLabelPrefix(llvm::StringRef name);
+
 } // namespace lldb_private
 
 #endif // LLDB_EXPRESSION_EXPRESSION_H
diff --git a/lldb/include/lldb/Symbol/SymbolFile.h b/lldb/include/lldb/Symbol/SymbolFile.h
index e95f95553c17c..6aca276fc85b6 100644
--- a/lldb/include/lldb/Symbol/SymbolFile.h
+++ b/lldb/include/lldb/Symbol/SymbolFile.h
@@ -18,6 +18,7 @@
 #include "lldb/Symbol/CompilerType.h"
 #include "lldb/Symbol/Function.h"
 #include "lldb/Symbol/SourceModule.h"
+#include "lldb/Symbol/SymbolContext.h"
 #include "lldb/Symbol/Type.h"
 #include "lldb/Symbol/TypeList.h"
 #include "lldb/Symbol/TypeSystem.h"
@@ -328,6 +329,19 @@ class SymbolFile : public PluginInterface {
   GetMangledNamesForFunction(const std::string &scope_qualified_name,
                              std::vector<ConstString> &mangled_names);
 
+  /// Resolves the function DIE identified by \c lookup_name within
+  /// this SymbolFile.
+  ///
+  /// \param[in,out] sc_list The resolved functions will be appended to this
+  /// list.
+  ///
+  /// \param[in] lookup_name The UID of the function DIE to resolve.
+  ///
+  virtual llvm::Error FindAndResolveFunction(SymbolContextList &sc_list,
+                                             llvm::StringRef lookup_name) {
+    return llvm::createStringError("Not implemented");
+  }
+
   virtual void GetTypes(lldb_private::SymbolContextScope *sc_scope,
                         lldb::TypeClass type_mask,
                         lldb_private::TypeList &type_list) = 0;
diff --git a/lldb/include/lldb/Symbol/TypeSystem.h b/lldb/include/lldb/Symbol/TypeSystem.h
index cb1f0130b548d..742c09251ea2f 100644
--- a/lldb/include/lldb/Symbol/TypeSystem.h
+++ b/lldb/include/lldb/Symbol/TypeSystem.h
@@ -548,6 +548,22 @@ class TypeSystem : public PluginInterface,
   bool GetHasForcefullyCompletedTypes() const {
     return m_has_forcefully_completed_types;
   }
+
+  /// Returns the components of the specified function call label.
+  ///
+  /// The format of \c label is described in \c FunctionCallLabel.
+  /// The label prefix is not one of the components.
+  virtual llvm::Expected<llvm::SmallVector<llvm::StringRef, 3>>
+  splitFunctionCallLabel(llvm::StringRef label) const {
+    return llvm::createStringError("Not implemented.");
+  }
+
+  // Decodes the function label into a \c FunctionCallLabel.
+  virtual llvm::Expected<FunctionCallLabel>
+  makeFunctionCallLabel(llvm::StringRef label) const {
+    return llvm::createStringError("Not implemented.");
+  }
+
 protected:
   SymbolFile *m_sym_file = nullptr;
   /// Used for reporting statistics.
diff --git a/lldb/source/Core/Module.cpp b/lldb/source/Core/Module.cpp
index 90997dada3666..edd79aff5d065 100644
--- a/lldb/source/Core/Module.cpp
+++ b/lldb/source/Core/Module.cpp
@@ -121,6 +121,15 @@ size_t Module::GetNumberAllocatedModules() {
   return GetModuleCollection().size();
 }
 
+Module *Module::GetAllocatedModuleWithUID(lldb::user_id_t uid) {
+  std::lock_guard<std::recursive_mutex> guard(
+      GetAllocationModuleCollectionMutex());
+  for (Module *mod : GetModuleCollection())
+    if (mod->GetID() == uid)
+      return mod;
+  return nullptr;
+}
+
 Module *Module::GetAllocatedModuleAtIndex(size_t idx) {
   std::lock_guard<std::recursive_mutex> guard(
       GetAllocationModuleCollectionMutex());
@@ -130,8 +139,11 @@ Module *Module::GetAllocatedModuleAtIndex(size_t idx) {
   return nullptr;
 }
 
+// TODO: needs a mutex
+static lldb::user_id_t g_unique_id = 1;
+
 Module::Module(const ModuleSpec &module_spec)
-    : m_unwind_table(*this), m_file_has_changed(false),
+    : UserID(g_unique_id++), m_unwind_table(*this), m_file_has_changed(false),
       m_first_file_changed_log(false) {
   // Scope for locker below...
   {
@@ -236,7 +248,8 @@ Module::Module(const ModuleSpec &module_spec)
 Module::Module(const FileSpec &file_spec, const ArchSpec &arch,
                ConstString object_name, lldb::offset_t object_offset,
                const llvm::sys::TimePoint<> &object_mod_time)
-    : m_mod_time(FileSystem::Instance().GetModificationTime(file_spec)),
+    : UserID(g_unique_id++),
+      m_mod_time(FileSystem::Instance().GetModificationTime(file_spec)),
       m_arch(arch), m_file(file_spec), m_object_name(object_name),
       m_object_offset(object_offset), m_object_mod_time(object_mod_time),
       m_unwind_table(*this), m_file_has_changed(false),
@@ -257,7 +270,7 @@ Module::Module(const FileSpec &file_spec, const ArchSpec &arch,
 }
 
 Module::Module()
-    : m_unwind_table(*this), m_file_has_changed(false),
+    : UserID(g_unique_id++), m_unwind_table(*this), m_file_has_changed(false),
       m_first_file_changed_log(false) {
   std::lock_guard<std::recursive_mutex> guard(
       GetAllocationModuleCollectionMutex());
diff --git a/lldb/source/Expression/Expression.cpp b/lldb/source/Expression/Expression.cpp
index 93f585edfce3d..e19c804caa3c6 100644
--- a/lldb/source/Expression/Expression.cpp
+++ b/lldb/source/Expression/Expression.cpp
@@ -10,6 +10,10 @@
 #include "lldb/Target/ExecutionContextScope.h"
 #include "lldb/Target/Target.h"
 
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Error.h"
+
 using namespace lldb_private;
 
 Expression::Expression(Target &target)
@@ -26,3 +30,15 @@ Expression::Expression(ExecutionContextScope &exe_scope)
       m_jit_end_addr(LLDB_INVALID_ADDRESS) {
   assert(m_target_wp.lock());
 }
+
+bool lldb_private::consumeFunctionCallLabelPrefix(llvm::StringRef &name) {
+  // On Darwin mangled names get a '_' prefix.
+  name.consume_front("_");
+  return name.consume_front(FunctionCallLabelPrefix);
+}
+
+bool lldb_private::hasFunctionCallLabelPrefix(llvm::StringRef name) {
+  // On Darwin mangled names get a '_' prefix.
+  name.consume_front("_");
+  return name.starts_with(FunctionCallLabelPrefix);
+}
diff --git a/lldb/source/Expression/IRExecutionUnit.cpp b/lldb/source/Expression/IRExecutionUnit.cpp
index 6f812b91a8b1d..a9ea244889cce 100644
--- a/lldb/source/Expression/IRExecutionUnit.cpp
+++ b/lldb/source/Expression/IRExecutionUnit.cpp
@@ -13,6 +13,7 @@
 #include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/Module.h"
+#include "llvm/Support/Error.h"
 #include "llvm/Support/SourceMgr.h"
 #include "llvm/Support/raw_ostream.h"
 
@@ -20,6 +21,7 @@
 #include "lldb/Core/Disassembler.h"
 #include "lldb/Core/Module.h"
 #include "lldb/Core/Section.h"
+#include "lldb/Expression/Expression.h"
 #include "lldb/Expression/IRExecutionUnit.h"
 #include "lldb/Expression/ObjectFileJIT.h"
 #include "lldb/Host/HostInfo.h"
@@ -36,6 +38,7 @@
 #include "lldb/Utility/LLDBAssert.h"
 #include "lldb/Utility/LLDBLog.h"
 #include "lldb/Utility/Log.h"
+#include "lldb/lldb-defines.h"
 
 #include <optional>
 
@@ -771,6 +774,63 @@ class LoadAddressResolver {
   lldb::addr_t m_best_internal_load_address = LLDB_INVALID_ADDRESS;
 };
 
+/// Returns address of the function referred to by the special function call
+/// label \c label.
+///
+/// \param[in] label Function call label encoding the unique location of the
+/// function to look up.
+///                  Assumes that the \c FunctionCallLabelPrefix has been
+///                  stripped from the front of the label.
+static llvm::Expected<lldb::addr_t>
+ResolveFunctionCallLabel(llvm::StringRef name,
+                         const lldb_private::SymbolContext &sc,
+                         bool &symbol_was_missing_weak) {
+  symbol_was_missing_weak = false;
+
+  if (!sc.target_sp)
+    return llvm::createStringError("target not available.");
+
+  auto ts_or_err = sc.target_sp->GetScratchTypeSystemForLanguage(
+      lldb::eLanguageTypeC_plus_plus);
+  if (!ts_or_err || !*ts_or_err)
+    return llvm::joinErrors(
+        llvm::createStringError(
+            "failed to find scratch C++ TypeSystem for current target."),
+        ts_or_err.takeError());
+
+  auto ts_sp = *ts_or_err;
+
+  auto label_or_err = ts_sp->makeFunctionCallLabel(name);
+  if (!label_or_err)
+    return llvm::joinErrors(
+        llvm::createStringError("failed to create FunctionCallLabel from: %s",
+                                name.data()),
+        label_or_err.takeError());
+
+  const auto &label = *label_or_err;
+
+  Module *module = Module::GetAllocatedModuleWithUID(label.m_module_id);
+
+  if (!module)
+    return llvm::createStringError(
+        llvm::formatv("failed to find module by UID {0}", label.m_module_id));
+
+  auto *symbol_file = module->GetSymbolFile();
+  if (!symbol_file)
+    return llvm::createStringError(
+        llvm::formatv("no SymbolFile found on module {0:x}.", module));
+
+  SymbolContextList sc_list;
+  if (auto err =
+          symbol_file->FindAndResolveFunction(sc_list, label.m_lookup_name))
+    return llvm::joinErrors(
+        llvm::createStringError("failed to resolve function by UID"),
+        std::move(err));
+
+  LoadAddressResolver resolver(*sc.target_sp, symbol_was_missing_weak);
+  return resolver.Resolve(sc_list).value_or(LLDB_INVALID_ADDRESS);
+}
+
 lldb::addr_t
 IRExecutionUnit::FindInSymbols(const std::vector<ConstString> &names,
                                const lldb_private::SymbolContext &sc,
@@ -906,6 +966,20 @@ lldb::addr_t IRExecutionUnit::FindInUserDefinedSymbols(
 
 lldb::addr_t IRExecutionUnit::FindSymbol(lldb_private::ConstString name,
                                          bool &missing_weak) {
+  if (hasFunctionCallLabelPrefix(name.GetStringRef())) {
+    if (auto addr_or_err = ResolveFunctionCallLabel(name.GetStringRef(),
+                                                    m_sym_ctx, missing_weak)) {
+      return *addr_or_err;
+    } else {
+      LLDB_LOG_ERROR(GetLog(LLDBLog::Expressions), addr_or_err.takeError(),
+                     "Failed to resolve function call label {1}: {0}",
+                     name.GetStringRef());
+      return LLDB_INVALID_ADDRESS;
+    }
+  }
+
+  // TODO: do we still need the following lookups?
+
   std::vector<ConstString> candidate_C_names;
   std::vector<ConstString> candidate_CPlusPlus_names;
 
diff --git a/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionDeclMap.cpp b/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionDeclMap.cpp
index 9f77fbc1d2434..a6c4334bf2e59 100644
--- a/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionDeclMap.cpp
+++ b/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionDeclMap.cpp
@@ -1991,7 +1991,7 @@ void ClangExpressionDeclMap::AddContextClassType(NameSearchContext &context,
     const bool is_artificial = false;
 
     CXXMethodDecl *method_decl = m_clang_ast_context->AddMethodToCXXRecordType(
-        copied_clang_type.GetOpaqueQualType(), "$__lldb_expr", nullptr,
+        copied_clang_type.GetOpaqueQualType(), "$__lldb_expr", std::nullopt,
         method_type, lldb::eAccessPublic, is_virtual, is_static, is_inline,
         is_explicit, is_attr_used, is_artificial);
 
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
index c76d67b47b336..6e9598c13ed5c 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
@@ -24,6 +24,7 @@
 #include "Plugins/Language/ObjC/ObjCLanguage.h"
 #include "lldb/Core/Module.h"
 #include "lldb/Core/Value.h"
+#include "lldb/Expression/Expression.h"
 #include "lldb/Host/Host.h"
 #include "lldb/Symbol/CompileUnit.h"
 #include "lldb/Symbol/Function.h"
@@ -249,6 +250,28 @@ static unsigned GetCXXMethodCVQuals(const DWARFDIE &subprogram,
   return cv_quals;
 }
 
+static std::optional<std::string> MakeLLDBFuncAsmLabel(const DWARFDIE &die) {
+  char const *name = die.GetMangledName(/*substitute_name_allowed*/ false);
+  if (!name)
+    return std::nullopt;
+
+  auto module_sp = die.GetModule();
+  if (!module_sp)
+    return std::nullopt;
+
+  lldb::user_id_t module_id = module_sp->GetID();
+  if (module_id == LLDB_INVALID_UID)
+    return std::nullopt;
+
+  const auto die_id = die.GetID();
+  if (die_id == LLDB_INVALID_UID)
+    return std::nullopt;
+
+  return llvm::formatv("{0}:{1}:{2:x}:{3:x}", FunctionCallLabelPrefix, name,
+                       module_id, die_id)
+      .str();
+}
+
 TypeSP DWARFASTParserClang::ParseTypeFromClangModule(const SymbolContext &sc,
                                                      const DWARFDIE &die,
                                                      Log *log) {
@@ -1231,7 +1254,7 @@ std::pair<bool, TypeSP> DWARFASTParserClang::ParseCXXMethod(
 
   clang::CXXMethodDecl *cxx_method_decl = m_ast.AddMethodToCXXRecordType(
       class_opaque_type.GetOpaqueQualType(), attrs.name.GetCString(),
-      attrs.mangled_name, clang_type, accessibility, attrs.is_virtual,
+      MakeLLDBFuncAsmLabel(die), clang_type, accessibility, attrs.is_virtual,
       is_static, attrs.is_inline, attrs.is_explicit, is_attr_used,
       attrs.is_artificial);
 
@@ -1384,7 +1407,7 @@ DWARFASTParserClang::ParseSubroutine(const DWARFDIE &die,
             ignore_containing_context ? m_ast.GetTranslationUnitDecl()
                                       : containing_decl_ctx,
             GetOwningClangModule(die), name, clang_type, attrs.storage,
-            attrs.is_inline);
+            attrs.is_inline, MakeLLDBFuncAsmLabel(die));
         std::free(name_buf);
 
         if (has_template_params) {
@@ -1394,7 +1417,7 @@ DWARFASTParserClang::ParseSubroutine(const DWARFDIE &die,
               ignore_containing_context ? m_ast.GetTranslationUnitDecl()
                                         : containing_decl_ctx,
               GetOwningClangModule(die), attrs.name.GetStringRef(), clang_type,
-              attrs.storage, attrs.is_inline);
+              attrs.storage, attrs.is_inline, /*asm_label=*/std::nullopt);
           clang::FunctionTemplateDecl *func_template_decl =
               m_ast.CreateFunctionTemplateDecl(
                   containing_decl_ctx, GetOwningClangModule(die),
@@ -1406,20 +1429,6 @@ DWARFASTParserClang::ParseSubroutine(const DWARFDIE &die,
         lldbassert(function_decl);
 
         if (function_decl) {
-          // Attach an asm(<mangled_name>) label to the FunctionDecl.
-          // This ensures that clang::CodeGen emits function calls
-          // using symbols that are mangled according to the DW_AT_linkage_name.
-          // If we didn't do this, the external symbols wouldn't exactly
-          // match the mangled name LLDB knows about and the IRExecutionUnit
-          // would have to fall back to searching object files for
-          // approximately matching function names. The motivating
-          // example is generating calls to ABI-tagged template functions.
-          // This is done separately for member functions in
-          // AddMethodToCXXRecordType.
-          if (attrs.mangled_name)
-            function_decl->addAttr(clang::AsmLabelAttr::CreateImplicit(
-                m_ast.getASTContext(), attrs.mangled_name, /*literal=*/false));
-
           LinkDeclContextToDIE(function_decl, die);
 
           const clang::FunctionProtoType *function_prototype(
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
index 5b16ce5f75138..b6960bfe38fde 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
@@ -2471,6 +2471,47 @@ bool SymbolFileDWARF::ResolveFunction(const DWARFDIE &orig_die,
   return false;
 }
 
+llvm::Error
+SymbolFileDWARF::FindAndResolveFunction(SymbolContextList &sc_list,
+                                        llvm::StringRef lookup_name) {
+  std::lock_guard<std::recursive_mutex> guard(GetModuleMutex());
+
+  DWARFDIE die;
+  Module::LookupInfo info(ConstString(lookup_name), lldb::eFunctionNameTypeFull,
+                          lldb::eLanguageTypeUnknown);
+
+  m_index->GetFunctions(info, *this, {}, [&](DWARFDIE entry) {
+    if (entry.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0))
+      return true;
+
+    // We don't check whether the specification DIE for this function
+    // corresponds to the declaration DIE because the declaration might be in
+    // a type-unit but the definition in the compile-unit (and it's
+    // specifcation would point to the declaration in the compile-unit). We
+    // rely on the mangled name within the module to be enough to find us the
+    // unique definition.
+    die = entry;
+    return false;
+  });
+
+  if (!die.IsValid())
+    return llvm::createStringError(
+        llvm::formatv("failed to find definition DIE for '{0}'", lookup_name));
+
+  if (!ResolveFunction(die, false, sc_list))
+    return llvm::createStringError("failed to resolve function DIE");
+
+  if (sc_list.IsEmpty())
+    return llvm::createStringError("no definition DIE found");
+
+  if (sc_list.GetSize() > 1)
+    return llvm::createStringError(
+        "found %d functions for %s but expected only 1", sc_list.GetSize(),
+        lookup_name.data());
+
+  return llvm::Error::success();
+}
+
 bool SymbolFileDWARF::DIEInDeclContext(const CompilerDeclContext &decl_ctx,
                                        const DWARFDIE &die,
                                        bool only_root_namespaces) {
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
index 2dc862cccca14..ee966366d3f2b 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
@@ -434,6 +434,9 @@ class SymbolFileDWARF : public SymbolFileCommon {
   DIEArray MergeBlockAbstractParameters(const DWARFDIE &block_die,
                                         DIEArray &&variable_dies);
 
+  llvm::Error FindAndResolveFunction(SymbolContextList &sc_list,
+                                     llvm::StringRef lookup_name) override;
+
   // Given a die_offset, figure out the symbol context representing that die.
   bool ResolveFunction(const DWARFDIE &die, bool include_inlines,
                        SymbolContextList &sc_list);
diff --git a/lldb/source/Plugins/SymbolFile/NativePDB/PdbAstBuilder.cpp b/lldb/source/Plugins/SymbolFile/NativePDB/PdbAstBuilder.cpp
index 702ec5e5c9ea9..bce721c149fee 100644
--- a/lldb/source/Plugins/SymbolFile/NativePDB/PdbAstBuilder.cpp
+++ b/lldb/source/Plugins/SymbolFile/Nativ...
[truncated]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't use the work DIE here in the generic code (PDB and all). It's just an function identifier that the symbol file knows how to interpret.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about symbol_id or function_id?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment to say this accounts for the target-specific mangling prefix. Otherwise, it's not clear why one needs this function.
That said, we might be able to use the llvm \01 mangling escape prefix to avoid this difference. Clang seems to do that automatically:

$ clang -target arm64-apple-macosx -x c -o - -S - <<<'void f() asm("FFF"); void f(){}' | grep FFF
	.globl	FFF                             ; -- Begin function FFF
FFF:                                    ; @"\01FFF"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, we might be able to use the llvm \01 mangling escape prefix to avoid this difference. Clang seems to do that automatically:

Good point. LLDB disabled the mangling prefix in:

commit f6bc251274f39dee5d09f668a56922c88bd027d8
Author: Vedant Kumar <vsk@apple.com>
Date:   Wed Sep 25 18:00:31 2019 +0000

    [Mangle] Add flag to asm labels to disable '\01' prefixing
    
    LLDB synthesizes decls using asm labels. These decls cannot have a mangle
    different than the one specified in the label name. I.e., the '\01' prefix
    should not be added.
    
    Fixes an expression evaluation failure in lldb's TestVirtual.py on iOS.

But that wouldn't apply with this patch anymore since we're doing the handling of the AsmLabel ourselves.

Didn't realize that the \01 prevents the global mangling prefix to get added into the IR names.

I'll go with this approach. Then we don't need these APIs either

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe that means we can also get rid of this functionality in Clang (since it was just added for LLDB).

Comment on lines 124 to 132
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably put this function into the Target class. It will not need to iterate through that many modules and it avoids accidentally pulling in an unrelated module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah now I remember why i did it this way initially. In some of the tests, the module that we get from die.GetModule() is for an object file (e.g., main.o) but the target image list doesn't contain that module. So I iterated over all the allocated modules. There's probably a better way to get from DIE module to module in the image list. Though I haven't found a good way yet

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes, that's tricky. We have something sort of similar in (e.g.) SymbolFileDWARF::GetDIEToType() which returns the DWARFDebugMap die-to-type map when dealing with a MachO .o files. It's not completely ideal, but I think it wouldn't be too bad if we had a SymbolFileDWARF function to return the ID of the "main" module.

Alternatively, since main symbol file which creates the type system (and the type system creates the DWARFASTParser), we may be able to plumb the module ID to the DWARFASTParser constructor.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.. or you can make it an atomic.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't use the m_ prefix for plain structs.

Comment on lines 793 to 794
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it makes sense to involve the type system here. At this point, do you even know if you're running a c++ expression?

Is there any harm in making this mangling scheme global? We can always change it or even add some language identifier if needed.

Alternatively, we could make each module responsible for its own mangling scheme. We could just parse out the module ID here and then pass the rest of the string verbatim.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I made it TypeSystemClang specific is that in the follow-up patch to support constructor/destructor variants (#149827) I plan to put a Clang type into the FunctionCallLabel structure. Of course I could avoid pulling Clang in by using plain integers instead of (clang::CXXCtorType/clang::CXXDtorType). Not sure if we get here outside of expression evaluation, but happy to hoist this to somewhere more global.

Do you prefer me move this out of TypeSystem and into Expression (or something similar). And we can refactor it if needed for the constructor/destructor patch?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to put clang-specific types into the generic FunctionCallLabel structure. If you want that, then the struct itself should by plugin-specific. I think that'd would be doable if you make the generic code treat the label.. generically (as a string I guess). The parsing would then happen inside the SymbolFile, which can know the encoding scheme used, the language and what not. In this world, it may make sense for this code to live in the type system, but then I'd make sure it's called from the SymbolFile class, and not directly from generic code.

However, I think that keeping the struct generic is also a viable option. We just need to avoid language and plugin-specific terms. For the structor type field, we could just say we have an additional "discriminator" field which the plugin can use to discriminate different "variants" of the given function. In this world, the label parsing should not be in a plugin. It could be just a method defined in Expression.h, next to the struct definition.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea happy to do that instead. That was what my first iteration of the patch did. I'll add a "uint64_t discriminator" field here instead. And pass the FunctionCallLabel structure into the SymbolFile instead of the individual components. Makes things much simpler

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should at least be in the same plugin (ideally, in the same file) as the code that parses it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised to see a name lookup here. Is the idea to change that in a follow-up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially I wanted to just use the DIE ID that we encoded. We can do that for regular functions, because the DWARFASTParserClang encodes the definition DIE ID. But for methods (and especially constructors/destructors) the DWARFASTParserClang just looks at the declaration DIE. So I do the lookup here.

We could make sure to encode the definition DIE into the label and move this logic into the DWARFASTParserClang, but unless I'm missing something we need the lookup to fetch the definitions (for methods at least)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So, we definitely don't want to look for the definition DIE just so we can encode it into the label. What we could do is encode the declaration DIE, and then when (and if -- I expect that the majority of functions will never be called) we need to resolve the call, we look up the matching definition DIE. That would be my ideal implementation as that's the only way to avoid ambiguities arising from multiple functions having identical linkage names (these occur in anonymous namespace. the ambiguities are solved because there's no need to look up definition DIEs, as these DIEs will be CU-local and always be definitions). It would also let us debug code which doesn't include the linkage names in the definition DIEs (which is what clang does with -gsce).

Another reason for dropping the linkage names might be to reduce the memory footprint of long asm labels, but if this isn't big, we may want to keep them anyway to help with debugging.

That said, I can imagine postponing that to a later time. However, I'd suggest designing this API such that it is possible to implement that in the future, such as by passing the entire FunctionCallLabel object so that the module can choose what it uses to perform the lookup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encode the declaration DIE, and then when (and if -- I expect that the majority of functions will never be called) we need to resolve the call, we look up the matching definition DIE

Hmm how do we do this without a name lookup? SymbolFileDWARF::FindDefinitionDIE?

Another reason for dropping the linkage names might be to reduce the memory footprint of long asm labels, but if this isn't big, we may want to keep them anyway to help with debugging.

Yea had the same thought. For debugging it has been useful actually. When something goes wrong and you get a couldn't resolve symbol $__lldb_func:_Z3foov:0x123:0xf00d having the mangled name there is pretty nice.

Copy link
Collaborator

@labath labath Jul 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm how do we do this without a name lookup? SymbolFileDWARF::FindDefinitionDIE?

Maybe, although I suspect that function only works on types/classes right now. Ultimately, we are going to need some kind of a name lookup to go from the declaration to the definition DIE, and I think that using the mangled name (when it exists) for that is the fastest and simplest way to achieve that. However, I think we should do that only if its needed. The most unique identifier is the DIE ID, so I'd start by looking that up. After that, there are three cases:

  1. This is a definition DIE. We don't need to do anything. The DIE doesn't even have to contain a mangled name. We just take the DW_AT_low/entry_pc, and we're done.
  2. This is a declaration of a CU-local entity (anonymous namespace). Here, we need to do a lookup, but we know the result must be in the current CU. The mangled name is the simplest way to do that. We could try to do structural matching (like we do for classes with simplified template names) if DW_AT_linkage_name is not present, but that's a job for another patch.
  3. This is a declaration of a global entity. We also need to do a lookup, but the result could be anywhere(*) -- though we still may want to prefer the definition in the current CU, if it exists.

(*) I just realized that "anywhere" also means "in a different module", which means that there should be some sort of a fallback to searching in other modules. Encoding the mangled name into the asm label might make that easier, as a retry could happen at a different level -- though it would still be nice if the "owning module" would signal that this kind of search is necessary (instead of blindly doing the full search whenever the first module doesn't produce a match).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying! Yes, that's about what I imagined it would look like

I just realized that "anywhere" also means "in a different module", which means that there should be some sort of a fallback to searching in other modules

Great point. I think we can re-use the existing IRExecutionUnit to do a fallback search through the other modules.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put mangled name last? That way you don't have to worry about it containing :s...

@Michael137 Michael137 force-pushed the lldb/reliable-function-call-labels branch from e64ce7d to 9aec4ec Compare July 28, 2025 14:45
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to drop the \01 mangling prefix here since we're not creating literal AsmLabels anymore. Fixed a couple of test failures where we ran expressions like expression func (to print the function pointer). We would try to find the symbol here (including the mangling prefix) and fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do this in a separate PR technically

Michael137 added a commit that referenced this pull request Aug 4, 2025
This was added purely for the needs of LLDB. However, starting with
#151355 and
#148877 we no longer create
literal AsmLabels in LLDB either. So this is unused and we can get rid
of it.

In the near future LLDB will want to add special support for mangling
`AsmLabel`s (specifically
#149827).
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 4, 2025
…(#151858)

This was added purely for the needs of LLDB. However, starting with
llvm/llvm-project#151355 and
llvm/llvm-project#148877 we no longer create
literal AsmLabels in LLDB either. So this is unused and we can get rid
of it.

In the near future LLDB will want to add special support for mangling
`AsmLabel`s (specifically
llvm/llvm-project#149827).
Michael137 added a commit to swiftlang/llvm-project that referenced this pull request Sep 2, 2025
…vm#151355)

Split out from llvm#148877

This patch prepares `TypeSystemClang` APIs to take `AsmLabel`s which
concatenated strings (hence `std::string`) instead of a plain `const
char*`.

(cherry picked from commit 7410f6d)
Michael137 added a commit to swiftlang/llvm-project that referenced this pull request Sep 2, 2025
…llvm#148877)

LLDB currently attaches `AsmLabel`s to `FunctionDecl`s such that that
the `IRExecutionUnit` can determine which mangled name to call (we can't
rely on Clang deriving the correct mangled name to call because the
debug-info AST doesn't contain all the info that would be encoded in the
DWARF linkage names). However, we don't attach `AsmLabel`s for structors
because they have multiple variants and thus it's not clear which
mangled name to use. In the [RFC on fixing expression evaluation of
abi-tagged
structors](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816)
we discussed encoding the structor variant into the `AsmLabel`s.
Specifically in [this
thread](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7)
we discussed that the contents of the `AsmLabel` are completely under
LLDB's control and we could make use of it to uniquely identify a
function by encoding the exact module and DIE that the function is
associated with (mangled names need not be enough since two identical
mangled symbols may live in different modules). So if we already have a
custom `AsmLabel` format, we can encode the structor variant in a
follow-up (the current idea is to append the structor variant as a
suffix to our custom `AsmLabel` when Clang emits the mangled name into
the JITted IR). Then we would just have to teach the `IRExecutionUnit`
to pick the correct structor variant DIE during symbol resolution. The
draft of this is available
[here](llvm#149827)

This patch sets up the infrastructure for the custom `AsmLabel` format
by encoding the module id, DIE id and mangled name in it.

**Implementation**

The flow is as follows:
1. Create the label in `DWARFASTParserClang`. The format is:
`$__lldb_func:module_id:die_id:mangled_name`
2. When resolving external symbols in `IRExecutionUnit`, we parse this
label and then do a lookup by DIE ID (or mangled name into the module if
the encoded DIE is a declaration).

Depends on llvm#151355

(cherry picked from commit f890591)
Michael137 added a commit to swiftlang/llvm-project that referenced this pull request Sep 2, 2025
This was added purely for the needs of LLDB. However, starting with
llvm#151355 and
llvm#148877 we no longer create
literal AsmLabels in LLDB either. So this is unused and we can get rid
of it.

In the near future LLDB will want to add special support for mangling
`AsmLabel`s (specifically
llvm#149827).

(cherry picked from commit cd8f348)
Michael137 added a commit that referenced this pull request Sep 9, 2025
#149827)

Depends on
* #148877
* #155483
* #155485
* #154137
* #154142

This patch is an implementation of [this
discussion](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7)
about handling ABI-tagged structors during expression evaluation.

**Motivation**

LLDB encodes the mangled name of a `DW_TAG_subprogram` into `AsmLabel`s
on function and method Clang AST nodes. This means that when calls to
these functions get lowered into IR (when running JITted expressions),
the address resolver can locate the appropriate symbol by mangled name
(and it is guaranteed to find the symbol because we got the mangled name
from debug-info, instead of letting Clang mangle it based on AST
structure). However, we don't do this for
`CXXConstructorDecl`s/`CXXDestructorDecl`s because these structor
declarations in DWARF don't have a linkage name. This is because there
can be multiple variants of a structor, each with a distinct mangling in
the Itanium ABI. Each structor variant has its own definition
`DW_TAG_subprogram`. So LLDB doesn't know which mangled name to put into
the `AsmLabel`.

Currently this means using ABI-tagged structors in LLDB expressions
won't work (see [this
RFC](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816)
for concrete examples).

**Proposed Solution**

The `FunctionCallLabel` encoding that we put into `AsmLabel`s already
supports stuffing more info about a DIE into it. So this patch extends
the `FunctionCallLabel` to contain an optional discriminator (a sequence
of bytes) which the `SymbolFileDWARF` plugin interprets as the
constructor/destructor variant of that DIE. So when searching for the
definition DIE, LLDB will include the structor variant in its heuristic
for determining a match.

There's a few subtleties here:
1. At the point at which LLDB first constructs the label, it has no way
of knowing (just by looking at the debug-info declaration), which
structor variant the expression evaluator is supposed to call. That's
something that gets decided when compiling the expression. So we let the
Clang mangler inject the correct structor variant into the `AsmLabel`
during JITing. I adjusted the `AsmLabelAttr` mangling for this in
#155485. An option would've
been to create a new Clang attribute which behaved like an `AsmLabel`
but with these special semantics for LLDB. My main concern there is that
we'd have to adjust all the `AsmLabelAttr` checks around Clang to also
now account for this new attribute.
2. The compiler is free to omit the `C1` variant of a constructor if the
`C2` variant is sufficient. In that case it may alias `C1` to `C2`,
leaving us with only the `C2` `DW_TAG_subprogram` in the object file.
Linux is one of the platforms where this occurs. For those cases I added
a heuristic in `SymbolFileDWARF` where we pick `C2` if we asked for `C1`
but it doesn't exist. This may not always be correct (e.g., if the
compiler decided to drop `C1` for other reasons).
3. In #154142 Clang will emit
`C4`/`D4` variants of ctors/dtors on declarations. When resolving the
`FunctionCallLabel` we will now substitute the actual variant that Clang
told us we need to call into the mangled name. We do this using LLDB's
`ManglingSubstitutor`. That way we find the definition DIE exactly the
same way we do for regular function calls.
4. In cases where declarations and definitions live in separate modules,
the DIE ID encoded in the function call label may not be enough to find
the definition DIE in the encoded module ID. For those cases we fall
back to how LLDB used to work: look up in all images of the target. To
make sure we don't use the unified mangled name for the fallback lookup,
we change the lookup name to whatever mangled name the FunctionCallLabel
resolved to.

rdar://104968288
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 9, 2025
… call labels (#149827)

Depends on
* llvm/llvm-project#148877
* llvm/llvm-project#155483
* llvm/llvm-project#155485
* llvm/llvm-project#154137
* llvm/llvm-project#154142

This patch is an implementation of [this
discussion](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7)
about handling ABI-tagged structors during expression evaluation.

**Motivation**

LLDB encodes the mangled name of a `DW_TAG_subprogram` into `AsmLabel`s
on function and method Clang AST nodes. This means that when calls to
these functions get lowered into IR (when running JITted expressions),
the address resolver can locate the appropriate symbol by mangled name
(and it is guaranteed to find the symbol because we got the mangled name
from debug-info, instead of letting Clang mangle it based on AST
structure). However, we don't do this for
`CXXConstructorDecl`s/`CXXDestructorDecl`s because these structor
declarations in DWARF don't have a linkage name. This is because there
can be multiple variants of a structor, each with a distinct mangling in
the Itanium ABI. Each structor variant has its own definition
`DW_TAG_subprogram`. So LLDB doesn't know which mangled name to put into
the `AsmLabel`.

Currently this means using ABI-tagged structors in LLDB expressions
won't work (see [this
RFC](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816)
for concrete examples).

**Proposed Solution**

The `FunctionCallLabel` encoding that we put into `AsmLabel`s already
supports stuffing more info about a DIE into it. So this patch extends
the `FunctionCallLabel` to contain an optional discriminator (a sequence
of bytes) which the `SymbolFileDWARF` plugin interprets as the
constructor/destructor variant of that DIE. So when searching for the
definition DIE, LLDB will include the structor variant in its heuristic
for determining a match.

There's a few subtleties here:
1. At the point at which LLDB first constructs the label, it has no way
of knowing (just by looking at the debug-info declaration), which
structor variant the expression evaluator is supposed to call. That's
something that gets decided when compiling the expression. So we let the
Clang mangler inject the correct structor variant into the `AsmLabel`
during JITing. I adjusted the `AsmLabelAttr` mangling for this in
llvm/llvm-project#155485. An option would've
been to create a new Clang attribute which behaved like an `AsmLabel`
but with these special semantics for LLDB. My main concern there is that
we'd have to adjust all the `AsmLabelAttr` checks around Clang to also
now account for this new attribute.
2. The compiler is free to omit the `C1` variant of a constructor if the
`C2` variant is sufficient. In that case it may alias `C1` to `C2`,
leaving us with only the `C2` `DW_TAG_subprogram` in the object file.
Linux is one of the platforms where this occurs. For those cases I added
a heuristic in `SymbolFileDWARF` where we pick `C2` if we asked for `C1`
but it doesn't exist. This may not always be correct (e.g., if the
compiler decided to drop `C1` for other reasons).
3. In llvm/llvm-project#154142 Clang will emit
`C4`/`D4` variants of ctors/dtors on declarations. When resolving the
`FunctionCallLabel` we will now substitute the actual variant that Clang
told us we need to call into the mangled name. We do this using LLDB's
`ManglingSubstitutor`. That way we find the definition DIE exactly the
same way we do for regular function calls.
4. In cases where declarations and definitions live in separate modules,
the DIE ID encoded in the function call label may not be enough to find
the definition DIE in the encoded module ID. For those cases we fall
back to how LLDB used to work: look up in all images of the target. To
make sure we don't use the unified mangled name for the fallback lookup,
we change the lookup name to whatever mangled name the FunctionCallLabel
resolved to.

rdar://104968288
Michael137 added a commit to swiftlang/llvm-project that referenced this pull request Sep 9, 2025
llvm#149827)

Depends on
* llvm#148877
* llvm#155483
* llvm#155485
* llvm#154137
* llvm#154142

This patch is an implementation of [this
discussion](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7)
about handling ABI-tagged structors during expression evaluation.

**Motivation**

LLDB encodes the mangled name of a `DW_TAG_subprogram` into `AsmLabel`s
on function and method Clang AST nodes. This means that when calls to
these functions get lowered into IR (when running JITted expressions),
the address resolver can locate the appropriate symbol by mangled name
(and it is guaranteed to find the symbol because we got the mangled name
from debug-info, instead of letting Clang mangle it based on AST
structure). However, we don't do this for
`CXXConstructorDecl`s/`CXXDestructorDecl`s because these structor
declarations in DWARF don't have a linkage name. This is because there
can be multiple variants of a structor, each with a distinct mangling in
the Itanium ABI. Each structor variant has its own definition
`DW_TAG_subprogram`. So LLDB doesn't know which mangled name to put into
the `AsmLabel`.

Currently this means using ABI-tagged structors in LLDB expressions
won't work (see [this
RFC](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816)
for concrete examples).

**Proposed Solution**

The `FunctionCallLabel` encoding that we put into `AsmLabel`s already
supports stuffing more info about a DIE into it. So this patch extends
the `FunctionCallLabel` to contain an optional discriminator (a sequence
of bytes) which the `SymbolFileDWARF` plugin interprets as the
constructor/destructor variant of that DIE. So when searching for the
definition DIE, LLDB will include the structor variant in its heuristic
for determining a match.

There's a few subtleties here:
1. At the point at which LLDB first constructs the label, it has no way
of knowing (just by looking at the debug-info declaration), which
structor variant the expression evaluator is supposed to call. That's
something that gets decided when compiling the expression. So we let the
Clang mangler inject the correct structor variant into the `AsmLabel`
during JITing. I adjusted the `AsmLabelAttr` mangling for this in
llvm#155485. An option would've
been to create a new Clang attribute which behaved like an `AsmLabel`
but with these special semantics for LLDB. My main concern there is that
we'd have to adjust all the `AsmLabelAttr` checks around Clang to also
now account for this new attribute.
2. The compiler is free to omit the `C1` variant of a constructor if the
`C2` variant is sufficient. In that case it may alias `C1` to `C2`,
leaving us with only the `C2` `DW_TAG_subprogram` in the object file.
Linux is one of the platforms where this occurs. For those cases I added
a heuristic in `SymbolFileDWARF` where we pick `C2` if we asked for `C1`
but it doesn't exist. This may not always be correct (e.g., if the
compiler decided to drop `C1` for other reasons).
3. In llvm#154142 Clang will emit
`C4`/`D4` variants of ctors/dtors on declarations. When resolving the
`FunctionCallLabel` we will now substitute the actual variant that Clang
told us we need to call into the mangled name. We do this using LLDB's
`ManglingSubstitutor`. That way we find the definition DIE exactly the
same way we do for regular function calls.
4. In cases where declarations and definitions live in separate modules,
the DIE ID encoded in the function call label may not be enough to find
the definition DIE in the encoded module ID. For those cases we fall
back to how LLDB used to work: look up in all images of the target. To
make sure we don't use the unified mangled name for the fallback lookup,
we change the lookup name to whatever mangled name the FunctionCallLabel
resolved to.

rdar://104968288
(cherry picked from commit 57a7907)
Michael137 added a commit to Michael137/llvm-project that referenced this pull request Sep 30, 2025
…ddress

Starting with llvm#148877 we
started encoding the module ID of the function DIE we are currently
parsing into its `AsmLabel` in the AST. When the JIT asks LLDB to
resolve our special mangled name, we would locate the module and resolve
the function/symbol we found in it.

If we are debugging with a `SymbolFileDWARFDebugMap`, the module ID we
encode is that of the `.o` file that is tracked by the debug-map. To
resolve the address of the DIE in that `.o` file, we have to ask
`SymbolFileDWARFDebugMap::LinkOSOAddress` to turn the address of the
`.o` DIE into a real address in the linked executable. This will only
work if the `.o` address was actually tracked by the debug-map. However,
if the function definition appears in multiple `.o` files (which is the
case for functions defined in headers), the linker will most likely
de-deuplicate that definition. So most `.o`'s definition DIEs for that
function won't have a contribution in the debug-map, and thus we fail to
resolve the address.

When debugging Clang on Darwin, e.g., you'd see:
```
(lldb) expr CXXDecl->getName()

error: Couldn't look up symbols:
  $__lldb_func::0x1:0x4000d000002359da:_ZNK5clang9NamedDecl7getNameEv
Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler.
```
unless you were stopped in the `.o` file whose definition of `getName`
made it into the final executable.

The fix here is to error out if we fail to resolve the address, causing
us to fall back on the old flow which did a lookup by mangled name,
which the `SymbolFileDWARFDebugMap` will handle correctly.

An alternative fix to this would be to encode the
`SymbolFileDWARFDebugMap`'s module-id. And implement
`SymbolFileDWARFDebugMap::ResolveFunctionCallLabel` by doing a mangled
name lookup. The proposed approach doesn't stop us from implementing
that, so we could choose to do it in a follow-up.

rdar://161393045
Michael137 added a commit that referenced this pull request Oct 1, 2025
…ddress (#161363)

Starting with #148877 we
started encoding the module ID of the function DIE we are currently
parsing into its `AsmLabel` in the AST. When the JIT asks LLDB to
resolve our special mangled name, we would locate the module and resolve
the function/symbol we found in it.

If we are debugging with a `SymbolFileDWARFDebugMap`, the module ID we
encode is that of the `.o` file that is tracked by the debug-map. To
resolve the address of the DIE in that `.o` file, we have to ask
`SymbolFileDWARFDebugMap::LinkOSOAddress` to turn the address of the
`.o` DIE into a real address in the linked executable. This will only
work if the `.o` address was actually tracked by the debug-map. However,
if the function definition appears in multiple `.o` files (which is the
case for functions defined in headers), the linker will most likely
de-deuplicate that definition. So most `.o`'s definition DIEs for that
function won't have a contribution in the debug-map, and thus we fail to
resolve the address.

When debugging Clang on Darwin, e.g., you'd see:
```
(lldb) expr CXXDecl->getName()

error: Couldn't look up symbols:
  $__lldb_func::0x1:0x4000d000002359da:_ZNK5clang9NamedDecl7getNameEv
Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler.
```
unless you were stopped in the `.o` file whose definition of `getName`
made it into the final executable.

The fix here is to error out if we fail to resolve the address, causing
us to fall back on the old flow which did a lookup by mangled name,
which the `SymbolFileDWARFDebugMap` will handle correctly.

An alternative fix to this would be to encode the
`SymbolFileDWARFDebugMap`'s module-id. And implement
`SymbolFileDWARFDebugMap::ResolveFunctionCallLabel` by doing a mangled
name lookup. The proposed approach doesn't stop us from implementing
that, so we could choose to do it in a follow-up.

rdar://161393045
Michael137 added a commit to swiftlang/llvm-project that referenced this pull request Oct 1, 2025
… address

Starting with llvm#148877 we started encoding the module ID of the function DIE we are currently
parsing into its `AsmLabel` in the AST. When the JIT asks LLDB to
resolve our special mangled name, we would locate the module and resolve
the function/symbol we found in it.

If we are debugging with a `SymbolFileDWARFDebugMap`, the module ID we
encode is that of the `.o` file that is tracked by the debug-map. To
resolve the address of the DIE in that `.o` file, we have to ask
`SymbolFileDWARFDebugMap::LinkOSOAddress` to turn the address of the
`.o` DIE into a real address in the linked executable. This will only
work if the `.o` address was actually tracked by the debug-map. However,
if the function definition appears in multiple `.o` files (which is the
case for functions defined in headers), the linker will most likely
de-deuplicate that definition. So most `.o`'s definition DIEs for that
function won't have a contribution in the debug-map, and thus we fail to
resolve the address.

When debugging Clang on Darwin, e.g., you'd see:
```
(lldb) expr CXXDecl->getName()

error: Couldn't look up symbols:
  $__lldb_func::0x1:0x4000d000002359da:_ZNK5clang9NamedDecl7getNameEv
Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler.
```
unless you were stopped in the `.o` file whose definition of `getName`
made it into the final executable.

The fix here is to error out if we fail to resolve the address, causing
us to fall back on the old flow which did a lookup by mangled name,
which the `SymbolFileDWARFDebugMap` will handle correctly.

An alternative fix to this would be to encode the
`SymbolFileDWARFDebugMap`'s module-id. And implement
`SymbolFileDWARFDebugMap::ResolveFunctionCallLabel` by doing a mangled
name lookup. The proposed approach doesn't stop us from implementing
that, so we could choose to do it in a follow-up.

rdar://161393045

(cherry-picked from 332b4de)
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Oct 1, 2025
… function address (#161363)

Starting with llvm/llvm-project#148877 we
started encoding the module ID of the function DIE we are currently
parsing into its `AsmLabel` in the AST. When the JIT asks LLDB to
resolve our special mangled name, we would locate the module and resolve
the function/symbol we found in it.

If we are debugging with a `SymbolFileDWARFDebugMap`, the module ID we
encode is that of the `.o` file that is tracked by the debug-map. To
resolve the address of the DIE in that `.o` file, we have to ask
`SymbolFileDWARFDebugMap::LinkOSOAddress` to turn the address of the
`.o` DIE into a real address in the linked executable. This will only
work if the `.o` address was actually tracked by the debug-map. However,
if the function definition appears in multiple `.o` files (which is the
case for functions defined in headers), the linker will most likely
de-deuplicate that definition. So most `.o`'s definition DIEs for that
function won't have a contribution in the debug-map, and thus we fail to
resolve the address.

When debugging Clang on Darwin, e.g., you'd see:
```
(lldb) expr CXXDecl->getName()

error: Couldn't look up symbols:
  $__lldb_func::0x1:0x4000d000002359da:_ZNK5clang9NamedDecl7getNameEv
Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler.
```
unless you were stopped in the `.o` file whose definition of `getName`
made it into the final executable.

The fix here is to error out if we fail to resolve the address, causing
us to fall back on the old flow which did a lookup by mangled name,
which the `SymbolFileDWARFDebugMap` will handle correctly.

An alternative fix to this would be to encode the
`SymbolFileDWARFDebugMap`'s module-id. And implement
`SymbolFileDWARFDebugMap::ResolveFunctionCallLabel` by doing a mangled
name lookup. The proposed approach doesn't stop us from implementing
that, so we could choose to do it in a follow-up.

rdar://161393045
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
…ddress (llvm#161363)

Starting with llvm#148877 we
started encoding the module ID of the function DIE we are currently
parsing into its `AsmLabel` in the AST. When the JIT asks LLDB to
resolve our special mangled name, we would locate the module and resolve
the function/symbol we found in it.

If we are debugging with a `SymbolFileDWARFDebugMap`, the module ID we
encode is that of the `.o` file that is tracked by the debug-map. To
resolve the address of the DIE in that `.o` file, we have to ask
`SymbolFileDWARFDebugMap::LinkOSOAddress` to turn the address of the
`.o` DIE into a real address in the linked executable. This will only
work if the `.o` address was actually tracked by the debug-map. However,
if the function definition appears in multiple `.o` files (which is the
case for functions defined in headers), the linker will most likely
de-deuplicate that definition. So most `.o`'s definition DIEs for that
function won't have a contribution in the debug-map, and thus we fail to
resolve the address.

When debugging Clang on Darwin, e.g., you'd see:
```
(lldb) expr CXXDecl->getName()

error: Couldn't look up symbols:
  $__lldb_func::0x1:0x4000d000002359da:_ZNK5clang9NamedDecl7getNameEv
Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler.
```
unless you were stopped in the `.o` file whose definition of `getName`
made it into the final executable.

The fix here is to error out if we fail to resolve the address, causing
us to fall back on the old flow which did a lookup by mangled name,
which the `SymbolFileDWARFDebugMap` will handle correctly.

An alternative fix to this would be to encode the
`SymbolFileDWARFDebugMap`'s module-id. And implement
`SymbolFileDWARFDebugMap::ResolveFunctionCallLabel` by doing a mangled
name lookup. The proposed approach doesn't stop us from implementing
that, so we could choose to do it in a follow-up.

rdar://161393045
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants