- 
                Notifications
    You must be signed in to change notification settings 
- Fork 15k
[lldb][Expression] Encode Module and DIE UIDs into function AsmLabels #148877
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lldb][Expression] Encode Module and DIE UIDs into function AsmLabels #148877
Conversation
| ✅ With the latest revision this PR passed the C/C++ code formatter. | 
| // Module UID is only a Darwin concept (?) | ||
| // If UUID is not available encode as pointer. | ||
| // Maybe add character to signal whether this is a pointer | ||
| // or UUID. Or maybe if it's not hex that implies a UUID? | ||
| auto module_id = module_sp->GetUUID(); | ||
| Module * module_ptr = nullptr; | ||
| if (!module_id.IsValid()) | ||
| module_ptr = module_sp.get(); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@labath I'm trying to clean up this prototype of your suggestion in my ABI-tagged structors RFC.
This function here is what encodes the label.
Something I wasn't sure about though was how we should encode the Module in the string. Since UIDs need not exist (e.g., on Linux IIUC?), was your original suggestion to simply print out the pointer? If so, how would we defend against the module getting unloaded under our feet? I might've misunderstood your idea though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(e.g., on Linux IIUC?)
It's more common there, but even on darwin you can create binaries without a UUID (ld --no_uuid). And even if there is one, there's no guarantee it's going to be truly unique.
was your original suggestion to simply print out the pointer? If so, how would we defend against the module getting unloaded under our feet? I might've misunderstood your idea though.
Yes, that was the idea, though I can't say I have given much thought about this aspect. Intuitively, I would expect this to be fine. Since the module pointer is only encoded in the AST inside that module, the fact that we got that pointer here means that the module was present at the beginning of the compilation process. The pointer is consumed after clang is done (which happens midway into compilation), and I'd hope that noone can unload a module between those two points.
That said, I admit this is risky, and I'm not pushing for this particular encoding. In particular, it may enable someone to crash lldb by creating a function with a funny name.
There are plenty of other ways to implement this detail though, the most obvious one being assigning a surrogate ID (an increasing integer) to each module upon creation (just like we do with debuggers).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are plenty of other ways to implement this detail though, the most obvious one being assigning a surrogate ID (an increasing integer) to each module upon creation (just like we do with debuggers).
Yea that seems like a good alternative
cfc23ca    to
    dd5782e      
    Compare
  
    4c87014    to
    fd6b6e8      
    Compare
  
    | @llvm/pr-subscribers-lldb Author: Michael Buch (Michael137) ChangesLLDB currently attaches  This patch sets up the infrastructure for the custom  Implementation The flow is as follows: 
 I made the label creation an API on  But happy to consider other architectures for this. Patch is 34.44 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/148877.diff 17 Files Affected: 
 diff --git a/lldb/include/lldb/Core/Module.h b/lldb/include/lldb/Core/Module.h
index 8bb55c95773bc..3991a12997541 100644
--- a/lldb/include/lldb/Core/Module.h
+++ b/lldb/include/lldb/Core/Module.h
@@ -86,7 +86,8 @@ struct ModuleFunctionSearchOptions {
 ///
 /// The module will parse more detailed information as more queries are made.
 class Module : public std::enable_shared_from_this<Module>,
-               public SymbolContextScope {
+               public SymbolContextScope,
+               public UserID {
 public:
   class LookupInfo;
   // Static functions that can track the lifetime of module objects. This is
@@ -97,6 +98,7 @@ class Module : public std::enable_shared_from_this<Module>,
   // using the "--global" (-g for short).
   static size_t GetNumberAllocatedModules();
 
+  static Module *GetAllocatedModuleWithUID(lldb::user_id_t uid);
   static Module *GetAllocatedModuleAtIndex(size_t idx);
 
   static std::recursive_mutex &GetAllocationModuleCollectionMutex();
diff --git a/lldb/include/lldb/Expression/Expression.h b/lldb/include/lldb/Expression/Expression.h
index 8de9364436ccf..f32878c9bf876 100644
--- a/lldb/include/lldb/Expression/Expression.h
+++ b/lldb/include/lldb/Expression/Expression.h
@@ -96,6 +96,31 @@ class Expression {
                                  ///invalid.
 };
 
+/// Holds parsed information about a function call label that
+/// LLDB attaches as an AsmLabel to function AST nodes it parses
+/// from debug-info.
+///
+/// The format being:
+///
+///   <prefix>:<mangled name>:<module id>:<DIE id>
+///
+/// The label string needs to stay valid for the entire lifetime
+/// of this object.
+struct FunctionCallLabel {
+  llvm::StringRef m_lookup_name;
+  lldb::user_id_t m_module_id;
+
+  /// Mostly for debuggability.
+  lldb::user_id_t m_die_id;
+};
+
+/// LLDB attaches this prefix to mangled names of functions that it get called
+/// from JITted expressions.
+inline constexpr llvm::StringRef FunctionCallLabelPrefix = "$__lldb_func";
+
+bool consumeFunctionCallLabelPrefix(llvm::StringRef &name);
+bool hasFunctionCallLabelPrefix(llvm::StringRef name);
+
 } // namespace lldb_private
 
 #endif // LLDB_EXPRESSION_EXPRESSION_H
diff --git a/lldb/include/lldb/Symbol/SymbolFile.h b/lldb/include/lldb/Symbol/SymbolFile.h
index e95f95553c17c..6aca276fc85b6 100644
--- a/lldb/include/lldb/Symbol/SymbolFile.h
+++ b/lldb/include/lldb/Symbol/SymbolFile.h
@@ -18,6 +18,7 @@
 #include "lldb/Symbol/CompilerType.h"
 #include "lldb/Symbol/Function.h"
 #include "lldb/Symbol/SourceModule.h"
+#include "lldb/Symbol/SymbolContext.h"
 #include "lldb/Symbol/Type.h"
 #include "lldb/Symbol/TypeList.h"
 #include "lldb/Symbol/TypeSystem.h"
@@ -328,6 +329,19 @@ class SymbolFile : public PluginInterface {
   GetMangledNamesForFunction(const std::string &scope_qualified_name,
                              std::vector<ConstString> &mangled_names);
 
+  /// Resolves the function DIE identified by \c lookup_name within
+  /// this SymbolFile.
+  ///
+  /// \param[in,out] sc_list The resolved functions will be appended to this
+  /// list.
+  ///
+  /// \param[in] lookup_name The UID of the function DIE to resolve.
+  ///
+  virtual llvm::Error FindAndResolveFunction(SymbolContextList &sc_list,
+                                             llvm::StringRef lookup_name) {
+    return llvm::createStringError("Not implemented");
+  }
+
   virtual void GetTypes(lldb_private::SymbolContextScope *sc_scope,
                         lldb::TypeClass type_mask,
                         lldb_private::TypeList &type_list) = 0;
diff --git a/lldb/include/lldb/Symbol/TypeSystem.h b/lldb/include/lldb/Symbol/TypeSystem.h
index cb1f0130b548d..742c09251ea2f 100644
--- a/lldb/include/lldb/Symbol/TypeSystem.h
+++ b/lldb/include/lldb/Symbol/TypeSystem.h
@@ -548,6 +548,22 @@ class TypeSystem : public PluginInterface,
   bool GetHasForcefullyCompletedTypes() const {
     return m_has_forcefully_completed_types;
   }
+
+  /// Returns the components of the specified function call label.
+  ///
+  /// The format of \c label is described in \c FunctionCallLabel.
+  /// The label prefix is not one of the components.
+  virtual llvm::Expected<llvm::SmallVector<llvm::StringRef, 3>>
+  splitFunctionCallLabel(llvm::StringRef label) const {
+    return llvm::createStringError("Not implemented.");
+  }
+
+  // Decodes the function label into a \c FunctionCallLabel.
+  virtual llvm::Expected<FunctionCallLabel>
+  makeFunctionCallLabel(llvm::StringRef label) const {
+    return llvm::createStringError("Not implemented.");
+  }
+
 protected:
   SymbolFile *m_sym_file = nullptr;
   /// Used for reporting statistics.
diff --git a/lldb/source/Core/Module.cpp b/lldb/source/Core/Module.cpp
index 90997dada3666..edd79aff5d065 100644
--- a/lldb/source/Core/Module.cpp
+++ b/lldb/source/Core/Module.cpp
@@ -121,6 +121,15 @@ size_t Module::GetNumberAllocatedModules() {
   return GetModuleCollection().size();
 }
 
+Module *Module::GetAllocatedModuleWithUID(lldb::user_id_t uid) {
+  std::lock_guard<std::recursive_mutex> guard(
+      GetAllocationModuleCollectionMutex());
+  for (Module *mod : GetModuleCollection())
+    if (mod->GetID() == uid)
+      return mod;
+  return nullptr;
+}
+
 Module *Module::GetAllocatedModuleAtIndex(size_t idx) {
   std::lock_guard<std::recursive_mutex> guard(
       GetAllocationModuleCollectionMutex());
@@ -130,8 +139,11 @@ Module *Module::GetAllocatedModuleAtIndex(size_t idx) {
   return nullptr;
 }
 
+// TODO: needs a mutex
+static lldb::user_id_t g_unique_id = 1;
+
 Module::Module(const ModuleSpec &module_spec)
-    : m_unwind_table(*this), m_file_has_changed(false),
+    : UserID(g_unique_id++), m_unwind_table(*this), m_file_has_changed(false),
       m_first_file_changed_log(false) {
   // Scope for locker below...
   {
@@ -236,7 +248,8 @@ Module::Module(const ModuleSpec &module_spec)
 Module::Module(const FileSpec &file_spec, const ArchSpec &arch,
                ConstString object_name, lldb::offset_t object_offset,
                const llvm::sys::TimePoint<> &object_mod_time)
-    : m_mod_time(FileSystem::Instance().GetModificationTime(file_spec)),
+    : UserID(g_unique_id++),
+      m_mod_time(FileSystem::Instance().GetModificationTime(file_spec)),
       m_arch(arch), m_file(file_spec), m_object_name(object_name),
       m_object_offset(object_offset), m_object_mod_time(object_mod_time),
       m_unwind_table(*this), m_file_has_changed(false),
@@ -257,7 +270,7 @@ Module::Module(const FileSpec &file_spec, const ArchSpec &arch,
 }
 
 Module::Module()
-    : m_unwind_table(*this), m_file_has_changed(false),
+    : UserID(g_unique_id++), m_unwind_table(*this), m_file_has_changed(false),
       m_first_file_changed_log(false) {
   std::lock_guard<std::recursive_mutex> guard(
       GetAllocationModuleCollectionMutex());
diff --git a/lldb/source/Expression/Expression.cpp b/lldb/source/Expression/Expression.cpp
index 93f585edfce3d..e19c804caa3c6 100644
--- a/lldb/source/Expression/Expression.cpp
+++ b/lldb/source/Expression/Expression.cpp
@@ -10,6 +10,10 @@
 #include "lldb/Target/ExecutionContextScope.h"
 #include "lldb/Target/Target.h"
 
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/Error.h"
+
 using namespace lldb_private;
 
 Expression::Expression(Target &target)
@@ -26,3 +30,15 @@ Expression::Expression(ExecutionContextScope &exe_scope)
       m_jit_end_addr(LLDB_INVALID_ADDRESS) {
   assert(m_target_wp.lock());
 }
+
+bool lldb_private::consumeFunctionCallLabelPrefix(llvm::StringRef &name) {
+  // On Darwin mangled names get a '_' prefix.
+  name.consume_front("_");
+  return name.consume_front(FunctionCallLabelPrefix);
+}
+
+bool lldb_private::hasFunctionCallLabelPrefix(llvm::StringRef name) {
+  // On Darwin mangled names get a '_' prefix.
+  name.consume_front("_");
+  return name.starts_with(FunctionCallLabelPrefix);
+}
diff --git a/lldb/source/Expression/IRExecutionUnit.cpp b/lldb/source/Expression/IRExecutionUnit.cpp
index 6f812b91a8b1d..a9ea244889cce 100644
--- a/lldb/source/Expression/IRExecutionUnit.cpp
+++ b/lldb/source/Expression/IRExecutionUnit.cpp
@@ -13,6 +13,7 @@
 #include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/Module.h"
+#include "llvm/Support/Error.h"
 #include "llvm/Support/SourceMgr.h"
 #include "llvm/Support/raw_ostream.h"
 
@@ -20,6 +21,7 @@
 #include "lldb/Core/Disassembler.h"
 #include "lldb/Core/Module.h"
 #include "lldb/Core/Section.h"
+#include "lldb/Expression/Expression.h"
 #include "lldb/Expression/IRExecutionUnit.h"
 #include "lldb/Expression/ObjectFileJIT.h"
 #include "lldb/Host/HostInfo.h"
@@ -36,6 +38,7 @@
 #include "lldb/Utility/LLDBAssert.h"
 #include "lldb/Utility/LLDBLog.h"
 #include "lldb/Utility/Log.h"
+#include "lldb/lldb-defines.h"
 
 #include <optional>
 
@@ -771,6 +774,63 @@ class LoadAddressResolver {
   lldb::addr_t m_best_internal_load_address = LLDB_INVALID_ADDRESS;
 };
 
+/// Returns address of the function referred to by the special function call
+/// label \c label.
+///
+/// \param[in] label Function call label encoding the unique location of the
+/// function to look up.
+///                  Assumes that the \c FunctionCallLabelPrefix has been
+///                  stripped from the front of the label.
+static llvm::Expected<lldb::addr_t>
+ResolveFunctionCallLabel(llvm::StringRef name,
+                         const lldb_private::SymbolContext &sc,
+                         bool &symbol_was_missing_weak) {
+  symbol_was_missing_weak = false;
+
+  if (!sc.target_sp)
+    return llvm::createStringError("target not available.");
+
+  auto ts_or_err = sc.target_sp->GetScratchTypeSystemForLanguage(
+      lldb::eLanguageTypeC_plus_plus);
+  if (!ts_or_err || !*ts_or_err)
+    return llvm::joinErrors(
+        llvm::createStringError(
+            "failed to find scratch C++ TypeSystem for current target."),
+        ts_or_err.takeError());
+
+  auto ts_sp = *ts_or_err;
+
+  auto label_or_err = ts_sp->makeFunctionCallLabel(name);
+  if (!label_or_err)
+    return llvm::joinErrors(
+        llvm::createStringError("failed to create FunctionCallLabel from: %s",
+                                name.data()),
+        label_or_err.takeError());
+
+  const auto &label = *label_or_err;
+
+  Module *module = Module::GetAllocatedModuleWithUID(label.m_module_id);
+
+  if (!module)
+    return llvm::createStringError(
+        llvm::formatv("failed to find module by UID {0}", label.m_module_id));
+
+  auto *symbol_file = module->GetSymbolFile();
+  if (!symbol_file)
+    return llvm::createStringError(
+        llvm::formatv("no SymbolFile found on module {0:x}.", module));
+
+  SymbolContextList sc_list;
+  if (auto err =
+          symbol_file->FindAndResolveFunction(sc_list, label.m_lookup_name))
+    return llvm::joinErrors(
+        llvm::createStringError("failed to resolve function by UID"),
+        std::move(err));
+
+  LoadAddressResolver resolver(*sc.target_sp, symbol_was_missing_weak);
+  return resolver.Resolve(sc_list).value_or(LLDB_INVALID_ADDRESS);
+}
+
 lldb::addr_t
 IRExecutionUnit::FindInSymbols(const std::vector<ConstString> &names,
                                const lldb_private::SymbolContext &sc,
@@ -906,6 +966,20 @@ lldb::addr_t IRExecutionUnit::FindInUserDefinedSymbols(
 
 lldb::addr_t IRExecutionUnit::FindSymbol(lldb_private::ConstString name,
                                          bool &missing_weak) {
+  if (hasFunctionCallLabelPrefix(name.GetStringRef())) {
+    if (auto addr_or_err = ResolveFunctionCallLabel(name.GetStringRef(),
+                                                    m_sym_ctx, missing_weak)) {
+      return *addr_or_err;
+    } else {
+      LLDB_LOG_ERROR(GetLog(LLDBLog::Expressions), addr_or_err.takeError(),
+                     "Failed to resolve function call label {1}: {0}",
+                     name.GetStringRef());
+      return LLDB_INVALID_ADDRESS;
+    }
+  }
+
+  // TODO: do we still need the following lookups?
+
   std::vector<ConstString> candidate_C_names;
   std::vector<ConstString> candidate_CPlusPlus_names;
 
diff --git a/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionDeclMap.cpp b/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionDeclMap.cpp
index 9f77fbc1d2434..a6c4334bf2e59 100644
--- a/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionDeclMap.cpp
+++ b/lldb/source/Plugins/ExpressionParser/Clang/ClangExpressionDeclMap.cpp
@@ -1991,7 +1991,7 @@ void ClangExpressionDeclMap::AddContextClassType(NameSearchContext &context,
     const bool is_artificial = false;
 
     CXXMethodDecl *method_decl = m_clang_ast_context->AddMethodToCXXRecordType(
-        copied_clang_type.GetOpaqueQualType(), "$__lldb_expr", nullptr,
+        copied_clang_type.GetOpaqueQualType(), "$__lldb_expr", std::nullopt,
         method_type, lldb::eAccessPublic, is_virtual, is_static, is_inline,
         is_explicit, is_attr_used, is_artificial);
 
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
index c76d67b47b336..6e9598c13ed5c 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
@@ -24,6 +24,7 @@
 #include "Plugins/Language/ObjC/ObjCLanguage.h"
 #include "lldb/Core/Module.h"
 #include "lldb/Core/Value.h"
+#include "lldb/Expression/Expression.h"
 #include "lldb/Host/Host.h"
 #include "lldb/Symbol/CompileUnit.h"
 #include "lldb/Symbol/Function.h"
@@ -249,6 +250,28 @@ static unsigned GetCXXMethodCVQuals(const DWARFDIE &subprogram,
   return cv_quals;
 }
 
+static std::optional<std::string> MakeLLDBFuncAsmLabel(const DWARFDIE &die) {
+  char const *name = die.GetMangledName(/*substitute_name_allowed*/ false);
+  if (!name)
+    return std::nullopt;
+
+  auto module_sp = die.GetModule();
+  if (!module_sp)
+    return std::nullopt;
+
+  lldb::user_id_t module_id = module_sp->GetID();
+  if (module_id == LLDB_INVALID_UID)
+    return std::nullopt;
+
+  const auto die_id = die.GetID();
+  if (die_id == LLDB_INVALID_UID)
+    return std::nullopt;
+
+  return llvm::formatv("{0}:{1}:{2:x}:{3:x}", FunctionCallLabelPrefix, name,
+                       module_id, die_id)
+      .str();
+}
+
 TypeSP DWARFASTParserClang::ParseTypeFromClangModule(const SymbolContext &sc,
                                                      const DWARFDIE &die,
                                                      Log *log) {
@@ -1231,7 +1254,7 @@ std::pair<bool, TypeSP> DWARFASTParserClang::ParseCXXMethod(
 
   clang::CXXMethodDecl *cxx_method_decl = m_ast.AddMethodToCXXRecordType(
       class_opaque_type.GetOpaqueQualType(), attrs.name.GetCString(),
-      attrs.mangled_name, clang_type, accessibility, attrs.is_virtual,
+      MakeLLDBFuncAsmLabel(die), clang_type, accessibility, attrs.is_virtual,
       is_static, attrs.is_inline, attrs.is_explicit, is_attr_used,
       attrs.is_artificial);
 
@@ -1384,7 +1407,7 @@ DWARFASTParserClang::ParseSubroutine(const DWARFDIE &die,
             ignore_containing_context ? m_ast.GetTranslationUnitDecl()
                                       : containing_decl_ctx,
             GetOwningClangModule(die), name, clang_type, attrs.storage,
-            attrs.is_inline);
+            attrs.is_inline, MakeLLDBFuncAsmLabel(die));
         std::free(name_buf);
 
         if (has_template_params) {
@@ -1394,7 +1417,7 @@ DWARFASTParserClang::ParseSubroutine(const DWARFDIE &die,
               ignore_containing_context ? m_ast.GetTranslationUnitDecl()
                                         : containing_decl_ctx,
               GetOwningClangModule(die), attrs.name.GetStringRef(), clang_type,
-              attrs.storage, attrs.is_inline);
+              attrs.storage, attrs.is_inline, /*asm_label=*/std::nullopt);
           clang::FunctionTemplateDecl *func_template_decl =
               m_ast.CreateFunctionTemplateDecl(
                   containing_decl_ctx, GetOwningClangModule(die),
@@ -1406,20 +1429,6 @@ DWARFASTParserClang::ParseSubroutine(const DWARFDIE &die,
         lldbassert(function_decl);
 
         if (function_decl) {
-          // Attach an asm(<mangled_name>) label to the FunctionDecl.
-          // This ensures that clang::CodeGen emits function calls
-          // using symbols that are mangled according to the DW_AT_linkage_name.
-          // If we didn't do this, the external symbols wouldn't exactly
-          // match the mangled name LLDB knows about and the IRExecutionUnit
-          // would have to fall back to searching object files for
-          // approximately matching function names. The motivating
-          // example is generating calls to ABI-tagged template functions.
-          // This is done separately for member functions in
-          // AddMethodToCXXRecordType.
-          if (attrs.mangled_name)
-            function_decl->addAttr(clang::AsmLabelAttr::CreateImplicit(
-                m_ast.getASTContext(), attrs.mangled_name, /*literal=*/false));
-
           LinkDeclContextToDIE(function_decl, die);
 
           const clang::FunctionProtoType *function_prototype(
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
index 5b16ce5f75138..b6960bfe38fde 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
@@ -2471,6 +2471,47 @@ bool SymbolFileDWARF::ResolveFunction(const DWARFDIE &orig_die,
   return false;
 }
 
+llvm::Error
+SymbolFileDWARF::FindAndResolveFunction(SymbolContextList &sc_list,
+                                        llvm::StringRef lookup_name) {
+  std::lock_guard<std::recursive_mutex> guard(GetModuleMutex());
+
+  DWARFDIE die;
+  Module::LookupInfo info(ConstString(lookup_name), lldb::eFunctionNameTypeFull,
+                          lldb::eLanguageTypeUnknown);
+
+  m_index->GetFunctions(info, *this, {}, [&](DWARFDIE entry) {
+    if (entry.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0))
+      return true;
+
+    // We don't check whether the specification DIE for this function
+    // corresponds to the declaration DIE because the declaration might be in
+    // a type-unit but the definition in the compile-unit (and it's
+    // specifcation would point to the declaration in the compile-unit). We
+    // rely on the mangled name within the module to be enough to find us the
+    // unique definition.
+    die = entry;
+    return false;
+  });
+
+  if (!die.IsValid())
+    return llvm::createStringError(
+        llvm::formatv("failed to find definition DIE for '{0}'", lookup_name));
+
+  if (!ResolveFunction(die, false, sc_list))
+    return llvm::createStringError("failed to resolve function DIE");
+
+  if (sc_list.IsEmpty())
+    return llvm::createStringError("no definition DIE found");
+
+  if (sc_list.GetSize() > 1)
+    return llvm::createStringError(
+        "found %d functions for %s but expected only 1", sc_list.GetSize(),
+        lookup_name.data());
+
+  return llvm::Error::success();
+}
+
 bool SymbolFileDWARF::DIEInDeclContext(const CompilerDeclContext &decl_ctx,
                                        const DWARFDIE &die,
                                        bool only_root_namespaces) {
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
index 2dc862cccca14..ee966366d3f2b 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
@@ -434,6 +434,9 @@ class SymbolFileDWARF : public SymbolFileCommon {
   DIEArray MergeBlockAbstractParameters(const DWARFDIE &block_die,
                                         DIEArray &&variable_dies);
 
+  llvm::Error FindAndResolveFunction(SymbolContextList &sc_list,
+                                     llvm::StringRef lookup_name) override;
+
   // Given a die_offset, figure out the symbol context representing that die.
   bool ResolveFunction(const DWARFDIE &die, bool include_inlines,
                        SymbolContextList &sc_list);
diff --git a/lldb/source/Plugins/SymbolFile/NativePDB/PdbAstBuilder.cpp b/lldb/source/Plugins/SymbolFile/NativePDB/PdbAstBuilder.cpp
index 702ec5e5c9ea9..bce721c149fee 100644
--- a/lldb/source/Plugins/SymbolFile/NativePDB/PdbAstBuilder.cpp
+++ b/lldb/source/Plugins/SymbolFile/Nativ...
[truncated]
 | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we shouldn't use the work DIE here in the generic code (PDB and all). It's just an function identifier that the symbol file knows how to interpret.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about symbol_id or function_id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment to say this accounts for the target-specific mangling prefix. Otherwise, it's not clear why one needs this function.
That said, we might be able to use the llvm \01 mangling escape prefix to avoid this difference. Clang seems to do that automatically:
$ clang -target arm64-apple-macosx -x c -o - -S - <<<'void f() asm("FFF"); void f(){}' | grep FFF
	.globl	FFF                             ; -- Begin function FFF
FFF:                                    ; @"\01FFF"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, we might be able to use the llvm \01 mangling escape prefix to avoid this difference. Clang seems to do that automatically:
Good point. LLDB disabled the mangling prefix in:
commit f6bc251274f39dee5d09f668a56922c88bd027d8
Author: Vedant Kumar <vsk@apple.com>
Date:   Wed Sep 25 18:00:31 2019 +0000
    [Mangle] Add flag to asm labels to disable '\01' prefixing
    
    LLDB synthesizes decls using asm labels. These decls cannot have a mangle
    different than the one specified in the label name. I.e., the '\01' prefix
    should not be added.
    
    Fixes an expression evaluation failure in lldb's TestVirtual.py on iOS.
But that wouldn't apply with this patch anymore since we're doing the handling of the AsmLabel ourselves.
Didn't realize that the \01 prevents the global mangling prefix to get added into the IR names.
I'll go with this approach. Then we don't need these APIs either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe that means we can also get rid of this functionality in Clang (since it was just added for LLDB).
        
          
                lldb/source/Core/Module.cpp
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd probably put this function into the Target class. It will not need to iterate through that many modules and it avoids accidentally pulling in an unrelated module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah now I remember why i did it this way initially. In some of the tests, the module that we get from die.GetModule() is for an object file (e.g., main.o) but the target image list doesn't contain that module. So I iterated over all the allocated modules. There's probably a better way to get from DIE module to module in the image list. Though I haven't found a good way yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes, that's tricky. We have something sort of similar in (e.g.) SymbolFileDWARF::GetDIEToType() which returns the DWARFDebugMap die-to-type map when dealing with a MachO .o files. It's not completely ideal, but I think it wouldn't be too bad if we had a SymbolFileDWARF function to return the ID of the "main" module.
Alternatively, since main symbol file which creates the type system (and the type system creates the DWARFASTParser), we may be able to plumb the module ID to the DWARFASTParser constructor.
        
          
                lldb/source/Core/Module.cpp
              
                Outdated
          
        
      There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. or you can make it an atomic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't use the m_ prefix for plain structs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it makes sense to involve the type system here. At this point, do you even know if you're running a c++ expression?
Is there any harm in making this mangling scheme global? We can always change it or even add some language identifier if needed.
Alternatively, we could make each module responsible for its own mangling scheme. We could just parse out the module ID here and then pass the rest of the string verbatim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I made it TypeSystemClang specific is that in the follow-up patch to support constructor/destructor variants (#149827) I plan to put a Clang type into the FunctionCallLabel structure. Of course I could avoid pulling Clang in by using plain integers instead of (clang::CXXCtorType/clang::CXXDtorType). Not sure if we get here outside of expression evaluation, but happy to hoist this to somewhere more global.
Do you prefer me move this out of TypeSystem and into Expression (or something similar). And we can refactor it if needed for the constructor/destructor patch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to put clang-specific types into the generic FunctionCallLabel structure. If you want that, then the struct itself should by plugin-specific. I think that'd would be doable if you make the generic code treat the label.. generically (as a string I guess). The parsing would then happen inside the SymbolFile, which can know the encoding scheme used, the language and what not. In this world, it may make sense for this code to live in the type system, but then I'd make sure it's called from the SymbolFile class, and not directly from generic code.
However, I think that keeping the struct generic is also a viable option. We just need to avoid language and plugin-specific terms. For the structor type field, we could just say we have an additional "discriminator" field which the plugin can use to discriminate different "variants" of the given function. In this world, the label parsing should not be in a plugin. It could be just a method defined in Expression.h, next to the struct definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea happy to do that instead. That was what my first iteration of the patch did. I'll add a "uint64_t discriminator" field here instead. And pass the FunctionCallLabel structure into the SymbolFile instead of the individual components. Makes things much simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should at least be in the same plugin (ideally, in the same file) as the code that parses it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised to see a name lookup here. Is the idea to change that in a follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I wanted to just use the DIE ID that we encoded. We can do that for regular functions, because the DWARFASTParserClang encodes the definition DIE ID. But for methods (and especially constructors/destructors) the DWARFASTParserClang just looks at the declaration DIE. So I do the lookup here.
We could make sure to encode the definition DIE into the label and move this logic into the DWARFASTParserClang, but unless I'm missing something we need the lookup to fetch the definitions (for methods at least)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. So, we definitely don't want to look for the definition DIE just so we can encode it into the label. What we could do is encode the declaration DIE, and then when (and if -- I expect that the majority of functions will never be called) we need to resolve the call, we look up the matching definition DIE. That would be my ideal implementation as that's the only way to avoid ambiguities arising from multiple functions having identical linkage names (these occur in anonymous namespace. the ambiguities are solved because there's no need to look up definition DIEs, as these DIEs will be CU-local and always be definitions). It would also let us debug code which doesn't include the linkage names in the definition DIEs (which is what clang does with -gsce).
Another reason for dropping the linkage names might be to reduce the memory footprint of long asm labels, but if this isn't big, we may want to keep them anyway to help with debugging.
That said, I can imagine postponing that to a later time. However, I'd suggest designing this API such that it is possible to implement that in the future, such as by passing the entire FunctionCallLabel object so that the module can choose what it uses to perform the lookup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encode the declaration DIE, and then when (and if -- I expect that the majority of functions will never be called) we need to resolve the call, we look up the matching definition DIE
Hmm how do we do this without a name lookup? SymbolFileDWARF::FindDefinitionDIE?
Another reason for dropping the linkage names might be to reduce the memory footprint of long asm labels, but if this isn't big, we may want to keep them anyway to help with debugging.
Yea had the same thought. For debugging it has been useful actually. When something goes wrong and you get a couldn't resolve symbol $__lldb_func:_Z3foov:0x123:0xf00d having the mangled name there is pretty nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm how do we do this without a name lookup?
SymbolFileDWARF::FindDefinitionDIE?
Maybe, although I suspect that function only works on types/classes right now. Ultimately, we are going to need some kind of a name lookup to go from the declaration to the definition DIE, and I think that using the mangled name (when it exists) for that is the fastest and simplest way to achieve that. However, I think we should do that only if its needed. The most unique identifier is the DIE ID, so I'd start by looking that up. After that, there are three cases:
- This is a definition DIE. We don't need to do anything. The DIE doesn't even have to contain a mangled name. We just take the DW_AT_low/entry_pc, and we're done.
- This is a declaration of a CU-local entity (anonymous namespace). Here, we need to do a lookup, but we know the result must be in the current CU. The mangled name is the simplest way to do that. We could try to do structural matching (like we do for classes with simplified template names) if DW_AT_linkage_name is not present, but that's a job for another patch.
- This is a declaration of a global entity. We also need to do a lookup, but the result could be anywhere(*) -- though we still may want to prefer the definition in the current CU, if it exists.
(*) I just realized that "anywhere" also means "in a different module", which means that there should be some sort of a fallback to searching in other modules. Encoding the mangled name into the asm label might make that easier, as a retry could happen at a different level -- though it would still be nice if the "owning module" would signal that this kind of search is necessary (instead of blindly doing the full search whenever the first module doesn't produce a match).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying! Yes, that's about what I imagined it would look like
I just realized that "anywhere" also means "in a different module", which means that there should be some sort of a fallback to searching in other modules
Great point. I think we can re-use the existing IRExecutionUnit to do a fallback search through the other modules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put mangled name last? That way you don't have to worry about it containing :s...
e64ce7d    to
    9aec4ec      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to drop the \01 mangling prefix here since we're not creating literal AsmLabels anymore. Fixed a couple of test failures where we ran expressions like expression func (to print the function pointer). We would try to find the symbol here (including the mangling prefix) and fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can do this in a separate PR technically
…(#151858) This was added purely for the needs of LLDB. However, starting with llvm/llvm-project#151355 and llvm/llvm-project#148877 we no longer create literal AsmLabels in LLDB either. So this is unused and we can get rid of it. In the near future LLDB will want to add special support for mangling `AsmLabel`s (specifically llvm/llvm-project#149827).
…vm#151355) Split out from llvm#148877 This patch prepares `TypeSystemClang` APIs to take `AsmLabel`s which concatenated strings (hence `std::string`) instead of a plain `const char*`. (cherry picked from commit 7410f6d)
…llvm#148877) LLDB currently attaches `AsmLabel`s to `FunctionDecl`s such that that the `IRExecutionUnit` can determine which mangled name to call (we can't rely on Clang deriving the correct mangled name to call because the debug-info AST doesn't contain all the info that would be encoded in the DWARF linkage names). However, we don't attach `AsmLabel`s for structors because they have multiple variants and thus it's not clear which mangled name to use. In the [RFC on fixing expression evaluation of abi-tagged structors](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816) we discussed encoding the structor variant into the `AsmLabel`s. Specifically in [this thread](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7) we discussed that the contents of the `AsmLabel` are completely under LLDB's control and we could make use of it to uniquely identify a function by encoding the exact module and DIE that the function is associated with (mangled names need not be enough since two identical mangled symbols may live in different modules). So if we already have a custom `AsmLabel` format, we can encode the structor variant in a follow-up (the current idea is to append the structor variant as a suffix to our custom `AsmLabel` when Clang emits the mangled name into the JITted IR). Then we would just have to teach the `IRExecutionUnit` to pick the correct structor variant DIE during symbol resolution. The draft of this is available [here](llvm#149827) This patch sets up the infrastructure for the custom `AsmLabel` format by encoding the module id, DIE id and mangled name in it. **Implementation** The flow is as follows: 1. Create the label in `DWARFASTParserClang`. The format is: `$__lldb_func:module_id:die_id:mangled_name` 2. When resolving external symbols in `IRExecutionUnit`, we parse this label and then do a lookup by DIE ID (or mangled name into the module if the encoded DIE is a declaration). Depends on llvm#151355 (cherry picked from commit f890591)
This was added purely for the needs of LLDB. However, starting with llvm#151355 and llvm#148877 we no longer create literal AsmLabels in LLDB either. So this is unused and we can get rid of it. In the near future LLDB will want to add special support for mangling `AsmLabel`s (specifically llvm#149827). (cherry picked from commit cd8f348)
#149827) Depends on * #148877 * #155483 * #155485 * #154137 * #154142 This patch is an implementation of [this discussion](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7) about handling ABI-tagged structors during expression evaluation. **Motivation** LLDB encodes the mangled name of a `DW_TAG_subprogram` into `AsmLabel`s on function and method Clang AST nodes. This means that when calls to these functions get lowered into IR (when running JITted expressions), the address resolver can locate the appropriate symbol by mangled name (and it is guaranteed to find the symbol because we got the mangled name from debug-info, instead of letting Clang mangle it based on AST structure). However, we don't do this for `CXXConstructorDecl`s/`CXXDestructorDecl`s because these structor declarations in DWARF don't have a linkage name. This is because there can be multiple variants of a structor, each with a distinct mangling in the Itanium ABI. Each structor variant has its own definition `DW_TAG_subprogram`. So LLDB doesn't know which mangled name to put into the `AsmLabel`. Currently this means using ABI-tagged structors in LLDB expressions won't work (see [this RFC](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816) for concrete examples). **Proposed Solution** The `FunctionCallLabel` encoding that we put into `AsmLabel`s already supports stuffing more info about a DIE into it. So this patch extends the `FunctionCallLabel` to contain an optional discriminator (a sequence of bytes) which the `SymbolFileDWARF` plugin interprets as the constructor/destructor variant of that DIE. So when searching for the definition DIE, LLDB will include the structor variant in its heuristic for determining a match. There's a few subtleties here: 1. At the point at which LLDB first constructs the label, it has no way of knowing (just by looking at the debug-info declaration), which structor variant the expression evaluator is supposed to call. That's something that gets decided when compiling the expression. So we let the Clang mangler inject the correct structor variant into the `AsmLabel` during JITing. I adjusted the `AsmLabelAttr` mangling for this in #155485. An option would've been to create a new Clang attribute which behaved like an `AsmLabel` but with these special semantics for LLDB. My main concern there is that we'd have to adjust all the `AsmLabelAttr` checks around Clang to also now account for this new attribute. 2. The compiler is free to omit the `C1` variant of a constructor if the `C2` variant is sufficient. In that case it may alias `C1` to `C2`, leaving us with only the `C2` `DW_TAG_subprogram` in the object file. Linux is one of the platforms where this occurs. For those cases I added a heuristic in `SymbolFileDWARF` where we pick `C2` if we asked for `C1` but it doesn't exist. This may not always be correct (e.g., if the compiler decided to drop `C1` for other reasons). 3. In #154142 Clang will emit `C4`/`D4` variants of ctors/dtors on declarations. When resolving the `FunctionCallLabel` we will now substitute the actual variant that Clang told us we need to call into the mangled name. We do this using LLDB's `ManglingSubstitutor`. That way we find the definition DIE exactly the same way we do for regular function calls. 4. In cases where declarations and definitions live in separate modules, the DIE ID encoded in the function call label may not be enough to find the definition DIE in the encoded module ID. For those cases we fall back to how LLDB used to work: look up in all images of the target. To make sure we don't use the unified mangled name for the fallback lookup, we change the lookup name to whatever mangled name the FunctionCallLabel resolved to. rdar://104968288
… call labels (#149827) Depends on * llvm/llvm-project#148877 * llvm/llvm-project#155483 * llvm/llvm-project#155485 * llvm/llvm-project#154137 * llvm/llvm-project#154142 This patch is an implementation of [this discussion](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7) about handling ABI-tagged structors during expression evaluation. **Motivation** LLDB encodes the mangled name of a `DW_TAG_subprogram` into `AsmLabel`s on function and method Clang AST nodes. This means that when calls to these functions get lowered into IR (when running JITted expressions), the address resolver can locate the appropriate symbol by mangled name (and it is guaranteed to find the symbol because we got the mangled name from debug-info, instead of letting Clang mangle it based on AST structure). However, we don't do this for `CXXConstructorDecl`s/`CXXDestructorDecl`s because these structor declarations in DWARF don't have a linkage name. This is because there can be multiple variants of a structor, each with a distinct mangling in the Itanium ABI. Each structor variant has its own definition `DW_TAG_subprogram`. So LLDB doesn't know which mangled name to put into the `AsmLabel`. Currently this means using ABI-tagged structors in LLDB expressions won't work (see [this RFC](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816) for concrete examples). **Proposed Solution** The `FunctionCallLabel` encoding that we put into `AsmLabel`s already supports stuffing more info about a DIE into it. So this patch extends the `FunctionCallLabel` to contain an optional discriminator (a sequence of bytes) which the `SymbolFileDWARF` plugin interprets as the constructor/destructor variant of that DIE. So when searching for the definition DIE, LLDB will include the structor variant in its heuristic for determining a match. There's a few subtleties here: 1. At the point at which LLDB first constructs the label, it has no way of knowing (just by looking at the debug-info declaration), which structor variant the expression evaluator is supposed to call. That's something that gets decided when compiling the expression. So we let the Clang mangler inject the correct structor variant into the `AsmLabel` during JITing. I adjusted the `AsmLabelAttr` mangling for this in llvm/llvm-project#155485. An option would've been to create a new Clang attribute which behaved like an `AsmLabel` but with these special semantics for LLDB. My main concern there is that we'd have to adjust all the `AsmLabelAttr` checks around Clang to also now account for this new attribute. 2. The compiler is free to omit the `C1` variant of a constructor if the `C2` variant is sufficient. In that case it may alias `C1` to `C2`, leaving us with only the `C2` `DW_TAG_subprogram` in the object file. Linux is one of the platforms where this occurs. For those cases I added a heuristic in `SymbolFileDWARF` where we pick `C2` if we asked for `C1` but it doesn't exist. This may not always be correct (e.g., if the compiler decided to drop `C1` for other reasons). 3. In llvm/llvm-project#154142 Clang will emit `C4`/`D4` variants of ctors/dtors on declarations. When resolving the `FunctionCallLabel` we will now substitute the actual variant that Clang told us we need to call into the mangled name. We do this using LLDB's `ManglingSubstitutor`. That way we find the definition DIE exactly the same way we do for regular function calls. 4. In cases where declarations and definitions live in separate modules, the DIE ID encoded in the function call label may not be enough to find the definition DIE in the encoded module ID. For those cases we fall back to how LLDB used to work: look up in all images of the target. To make sure we don't use the unified mangled name for the fallback lookup, we change the lookup name to whatever mangled name the FunctionCallLabel resolved to. rdar://104968288
llvm#149827) Depends on * llvm#148877 * llvm#155483 * llvm#155485 * llvm#154137 * llvm#154142 This patch is an implementation of [this discussion](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7) about handling ABI-tagged structors during expression evaluation. **Motivation** LLDB encodes the mangled name of a `DW_TAG_subprogram` into `AsmLabel`s on function and method Clang AST nodes. This means that when calls to these functions get lowered into IR (when running JITted expressions), the address resolver can locate the appropriate symbol by mangled name (and it is guaranteed to find the symbol because we got the mangled name from debug-info, instead of letting Clang mangle it based on AST structure). However, we don't do this for `CXXConstructorDecl`s/`CXXDestructorDecl`s because these structor declarations in DWARF don't have a linkage name. This is because there can be multiple variants of a structor, each with a distinct mangling in the Itanium ABI. Each structor variant has its own definition `DW_TAG_subprogram`. So LLDB doesn't know which mangled name to put into the `AsmLabel`. Currently this means using ABI-tagged structors in LLDB expressions won't work (see [this RFC](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816) for concrete examples). **Proposed Solution** The `FunctionCallLabel` encoding that we put into `AsmLabel`s already supports stuffing more info about a DIE into it. So this patch extends the `FunctionCallLabel` to contain an optional discriminator (a sequence of bytes) which the `SymbolFileDWARF` plugin interprets as the constructor/destructor variant of that DIE. So when searching for the definition DIE, LLDB will include the structor variant in its heuristic for determining a match. There's a few subtleties here: 1. At the point at which LLDB first constructs the label, it has no way of knowing (just by looking at the debug-info declaration), which structor variant the expression evaluator is supposed to call. That's something that gets decided when compiling the expression. So we let the Clang mangler inject the correct structor variant into the `AsmLabel` during JITing. I adjusted the `AsmLabelAttr` mangling for this in llvm#155485. An option would've been to create a new Clang attribute which behaved like an `AsmLabel` but with these special semantics for LLDB. My main concern there is that we'd have to adjust all the `AsmLabelAttr` checks around Clang to also now account for this new attribute. 2. The compiler is free to omit the `C1` variant of a constructor if the `C2` variant is sufficient. In that case it may alias `C1` to `C2`, leaving us with only the `C2` `DW_TAG_subprogram` in the object file. Linux is one of the platforms where this occurs. For those cases I added a heuristic in `SymbolFileDWARF` where we pick `C2` if we asked for `C1` but it doesn't exist. This may not always be correct (e.g., if the compiler decided to drop `C1` for other reasons). 3. In llvm#154142 Clang will emit `C4`/`D4` variants of ctors/dtors on declarations. When resolving the `FunctionCallLabel` we will now substitute the actual variant that Clang told us we need to call into the mangled name. We do this using LLDB's `ManglingSubstitutor`. That way we find the definition DIE exactly the same way we do for regular function calls. 4. In cases where declarations and definitions live in separate modules, the DIE ID encoded in the function call label may not be enough to find the definition DIE in the encoded module ID. For those cases we fall back to how LLDB used to work: look up in all images of the target. To make sure we don't use the unified mangled name for the fallback lookup, we change the lookup name to whatever mangled name the FunctionCallLabel resolved to. rdar://104968288 (cherry picked from commit 57a7907)
…ddress Starting with llvm#148877 we started encoding the module ID of the function DIE we are currently parsing into its `AsmLabel` in the AST. When the JIT asks LLDB to resolve our special mangled name, we would locate the module and resolve the function/symbol we found in it. If we are debugging with a `SymbolFileDWARFDebugMap`, the module ID we encode is that of the `.o` file that is tracked by the debug-map. To resolve the address of the DIE in that `.o` file, we have to ask `SymbolFileDWARFDebugMap::LinkOSOAddress` to turn the address of the `.o` DIE into a real address in the linked executable. This will only work if the `.o` address was actually tracked by the debug-map. However, if the function definition appears in multiple `.o` files (which is the case for functions defined in headers), the linker will most likely de-deuplicate that definition. So most `.o`'s definition DIEs for that function won't have a contribution in the debug-map, and thus we fail to resolve the address. When debugging Clang on Darwin, e.g., you'd see: ``` (lldb) expr CXXDecl->getName() error: Couldn't look up symbols: $__lldb_func::0x1:0x4000d000002359da:_ZNK5clang9NamedDecl7getNameEv Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler. ``` unless you were stopped in the `.o` file whose definition of `getName` made it into the final executable. The fix here is to error out if we fail to resolve the address, causing us to fall back on the old flow which did a lookup by mangled name, which the `SymbolFileDWARFDebugMap` will handle correctly. An alternative fix to this would be to encode the `SymbolFileDWARFDebugMap`'s module-id. And implement `SymbolFileDWARFDebugMap::ResolveFunctionCallLabel` by doing a mangled name lookup. The proposed approach doesn't stop us from implementing that, so we could choose to do it in a follow-up. rdar://161393045
…ddress (#161363) Starting with #148877 we started encoding the module ID of the function DIE we are currently parsing into its `AsmLabel` in the AST. When the JIT asks LLDB to resolve our special mangled name, we would locate the module and resolve the function/symbol we found in it. If we are debugging with a `SymbolFileDWARFDebugMap`, the module ID we encode is that of the `.o` file that is tracked by the debug-map. To resolve the address of the DIE in that `.o` file, we have to ask `SymbolFileDWARFDebugMap::LinkOSOAddress` to turn the address of the `.o` DIE into a real address in the linked executable. This will only work if the `.o` address was actually tracked by the debug-map. However, if the function definition appears in multiple `.o` files (which is the case for functions defined in headers), the linker will most likely de-deuplicate that definition. So most `.o`'s definition DIEs for that function won't have a contribution in the debug-map, and thus we fail to resolve the address. When debugging Clang on Darwin, e.g., you'd see: ``` (lldb) expr CXXDecl->getName() error: Couldn't look up symbols: $__lldb_func::0x1:0x4000d000002359da:_ZNK5clang9NamedDecl7getNameEv Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler. ``` unless you were stopped in the `.o` file whose definition of `getName` made it into the final executable. The fix here is to error out if we fail to resolve the address, causing us to fall back on the old flow which did a lookup by mangled name, which the `SymbolFileDWARFDebugMap` will handle correctly. An alternative fix to this would be to encode the `SymbolFileDWARFDebugMap`'s module-id. And implement `SymbolFileDWARFDebugMap::ResolveFunctionCallLabel` by doing a mangled name lookup. The proposed approach doesn't stop us from implementing that, so we could choose to do it in a follow-up. rdar://161393045
… address Starting with llvm#148877 we started encoding the module ID of the function DIE we are currently parsing into its `AsmLabel` in the AST. When the JIT asks LLDB to resolve our special mangled name, we would locate the module and resolve the function/symbol we found in it. If we are debugging with a `SymbolFileDWARFDebugMap`, the module ID we encode is that of the `.o` file that is tracked by the debug-map. To resolve the address of the DIE in that `.o` file, we have to ask `SymbolFileDWARFDebugMap::LinkOSOAddress` to turn the address of the `.o` DIE into a real address in the linked executable. This will only work if the `.o` address was actually tracked by the debug-map. However, if the function definition appears in multiple `.o` files (which is the case for functions defined in headers), the linker will most likely de-deuplicate that definition. So most `.o`'s definition DIEs for that function won't have a contribution in the debug-map, and thus we fail to resolve the address. When debugging Clang on Darwin, e.g., you'd see: ``` (lldb) expr CXXDecl->getName() error: Couldn't look up symbols: $__lldb_func::0x1:0x4000d000002359da:_ZNK5clang9NamedDecl7getNameEv Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler. ``` unless you were stopped in the `.o` file whose definition of `getName` made it into the final executable. The fix here is to error out if we fail to resolve the address, causing us to fall back on the old flow which did a lookup by mangled name, which the `SymbolFileDWARFDebugMap` will handle correctly. An alternative fix to this would be to encode the `SymbolFileDWARFDebugMap`'s module-id. And implement `SymbolFileDWARFDebugMap::ResolveFunctionCallLabel` by doing a mangled name lookup. The proposed approach doesn't stop us from implementing that, so we could choose to do it in a follow-up. rdar://161393045 (cherry-picked from 332b4de)
… function address (#161363) Starting with llvm/llvm-project#148877 we started encoding the module ID of the function DIE we are currently parsing into its `AsmLabel` in the AST. When the JIT asks LLDB to resolve our special mangled name, we would locate the module and resolve the function/symbol we found in it. If we are debugging with a `SymbolFileDWARFDebugMap`, the module ID we encode is that of the `.o` file that is tracked by the debug-map. To resolve the address of the DIE in that `.o` file, we have to ask `SymbolFileDWARFDebugMap::LinkOSOAddress` to turn the address of the `.o` DIE into a real address in the linked executable. This will only work if the `.o` address was actually tracked by the debug-map. However, if the function definition appears in multiple `.o` files (which is the case for functions defined in headers), the linker will most likely de-deuplicate that definition. So most `.o`'s definition DIEs for that function won't have a contribution in the debug-map, and thus we fail to resolve the address. When debugging Clang on Darwin, e.g., you'd see: ``` (lldb) expr CXXDecl->getName() error: Couldn't look up symbols: $__lldb_func::0x1:0x4000d000002359da:_ZNK5clang9NamedDecl7getNameEv Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler. ``` unless you were stopped in the `.o` file whose definition of `getName` made it into the final executable. The fix here is to error out if we fail to resolve the address, causing us to fall back on the old flow which did a lookup by mangled name, which the `SymbolFileDWARFDebugMap` will handle correctly. An alternative fix to this would be to encode the `SymbolFileDWARFDebugMap`'s module-id. And implement `SymbolFileDWARFDebugMap::ResolveFunctionCallLabel` by doing a mangled name lookup. The proposed approach doesn't stop us from implementing that, so we could choose to do it in a follow-up. rdar://161393045
…ddress (llvm#161363) Starting with llvm#148877 we started encoding the module ID of the function DIE we are currently parsing into its `AsmLabel` in the AST. When the JIT asks LLDB to resolve our special mangled name, we would locate the module and resolve the function/symbol we found in it. If we are debugging with a `SymbolFileDWARFDebugMap`, the module ID we encode is that of the `.o` file that is tracked by the debug-map. To resolve the address of the DIE in that `.o` file, we have to ask `SymbolFileDWARFDebugMap::LinkOSOAddress` to turn the address of the `.o` DIE into a real address in the linked executable. This will only work if the `.o` address was actually tracked by the debug-map. However, if the function definition appears in multiple `.o` files (which is the case for functions defined in headers), the linker will most likely de-deuplicate that definition. So most `.o`'s definition DIEs for that function won't have a contribution in the debug-map, and thus we fail to resolve the address. When debugging Clang on Darwin, e.g., you'd see: ``` (lldb) expr CXXDecl->getName() error: Couldn't look up symbols: $__lldb_func::0x1:0x4000d000002359da:_ZNK5clang9NamedDecl7getNameEv Hint: The expression tried to call a function that is not present in the target, perhaps because it was optimized out by the compiler. ``` unless you were stopped in the `.o` file whose definition of `getName` made it into the final executable. The fix here is to error out if we fail to resolve the address, causing us to fall back on the old flow which did a lookup by mangled name, which the `SymbolFileDWARFDebugMap` will handle correctly. An alternative fix to this would be to encode the `SymbolFileDWARFDebugMap`'s module-id. And implement `SymbolFileDWARFDebugMap::ResolveFunctionCallLabel` by doing a mangled name lookup. The proposed approach doesn't stop us from implementing that, so we could choose to do it in a follow-up. rdar://161393045
LLDB currently attaches
AsmLabels toFunctionDecls such that that theIRExecutionUnitcan determine which mangled name to call (we can't rely on Clang deriving the correct mangled name to call because the debug-info AST doesn't contain all the info that would be encoded in the DWARF linkage names). However, we don't attachAsmLabels for structors because they have multiple variants and thus it's not clear which mangled name to use. In the RFC on fixing expression evaluation of abi-tagged structors we discussed encoding the structor variant into theAsmLabels. Specifically in this thread we discussed that the contents of theAsmLabelare completely under LLDB's control and we could make use of it to uniquely identify a function by encoding the exact module and DIE that the function is associated with (mangled names need not be enough since two identical mangled symbols may live in different modules). So if we already have a customAsmLabelformat, we can encode the structor variant in a follow-up (the current idea is to append the structor variant as a suffix to our customAsmLabelwhen Clang emits the mangled name into the JITted IR). Then we would just have to teach theIRExecutionUnitto pick the correct structor variant DIE during symbol resolution. The draft of this is available hereThis patch sets up the infrastructure for the custom
AsmLabelformat by encoding the module id, DIE id and mangled name in it.Implementation
The flow is as follows:
DWARFASTParserClang. The format is:$__lldb_func:module_id:die_id:mangled_nameIRExecutionUnit, we parse this label and then do a lookup by DIE ID (or mangled name into the module if the encoded DIE is a declaration).Depends on #151355