-
Notifications
You must be signed in to change notification settings - Fork 13.3k
[llvm][aarch64] Fix Arm64EC name mangling algorithm #115567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-platform-windows @llvm/pr-subscribers-llvm-ir Author: Daniel Paoliello (dpaoliello) ChangesArm64EC uses a special name mangling mode that adds This change fixes the algorithm to:
Also fixed Fixes #115231 Full diff: https://github.com/llvm/llvm-project/pull/115567.diff 3 Files Affected:
diff --git a/llvm/include/llvm/IR/Mangler.h b/llvm/include/llvm/IR/Mangler.h
index 3c3f0c6dce80fa..6c8ebf5f072f28 100644
--- a/llvm/include/llvm/IR/Mangler.h
+++ b/llvm/include/llvm/IR/Mangler.h
@@ -64,7 +64,7 @@ std::optional<std::string> getArm64ECDemangledFunctionName(StringRef Name);
/// Check if an ARM64EC function name is mangled.
bool inline isArm64ECMangledFunctionName(StringRef Name) {
return Name[0] == '#' ||
- (Name[0] == '?' && Name.find("$$h") != StringRef::npos);
+ (Name[0] == '?' && Name.find("@$$h") != StringRef::npos);
}
} // End llvm namespace
diff --git a/llvm/lib/IR/Mangler.cpp b/llvm/lib/IR/Mangler.cpp
index 15a4debf191a5b..12be14156f6656 100644
--- a/llvm/lib/IR/Mangler.cpp
+++ b/llvm/lib/IR/Mangler.cpp
@@ -302,14 +302,23 @@ std::optional<std::string> llvm::getArm64ECMangledFunctionName(StringRef Name) {
// Insert the ARM64EC "$$h" tag after the mangled function name.
if (Name.contains("$$h"))
return std::nullopt;
- size_t InsertIdx = Name.find("@@");
- size_t ThreeAtSignsIdx = Name.find("@@@");
- if (InsertIdx != std::string::npos && InsertIdx != ThreeAtSignsIdx) {
+
+ // The last 4 characters of the symbol type may contain a `@@` if the symbol
+ // is returning a qualified type. We don't want to insert `$$h` at that point.
+ auto TrimmedName = Name.drop_back(4);
+
+ // The last `@@` is the separation between the qualified name of the symbol
+ // and its type, which is where we want to insert `$$h`.
+ auto InsertIdx = TrimmedName.rfind("@@");
+ if (InsertIdx != StringRef::npos) {
InsertIdx += 2;
} else {
- InsertIdx = Name.find("@");
- if (InsertIdx != std::string::npos)
- InsertIdx++;
+ // If there is no `@@`, then this is a global symbol (e.g., `operator new`)
+ // so look for a `@` instead (since we assume that it will not return a
+ // qualified type).
+ InsertIdx = TrimmedName.find_last_of('@');
+ assert(InsertIdx != StringRef::npos && "Invalid mangled name");
+ InsertIdx += 1;
}
return std::optional<std::string>(
diff --git a/llvm/unittests/IR/ManglerTest.cpp b/llvm/unittests/IR/ManglerTest.cpp
index 5ac784b7e89ac6..017d8303551244 100644
--- a/llvm/unittests/IR/ManglerTest.cpp
+++ b/llvm/unittests/IR/ManglerTest.cpp
@@ -172,4 +172,66 @@ TEST(ManglerTest, GOFF) {
"L#foo");
}
+TEST(ManglerTest, Arm64EC) {
+ constexpr std::string_view Arm64ECNames[] = {
+ // Basic C name.
+ "#Foo",
+
+ // Basic C++ name.
+ "?foo@@$$hYAHXZ",
+
+ // Regression test: https://github.com/llvm/llvm-project/issues/115231
+ "?GetValue@?$Wrapper@UA@@@@$$hQEBAHXZ",
+
+ // Symbols from:
+ // ```
+ // namespace A::B::C::D {
+ // struct Base {
+ // virtual int f() { return 0; }
+ // };
+ // }
+ // struct Derived : public A::B::C::D::Base {
+ // virtual int f() override { return 1; }
+ // };
+ // A::B::C::D::Base* MakeObj() { return new Derived(); }
+ // ```
+ // void * __cdecl operator new(unsigned __int64)
+ "??2@$$hYAPEAX_K@Z",
+ // public: virtual int __cdecl A::B::C::D::Base::f(void)
+ "?f@Base@D@C@B@A@@$$hUEAAHXZ",
+ // public: __cdecl A::B::C::D::Base::Base(void)
+ "??0Base@D@C@B@A@@$$hQEAA@XZ",
+ // public: virtual int __cdecl Derived::f(void)
+ "?f@Derived@@$$hUEAAHXZ",
+ // public: __cdecl Derived::Derived(void)
+ "??0Derived@@$$hQEAA@XZ",
+ // struct A::B::C::D::Base * __cdecl MakeObj(void)
+ "?MakeObj@@$$hYAPEAUBase@D@C@B@A@@XZ",
+ };
+
+ for (const auto &Arm64ECName : Arm64ECNames) {
+ // Check that this is a mangled name.
+ EXPECT_TRUE(isArm64ECMangledFunctionName(Arm64ECName))
+ << "Test case: " << Arm64ECName;
+ // Refuse to mangle it again.
+ EXPECT_FALSE(getArm64ECMangledFunctionName(Arm64ECName).has_value())
+ << "Test case: " << Arm64ECName;
+
+ // Demangle.
+ auto Arm64Name = getArm64ECDemangledFunctionName(Arm64ECName);
+ EXPECT_TRUE(Arm64Name.has_value()) << "Test case: " << Arm64ECName;
+ // Check that it is not mangled.
+ EXPECT_FALSE(isArm64ECMangledFunctionName(Arm64Name.value()))
+ << "Test case: " << Arm64ECName;
+ // Refuse to demangle it again.
+ EXPECT_FALSE(getArm64ECDemangledFunctionName(Arm64Name.value()).has_value())
+ << "Test case: " << Arm64ECName;
+
+ // Round-trip.
+ auto RoundTripArm64ECName =
+ getArm64ECMangledFunctionName(Arm64Name.value());
+ EXPECT_EQ(RoundTripArm64ECName, Arm64ECName);
+ }
+}
+
} // end anonymous namespace
|
llvm/lib/IR/Mangler.cpp
Outdated
// so look for a `@` instead (since we assume that it will not return a | ||
// qualified type). | ||
InsertIdx = TrimmedName.find_last_of('@'); | ||
assert(InsertIdx != StringRef::npos && "Invalid mangled name"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this assertion is likely fine for IR callers, in LLD we may encounter arbitrary user-provided input. Instead, we could simply return nullopt
here. This would break the assumption that exactly one of the mangled or demangled functions always returns a value, so some adjustments will be needed in the callers. I’ve drafted the LLD change here: cjacek@ccb9503.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed all insertions
llvm/lib/IR/Mangler.cpp
Outdated
auto TrimmedName = Name.drop_back(4); | ||
|
||
// The last `@@` is the separation between the qualified name of the symbol | ||
// and its type, which is where we want to insert `$$h`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not confident this actually works. Consider, for example:
template<typename T> struct WW { struct Z{}; };
template <typename X> struct Wrapper {
int GetValue(typename WW<X>::Z) const;
};
struct A {
};
template<typename X> int Wrapper<X>::GetValue(typename WW<X>::Z) const { return 3; }
template class Wrapper<A>;
If we can't figure out a simple way to figure out the correct insertion point, it might end up being simpler to just compute the name in clang, using the real name mangler, and attach it using metadata, or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MSVC generates the mangled name in the frontend, and that's certainly the most robust choice.
Unfortunately, LLD and import library generation have both taken a dependency on this function so we'd have to detangle that first.
The other option is to use a name demangler to parse the mangled name instead of this adhoc from-the-back parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.
I'd guess parsing forward from the front is easier than parsing backwards, because that's the way mangling naturally works. And I'm not sure there are any good shortcuts here. Not sure how much code a "mini-demangler" that knows enough to parse the name of the function without actually printing the demangled name would be, vs. just pulling in the real demangler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out using the real demangler was fairly easy. Also added your new test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Do you think it makes sense to cherry-pick this to 19.x? |
Yeah, that's probably a good idea otherwise we'll be generating different names than MSVC |
This is a follow-up to llvm#115567. Emit an error for invalid function names, similar to MSVC's `lib.exe` behavior. Returning an error from `writeImportLibrary` exposed bugs in error handling by its callers, which have been addressed in this patch.
This is a follow-up to llvm#115567.
Arm64EC uses a special name mangling mode that adds `$$h` between the symbol name and its type. In MSVC's name mangling `@` is used to separate the name and type BUT it is also used for other purposes, such as the separator between paths in a fully qualified name. The original algorithm was quite fragile and made assumptions that didn't hold true for all MSVC mangled symbols, so instead of trying to improve this algorithm we are now using the demangler to indicate where the insertion point should be (i.e., to parse the fully-qualified name and return the current string offset). Also fixed `isArm64ECMangledFunctionName` to search for `@$$h` since the `$$h` must always be after a `@`. Fixes llvm#115231
Arm64EC uses a special name mangling mode that adds `$$h` between the symbol name and its type. In MSVC's name mangling `@` is used to separate the name and type BUT it is also used for other purposes, such as the separator between paths in a fully qualified name. The original algorithm was quite fragile and made assumptions that didn't hold true for all MSVC mangled symbols, so instead of trying to improve this algorithm we are now using the demangler to indicate where the insertion point should be (i.e., to parse the fully-qualified name and return the current string offset). Also fixed `isArm64ECMangledFunctionName` to search for `@$$h` since the `$$h` must always be after a `@`. Fixes llvm#115231
Since these symbols cannot be mangled or demangled, there is no symbol to check for conflicts in checkLazyECPair, nor is there an alias to create in addUndefined. Attempting to create an import library with such symbols results in an error; the patch includes a test to ensure the error is handled correctly. This is a follow-up to llvm#115567.
Since these symbols cannot be mangled or demangled, there is no symbol to check for conflicts in `checkLazyECPair`, nor is there an alias to create in `addUndefined`. Attempting to create an import library with such symbols results in an error; the patch includes a test to ensure the error is handled correctly. This is a follow-up to #115567.
Arm64EC uses a special name mangling mode that adds `$$h` between the symbol name and its type. In MSVC's name mangling `@` is used to separate the name and type BUT it is also used for other purposes, such as the separator between paths in a fully qualified name. The original algorithm was quite fragile and made assumptions that didn't hold true for all MSVC mangled symbols, so instead of trying to improve this algorithm we are now using the demangler to indicate where the insertion point should be (i.e., to parse the fully-qualified name and return the current string offset). Also fixed `isArm64ECMangledFunctionName` to search for `@$$h` since the `$$h` must always be after a `@`. Fixes llvm#115231
Arm64EC uses a special name mangling mode that adds
$$h
between the symbol name and its type. In MSVC's name mangling@
is used to separate the name and type BUT it is also used for other purposes, such as the separator between paths in a fully qualified name.The original algorithm was quite fragile and made assumptions that didn't hold true for all MSVC mangled symbols, so instead of trying to improve this algorithm we are now using the demangler to indicate where the insertion point should be (i.e., to parse the fully-qualified name and return the current string offset).
Also fixed
isArm64ECMangledFunctionName
to search for@$$h
since the$$h
must always be after a@
.Fixes #115231