Skip to content

Conversation

ericastor
Copy link
Contributor

@ericastor ericastor commented Mar 11, 2025

MASM supports some built-in macro-type functions.

We start our support for these with @CatStr, one of the more commonly used.

@llvmbot llvmbot added the llvm:mc Machine (object) code label Mar 11, 2025
@ericastor ericastor changed the title [ms] [llvm-ml] Add support for @CatStr built-in function symbol [ms] [llvm-ml] Add support for @CatStr built-in function symbol Mar 11, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 11, 2025

@llvm/pr-subscribers-mc

Author: Eric Astor (ericastor)

Changes

MASM supports some built-in macro-type functions.

We start our support for these with @<!-- -->CatStr, one of the more commonly used.


Full diff: https://github.com/llvm/llvm-project/pull/130781.diff

2 Files Affected:

  • (modified) llvm/lib/MC/MCParser/MasmParser.cpp (+89-9)
  • (modified) llvm/test/tools/llvm-ml/macro_function.asm (+9-3)
diff --git a/llvm/lib/MC/MCParser/MasmParser.cpp b/llvm/lib/MC/MCParser/MasmParser.cpp
index c1a451a7119d5..e6123e6e74e43 100644
--- a/llvm/lib/MC/MCParser/MasmParser.cpp
+++ b/llvm/lib/MC/MCParser/MasmParser.cpp
@@ -778,6 +778,19 @@ class MasmParser : public MCAsmParser {
   std::optional<std::string> evaluateBuiltinTextMacro(BuiltinSymbol Symbol,
                                                       SMLoc StartLoc);
 
+  // Generic (target and platform independent) directive parsing.
+  enum BuiltinFunction {
+    BI_NO_FUNCTION, // Placeholder
+    BI_CATSTR,
+  };
+
+  /// Maps builtin name --> BuiltinFunction enum, for builtins handled by this
+  /// class.
+  StringMap<BuiltinFunction> BuiltinFunctionMap;
+
+  bool evaluateBuiltinMacroFunction(BuiltinFunction Function, StringRef Name,
+                                    std::string &Res);
+
   // ".ascii", ".asciz", ".string"
   bool parseDirectiveAscii(StringRef IDVal, bool ZeroTerminated);
 
@@ -959,7 +972,7 @@ class MasmParser : public MCAsmParser {
   bool parseDirectiveEcho(SMLoc DirectiveLoc);
 
   void initializeDirectiveKindMap();
-  void initializeBuiltinSymbolMap();
+  void initializeBuiltinSymbolMaps();
 };
 
 } // end anonymous namespace
@@ -999,7 +1012,7 @@ MasmParser::MasmParser(SourceMgr &SM, MCContext &Ctx, MCStreamer &Out,
 
   initializeDirectiveKindMap();
   PlatformParser->Initialize(*this);
-  initializeBuiltinSymbolMap();
+  initializeBuiltinSymbolMaps();
 
   NumOfMacroInstantiations = 0;
 }
@@ -1084,15 +1097,25 @@ bool MasmParser::expandMacros() {
   }
 
   std::optional<std::string> ExpandedValue;
-  auto BuiltinIt = BuiltinSymbolMap.find(IDLower);
-  if (BuiltinIt != BuiltinSymbolMap.end()) {
+
+  if (auto BuiltinIt = BuiltinSymbolMap.find(IDLower);
+      BuiltinIt != BuiltinSymbolMap.end()) {
     ExpandedValue =
         evaluateBuiltinTextMacro(BuiltinIt->getValue(), Tok.getLoc());
-  } else {
-    auto VarIt = Variables.find(IDLower);
-    if (VarIt != Variables.end() && VarIt->getValue().IsText) {
-      ExpandedValue = VarIt->getValue().TextValue;
+  } else if (auto BuiltinFuncIt = BuiltinFunctionMap.find(IDLower);
+             BuiltinFuncIt != BuiltinFunctionMap.end()) {
+    StringRef Name;
+    if (parseIdentifier(Name)) {
+      return true;
+    }
+    std::string Res;
+    if (evaluateBuiltinMacroFunction(BuiltinFuncIt->getValue(), Name, Res)) {
+      return true;
     }
+    ExpandedValue = Res;
+  } else if (auto VarIt = Variables.find(IDLower);
+             VarIt != Variables.end() && VarIt->getValue().IsText) {
+    ExpandedValue = VarIt->getValue().TextValue;
   }
 
   if (!ExpandedValue)
@@ -3197,6 +3220,18 @@ bool MasmParser::parseTextItem(std::string &Data) {
         continue;
       }
 
+      // Try to resolve as a built-in macro function
+      auto BuiltinFuncIt = BuiltinFunctionMap.find(ID.lower());
+      if (BuiltinFuncIt != BuiltinFunctionMap.end()) {
+        Data.clear();
+        if (evaluateBuiltinMacroFunction(BuiltinFuncIt->getValue(), ID, Data)) {
+          return true;
+        }
+        ID = StringRef(Data);
+        Expanded = true;
+        continue;
+      }
+
       // Try to resolve as a variable text macro
       auto VarIt = Variables.find(ID.lower());
       if (VarIt != Variables.end()) {
@@ -6204,7 +6239,7 @@ bool MasmParser::parseMSInlineAsm(
   return false;
 }
 
-void MasmParser::initializeBuiltinSymbolMap() {
+void MasmParser::initializeBuiltinSymbolMaps() {
   // Numeric built-ins (supported in all versions)
   BuiltinSymbolMap["@version"] = BI_VERSION;
   BuiltinSymbolMap["@line"] = BI_LINE;
@@ -6216,6 +6251,9 @@ void MasmParser::initializeBuiltinSymbolMap() {
   BuiltinSymbolMap["@filename"] = BI_FILENAME;
   BuiltinSymbolMap["@curseg"] = BI_CURSEG;
 
+  // Function built-ins (supported in all versions)
+  BuiltinFunctionMap["@catstr"] = BI_CATSTR;
+
   // Some built-ins exist only for MASM32 (32-bit x86)
   if (getContext().getSubtargetInfo()->getTargetTriple().getArch() ==
       Triple::x86) {
@@ -6289,6 +6327,48 @@ MasmParser::evaluateBuiltinTextMacro(BuiltinSymbol Symbol, SMLoc StartLoc) {
   llvm_unreachable("unhandled built-in symbol");
 }
 
+bool MasmParser::evaluateBuiltinMacroFunction(BuiltinFunction Function,
+                                              StringRef Name,
+                                              std::string &Res) {
+  if (parseToken(AsmToken::LParen, "invoking macro function '" + Name +
+                                       "' requires arguments in parentheses")) {
+    return true;
+  }
+
+  MCAsmMacroParameters P;
+  switch (Function) {
+  default:
+    return true;
+  case BI_CATSTR:
+    break;
+  }
+  MCAsmMacro M(Name, "", P, {}, true);
+
+  MCAsmMacroArguments A;
+  if (parseMacroArguments(&M, A, AsmToken::RParen) || parseRParen()) {
+    return true;
+  }
+
+  switch (Function) {
+  default:
+    llvm_unreachable("unhandled built-in function");
+  case BI_CATSTR: {
+    for (const MCAsmMacroArgument &Arg : A) {
+      for (const AsmToken &Tok : Arg) {
+        if (Tok.is(AsmToken::String)) {
+          Res.append(Tok.getStringContents());
+        } else {
+          Res.append(Tok.getString());
+        }
+      }
+    }
+    return false;
+  }
+  }
+  llvm_unreachable("unhandled built-in function");
+  return true;
+}
+
 /// Create an MCAsmParser instance.
 MCAsmParser *llvm::createMCMasmParser(SourceMgr &SM, MCContext &C,
                                       MCStreamer &Out, const MCAsmInfo &MAI,
diff --git a/llvm/test/tools/llvm-ml/macro_function.asm b/llvm/test/tools/llvm-ml/macro_function.asm
index c28d7c8c6222c..475c8b52dce79 100644
--- a/llvm/test/tools/llvm-ml/macro_function.asm
+++ b/llvm/test/tools/llvm-ml/macro_function.asm
@@ -103,14 +103,20 @@ expr_recursive_test PROC
   ret
 expr_recursive_test ENDP
 
+expand_as_directive_test @CatStr(P, RO, C)
+; CHECK-LABEL: expand_as_directive_test:
+
+  ret
+expand_as_directive_test ENDP
+
 custom_strcat MACRO arg1, arg2
   EXITM <arg1&arg2>
 ENDM
 
-expand_as_directive_test custom_strcat(P, ROC)
-; CHECK-LABEL: expand_as_directive_test:
+expand_as_directive_custom_test custom_strcat(P, ROC)
+; CHECK-LABEL: expand_as_directive_custom_test:
 
   ret
-expand_as_directive_test ENDP
+expand_as_directive_custom_test ENDP
 
 end

MASM supports some built-in functions, including @catstr to concatenate arbitrary string arguments.
@MisterDA
Copy link

MisterDA commented Mar 19, 2025

With this patch applied, the following macro crashes the assembler:

        index_a = 2
SubstitutionMacro MACRO field:REQ
        EXITM @CatStr(<[r14+>, %(index_&field), <*8]>)
ENDM

        .CODE
        mov SubstitutionMacro(a), r15
        END

It seems that the + and * operands have disappeared. Maybe there's just an off-by-one error, or a token that's been erroneously consumed.

$ llvm-project/build/bin/llvm-ml -m64 -nologo -c -Fo test.obj test.asm
Included from <instantiation>:2:
<instantiation>:1:1: error: unable to parse text item in 'EXITM' directive
r14%(my_struct_a)8
^
test.asm:7:13: note: while in macro instantiation
        mov SubstitutionMacro(a), r15
            ^
Assertion failed: (EndStatementAtEOFStack.empty()), function Lex, file MasmParser.cpp, line 1193.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.      Program arguments: /Users/user/llvm-project/build/bin/llvm-ml -m64 -nologo -c -Fo test.obj test.asm
 #0 0x0000000100fca168 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/Users/user/llvm-project/build/bin/llvm-ml+0x1006aa168)
 #1 0x0000000100fc88b0 llvm::sys::RunSignalHandlers() (/Users/user/llvm-project/build/bin/llvm-ml+0x1006a88b0)
 #2 0x0000000100fca8f0 SignalHandler(int, __siginfo*, void*) (/Users/user/llvm-project/build/bin/llvm-ml+0x1006aa8f0)
 #3 0x0000000192b9ade4 (/usr/lib/system/libsystem_platform.dylib+0x180482de4)
 #4 0x0000000192b63f70 (/usr/lib/system/libsystem_pthread.dylib+0x18044bf70)
 #5 0x0000000192a70908 (/usr/lib/system/libsystem_c.dylib+0x180358908)
 #6 0x0000000192a6fc1c (/usr/lib/system/libsystem_c.dylib+0x180357c1c)
 #7 0x0000000100f682a0 (anonymous namespace)::MasmParser::Lex((anonymous namespace)::MasmParser::ExpandKind) (.cold.6) (/Users/user/llvm-project/build/bin/llvm-ml+0x1006482a0)
 #8 0x0000000100f5bd9c (anonymous namespace)::MasmParser::Lex((anonymous namespace)::MasmParser::ExpandKind) (/Users/user/llvm-project/build/bin/llvm-ml+0x10063bd9c)
 #9 0x0000000100ab99a0 (anonymous namespace)::X86AsmParser::parseInstruction(llvm::ParseInstructionInfo&, llvm::StringRef, llvm::SMLoc, llvm::SmallVectorImpl<std::__1::unique_ptr<llvm::MCParsedAsmOperand, std::__1::default_delete<llvm::MCParsedAsmOperand>>>&) (/Users/user/llvm-project/build/bin/llvm-ml+0x1001999a0)
#10 0x0000000100f50a10 (anonymous namespace)::MasmParser::parseStatement((anonymous namespace)::ParseStatementInfo&, llvm::MCAsmParserSemaCallback*) (/Users/user/llvm-project/build/bin/llvm-ml+0x100630a10)
#11 0x0000000100f49c60 (anonymous namespace)::MasmParser::Run(bool, bool) (/Users/user/llvm-project/build/bin/llvm-ml+0x100629c60)
#12 0x0000000100924348 AssembleInput(llvm::StringRef, llvm::Target const*, llvm::SourceMgr&, llvm::MCContext&, llvm::MCStreamer&, llvm::MCAsmInfo&, llvm::MCSubtargetInfo&, llvm::MCInstrInfo&, llvm::MCTargetOptions&, llvm::opt::ArgList const&) (/Users/user/llvm-project/build/bin/llvm-ml+0x100004348)
#13 0x0000000100923484 llvm_ml_main(int, char**, llvm::ToolContext const&) (/Users/user/Tarides/llvm-project/build/bin/llvm-ml+0x100003484)
#14 0x0000000100925d54 main (/Users/user/llvm-project/build/bin/llvm-ml+0x100005d54)
#15 0x00000001927e4274
[1]    81581 abort      ~/llvm-project/build/bin/llvm-ml -m64 -nologo -c -Fo test.obj test.as

ml64 accepts it:

$ ml64 -nologo -Cp -c -Fo test.obj -W3 test.asm
 Assembling: test.asm
$ objdump -D --section='.text$mn' test.obj

test.obj:     file format pe-x86-64


Disassembly of section .text$mn:

0000000000000000 <.text$mn>:
   0:   4d 89 7e 10             mov    %r15,0x10(%r14)

@ericastor
Copy link
Contributor Author

ericastor commented Mar 24, 2025

@MisterDA:

With this patch applied, the following macro crashes the assembler:

...

This is absolutely correct - but turns out to be an existing issue with our handling of angle-bracket strings in macro calls, which you can see if you use a "custom" CatStr implementation before this change goes in. I'm going to fix this in a separate PR, if that's alright.

@ericastor ericastor merged commit 0ab330b into llvm:main Apr 24, 2025
11 checks passed
IanWood1 pushed a commit to IanWood1/llvm-project that referenced this pull request May 6, 2025
…vm#130781)

MASM supports some built-in macro-type functions.

We start our support for these with `@CatStr`, one of the more commonly used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

llvm:mc Machine (object) code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants