Skip to content

Conversation

ellishg
Copy link
Contributor

@ellishg ellishg commented Sep 29, 2025

Add the flag --tail-merge-strings to enable tail merging of cstrings. For example, if we have strings mystring\0 and ring\0, we could place mystring\0 at address 0x1000 and ring\0 at address 0x1004 and have them share the same underlying data.

It turns out that many ObjC method names can be tail merged. For example, error: and doFoo:error:. On a large iOS binary, we saw nearly a 15% size improvement in the __TEXT__objc_methname section and negligible impact on link time.

$ bloaty --domain=vm merged.o.stripped -- base.o.stripped
     VM SIZE
 --------------
   +95% +5.85Ki    [__TEXT]
  -2.4%  -239Ki    __TEXT,__cstring
 -14.5%  -710Ki    __TEXT,__objc_methname
  -1.0%  -944Ki    TOTAL

Tail merging for MachO was originally removed in 7c269db. The previous implementation used StringTableBuilder, but that was removed in 4308f03 to ensure deduplicated strings are aligned correctly. This implementation ensures that tail merged strings are also aligned correctly.

Special thanks to nocchijiang for pointing this out in #158720 (comment).

Depends on #161253.

@ellishg ellishg requested review from SharonXSharon, int3, kyulee-com and thevinster and removed request for thevinster September 29, 2025 23:11
@ellishg ellishg marked this pull request as ready for review September 29, 2025 23:11
@llvmbot
Copy link
Member

llvmbot commented Sep 29, 2025

@llvm/pr-subscribers-lld-macho

Author: Ellis Hoag (ellishg)

Changes

Tail merge cstrings. For example, if we have strings mystring\0 and ring\0, we could place mystring\0 at address 0x1000 and ring\0 at address 0x1004 and have them share the same underlying data.

It turns out that many ObjC method names can be tail merged. For example, error: and doFoo:error:. On a large iOS binary, we saw nearly a 15% size improvement in the __TEXT__objc_methname section and negligible impact on link time.

$ bloaty --domain=vm merged.o.stripped -- base.o.stripped
     VM SIZE
 --------------
   +95% +5.85Ki    [__TEXT]
  -2.4%  -239Ki    __TEXT,__cstring
 -14.5%  -710Ki    __TEXT,__objc_methname
  -1.0%  -944Ki    TOTAL

Tail merging for MachO was originally removed in 7c269db. The previous implementation used StringTableBuilder, but that was removed in 4308f03 to ensure deduplicated strings are aligned correctly. This implementation ensures that tail merged strings are also aligned correctly.

Special thanks to nocchijiang for pointing this out in #158720 (comment).

Depends on #161253.


Full diff: https://github.com/llvm/llvm-project/pull/161262.diff

4 Files Affected:

  • (modified) lld/MachO/SyntheticSections.cpp (+58-1)
  • (modified) lld/docs/ReleaseNotes.rst (+3)
  • (modified) lld/test/MachO/cstring-dedup.s (+1-2)
  • (added) lld/test/MachO/cstring-tailmerge.s (+85)
diff --git a/lld/MachO/SyntheticSections.cpp b/lld/MachO/SyntheticSections.cpp
index 903ba78a27c75..edb90fe7fcbcc 100644
--- a/lld/MachO/SyntheticSections.cpp
+++ b/lld/MachO/SyntheticSections.cpp
@@ -1746,6 +1746,7 @@ void CStringSection::finalizeContents() {
 void DeduplicatedCStringSection::finalizeContents() {
   // Find the largest alignment required for each string.
   DenseMap<CachedHashStringRef, Align> strToAlignment;
+  std::vector<CachedHashStringRef> deduplicatedStrs;
   for (const CStringInputSection *isec : inputs) {
     for (const auto &[i, piece] : llvm::enumerate(isec->pieces)) {
       if (!piece.live)
@@ -1754,17 +1755,57 @@ void DeduplicatedCStringSection::finalizeContents() {
       assert(isec->align != 0);
       auto align = getStringPieceAlignment(isec, piece);
       auto [it, wasInserted] = strToAlignment.try_emplace(s, align);
+      if (wasInserted)
+        deduplicatedStrs.push_back(s);
       if (!wasInserted && it->second < align)
         it->second = align;
     }
   }
 
+  // Like lexigraphical sort, except we read strings in reverse and take the
+  // longest string first
+  // TODO: We could improve performance by implementing our own sort that avoids
+  // comparing characters we know to be the same. See
+  // StringTableBuilder::multikeySort() for details
+  llvm::sort(deduplicatedStrs, [](const auto &left, const auto &right) {
+    for (const auto &[leftChar, rightChar] :
+         llvm::zip(llvm::reverse(left.val()), llvm::reverse(right.val()))) {
+      if (leftChar == rightChar)
+        continue;
+      return leftChar < rightChar;
+    }
+    return left.size() > right.size();
+  });
+  std::optional<CachedHashStringRef> mergeCandidate;
+  DenseMap<CachedHashStringRef, std::pair<CachedHashStringRef, uint64_t>>
+      tailMergeMap;
+  for (auto &s : deduplicatedStrs) {
+    if (!mergeCandidate || !mergeCandidate->val().ends_with(s.val())) {
+      mergeCandidate = s;
+      continue;
+    }
+    uint64_t tailOffset = mergeCandidate->size() - s.size();
+    // TODO: If the tail offset is incompatible with this string's alignment, we
+    // might be able to find another superstring with a compatible tail offset.
+    // The difficulty is how to do this efficiently
+    const auto &align = strToAlignment.at(s);
+    if (!isAligned(align, tailOffset))
+      continue;
+    auto &mergeCandidateAlign = strToAlignment[*mergeCandidate];
+    if (align > mergeCandidateAlign)
+      mergeCandidateAlign = align;
+    tailMergeMap.try_emplace(s, *mergeCandidate, tailOffset);
+  }
+
   // Sort the strings for performance and compression size win, and then
   // assign an offset for each string and save it to the corresponding
   // StringPieces for easy access.
   for (auto &[isec, i] : priorityBuilder.buildCStringPriorities(inputs)) {
     auto &piece = isec->pieces[i];
     auto s = isec->getCachedHashStringRef(i);
+    // Skip tail merged strings until their superstring offsets are resolved
+    if (tailMergeMap.count(s))
+      continue;
     auto [it, wasInserted] = stringOffsetMap.try_emplace(s, /*placeholder*/ 0);
     if (wasInserted) {
       // Avoid computing the offset until we are sure we will need to
@@ -1776,8 +1817,24 @@ void DeduplicatedCStringSection::finalizeContents() {
     // only need to assign the offset.
     piece.outSecOff = it->second;
   }
-  for (CStringInputSection *isec : inputs)
+  for (CStringInputSection *isec : inputs) {
+    for (const auto &[i, piece] : llvm::enumerate(isec->pieces)) {
+      if (!piece.live)
+        continue;
+      auto s = isec->getCachedHashStringRef(i);
+      auto it = tailMergeMap.find(s);
+      if (it == tailMergeMap.end())
+        continue;
+      const auto &[superString, tailOffset] = it->second;
+      assert(superString.val().ends_with(s.val()));
+      assert(!tailMergeMap.count(superString));
+      auto &outSecOff = stringOffsetMap[s];
+      outSecOff = stringOffsetMap.at(superString) + tailOffset;
+      piece.outSecOff = outSecOff;
+      assert(isAligned(strToAlignment.at(s), piece.outSecOff));
+    }
     isec->isFinal = true;
+  }
 }
 
 void DeduplicatedCStringSection::writeTo(uint8_t *buf) const {
diff --git a/lld/docs/ReleaseNotes.rst b/lld/docs/ReleaseNotes.rst
index 6ea1ea0fd6c2f..fa2247d64b690 100644
--- a/lld/docs/ReleaseNotes.rst
+++ b/lld/docs/ReleaseNotes.rst
@@ -44,6 +44,9 @@ MinGW Improvements
 MachO Improvements
 ------------------
 
+* cstrings sections are now tail merged
+  (`#161262 <https://github.com/llvm/llvm-project/pull/161262>`_)
+
 WebAssembly Improvements
 ------------------------
 
diff --git a/lld/test/MachO/cstring-dedup.s b/lld/test/MachO/cstring-dedup.s
index a4b15f26afff0..0a42b3d6fcff3 100644
--- a/lld/test/MachO/cstring-dedup.s
+++ b/lld/test/MachO/cstring-dedup.s
@@ -8,11 +8,10 @@
 # RUN: llvm-objdump --macho --section="__DATA,ptrs" --syms %t/test | FileCheck %s
 # RUN: llvm-readobj --section-headers %t/test | FileCheck %s --check-prefix=HEADER
 
-## Make sure we only have 3 deduplicated strings in __cstring.
+## Make sure we only have 2 deduplicated strings in __cstring.
 # STR: Contents of (__TEXT,__cstring) section
 # STR: {{[[:xdigit:]]+}} foo
 # STR: {{[[:xdigit:]]+}} barbaz
-# STR: {{[[:xdigit:]]+}} {{$}}
 
 ## Make sure both symbol and section relocations point to the right thing.
 # CHECK:      Contents of (__DATA,ptrs) section
diff --git a/lld/test/MachO/cstring-tailmerge.s b/lld/test/MachO/cstring-tailmerge.s
new file mode 100644
index 0000000000000..83d2810a78139
--- /dev/null
+++ b/lld/test/MachO/cstring-tailmerge.s
@@ -0,0 +1,85 @@
+# REQUIRES: x86
+# RUN: rm -rf %t; split-file %s %t
+
+# RUN: sed "s/<ALIGN>/0/g" %t/align.s.template > %t/align-1.s
+# RUN: sed "s/<ALIGN>/1/g" %t/align.s.template > %t/align-2.s
+# RUN: sed "s/<ALIGN>/2/g" %t/align.s.template > %t/align-4.s
+
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/first.s -o %t/first.o
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-1.s -o %t/align-1.o
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-2.s -o %t/align-2.o
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-4.s -o %t/align-4.o
+
+# RUN: %lld -dylib --deduplicate-strings %t/first.o %t/align-1.o -o %t/align-1
+# RUN: llvm-objdump --macho --section="__TEXT,__cstring" --syms %t/align-1 | FileCheck %s --check-prefixes=CHECK,ALIGN1
+
+# RUN: %lld -dylib --deduplicate-strings %t/first.o %t/align-2.o -o %t/align-2
+# RUN: llvm-objdump --macho --section="__TEXT,__cstring" --syms %t/align-2 | FileCheck %s --check-prefixes=CHECK,ALIGN2
+
+# RUN: %lld -dylib --deduplicate-strings %t/first.o %t/align-4.o -o %t/align-4
+# RUN: llvm-objdump --macho --section="__TEXT,__cstring" --syms %t/align-4 | FileCheck %s --check-prefixes=CHECK,ALIGN4
+
+# CHECK: Contents of (__TEXT,__cstring) section
+# CHECK: [[#%.16x,START:]] get awkward offset{{$}}
+
+# ALIGN1: [[#%.16x,START+19]] myotherlongstr{{$}}
+# ALIGN1: [[#%.16x,START+19+15]] otherstr{{$}}
+
+# ALIGN2: [[#%.16x,START+20]] myotherlongstr{{$}}
+# ALIGN2: [[#%.16x,START+20+16]] longstr{{$}}
+# ALIGN2: [[#%.16x,START+20+16+8]] otherstr{{$}}
+# ALIGN2: [[#%.16x,START+20+16+8+10]] str{{$}}
+
+# ALIGN4: [[#%.16x,START+20]] myotherlongstr{{$}}
+# ALIGN4: [[#%.16x,START+20+16]] otherlongstr{{$}}
+# ALIGN4: [[#%.16x,START+20+16+16]] longstr{{$}}
+# ALIGN4: [[#%.16x,START+20+16+16+8]] otherstr{{$}}
+# ALIGN4: [[#%.16x,START+20+16+16+8+12]] str{{$}}
+
+# CHECK: SYMBOL TABLE:
+
+# ALIGN1: [[#%.16x,START+19]] l     O __TEXT,__cstring _myotherlongstr
+# ALIGN1: [[#%.16x,START+21]] l     O __TEXT,__cstring _otherlongstr
+# ALIGN1: [[#%.16x,START+26]] l     O __TEXT,__cstring _longstr
+# ALIGN1: [[#%.16x,START+34]] l     O __TEXT,__cstring _otherstr
+# ALIGN1: [[#%.16x,START+39]] l     O __TEXT,__cstring _str
+
+# ALIGN2: [[#%.16x,START+20]] l     O __TEXT,__cstring _myotherlongstr
+# ALIGN2: [[#%.16x,START+20+2]] l     O __TEXT,__cstring _otherlongstr
+# ALIGN2: [[#%.16x,START+20+16]] l     O __TEXT,__cstring _longstr
+# ALIGN2: [[#%.16x,START+20+16+8]] l     O __TEXT,__cstring _otherstr
+# ALIGN2: [[#%.16x,START+20+16+8+10]] l     O __TEXT,__cstring _str
+
+# ALIGN4: [[#%.16x,START+20]] l     O __TEXT,__cstring _myotherlongstr
+# ALIGN4: [[#%.16x,START+20+16]] l     O __TEXT,__cstring _otherlongstr
+# ALIGN4: [[#%.16x,START+20+16+16]] l     O __TEXT,__cstring _longstr
+# ALIGN4: [[#%.16x,START+20+16+16+8]] l     O __TEXT,__cstring _otherstr
+# ALIGN4: [[#%.16x,START+20+16+16+8+12]] l     O __TEXT,__cstring _str
+
+#--- first.s
+.cstring
+.p2align 2
+.asciz "get awkward offset"  # length = 19
+
+#--- align.s.template
+.cstring
+
+.p2align <ALIGN>
+  _myotherlongstr:
+.asciz "myotherlongstr"      # length = 15
+
+.p2align <ALIGN>
+  _otherlongstr:
+.asciz   "otherlongstr"      # length = 13, tail offset = 2
+
+.p2align <ALIGN>
+  _longstr:
+.asciz        "longstr"      # length = 8, tail offset = 7
+
+.p2align <ALIGN>
+  _otherstr:
+.asciz       "otherstr"      # length = 9
+
+.p2align <ALIGN>
+  _str:
+.asciz            "str"      # length = 4, tail offset = 5

@llvmbot
Copy link
Member

llvmbot commented Sep 29, 2025

@llvm/pr-subscribers-lld

Author: Ellis Hoag (ellishg)

Changes

Tail merge cstrings. For example, if we have strings mystring\0 and ring\0, we could place mystring\0 at address 0x1000 and ring\0 at address 0x1004 and have them share the same underlying data.

It turns out that many ObjC method names can be tail merged. For example, error: and doFoo:error:. On a large iOS binary, we saw nearly a 15% size improvement in the __TEXT__objc_methname section and negligible impact on link time.

$ bloaty --domain=vm merged.o.stripped -- base.o.stripped
     VM SIZE
 --------------
   +95% +5.85Ki    [__TEXT]
  -2.4%  -239Ki    __TEXT,__cstring
 -14.5%  -710Ki    __TEXT,__objc_methname
  -1.0%  -944Ki    TOTAL

Tail merging for MachO was originally removed in 7c269db. The previous implementation used StringTableBuilder, but that was removed in 4308f03 to ensure deduplicated strings are aligned correctly. This implementation ensures that tail merged strings are also aligned correctly.

Special thanks to nocchijiang for pointing this out in #158720 (comment).

Depends on #161253.


Full diff: https://github.com/llvm/llvm-project/pull/161262.diff

4 Files Affected:

  • (modified) lld/MachO/SyntheticSections.cpp (+58-1)
  • (modified) lld/docs/ReleaseNotes.rst (+3)
  • (modified) lld/test/MachO/cstring-dedup.s (+1-2)
  • (added) lld/test/MachO/cstring-tailmerge.s (+85)
diff --git a/lld/MachO/SyntheticSections.cpp b/lld/MachO/SyntheticSections.cpp
index 903ba78a27c75..edb90fe7fcbcc 100644
--- a/lld/MachO/SyntheticSections.cpp
+++ b/lld/MachO/SyntheticSections.cpp
@@ -1746,6 +1746,7 @@ void CStringSection::finalizeContents() {
 void DeduplicatedCStringSection::finalizeContents() {
   // Find the largest alignment required for each string.
   DenseMap<CachedHashStringRef, Align> strToAlignment;
+  std::vector<CachedHashStringRef> deduplicatedStrs;
   for (const CStringInputSection *isec : inputs) {
     for (const auto &[i, piece] : llvm::enumerate(isec->pieces)) {
       if (!piece.live)
@@ -1754,17 +1755,57 @@ void DeduplicatedCStringSection::finalizeContents() {
       assert(isec->align != 0);
       auto align = getStringPieceAlignment(isec, piece);
       auto [it, wasInserted] = strToAlignment.try_emplace(s, align);
+      if (wasInserted)
+        deduplicatedStrs.push_back(s);
       if (!wasInserted && it->second < align)
         it->second = align;
     }
   }
 
+  // Like lexigraphical sort, except we read strings in reverse and take the
+  // longest string first
+  // TODO: We could improve performance by implementing our own sort that avoids
+  // comparing characters we know to be the same. See
+  // StringTableBuilder::multikeySort() for details
+  llvm::sort(deduplicatedStrs, [](const auto &left, const auto &right) {
+    for (const auto &[leftChar, rightChar] :
+         llvm::zip(llvm::reverse(left.val()), llvm::reverse(right.val()))) {
+      if (leftChar == rightChar)
+        continue;
+      return leftChar < rightChar;
+    }
+    return left.size() > right.size();
+  });
+  std::optional<CachedHashStringRef> mergeCandidate;
+  DenseMap<CachedHashStringRef, std::pair<CachedHashStringRef, uint64_t>>
+      tailMergeMap;
+  for (auto &s : deduplicatedStrs) {
+    if (!mergeCandidate || !mergeCandidate->val().ends_with(s.val())) {
+      mergeCandidate = s;
+      continue;
+    }
+    uint64_t tailOffset = mergeCandidate->size() - s.size();
+    // TODO: If the tail offset is incompatible with this string's alignment, we
+    // might be able to find another superstring with a compatible tail offset.
+    // The difficulty is how to do this efficiently
+    const auto &align = strToAlignment.at(s);
+    if (!isAligned(align, tailOffset))
+      continue;
+    auto &mergeCandidateAlign = strToAlignment[*mergeCandidate];
+    if (align > mergeCandidateAlign)
+      mergeCandidateAlign = align;
+    tailMergeMap.try_emplace(s, *mergeCandidate, tailOffset);
+  }
+
   // Sort the strings for performance and compression size win, and then
   // assign an offset for each string and save it to the corresponding
   // StringPieces for easy access.
   for (auto &[isec, i] : priorityBuilder.buildCStringPriorities(inputs)) {
     auto &piece = isec->pieces[i];
     auto s = isec->getCachedHashStringRef(i);
+    // Skip tail merged strings until their superstring offsets are resolved
+    if (tailMergeMap.count(s))
+      continue;
     auto [it, wasInserted] = stringOffsetMap.try_emplace(s, /*placeholder*/ 0);
     if (wasInserted) {
       // Avoid computing the offset until we are sure we will need to
@@ -1776,8 +1817,24 @@ void DeduplicatedCStringSection::finalizeContents() {
     // only need to assign the offset.
     piece.outSecOff = it->second;
   }
-  for (CStringInputSection *isec : inputs)
+  for (CStringInputSection *isec : inputs) {
+    for (const auto &[i, piece] : llvm::enumerate(isec->pieces)) {
+      if (!piece.live)
+        continue;
+      auto s = isec->getCachedHashStringRef(i);
+      auto it = tailMergeMap.find(s);
+      if (it == tailMergeMap.end())
+        continue;
+      const auto &[superString, tailOffset] = it->second;
+      assert(superString.val().ends_with(s.val()));
+      assert(!tailMergeMap.count(superString));
+      auto &outSecOff = stringOffsetMap[s];
+      outSecOff = stringOffsetMap.at(superString) + tailOffset;
+      piece.outSecOff = outSecOff;
+      assert(isAligned(strToAlignment.at(s), piece.outSecOff));
+    }
     isec->isFinal = true;
+  }
 }
 
 void DeduplicatedCStringSection::writeTo(uint8_t *buf) const {
diff --git a/lld/docs/ReleaseNotes.rst b/lld/docs/ReleaseNotes.rst
index 6ea1ea0fd6c2f..fa2247d64b690 100644
--- a/lld/docs/ReleaseNotes.rst
+++ b/lld/docs/ReleaseNotes.rst
@@ -44,6 +44,9 @@ MinGW Improvements
 MachO Improvements
 ------------------
 
+* cstrings sections are now tail merged
+  (`#161262 <https://github.com/llvm/llvm-project/pull/161262>`_)
+
 WebAssembly Improvements
 ------------------------
 
diff --git a/lld/test/MachO/cstring-dedup.s b/lld/test/MachO/cstring-dedup.s
index a4b15f26afff0..0a42b3d6fcff3 100644
--- a/lld/test/MachO/cstring-dedup.s
+++ b/lld/test/MachO/cstring-dedup.s
@@ -8,11 +8,10 @@
 # RUN: llvm-objdump --macho --section="__DATA,ptrs" --syms %t/test | FileCheck %s
 # RUN: llvm-readobj --section-headers %t/test | FileCheck %s --check-prefix=HEADER
 
-## Make sure we only have 3 deduplicated strings in __cstring.
+## Make sure we only have 2 deduplicated strings in __cstring.
 # STR: Contents of (__TEXT,__cstring) section
 # STR: {{[[:xdigit:]]+}} foo
 # STR: {{[[:xdigit:]]+}} barbaz
-# STR: {{[[:xdigit:]]+}} {{$}}
 
 ## Make sure both symbol and section relocations point to the right thing.
 # CHECK:      Contents of (__DATA,ptrs) section
diff --git a/lld/test/MachO/cstring-tailmerge.s b/lld/test/MachO/cstring-tailmerge.s
new file mode 100644
index 0000000000000..83d2810a78139
--- /dev/null
+++ b/lld/test/MachO/cstring-tailmerge.s
@@ -0,0 +1,85 @@
+# REQUIRES: x86
+# RUN: rm -rf %t; split-file %s %t
+
+# RUN: sed "s/<ALIGN>/0/g" %t/align.s.template > %t/align-1.s
+# RUN: sed "s/<ALIGN>/1/g" %t/align.s.template > %t/align-2.s
+# RUN: sed "s/<ALIGN>/2/g" %t/align.s.template > %t/align-4.s
+
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/first.s -o %t/first.o
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-1.s -o %t/align-1.o
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-2.s -o %t/align-2.o
+# RUN: llvm-mc -filetype=obj -triple=x86_64-apple-darwin %t/align-4.s -o %t/align-4.o
+
+# RUN: %lld -dylib --deduplicate-strings %t/first.o %t/align-1.o -o %t/align-1
+# RUN: llvm-objdump --macho --section="__TEXT,__cstring" --syms %t/align-1 | FileCheck %s --check-prefixes=CHECK,ALIGN1
+
+# RUN: %lld -dylib --deduplicate-strings %t/first.o %t/align-2.o -o %t/align-2
+# RUN: llvm-objdump --macho --section="__TEXT,__cstring" --syms %t/align-2 | FileCheck %s --check-prefixes=CHECK,ALIGN2
+
+# RUN: %lld -dylib --deduplicate-strings %t/first.o %t/align-4.o -o %t/align-4
+# RUN: llvm-objdump --macho --section="__TEXT,__cstring" --syms %t/align-4 | FileCheck %s --check-prefixes=CHECK,ALIGN4
+
+# CHECK: Contents of (__TEXT,__cstring) section
+# CHECK: [[#%.16x,START:]] get awkward offset{{$}}
+
+# ALIGN1: [[#%.16x,START+19]] myotherlongstr{{$}}
+# ALIGN1: [[#%.16x,START+19+15]] otherstr{{$}}
+
+# ALIGN2: [[#%.16x,START+20]] myotherlongstr{{$}}
+# ALIGN2: [[#%.16x,START+20+16]] longstr{{$}}
+# ALIGN2: [[#%.16x,START+20+16+8]] otherstr{{$}}
+# ALIGN2: [[#%.16x,START+20+16+8+10]] str{{$}}
+
+# ALIGN4: [[#%.16x,START+20]] myotherlongstr{{$}}
+# ALIGN4: [[#%.16x,START+20+16]] otherlongstr{{$}}
+# ALIGN4: [[#%.16x,START+20+16+16]] longstr{{$}}
+# ALIGN4: [[#%.16x,START+20+16+16+8]] otherstr{{$}}
+# ALIGN4: [[#%.16x,START+20+16+16+8+12]] str{{$}}
+
+# CHECK: SYMBOL TABLE:
+
+# ALIGN1: [[#%.16x,START+19]] l     O __TEXT,__cstring _myotherlongstr
+# ALIGN1: [[#%.16x,START+21]] l     O __TEXT,__cstring _otherlongstr
+# ALIGN1: [[#%.16x,START+26]] l     O __TEXT,__cstring _longstr
+# ALIGN1: [[#%.16x,START+34]] l     O __TEXT,__cstring _otherstr
+# ALIGN1: [[#%.16x,START+39]] l     O __TEXT,__cstring _str
+
+# ALIGN2: [[#%.16x,START+20]] l     O __TEXT,__cstring _myotherlongstr
+# ALIGN2: [[#%.16x,START+20+2]] l     O __TEXT,__cstring _otherlongstr
+# ALIGN2: [[#%.16x,START+20+16]] l     O __TEXT,__cstring _longstr
+# ALIGN2: [[#%.16x,START+20+16+8]] l     O __TEXT,__cstring _otherstr
+# ALIGN2: [[#%.16x,START+20+16+8+10]] l     O __TEXT,__cstring _str
+
+# ALIGN4: [[#%.16x,START+20]] l     O __TEXT,__cstring _myotherlongstr
+# ALIGN4: [[#%.16x,START+20+16]] l     O __TEXT,__cstring _otherlongstr
+# ALIGN4: [[#%.16x,START+20+16+16]] l     O __TEXT,__cstring _longstr
+# ALIGN4: [[#%.16x,START+20+16+16+8]] l     O __TEXT,__cstring _otherstr
+# ALIGN4: [[#%.16x,START+20+16+16+8+12]] l     O __TEXT,__cstring _str
+
+#--- first.s
+.cstring
+.p2align 2
+.asciz "get awkward offset"  # length = 19
+
+#--- align.s.template
+.cstring
+
+.p2align <ALIGN>
+  _myotherlongstr:
+.asciz "myotherlongstr"      # length = 15
+
+.p2align <ALIGN>
+  _otherlongstr:
+.asciz   "otherlongstr"      # length = 13, tail offset = 2
+
+.p2align <ALIGN>
+  _longstr:
+.asciz        "longstr"      # length = 8, tail offset = 7
+
+.p2align <ALIGN>
+  _otherstr:
+.asciz       "otherstr"      # length = 9
+
+.p2align <ALIGN>
+  _str:
+.asciz            "str"      # length = 4, tail offset = 5

@ellishg ellishg force-pushed the users/ellishg/lld-align branch from 4e0bde2 to 72bc6a8 Compare September 30, 2025 16:49
Base automatically changed from users/ellishg/lld-align to main September 30, 2025 16:57
Copy link
Contributor

@kyulee-com kyulee-com left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work!
Overall it looks good to me

for (CStringInputSection *isec : inputs)
for (CStringInputSection *isec : inputs) {
for (const auto &[i, piece] : llvm::enumerate(isec->pieces)) {
if (!piece.live)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we've already skipped this for tailMergeMap in the prior loop, we can drop this check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. If the string is in tailMergeMap then it will be live. However, we can skip the lookup if the string is dead, so this is more for performance than correctness.

piece.outSecOff = it->second;
}
for (CStringInputSection *isec : inputs)
for (CStringInputSection *isec : inputs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comment on this loop? It appears the previous loop layouts the substrings while the loop resolves the substring offsets.

// might be able to find another superstring with a compatible tail offset.
// The difficulty is how to do this efficiently
const auto &align = strToAlignment.at(s);
if (!isAligned(align, tailOffset))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit confused initially. Because the alignment of superstring will be promoted by the following code, here we can just check if the tailOffset is aligned and bail-out the failing case early.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isAligned(align, tailOffset) check assumes that the superstring is laid out at address zero. If this check fails, it's impossible to tail merge with this candidate correctly.

But if the check passes, then we also need to make sure the superstring does not have a weaker alignment than the substring.

Suppose the superstring is at address 0x1004 and the substring could tail merge at address 0x1004 + 8, but requires an alignment of 8. That would break alignment, so instead we need to force the superstring to have alignment 8 so it is laid out at address 0x1008.

@ellishg
Copy link
Contributor Author

ellishg commented Oct 1, 2025

Thanks for the accept! I think I will guard this behind a flag. One reason is for safety, but it also impacts the layout order which could impact performance. The order of the tail merged strings will effectively be ignored, and I want to think about how I could mitigate that.

@ellishg
Copy link
Contributor Author

ellishg commented Oct 1, 2025

I've added the flag --tail-merge-strings. Once I've verified this is safe and doesn't regress performance, I'd like to eventually turn this on by default. Although we may need an RFC for that.

@ellishg ellishg enabled auto-merge (squash) October 3, 2025 16:31
@ellishg ellishg merged commit d0e9890 into main Oct 3, 2025
8 of 9 checks passed
@ellishg ellishg deleted the users/ellishg/lld-macho-tail-merge branch October 3, 2025 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants