Skip to content

Conversation

@jcranmer-intel
Copy link
Contributor

@jcranmer-intel jcranmer-intel commented Oct 8, 2025

This fixes issue 162148.

Common symbols are intended to have only a single version of the data present in the final executable. The MSVC linker is able to successfully deduplicate these chunks. If you have an application with a large number of translation units with a large block of common data (this is possible, for example, with Fortran code), then failing to deduplicate these chunks can make the data size so large that the resulting executable fails to load.

The logic in this patch doesn't catch all of the potential cases for deduplication, but it should catch the most common ones.

@llvmbot
Copy link
Member

llvmbot commented Oct 8, 2025

@llvm/pr-subscribers-platform-windows

Author: Joshua Cranmer (jcranmer-intel)

Changes

This fixes issue 162148.

Common symbols are intended to have only a single version of the data present in the final executable. The MSVC linker is able to successfully deduplicate these chunks. If you have an application with a large number of translation units with a large block of common data (this is possible, for example, with Fortran code), then failing to deduplicate these chunks can make the data size so large that the resulting executable fails to load.

The logic in this patch doesn't catch all of the potential cases for deduplication, but it should catch the most common ones.


Full diff: https://github.com/llvm/llvm-project/pull/162553.diff

4 Files Affected:

  • (modified) lld/COFF/Chunks.cpp (+1-1)
  • (modified) lld/COFF/Chunks.h (+2)
  • (modified) lld/COFF/Symbols.h (+2)
  • (modified) lld/COFF/Writer.cpp (+4)
diff --git a/lld/COFF/Chunks.cpp b/lld/COFF/Chunks.cpp
index ff3c89884c24d..d752a5bef7594 100644
--- a/lld/COFF/Chunks.cpp
+++ b/lld/COFF/Chunks.cpp
@@ -773,7 +773,7 @@ uint32_t SectionChunk::getSectionNumber() const {
   return s.getIndex() + 1;
 }
 
-CommonChunk::CommonChunk(const COFFSymbolRef s) : sym(s) {
+CommonChunk::CommonChunk(const COFFSymbolRef s) : active(false), sym(s) {
   // The value of a common symbol is its size. Align all common symbols smaller
   // than 32 bytes naturally, i.e. round the size up to the next power of two.
   // This is what MSVC link.exe does.
diff --git a/lld/COFF/Chunks.h b/lld/COFF/Chunks.h
index 7ba58e336451f..bf05d547f9c88 100644
--- a/lld/COFF/Chunks.h
+++ b/lld/COFF/Chunks.h
@@ -520,6 +520,8 @@ class CommonChunk : public NonSectionChunk {
   uint32_t getOutputCharacteristics() const override;
   StringRef getSectionName() const override { return ".bss"; }
 
+  bool active;
+
 private:
   const COFFSymbolRef sym;
 };
diff --git a/lld/COFF/Symbols.h b/lld/COFF/Symbols.h
index 465d4df52c630..e166329a66bdf 100644
--- a/lld/COFF/Symbols.h
+++ b/lld/COFF/Symbols.h
@@ -233,6 +233,8 @@ class DefinedCommon : public DefinedCOFF {
                 CommonChunk *c = nullptr)
       : DefinedCOFF(DefinedCommonKind, f, n, s), data(c), size(size) {
     this->isExternal = true;
+    if (c)
+      c->active = true;
   }
 
   static bool classof(const Symbol *s) {
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index 3d95d219a493c..e365eb140f52b 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -1093,6 +1093,10 @@ void Writer::createSections() {
         sc->printDiscardedMessage();
       continue;
     }
+    if (auto *cc = dyn_cast<CommonChunk>(c)) {
+      if (!cc->active)
+        continue;
+    }
     StringRef name = c->getSectionName();
     if (shouldStripSectionSuffix(sc, name, ctx.config.mingw))
       name = name.split('$').first;

@llvmbot
Copy link
Member

llvmbot commented Oct 8, 2025

@llvm/pr-subscribers-lld

Author: Joshua Cranmer (jcranmer-intel)

Changes

This fixes issue 162148.

Common symbols are intended to have only a single version of the data present in the final executable. The MSVC linker is able to successfully deduplicate these chunks. If you have an application with a large number of translation units with a large block of common data (this is possible, for example, with Fortran code), then failing to deduplicate these chunks can make the data size so large that the resulting executable fails to load.

The logic in this patch doesn't catch all of the potential cases for deduplication, but it should catch the most common ones.


Full diff: https://github.com/llvm/llvm-project/pull/162553.diff

4 Files Affected:

  • (modified) lld/COFF/Chunks.cpp (+1-1)
  • (modified) lld/COFF/Chunks.h (+2)
  • (modified) lld/COFF/Symbols.h (+2)
  • (modified) lld/COFF/Writer.cpp (+4)
diff --git a/lld/COFF/Chunks.cpp b/lld/COFF/Chunks.cpp
index ff3c89884c24d..d752a5bef7594 100644
--- a/lld/COFF/Chunks.cpp
+++ b/lld/COFF/Chunks.cpp
@@ -773,7 +773,7 @@ uint32_t SectionChunk::getSectionNumber() const {
   return s.getIndex() + 1;
 }
 
-CommonChunk::CommonChunk(const COFFSymbolRef s) : sym(s) {
+CommonChunk::CommonChunk(const COFFSymbolRef s) : active(false), sym(s) {
   // The value of a common symbol is its size. Align all common symbols smaller
   // than 32 bytes naturally, i.e. round the size up to the next power of two.
   // This is what MSVC link.exe does.
diff --git a/lld/COFF/Chunks.h b/lld/COFF/Chunks.h
index 7ba58e336451f..bf05d547f9c88 100644
--- a/lld/COFF/Chunks.h
+++ b/lld/COFF/Chunks.h
@@ -520,6 +520,8 @@ class CommonChunk : public NonSectionChunk {
   uint32_t getOutputCharacteristics() const override;
   StringRef getSectionName() const override { return ".bss"; }
 
+  bool active;
+
 private:
   const COFFSymbolRef sym;
 };
diff --git a/lld/COFF/Symbols.h b/lld/COFF/Symbols.h
index 465d4df52c630..e166329a66bdf 100644
--- a/lld/COFF/Symbols.h
+++ b/lld/COFF/Symbols.h
@@ -233,6 +233,8 @@ class DefinedCommon : public DefinedCOFF {
                 CommonChunk *c = nullptr)
       : DefinedCOFF(DefinedCommonKind, f, n, s), data(c), size(size) {
     this->isExternal = true;
+    if (c)
+      c->active = true;
   }
 
   static bool classof(const Symbol *s) {
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index 3d95d219a493c..e365eb140f52b 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -1093,6 +1093,10 @@ void Writer::createSections() {
         sc->printDiscardedMessage();
       continue;
     }
+    if (auto *cc = dyn_cast<CommonChunk>(c)) {
+      if (!cc->active)
+        continue;
+    }
     StringRef name = c->getSectionName();
     if (shouldStripSectionSuffix(sc, name, ctx.config.mingw))
       name = name.split('$').first;

@llvmbot
Copy link
Member

llvmbot commented Oct 8, 2025

@llvm/pr-subscribers-lld-coff

Author: Joshua Cranmer (jcranmer-intel)

Changes

This fixes issue 162148.

Common symbols are intended to have only a single version of the data present in the final executable. The MSVC linker is able to successfully deduplicate these chunks. If you have an application with a large number of translation units with a large block of common data (this is possible, for example, with Fortran code), then failing to deduplicate these chunks can make the data size so large that the resulting executable fails to load.

The logic in this patch doesn't catch all of the potential cases for deduplication, but it should catch the most common ones.


Full diff: https://github.com/llvm/llvm-project/pull/162553.diff

4 Files Affected:

  • (modified) lld/COFF/Chunks.cpp (+1-1)
  • (modified) lld/COFF/Chunks.h (+2)
  • (modified) lld/COFF/Symbols.h (+2)
  • (modified) lld/COFF/Writer.cpp (+4)
diff --git a/lld/COFF/Chunks.cpp b/lld/COFF/Chunks.cpp
index ff3c89884c24d..d752a5bef7594 100644
--- a/lld/COFF/Chunks.cpp
+++ b/lld/COFF/Chunks.cpp
@@ -773,7 +773,7 @@ uint32_t SectionChunk::getSectionNumber() const {
   return s.getIndex() + 1;
 }
 
-CommonChunk::CommonChunk(const COFFSymbolRef s) : sym(s) {
+CommonChunk::CommonChunk(const COFFSymbolRef s) : active(false), sym(s) {
   // The value of a common symbol is its size. Align all common symbols smaller
   // than 32 bytes naturally, i.e. round the size up to the next power of two.
   // This is what MSVC link.exe does.
diff --git a/lld/COFF/Chunks.h b/lld/COFF/Chunks.h
index 7ba58e336451f..bf05d547f9c88 100644
--- a/lld/COFF/Chunks.h
+++ b/lld/COFF/Chunks.h
@@ -520,6 +520,8 @@ class CommonChunk : public NonSectionChunk {
   uint32_t getOutputCharacteristics() const override;
   StringRef getSectionName() const override { return ".bss"; }
 
+  bool active;
+
 private:
   const COFFSymbolRef sym;
 };
diff --git a/lld/COFF/Symbols.h b/lld/COFF/Symbols.h
index 465d4df52c630..e166329a66bdf 100644
--- a/lld/COFF/Symbols.h
+++ b/lld/COFF/Symbols.h
@@ -233,6 +233,8 @@ class DefinedCommon : public DefinedCOFF {
                 CommonChunk *c = nullptr)
       : DefinedCOFF(DefinedCommonKind, f, n, s), data(c), size(size) {
     this->isExternal = true;
+    if (c)
+      c->active = true;
   }
 
   static bool classof(const Symbol *s) {
diff --git a/lld/COFF/Writer.cpp b/lld/COFF/Writer.cpp
index 3d95d219a493c..e365eb140f52b 100644
--- a/lld/COFF/Writer.cpp
+++ b/lld/COFF/Writer.cpp
@@ -1093,6 +1093,10 @@ void Writer::createSections() {
         sc->printDiscardedMessage();
       continue;
     }
+    if (auto *cc = dyn_cast<CommonChunk>(c)) {
+      if (!cc->active)
+        continue;
+    }
     StringRef name = c->getSectionName();
     if (shouldStripSectionSuffix(sc, name, ctx.config.mingw))
       name = name.split('$').first;

@jcranmer-intel
Copy link
Contributor Author

I apologize for the lack of a test with this change--I'm not exactly sure how to go about writing an LLD test for linking together multiple objects, and would like some guidance on how to do so. (I do have some source test files from #162148, though).

@efriedma-quic
Copy link
Collaborator

I'm not exactly sure how to go about writing an LLD test for linking together multiple objects

If you need multiple input files, you can use split-file, or stick extra files in the Inputs/ directory. There are a lot of examples of both.

This fixes issue 162148.

Common symbols are intended to have only a single version of the data
present in the final executable. The MSVC linker is able to successfully
deduplicate these chunks. If you have an application with a large number
of translation units with a large block of common data (this is
possible, for example, with Fortran code), then failing to deduplicate
these chunks can make the data size so large that the resulting
executable fails to load.

The logic in this patch doesn't catch all of the potential cases for
deduplication, but it should catch the most common ones.
@jcranmer-intel jcranmer-intel force-pushed the lld-common-deduplicate branch from ed74c23 to 543500a Compare October 8, 2025 22:45
@jcranmer-intel
Copy link
Contributor Author

I've included a test based on split-file now. Let me know if you need more comprehensive testing for this change.

@jcranmer-intel
Copy link
Contributor Author

Friendly review ping?

@jcranmer-intel
Copy link
Contributor Author

Friendly review ping for @aganea , @mstorsjo ?
(Sorry, I'm not familiar enough with LLD to know the best people to suggest for review of this code).

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM, just a couple very minor comments on the test.

@@ -0,0 +1,38 @@
; REQUIRES: x86
; RUN: split-file %s %t.dir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does split-file automatically make the output directory if it is missing? Wasn't aware of that.

Still, it may be good style to do a preemptive rm -rf %t.dir before this, in case there's incompatible leftover files from an earlier round.

@@ -0,0 +1,38 @@
; REQUIRES: x86
; RUN: split-file %s %t.dir
; RUN: llc %t.dir/t1.ll -o %t.t1.obj --filetype=obj
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DavidSpickett is strongly of the opinion that mixing %t.dir/... and %t.<suffix> is a bad inconsistency, and if using %t.dir we should be using it for everything. Personally, I don't mind the mix.

; RUN: split-file %s %t.dir
; RUN: llc %t.dir/t1.ll -o %t.t1.obj --filetype=obj
; RUN: llc %t.dir/t2.ll -o %t.t2.obj --filetype=obj
; RUN: lld-link %t.t1.obj %t.t2.obj -entry:main -out:%t.exe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You have an accidental double space here, in %t.t2.obj -entry:main.

Copy link
Member

@mstorsjo mstorsjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@jcranmer-intel jcranmer-intel merged commit 0b01b96 into llvm:main Oct 24, 2025
10 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Oct 24, 2025

LLVM Buildbot has detected a new failure on builder clang-aarch64-sve-vls running on linaro-g3-01 while building lld at step 7 "ninja check 1".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/143/builds/11950

Here is the relevant piece of the build log for the reference
Step 7 (ninja check 1) failure: stage 1 checked (failure)
******************** TEST 'ORC-aarch64-linux :: TestCases/Linux/aarch64/priority-static-initializer.S' FAILED ********************
Exit Code: 2

Command Output (stderr):
--
/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/stage1/./bin/clang    -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta   -c -o /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/stage1/runtimes/runtimes-bins/compiler-rt/test/orc/AARCH64LinuxConfig/TestCases/Linux/aarch64/Output/priority-static-initializer.S.tmp /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/llvm/compiler-rt/test/orc/TestCases/Linux/aarch64/priority-static-initializer.S # RUN: at line 4
+ /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/stage1/./bin/clang -Wthread-safety -Wthread-safety-reference -Wthread-safety-beta -c -o /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/stage1/runtimes/runtimes-bins/compiler-rt/test/orc/AARCH64LinuxConfig/TestCases/Linux/aarch64/Output/priority-static-initializer.S.tmp /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/llvm/compiler-rt/test/orc/TestCases/Linux/aarch64/priority-static-initializer.S
/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/stage1/./bin/llvm-jitlink -orc-runtime=/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/stage1/./lib/../lib/clang/22/lib/aarch64-unknown-linux-gnu/liborc_rt.a /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/stage1/runtimes/runtimes-bins/compiler-rt/test/orc/AARCH64LinuxConfig/TestCases/Linux/aarch64/Output/priority-static-initializer.S.tmp | FileCheck /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/llvm/compiler-rt/test/orc/TestCases/Linux/aarch64/priority-static-initializer.S # RUN: at line 5
+ /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/stage1/./bin/llvm-jitlink -orc-runtime=/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/stage1/./lib/../lib/clang/22/lib/aarch64-unknown-linux-gnu/liborc_rt.a /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/stage1/runtimes/runtimes-bins/compiler-rt/test/orc/AARCH64LinuxConfig/TestCases/Linux/aarch64/Output/priority-static-initializer.S.tmp
+ FileCheck /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/llvm/compiler-rt/test/orc/TestCases/Linux/aarch64/priority-static-initializer.S
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
#0 0x0000adf0abc05ac0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/stage1/./bin/llvm-jitlink+0x2ab5ac0)
#1 0x0000adf0abc03554 llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/clang-aarch64-sve-vls/stage1/./bin/llvm-jitlink+0x2ab3554)
#2 0x0000adf0abc0685c SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
#3 0x0000ff2cb0030968 (linux-vdso.so.1+0x968)
#4 0x0000adf0aba05c04 llvm::orc::AsynchronousSymbolQuery::handleComplete(llvm::orc::ExecutionSession&)::RunQueryCompleteTask::run() Core.cpp:0:0
#5 0x0000adf0abadedbc std::thread::_State_impl<std::thread::_Invoker<std::tuple<llvm::orc::DynamicThreadPoolTaskDispatcher::dispatch(std::unique_ptr<llvm::orc::Task, std::default_delete<llvm::orc::Task>>)::$_0>>>::_M_run() TaskDispatch.cpp:0:0
#6 0x0000ff2cafdd29cc (/lib/aarch64-linux-gnu/libstdc++.so.6+0xd29cc)
#7 0x0000ff2cafba0398 start_thread ./nptl/pthread_create.c:442:8
#8 0x0000ff2cafc09e9c ./misc/../sysdeps/unix/sysv/linux/aarch64/clone.S:82:0
FileCheck error: '<stdin>' is empty.
FileCheck command line:  FileCheck /home/tcwg-buildbot/worker/clang-aarch64-sve-vls/llvm/compiler-rt/test/orc/TestCases/Linux/aarch64/priority-static-initializer.S

--

********************


@rnk
Copy link
Collaborator

rnk commented Oct 24, 2025

Thanks for the fix, it's somewhat hard for me to believe that LLD was just keeping common symbols around since forever. I guess most of our usage has been focused on C++ usages, with very few common symbols.

dvbuka pushed a commit to dvbuka/llvm-project that referenced this pull request Oct 27, 2025
…162553)

This fixes [issue
162148](llvm#162148).

Common symbols are intended to have only a single version of the data
present in the final executable. The MSVC linker is able to successfully
deduplicate these chunks. If you have an application with a large number
of translation units with a large block of common data (this is
possible, for example, with Fortran code), then failing to deduplicate
these chunks can make the data size so large that the resulting
executable fails to load.

The logic in this patch doesn't catch all of the potential cases for
deduplication, but it should catch the most common ones.
Lukacma pushed a commit to Lukacma/llvm-project that referenced this pull request Oct 29, 2025
…162553)

This fixes [issue
162148](llvm#162148).

Common symbols are intended to have only a single version of the data
present in the final executable. The MSVC linker is able to successfully
deduplicate these chunks. If you have an application with a large number
of translation units with a large block of common data (this is
possible, for example, with Fortran code), then failing to deduplicate
these chunks can make the data size so large that the resulting
executable fails to load.

The logic in this patch doesn't catch all of the potential cases for
deduplication, but it should catch the most common ones.
aokblast pushed a commit to aokblast/llvm-project that referenced this pull request Oct 30, 2025
…162553)

This fixes [issue
162148](llvm#162148).

Common symbols are intended to have only a single version of the data
present in the final executable. The MSVC linker is able to successfully
deduplicate these chunks. If you have an application with a large number
of translation units with a large block of common data (this is
possible, for example, with Fortran code), then failing to deduplicate
these chunks can make the data size so large that the resulting
executable fails to load.

The logic in this patch doesn't catch all of the potential cases for
deduplication, but it should catch the most common ones.
SquallATF pushed a commit to SquallATF/llvm-project that referenced this pull request Nov 22, 2025
…162553)

This fixes [issue
162148](llvm#162148).

Common symbols are intended to have only a single version of the data
present in the final executable. The MSVC linker is able to successfully
deduplicate these chunks. If you have an application with a large number
of translation units with a large block of common data (this is
possible, for example, with Fortran code), then failing to deduplicate
these chunks can make the data size so large that the resulting
executable fails to load.

The logic in this patch doesn't catch all of the potential cases for
deduplication, but it should catch the most common ones.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[lld][COFF] LLD does not deduplicate common symbols link link.exe does

7 participants