Conversation

imciner2
Member

This is the toolchain part of the build-id implementation. The linker should now embed a build-id in the libraries/executables, using a build-id style that also preserves reproducibility.

ref: JuliaPackaging/Yggdrasil#11013, JuliaPackaging/BinaryBuilder.jl#1272

@imciner2
Member Author

I forgot that build-id is an ELF-only feature, so we can only add it on Linux and FreeBSD.

@giordano
Member

Can we have some tests?

@giordano
Member

Side note, I just realised we may not have reproducibility tests in this repo. We have some in BinaryBuilder: https://github.com/search?q=repo%3AJuliaPackaging%2FBinaryBuilder.jl+reproducibility+language%3AJulia+path%3A%2F%5Etest%5C%2F%2F&type=code

@imciner2
Member Author

I've blind-coded a test for this based on what I think the output should be. Let's see if it passes. It's a bit awkward because we don't have the actual ELF file to query here, so I had to run readelf inside the script and parse its output.

Co-authored-by: Mosè Giordano <765740+giordano@users.noreply.github.com>
@giordano
Member

Can ObjectFile.jl read the notes?

@imciner2
Member Author

> Can ObjectFile.jl read the notes?

It can, but I didn't see any place where we can get the actual compiled file out in these tests already (or at least I didn't see any tests doing it), and we didn't have ObjectFile.jl in the deps already, so I didn't dive deeper into trying to make that work.

Using ObjectFile.jl, this is pseudo-code to test if the build-id section exists (and can also form the basis for an Audit pass in BinaryBuilder.jl):

using ObjectFile

# `lib_path` stands in for the path to the compiled library
l = open(lib_path)
obj_handles = readmeta(l)
obj = first(obj_handles)
sec = ELFSections(obj)

# This will be true if there is no build-id section
findfirst(s -> section_name(s) == ".note.gnu.build-id", sec) === nothing

@imciner2
Member Author

And I guess readelf in the sandbox is old enough that it doesn't decode the name of the section, so we have to match on the description NT_GNU_BUILD_ID instead (which was in the output).
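
The check described above could be sketched roughly as follows. This is only an illustration, not the test added in this PR: `has_build_id_note` is a hypothetical helper, and `libfoo.so` in the usage comment is a placeholder name. It assumes that `readelf -n` prints the note type as `NT_GNU_BUILD_ID`, as reported for the readelf in the sandbox.

```shell
# Hypothetical helper: reads `readelf -n` output on stdin and succeeds
# if a GNU build-id note is listed. It matches on the note description
# rather than the section name, which older readelf may not decode.
has_build_id_note() {
    grep -q 'NT_GNU_BUILD_ID'
}

# Intended use inside the sandbox (library name is a placeholder):
#   readelf -n libfoo.so | has_build_id_note && echo "build-id present"
```

Matching on the description keeps the check independent of how a given readelf version labels the owning section.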

Member

@giordano left a comment

Looks good to me. @vtjnash do you want to have a look?

@imciner2
Member Author

It's not quite as easy on Windows. Apparently, the support for build-id was added in binutils 2.25, which means we can only support the build-id on Windows for GCC >= 5.2.0 and not the default GCC 4.8.5 we use.

@giordano
Member

You should be able to access gcc_version to conditionally set that, right?

@imciner2
Member Author

Yep, that variable should be accessible. I just went down the rabbit hole of also adding a test for the Windows COFF format including the build-id info.
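
For illustration, the gating discussed in this thread (ELF platforms always, Windows only for GCC >= 5.2.0, nothing on macOS) might look like the sketch below. The real logic lives in this repo's Julia code; `version_ge` and `build_id_flags` are hypothetical names, and matching on substrings of the target triplet is an assumption made for the example.

```shell
# Hypothetical helper: succeeds if dotted version $1 >= $2,
# comparing up to three numeric components.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            xi = (i <= n) ? x[i] + 0 : 0
            yi = (i <= m) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Hypothetical helper: prints the build-id linker flag for a target
# triplet ($1) and GCC version ($2), or nothing if unsupported.
build_id_flags() {
    case "$1" in
        *linux*|*freebsd*)
            # ELF platforms: always add the flag
            echo "-Wl,--build-id=sha1" ;;
        *mingw32*)
            # Windows: only with the newer binutils shipped for GCC >= 5.2.0
            if version_ge "$2" 5.2.0; then
                echo "-Wl,--build-id=sha1"
            fi ;;
        *)
            # Other platforms (e.g. macOS): add nothing
            : ;;
    esac
}
```

With these definitions, `build_id_flags i686-w64-mingw32 4.8.5` prints nothing, while the same target with GCC 6.1.0 prints the flag.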

@imciner2
Member Author

OK, after going down yet another rabbit hole to get the build-id tests running on the Windows binaries, this should be good to go now. We will now automatically add a reproducible build-id on all platforms other than macOS (but I think I read somewhere that macOS compilers already provide a build-id, so that shouldn't be a problem).

@giordano added the "runner 🏃‍♀️" and "enhancement" labels on Jul 30, 2025
@giordano merged commit 0c9860c into master on Aug 5, 2025
8 checks passed
@giordano deleted the im/buildid branch on August 5, 2025 15:13
@imciner2
Member Author

imciner2 commented Aug 5, 2025

So I was just thinking... I don't know if this would have covered the Rust, Go or OCaml toolchain binaries. I'll have to look at it a bit later to see if those will also get the build-id from this, or if we will have to inject the linker flag separately into their toolchains.

@giordano
Member

@imciner2
Member Author

Are you comparing reproducibility between two binaries compiled with this option or against a static hash computed earlier? It is expected that there would be a one-time change in binary hashes because of this, because it is adding a new entry into the headers of the binaries that wasn't there before.

@giordano
Member

I'm updating the hashes in JuliaPackaging/BinaryBuilder.jl#1401. If you compile twice with the new option, you get two different binaries, but apparently only on Windows.

@vtjnash
Member

vtjnash commented Aug 13, 2025

On Windows, I'm told that this should appear to be a copy of the CodeView GUID value already in the binary.

@giordano
Member

Reproducer:

julia> BinaryBuilderBase.runshell(Platform("i686", "windows"); preferred_gcc_version=v"6");
sandbox:${WORKSPACE} # echo 'int foo(){ return 42; }' | SUPER_VERBOSE=1 cc -x c -shared - -o libfoo.${dlext} -Wl,--out-implib,libfoo.${dlext}.a
ccache /opt/i686-w64-mingw32/bin/i686-w64-mingw32-gcc -D_GLIBCXX_USE_CXX11_ABI=1 -frandom-seed=0x8ac3fab9 -DWINVER=0x0A00 -D_WIN32_WINNT=0x0A00 -march=pentium4 -mtune=generic -x c -shared - -o libfoo.dll -Wl,--out-implib,libfoo.dll.a -Wl,--no-insert-timestamp -Wl,--build-id=sha1
sandbox:${WORKSPACE} # sha256sum libfoo.dll*
edfd3305aec90e38954c80ef2d3fea1d9a966abf16602a0d157c7cff5c7cfbc1  libfoo.dll
c20ac05a44844b9d3180ac2ba1e206e9a11e247bc6617f4a58cd5ee216991d5c  libfoo.dll.a
sandbox:${WORKSPACE} # echo 'int foo(){ return 42; }' | SUPER_VERBOSE=1 cc -x c -shared - -o libfoo.${dlext} -Wl,--out-implib,libfoo.${dlext}.a
ccache /opt/i686-w64-mingw32/bin/i686-w64-mingw32-gcc -D_GLIBCXX_USE_CXX11_ABI=1 -frandom-seed=0x8ac3fab9 -DWINVER=0x0A00 -D_WIN32_WINNT=0x0A00 -march=pentium4 -mtune=generic -x c -shared - -o libfoo.dll -Wl,--out-implib,libfoo.dll.a -Wl,--no-insert-timestamp -Wl,--build-id=sha1
sandbox:${WORKSPACE} # sha256sum libfoo.dll*
19a31cc606dc96cf7aafb8b2a6929a6cd810c9584f7936a36410266c1e4cac14  libfoo.dll
7eb234fd60bd6d17ef6b5753819d6aac7c5a7762ba690d74284eeffab006e145  libfoo.dll.a

I tried to compare the two DLLs with https://try.diffoscope.org, but it crashed with an OOM. However, this may be specific to the i686 toolchain; I believe the x86_64 one is still reproducible.
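
As a lighter-weight alternative to a full diffoscope run, a byte-level comparison with `cmp -l` can at least show how many bytes differ between the two builds and where. The sketch below is an assumption about how one might do that, not something that was run in this thread; `count_differing_bytes` is a hypothetical helper.

```shell
# Hypothetical helper: prints the number of byte positions at which
# files $1 and $2 differ. `cmp -l` emits one line per differing byte
# (1-based offset plus both values in octal). If the files differ in
# length, only the common prefix is compared.
count_differing_bytes() {
    n=$( { cmp -l "$1" "$2" 2>/dev/null || true; } | wc -l )
    echo $((n))
}

# Example (file names are placeholders for the two builds):
#   count_differing_bytes first/libfoo.dll second/libfoo.dll
```

Running `cmp -l` directly and piping it through `head` would list the first differing offsets, which may help narrow down which header or section changes between builds.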

@imciner2
Member Author

> On Windows I'm told that this should appear to be a copy of the CodeView GUID value already in the binary

Do you know if that is implemented in ld for Windows?

Looking at the current source tree for binutils (https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=ld/ldbuildid.c;h=a4a6dc3c3e9403e72bcd52eaa6dfda39fd912586;hb=HEAD), it seems our options for the build-id style are md5, sha1, xx (XXHash), uuid, and a user-supplied hex string. The UUID that is used is generated by UuidCreate from the RPC library.

Right now, we have it set to sha1, because that was supposed to be reproducible.
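
To make the distinction between the styles concrete, here is a small sketch that classifies them by whether they are deterministic in principle. It assumes the behaviour described in the GNU ld documentation: md5, sha1 and xx hash the output contents, a `0x...` string is a fixed caller-supplied value, and uuid uses random bits. `is_deterministic_build_id_style` is a hypothetical name, and the classification says nothing about the i686 Windows result reported above, where sha1 still produced differing binaries.

```shell
# Hypothetical helper: returns 0 if the build-id style ($1) should be
# deterministic in principle, 1 if it is random, 2 if not recognised.
is_deterministic_build_id_style() {
    case "$1" in
        md5|sha1|xx) return 0 ;;  # hash of the output contents
        0x*)         return 0 ;;  # fixed, caller-supplied hex string
        uuid)        return 1 ;;  # random bits
        *)           return 2 ;;  # unknown to this sketch
    esac
}
```

A content-hash style is only as reproducible as the bytes it hashes, so any other nondeterminism in the linked output would also change the resulting build-id.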
