Compilation artifacts are not reproducible due to everchanging build ID #518

Closed
gbryant-arm opened this issue Sep 20, 2022 · 11 comments
Labels
bug Something isn't working build-process Something related to the Veracruz build process

Comments

@gbryant-arm
Contributor

gbryant-arm commented Sep 20, 2022

Describe the bug
Veracruz binaries (client, server, attestation, runtime manager) contain a build ID in an ELF note section (.note.gnu.build-id). This build ID is a hash determined by rustc and/or the linker.
However, for unknown reasons (a timestamp somewhere? a source of randomness?), the build ID changes every time the crate is cleaned with cargo clean and then rebuilt.
This results in binaries that are functionally equivalent but have different overall hashes, which in turn produces different Docker images.
FWIW, some debug sections (e.g. .debug_info, .debug_abbrev, .debug_loc, .debug_pubtypes) change between builds as well.

To Reproduce

  • Build Veracruz, e.g. make nitro
  • Hash e.g. veracruz-server: sha256sum nitro-host/target/debug/veracruz-server
  • Clean: make clean
  • Build Veracruz again: make nitro
  • Hash veracruz-server: sha256sum nitro-host/target/debug/veracruz-server
  • The two hashes are different because the build ID is different: objdump --section=.note.gnu.build-id --full-contents nitro-host/target/release/veracruz-server
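
The steps above could be scripted roughly as follows. This is only a sketch: it assumes a POSIX shell with sha256sum and mktemp available, and the helper names (file_hash, same_hash, check_reproducible) are made up for illustration and are not part of the Veracruz build.

```shell
# Sketch: build twice from clean and compare the hash of one artifact.
# Helper names are illustrative, not part of the Veracruz build system.

# Print only the SHA-256 digest of a file.
file_hash() {
    sha256sum "$1" | cut -d' ' -f1
}

# Succeed if both files have the same digest.
same_hash() {
    [ "$(file_hash "$1")" = "$(file_hash "$2")" ]
}

# Build, keep a copy of the artifact, clean, rebuild, then compare.
check_reproducible() {
    artifact="$1"
    first="$(mktemp)"
    make nitro && cp "$artifact" "$first" || return 1
    make clean && make nitro || return 1
    if same_hash "$first" "$artifact"; then
        echo "reproducible"
    else
        echo "NOT reproducible"
    fi
    rm -f "$first"
}

# Usage (from the top of the source tree):
#   check_reproducible nitro-host/target/debug/veracruz-server
```

Keeping a copy of the first artifact, rather than only its hash, leaves both binaries around for a closer look if the hashes differ.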

Expected behaviour
It's not clear to me how the build ID is determined, but I would expect it to stay constant between builds as long as the binaries are functionally equivalent and the code doesn't change.

Solutions
Strip build ID from binaries (strip --remove-section=.note.gnu.build-id <binary>) or fix build ID generation

@gbryant-arm gbryant-arm added bug Something isn't working build-process Something related to the Veracruz build process labels Sep 20, 2022
@dreemkiller
Member

I find it hard to imagine that the toolchain doesn't have a way to resolve this, as this is a big barrier to reproducible builds.
However, a quick search doesn't show anything obvious.

@gbryant-arm
Contributor Author

We could either ignore it (by stripping it or setting a default build ID) or fix the cause: it's possible that one of our dependencies is doing something funky that interferes with the build ID somehow.

@egrimley-arm
Contributor

Firstly, I can't reproduce this as described above: I got the same binary after running make clean. However, when I tried with a fresh check-out I did get a different binary, and the only difference between the veracruz-server binaries seemed to be the build-id.

We can suppress generation of the build-id with RUSTFLAGS="-C link-args=-Wl,--build-id=none": I tested that and it seemed to work.

However, it would be interesting to know why the build-id is changing. One of the inputs to the linker must be different each time, even though the change only affects the build-id. If we could discover what that input is, perhaps we could find a better solution than using --build-id=none.

@gbryant-arm
Contributor Author

gbryant-arm commented Sep 22, 2022

Looks like the build ID difference could come, among other things, from sections (.debug_info, .debug_abbrev, .debug_loc, .debug_pubtypes) that get stripped after the build (033a241): I commented out the stripping and am getting two distinct binaries regardless of the build ID. Most diffs seem to be simple byte replacements or insertions here and there.
We don't want to go down a rabbit hole, so how about we just strip the build ID? The changes are minimal. It could affect GDB's behaviour in some specific cases, but again there is no GDB in Nitro. We'll figure something out for the other platforms; right now, Nitro is a target we really want reproducibility for.
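
One way to narrow down which sections differ between two builds is to hash each section separately. The sketch below is an assumption-laden illustration, not what was actually run here: it reuses the objdump flags quoted earlier in this issue, assumes that objdump's hex-dump lines start with a space (so the header lines naming the file can be filtered out), and uses made-up helper names.

```shell
# Sketch: find which ELF sections differ between two builds of one binary.
# Helper names are illustrative; the section list is the one from this thread.

# Hash the hex dump of one section. objdump's dump lines are assumed to start
# with a space, so grep drops the header lines that contain the file name.
section_hash() {
    objdump --section="$2" --full-contents "$1" | grep '^ ' | sha256sum | cut -d' ' -f1
}

# Print one "section hash" line per section of interest.
# A section that is absent hashes as empty input.
section_listing() {
    for s in .note.gnu.build-id .debug_info .debug_abbrev .debug_loc .debug_pubtypes; do
        printf '%s %s\n' "$s" "$(section_hash "$1" "$s")"
    done
}

# Given two listings, print the sections whose hashes differ.
changed_sections() {
    awk 'NR == FNR { h[$1] = $2; next } h[$1] != $2 { print $1 }' "$1" "$2"
}

# Usage (first-build and second-build are placeholder file names):
#   section_listing first-build  > one.txt
#   section_listing second-build > two.txt
#   changed_sections one.txt two.txt
```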

@dreemkiller
Member

Are we using the build id for anything? If not, let's just strip it.

@gbryant-arm
Contributor Author

I don't think we're using it. I'm opening a PR for that

@egrimley-arm
Contributor

Can other people please try make clean ; make nitro and see whether veracruz-server changes? For me it doesn't change. The difference seems to come from something that is not deleted by make clean. What is it?

@egrimley-arm
Contributor

Some careless experiments have given me the impression that serde_cbor might be responsible for the non-reproducibility, or perhaps just the way it's used by nitro-enclave-attestation-document.

@egrimley-arm
Contributor

PR #520 introduced a work-around for this issue, so I'll close it. However, it is still interesting to know where the non-reproducibility comes from.

Non-reproducibility can be reproduced (ha!) with the following trivial program:

[dependencies]
aws-nitro-enclaves-cose = "0.1.0"

fn main() {
    let x: [u8; 0] = [];
    let _ = aws_nitro_enclaves_cose::sign::COSESign1::from_bytes(&x).unwrap();
}

Test by running ( cargo clean ; cargo build ; sha1sum target/debug/prog ) several times; twice is not enough as sometimes you get the same result by luck.
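
Since two runs can match by luck, it may help to loop and count the distinct hashes. A minimal sketch, assuming a POSIX shell and the target/debug/prog path used above; the helper names are made up for illustration:

```shell
# Sketch: rebuild several times from clean and count the distinct hashes.
# Helper names are illustrative only.

# Read "hash  file" lines on stdin and print the number of distinct hashes.
count_distinct() {
    cut -d' ' -f1 | sort -u | wc -l | tr -d ' '
}

# Rebuild N times from clean, printing one sha1sum line per run on stdout.
# Any cargo output on stdout is redirected to stderr to keep the stream clean.
rebuild_hashes() {
    runs="$1"
    i=0
    while [ "$i" -lt "$runs" ]; do
        cargo clean >&2
        cargo build >&2
        sha1sum target/debug/prog
        i=$((i + 1))
    done
}

# Usage: a result greater than 1 means the build was not reproducible.
#   rebuild_hashes 5 | count_distinct
```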

This is not inconsistent with the problem originating in serde_cbor because aws_nitro_enclaves_cose uses serde_cbor.

It seems the non-reproducibility of that particular program goes away if you upgrade the toolchain from 1.56.1 to 1.58.1, but that doesn't mean there's a bug in the 1.56.1 toolchain, of course.

Perhaps the work-around introduced by PR #520 won't always be needed, but it doesn't do any harm to leave it in.

I'd expect non-reproducibility to come from a crate that uses a build script (build.rs or whatever). Neither aws_nitro_enclaves_cose nor serde_cbor seems to use a build script, so it's a bit mysterious, isn't it?

@egrimley-arm
Contributor

Perhaps this really was a bug in the toolchain: rust-lang/rust#90301 ("reproducible builds broken in rustc 1.56.0 due to LLVM 13 update")

(In the simple case described above, the relevant differences appear in sections called .debug_loc and .debug_abbrev.)

@egrimley-arm
Contributor

So this issue could perhaps have been fixed by merging #516, which updates the toolchain from 1.56.1 to 1.60.0!
