Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

meta issue for reproducible builds (was: tests/determinism.sh) #14593

Closed
wants to merge 1 commit into from

Conversation

marc-hb
Copy link
Collaborator

@marc-hb marc-hb commented Mar 15, 2019

These tests are in a "work4me" state. They all proved valuable because they all found issues and all passed a fair number of times in some configuration(s)/environment(s). However for either passing or finding issues they may require:

  • magic test parameters
  • fixes not merged or not submitted yet
  • additional tools like disorderedfs
  • additional test steps like comparing checksums across different environments

I'm sharing this very early draft for two purposes:

  1. Provide more detailed reproduction information and transparency for related fixes. There's only so much test code that can be put in a commit message.

  2. Gather feedback and ideas about what could be an acceptable, high-level test design so CI can hopefully catch the most basic regressions some day. Please don't spend time reviewing any implementation detail because I doubt the final test design (if any) will look similar to this prototype.

Here a list of fixes that some past, present or future version of (some of) these tests have helped with:

diffoscope MR 29 Catch failures to disassemble and rescue all other differences
diffoscope issue 64 Remove elf.StaticLibFile: it's a serious design flaw

Just a prototype to get feedback.

Already identifying many issues and testing their fixes.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@zephyrbot
Copy link
Collaborator

Found the following issues, please fix and resubmit:

Codeowners issues

New files added that are not covered in CODEOWNERS:

tests/determinism.sh

Please add one or more entries in the CODEOWNERS file to cover those files


checkpatch issues

-:244: ERROR:TRAILING_WHITESPACE: trailing whitespace
#244: FILE: tests/determinism.sh:238:
+    ninja -C "$bld"  # obj_list # kobj_types_h_target $

- total: 1 errors, 0 warnings, 264 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
      mechanically convert to the typical style using --fix or --fix-inplace.

NOTE: Whitespace errors detected.
      You may wish to use scripts/cleanpatch or scripts/cleanfile

Your patch has style problems, please review.

NOTE: Ignored message types: AVOID_EXTERNS BRACES CONFIG_EXPERIMENTAL CONST_STRUCT DATE_TIME FILE_PATH_CHANGES MINMAX NETWORKING_BLOCK_COMMENT_STYLE PRINTK_WITHOUT_KERN_LEVEL SPLIT_STRING VOLATILE

NOTE: If any of the errors are false positives, please report
      them to the maintainers.



@nashif
Copy link
Member

nashif commented Mar 16, 2019

determinism might be a bit confusing name, how about reproducible_builds or something that is closer to what this is about.

@marc-hb marc-hb requested a review from ulfalizer March 16, 2019 16:29
Copy link
Member

@carlescufi carlescufi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a general comment, without getting into the particulars of this change (which by the way looks pretty interesting as a concept to me). Can we write this in Python? This would be good for 2 reasons:

  1. Most people contributing to Zephyr are more or less comfortable with Python at this point. Less so with bash.
  2. It would work on Windows, which is a supported platform

@marc-hb
Copy link
Collaborator Author

marc-hb commented Mar 18, 2019

Thanks @carlescufi , yes python shouldn't be a problem. Just a bit more verbose that's all :-)

But even before looking at the programming language I would like ideas and directions about where this could/should fit in the grand testing architecture scheme of things. Like: still sitting on top of sanitycheck like this? Or as additional --feature(s) embedded inside sanitycheck itself instead? Other? If not embedded in sanitycheck then how could CI run the most basic of these tests and catch the most basic regressions? Additional github check?

@marc-hb marc-hb requested a review from cinlyooi-intel March 21, 2019 06:21
@marc-hb marc-hb added Feature Request A request for a new feature area: Build System TSC Topics that need TSC discussion Needs review This PR needs attention from Zephyr's maintainers labels Mar 21, 2019
@marc-hb marc-hb requested a review from mbolivar March 22, 2019 04:59
@nashif nashif removed the TSC Topics that need TSC discussion label Mar 27, 2019
@nashif
Copy link
Member

nashif commented Mar 28, 2019

But even before looking at the programming language I would like ideas and directions about where this could/should fit in the grand testing architecture scheme of things. Like: still sitting on top of sanitycheck like this? Or as additional --feature(s) embedded inside sanitycheck itself instead? Other? If not embedded in sanitycheck then how could CI run the most basic of these tests and catch the most basic regressions? Additional github check?

this will run on its own in a nightly job, not per PR and not with check_compliance.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Mar 28, 2019

Thanks @nashif , this answers my main question.

Here's how I see the next steps:

  1. Share here my most recent updates and fixes to this shell script
  2. Rewrite it in tests/build_reproduction.py and submit a new PR
  3. Archive this shell script PR.

@nashif
Copy link
Member

nashif commented Sep 20, 2019

are you planning for this to be merged?

@marc-hb
Copy link
Collaborator Author

marc-hb commented Sep 20, 2019

No, I will push another one. This script has pretty much been entirely rewritten. I will just keep using the description for bookmarks

EDIT 3 years later: I never shared the newer version. It's been bitrotting on a private branch somewhere. The high-level logic was still the same:

  • build once
  • change as many things as possible
  • build again
  • diff

@marc-hb marc-hb closed this Sep 20, 2019
@marc-hb marc-hb changed the title tests/determinism.sh: very early prototype meta issue for reproducible builds (was: tests/determinism.sh) Oct 2, 2019
marc-hb added a commit to marc-hb/zephyr that referenced this pull request Nov 4, 2022
Temporary bugs, corner cases and obsolete toolchains aside, the Zephyr
build is most of the time reproducible: zephyrproject-rtos#50205 and zephyrproject-rtos#14593.

This means two different build machines using the same toolchain will
always produce the same binary output. The one-line addition in this
commit makes it trivial to verify that binary outputs are indeed the
same by adding a single checksum line in the build logs:

```
[16/16] Linking C executable zephyr/zephyr.elf
Memory region         Used Size  Region Size  %age Used
             RAM:       53280 B         3 MB      1.69%
        IDT_LIST:          0 GB         2 KB      0.00%

fdd2ddf2ad7d5da5bbd79b41cef...7b16ef549a8281111d8e205  zephyr.strip
```

This commit makes a non-measurable build time difference.

Build reproducibility matters for (at least) two important reasons:

- Security / supply chain attacks, see https://www.cisa.gov/sbom, zephyrproject-rtos#50205,
  https://reproducible-builds.org/ and many others.
- Making sure build configurations are strictly identical when trying to
  reproduce elusive issues or when issuing releases.

Displaying a reproducible checksum accelerates the investigation of
temporary reproducibility issues like zephyrproject-rtos#48195.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/zephyr that referenced this pull request Nov 4, 2022
Temporary bugs, corner cases and obsolete toolchains aside, the Zephyr
build is reproducible most of the time: zephyrproject-rtos#50205 and zephyrproject-rtos#14593

This means two different build machines using the same toolchain will
always produce the same binary output. The previous, one-line commit made
it trivial to verify that binary outputs are indeed the same by adding this
single line in the buid logs:

```
[16/16] Linking C executable zephyr/zephyr.elf
Memory region         Used Size  Region Size  %age Used
             RAM:       53280 B         3 MB      1.69%
        IDT_LIST:          0 GB         2 KB      0.00%
fdd2ddf2ad7d5da5bbd79b41cef8d7...1a896b989a8281111d8e205  zephyr.strip
```

This commit enables that feature by default because build
reproducibility matters for (at least) two important reasons:

- Security / supply chain attacks, see https://www.cisa.gov/sbom, zephyrproject-rtos#50205,
  https://reproducible-builds.org/ and many others.
- Making sure build configurations are strictly identical when trying to
  reproduce elusive issues or when issuing releases.

It was of course already possible to _manually_ make this Kconfig change
and manually compute this checksum. However this can be impossible when
dealing with an automated build system that does not archive all
_intermediate_ (zephyrproject-rtos#5009) files like `zephyr.elf`. Tweaking the build
configuration can also be difficult and error-prone for people who are
not Zephyr developers.

Most automated CI systems preserve build logs by default.

Displaying the reproducible checksum by default accelerates the
discovery of reproducibility bugs like zephyrproject-rtos#48195.

When measured with `west build -p -b qemu_x86 samples/hello_world/`, the
additional `build/zephyr/zephyr.strip` disk space required is 43
kilobytes compared to a total of 11 Megabytes. Measuring a more
realistic SOF example, `zephyr.strip` weighed 690 kb which was about
0.1% of a total `build/` directory weighing 65M.

To measure the build time cost I ran `west build -p -b qemu_x86
samples/hello_world/` many times in a loop with and without this PR on
my Linux workstation. Stripping and checksumming made literally no time
difference compared to the "noise" observed when building the same
configuration. This is not surprising considering how small
`zephyr.strip`: so the extra cost is most likely dominated by process
creation and the total number of processes created during a Zephyr build
dwarfs the few extra processes required by this feature.

More surprisingly, I measured incremental builds by running `touch
kernel/timer.c; west build ...` in a loop and I could not observe any
visible time difference either.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants