Add a simple reproducibility test command. #13689

Open
wants to merge 1 commit into
base: master
28 changes: 28 additions & 0 deletions docs/markdown/Commands.md
@@ -225,6 +225,34 @@ DESTDIR=/path/to/staging/area meson install -C builddir
Since *0.60.0* `DESTDIR` and `--destdir` can be a path relative to build
directory. An absolute path will be set into environment when executing scripts.

### reprotest

*(since 1.6.0)*

{{ reprotest_usage.inc }}

Simple reproducible build tester that compiles the project twice and
checks whether the end results are identical.

This command must be run in the source root of the project you want to
test.

{{ reprotest_arguments.inc }}

#### Examples

    meson reprotest

Builds the current project with its default settings.

    meson reprotest --intermediates -- --buildtype=debugoptimized

typo: it should be --intermediaries


Builds the target and also checks that all intermediate files like
object files are also identical. All command line arguments after the
`--` are passed directly to the underlying `meson` invocation. Only
use option arguments, i.e. those that start with a dash, Meson sets
directory arguments automatically.

### rewrite

*(since 0.50.0)*
15 changes: 15 additions & 0 deletions docs/markdown/snippets/reprotester.md
@@ -0,0 +1,15 @@
## Simple tool to test build reproducibility

Meson now ships with a command for testing whether your project can be
[built reprodicibly](https://reproducible-builds.org/). It can be used

typo: s/reprodicibly/reproducibly

by running a command like the following in the source root of your
project:

    meson reprotest --intermediaries -- --buildtype=debugoptimized

All command line options after the `--` are passed to the build
invocations directly.

This tool is not meant to be exhaustive, but instead easy and
convenient to run. It will detect some but definitely not all
reproducibility issues.
4 changes: 3 additions & 1 deletion mesonbuild/mesonmain.py
@@ -65,7 +65,7 @@ class CommandLineParser:
     def __init__(self) -> None:
         # only import these once we do full argparse processing
         from . import mconf, mdist, minit, minstall, mintro, msetup, mtest, rewriter, msubprojects, munstable_coredata, mcompile, mdevenv, mformat
-        from .scripts import env2mfile
+        from .scripts import env2mfile, reprotest
         from .wrap import wraptool
         import shutil

@@ -103,6 +103,8 @@ def __init__(self) -> None:
                          help_msg='Run commands in developer environment')
         self.add_command('env2mfile', env2mfile.add_arguments, env2mfile.run,
                          help_msg='Convert current environment to a cross or native file')
+        self.add_command('reprotest', reprotest.add_arguments, reprotest.run,
+                         help_msg='Test if project builds reproducibly')
         self.add_command('format', mformat.add_arguments, mformat.run, aliases=['fmt'],
                          help_msg='Format meson source file')
         # Add new commands above this line to list them in help command
128 changes: 128 additions & 0 deletions mesonbuild/scripts/reprotest.py
@@ -0,0 +1,128 @@
# SPDX-License-Identifier: Apache-2.0
# Copyright 2024 The Meson development team

from __future__ import annotations

import sys, os, subprocess, shutil
import pathlib
import typing as T

if T.TYPE_CHECKING:
    import argparse

from ..mesonlib import get_meson_command

# Note: when adding arguments, please also add them to the completion
# scripts in $MESONSRC/data/shell-completions/
Comment on lines +15 to +16

Have I missed these?

def add_arguments(parser: 'argparse.ArgumentParser') -> None:
    parser.add_argument('--intermediaries',
                        default=False,
                        action='store_true',
                        help='Check intermediate files.')
    parser.add_argument('mesonargs', nargs='*',
                        help='Arguments to pass to "meson setup".')

IGNORE_PATTERNS = ('.ninja_log',
                   '.ninja_deps',
                   'meson-private',
                   'meson-logs',
                   'meson-info',
                   )

INTERMEDIATE_EXTENSIONS = ('.gch',
                           '.pch',
                           '.o',
                           '.obj',
                           '.class',
                           )

class ReproTester:
    def __init__(self, options: T.Any):
Member

A Protocol can get rid of the need for Any here, it looks like all it would need is:

if T.TYPE_CHECKING:
    from typing_extensions import Protocol

    class Arguments(Protocol):
         intermediaries: bool
         mesonargs: T.List[str]

Member

Eventually I'd like to get to the point where we don't have Any anywhere in the codebase; in my experience it basically always hides issues.
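
To illustrate the Protocol suggestion above, here is a minimal sketch of how the typed options object might be consumed. The `Arguments` Protocol and the two argument definitions come from this PR and the review comment; `summarize` is a hypothetical helper added only to show attribute access being checked against the Protocol. Because of `from __future__ import annotations`, the annotation is never evaluated at runtime, so `typing_extensions` is only needed by the type checker.

```python
from __future__ import annotations

import argparse
import typing as T

if T.TYPE_CHECKING:
    # Only imported for type checking, so there is no runtime dependency.
    from typing_extensions import Protocol

    class Arguments(Protocol):
        intermediaries: bool
        mesonargs: T.List[str]


def add_arguments(parser: argparse.ArgumentParser) -> None:
    parser.add_argument('--intermediaries', default=False, action='store_true',
                        help='Check intermediate files.')
    parser.add_argument('mesonargs', nargs='*',
                        help='Arguments to pass to "meson setup".')


def summarize(options: Arguments) -> str:
    # Hypothetical helper: a type checker can now flag a misspelled
    # attribute such as options.intermediates, which T.Any would hide.
    mode = 'with' if options.intermediaries else 'without'
    return f'{mode} intermediaries, setup args: {options.mesonargs}'
```

With `T.Any`, a misspelled attribute name would only fail at runtime; with the Protocol it becomes a static error.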

        self.args = options.mesonargs
        self.meson = get_meson_command()[:]
        self.builddir = pathlib.Path('buildrepro')
        self.storagedir = pathlib.Path('buildrepro.1st')
        self.issues: T.List[str] = []
        self.check_intermediaries = options.intermediaries

    def run(self) -> int:
        if not pathlib.Path('meson.build').is_file():
            sys.exit('This command needs to be run at your project source root.')
        self.check_ccache()
        self.cleanup()
        self.build()
        self.check_output()
        self.print_results()
        if not self.issues:
            self.cleanup()
        return len(self.issues)

    def check_ccache(self) -> None:

I have /usr/lib/ccache in my PATH, so the check currently does not abort with sys.exit(1) even though it should.

> $ ccache -z 
Statistics zeroed

> $ time ~/git/meson/meson.py reprotest --intermediaries 
The Meson build system
Version: 1.5.99
Source dir: /home/fortysixandtwo/git/calls
Build dir: /home/fortysixandtwo/git/calls/buildrepro
Build type: native build

[...]

[422/422] Linking target tests/application
No differences detected.
~/git/meson/meson.py reprotest --intermediaries  94,72s user 21,80s system 491% cpu 23,705 total

> $ ccache -s
Cacheable calls:    550 / 633 (86.89%)
  Hits:             550 / 550 (100.0%)
    Direct:         548 / 550 (99.64%)
    Preprocessed:     2 / 550 ( 0.36%)
  Misses:             0 / 550 ( 0.00%)
Uncacheable calls:   83 / 633 (13.11%)
Local storage:
  Cache size (GiB): 1.3 / 5.0 (25.31%)
  Hits:             550 / 550 (100.0%)
  Misses:             0 / 550 ( 0.00%)

Member Author

Detecting all ways ccache could be injected is tricky, especially since actual compiler detection code is elsewhere and not easily callable. I guess we could check if PATH contains the string ccache, but that might have false positives.

Member

Simplest approach is to just use $CCACHE_DISABLE in the environment to tell ccache that even when run, it should simply forward directly to the compiler without performing any actions itself.

Member

This would also mean we could simply support the ccache users, as their testing would work by telling ccache to bypass the actual cache, instead of just refusing to perform the reproducibility testing at all.
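
The `CCACHE_DISABLE` approach discussed above could look roughly like the following. This is a sketch under assumptions, not code from the PR: `env_without_ccache` is a hypothetical helper, and it relies on ccache's documented behaviour of passing straight through to the real compiler when `CCACHE_DISABLE` is set.

```python
import os
import typing as T


def env_without_ccache(base: T.Optional[T.Mapping[str, str]] = None) -> T.Dict[str, str]:
    """Return a copy of the environment that tells ccache to bypass its cache."""
    # Copy so the caller's (or the process's) environment is left untouched.
    env = dict(os.environ if base is None else base)
    env['CCACHE_DISABLE'] = '1'
    return env
```

The returned mapping would then be passed to both build invocations, for example `subprocess.check_call(setup_command, env=env_without_ccache())`, so that ccache users can run the test instead of being rejected.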

        for evar in ('CC', 'CXX'):
            evalue = os.environ.get(evar, '')
            if 'ccache' in evalue:
                print(f'Environment variable {evar} set to value "{evalue}".')
                print('This implies using a compiler cache, which is incompatible with reproducible builds.')
                sys.exit(1)
Comment on lines +61 to +66
Member

This doesn't take into account the approach of linking ln -s ccache gcc. If that's a symlink I guess we can resolve that, though I don't know what to do if it's a hardlink.

Maybe a better solution is to set CCACHE_DISABLE=1 before calling Meson, which causes ccache to skip reading and writing to the cache?
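
For the symlink case mentioned above, resolving the compiler path is one possible check. The sketch below is illustrative only: `resolves_to_ccache` is a hypothetical name, and as the comment notes, this cannot detect a hardlinked ccache, since a hardlink has no target to resolve.

```python
import os
import shutil


def resolves_to_ccache(compiler: str) -> bool:
    """Return True if `compiler` is found and its resolved file is named ccache."""
    found = shutil.which(compiler)
    if found is None:
        return False
    # realpath follows symlinks such as /usr/lib/ccache/gcc -> ../../bin/ccache.
    # Hardlinks are not detected by this check.
    real = os.path.realpath(found)
    return os.path.basename(real) == 'ccache'
```

This covers the `/usr/lib/ccache` in `PATH` setup reported earlier in the thread only when the entries there are symlinks, which is why setting `CCACHE_DISABLE` is the more robust option.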


    def cleanup(self) -> None:
        if self.builddir.exists():
            shutil.rmtree(self.builddir)
        if self.storagedir.exists():
            shutil.rmtree(self.storagedir)

    def build(self) -> None:
        setup_command: T.Sequence[T.Union[pathlib.Path, str]] = self.meson + ['setup', self.builddir] + self.args
        build_command: T.Sequence[T.Union[pathlib.Path, str]] = self.meson + ['compile', '-C', self.builddir]
        subprocess.check_call(setup_command)
        subprocess.check_call(build_command)
        self.builddir.rename(self.storagedir)
        subprocess.check_call(setup_command)
        subprocess.check_call(build_command)
Comment on lines +77 to +81
Member

Does it make sense to catch the exceptions that check_call would generate, print a more helpful message, and return a non 1 return code?
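
One way to act on the question above is to wrap each `check_call` and translate a failure into a clearer message and a dedicated exit code. This is a sketch, not part of the PR: `run_step` and `BUILD_FAILED` are hypothetical names, and the value 2 is an arbitrary placeholder. Since `ReproTester.run` returns the number of issues as its exit status, a real implementation would need a code that cannot collide with an issue count.

```python
import subprocess
import sys
import typing as T

# Hypothetical exit code meaning "a build step failed, nothing was compared".
BUILD_FAILED = 2


def run_step(description: str, command: T.Sequence[str]) -> None:
    """Run one setup or compile step, exiting with BUILD_FAILED if it fails."""
    try:
        subprocess.check_call(command)
    except subprocess.CalledProcessError as e:
        print(f'{description} failed with exit code {e.returncode}; '
              'cannot compare builds.', file=sys.stderr)
        sys.exit(BUILD_FAILED)
```

In `build()`, each of the four `subprocess.check_call(...)` lines could then become a call such as `run_step('First setup', setup_command)`.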


    def ignore_file(self, fstr: str) -> bool:
        for p in IGNORE_PATTERNS:
            if p in fstr:
                return True
        if not self.check_intermediaries:
            if fstr.endswith(INTERMEDIATE_EXTENSIONS):
                return True
        return False

    def check_contents(self, fromdir: str, todir: str, check_contents: bool) -> None:
        import filecmp
        frompath = fromdir + '/'
        topath = todir + '/'
        for fromfile in pathlib.Path(fromdir).glob('**/*'):
            if not fromfile.is_file():
                continue
            fstr = str(fromfile)
            if self.ignore_file(fstr):
                continue
            assert fstr.startswith(frompath)
            tofile = pathlib.Path(fstr.replace(frompath, topath, 1))
            if not tofile.exists():
                self.issues.append(f'Missing file: {tofile}')
            elif check_contents:
                if not filecmp.cmp(fromfile, tofile, shallow=False):
                    self.issues.append(f'File contents differ: {fromfile}')

    def print_results(self) -> None:
        if self.issues:
            print('Build differences detected')
            for i in self.issues:
                print(i)
        else:
            print('No differences detected.')

    def check_output(self) -> None:
        self.check_contents('buildrepro', 'buildrepro.1st', True)
        self.check_contents('buildrepro.1st', 'buildrepro', False)

def run(options: T.Any) -> None:
    rt = ReproTester(options)
    try:
        sys.exit(rt.run())
    except FileNotFoundError as e:
        print(e)
        sys.exit(1)
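
The tree comparison done by `check_contents` and `check_output` above can also be expressed with `Path.relative_to` instead of string replacement, which avoids assuming `/` as the path separator. The following is a minimal sketch, not the code in this PR: `compare_trees` is a hypothetical name, and the ignore-pattern and intermediate-extension filtering is left out for brevity.

```python
import filecmp
import pathlib
import typing as T


def compare_trees(first: pathlib.Path, second: pathlib.Path) -> T.List[str]:
    """Return human-readable differences between two directory trees."""
    issues: T.List[str] = []
    # First pass: every file in `first` must exist in `second` with equal bytes.
    for path in sorted(first.rglob('*')):
        if not path.is_file():
            continue
        rel = path.relative_to(first)
        other = second / rel
        if not other.is_file():
            issues.append(f'Missing file: {other}')
        elif not filecmp.cmp(path, other, shallow=False):
            issues.append(f'File contents differ: {rel}')
    # Second pass: report files that exist only in `second`.
    for path in sorted(second.rglob('*')):
        if not path.is_file():
            continue
        counterpart = first / path.relative_to(second)
        if not counterpart.is_file():
            issues.append(f'Missing file: {counterpart}')
    return issues
```

Like the original, the contents are compared only once; the second pass checks existence alone, mirroring the `check_contents=False` call in `check_output`.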