Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement system package mode #18778

Merged
merged 17 commits into from
Feb 4, 2024
Merged

Implement system package mode #18778

merged 17 commits into from
Feb 4, 2024

Conversation

andrewrk
Copy link
Member

@andrewrk andrewrk commented Feb 1, 2024

Synopsis

Makes the zig build system significantly more friendly to system package maintainers by introducing System Integration Options.

Closes #14281
Closes #14597

Demo

Using groovebasin as an example project:

New API for declaring optional system library integration

--- a/build.zig
+++ b/build.zig
@@ -5,18 +5,8 @@ pub fn build(b: *std.Build) void {
     const optimize = b.standardOptimizeOption(.{
         .preferred_optimize_mode = .ReleaseSafe,
     });
-    const libgroove_optimize_mode = b.option(
-        std.builtin.OptimizeMode,
-        "libgroove-optimize",
-        "override optimization mode of libgroove and its dependencies",
-    );
     const use_llvm = b.option(bool, "use-llvm", "LLVM backend");
 
-    const groove_dep = b.dependency("groove", .{
-        .optimize = libgroove_optimize_mode orelse .ReleaseFast,
-        .target = target,
-    });
-
     b.installDirectory(.{
         .source_dir = .{ .path = "public" },
         .install_dir = .lib,
@@ -31,7 +21,22 @@ pub fn build(b: *std.Build) void {
         .use_llvm = use_llvm,
         .use_lld = use_llvm,
     });
-    server.linkLibrary(groove_dep.artifact("groove"));
+
+    if (b.systemIntegrationOption("groove", .{})) {
+        server.linkSystemLibrary("groove");
+    } else {
+        const libgroove_optimize_mode = b.option(
+            std.builtin.OptimizeMode,
+            "libgroove-optimize",
+            "override optimization mode of libgroove and its dependencies",
+        );
+        const groove_dep = b.dependency("groove", .{
+            .optimize = libgroove_optimize_mode orelse .ReleaseFast,
+            .target = target,
+        });
+        server.linkLibrary(groove_dep.artifact("groove"));
+    }
+
     b.installArtifact(server);
 
     const run_cmd = b.addRunArtifact(server);

This diff plus some similar diffs in the project's dependency tree.

New help section

There is a new --help section:

System Integration Options:
  --system [dir]               System Package Mode. Disable fetching; prefer system libs
  -fsys=[name]                 Enable a system integration
  -fno-sys=[name]              Disable a system integration
  --host-target [triple]       Use the provided target as the host
  --host-cpu [cpu]             Use the provided CPU as the host
  --host-dynamic-linker [path] Use the provided dynamic linker as the host

  Available System Integrations:                Enabled:
    groove                                      no
    z                                           no
    mp3lame                                     no
    vorbis                                      no
    ogg                                         no

Using the system integration options

[nix-shell:~/dev/groovebasin]$ zig build --host-target x86_64-linux-gnu --host-dynamic-linker /nix/store/whypqfa83z4bsn43n4byvmw80n4mg3r8-glibc-2.37-45/lib64/ld-linux-x86-64.so.2 -fsys=z

[nix-shell:~/dev/groovebasin]$ ldd zig-out/bin/groovebasin 
    linux-vdso.so.1 (0x00007fff054c7000)
    libz.so.1 => /nix/store/8mw6ssjspf8k1ija88cfldmxlbarl1bb-zlib-1.2.13/lib/libz.so.1 (0x00007fe164675000)
    libm.so.6 => /nix/store/whypqfa83z4bsn43n4byvmw80n4mg3r8-glibc-2.37-45/lib/libm.so.6 (0x00007fe164595000)
    libc.so.6 => /nix/store/whypqfa83z4bsn43n4byvmw80n4mg3r8-glibc-2.37-45/lib/libc.so.6 (0x00007fe1643ae000)
    /nix/store/whypqfa83z4bsn43n4byvmw80n4mg3r8-glibc-2.37-45/lib64/ld-linux-x86-64.so.2 (0x00007fe164696000)

The result is using baseline CPU due to the --host-target flag. Note that --host-target is always a valid option even if the build script does not expose any standard target option.

Now, re-run the command but removing -fsys=z:

[nix-shell:~/dev/groovebasin]$ ~/Downloads/zig/build-release/stage4/bin/zig build --host-target x86_64-linux-gnu --host-dynamic-linker /nix/store/whypqfa83z4bsn43n4byvmw80n4mg3r8-glibc-2.37-45/lib64/ld-linux-x86-64.so.2 

[nix-shell:~/dev/groovebasin]$ ldd zig-out/bin/groovebasin 
    linux-vdso.so.1 (0x00007ffcc23f6000)
    libm.so.6 => /nix/store/whypqfa83z4bsn43n4byvmw80n4mg3r8-glibc-2.37-45/lib/libm.so.6 (0x00007f525feea000)
    libc.so.6 => /nix/store/whypqfa83z4bsn43n4byvmw80n4mg3r8-glibc-2.37-45/lib/libc.so.6 (0x00007f525fd03000)
    /nix/store/whypqfa83z4bsn43n4byvmw80n4mg3r8-glibc-2.37-45/lib64/ld-linux-x86-64.so.2 (0x00007f525ffcc000)

new release option

  --release[=mode]             Request release mode, optionally specifying a
                               preferred optimization mode: fast, safe, small
andy@ark ~/d/a/zlib (main)> zig build --release
the project does not declare a preferred optimization mode. choose: --release=fast, --release=safe, or --release=small
error: the following build command failed with exit code 1:
/home/andy/dev/ayb/zlib/zig-cache/o/6f46a03cb0f5f70d2c891f31086fecc9/build /home/andy/Downloads/zig/build-release/stage3/bin/zig /home/andy/dev/ayb/zlib /home/andy/dev/ayb/zlib/zig-cache /home/andy/.cache/zig --seed 0x3e999c60 --release
andy@ark ~/d/a/zlib (main) [1]> zig build --release=safe
andy@ark ~/d/a/zlib (main)> vim build.zig
andy@ark ~/d/a/zlib (main)> git diff
diff --git a/build.zig b/build.zig
index 76bbb01..1bc13e6 100644
--- a/build.zig
+++ b/build.zig
@@ -5,7 +5,9 @@ pub fn build(b: *std.Build) void {
     const lib = b.addStaticLibrary(.{
         .name = "z",
         .target = b.standardTargetOptions(.{}),
-        .optimize = b.standardOptimizeOption(.{}),
+        .optimize = b.standardOptimizeOption(.{
+            .preferred_optimize_mode = .ReleaseFast,
+        }),
     });
     lib.linkLibC();
     lib.addCSourceFiles(.{
andy@ark ~/d/a/zlib (main)> zig build --release
andy@ark ~/d/a/zlib (main)> zig build --release=small
andy@ark ~/d/a/zlib (main)> 

Lazy Dependencies

--- a/build.zig
+++ b/build.zig
-    const groove_dep = b.dependency("groove", .{
-        .optimize = libgroove_optimize_mode orelse .ReleaseFast,
-        .target = target,
-    });
+    if (b.lazyDependency("groove", .{
+        .optimize = libgroove_optimize_mode orelse .ReleaseFast,
+        .target = target,
+    })) |groove_dep| {
+        server.linkLibrary(groove_dep.artifact("groove"));
+    }
--- a/build.zig.zon
+++ b/build.zig.zon
@@ -5,6 +5,7 @@
         .groove = .{
             .url = "https://github.com/andrewrk/libgroove/archive/66745eae734e986cd478e7220664f2de902d10a1.tar.gz",
             .hash = "1220285f0f6b2be336519a0e612a11617c655f78b0efe1cac12fc73fc1e50c7b3e14",
+            .lazy = true,
         },
     },
     .paths = .{

This makes the dependency only get fetched if it is actually used. The build runner will be rebuilt if any missing lazy dependencies are encountered.

Error for using dependency() instead of lazyDependency()

$ zig build -h
thread 2904684 panic: dependency 'groove' is marked as lazy in build.zig.zon which means it must use the lazyDependency function instead
/home/andy/Downloads/zig/lib/std/debug.zig:434:22: 0x11901a9 in panicExtra__anon_18741 (build)
    std.builtin.panic(msg, trace, ret_addr);
                     ^
/home/andy/Downloads/zig/lib/std/debug.zig:409:15: 0x1167399 in panic__anon_18199 (build)
    panicExtra(null, null, format, args);
              ^
/home/andy/Downloads/zig/lib/std/Build.zig:1861:32: 0x1136dca in dependency__anon_16705 (build)
                std.debug.panic("dependency '{s}{s}' is marked as lazy in build.zig.zon which means it must use the lazyDependency function instead", .{ b.dep_prefix, name });
                               ^
/home/andy/dev/groovebasin/build.zig:33:40: 0x10e8865 in build (build)
        const groove_dep = b.dependency("groove", .{
                                       ^
/home/andy/Downloads/zig/lib/std/Build.zig:1982:33: 0x10ca783 in runBuild__anon_8952 (build)
        .Void => build_zig.build(b),
                                ^
/home/andy/Downloads/zig/lib/build_runner.zig:310:29: 0x10c6708 in main (build)
        try builder.runBuild(root);
                            ^
/home/andy/Downloads/zig/lib/std/start.zig:585:37: 0x10af845 in posixCallMainAndExit (build)
            const result = root.main() catch |err| {
                                    ^
/home/andy/Downloads/zig/lib/std/start.zig:253:5: 0x10af331 in _start (build)
    asm volatile (switch (native_arch) {
    ^
???:?:?: 0x8 in ??? (???)
Unwind information for `???:0x8` was not available, trace may be incomplete

error: the following build command crashed:
/home/andy/dev/groovebasin/zig-cache/o/20af710f8e0e96a0ccc68c47688b2d0d/build /home/andy/Downloads/zig/build-release/stage3/bin/zig /home/andy/dev/groovebasin /home/andy/dev/groovebasin/zig-cache /home/andy/.cache/zig --seed 0x513e8ce9 -Z4472a09906216280 -h

It's allowed to do the reverse - lazyDependency() when the manifest file does not mark it as lazy.

It's probably best practice to always use lazyDependency() in build.zig.

avoid fetching in system mode

[nix-shell:~/dev/2Pew]$ zig build --system ~/tmp/p -fno-sys=SDL2
error: lazy dependency package not found: /home/andy/tmp/p/1220c5360c9c71c215baa41b46ec18d0711059b48416a2b1cf96c7c2d87b2e8e4cf6
info: remote package fetching disabled due to --system mode
info: dependencies might be avoidable depending on build configuration

[nix-shell:~/dev/2Pew]$ zig build --system ~/tmp/p 

[nix-shell:~/dev/2Pew]$ mv ~/.cache/zig/p/1220c5360c9c71c215baa41b46ec18d0711059b48416a2b1cf96c7c2d87b2e8e4cf6 ~/tmp/p

[nix-shell:~/dev/2Pew]$ zig build --system ~/tmp/p -fno-sys=SDL2
steps [5/8] zig build-lib SDL2 ReleaseFast native... Compile C Objects [75/128] e_atan2... ^C

[nix-shell:~/dev/2Pew]$ 

Full List of Breaking Changes

Predicted breakage due to these changes: little to none

  • b.zig_exe => b.graph.zig_exe
  • b.env_map => b.graph.env_map
  • b.search_prefixes: it's now an ArrayListUnmanaged. addSearchPrefix is unchanged
  • b.global_cache_root => b.graph.global_cache_root
  • b.cache => b.graph.cache

Merge Checklist

  • implement --release
  • make the CLI rebuild the build runner with lazy deps added
  • implement --system [dir] command

@andrewrk andrewrk force-pushed the system-package-mode branch from 03934cb to 9aeaa1a Compare February 2, 2024 00:00
@andrewrk andrewrk added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. zig build system std.Build, the build runner, `zig build` subcommand, package management labels Feb 2, 2024
@andrewrk andrewrk force-pushed the system-package-mode branch from 9aeaa1a to 826f5e3 Compare February 2, 2024 02:22
@andrewrk andrewrk force-pushed the system-package-mode branch from a2347c7 to bd07968 Compare February 2, 2024 21:48
* New --help section
* Add b.systemLibraryOption
* Rework the build runner CLI logic a bit
These are advanced options that make it possible to simultaneously
target the "host" computer, while overriding assumptions about the host
computer such as what CPU features are available, or what OS version
range to target.

This is useful only for the case of integrating with system package
manager builds which want to pretend the host is also the target.
This value is very likely incorrect. When glibc_version is provided but
no explicit ABI is provided, use the string "gnu" instead.
This also makes a long-overdue change of extracting common state from
Build into a shared Graph object.

Getting the semantics right for these flags turned out to be quite
tricky. In the end it works like this:
* The override only happens when the target is fully native, with no
  additional query parameters, such as versions or CPU features added.
* The override affects the resolved Target but leaves the original Query
  unmodified.
* The "is native?" detection logic operates on the original, unmodified
  query. This makes it possible to provide invalid host target
  information, causing confusing errors to occur. Don't do that.

There are some minor breaking changes to std.Build API such as the fact
that `b.zig_exe` is now moved to `b.graph.zig_exe`, as well as a handful
of other similar flags.
On second thought, let's keep a bunch of these flags how they already
were.

Partial revert of the previous commit.
Before it was named "library" inconsistently.

Now the CLI args are -fsys=[name] and -fno-sys=[name] and it is a more
general-purpose "system integration" which could be a library name or
perhaps a project name such as "ffmpeg" or a binary such as "nasm".
This allows a `zig build` command to specify intention to create a
release build, regardless of what per-project options exist. It also
allows the command to specify a "preferred optimization mode", which is
chosen if the project itself does not choose one (in other words, the
project gets first choice). If neither the build command nor the project
specify a preferred release mode, an error occurs.
This prevents package fetching and enables system_package_mode in the
build system which flips the defaults for system integrations.
This is how assertions work in zig.
Build manifest files support `lazy: true` for dependency sections.
This causes the auto-generated dependencies.zig to have 2 more
possibilities:
1. It communicates whether a dependency is lazy or not.
2. The dependency might be acknowledged, but missing due to being lazy
   and not fetched.

Lazy dependencies are not fetched by default, but if they are already
fetched then they are provided to the build script.

The build runner reports the set of missing lazy dependenices that are
required to the parent process via stdout and indicates the situation
with exit code 3.

std.Build now has a `lazyDependency` function. I'll let the doc comments
speak for themselves:

When this function is called, it means that the current build does, in
fact, require this dependency. If the dependency is already fetched, it
proceeds in the same manner as `dependency`. However if the dependency
was not fetched, then when the build script is finished running, the
build will not proceed to the make phase. Instead, the parent process
will additionally fetch all the lazy dependencies that were actually
required by running the build script, rebuild the build script, and then
run it again.
In other words, if this function returns `null` it means that the only
purpose of completing the configure phase is to find out all the other
lazy dependencies that are also required.
It is allowed to use this function for non-lazy dependencies, in which
case it will never return `null`. This allows toggling laziness via
build.zig.zon without changing build.zig logic.

The CLI for `zig build` detects this situation, but the logic for then
redoing the build process with these extra dependencies fetched is not
yet implemented.
This makes `zig build` notice when lazy dependencies were missing, fetch
them, and then rebuild the build runner and run it again.
After parsing diagnostics files from clang we don't have any more use
for those tmp files. Delete them to reduce clutter and disk usage.
Pass the required lazy dependencies from the build runner to the parent
process via a tmp file instead of stdout. I'll reproduce this comment to
explain it:

The `zig build` parent process needs a way to obtain results from the
configuration phase of the child process. In the future, the make phase
will be executed in a separate process than the configure phase, and we
can then use stdout from the configuration phase for this purpose.

However, currently, both phases are in the same process, and Run Step
provides API for making the runned subprocesses inherit stdout and stderr
which means these streams are not available for passing metadata back
to the parent.

Until make and configure phases are separated into different processes,
the strategy is to choose a temporary file name ahead of time, and then
read this file in the parent to obtain the results, in the case the child
exits with code 3.

This commit also extracts some common logic from the loop that rebuilds
the build runner so that it does not run again when the build runner is
rebuilt.
This makes it easier to debug and avoids a false positive compile error
in the build script.
Prevents unnecessary depedency on networking when bootstrapping zig.
@thejoshwolfe
Copy link
Contributor

It's probably best practice to always use lazyDependency() in build.zig.

Does this mean the default should be switched? Maybe call the other way "side effect dependency" or something? (is that why you would want a dependency you don't use?)

@andrewrk
Copy link
Member Author

andrewrk commented Feb 3, 2024

Dependencies do not have side effects. The use case is conditional dependencies - the configuration values chosen decides the set of actual dependencies used.

@emidoots
Copy link
Contributor

emidoots commented Feb 3, 2024

Can you comment on this: if a build.zig file imports a dependency via @import("foo").bar() for additional build-time logic, can foo still be an optional dependency?

My assumption would be no, and that if you have a dependency which needs any additional build-time logic to be imported like that, it becomes a viral dependency (if you use it, your dependency also cannot be optional)?

I'm not necessarily opposed to that, but want to understand the limitations here if any.

The reason I am asking is because in Mach currently, we use this to abstract some of the platform-specific nuances of desktop/mobile/web, where the former has a main entrypoint that looks very different from the latter (which are dynamic libraries rather than an executable), so we e.g. @import("mach").createApp(my_mod) basically which let's mach create the main platform-specific entrypoint, and just import/call my_mod as if it were the entrypoint without the platform-specific bits. I am considering moving away from this approach, and if lazy dependencies don't support it that may be another factor in encouraging me to do so.

others (@MasterQ32 ?) may have other reasons to need @import("foo").bar() logic I'm unaware of though

@rohlem
Copy link
Contributor

rohlem commented Feb 3, 2024

@slimsag This should be supportable if the build.zig compilation process catches @import-not-found errors and compares the missing module name to the set of lazy dependencies.
If it finds a match, it can download this dependency (the same as if it had encountered a lazyDependency call in executed build.zig code) and then retry compilation and execution of build.zig.

That way it wouldn't even be an issue if the set of modules imported for build-time logic depended on comptime parameterization like the CPU features and OS version of the host building the project.

@andrewrk
Copy link
Member Author

andrewrk commented Feb 3, 2024

@slimsag as it stands in this branch, direct @import only works on non lazy dependencies.

Regardless of what changes here I would encourage you to explore module based abstractions instead of inversion of control flow based abstractions. Happy to work with you on any limitations you encounter.

But I'll tinker a bit and see if we can have the ability to import lazy dependencies build logic. I don't want to do what @rohlem is suggesting though so it would be a breaking change either way.

@andrewrk
Copy link
Member Author

andrewrk commented Feb 4, 2024

Alright, I spent some more time investigating these follow-up issues, but ultimately I decided to move forward with this branch as-is, and address them later. This is certainly a step forward. Let's try this out for a while and then here are some follow-up issues to tackle:

deprecating dependency() and making lazyDependency() the preferred API

I think this might be a good idea to do in the future, since using lazyDependency() lets you toggle the laziness only by editing build.zig.zon. This would be a prerequisite for making all dependencies always lazy.

However as it stands, this branch is actually not very invasive; the breaking changes are minimal. I think it would be a good idea to solve the other problems mentioned below before inflicting unnecessary breaking changes on build.zig users.

direct @import in build scripts

I looked into making this work for lazy dependencies. It's possible, but it has some downsides:

  • it requires generating a zig source file per dependency instead of one dependencies.zig file for the whole tree. I don't know how much of a problem this is in reality, perhaps it's more distasteful in theory than in practice.
  • It requires some clunky API in the build.zig script - an explicit check for the existence of a dependency followed by the @import, or something like that.

Maybe I could find a more satisfactory solution if I keep poking at it.

That said, this feature was always meant for actual build script dependencies, it was not intended for inversion of control and having the dependency take over the build script logic for the parent package. I think users should try to get away from doing that, and I'm happy to work together on coming up with a satisfactory API that is based on dependencies exposing things rather than doing build logic.

making all dependencies always lazy

This would simplify things by removing an unnecessary choice from the programmer, and result in fewer packages fetched unnecessarily. Why get the programmer involved if the build system can always just do the right thing without asking? The problem with this right now is twofold:

  • Lazy dependencies don't work with direct @import in build scripts. When the programmer puts lazy = true currently, they are losing the ability to use @import for that dependency.
  • It would cause the build script to get recompiled more often - approximately a number of times equal to the dependency tree depth. In the future, such rebuilds will be near instantaneous thanks to compiler performance enhancements and a separation of user logic from the build runner, but as it stands each rebuild is about 4-5 seconds long.

@andrewrk
Copy link
Member Author

andrewrk commented Feb 4, 2024

My assumption would be no, and that if you have a dependency which needs any additional build-time logic to be imported like that, it becomes a viral dependency (if you use it, your dependency also cannot be optional)?

Sorry I missed this question earlier. I have not added test coverage for this yet, but it is not viral. When a lazy dependency is fetched, its own non-lazy dependencies will all be fetched and therefore @import will work. @import will only fail on lazy dependencies, it will succeed on non-lazy dependencies.

@andrewrk andrewrk merged commit d3fc264 into master Feb 4, 2024
10 checks passed
@andrewrk andrewrk deleted the system-package-mode branch February 4, 2024 09:44
@castholm
Copy link
Contributor

castholm commented Feb 4, 2024

@andrewrk Regarding @import in build scripts

That said, this feature was always meant for actual build script dependencies, it was not intended for inversion of control and having the dependency take over the build script logic for the parent package. I think users should try to get away from doing that, and I'm happy to work together on coming up with a satisfactory API that is based on dependencies exposing things rather than doing build logic.

Could you clarify what you mean by "having the dependency take over the build script logic for the parent package"? Because as both a library author and user, being able to expose helper functions that compose and orchestrate build steps seems like one of the main benefits of being able to @import a dependency's build script.

For non-trivial packages I would even say const libfoo = @import("foo").fooArtifact(.{}) is strictly superior to const libfoo = b.dependency("foo", .{}).artifact("foo") in terms of UX because you get away from the duck-typed args struct and the limitations of the types of options that can be defined through b.option() in favor of a fully typed, user-discoverable set of options that tools like ZLS can provide completions for.

@nektro
Copy link
Contributor

nektro commented Feb 4, 2024

That said, this feature was always meant for actual build script dependencies, it was not intended for inversion of control and having the dependency take over the build script logic for the parent package.

I think the proliferation of zig modules in the wild will the blur this line to the point of not really mattering.

For non-trivial packages I would even say const libfoo = @import("foo").fooArtifact(.{}) is strictly superior to const libfoo = b.dependency("foo", .{}).artifact("foo") in terms of UX because you get away from the duck-typed args struct and the limitations of the types of options that can be defined through b.option() in favor of a fully typed, user-discoverable set of options that tools like ZLS can provide completions for.

I don't agree with this line at all and to me what it's calling for is a perfect example of designing around a local maximum.

@andrewrk
Copy link
Member Author

Partially reverted in #19684.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. release notes This PR should be mentioned in the release notes. zig build system std.Build, the build runner, `zig build` subcommand, package management
Projects
None yet
6 participants