Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add inclusion/exclusion directives to build.zig.zon for deciding which files belong the package #14311

Closed
Tracked by #14265
andrewrk opened this issue Jan 14, 2023 · 6 comments
Assignees
Labels
enhancement Solving this issue will likely involve adding new logic or components to the codebase. zig build system std.Build, the build runner, `zig build` subcommand, package management
Milestone

Comments

@andrewrk
Copy link
Member

andrewrk commented Jan 14, 2023

Extracted from #14265.

The motivation for doing this is mirrors (#14291). Idea being that if you fetch a tarball, perhaps it does not contain any superfluous files, while the same package fetched via git protocol (#14295) does have some extra files that are not intended to be part of the package. Both URLs should work, and their hashes should be identical. This means some files should be excluded from the package.

I could see this going one of two ways:

...with exclusions:

.{
    // ...
    .excluded_files = .{
        ".gitignore",
        "/zig-cache",
        "build-*",
    },
    // ...
}

...or with inclusions:

.{
    // ...
    .included_files = .{
        "src/*.zig",
        "build.zig.zon",
    },
    // ...
}

Two open questions:

  • should it be via inclusions, via exclusions, or either one is allowed?
  • what should the default be, if no rules are present in the manifest file?

Either way, here is how fetching a package works:

  1. Fetch URL from any mirror into a directory of files in a temp directory.
  2. Apply inclusion/exclusion rules by deleting files from the file system.
  3. Hash each file in the full tree independently.
  4. Produce final hash from the hashes of each file, sorted by name (related: zig build package hashes are missing path names and executable bit for each file #14308).
  5. Rename temp directory into the global package cache into a directory named after the hash.

The guarantee should be upheld that if the hash matches, the file system should match exactly. This means that, since empty directories are not included in the hash, empty directories should be removed by step 2, along with the inclusion/exclusion rules.

@andrewrk andrewrk added enhancement Solving this issue will likely involve adding new logic or components to the codebase. zig build system std.Build, the build runner, `zig build` subcommand, package management labels Jan 14, 2023
@andrewrk andrewrk added this to the 0.11.0 milestone Jan 14, 2023
@deflock
Copy link

deflock commented Jan 14, 2023

It's time for .zigignore

@ifreund
Copy link
Member

ifreund commented Jan 14, 2023

It seems to me that the answers to these open questions that are most likely to guide users towards creating packages without unintended files are:

  • Only allow inclusions
  • if there are no rules, produce an empty package or perhaps just print an error.

@silversquirl
Copy link
Contributor

I think @ifreund's suggestion matches Zig's design philosophy of "make the Right Way easy and the Wrong Way hard" (represented in the Zen as "Communicate intent precisely", "Edge cases matter" and "Only one obvious way to do things")

I do wonder whether being able to exclude things might still be useful, eg. for a directory containing things you want, with one file or subdirectory that you don't. However, I can't come up with an example that wouldn't be better solved by reorganizing the project, so that may not actually be a problem.

@andrewrk
Copy link
Member Author

andrewrk commented May 5, 2023

Additional GitHub search keywords for this issue: blacklist, whitelist, include, exclude, ignore

@andrewrk
Copy link
Member Author

andrewrk commented Oct 1, 2023

One reason to make it based on inclusion rules rather than exclusion rules would be that if some method of file delivery happens to add extra files, these will not get picked up and screw up the hash. For example, maybe somebody tries to make a mirror, but they accidentally add .DS_Store, or doing a fetch from version control has something unexpected in there.

If it were based on exclusion rules, then it would make sense to allow exclusion rules at the depender's side of things along with the URL and hash specification. We can avoid this and keep things simpler with inclusion rules.

@andrewrk andrewrk self-assigned this Oct 4, 2023
andrewrk added a commit that referenced this issue Oct 4, 2023
Organize everything around a Fetch task which does a bunch of stuff in a
worker thread without touching any shared state, and then queues up
Fetch tasks for its dependencies.

This isn't the theoretical optimal package fetching performance because
CPU cores don't necessarily map 1:1 with I/O tasks, and each fetch task
contains a mixture of computations and I/O. However, it is expected for
this to significantly outperform master branch, which fetches everything
recursively with only one thread.

The logic is now a lot more linear and easy to follow. Everything that
is embarassingly parallel is done on the thread pool, and then after
everything is fetched, the worker threads are joined and the main thread
does the finishing touches of stitching together the dependencies.zig
import files. There is only one tiny little critical section and it does
not even have any error handling in it.

This also lays the groundwork for #14281 because in system mode, all
this fetching logic will be skipped, but the "finishing touches"
mentioned above still need to be done. With this branch, that logic is
separated out and no longer recursively tangled with fetching stuff.

Additionally, this branch:
 * Implements inclusion directives in `build.zig.zon` for deciding which
   files belong the package (#14311).
 * Adds basic documentation for `build.zig.zon` files.
 * Adds support for fetching dependencies with the `file://` protocol
   scheme (#17364).
 * Adds a workaround for a Linux/btrfs file system bug (#17282).

This commit is a work-in-progress. Still todo:

1. Hook up the CLI to the new system.
2. Restore the module table creation logic after all the fetching is
   done.
3. Fix compilation errors, get the tests passing, and regression test
   against real world projects.
@andrewrk andrewrk mentioned this issue Oct 4, 2023
11 tasks
andrewrk added a commit that referenced this issue Oct 7, 2023
Organize everything around a Fetch task which does a bunch of stuff in a
worker thread without touching any shared state, and then queues up
Fetch tasks for its dependencies.

This isn't the theoretical optimal package fetching performance because
CPU cores don't necessarily map 1:1 with I/O tasks, and each fetch task
contains a mixture of computations and I/O. However, it is expected for
this to significantly outperform master branch, which fetches everything
recursively with only one thread.

The logic is now a lot more linear and easy to follow. Everything that
is embarassingly parallel is done on the thread pool, and then after
everything is fetched, the worker threads are joined and the main thread
does the finishing touches of stitching together the dependencies.zig
import files. There is only one tiny little critical section and it does
not even have any error handling in it.

This also lays the groundwork for #14281 because in system mode, all
this fetching logic will be skipped, but the "finishing touches"
mentioned above still need to be done. With this branch, that logic is
separated out and no longer recursively tangled with fetching stuff.

Additionally, this branch:
 * Implements inclusion directives in `build.zig.zon` for deciding which
   files belong the package (#14311).
 * Adds basic documentation for `build.zig.zon` files.
 * Adds support for fetching dependencies with the `file://` protocol
   scheme (#17364).
 * Adds a workaround for a Linux/btrfs file system bug (#17282).

This commit is a work-in-progress. Still todo:

1. Hook up the CLI to the new system.
2. Restore the module table creation logic after all the fetching is
   done.
3. Fix compilation errors, get the tests passing, and regression test
   against real world projects.
andrewrk added a commit that referenced this issue Oct 8, 2023
Organize everything around a Fetch task which does a bunch of stuff in a
worker thread without touching any shared state, and then queues up
Fetch tasks for its dependencies.

This isn't the theoretical optimal package fetching performance because
CPU cores don't necessarily map 1:1 with I/O tasks, and each fetch task
contains a mixture of computations and I/O. However, it is expected for
this to significantly outperform master branch, which fetches everything
recursively with only one thread.

The logic is now a lot more linear and easy to follow. Everything that
is embarassingly parallel is done on the thread pool, and then after
everything is fetched, the worker threads are joined and the main thread
does the finishing touches of stitching together the dependencies.zig
import files. There is only one tiny little critical section and it does
not even have any error handling in it.

This also lays the groundwork for #14281 because in system mode, all
this fetching logic will be skipped, but the "finishing touches"
mentioned above still need to be done. With this branch, that logic is
separated out and no longer recursively tangled with fetching stuff.

Additionally, this branch:
 * Implements inclusion directives in `build.zig.zon` for deciding which
   files belong the package (#14311).
 * Adds basic documentation for `build.zig.zon` files.
 * Adds support for fetching dependencies with the `file://` protocol
   scheme (#17364).
 * Adds a workaround for a Linux/btrfs file system bug (#17282).

This commit is a work-in-progress. Still todo:

1. Hook up the CLI to the new system.
2. Restore the module table creation logic after all the fetching is
   done.
3. Fix compilation errors, get the tests passing, and regression test
   against real world projects.
@andrewrk
Copy link
Member Author

andrewrk commented Oct 9, 2023

Landed in f7bc55c.

Follow-up issue:

This is currently implemented as a required field for the root package, but the rest of the dependency tree is allowed to omit the paths field; all files are considered to be included.

@andrewrk andrewrk closed this as completed Oct 9, 2023
@mlugg mlugg moved this to Uncategorized in Package Manager Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Solving this issue will likely involve adding new logic or components to the codebase. zig build system std.Build, the build runner, `zig build` subcommand, package management
Projects
Archived in project
Development

No branches or pull requests

4 participants