Have `pure-rust-build` job find what compiles C code #1682

EliahKagan · 2024-11-17T07:27:11Z

This causes the pure-rust-build CI job to provide more insight about what and how some of gitoxide's dependencies, even with max-pure, compile C code. This is related to, and hoped to help with, #1681, but it is not a fix for it, because this does not change any of the project's dependencies. It only augments that CI job. The approach is twofold:

Look for crates in the dependency tree that very strongly suggest that this is happening. Unless none are found, show the full tree and highlight the lines that show such dependencies.
Instrument cc1, which gcc calls to do the actual work of compiling C and otherwise is rarely called, interleaving a log of its calls with high-level cargo output (while still omitting all other forms of verbose output), and also displaying the cc1 log afterwards.

The new steps detect when and, to a large extent, how and why building max-pure builds C code. They would be able to fail the job except that, for now, I have used step-level continue-on-error to prevent this.

The commit messages have further details, and some rationale and coverage of subtleties. They also reflect an initially different approach that I replaced with (2). Before moving aside cc1 and putting in place a script that mocks it, reports the call, and delegates to it, I had deleted cc1 to observe the first error from the build (e1555bd).

That approach is currently not nearly as good, because it requires two builds (to maintain full coverage of what was previously tested) and because it only reveals the first failure, whereas instrumenting cc1 reveals all calls to it (f0c7d42).

However, I've left it in the history, in part because it's a technique that might revisited in the future when a fix or greater progress on #1681 are in place. I've also left in some small false starts related to logging, since I think that make it easier to discern why a few things are being done the way they are.

This adds more checks to the `pure-rust-build` CI job to reveal what dependencies of `max-pure` are building C code. The new checks are failing. They relate to GitoxideLabs#1681 and should pass once that bug is fixed. The main goal is to produce information helpful for GitoxideLabs#1681. As currently written, the new steps do not fail the job, because the failing steps have `continue-on-error: true`. This is so that regressions (that would fail the preexisting steps) remain readily detected. This should ideally be temporary; the new steps, if kept, should eventually be strengthened so they can fail the job. Currently this includes both regular and build dependencies. There are two new checks: 1. Search the `max-pure` dependency tree for packages that require a C or C++ compiler, or that are in practice only likely to be used to build C or C++ code. If any are found, display the whole tree, with matching lines highlighted, and fail the step. 2. After all steps that need it to be in working order, break GCC in the container by removing the `cc1` binary, then attempt a clean `max-pure` build, to reveal the first failure, if any, and let it fail the step. The `gcc` command itself is needed, because `rustc` calls the linker through it, even when no non-Rust code is compiled as part of the build. As discussed in GitoxideLabs#1664 comments, installing GCC in a way that is not broken but omits the ability to compile even C code, while it may be possible, is not commonly done and may require long running build steps or a new Docker image. Fortunately, the `gcc` command is just a "compiler driver"; the C compiler in GCC is `cc1`, and this can be removed without breaking most uses of GCC that don't involve actual compilation. This likewise removes `cc1plus` if found, even though it may not be present since we already verified that `g++` is not installed. (The deletion of `cc1` is the important part.) Since the job now cleans and starts a build that fails due to the absence of `cc1`, this removes the `rust-cache` step. (Caching would probably still confer some benefit, since it caches dependencies. Even after running `cargo clean` between the builds, most of these should be reacquired when building with `cc1`. However, to make the whole thing easier to reason about, the step is removed. It can be re-added if the job runs too slowly.) This considers any `-sys` crate to be a dependency that needs to build C. This is in principle not always the case, since they may use existing shared library binaries, though there may still be stub code that has to be built in C. The relevance of the new steps varies depending on precisely how one conceptualizes "pure" in `max-pure`, which is another reason for them to start out as `continue-on-error`. The `-sys` crates this finds in the `max-pure` dependency tree are: - `libsqlite3-sys` via `rusqlite`. This was reported in GitoxideLabs#1681 and strongly seems to be unintended for `max-pure`. - `linux-raw-sys` via `rustix` and `xattr`. It's less clear whether this should be considered impure, since its purpose is to interact with an operating system facility, and it may be comparable to the use of `libc` (on Unix-like systems). However, this does seem like it would not be able to build without a C compiler. The dependency tree also has an occurrence of the `cc` crate without a related `-sys` dependency, as required by `ring`. The `ring` crate contains a C test program that is built when building `ring`. That is currently the first build error shown in the output of the "Do max-pure build without being able to compile C or C++" step.

So the `cc1` calls can also be seen in context. `/dev/tty` is used instead of writing to standard output because standard output is expectd to be redirected and may in some cases even be consulted by code in a build script, whereas we really do want to write to the terminal. This also: - Has the `wrap1` script use `-e` so that it will stop at the first error. No command before the last one should fail. If the `flock` command fails, either due to a lock-related error or due to an error in `tee`, we want to know about it. I should've done this before, but it's more important now that I'm also tee-ing to `/dev/tty`: it's not obvious that this would work on CI. - Tweaks the code style of the `printf` command, to make the command slightly more readable, and make clearer that this is really just being used as a portable alternative to `echo`.

Because it uses the `$$` PID of the per-step shell. (I think the pipe object it goes to may differ across steps too.) This change gets interleaved high-level `cargo` output and lower level `cc1` logging, without any other lower level commands as would be obtained by telling `cargo` to produce more verbose output. Those two kinds of output are displayed together from the `cargo install` step. The full log of all `cc1` commands is again displayed (unless there are no C or C++ builds, then it wouldn't be created) in the subsequent step. At this point, I think this approach is working better than the preceding approach of deleting `cc1` (and, if present, `cc1plus`) so `cargo` fails if anything uses it. This makes it easier to figure out everything that will try to run `gcc` in ways that result in `gcc` running `cc1` (or `cc1plus`), which should usually only happen in order to compile C (or C++) code or to perform a closely related operation. The following two changes are deliberately not made at this time: - Caching with the `rust-cache` action is not added back to this job. Although the build is no longer inefficient and complex (as in the approach of building, cleaning, breaking GCC, and building again) here there is a simpler reason not to cache: we want to find out if *any* build step, including for dependencies, is compiling C (or C++). Caching could potentially hide that, if a cached build artifact that had done so were able to be reused. - No attempt is made to deparallelize the build. It would be useful to do something like `-j1` so that the interleaved log output would always pertain to an immediately adjacent step. However, that is usually already the case, and native code builds are not necessarily successfully deparallelized just by telling `cargo` to only do one task at a time, since they use custom commands (and even sometimes build systems) controlled through per-crate `build.rs` code and build dependencies. Also, since we're not caching, allowing parallelism may be useful to preserve speed. (The `flock` command is only locking logging, not the actual compilations.)

This changes "minimal" to "limited" in the human-readable step names that characterize the omission of some dev tools from the container environment in `pure-rust-build`. - "Minimal" has connotations of being a lower bound (so no matter how much one has, if it is at least so much, it satisfies a minimal requirement), while "limited" emphasizes the restriction. - `minimal` is a profile name for `rustup`, while no profile name coincides with or is strongly similar to "limited".

NobodyXu · 2024-11-17T07:42:47Z

rust-lang/cc-rs#1292

P.S. cc-rs will have the ability to disable compilation soon.

it will be released on next Fri/Sat

Byron

This is wonderful, and I am amazed how quickly you could cook this up - after all there is considerable finesse involved here and I don't claim to understand it in detail just yet.

But nonetheless, I have seen the output and it's perfectly fits its purpose, showing that there is indeed more C being compiled than originally thought, with some of it being impossible to remove unless ring stops relying on it.

Thanks so much!

NobodyXu · 2024-11-17T08:01:26Z

with some of it being impossible to remove unless ring stops relying on it.

ring uses of c is a bit different, it actually uses c code generated from some perl script.

The perl script is verified to generate correct code, so I think they probably won't remove it.

Also ring depends aws-lc-rs by default which also need c compiler

AFAIK ring uses perl-generated C code to provide guarantee on spectre/meltdown/timed attacks, plus some other crypto guarantees.

I suppose it is possible to write it in rust while providing the same guarantee, but that would require quite some work in rustc as well for that guarantee.

Byron · 2024-11-17T08:44:09Z

Thanks for elaborating!

It seems that max-pure was a pipe-dream to begin with, and what's possible is a pure build that drops everything that needs a C compiler. That one wouldn't be able to deal with HTTPS then and thus confined to local operation in practice.

NobodyXu · 2024-11-17T08:49:39Z

Well if you use http only, use the raw git protocol or use the ssh protocol with external binary, then that can work without c compiler.

It'd work for some very specialised workflow, though yeah, for most workload and the general use case it is not possible.

This adds another step to `pure-rust-build` that fails -- and fails the job -- when there are any dependencies named `*-sys` other than `linux-raw-sys`, which is known about. (This is independent of the use of C in `ring` -- discussed in GitoxideLabs#1681, GitoxideLabs#1682, and GitoxideLabs#1684 -- because `ring` is not, and does not use, a `*-sys` dependency.) This should fail prior to 3506afb (GitoxideLabs#1684) and pass afterwards.

This adds comments, and also refactors, to make these steps of the `pure-rust-build` CI job more readable, and to make clearer what the wrapping does and how it works: - "Wrap cc1 (and cc1plus if present) to record calls" - "Build max-pure with limited dev tools and log cc1" The logic clarified here was introduced in GitoxideLabs#1682. The clarification is mainly through these two changes: - Document each of the scripts the steps create and use with an explanatory comment above the "stanza" of code that creates it. - Extract the command to create the `~/display` symlink to such a script, which also has such a comment to explain the effect of writing to that symlink, why it is needed, and why it has to be set in the same GitHub Actions step as the related `cargo` command. Two other less significant clarifcations are made, which arguably are not refactorings in that they could change the behavior (for the better) in some hypothetical situations, but the goal is clarity rather than a behavioral change: - In the scripts that had more than one non-shebang line, take `-e` out of the shebangs and use a `set -e` commmand. This makes no difference in how the scripts are used, since they are always executed directoy. But may make them easier to read, as readers need not check that they are only run in this way to verify their understanding of what they do. - Set `noclobber` in the step that uses `>` to create the script files in `/usr/local`, so that if they somehow clash with files already there, we get an error rather than proceeding and maybe having them called in unantiicpated ways. The likelihood this would happen on a GHA runner is very low, so the real impact of this change is to make immediately clear to readers that the scripts and their names do not have a pre-defined meaning but are instead simply helpers for these GHA steps.

EliahKagan · 2025-03-03T00:20:01Z

and I don't claim to understand it in detail just yet.

I've opened #1872 to make this clearer.

This adds comments, and also refactors, to make these steps of the `pure-rust-build` CI job more readable, and to make clearer what the wrapping does and how it works: - "Wrap cc1 (and cc1plus if present) to record calls" - "Build max-pure with limited dev tools and log cc1" The logic clarified here was introduced in GitoxideLabs#1682. The clarification is mainly through these two changes: - Document each of the scripts the steps create and use with an explanatory comment above the "stanza" of code that creates it. - Extract the command to create the `~/display` symlink to such a script, which also has such a comment to explain the effect of writing to that symlink, why it is needed, and why it has to be set in the same GitHub Actions step as the related `cargo` command. Two other less significant clarifcations are made, which arguably are not refactorings in that they could change the behavior (for the better) in some hypothetical situations, but the goal is clarity rather than a behavioral change: - In the scripts that had more than one non-shebang line, take `-e` out of the shebangs and use a `set -e` commmand. This makes no difference in how the scripts are used, since they are always executed directoy. But may make them easier to read, as readers need not check that they are only run in this way to verify their understanding of what they do. - Set `noclobber` in the step that uses `>` to create the script files in `/usr/local`, so that if they somehow clash with files already there, we get an error rather than proceeding and maybe having them called in unantiicpated ways. The likelihood this would happen on a GHA runner is very low, so the real impact of this change is to make immediately clear to readers that the scripts and their names do not have a pre-defined meaning but are instead simply helpers for these GHA steps.

EliahKagan added 7 commits November 16, 2024 21:02

Log cc1 calls instead of deleting cc1

69db5b9

Synchronize the log writes

90a59f9

GHA doesn't have /dev/tty, let's try this

37a519a

Some troubleshooting

d7d80cf

EliahKagan mentioned this pull request Nov 17, 2024

max-pure feature is not pure (pulls in libsqlite3-sys) #1681

Closed

Byron approved these changes Nov 17, 2024

View reviewed changes

Byron enabled auto-merge November 17, 2024 07:56

Byron merged commit 906acd3 into GitoxideLabs:main Nov 17, 2024
19 checks passed

EliahKagan deleted the run-ci/impure branch November 17, 2024 08:09

EliahKagan mentioned this pull request Nov 18, 2024

Check for unrecognized *-sys dependencies #1688

Merged

EliahKagan mentioned this pull request Mar 3, 2025

Clarify pure-rust-build cc1 wrapping #1872

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Have `pure-rust-build` job find what compiles C code #1682

Have `pure-rust-build` job find what compiles C code #1682

Uh oh!

EliahKagan commented Nov 17, 2024 •

edited

Loading

Uh oh!

NobodyXu commented Nov 17, 2024

Uh oh!

Byron left a comment

Uh oh!

Uh oh!

NobodyXu commented Nov 17, 2024 •

edited

Loading

Uh oh!

Byron commented Nov 17, 2024

Uh oh!

NobodyXu commented Nov 17, 2024

Uh oh!

EliahKagan commented Mar 3, 2025

Uh oh!

Uh oh!

Uh oh!

Have pure-rust-build job find what compiles C code #1682

Have pure-rust-build job find what compiles C code #1682

Uh oh!

Conversation

EliahKagan commented Nov 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NobodyXu commented Nov 17, 2024

Uh oh!

Byron left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

NobodyXu commented Nov 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Byron commented Nov 17, 2024

Uh oh!

NobodyXu commented Nov 17, 2024

Uh oh!

EliahKagan commented Mar 3, 2025

Uh oh!

Uh oh!

Have `pure-rust-build` job find what compiles C code #1682

Have `pure-rust-build` job find what compiles C code #1682

EliahKagan commented Nov 17, 2024 •

edited

Loading

NobodyXu commented Nov 17, 2024 •

edited

Loading