Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Have pure-rust-build job find what compiles C code #1682

Merged
merged 8 commits into from
Nov 17, 2024

Conversation

EliahKagan
Copy link
Member

@EliahKagan EliahKagan commented Nov 17, 2024

This causes the pure-rust-build CI job to provide more insight about what and how some of gitoxide's dependencies, even with max-pure, compile C code. This is related to, and hoped to help with, #1681, but it is not a fix for it, because this does not change any of the project's dependencies. It only augments that CI job. The approach is twofold:

  1. Look for crates in the dependency tree that very strongly suggest that this is happening. Unless none are found, show the full tree and highlight the lines that show such dependencies.
  2. Instrument cc1, which gcc calls to do the actual work of compiling C and otherwise is rarely called, interleaving a log of its calls with high-level cargo output (while still omitting all other forms of verbose output), and also displaying the cc1 log afterwards.

The new steps detect when and, to a large extent, how and why building max-pure builds C code. They would be able to fail the job except that, for now, I have used step-level continue-on-error to prevent this.

The commit messages have further details, and some rationale and coverage of subtleties. They also reflect an initially different approach that I replaced with (2). Before moving aside cc1 and putting in place a script that mocks it, reports the call, and delegates to it, I had deleted cc1 to observe the first error from the build (e1555bd).

That approach is currently not nearly as good, because it requires two builds (to maintain full coverage of what was previously tested) and because it only reveals the first failure, whereas instrumenting cc1 reveals all calls to it (f0c7d42).

However, I've left it in the history, in part because it's a technique that might revisited in the future when a fix or greater progress on #1681 are in place. I've also left in some small false starts related to logging, since I think that make it easier to discern why a few things are being done the way they are.

This adds more checks to the `pure-rust-build` CI job to reveal
what dependencies of `max-pure` are building C code. The new checks
are failing. They relate to GitoxideLabs#1681 and should pass once that bug is
fixed. The main goal is to produce information helpful for GitoxideLabs#1681.

As currently written, the new steps do not fail the job, because
the failing steps have `continue-on-error: true`. This is so that
regressions (that would fail the preexisting steps) remain readily
detected. This should ideally be temporary; the new steps, if kept,
should eventually be strengthened so they can fail the job.

Currently this includes both regular and build dependencies.

There are two new checks:

1. Search the `max-pure` dependency tree for packages that require
   a C or C++ compiler, or that are in practice only likely to be
   used to build C or C++ code. If any are found, display the whole
   tree, with matching lines highlighted, and fail the step.

2. After all steps that need it to be in working order, break GCC
   in the container by removing the `cc1` binary, then attempt a
   clean `max-pure` build, to reveal the first failure, if any, and
   let it fail the step.

   The `gcc` command itself is needed, because `rustc` calls the
   linker through it, even when no non-Rust code is compiled as
   part of the build.

   As discussed in GitoxideLabs#1664 comments, installing GCC in a way that is
   not broken but omits the ability to compile even C code, while
   it may be possible, is not commonly done and may require long
   running build steps or a new Docker image.

   Fortunately, the `gcc` command is just a "compiler driver"; the
   C compiler in GCC is `cc1`, and this can be removed without
   breaking most uses of GCC that don't involve actual compilation.

   This likewise removes `cc1plus` if found, even though it may not
   be present since we already verified that `g++` is not
   installed. (The deletion of `cc1` is the important part.)

Since the job now cleans and starts a build that fails due to the
absence of `cc1`, this removes the `rust-cache` step. (Caching
would probably still confer some benefit, since it caches
dependencies. Even after running `cargo clean` between the builds,
most of these should be reacquired when building with `cc1`.
However, to make the whole thing easier to reason about, the step
is removed. It can be re-added if the job runs too slowly.)

This considers any `-sys` crate to be a dependency that needs to
build C. This is in principle not always the case, since they may
use existing shared library binaries, though there may still be
stub code that has to be built in C. The relevance of the new
steps varies depending on precisely how one conceptualizes "pure"
in `max-pure`, which is another reason for them to start out as
`continue-on-error`.

The `-sys` crates this finds in the `max-pure` dependency tree are:

- `libsqlite3-sys` via `rusqlite`. This was reported in GitoxideLabs#1681 and
  strongly seems to be unintended for `max-pure`.

- `linux-raw-sys` via `rustix` and `xattr`. It's less clear whether
  this should be considered impure, since its purpose is to
  interact with an operating system facility, and it may be
  comparable to the use of `libc` (on Unix-like systems). However,
  this does seem like it would not be able to build without a C
  compiler.

The dependency tree also has an occurrence of the `cc` crate
without a related `-sys` dependency, as required by `ring`. The
`ring` crate contains a C test program that is built when building
`ring`. That is currently the first build error shown in the output
of the "Do max-pure build without being able to compile C or C++"
step.
So the `cc1` calls can also be seen in context.

`/dev/tty` is used instead of writing to standard output because
standard output is expectd to be redirected and may in some cases
even be consulted by code in a build script, whereas we really do
want to write to the terminal.

This also:

- Has the `wrap1` script use `-e` so that it will stop at the first
  error. No command before the last one should fail. If the `flock`
  command fails, either due to a lock-related error or due to an
  error in `tee`, we want to know about it.

  I should've done this before, but it's more important now that
  I'm also tee-ing to `/dev/tty`: it's not obvious that this would
  work on CI.

- Tweaks the code style of the `printf` command, to make the
  command slightly more readable, and make clearer that this is
  really just being used as a portable alternative to `echo`.
Because it uses the `$$` PID of the per-step shell.

(I think the pipe object it goes to may differ across steps too.)

This change gets interleaved high-level `cargo` output and lower
level `cc1` logging, without any other lower level commands as
would be obtained by telling `cargo` to produce more verbose
output. Those two kinds of output are displayed together from the
`cargo install` step. The full log of all `cc1` commands is again
displayed (unless there are no C or C++ builds, then it wouldn't be
created) in the subsequent step.

At this point, I think this approach is working better than the
preceding approach of deleting `cc1` (and, if present, `cc1plus`)
so `cargo` fails if anything uses it. This makes it easier to
figure out everything that will try to run `gcc` in ways that
result in `gcc` running `cc1` (or `cc1plus`), which should usually
only happen in order to compile C (or C++) code or to perform a
closely related operation.

The following two changes are deliberately not made at this time:

- Caching with the `rust-cache` action is not added back to this
  job. Although the build is no longer inefficient and complex (as
  in the approach of building, cleaning, breaking GCC, and building
  again) here there is a simpler reason not to cache: we want to
  find out if *any* build step, including for dependencies, is
  compiling C (or C++). Caching could potentially hide that, if a
  cached build artifact that had done so were able to be reused.

- No attempt is made to deparallelize the build. It would be useful
  to do something like `-j1` so that the interleaved log output
  would always pertain to an immediately adjacent step. However,
  that is usually already the case, and native code builds are not
  necessarily successfully deparallelized just by telling `cargo`
  to only do one task at a time, since they use custom commands
  (and even sometimes build systems) controlled through per-crate
  `build.rs` code and build dependencies. Also, since we're not
  caching, allowing parallelism may be useful to preserve speed.
  (The `flock` command is only locking logging, not the actual
  compilations.)
This changes "minimal" to "limited" in the human-readable step
names that characterize the omission of some dev tools from the
container environment in `pure-rust-build`.

- "Minimal" has connotations of being a lower bound (so no matter
  how much one has, if it is at least so much, it satisfies a
  minimal requirement), while "limited" emphasizes the restriction.

- `minimal` is a profile name for `rustup`, while no profile name
  coincides with or is strongly similar to "limited".
@NobodyXu
Copy link
Contributor

rust-lang/cc-rs#1292

P.S. cc-rs will have the ability to disable compilation soon.

it will be released on next Fri/Sat

Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wonderful, and I am amazed how quickly you could cook this up - after all there is considerable finesse involved here and I don't claim to understand it in detail just yet.

But nonetheless, I have seen the output and it's perfectly fits its purpose, showing that there is indeed more C being compiled than originally thought, with some of it being impossible to remove unless ring stops relying on it.

Thanks so much!

@Byron Byron enabled auto-merge November 17, 2024 07:56
@Byron Byron merged commit 906acd3 into GitoxideLabs:main Nov 17, 2024
19 checks passed
@NobodyXu
Copy link
Contributor

NobodyXu commented Nov 17, 2024

with some of it being impossible to remove unless ring stops relying on it.

ring uses of c is a bit different, it actually uses c code generated from some perl script.

The perl script is verified to generate correct code, so I think they probably won't remove it.

Also ring depends aws-lc-rs by default which also need c compiler

AFAIK ring uses perl-generated C code to provide guarantee on spectre/meltdown/timed attacks, plus some other crypto guarantees.

I suppose it is possible to write it in rust while providing the same guarantee, but that would require quite some work in rustc as well for that guarantee.

@EliahKagan EliahKagan deleted the run-ci/impure branch November 17, 2024 08:09
@Byron
Copy link
Member

Byron commented Nov 17, 2024

Thanks for elaborating!

It seems that max-pure was a pipe-dream to begin with, and what's possible is a pure build that drops everything that needs a C compiler. That one wouldn't be able to deal with HTTPS then and thus confined to local operation in practice.

@NobodyXu
Copy link
Contributor

Well if you use http only, use the raw git protocol or use the ssh protocol with external binary, then that can work without c compiler.

It'd work for some very specialised workflow, though yeah, for most workload and the general use case it is not possible.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Nov 18, 2024
This adds another step to `pure-rust-build` that fails -- and fails
the job -- when there are any dependencies named `*-sys` other than
`linux-raw-sys`, which is known about.

(This is independent of the use of C in `ring` -- discussed
in GitoxideLabs#1681, GitoxideLabs#1682, and GitoxideLabs#1684 -- because `ring` is not, and does not
use, a `*-sys` dependency.)

This should fail prior to 3506afb (GitoxideLabs#1684) and pass afterwards.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Nov 18, 2024
This adds another step to `pure-rust-build` that fails -- and fails
the job -- when there are any dependencies named `*-sys` other than
`linux-raw-sys`, which is known about.

(This is independent of the use of C in `ring` -- discussed
in GitoxideLabs#1681, GitoxideLabs#1682, and GitoxideLabs#1684 -- because `ring` is not, and does not
use, a `*-sys` dependency.)

This should fail prior to 3506afb (GitoxideLabs#1684) and pass afterwards.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Nov 18, 2024
This adds another step to `pure-rust-build` that fails -- and fails
the job -- when there are any dependencies named `*-sys` other than
`linux-raw-sys`, which is known about.

(This is independent of the use of C in `ring` -- discussed
in GitoxideLabs#1681, GitoxideLabs#1682, and GitoxideLabs#1684 -- because `ring` is not, and does not
use, a `*-sys` dependency.)

This should fail prior to 3506afb (GitoxideLabs#1684) and pass afterwards.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Mar 2, 2025
This adds comments, and also refactors, to make these steps of the
`pure-rust-build` CI job more readable, and to make clearer what
the wrapping does and how it works:

- "Wrap cc1 (and cc1plus if present) to record calls"
- "Build max-pure with limited dev tools and log cc1"

The logic clarified here was introduced in GitoxideLabs#1682. The clarification
is mainly through these two changes:

- Document each of the scripts the steps create and use with an
  explanatory comment above the "stanza" of code that creates it.

- Extract the command to create the `~/display` symlink to such a
  script, which also has such a comment to explain the effect of
  writing to that symlink, why it is needed, and why it has to be
  set in the same GitHub Actions step as the related `cargo`
  command.

Two other less significant clarifcations are made, which arguably
are not refactorings in that they could change the behavior (for
the better) in some hypothetical situations, but the goal is
clarity rather than a behavioral change:

- In the scripts that had more than one non-shebang line, take `-e`
  out of the shebangs and use a `set -e` commmand. This makes no
  difference in how the scripts are used, since they are always
  executed directoy. But may make them easier to read, as readers
  need not check that they are only run in this way to verify
  their understanding of what they do.

- Set `noclobber` in the step that uses `>` to create the script
  files in `/usr/local`, so that if they somehow clash with files
  already there, we get an error rather than proceeding and maybe
  having them called in unantiicpated ways. The likelihood this
  would happen on a GHA runner is very low, so the real impact of
  this change is to make immediately clear to readers that the
  scripts and their names do not have a pre-defined meaning but are
  instead simply helpers for these GHA steps.
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Mar 2, 2025
This adds comments, and also refactors, to make these steps of the
`pure-rust-build` CI job more readable, and to make clearer what
the wrapping does and how it works:

- "Wrap cc1 (and cc1plus if present) to record calls"
- "Build max-pure with limited dev tools and log cc1"

The logic clarified here was introduced in GitoxideLabs#1682. The clarification
is mainly through these two changes:

- Document each of the scripts the steps create and use with an
  explanatory comment above the "stanza" of code that creates it.

- Extract the command to create the `~/display` symlink to such a
  script, which also has such a comment to explain the effect of
  writing to that symlink, why it is needed, and why it has to be
  set in the same GitHub Actions step as the related `cargo`
  command.

Two other less significant clarifcations are made, which arguably
are not refactorings in that they could change the behavior (for
the better) in some hypothetical situations, but the goal is
clarity rather than a behavioral change:

- In the scripts that had more than one non-shebang line, take `-e`
  out of the shebangs and use a `set -e` commmand. This makes no
  difference in how the scripts are used, since they are always
  executed directoy. But may make them easier to read, as readers
  need not check that they are only run in this way to verify
  their understanding of what they do.

- Set `noclobber` in the step that uses `>` to create the script
  files in `/usr/local`, so that if they somehow clash with files
  already there, we get an error rather than proceeding and maybe
  having them called in unantiicpated ways. The likelihood this
  would happen on a GHA runner is very low, so the real impact of
  this change is to make immediately clear to readers that the
  scripts and their names do not have a pre-defined meaning but are
  instead simply helpers for these GHA steps.
@EliahKagan
Copy link
Member Author

and I don't claim to understand it in detail just yet.

I've opened #1872 to make this clearer.

EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Mar 11, 2025
This adds comments, and also refactors, to make these steps of the
`pure-rust-build` CI job more readable, and to make clearer what
the wrapping does and how it works:

- "Wrap cc1 (and cc1plus if present) to record calls"
- "Build max-pure with limited dev tools and log cc1"

The logic clarified here was introduced in GitoxideLabs#1682. The clarification
is mainly through these two changes:

- Document each of the scripts the steps create and use with an
  explanatory comment above the "stanza" of code that creates it.

- Extract the command to create the `~/display` symlink to such a
  script, which also has such a comment to explain the effect of
  writing to that symlink, why it is needed, and why it has to be
  set in the same GitHub Actions step as the related `cargo`
  command.

Two other less significant clarifcations are made, which arguably
are not refactorings in that they could change the behavior (for
the better) in some hypothetical situations, but the goal is
clarity rather than a behavioral change:

- In the scripts that had more than one non-shebang line, take `-e`
  out of the shebangs and use a `set -e` commmand. This makes no
  difference in how the scripts are used, since they are always
  executed directoy. But may make them easier to read, as readers
  need not check that they are only run in this way to verify
  their understanding of what they do.

- Set `noclobber` in the step that uses `>` to create the script
  files in `/usr/local`, so that if they somehow clash with files
  already there, we get an error rather than proceeding and maybe
  having them called in unantiicpated ways. The likelihood this
  would happen on a GHA runner is very low, so the real impact of
  this change is to make immediately clear to readers that the
  scripts and their names do not have a pre-defined meaning but are
  instead simply helpers for these GHA steps.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants