Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Set CARGO_CHECK environment variable when type checking #3748

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

akiselev
Copy link

@akiselev akiselev commented Dec 21, 2024

Add a new environment variable CARGO_CHECK that is set to true when running cargo check so build scripts can skip expensive compilation steps that are unnecessary for Rust type checking, such as compiling external C++ code in cxx based projects.

Rendered

@akiselev akiselev changed the title RFC: Add CARGO_CHECK environment variable when type checking RFC: Set CARGO_CHECK environment variable when type checking Dec 21, 2024
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

@akiselev akiselev Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the links! I searched Rust issues and RFCs but forgot to search the cargo repo ☹️

It doesn't look like the discussion has moved much in years, is there a reason against forcing the issue via RFC at this point?

The Rust landscape has changed significantly since the last major activity in the discussion and rust-analyzer seems to be the de-facto LSP implementation at this point. What are the downsides of introducing an LSP specific check command whose only difference (for now) is setting that environment variable? Or the alternative, allowing callers to set a CARGO_NO_BUILD environment variable themselves , one that is officially sanctioned and supported by Rust/Cargo (if not set automatically)?

I understand and appreciate the hesitance to stabilize contracts without ironing out all of the generalities but the discussion about cargo modes and the cargo check && cargo build feels very ivory tower. Rust is increasingly being used to integrate with C++ code beyond *-sys crates as cxx and other tools mature, and I feel like a way to notify build scripts not to run extraneous steps is very much needed regardless of the aforementioned issues. Personally I always configure rust-analyzer to use a subdirectory so that it doesn't block cargo build anyway, so build caching between cargo check and cargo build wouldn't apply (and I'm curious what fraction of the community does too)

That said, I'm biased as I feel acute pain with cxx/cxx-qt, where sccache doesn't seem to help. Worst case scenario I can set the environment variables myself and use a custom fork.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't look like the discussion has moved much in years, is there a reason against forcing the issue via RFC at this point?

No. While RFC doesn't really force anything until accepted, it is good to open a discussion. Sometimes people hang out in https://internals.rust-lang.org/ first for pre-RFC, before preparing a more formal proposal here.


2. Do Nothing: If we do nothing, build scripts will continue to run all build steps even when it's not necessary, significantly impacting Rust ergonomics when interfacing with exernal languages.

# Prior art
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind providing link to each prior art so we can find reference easier?

@weihanglo weihanglo added the T-cargo Relevant to the Cargo team, which will review and decide on the RFC. label Dec 21, 2024
- Projects using Protocol Buffers generate Rust code from .proto files
- bindgen generates Rust bindings from C/C++ headers

Currently, every time rust-analyzer runs `cargo check`, all build scripts must execute their full build process, including steps like compiling C++ code that are only needed for linking but not for type checking. This significantly impacts IDE responsiveness, especially in projects with complex build scripts.
Copy link
Contributor

@kpreid kpreid Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something seems missing from this description. In my understanding of how to use Cargo, a well-behaved build script should produce an appropriate set of rerun-if-changed directives, meaning that the build script will not be run “every time” unless its dependencies are actually changed.

There are of course specific scenarios where it will be rerun regardless, such as if you are editing one of the build-dependencies, but this paragraph suggests that build scripts predictably run on every cargo check in every project that has one, not just the first cargo check or cargo build. In my mind, if that’s happening, that indicates a bug in the build script. (Or at least an incompleteness — rerunning every time is the default behavior of Cargo given zero rerun-if-* directives, but for any build script that runs long enough to notice, that should be addressed by adding those directives.)

Can you clarify this? Is the build script in this scenario unable to use rerun-if directives for some reason?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, I miswrote that paragraph. I changed it to Currently, every time rust-analyzer runs cargo check, the build script in the changed crate must execute its full build process, ...

Copy link
Contributor

@kpreid kpreid Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that’s still not true in general. If a build script reruns, then one of the following situations applies:

  • The build script does not have any rerun-if-* dependencies declared; that can and should be fixed.
  • You modified one of its [build-dependencies].
  • You modified a file that is a rerun-if-changed dependency of the build script, so it should rerun.
  • You (implicitly or explicitly) changed a environment variable the build script depends on with rerun-if-env-changed, e.g. by running cargo check from two different environments (IDE and separate terminal). Again, this is an actual change in a dependency, though possibly an unintended one.

Changing the Rust code of a crate does not guarantee that its build script will rerun. Build scripts are rerun only if one of their dependencies changes, or if they failed to declare dependencies.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see what you mean now.

The cxx crate parses Rust source files to generate bindings from cxx::bridge macros and the Rust source files get added via cargo::rerun-if-changed. Normally this wouldn't be a problem since C/C++ bindings are rarely changed, but in cxx-qt projects the bridge code is the interface between QML/JS GUI code and the Rust backend so it receives a lot more attention while incurring additional QT build tooling overhead.

I'll add a paragraph explaining that detail. I didn't realize the issue I was experiencing is so specific to cxx though

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Cargo will set the `CARGO_CHECK` environment variable to `true` when running `cargo check`
Copy link
Contributor

@kpreid kpreid Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is unwise to name and describe this as for cargo check in particular, for two reasons:

  • A future-compatibility hazard, especially in context of extending this to other modes as noted in the future possibilities section — are build scripts going to have to check for CARGO_CHECK || CARGO_DOC || CARGO_CLIPPY to be efficient? I think we’d see future arguments over whether commands are “check-like enough to count” rather than introducing more, which would spread confusion and create a tradeoff we can just avoid from the start.
  • Build script authors might decide to intentionally do non-additive things based on the command in use (like generating different bindings code that’s more development-friendly in some way), leading to incorrect caching.

I think it would be better to name and describe as something like:

Cargo will set the CARGO_WILL_NOT_LINK environment variable if Cargo’s build plan is known not to include any linking (and therefore is known not to depend on any foreign object files the build script might produce, or any of its cargo::rustc-link-* directives). Whenever CARGO_WILL_NOT_LINK is present, you may skip producing object files. All generated Rust code or other inputs to the Rust compilation (such as files for include_str! or cargo::rustc-cfg directives) must be unaffected.

The value of CARGO_WILL_NOT_LINK is reserved for making finer distinctions in the future; currently, it is always set to "1", but build scripts should check only for its presence.

This does change the behavior because the definition includes all of cargo check, cargo fix, cargo doc, and cargo clippy, but I think that’s the right thing to do. It’s also very specific about what the build script may do and should not do, to ensure that the results of the build script can still validly be cached.

(I specified the value to be "1" and not "true" because that's what Cargo uses for feature flags, the other existing example of “boolean” environment variables.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great points!

I think your alternative is a better approach, especially not to step on the toes of the parallel cargo modes discussion. I'll wait for more input and then probably update the RFC to the alternative method after the holidays

Copy link
Contributor

@tgross35 tgross35 Dec 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a handful of environment variables that don't have the CARGO_ prefix, e.g. OUT_DIR TARGET NUM_JOBS OPT_LEVEL. My read is that these unprefixed variables are loosely meant to be portable to other build systems that might want to reuse build scripts. Since the proposed feature is certainly something that other build systems could make use of, maybe it should also be unprefixed?

A name like CHECK_ONLY would be reasonably intuitive. WILL_NOT_LINK or SKIP_CODEGEN are a bit more technical but maybe more accurate with e.g. rustdoc. NO_CODEGEN is consistent with the unstable rustc -Zno-codegen. I think Cargo currently gets the check-only functionality by passing --emit=dep-info,metadata.

List of the existing environment variables for reference https://doc.rust-lang.org/cargo/reference/environment-variables.html

generate_rust_bindings();

// Only compile external code when not type checking
if std::env::var("CARGO_CHECK").unwrap() != "true" {
Copy link
Contributor

@kpreid kpreid Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text below says “The environment variable will not be set…” but this code panics if the environment variable is not set.

Suggested change
if std::env::var("CARGO_CHECK").unwrap() != "true" {
if std::env::var("CARGO_CHECK").is_ok() {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thank you

@clarfonthey
Copy link
Contributor

Is there a reason why you propose an environment variable instead of just, a cfg option? Sure, the environment variable can be accessed in the main function in the build.rs script, but it can't be used for conditional compilation even though that would be incredibly handy as well. This would not only remove the runtime penalty of doing all these things, but the compile time penalty of using them in the build script as well.

You would just have to ensure that this cfg option is only available in the build script itself, and not in the actual crate or its dependencies, since you could do some really naughty things otherwise…

@tgross35
Copy link
Contributor

Is there a reason why you propose an environment variable instead of just, a cfg option? Sure, the environment variable can be accessed in the main function in the build.rs script, but it can't be used for conditional compilation even though that would be incredibly handy as well. This would not only remove the runtime penalty of doing all these things, but the compile time penalty of using them in the build script as well.

Wouldn't this be about the same since Cargo needs to build all build-dependencies regardless of whether it is checking or compiling? I would expect that excluding these dependencies would be more significant than time savings from partially configuring out build.rs (since the build script itself usually doesn't contain a lot of slow-to-compile logic), but that would require being able to specify something like codegen-only build dependencies or features.

One advantage of using an environment variable is that you can compile the build script once and use it for both checking and codegen.

- `cargo run`
- `cargo test`

# Drawbacks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to #3748 (comment): as proposed, this is only beneficial in cases where whatever the build script does is more significant than compiling the build script + deps itself. IME this is indeed the common case since often a crate like cmake or bindgen is needed to generate the headers, actually compiling only adds cmake --build.


This feature primarily benefits library authors who maintain build scripts, especially those working with external code generation and compilation. Regular Rust developers using these libraries will automatically benefit from improved IDE performance without needing to modify their code.

# Reference-level explanation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should address things like caching and output generation. Quoting @alexcrichton in rust-lang/cargo#4001 (comment):

I'm not 100% certain this still fits in Cargo myself. The major downside of implementing a feature like this is that cargo check && cargo build gets slower than it currently is. Cargo currently caches build script invocations for those two commands, which means that all the work done by cargo check is reused by cargo build and isn't redone.

[...]

As an author of crates like cc and cmake it's also somewhat ambiguous to me about what "check mode" would do for libraries like that. Should they do nothing? Type-check the C code? Make sure it all compiles? Basically I don't think that C/C++ have any real meaningful distinction like Rust does for cargo check and cargo build, so there's not really an obvious choice of what these library crates would do, which would require even more opt-in or configuration on behalf of all users.

High level proposal: for a build script, CHECK_ONLY should mean:

  1. Do any relevant checks that don't include codegen, if possible
  2. Do what is necessary to check the Rust code
  3. This can always be disregarded in favor of running the full build script

So for C code, CHECK_ONLY would mean doing the C/++ configure (but not build) step and possibly invoking bindgen. Usually this is ./configure or cmake without --build to verify dependencies and environment, and prepare final headers and source files. bindgen can then be run on these final headers. !CHECK_ONLY then means that make, cmake --build, or cc should be run.

OUT_DIR should be the same between check and build so configuration output can be reused. The build script will probably always run the congigure step, cmake knows how to skip a lot if files are already up to date 1.

For crates, cc should totally ignore this environment variable and not change anything - the build script author will just be have the option to skip invoking cc if CHECK_ONLY is set. cmake's Config should add new methods like only_configure (always called) and only_build (called if CHECK_ONLY is unset), since Config::build currently does both these things. cmake::build and Config::build could optionally listen to CHECK_ONLY to determine whether to run the only_build step.

Footnotes

  1. Cmake's configure caching can sometimes be pretty slow for bigger projects. It would be possible for build scripts to thumbprint relevant environment and skip invoking cmake if OUT_DIR has an updated thumbprint, but this would be advanced usage.


3. **Maintenance Burden**: The Rust and Cargo teams will need to maintain this feature and ensure it remains consistent across different commands and contexts.

# Rationale and alternatives
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rust-lang/cargo#10126 proposed CARGO_MODE that adds doc, test, doctest, and a handful of others. Personally I would rather have a single environment variable that says whether or not codegen should happen (as is proposed here) rather than needing to know which modes do and don't require codegen, which isn't robust against adding new modes. Somehow indicating bench/test mode could be useful but this seems better as an independent feature.

@clarfonthey
Copy link
Contributor

Wouldn't this be about the same since Cargo needs to build all build-dependencies regardless of whether it is checking or compiling? I would expect that excluding these dependencies would be more significant than time savings from partially configuring out build.rs (since the build script itself usually doesn't contain a lot of slow-to-compile logic), but that would require being able to specify something like codegen-only build dependencies or features.

That's fair, although I guess that my thought process was that dependencies don't change often and thus can have their builds cached, whereas build scripts would essentially need to be built and run no matter what.

Since the RFC mentions rust-analyzer specifically, I'd imagine that the primary case is speeding up type checking between changes of the code when dependencies would already be built, but I guess that also changing the build script would be a potential use case.

I guess it is one of those weird cases where compiling two copies only is a big deal if you need to compile both copies often, whereas here you'd likely only be building one or the other.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-cargo Relevant to the Cargo team, which will review and decide on the RFC.
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

5 participants