Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cranelift ISLE build script: make manifest check optional, to avoid dependencies where needed. #3616

Closed
wants to merge 1 commit into from

Conversation

cfallin
Copy link
Member

@cfallin cfallin commented Dec 17, 2021

Some use-cases are very sensitive to the number of crates pulled in,
both for audit-related reasons and for build-time reasons.

Our manifest-based ISLE-up-to-date approach in #3534 was meant to help
with this while still avoiding the footgun of a completely opt-in source
rebuild: by including generated source in the checked-in tree, and just
checking that source is up to date, we allow for fast builds without the
whole ISLE compiler meta-step, but still catch attempted use of stale
generated source (and allow the developer to opt-in to regenerating the
in-tree source).

Unfortunately this still requires the hash algorithm itself (sha2)
which turns out to pull in a number of other small crates. In cases
where we know the source won't be stale -- for example, depending on the
main branch in git, or a published crate version of
cranelift-codegen -- the checks are actually not needed at all.

This PR thus introduces a feature check-isle that controls whether
build.rs does the checks. It is on by default, so developer safety
remains: if someone checks out the source, modifies some ISLE, and then
does a cargo build, they will get an error that helps them with the
proper steps to regenerate the source. But, dependencies that know what
they are doing can turn it off with default-features = false.

I've verified that we have the expected small dependency tree now:

$ cargo tree --depth 1 -p cranelift-codegen --no-default-features --features "std unwind all-arch"
cranelift-codegen v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen)
├── cranelift-bforest v0.79.0 (/home/cfallin/work/wasmtime/cranelift/bforest)
├── cranelift-codegen-shared v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen/shared)
├── cranelift-entity v0.79.0 (/home/cfallin/work/wasmtime/cranelift/entity)
├── gimli v0.26.1
├── log v0.4.14
├── regalloc v0.0.33
├── smallvec v1.6.1
└── target-lexicon v0.12.0
[build-dependencies]
└── cranelift-codegen-meta v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen/meta)
[dev-dependencies]
└── criterion v0.3.5

Fixes #3609.

The diff looks wonky here but all I did was put the ISLE-related
logic inside a mod isle { ... } with a conditional-on-feature directive
and then alter the toplevel a bit.

@cfallin
Copy link
Member Author

cfallin commented Dec 17, 2021

Urgh, I forgot the one change in Cargo.toml that actually makes sha2 optional; fix coming momentarily!

…ependencies where needed.

Some use-cases are very sensitive to the number of crates pulled in,
both for audit-related reasons and for build-time reasons.

Our manifest-based ISLE-up-to-date approach in bytecodealliance#3534 was meant to help
with this while still avoiding the footgun of a completely opt-in source
rebuild: by including generated source in the checked-in tree, and just
checking that source is up to date, we allow for fast builds without the
whole ISLE compiler meta-step, but still catch attempted use of stale
generated source (and allow the developer to opt-in to regenerating the
in-tree source).

Unfortunately this still requires the hash algorithm itself (`sha2`)
which turns out to pull in a number of other small crates. In cases
where we know the source won't be stale -- for example, depending on the
`main` branch in git, or a published crate version of
`cranelift-codegen` -- the checks are actually not needed at all.

This PR thus introduces a feature `check-isle` that controls whether
`build.rs` does the checks. It is on by default, so developer safety
remains: if someone checks out the source, modifies some ISLE, and then
does a `cargo build`, they will get an error that helps them with the
proper steps to regenerate the source. But, dependencies that know what
they are doing can turn it off with `default-features = false`.

I've verified that we have the expected small dependency tree now:

```
$ cargo tree --depth 1 -p cranelift-codegen --no-default-features --features "std unwind all-arch"
cranelift-codegen v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen)
├── cranelift-bforest v0.79.0 (/home/cfallin/work/wasmtime/cranelift/bforest)
├── cranelift-codegen-shared v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen/shared)
├── cranelift-entity v0.79.0 (/home/cfallin/work/wasmtime/cranelift/entity)
├── gimli v0.26.1
├── log v0.4.14
├── regalloc v0.0.33
├── smallvec v1.6.1
└── target-lexicon v0.12.0
[build-dependencies]
└── cranelift-codegen-meta v0.79.0 (/home/cfallin/work/wasmtime/cranelift/codegen/meta)
[dev-dependencies]
└── criterion v0.3.5
```

Fixes bytecodealliance#3609.
@alexcrichton
Copy link
Member

If sha2 is uncomfortable enough to have a PR to implement this then it seems like we would want to disable this feature in the Wasmtime embedding, but that would then basically subvert the purpose of adding this check in the first place. If we take this even further than the feature here should be on-by-default for development but off-by-default for crates.io publications in the limit.

Overall I feel like we either need to find a slimmer crate so that it doesn't matter or we need to consider these concerns for Wasmtime as well. Personally I'd be fine just using std::hash::SipHasher since it's at least built-in to std and despite being deprecated it won't ever be removed.

@github-actions github-actions bot added the cranelift Issues related to the Cranelift code generator label Dec 17, 2021
cfallin added a commit to cfallin/wasmtime that referenced this pull request Dec 17, 2021
Fixes bytecodealliance#3609. It turns out that `sha2` is a nontrivial dependency for
Cranelift in many contexts, partly because it pulls in a number of other
crates as well.

One option is to remove the hash check under certain circumstances, as
implemented in bytecodealliance#3616. However, this is undesirable for other reasons:
having different dependency options in Wasmtime in particular for
crates.io vs. local builds is not really possible, and so either we
still have the higher build cost in Wasmtime, or we turn off the checks
by default, which goes against the original intent of ensuring developer
safety (no mysterious stale-source bugs).

This PR uses `SipHash` instead, which is built into the standard
library. `SipHash` is deprecated, but it's fixed and deterministic
(across runs and across Rust versions), which is what we need, unlike
the suggested replacement `std::collections::hash_map::DefaultHasher`.
The result is only 64 bits, and is not cryptographically secure, but we
never needed that; we just need a simple check to indicate when we
forget a `rebuild-isle`.
@cfallin
Copy link
Member Author

cfallin commented Dec 17, 2021

That's a good point! I took the SipHash approach instead in #3619; closing this PR.

@cfallin cfallin closed this Dec 17, 2021
cfallin added a commit to cfallin/wasmtime that referenced this pull request Dec 17, 2021
Fixes bytecodealliance#3609. It turns out that `sha2` is a nontrivial dependency for
Cranelift in many contexts, partly because it pulls in a number of other
crates as well.

One option is to remove the hash check under certain circumstances, as
implemented in bytecodealliance#3616. However, this is undesirable for other reasons:
having different dependency options in Wasmtime in particular for
crates.io vs. local builds is not really possible, and so either we
still have the higher build cost in Wasmtime, or we turn off the checks
by default, which goes against the original intent of ensuring developer
safety (no mysterious stale-source bugs).

This PR uses `SipHash` instead, which is built into the standard
library. `SipHash` is deprecated, but it's fixed and deterministic
(across runs and across Rust versions), which is what we need, unlike
the suggested replacement `std::collections::hash_map::DefaultHasher`.
The result is only 64 bits, and is not cryptographically secure, but we
never needed that; we just need a simple check to indicate when we
forget a `rebuild-isle`.
cfallin added a commit to cfallin/wasmtime that referenced this pull request Dec 17, 2021
Fixes bytecodealliance#3609. It turns out that `sha2` is a nontrivial dependency for
Cranelift in many contexts, partly because it pulls in a number of other
crates as well.

One option is to remove the hash check under certain circumstances, as
implemented in bytecodealliance#3616. However, this is undesirable for other reasons:
having different dependency options in Wasmtime in particular for
crates.io vs. local builds is not really possible, and so either we
still have the higher build cost in Wasmtime, or we turn off the checks
by default, which goes against the original intent of ensuring developer
safety (no mysterious stale-source bugs).

This PR uses `SipHash` instead, which is built into the standard
library. `SipHash` is deprecated, but it's fixed and deterministic
(across runs and across Rust versions), which is what we need, unlike
the suggested replacement `std::collections::hash_map::DefaultHasher`.
The result is only 64 bits, and is not cryptographically secure, but we
never needed that; we just need a simple check to indicate when we
forget a `rebuild-isle`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cranelift Issues related to the Cranelift code generator
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cranelift: sha2 dependency introduces many new indirect dependencies
2 participants