Merged

21 commits
20906ec
Introduce `cargo kani assess scan` for analyzing multiple projects at…
tedinski Dec 18, 2022
eb5fe2a
Introduce --package-filter-file
tedinski Dec 22, 2022
aebc602
Merge remote-tracking branch 'origin/main' into assess-scan
tedinski Dec 22, 2022
e7e9b4c
Skip dependencies in the cargo metadata we examine
tedinski Dec 22, 2022
555c0c8
Apparently no_deps changes something rather significant. Do this less…
tedinski Dec 22, 2022
cc7dab2
pr comments, collapse all metadata into one, add serialize to tablebu…
tedinski Dec 22, 2022
36e9f53
Merge remote-tracking branch 'origin/main' into assess-scan
tedinski Dec 22, 2022
5005871
add doc, add top-100 dataset, more
tedinski Dec 27, 2022
3ae75db
Merge remote-tracking branch 'origin/main' into assess-scan
tedinski Dec 27, 2022
2289358
fmt
tedinski Dec 28, 2022
ddf74b4
Merge remote-tracking branch 'origin/main' into assess-scan
tedinski Dec 28, 2022
a89b842
Fix metadata reconstruction
tedinski Dec 28, 2022
208ec52
Remove all-features, it causes too many build failures
tedinski Dec 28, 2022
c9b653b
Add rudimentary test for assess scan
tedinski Dec 28, 2022
5226633
add some doc comments
tedinski Dec 28, 2022
4078382
Answers some more questions in the doc
tedinski Dec 29, 2022
399be4c
Apply suggestions from code review
tedinski Dec 30, 2022
3a8bceb
factor the scan regression into a separate script, add more docs
tedinski Dec 30, 2022
cd213db
Add a script demonstrating how to use scan on the top-100 repos
tedinski Dec 30, 2022
4e28027
Add tip about memory usage, clarify unsupported feature priorities
tedinski Jan 4, 2023
913b6c4
Merge remote-tracking branch 'origin/main' into assess-scan
tedinski Jan 4, 2023
1 change: 1 addition & 0 deletions Cargo.toml
@@ -65,4 +65,5 @@ exclude = [
"tests/perf",
"tests/cargo-ui",
"tests/slow",
"tests/assess-scan-test-scaffold",
]
1 change: 1 addition & 0 deletions docs/src/SUMMARY.md
@@ -23,6 +23,7 @@
- [Working with CBMC](./cbmc-hacks.md)
- [Working with `rustc`](./rustc-hacks.md)
- [Command cheat sheets](./cheat-sheets.md)
- [cargo kani assess](./dev-assess.md)
- [Testing](./testing.md)
- [Regression testing](./regression-testing.md)
- [Book runner](./bookrunner.md)
185 changes: 185 additions & 0 deletions docs/src/dev-assess.md
@@ -0,0 +1,185 @@
# `cargo kani assess`

Assess is an experimental new feature to gather data about Rust crates, to aid the start of proof writing.

In the short-term, assess collects and dumps tables of data that may help _Kani developers_ understand what's needed to begin writing proofs for another project.
For instance, assess may help answer questions like:

1. Does Kani successfully build all of the crates involved in this project? If not, why not?
2. Does Kani support all the Rust language features necessary to do verification with this project? If not, which are most important?

In the long-term, assess will become a user-facing feature, and help _Kani users_ get started writing proofs.
We expect that users will have the same questions as above, but in the long term, hopefully the answers to those trend towards an uninteresting "yes."
So the new questions might be:

3. Is this project ready for verification? Projects need to be reasonably well-tested first.
Our operating hypothesis is that code currently covered by unit tests is the code that could become covered by proofs.
4. How much of a given project (consisting of multiple packages or workspaces), or which of the user's projects, might be verifiable?
If a user wants to start trying Kani, but they have the choice of several different packages where they might try, we can help find the package with the lowest hanging fruit.
5. Given a package, where in that package's code should the user look, in order to write the first (or next) proof?

These long-term goals are only "hinted at" with the present experimental version of assess.
Currently, we only get as far as finding out which tests successfully verify (concretely) with Kani.
This might indicate tests that could be generalized and converted into proofs, but we currently don't do anything to group, rank, or otherwise heuristically prioritize what might be most "interesting."
(For instance, we'd like to eventually compute coverage information and use that to help rank the results.)
As a consequence, the output of the tool is very hard to interpret, and likely not (yet!) helpful to new or potential Kani users.

## Using Assess

To assess a package, run:

```text
cargo kani --enable-unstable assess
```

As a temporary hack (arguments shouldn't work like this), to assess a single cargo workspace, run:

```text
cargo kani --enable-unstable --workspace assess
```

To scan a collection of workspaces or packages that are not part of a shared workspace, run:

```text
cargo kani --enable-unstable assess scan
```

The only difference between 'scan' and 'regular' assess is how the packages to be built are located.
All versions of assess produce the same output and metrics.
Assess will normally build just like `cargo kani` or `cargo build`, whereas `scan` will find all cargo packages beneath the current directory, even in unrelated workspaces.
Thus, 'scan' may be helpful in the case where the user has a choice of packages and is looking for the easiest to get started with (in addition to the Kani developer use-case, of aggregating statistics across many packages).

(Tip: Assess may need to run for a while, so try using `screen`, `tmux` or `nohup` to avoid terminating the process if, for example, an SSH connection breaks.
Some tests can also consume huge amounts of RAM when run through Kani, so you may wish to use `ulimit -v 6000000` to prevent any process from using more than 6GB.
You can also limit the number of concurrent tests that will be run by providing e.g. `-j 4`, currently as a prepended argument, like `--enable-unstable` or `--workspace` in the examples above.)

## What assess does

Assess builds all the packages requested in "test mode" (i.e. `--tests`), and runs all the same tests that `cargo test` would, except through Kani.
This gives end-to-end assurance that we're able to actually build and run code from these packages, skipping nothing of what the verification process would need, except that the harnesses don't have any nondeterminism (`kani::any()`) and consequently don't "prove" much.
The interesting signal comes from what tests cannot be analyzed by Kani due to unsupported features, performance problems, crash bugs, or other issues that get in the way.

Currently, assess forces termination by using `unwind(1)` on all tests, so many tests will fail with unwinding assertions.
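
For illustration, here is the kind of hypothetical unit test that assess runs through Kani (not taken from any real crate). The first test has only concrete inputs, so verifying it proves little beyond what `cargo test` already shows; the second iterates more than once, so the forced `unwind(1)` bound makes it fail with an unwinding assertion:

```rust
#[cfg(test)]
mod tests {
    #[test]
    fn checked_add_small_values() {
        // Concrete inputs: Kani checks exactly this one execution.
        assert_eq!(2u8.checked_add(3), Some(5));
    }

    #[test]
    fn sum_of_first_ten() {
        // Iterates 10 times, so under the forced `unwind(1)` bound this
        // fails with an unwinding assertion ("unwind" in the failure-reasons
        // table below).
        let mut total = 0u32;
        for i in 1..=10u32 {
            total += i;
        }
        assert_eq!(total, 55);
    }
}
```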

## Current Assess Results

Assess produces a few tables of output (both visually in the terminal, and in a more detailed json format) so far:

### Unsupported features

```text
======================================================
Unsupported feature | Crates | Instances
| impacted | of use
-------------------------------+----------+-----------
caller_location | 71 | 239
simd_bitmask | 39 | 160
...
```

The unsupported features table aggregates information about features that Kani does not yet support.
These correspond to uses of `codegen_unimplemented` in the `kani-compiler`, and appear as warnings during compilation.

Unimplemented features are not necessarily actually hit by (dynamically) reachable code, so an immediate future improvement on this table would be to count the features *actually hit* by failing test cases, instead of just those features reported as existing in code by the compiler.
In other words, the current unsupported features table is **not** what we want to see, in order to _perfectly_ prioritize implementing these features, because we may be counting features that no proof would ever hit.
A perfect signal here isn't possible: there may be code that looks statically reachable, but is never dynamically reachable, and we can't tell.
But we can use test coverage as an approximation: well-tested code will hopefully cover most of the dynamically reachable code.
The operating hypothesis of assess is that code covered by tests is code that could be covered by proof, and so measuring unsupported features by those actually hit by a test should provide a better "signal" about priorities.
Implicitly deprioritizing unsupported features because they aren't covered by tests may not be a bug, but a feature: we may simply not want to prove anything about that code, if it hasn't been tested first, and so adding support for that feature may not be important.

Contributor:

> Yeah, I'm not convinced that's the case. I think test is a great starting point, but I don't think we should deprioritize things that aren't covered by tests.
>
> We've talked about automatic implementation of harnesses before that wouldn't rely on tests, e.g. a function where all of its inputs implement the `Arbitrary` trait.

Contributor Author:

> It is how assess works, and I thought I wasn't saying "this is how things ought to be" but "this is the operating hypothesis for assess" (see, for instance, that I explicitly wrote "may not be a bug, but a feature").
>
> I guess I can clarify that this is the operating hypothesis for assess, not a team consensus. We want to look and see.

A few notes on terminology:

1. "Crates impacted" here means "packages in the current workspace (or scan) where the building of that package (and all of its dependencies) ultimately resulted in this warning."
For example, when assessing only a single package (not a workspace), this column can only be `1`, regardless of the number of dependencies.
2. "Instances of use" likewise means "total instances found while compiling this package's tests and all the (reachable) code in its dependencies."
3. These counts are influenced by (static) reachability: if code is not potentially reachable from a test somehow, it will not be built and will not be counted.
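
As a hypothetical illustration of how these counts compose: if a workspace contains packages `foo` and `bar`, and both depend on a crate whose code triggers the `caller_location` warning while each package's tests are built, then "Crates impacted" for `caller_location` would be 2 (both workspace packages are charged for it, even though the use lives in a shared dependency), and "Instances of use" would report the total occurrences found during those builds (this reading is an assumption; the exact aggregation across shared dependencies may differ).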

### Test failure reasons

```text
================================================
Reason for failure | Number of tests
------------------------------+-----------------
unwind | 61
none (success) | 6
assertion + overflow | 2
...
```

The test failure reasons table indicates why, when assess ran a test through Kani, it failed to verify.
Notably:

1. Because we force termination with `unwind(1)`, we expect `unwind` to rank highly.
2. We do report the number of tests that succeed in this table, to aid in understanding how well things went overall.
3. The reported reason is the "property class" of the CBMC property that failed. So `assertion` means an ordinary `assert!()` was hit (or something else with this property class).
4. When multiple properties fail, they are aggregated with `+`, such as `assertion + overflow`.
5. Currently this table does not properly account for `should_fail` tests, so `assertion` may actually be "success": the test should hit an assertion and did (see the example below).
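
To illustrate point 5 (hypothetical code, assuming "should_fail" refers to tests marked `#[should_panic]`): the test below passes under `cargo test` because the panic is expected, but assess currently reports the hit assertion as a plain `assertion` failure rather than a success.

```rust
/// Hypothetical function under test.
fn buffer_with_capacity(cap: usize) -> Vec<u8> {
    assert!(cap > 0, "capacity must be non-zero");
    Vec::with_capacity(cap)
}

#[test]
#[should_panic(expected = "capacity must be non-zero")]
fn rejects_zero_capacity() {
    // `cargo test` counts this as a pass; assess counts the failed
    // assertion property, so it lands in the "assertion" row.
    buffer_with_capacity(0);
}
```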

### Promising test cases

```text
=============================================================================
Candidate for proof harness | Location
-------------------------------------------------------+---------------------
float::tests::f64_edge_cases | src/float.rs:226
float::tests::f32_edge_cases | src/float.rs:184
integer::tests::test_integers | src/integer.rs:171
```

This table is the most rudimentary so far, but is the core of what long-term assess will help accomplish.
Currently, this table just presents (with paths displayed in a clickable manner) the tests that successfully "verify" with Kani.
These might be good candidates for turning into proof harnesses.
This list is presently unordered; the next step for improving it would be to find even a rudimentary way of ranking these test cases (e.g. perhaps by code coverage).
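
As a sketch of the long-term goal (entirely hypothetical code, not taken from the crates above): a promising concrete test might be generalized into a proof harness by swapping its fixed inputs for `kani::any()` and constraining them with `kani::assume`.

```rust
/// Hypothetical function under test: convert to bits and back.
fn round_trip(x: f32) -> bool {
    f32::from_bits(x.to_bits()) == x
}

// The kind of concrete test assess might flag as a candidate.
#[test]
fn f32_edge_cases() {
    assert!(round_trip(1.5_f32));
}

// A generalized proof harness covering every non-NaN f32.
#[cfg(kani)]
#[kani::proof]
fn round_trip_any_f32() {
    let x: f32 = kani::any();
    kani::assume(!x.is_nan()); // NaN is never equal to itself
    assert!(round_trip(x));
}
```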

## How Assess Works

`kani-compiler` emits `*.kani-metadata.json` for each target it builds.
This format can be found in the `kani_metadata` crate, shared by `kani-compiler` and `kani-driver`.
This is the starting point for assess.

Assess obtains this metadata by essentially running a `cargo kani`:

1. With `--all-features` turned on
2. With `unwind` always set to `1`
3. In test mode, i.e. `--tests`
4. With test-case reachability mode. Normally Kani looks for proof harnesses and builds only those. Here we switch to building only the test harnesses instead.

Assess starts by getting all the information from these metadata files.
This is enough by itself to construct a rudimentary "unsupported features" table.
But assess also uses it to discover all the test cases, and (instead of running proof harnesses) it then runs all these test harnesses under Kani.

Assess produces a second metadata format, called (unsurprisingly) "assess metadata".
(Found in `kani-driver` under [`src/assess/metadata.rs`](https://github.com/model-checking/kani/blob/main/kani-driver/src/assess/metadata.rs).)
This format records the results of what assess does.

This metadata can be written to a json file by providing `--emit-metadata <file>` to `assess`.
Likewise, `scan` can be told to write out this data with the same option.
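
For example (a hypothetical invocation; these options are unstable and hidden, so the exact syntax may change), combining this with the commands from "Using Assess" above:

```text
cargo kani --enable-unstable assess --emit-metadata assess-results.json
cargo kani --enable-unstable assess scan --emit-metadata scan-results.json
```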

Assess metadata is an aggregatable format.
It does not apply to just one package, as assess can work on a workspace of packages.
Likewise, `scan` uses and produces the exact same format, across multiple workspaces.

So far all assess metadata comes in the form of "tables" which are built with `TableBuilder<T: TableRow>`.
This is documented further in [`src/assess/table_builder.rs`](https://github.com/model-checking/kani/blob/main/kani-driver/src/assess/table_builder.rs).

## Using Assess on the top-100 crates

There is a script in the Kani repo for this purpose.

This will clone the top-100 crates to `/tmp/top-100-experiment` and run assess scan on them:

```text
./scripts/exps/assess-scan-on-repos.sh
```

If you'd like to preserve the results, you can direct scan to use a different directory with an environment variable:

```text
ASSESS_SCAN="~/top-100-experiment" ./scripts/exps/assess-scan-on-repos.sh
```

To re-run the experiment, it suffices to be in the experiment directory:

```text
cd ~/top-100-experiment && ~/kani/scripts/exps/assess-scan-on-repos.sh
```
20 changes: 18 additions & 2 deletions kani-driver/src/args.rs
@@ -45,7 +45,7 @@ pub struct CargoKaniArgs {
#[derive(Debug, Parser)]
pub enum CargoKaniSubcommand {
#[command(hide = true)]
Assess,
Assess(crate::assess::AssessArgs),
}

// Common arguments for invoking Kani. This gets put into KaniContext, whereas
@@ -396,7 +396,7 @@ impl CargoKaniArgs {
pub fn validate(&self) {
self.common_opts.validate::<Self>();
// --assess requires --enable-unstable, but the subcommand needs manual checking
if (matches!(self.command, Some(CargoKaniSubcommand::Assess)) || self.common_opts.assess)
if (matches!(self.command, Some(CargoKaniSubcommand::Assess(_))) || self.common_opts.assess)
&& !self.common_opts.enable_unstable
{
Self::command()
@@ -519,6 +519,22 @@ mod tests {
StandaloneArgs::parse_from(vec!["kani", "file.rs", "--enable-unstable", "--cbmc-args"]);
// no assertion: the above might fail if it fails to allow 0 args to cbmc-args
}
#[test]
fn check_multiple_packages() {
// accepts repeated:
let a = CargoKaniArgs::parse_from(vec!["cargo-kani", "-p", "a", "-p", "b"]);
assert_eq!(a.common_opts.package, vec!["a".to_owned(), "b".to_owned()]);
let b = CargoKaniArgs::try_parse_from(vec![
"cargo-kani",
"-p",
"a", // no -p
"b",
]);
// BUG: should not accept sequential:
// Related: https://github.com/model-checking/kani/issues/2025
// Currently asserting this backwards from how it should be!
assert!(!b.is_err());
}

fn check(args: &str, require_unstable: bool, pred: fn(StandaloneArgs) -> bool) {
let mut res = parse_unstable_disabled(&args);
42 changes: 42 additions & 0 deletions kani-driver/src/assess/args.rs
@@ -0,0 +1,42 @@
// Copyright Kani Contributors
// SPDX-License-Identifier: Apache-2.0 OR MIT

//! This module contains arguments specific to the `cargo kani assess` subcommand.

use std::path::PathBuf;

use clap::Parser;

/// `cargo kani assess` subcommand arguments
#[derive(Default, Debug, Parser)]
pub struct AssessArgs {
#[command(subcommand)]
pub command: Option<AssessSubcommand>,

/// Write Assess metadata (unstable file format) to the given file
#[arg(long, hide = true)]
pub emit_metadata: Option<PathBuf>,
}

/// `cargo kani assess` takes optional subcommands to request specialized behavior
#[derive(Debug, Parser)]
pub enum AssessSubcommand {
/// Run assess on a directory containing multiple cargo projects, and aggregate the results
Scan(ScanArgs),
}

/// `cargo kani assess scan` subcommand arguments
#[derive(Debug, Parser)]
pub struct ScanArgs {
/// Don't run assess on found packages, just re-analyze the results from a previous run
#[arg(long, hide = true)]
pub existing_only: bool,

Contributor:

> Not clear on this option. Is it just re-computing the tables from the metadata without actually running assess? If so, perhaps consider renaming to "analyze_results_only" or something along those lines.

Contributor Author:

> I bikeshedded on this name a bit already; I don't expect this is one a customer will want to use (hence `hide = true`). It's mostly useful while debugging the package-finding code or the table-rendering code.
>
> I settled on 'existing' because I wanted to emphasize that something wouldn't be re-run here, while 'analyze' is... what assess does already, really. A good short name is hard to find.

/// Only consider the packages named in the given file
#[arg(long, hide = true)]
pub filter_packages_file: Option<PathBuf>,

/// Write Assess-Scan metadata (unstable file format) to the given file
#[arg(long, hide = true)]
pub emit_metadata: Option<PathBuf>,
}
92 changes: 92 additions & 0 deletions kani-driver/src/assess/metadata.rs
@@ -0,0 +1,92 @@
// Copyright Kani Contributors
// SPDX-License-Identifier: Apache-2.0 OR MIT

//! Assess metadata. This format is shared between 'assess' and 'assess scan'.
//! Assess produces this for one workspace, scan for several.
//! It is not a stable file format: it is meant for assess to directly communicate
//! from assess subprocesses to a parent scan process.
//! We can build other tools that make use of it, but they need to be built for or distributed
//! with the specific version of Kani.

use std::fs::File;
use std::io::BufWriter;
use std::path::Path;

use anyhow::Result;
use serde::{Deserialize, Serialize};

use super::table_builder::TableBuilder;
use super::table_failure_reasons::FailureReasonsTableRow;
use super::table_promising_tests::PromisingTestsTableRow;
use super::table_unsupported_features::UnsupportedFeaturesTableRow;
use super::AssessArgs;

/// The structure of `.kani-assess-metadata.json` files. This is the structure for both
/// assess (standard) and scan. It is meant to hold results for one or more packages.
///
/// This is not a stable interface.
#[derive(Serialize, Deserialize)]
pub struct AssessMetadata {

Contributor:

> maybe add the name of the crate and a list of dependencies

Contributor Author:

> Assess metadata is across multiple crates, so not really applicable here. Something like that might be nice for kani-metadata, or we could think about what we actually want to aggregate and present here somehow...

Contributor:

> Should we at least add the package name? This could be useful when reporting the successful/failed packages.

Contributor Author:

> I intend, in a follow-up PR, to add another table here that might help with that.
>
> But again, metadata is across multiple packages, so there isn't one name to put here. I'll add a better doc comment on this type.

/// Report on the presence of `codegen_unimplemented` in the analyzed packages
pub unsupported_features: TableBuilder<UnsupportedFeaturesTableRow>,
/// Report of the reasons why tests could not be analyzed by Kani
pub failure_reasons: TableBuilder<FailureReasonsTableRow>,
/// Report on the tests that Kani can successfully analyze
pub promising_tests: TableBuilder<PromisingTestsTableRow>,
}

/// If given the argument to do so, write the assess metadata to the target file.
pub(crate) fn write_metadata(args: &AssessArgs, build: AssessMetadata) -> Result<()> {
if let Some(path) = &args.emit_metadata {
let out_file = File::create(&path)?;
let writer = BufWriter::new(out_file);
// use pretty for now to keep things readable and debuggable, but this should change eventually
serde_json::to_writer_pretty(writer, &build)?;
}
Ok(())
}

/// Write metadata with unsupported features only, supporting the `--only-codegen` option.
pub(crate) fn write_partial_metadata(
args: &AssessArgs,
unsupported_features: TableBuilder<UnsupportedFeaturesTableRow>,
) -> Result<()> {
write_metadata(
args,
AssessMetadata {
unsupported_features,
failure_reasons: TableBuilder::new(),
promising_tests: TableBuilder::new(),
},
)
}

/// Read assess metadata from a file.
pub(crate) fn read_metadata(path: &Path) -> Result<AssessMetadata> {
// this function already exists, but a proxy here helps find it :)
crate::metadata::from_json(path)
}

/// Given assess metadata from several sources, aggregate them into a single structure.
///
/// This is not a complicated operation, because the assess metadata structure is meant
/// to accommodate multiple packages already, so we're just "putting it together".
pub(crate) fn aggregate_metadata(metas: Vec<AssessMetadata>) -> AssessMetadata {
let mut result = AssessMetadata {
unsupported_features: TableBuilder::new(),
failure_reasons: TableBuilder::new(),
promising_tests: TableBuilder::new(),
};
for meta in metas {
for item in meta.unsupported_features.build() {
result.unsupported_features.add(item.clone());
}
for item in meta.failure_reasons.build() {
result.failure_reasons.add(item.clone());
}
for item in meta.promising_tests.build() {
result.promising_tests.add(item.clone());
}
}
result
}