Merged

21 commits
20906ec
Introduce `cargo kani assess scan` for analyzing multiple projects at…
tedinski Dec 18, 2022
eb5fe2a
Introduce --package-filter-file
tedinski Dec 22, 2022
aebc602
Merge remote-tracking branch 'origin/main' into assess-scan
tedinski Dec 22, 2022
e7e9b4c
Skip dependencies in the cargo metadata we examine
tedinski Dec 22, 2022
555c0c8
Apparently no_deps changes something rather significant. Do this less…
tedinski Dec 22, 2022
cc7dab2
pr comments, collapse all metadata into one, add serialize to tablebu…
tedinski Dec 22, 2022
36e9f53
Merge remote-tracking branch 'origin/main' into assess-scan
tedinski Dec 22, 2022
5005871
add doc, add top-100 dataset, more
tedinski Dec 27, 2022
3ae75db
Merge remote-tracking branch 'origin/main' into assess-scan
tedinski Dec 27, 2022
2289358
fmt
tedinski Dec 28, 2022
ddf74b4
Merge remote-tracking branch 'origin/main' into assess-scan
tedinski Dec 28, 2022
a89b842
Fix metadata reconstruction
tedinski Dec 28, 2022
208ec52
Remove all-features, it causes too many build failures
tedinski Dec 28, 2022
c9b653b
Add rudimentary test for assess scan
tedinski Dec 28, 2022
5226633
add some doc comments
tedinski Dec 28, 2022
4078382
Answers some more questions in the doc
tedinski Dec 29, 2022
399be4c
Apply suggestions from code review
tedinski Dec 30, 2022
3a8bceb
factor the scan regression into a separate script, add more docs
tedinski Dec 30, 2022
cd213db
Add a script demonstrating how to use scan on the top-100 repos
tedinski Dec 30, 2022
4e28027
Add tip about memory usage, clarify unsupported feature priorities
tedinski Jan 4, 2023
913b6c4
Merge remote-tracking branch 'origin/main' into assess-scan
tedinski Jan 4, 2023
1 change: 1 addition & 0 deletions Cargo.toml
@@ -65,4 +65,5 @@ exclude = [
"tests/perf",
"tests/cargo-ui",
"tests/slow",
"tests/assess-scan-test-scaffold",
]
1 change: 1 addition & 0 deletions docs/src/SUMMARY.md
@@ -23,6 +23,7 @@
- [Working with CBMC](./cbmc-hacks.md)
- [Working with `rustc`](./rustc-hacks.md)
- [Command cheat sheets](./cheat-sheets.md)
- [cargo kani assess](./dev-assess.md)
- [Testing](./testing.md)
- [Regression testing](./regression-testing.md)
- [Book runner](./bookrunner.md)
185 changes: 185 additions & 0 deletions docs/src/dev-assess.md
@@ -0,0 +1,185 @@
# `cargo kani assess`

Assess is an experimental new feature to gather data about Rust crates, to aid the start of proof writing.

In the short-term, assess collects and dumps tables of data that may help _Kani developers_ understand what's needed to begin writing proofs for another project.
For instance, assess may help answer questions like:

1. Does Kani successfully build all of the crates involved in this project? If not, why not?
2. Does Kani support all the Rust language features necessary to do verification with this project? If not, which are most important?

In the long-term, assess will become a user-facing feature, and help _Kani users_ get started writing proofs.
We expect that users will have the same questions as above, but in the long term, hopefully the answers to those trend towards an uninteresting "yes."
So the new questions might be:

3. Is this project ready for verification? Projects need to be reasonably well-tested first.
Our operating hypothesis is that code currently covered by unit tests is the code that could become covered by proofs.
4. How much of a given project (consisting of multiple packages or workspaces), or which of the user's projects, might be verifiable?
If a user wants to start trying Kani, but they have the choice of several different packages where they might try, we can help find the package with the lowest hanging fruit.
5. Given a package, where in that package's code should the user look, in order to write the first (or next) proof?

These long-term goals are only "hinted at" with the present experimental version of assess.
Currently, we only get as far as finding out which tests successfully verify (concretely) with Kani.
This might indicate tests that could be generalized and converted into proofs, but we currently don't do anything to group, rank, or otherwise heuristically prioritize what might be most "interesting."
(For instance, we'd like to eventually compute coverage information and use that to help rank the results.)
As a consequence, the output of the tool is very hard to interpret, and likely not (yet!) helpful to new or potential Kani users.

## Using Assess

To assess a package, run:

```text
cargo kani --enable-unstable assess
```

As a temporary hack (arguments shouldn't work like this), to assess a single cargo workspace, run:

```text
cargo kani --enable-unstable --workspace assess
```

To scan a collection of workspaces or packages that are not part of a shared workspace, run:

```text
cargo kani --enable-unstable assess scan
```

The only difference between 'scan' and 'regular' assess is how the packages to be built are located.
All versions of assess produce the same output and metrics.
Assess will normally build just like `cargo kani` or `cargo build`, whereas `scan` will find all cargo packages beneath the current directory, even in unrelated workspaces.
Thus, 'scan' may be helpful in the case where the user has a choice of packages and is looking for the easiest to get started with (in addition to the Kani developer use-case, of aggregating statistics across many packages).

(Tip: Assess may need to run for a while, so try using `screen`, `tmux` or `nohup` to avoid terminating the process if, for example, an SSH connection breaks.
Some tests can also consume huge amounts of RAM when run through Kani, so you may wish to use `ulimit -v 6000000` to prevent any process from using more than 6GB.
You can also limit the number of concurrent tests that will be run by providing e.g. `-j 4`, currently as a prepended argument, like `--enable-unstable` or `--workspace` in the examples above.)

## What assess does

Assess builds all the packages requested in "test mode" (i.e. `--tests`), and runs all the same tests that `cargo test` would, except through Kani.
This gives end-to-end assurance that we're able to actually build and run code from these packages, skipping nothing of what the verification process would need, except that the harnesses don't have any nondeterminism (`kani::any()`) and consequently don't "prove" much.
The interesting signal comes from what tests cannot be analyzed by Kani due to unsupported features, performance problems, crash bugs, or other issues that get in the way.

Currently, assess forces termination by using `unwind(1)` on all tests, so many tests will fail with unwinding assertions.
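
For illustration, here is the kind of hypothetical unit test that assess runs through Kani (not taken from any real crate). The first test has only concrete inputs, so verifying it proves little beyond what `cargo test` already shows; the second iterates more than once, so the forced `unwind(1)` bound makes it fail with an unwinding assertion:

```rust
#[cfg(test)]
mod tests {
    #[test]
    fn checked_add_small_values() {
        // Concrete inputs: Kani checks exactly this one execution.
        assert_eq!(2u8.checked_add(3), Some(5));
    }

    #[test]
    fn sum_of_first_ten() {
        // Iterates 10 times, so under the forced `unwind(1)` bound this
        // fails with an unwinding assertion ("unwind" in the failure-reasons
        // table below).
        let mut total = 0u32;
        for i in 1..=10u32 {
            total += i;
        }
        assert_eq!(total, 55);
    }
}
```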

## Current Assess Results

Assess produces a few tables of output (both visually in the terminal, and in a more detailed json format) so far:

### Unsupported features

```text
======================================================
Unsupported feature | Crates | Instances
| impacted | of use
-------------------------------+----------+-----------
caller_location | 71 | 239
simd_bitmask | 39 | 160
...
```

The unsupported features table aggregates information about features that Kani does not yet support.
These correspond to uses of `codegen_unimplemented` in the `kani-compiler`, and appear as warnings during compilation.

Unimplemented features are not necessarily actually hit by (dynamically) reachable code, so an immediate future improvement on this table would be to count the features *actually hit* by failing test cases, instead of just those features reported as existing in code by the compiler.
In other words, the current unsupported features table is **not** what we want to see, in order to _perfectly_ prioritize implementing these features, because we may be counting features that no proof would ever hit.
A perfect signal here isn't possible: there may be code that looks statically reachable, but is never dynamically reachable, and we can't tell.
But we can use test coverage as an approximation: well-tested code will hopefully cover most of the dynamically reachable code.
The operating hypothesis of assess is that code covered by tests is code that could be covered by proof, and so measuring unsupported features by those actually hit by a test should provide a better "signal" about priorities.
Implicitly deprioritizing unsupported features because they aren't covered by tests may not be a bug, but a feature: we may simply not want to prove anything about that code, if it hasn't been tested first, and so adding support for that feature may not be important.

Contributor:

> Yeah, I'm not convinced that's the case. I think test is a great starting point, but I don't think we should deprioritize things that aren't covered by tests.
>
> We've talked about automatic implementation of harnesses before that wouldn't rely on tests, e.g. a function where all of its inputs implement the `Arbitrary` trait.

Contributor Author:

> It is how assess works, and I thought I wasn't saying "this is how things ought to be" but "this is the operating hypothesis for assess" (see, for instance, that I explicitly wrote "may not be a bug, but a feature").
>
> I guess I can clarify that this is the operating hypothesis for assess, not a team consensus. We want to look and see.

A few notes on terminology:

1. "Crates impacted" here means "packages in the current workspace (or scan) where the building of that package (and all of its dependencies) ultimately resulted in this warning."
For example, when assessing only a single package (not a workspace), this column can only be `1`, regardless of the number of dependencies.
2. "Instances of use" likewise means "total instances found while compiling this package's tests and all the (reachable) code in its dependencies."
3. These counts are influenced by (static) reachability: if code is not potentially reachable from a test somehow, it will not be built and will not be counted.
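
As a hypothetical illustration of how these counts compose: if a workspace contains packages `foo` and `bar`, and both depend on a crate whose code triggers the `caller_location` warning while each package's tests are built, then "Crates impacted" for `caller_location` would be 2 (both workspace packages are charged for it, even though the use lives in a shared dependency), and "Instances of use" would report the total occurrences found during those builds (this reading is an assumption; the exact aggregation across shared dependencies may differ).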

### Test failure reasons

```text
================================================
Reason for failure | Number of tests
------------------------------+-----------------
unwind | 61
none (success) | 6
assertion + overflow | 2
...
```

The test failure reasons table indicates why, when assess ran a test through Kani, it failed to verify.
Notably:

1. Because we force termination with `unwind(1)`, we expect `unwind` to rank highly.
2. We do report the number of tests that succeed in this table, to aid in understanding how well things went overall.
3. The reported reason is the "property class" of the CBMC property that failed. So `assertion` means an ordinary `assert!()` was hit (or something else with this property class).
4. When multiple properties fail, they are aggregated with `+`, such as `assertion + overflow`.
5. Currently this table does not properly account for `should_fail` tests, so `assertion` may actually be "success": the test should hit an assertion and did (see the example below).
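
To illustrate point 5 (hypothetical code, assuming "should_fail" refers to tests marked `#[should_panic]`): the test below passes under `cargo test` because the panic is expected, but assess currently reports the hit assertion as a plain `assertion` failure rather than a success.

```rust
/// Hypothetical function under test.
fn buffer_with_capacity(cap: usize) -> Vec<u8> {
    assert!(cap > 0, "capacity must be non-zero");
    Vec::with_capacity(cap)
}

#[test]
#[should_panic(expected = "capacity must be non-zero")]
fn rejects_zero_capacity() {
    // `cargo test` counts this as a pass; assess counts the failed
    // assertion property, so it lands in the "assertion" row.
    buffer_with_capacity(0);
}
```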

### Promising test cases

```text
=============================================================================
Candidate for proof harness | Location
-------------------------------------------------------+---------------------
float::tests::f64_edge_cases | src/float.rs:226
float::tests::f32_edge_cases | src/float.rs:184
integer::tests::test_integers | src/integer.rs:171
```

This table is the most rudimentary so far, but is the core of what long-term assess will help accomplish.
Currently, this table just presents (with paths displayed in a clickable manner) the tests that successfully "verify" with Kani.
These might be good candidates for turning into proof harnesses.
This list is presently unordered; the next step for improving it would be to find even a rudimentary way of ranking these test cases (e.g. perhaps by code coverage).
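
As a sketch of the long-term goal (entirely hypothetical code, not taken from the crates above): a promising concrete test might be generalized into a proof harness by swapping its fixed inputs for `kani::any()` and constraining them with `kani::assume`.

```rust
/// Hypothetical function under test: convert to bits and back.
fn round_trip(x: f32) -> bool {
    f32::from_bits(x.to_bits()) == x
}

// The kind of concrete test assess might flag as a candidate.
#[test]
fn f32_edge_cases() {
    assert!(round_trip(1.5_f32));
}

// A generalized proof harness covering every non-NaN f32.
#[cfg(kani)]
#[kani::proof]
fn round_trip_any_f32() {
    let x: f32 = kani::any();
    kani::assume(!x.is_nan()); // NaN is never equal to itself
    assert!(round_trip(x));
}
```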

## How Assess Works

`kani-compiler` emits `*.kani-metadata.json` for each target it builds.
This format can be found in the `kani_metadata` crate, shared by `kani-compiler` and `kani-driver`.
This is the starting point for assess.

Assess obtains this metadata by essentially running a `cargo kani`:

1. With `--all-features` turned on
2. With `unwind` always set to `1`
3. In test mode, i.e. `--tests`
4. With test-case reachability mode. Normally Kani looks for proof harnesses and builds only those. Here we switch to building only the test harnesses instead.

Assess starts by getting all the information from these metadata files.
This is enough by itself to construct a rudimentary "unsupported features" table.
But assess also uses it to discover all the test cases, and (instead of running proof harnesses) it then runs all these test harnesses under Kani.

Assess produces a second metadata format, called (unsurprisingly) "assess metadata".
(Found in `kani-driver` under [`src/assess/metadata.rs`](https://github.com/model-checking/kani/blob/main/kani-driver/src/assess/metadata.rs).)
This format records the results of what assess does.

This metadata can be written to a json file by providing `--emit-metadata <file>` to `assess`.
Likewise, `scan` can be told to write out this data with the same option.
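
For example (a hypothetical invocation; these options are unstable and hidden, so the exact syntax may change), combining this with the commands from "Using Assess" above:

```text
cargo kani --enable-unstable assess --emit-metadata assess-results.json
cargo kani --enable-unstable assess scan --emit-metadata scan-results.json
```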

Assess metadata is an aggregatable format.
It does not apply to just one package, as assess can work on a workspace of packages.
Likewise, `scan` uses and produces the exact same format, across multiple workspaces.

So far all assess metadata comes in the form of "tables" which are built with `TableBuilder<T: TableRow>`.
This is documented further in [`src/assess/table_builder.rs`](https://github.com/model-checking/kani/blob/main/kani-driver/src/assess/table_builder.rs).

## Using Assess on the top-100 crates

There is a script in the Kani repo for this purpose.

This will clone the top-100 crates to `/tmp/top-100-experiment` and run assess scan on them:

```text
./scripts/exps/assess-scan-on-repos.sh
```

If you'd like to preserve the results, you can direct scan to use a different directory with an environment variable:

```text
ASSESS_SCAN="~/top-100-experiment" ./scripts/exps/assess-scan-on-repos.sh
```

To re-run the experiment, it suffices to be in the experiment directory:

```text
cd ~/top-100-experiment && ~/kani/scripts/exps/assess-scan-on-repos.sh
```
20 changes: 18 additions & 2 deletions kani-driver/src/args.rs
@@ -45,7 +45,7 @@ pub struct CargoKaniArgs {
#[derive(Debug, Parser)]
pub enum CargoKaniSubcommand {
#[command(hide = true)]
Assess,
Assess(crate::assess::AssessArgs),
}

// Common arguments for invoking Kani. This gets put into KaniContext, whereas
@@ -396,7 +396,7 @@ impl CargoKaniArgs {
pub fn validate(&self) {
self.common_opts.validate::<Self>();
// --assess requires --enable-unstable, but the subcommand needs manual checking
if (matches!(self.command, Some(CargoKaniSubcommand::Assess)) || self.common_opts.assess)
if (matches!(self.command, Some(CargoKaniSubcommand::Assess(_))) || self.common_opts.assess)
&& !self.common_opts.enable_unstable
{
Self::command()
@@ -519,6 +519,22 @@ mod tests {
StandaloneArgs::parse_from(vec!["kani", "file.rs", "--enable-unstable", "--cbmc-args"]);
// no assertion: the above might fail if it fails to allow 0 args to cbmc-args
}
#[test]
fn check_multiple_packages() {
// accepts repeated:
let a = CargoKaniArgs::parse_from(vec!["cargo-kani", "-p", "a", "-p", "b"]);
assert_eq!(a.common_opts.package, vec!["a".to_owned(), "b".to_owned()]);
let b = CargoKaniArgs::try_parse_from(vec![
"cargo-kani",
"-p",
"a", // no -p
"b",
]);
// BUG: should not accept sequential:
// Related: https://github.com/model-checking/kani/issues/2025
// Currently asserting this backwards from how it should be!
assert!(!b.is_err());
}

fn check(args: &str, require_unstable: bool, pred: fn(StandaloneArgs) -> bool) {
let mut res = parse_unstable_disabled(&args);
42 changes: 42 additions & 0 deletions kani-driver/src/assess/args.rs
@@ -0,0 +1,42 @@
// Copyright Kani Contributors
// SPDX-License-Identifier: Apache-2.0 OR MIT

//! This module contains arguments specific to the `cargo kani assess` subcommand.

use std::path::PathBuf;

use clap::Parser;

/// `cargo kani assess` subcommand arguments
#[derive(Default, Debug, Parser)]
pub struct AssessArgs {
#[command(subcommand)]
pub command: Option<AssessSubcommand>,

/// Write Assess metadata (unstable file format) to the given file
#[arg(long, hide = true)]
pub emit_metadata: Option<PathBuf>,
}

/// `cargo kani assess` takes optional subcommands to request specialized behavior
#[derive(Debug, Parser)]
pub enum AssessSubcommand {
/// Run assess on a directory containing multiple cargo projects, and aggregate the results
Scan(ScanArgs),
}

/// `cargo kani assess scan` subcommand arguments
#[derive(Debug, Parser)]
pub struct ScanArgs {
/// Don't run assess on found packages, just re-analyze the results from a previous run
#[arg(long, hide = true)]
pub existing_only: bool,

Contributor:

> Not clear on this option. Is it just re-computing the tables from the metadata without actually running assess? If so, perhaps consider renaming to "analyze_results_only" or something along those lines.

Contributor Author:

> I bikeshedded on this name a bit already; I don't expect this is one a customer will want to use (hence `hide = true`). It's mostly useful while debugging the package-finding code or the table-rendering code.
>
> I settled on 'existing' because I wanted to emphasize that something wouldn't be re-run here, while 'analyze' is... what assess does already, really. A good short name is hard to find.

/// Only consider the packages named in the given file
#[arg(long, hide = true)]
pub filter_packages_file: Option<PathBuf>,

/// Write Assess-Scan metadata (unstable file format) to the given file
#[arg(long, hide = true)]
pub emit_metadata: Option<PathBuf>,
}
92 changes: 92 additions & 0 deletions kani-driver/src/assess/metadata.rs
@@ -0,0 +1,92 @@
// Copyright Kani Contributors
// SPDX-License-Identifier: Apache-2.0 OR MIT

//! Assess metadata. This format is shared between 'assess' and 'assess scan'.
//! Assess produces this for one workspace, scan for several.
//! It is not a stable file format: it is meant for assess to directly communicate
//! from assess subprocesses to a parent scan process.
//! We can build other tools that make use of it, but they need to be built for or distributed
//! with the specific version of Kani.

use std::fs::File;
use std::io::BufWriter;
use std::path::Path;

use anyhow::Result;
use serde::{Deserialize, Serialize};

use super::table_builder::TableBuilder;
use super::table_failure_reasons::FailureReasonsTableRow;
use super::table_promising_tests::PromisingTestsTableRow;
use super::table_unsupported_features::UnsupportedFeaturesTableRow;
use super::AssessArgs;

/// The structure of `.kani-assess-metadata.json` files. This is the structure for both
/// assess (standard) and scan. It is meant to hold results for one or more packages.
///
/// This is not a stable interface.
#[derive(Serialize, Deserialize)]
pub struct AssessMetadata {

Contributor:

> maybe add the name of the crate and a list of dependencies

Contributor Author:

> Assess metadata is across multiple crates, so not really applicable here. Something like that might be nice for kani-metadata, or we could think about what we actually want to aggregate and present here somehow...

Contributor:

> Should we at least add the package name? This could be useful when reporting the successful/failed packages.

Contributor Author:

> I intend, in a follow-up PR, to add another table here that might help with that.
>
> But again, metadata is across multiple packages, so there isn't one name to put here. I'll add a better doc comment on this type.

/// Report on the presence of `codegen_unimplemented` in the analyzed packages
pub unsupported_features: TableBuilder<UnsupportedFeaturesTableRow>,
/// Report of the reasons why tests could not be analyzed by Kani
pub failure_reasons: TableBuilder<FailureReasonsTableRow>,
/// Report on the tests that Kani can successfully analyze
pub promising_tests: TableBuilder<PromisingTestsTableRow>,
}

/// If given the argument to do so, write the assess metadata to the target file.
pub(crate) fn write_metadata(args: &AssessArgs, build: AssessMetadata) -> Result<()> {
if let Some(path) = &args.emit_metadata {
let out_file = File::create(&path)?;
let writer = BufWriter::new(out_file);
// use pretty for now to keep things readable and debuggable, but this should change eventually
serde_json::to_writer_pretty(writer, &build)?;
}
Ok(())
}

/// Write metadata with unsupported features only, supporting the `--only-codegen` option.
pub(crate) fn write_partial_metadata(
args: &AssessArgs,
unsupported_features: TableBuilder<UnsupportedFeaturesTableRow>,
) -> Result<()> {
write_metadata(
args,
AssessMetadata {
unsupported_features,
failure_reasons: TableBuilder::new(),
promising_tests: TableBuilder::new(),
},
)
}

/// Read assess metadata from a file.
pub(crate) fn read_metadata(path: &Path) -> Result<AssessMetadata> {
// this function already exists, but a proxy here helps find it :)
crate::metadata::from_json(path)
}

/// Given assess metadata from several sources, aggregate them into a single structure.
///
/// This is not a complicated operation, because the assess metadata structure is meant
/// to accommodate multiple packages already, so we're just "putting it together".
pub(crate) fn aggregate_metadata(metas: Vec<AssessMetadata>) -> AssessMetadata {
let mut result = AssessMetadata {
unsupported_features: TableBuilder::new(),
failure_reasons: TableBuilder::new(),
promising_tests: TableBuilder::new(),
};
for meta in metas {
for item in meta.unsupported_features.build() {
result.unsupported_features.add(item.clone());
}
for item in meta.failure_reasons.build() {
result.failure_reasons.add(item.clone());
}
for item in meta.promising_tests.build() {
result.promising_tests.add(item.clone());
}
}
result
}