Consider using the standard benchmark harness instead of criterion #264
Comments
Hi. I'm not familiar with this project, and I don't know whether my (rather wacky) solution would help here, but I managed to run `criterion` benchmarks against private functions on stable Rust by embedding them in a regular `#[test]`:

```rust
fn internal_function() { /* ... */ }

#[cfg(test)]
pub mod tests {
    use super::*;
    use criterion::Criterion;
    use std::time::Duration;

    fn benchmarks(c: &mut Criterion) {
        c.bench_function("bench_internal_function", |b| {
            b.iter(|| internal_function())
        });
    }

    #[test]
    fn bench() {
        // Run with:
        // `cargo test --profile bench -j1 -- --nocapture bench -- <benchmark_filter>`
        // This workaround allows benchmarking private interfaces with `criterion` in stable Rust.

        // Collect CLI arguments.
        let args: Vec<String> = std::env::args().collect();

        // Interpret the argument sequence `[ "...bench", "--", "<filter>" ]` as a trigger
        // and extract `<filter>`. `windows(3)` always yields slices of length 3,
        // so no separate length check is needed.
        let filter = args
            .windows(3)
            .filter(|w| w[0].ends_with("bench") && w[1] == "--")
            .map(|w| w[2].clone())
            .next();

        // Return successfully if the trigger wasn't found, so a plain `cargo test` passes.
        let filter = match filter {
            None => return,
            Some(f) => f,
        };

        // Extract an optional `--profile-time <seconds>` argument.
        let profile_time = args
            .windows(2)
            .filter(|w| w[0] == "--profile-time")
            .map(|w| w[1].as_str())
            .next();

        // TODO: Adopt `Criterion::configure_from` when it lands upstream.
        let mut c = Criterion::default()
            .with_output_color(true)
            .without_plots()
            .with_filter(filter)
            .warm_up_time(Duration::from_secs_f32(0.5))
            .measurement_time(Duration::from_secs_f32(0.5))
            .profile_time(profile_time.map(|s| Duration::from_secs_f32(s.parse().unwrap())));
        benchmarks(&mut c);

        // Mirror what `criterion_main!` does to print the final summary.
        Criterion::default().final_summary();
        // TODO: Move this function to a macro.
    }
}
```
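With this in place, `cargo test --profile bench -j1 -- --nocapture bench -- <filter>` runs the matching benchmarks, while a plain `cargo test` returns from `bench()` immediately and stays green. Because the trigger is parsed from the raw argument list, the benchmarks never run during normal test invocations.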
We currently use `criterion` for benchmarking. The main limitation of `criterion` is that only a crate's public API can be benchmarked. In contrast, the standard benchmark harness (provided by the unstable `test` crate) allows you to mark any function with `#[bench]` (as you would a `#[test]`), so you can benchmark internal functions. I believe this is valuable enough to outweigh the cons of using `test`. While working on #247, we were interested in knowing the specific timing of internal functions (e.g. here). I ended up manually inserting timing statements in the code, running the right test, and inspecting the output. IMO, this shows how limiting `criterion` really is.
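For concreteness, here is a minimal sketch of the `#[bench]` approach described above (nightly only; `my_internal_fn` is a hypothetical private function, not one from our codebase):

```rust
#![feature(test)] // requires nightly
extern crate test;

// A private function: not reachable from an external `benches/` target,
// but a `#[bench]` in the same crate can call it directly.
fn my_internal_fn() -> u64 {
    (0..1_000u64).sum()
}

#[cfg(test)]
mod benches {
    use super::*;
    use test::{black_box, Bencher};

    #[bench]
    fn bench_my_internal_fn(b: &mut Bencher) {
        b.iter(|| black_box(my_internal_fn()));
    }
}
```

This runs with `cargo +nightly bench`, with no separate benchmark target needed.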
Note that `criterion` has a feature that allows benchmarking internal functions similar to `#[bench]`, but it depends on the `custom_test_framework` nightly feature. However, the tracking issue for `custom_test_framework` has been closed due to inactivity, so I would personally stay away from it if/until that changes.

**Pros of using `test` over `criterion`**

- The ability to benchmark internal functions (as explained above)
**Cons of using `test` over `criterion`**

- `criterion` reports the performance change relative to the previous run of the benchmark
- `criterion` pre-populates the cache before running a benchmark (not sure whether or not `test` does that too, but at least it's not advertised in the benchmark output)
- `test` is still unstable

**Summary**
Although there are more cons than pros, I believe the ability to benchmark internal functions far outweighs any of the cons (as explained earlier). We can deal with the dependency on nightly by adding a `"benchmarking"` feature, which controls the use of nightly.
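A minimal sketch of how such a feature gate could look (the `benchmarking` feature name comes from this proposal; the exact attribute wiring is my assumption, not an agreed design):

```rust
// lib.rs
// Enable the unstable `test` crate only when the proposed `benchmarking`
// cargo feature is active, so stable builds are unaffected.
#![cfg_attr(feature = "benchmarking", feature(test))]

#[cfg(feature = "benchmarking")]
extern crate test;

fn internal_function() { /* ... */ }

// Compiled out entirely on stable builds without the feature.
#[cfg(all(test, feature = "benchmarking"))]
mod benches {
    use super::*;

    #[bench]
    fn bench_internal_function(b: &mut test::Bencher) {
        b.iter(|| internal_function());
    }
}
```

Benchmarks would then run with something like `cargo +nightly bench --features benchmarking`, while `cargo test` on stable ignores them.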