Alternative measurements other than wall time #130

quark-zju · 2018-02-27T03:10:06Z

It seems criterion.rs has hard-coded measurement to be wall time. While I have been thinking about measuring CPU time (probably more stable), I/O bytes (doable under Linux by reading /proc), or memory usage. If the "measurement" function can be replaced, would that be a welcomed PR?

bheisler · 2018-02-27T03:41:22Z

It would be welcome, yes, but that has the potential to be a major project. That's why I've been putting it off for a while now. I'd be happy to help if you're volunteering to tackle it, though.

I haven't fully figured out yet how I was planning to do this. There's a lot of things Criterion.rs currently does that don't make sense for measuring things like IO and memory usage. For example, the measurements of IO and memory usage per iteration are a lot more likely to be constants rather than distributions, so bootstrapping and reporting various statistics probably doesn't make sense. I think it would make sense to treat timing measurements separately from auxiliary measurements like IO and memory. The HTML reports and command-line output are also designed for timing information.

So, for IO and memory I think it would make sense to (optionally) measure those in addition to the timing, not instead of it. You'd have to change the timing loops in the Bencher struct, as well as add space in that struct to store the results. Then you'd have to expose that information up through the code and provide it to the reports (HTML and CLI) which could then display it appropriately. These auxiliary measurements should be done outside the timing measurement to avoid changing the measured time.

As for changing how the timing measurement itself works, that's tricky. Criterion.rs is designed to minimize the impact that the timing measurement itself has on the measured values, but it would still be preferable to ensure that there's no additional overhead added by this change and that could be difficult. The obvious way would be to make the Bencher struct generic over the timing measurement function, which would be a breaking change and force an 0.3 release. I'd prefer to avoid that, if possible.

What do you think?

quark-zju · 2018-03-25T20:38:41Z

Sorry for the late reply. I have been distracted by other stuffs at work lately.

Thanks for the detailed explanation!

On bootstrapping and distribution - I agree the current logic is designed for measuring wall time. But it should be fine for other measurements - it won't produce wrong numbers but just waste some extra time if unchanged.

I think the main thing that is hard-coded about "time" and could affect the user experience of other measurements is the display of units like "ms", "us","ns". That sounds doable to be replaced by a user-provided value.

I also agree that memory/IO measurements could be tricky because they're platform-dependent. They are ideally implemented as separate addons, instead of core criterion feature.

If the code uses some complex caching, or non deterministic behavior like "try read data as much as possible, but stop after 100ms", it'd be non-constant. So I think it's fine to just use the existing code since it should also work in the constant case. Bootstrapping might be unnecessary, but it does not break the meaningfulness of the alternative measurement.

It does make sense to have other values measured with timing. I think a more general purposed setting is to allow users to choose a list of things to measure. Like [CPU time, and a customized memory measurement function], instead of enforced wall time tracking.

I'm still a bit busy with work stuff right now and might need some paperwork done to be able to actually contribute code here. I'll get back to you if I have some more detailed ideas after understanding the existing codebase.

LutzCle · 2018-10-25T13:45:34Z

Another use-case (that I'm currently facing) is measuring hardware performance counters and GPU event timers. It would be really helpful if Criterion could provide support for user-specified timers/counters.

Would it make sense for Bencher to take one start and one stop closure, with some user-defined state passed from start to stop? That should enable all of the use-cases mentioned in this thread, and provide enough flexibility for other use-cases as well.

Providing default closures for regular CPU timers could deal with the backwards-compatibility. Perhaps the stop closure could return a value for statistics and plotting?

hdevalence · 2018-11-27T18:39:48Z

I'd like to be able to have Criterion report cycle counts using the RDTSC/RDTSCP instructions on x64. Currently it looks like Criterion is using std::time::Instant, which ultimately calls gettime. It would be great to be able to report cycle counts directly, especially for microbenchmarks. There's already a Rust wrapper for RDTSC , so it seems like the hardest part would be connecting everything up?

Would the closure approach suggested by @LutzCle work? I would be happy to work on it if so.

bheisler · 2018-11-28T23:26:34Z

Howdy folks. Thanks for your patience - I've been busy with work and another project for a while now, but I hope I'll be able to put that project to bed soon and make some time for Criterion.rs again.

In the meantime though, I've been thinking about this problem. I think I have a plan to support custom measurements and custom reports in a clean way. I think I could create a Measurement trait with a handful of associated types for the intermediate state, the analysis, and the report, and then the benchmark programmer (or external crates) could define their own implementations. I'm on mobile, so I hope you'll forgive me if I don't provide a code example of what I mean. If anyone is really curious I can write it up in more detail later.

Implementing this would have to be done in a major breaking change release unless I do something awkward to avoid that. Fortunately, I already intend to do exactly that sort of release in 0.3.0 - the move to a custom test framework (ie. #[criterion]) will require users to migrate anyway, so I might as well make all of the breaking changes at once.

hdevalence · 2018-12-04T08:23:14Z

That would be great, I'd be happy to help if it would be useful.

I made a hacky change in my fork (https://github.com/hdevalence/criterion.rs/commit/0083dbd5e2aa06d40b812600961073a8b553fc38) that replaces Instant::now() with RDTSC/P (via the tsc_timer crate). This obviously isn't a good or complete solution but it seems to work well enough for me for the moment, so it might be useful for someone else in the meantime.

bheisler · 2019-06-18T00:40:14Z

Hey, folks. It's been a while, but I've finally gotten around to implementing my preliminary support for custom measurements. You can check it out on the v0.3.0-dev branch; specifically look at src/measurement.rs and the user-guide documentation in book/src/user_guide/custom_measurements.md. Fair warning, there may be force-pushes to that branch from time to time, since I don't expect anyone else to commit to it.

@hdevalence - I'd appreciate it if you could try implementing a custom measurement based on RDTSC/P to validate that my design works for your use-case and that my documentation makes sense. If you'd like to publish that as a reusable crate later, once 0.3.0 is released, that would be great too; I'd be happy to add a link to it in the readme to help other folks who want CPU-counter level timing find it.

hdevalence · 2019-06-19T12:30:33Z

Awesome! Right now I'm kind of busy but I should have time to try implementing a custom measurement at the beginning of July.

shepmaster · 2019-06-24T19:28:59Z

Chiming in that this sounds like a great feature!

I’ve got some tests that feel like they are being really affected by thermal constraints on my laptop (back-to-back runs can have a variation of ~14%, for example), so I was looking for something a bit more stable. I’ll add in a +1 for “instruction count” and/or “cycle count”.

See a related thread on IRLO that describes why perf.rust-lang.org uses what it does.

hdevalence · 2019-06-27T12:12:31Z

@shepmaster this is independent of anything to do with criterion or the content of this PR, but just in case it's useful to you or someone else reading this thread, it's possible on Linux to manage the Turbo Boost state using systemd, so that it's easy to temporarily disable it for testing: https://blog.christophersmart.com/2017/02/08/manage-intel-turbo-boost-with-systemd/

YangKeao · 2019-08-26T16:00:41Z

@bheisler @hdevalence Maybe you are interested. I just implemented a POSIX CPU time measurement today (README and introduction will be added tomorrow). It's surprising that implementing a measurement takes only minutes (though it has not been well tested).

Should third party measurement be added into criterion (with a PR)?

Or should all measurement be extracted out of criterion (as discussed, this will break backwards-compatibility)? Just like serde and serde-json etc...

In my implementation DurationFormatter was directly copied from criterion. Could such useful utils be public or be extracted out as a dependency?

YangKeao · 2019-08-26T16:13:33Z

@bheisler Migrate from 0.2 to 0.3 is painful. One of the reason is adding generic parameter everywhere 😢 (though this is reasonable). I have done this for a quite big project today. It changes thousands of lines of codes.

Another pain point is some lost functions like bench_function_over_inputs. Without it, I have to provide a clonable closure (for multiple usage in loop and bench_with_input), which cannot be written in Box<dyn Fn(..) + Clone> as dyn syntax doesn't allow it.

bheisler · 2019-08-27T00:06:14Z

Migrate from 0.2 to 0.3 is painful. One of the reason is adding generic parameter everywhere 😢 (though this is reasonable). I have done this for a quite big project today. It changes thousands of lines of codes.

The new generic parameter is unfortunately necessary; we need to allow rustc to inline the measurement code into the benchmarks. It looks like tikv built a lot of code on top of Criterion.rs and they want to use CPU time instead of wall-clock time so they'll have to update.

Should third party measurement be added into criterion (with a PR)?

My plan (for now at least) is to allow external crates to provide different measurements and formatters. We can add them into Criterion.rs itself later if it makes sense.

Another pain point is some lost functions like bench_function_over_inputs. Without it, I have to provide a clonable closure (for multiple usage in loop and bench_with_input), which cannot be written in Box<dyn Fn(..) + Clone> as dyn syntax doesn't allow it.

As for bench_function_over_inputs, that's been deprecated for a while. I would recommend updating your code to use the BenchmarkGroup structure. I don't know why you would need to clone a closure, do you have an example?

YangKeao · 2019-08-27T01:36:50Z

@bheisler I noticed that the only way to apply multiple input on a function is loop if I didn't miss some important functions.

I have to change from:

for case in cases {
    c.bench_function_over_inputs(case.name, case.f, inputs.clone());
}

Into:

for case in cases {
    let mut group = c.benchmark_group(case.name);
    for input in inputs.iter() {
        group.bench_with_input(
            criterion::BenchmarkId::from_parameter(input),
            input,
            case.f,
        ); 
    }
    group.finish();
}

In this case, case.f have to be copied as it was used in a loop. The type of case.f used to be Box<dyn Fn(...) + 'static> and it's impossible to ensure it can clone.

To resolve this, I use an ugly way. The implementation is:

For restricting Clone trait on closure, I have to use generic type. So creating a struct InnerBenchCase which has a generic type called F with restriction Fn(&mut criterion::Bencher<M>, &I) + Copy + 'static
As I need multiple InnerBenchCase with different F (every closure's types differ) to store in a vector, a trait IBenchCase was used to provide a connection between these different InnerBenchCase. This trait provides method get_fn, which will copy the inner function and return.
This point is not related with criterion. For some reasons, I have to implement Ord on bench case. However, impl<T> Ord for T where T:IBenchCase is not allowed. So I have to use a struct with dynamic dispatch to implement Ord.

pub struct BenchCase<I, M>
where
    M: criterion::measurement::Measurement + 'static,
{
    inner: Box<dyn IBenchCase<I = I, M = M>>,
}

After these step, I finally can use bench_with_input instead of bench_function_over_inputs.

WildCryptoFox · 2019-09-22T04:24:47Z

@hdevalence CyclesPerByte PR open #336. Only for x86 and x86_64, assuming the cpu feature is present as we'd panic anyway, so it is feature gated under cycles.

bheisler mentioned this issue Nov 7, 2018

Allow counting CPU cycles on supported platforms #234

Closed

bheisler added Enhancement Bigger Project A large project, probably involving major internal changes as well as new API surface labels Dec 5, 2018

bheisler added this to the Version 0.3.0 milestone Dec 9, 2018

hauleth mentioned this issue Dec 30, 2018

Use stable Rust instead of nightly Rust briansmith/crypto-bench#41

Closed

bheisler removed this from the Version 0.3.0 milestone Aug 25, 2019

akshatagarwl mentioned this issue Feb 15, 2021

Set up Tremor benchmarking infrastructure tremor-rs/tremor-runtime#722

Closed

epage mentioned this issue Jun 9, 2023

Input on a json scema for bench reporting #693

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Alternative measurements other than wall time #130

Alternative measurements other than wall time #130

quark-zju commented Feb 27, 2018

bheisler commented Feb 27, 2018

quark-zju commented Mar 25, 2018

LutzCle commented Oct 25, 2018

hdevalence commented Nov 27, 2018

bheisler commented Nov 28, 2018

hdevalence commented Dec 4, 2018

bheisler commented Jun 18, 2019

hdevalence commented Jun 19, 2019

shepmaster commented Jun 24, 2019

hdevalence commented Jun 27, 2019

YangKeao commented Aug 26, 2019 •

edited

Loading

YangKeao commented Aug 26, 2019

bheisler commented Aug 27, 2019

YangKeao commented Aug 27, 2019 •

edited

Loading

WildCryptoFox commented Sep 22, 2019

Alternative measurements other than wall time #130

Alternative measurements other than wall time #130

Comments

quark-zju commented Feb 27, 2018

bheisler commented Feb 27, 2018

quark-zju commented Mar 25, 2018

LutzCle commented Oct 25, 2018

hdevalence commented Nov 27, 2018

bheisler commented Nov 28, 2018

hdevalence commented Dec 4, 2018

bheisler commented Jun 18, 2019

hdevalence commented Jun 19, 2019

shepmaster commented Jun 24, 2019

hdevalence commented Jun 27, 2019

YangKeao commented Aug 26, 2019 • edited Loading

YangKeao commented Aug 26, 2019

bheisler commented Aug 27, 2019

YangKeao commented Aug 27, 2019 • edited Loading

WildCryptoFox commented Sep 22, 2019

YangKeao commented Aug 26, 2019 •

edited

Loading

YangKeao commented Aug 27, 2019 •

edited

Loading