
Benchmarking / cargo bench #2287

Closed

Conversation

Manishearth
Member

@Manishearth Manishearth commented Jan 11, 2018

@Centril Centril added the T-dev-tools Relevant to the development tools team, which will review and decide on the RFC. label Jan 11, 2018
@Manishearth Manishearth added the T-cargo Relevant to the Cargo team, which will review and decide on the RFC. label Jan 11, 2018
@Centril
Contributor

Centril commented Jan 11, 2018

Have you considered https://github.com/japaric/criterion.rs ?

@Manishearth
Member Author

I think the current API is pretty decent. We should probably have custom test framework support at some point (e.g. this would be great for cargo-fuzz), but given that this has been languishing with no progress for three years, I think it's better to stabilize the current API so that people can actually use it, and later try to build a custom test/benchmarking plugin system.

@Manishearth
Member Author

@Centril Yes. I think it does a bunch more things and for now I want to get the most basic stuff in stable. More functions can be added to the bencher in the future, or, ideally, we'll have a custom test framework system that we can plug Criterion into.
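For reference, the existing nightly interface this RFC proposes to stabilize looks roughly like this (a sketch of the status quo; the exact surface to stabilize is whatever the RFC text specifies):

#![feature(test)] // nightly-only at the time of this discussion
extern crate test;

use test::Bencher;

#[bench]
fn bench_sum(b: &mut Bencher) {
    // The closure passed to `iter` is run repeatedly; returning the result
    // lets the harness black-box it so the work is not optimized away.
    b.iter(|| (0..1000u64).sum::<u64>());
}

Such benchmarks are run with cargo bench.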

Contributor

@Centril Centril left a comment


Looks all good to me - I have a single nit =)

[reference-level-explanation]: #reference-level-explanation

The bencher reports the median value and deviation (difference between min and max).
Samples are winsorized, so extreme outliers get clamped.
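(Winsorizing means clamping extreme samples to percentile bounds rather than discarding them. A minimal sketch of the idea, not the libtest implementation, with an assumed percentile cutoff parameter:)

/// Clamp every sample into the [pct, 100 - pct] percentile range.
fn winsorize(samples: &mut [f64], pct: f64) {
    if samples.is_empty() {
        return;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = |p: f64| ((p / 100.0) * (sorted.len() - 1) as f64).round() as usize;
    let (lo, hi) = (sorted[idx(pct)], sorted[idx(100.0 - pct)]);
    for s in samples.iter_mut() {
        *s = s.clamp(lo, hi);
    }
}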
Contributor

Might be a good idea to insert a link here: https://en.wikipedia.org/wiki/Winsorizing

Member Author

done, thanks!

@SimonSapin
Contributor

This RFC is written as if this is a new feature, which is a good description to have. But a "diff" from the current state of Nightly would also be good: are you proposing any change, or only stabilization of what’s already implemented today?

@Manishearth
Member Author

@SimonSapin oh, should have clarified. This proposes zero changes; however, it stabilizes only a part of the bencher API.

@alkis

alkis commented Jan 11, 2018

Thanks @Manishearth for starting this up. I think adding iter_n() to the bencher will get the current API to a good enough state for most tasks, and everything else can be done with enough gymnastics on top: type/value parametrization, etc. See the rationale for iter_n() here.

@Manishearth
Member Author

The intent here is to stabilize a minimal useful surface, kept minimal to maximize the probability of stabilization. I have no objection to stabilizing iter_n, but others might.

Do folks have concerns? If not, I'll add it.

@alkis

alkis commented Jan 11, 2018

I understand the need for a minimal surface and I am totally on board. iter_n cannot be implemented on top of the current API, though, so in that sense adding it is in line with a "minimal API surface" and not adding it leaves an "incomplete API surface" :-)

@Manishearth
Member Author

Manishearth commented Jan 11, 2018 via email

@alkis

alkis commented Jan 11, 2018

Let me try for a compelling argument.

My thesis is that, more often than not, iter_n() is what you want to use when benchmarking. If it counts for anything, we have both APIs at Google, and most benchmarks are written with iter_n() rather than iter() (in C++).

Here are a few examples from random code:

benchmarking insertion in a deque implementation

We want to benchmark deques of sizes that fit in different caches. When benchmarking insertion, we need to insert enough elements to target a given cache. Each call to iter() will have to insert N elements, where N depends on the cache we are targeting. This means the time per iter() is not comparable across invocations. If we use iter_n(), we report time per insertion, which makes times comparable across caches (and gives more insight into how our deque works).

benchmarking a parser

We benchmark across different data sets. Naturally, each data set is a different size. There is a lot more context in reporting time per byte instead of time to parse document X, Y, or Z. This makes the times comparable between data sets. In addition, it gives more insight: if, for example, the time per byte to parse document X is much larger than for document Y, that suggests there might be an optimization opportunity for documents that look like X.

benchmarking a boolean formula optimizer

Given a boolean formula, the optimizer reduces the number of clauses in it by removing redundant ones. Naturally, the number of clauses in the original formula affects how much time it takes to optimize. So again we want to benchmark using iter_n(), where N is the number of clauses in the expression.
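As an illustration, a hypothetical sketch of the deque case (iter_n does not exist in the current API; its name, signature, and per-unit reporting semantics here are assumptions based on this thread):

#![feature(test)]
extern crate test;

use std::collections::VecDeque;
use test::Bencher;

// Assumed working-set size targeting a particular cache level.
const N: u64 = 64 * 1024;

#[bench]
fn bench_deque_push_back(b: &mut Bencher) {
    // Hypothetical `iter_n`: the measured time is divided by `n`, so the
    // report is time per insertion rather than time per closure call.
    b.iter_n(N, || {
        let mut d = VecDeque::with_capacity(N as usize);
        for i in 0..N {
            d.push_back(i);
        }
        d
    });
}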

@Manishearth
Member Author

I don't quite get why this affects the parser case, but the other two are pretty compelling. I'm not sure these will be common enough, but I'll add it and see if others object.

@BurntSushi
Member

Thanks @Manishearth! I would love to see this stabilized. I'd like to lend my voice to say that what's there is already useful, and stabilizing that alone would be a win. I know I have at least been productively using it for years now. I look forward to improving my workflow of course, but I think this is a step in the right direction.

@Manishearth
Member Author

Actually, wait, iter_n seems to be an API that doesn't currently exist.

I'd rather use this RFC to stabilize the existing API with tweaks; things like iter_n can be added in further RFCs.

The reason being that if this RFC lands as-is, we can immediately start the stabilization process; however, for iter_n we would have to go through a couple of nightly cycles first.

@Manishearth
Member Author

However, if folks think we can immediately stabilize iter_n (I'm not sure what the procedure is; @BurntSushi would know), I'm open to adding it to the RFC.

@BurntSushi
Member

BurntSushi commented Jan 11, 2018

@Manishearth I believe you're right. Any new API generally needs to go through a couple of cycles as an unstable API first. With that said, I think it might be possible to add iter_n as an unstable API and then stabilize it later via normal FCP if it's simple enough, without necessarily needing to get it into this RFC.

(But I am not an expert on matters of policy, and others should check that I have this right.)


- Should stuff be in `std::test` or a partially-stabilized `libtest`?
- Should we stabilize any other `Bencher` methods (like `run_once`)?
- Stable mchine-readable output for this would be nice, but can be done in a separate RFC.

typo: mchine

@nagisa
Member

nagisa commented Jan 11, 2018

In hopes of not accidentally prohibiting custom benchmark/test engines, I'll write down all the components of the current benchmark engine and briefly describe potential pain points when integrating with such a future system.

  • #[bench] attribute -- would be collected and dispatched to the custom engine as per compiler flags, just like #[test]. No problems here.
  • The Bencher argument -- a potential pain point. The type comes from test::* and so cannot be swapped out easily, which custom crates will 100% want to do. Should we make Bencher a trait and either use fn function(b: impl Bencher) or fn function(b: &mut Bencher) (with dynamic dispatch)? (A sketch of the trait variant follows below.)
  • The black_box for values returned out of iter, and the black_box function -- seems like mostly a non-problem, but we could also put this into core::mem or some such, since it is applicable to more than just benchmarking.

This RFC does not say whether we should stabilise the functionality from within the test crate or move it into core. I feel like moving it to core would be more appropriate, especially since most of the test crate is perma-unstable and has nothing useful other than Bencher or black_box.
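A minimal sketch of the trait-object variant mentioned in the second bullet above (the trait name, method set, and signatures are assumptions, not anything the RFC proposes):

/// Hypothetical: what `Bencher` might look like as an interface that custom
/// frameworks could implement, instead of a concrete type in `test::*`.
pub trait Bencher {
    /// Run the closure repeatedly using the framework's own timing strategy.
    fn iter(&mut self, f: &mut dyn FnMut());
}

/// A benchmark written against the trait, using dynamic dispatch.
fn bench_collect(b: &mut dyn Bencher) {
    b.iter(&mut || {
        let v: Vec<u32> = (0..1_000).collect();
        assert_eq!(v.len(), 1_000); // keep the result observable
    });
}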

@udoprog

udoprog commented Jan 11, 2018

I'm glad to see this happening.

Should probably mention the move from test to std::test, or does this not count as a change?

What @nagisa just wrote is also what I'd like to see clarified in the RFC: make sure that we don't lock ourselves into a corner w.r.t. external benchers, or at least have a plan for how to integrate them in the future.

@steveklabnik
Member

I personally would much, much, much rather see ways to do custom test frameworks than stabilize yet another minimal framework in the language forever.

@Manishearth
Member Author

Manishearth commented Jan 11, 2018

I personally would much, much, much rather see ways to do custom test frameworks than stabilize yet another minimal framework in the language forever.

We've been saying this for years, and there has been no progress on this front. I think this is a pretty important framework to have.

I would rather design it such that it doesn't prohibit us adding custom frameworks in the future.

It's taken us more than a year to stabilize pluggable allocators, and we're stabilizing that pretty much as-is. This is despite it being something people (especially those doing integration with C/C++ apps, like Firefox) really wanted. There is no such significant ask for custom test frameworks right now, and I have nearly zero confidence that such a proposal will move forward at a reasonable pace. Benchmarking is something folks actually use significantly; let's stabilize that first.

@Manishearth
Member Author

The Bencher argument -- a potential pain point.

I mostly envision custom test frameworks as being able to choose what attribute to use, and being able to have custom args. It is not necessary that #[bench]-using benchers all have the same, cross-compatible format, as long as what we currently stabilize is possible to rewrite as a custom framework in the future.

For example, for cargo-fuzz we ideally need the ability for the function to take any arg of type T: Arbitrary. Benchers will each have their own bencher arg, which may be a different bencher from this one.
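A hypothetical illustration of that cargo-fuzz shape (the trait, impl, attribute, and target function below are invented for illustration; they are not an existing API):

/// Hypothetical trait: values constructible from fuzzer-provided bytes.
trait Arbitrary: Sized {
    fn arbitrary(data: &[u8]) -> Option<Self>;
}

impl Arbitrary for String {
    fn arbitrary(data: &[u8]) -> Option<Self> {
        String::from_utf8(data.to_vec()).ok()
    }
}

// What a fuzz target might look like under a custom test framework: an
// attribute such as `#[fuzz]` would generate the glue that builds the
// argument via `Arbitrary` and calls the function.
fn check_float_parse(input: String) {
    let _ = input.parse::<f64>();
}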

Should we make Bencher a trait and either use fn function(b: impl Bencher) or fn function(b: &mut Bencher) (with dynamic dispatch)?

I think this stabilizes the interface through which custom test frameworks can be accessed, which might be too much. Other benchers may wish to provide more fine-grained control, which can't be accessed through a generic bencher.

The black_box for values returned out of iter, and the black_box function -- seems like mostly a non-problem, but we could also put this into core::mem or some such, since it is applicable to more than just benchmarking.

Good point. Adding to unresolved questions.

@steveklabnik
Member

We've been saying this for years, and there has been no progress on this front. I think this is a pretty important framework to have.

I agree, but we should do it right rather than stabilize what we have. I know that the intention of this RFC is specifically to push back against this kind of thinking, and I agree with it in many cases, but not here.

I agree this is very important, which is why we shouldn't just stabilize whatever was written by some random contributors years ago and then not really touched since then.

@Manishearth
Member Author

I agree, but we should do it right rather than stabilize what we have.

Sure. I disagree that doing this right implies going for custom test frameworks. IMO the problem with our "perfect is the enemy of the good" tendency isn't that we try to get things right; it's that we try to get things right in the most general way possible, and that ends up stopping things in their tracks, because people have neither the bandwidth nor the drive to push for huge projects which address one use case they care about and a million use cases they don't.

Custom test frameworks are a project that's comparable in size to the simple string-based proc macro stabilization, and that one actually had a lot of folks who wanted it. And it still took forever to get that ball rolling.

If we want to get it right, we can also tweak the API, tweak the functionality, whatever. Folks mostly seem to be happy with what exists so I'm trying to keep it the same, but if there are strong reasons for there being a better API or set of functionality, I'm game. So far, there is @alkis's suggestion, which I'm considering, but that doesn't change the existing APIs, just adds one more, which I'd rather table to another RFC. But if we end up changing other things anyway I'd probably want to add it to this RFC.


Basically, I'm not convinced that we must go for custom test frameworks to get this right. Even if we do, I'd want us to provide some kind of bencher out of the box, so I feel like stabilizing a decent, minimal bencher, and making sure it can be made to work with custom test framework support when that happens (as a test plugin), is the best approach.

@Manishearth
Member Author

Added a better API for bencher, moved black_box to std::mem, and added iter_n

@gnzlbg
Contributor

gnzlbg commented Jan 19, 2018

@Manishearth we discussed this a bit on IRC.

This is how test::black_box is currently defined:

/// A function that is opaque to the optimizer, to allow benchmarks to
/// pretend to use outputs to assist in avoiding dead-code
/// elimination.
///
/// This function is a no-op, and does not even read from `dummy`.
pub fn black_box<T>(dummy: T) -> T {
    // we need to "use" the argument in some way LLVM can't
    // introspect.
    unsafe { asm!("" : : "r"(&dummy)) }
    dummy
}

I think we should make its definition, independently of the implementation, a bit more precise. What exactly does it mean to be "opaque to the optimizer"?

For example, Google's Benchmark framework defines two functions, black_box (they call it DoNotOptimize, but I'll use black_box here to avoid confusion) and clobber; this is how they would be defined in Rust (I ported them from here; maybe I screwed up):

/// Read/write barrier; flushes pending writes to global memory.
///
/// Memory managed by block scope objects must be "escaped" using 
/// `black_box` before it can be clobbered.
#[inline(always)]
fn clobber_memory() { 
  unsafe { asm!("" : : : "memory" : "volatile") };
}

/// Prevents a value or the result of an expression from being optimized away by 
/// the compiler adding as little overhead as possible.
///
/// It does not prevent optimizations on the expression generating `x` in any way: 
/// the expression might be removed entirely when the result is already known. It 
/// forces, however, the result of `x` to be stored in either memory or a register.
#[inline(always)]
fn black_box<T>(x: &T) {
    unsafe { 
        asm!("" // template
          : // output operands
          : "r"(x) // input operands
          // r: any general purpose register
          : "memory"    // clobbers all memory
          : "volatile"  // has side-effects
        );
   }
}

This is how you exercise Vec::push with test::black_box vs. Google's black_box:

pub fn test_black_box() {
    let mut v = Vec::with_capacity(1);
    v.push(42);
    test::black_box(v);
}

pub fn google_black_box() {
    let mut v = Vec::with_capacity(1);
    unsafe { black_box(&*v.as_ptr()) }; // Data to be clobbered
    v.push(42);
    clobber_memory(); // Force 42 to be written to memory.
}

If someone wants to play with these two and our current test::black_box, they can start from this rust.godbolt.org link and check the generated assembly. Looking at the assembly generated in the test::black_box case, it does not look like 42 is written to memory (but I am no expert; could somebody check?). The versions from Google Benchmark look like they generate slightly fewer instructions (1-2 fewer). Their docs are here (in the "preventing optimization" section).

The main differences between test::black_box and Google's black_box are that Google's:

  • is volatile (has side effects),
  • clobbers all memory,
  • is intended to have the function call disappear (it is #[inline(always)]),
  • does not move the argument into the function.

I think that clobber is a generally useful addition to mem.

In any case, we should define exactly what test::black_box does, and, depending on the outcome of that, we might need to add clobber as well, plus a "How do we teach this" section explaining how to use them together to write meaningful benchmarks.

@Manishearth
Member Author

@gnzlbg this sounds good. Could you open a PR against my RFC that expands the guide-level section with more examples and adds implementation details to the reference-level explanation? Let's see what others think of it.

Note that black_box currently returns the value passed to it as well, which may be useful to keep so that you can place an "optimization barrier" between two operations. For example, it lets you ensure that 400 is not constant-folded in pow(100, black_box(400)). This could also be achieved by giving it an &mut reference to the value, but that leads to worse-looking code IMO.
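A small sketch of that pass-through usage with the nightly API under discussion:

#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

#[bench]
fn bench_powi(b: &mut Bencher) {
    b.iter(|| {
        // Because black_box returns its argument, the exponent cannot be
        // treated as a compile-time constant inside the measured code.
        black_box(100.0f64).powi(black_box(400))
    });
}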

@Manishearth
Member Author

Opened a supplementary PR with @gnzlbg's suggestions at Manishearth#1

I'll merge it in if folks seem to agree on it.

@clarfonthey
Contributor

Thinking again, perhaps iter and iter_n should be bench and bench_exact.

Additionally, I agree that startup/teardown should be excluded from the benchmark. Perhaps a trait could be used in place of a closure, like so:

trait BuildBench {
    type Bench: Bench;
    fn build_bench(&self) -> Self::Bench;
}
trait Bench {
    fn run(&mut self);
}

// Implemented for references to closures so that plain `Fn` closures can
// still be used, as with the existing API.
impl<'a, F, T> BuildBench for &'a F where F: Fn() -> T {
    type Bench = &'a F;
    fn build_bench(&self) -> Self::Bench {
        *self
    }
}
impl<'a, F, T> Bench for &'a F where F: Fn() -> T {
    fn run(&mut self) {
        std::mem::black_box(self());
    }
}

This would allow using closures like the existing implementation does, but also allow excluding setup and teardown from the measurement by calling build_bench and drop outside the benchmarked part.
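For instance, under this sketch a setup-heavy benchmark might look like the following (names invented for illustration); the untimed construction lives in build_bench, and only run is measured:

struct SortSetup;

struct SortBench {
    data: Vec<u32>,
}

impl BuildBench for SortSetup {
    type Bench = SortBench;
    fn build_bench(&self) -> Self::Bench {
        // Untimed setup: build the input outside the measured region.
        SortBench { data: (0..10_000u32).rev().collect() }
    }
}

impl Bench for SortBench {
    fn run(&mut self) {
        // Only this part would be timed by the framework.
        self.data.sort();
    }
}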

@petrochenkov petrochenkov added T-dev-tools Relevant to the development tools team, which will review and decide on the RFC. and removed T-dev-tools labels Jan 30, 2018
@hdevalence

How does this relate to #1484? There's some discussion there about reasons to provide an optimization barrier that don't involve benchmarking.

@nrc
Member

nrc commented Feb 1, 2018

We discussed this at the dev-tools meeting today. The consensus was that we prefer #2318 as an alternative - to be clear, that we would not stabilise bench in any form and would remove support from the compiler, instead providing a 'custom test framework' which does what bench does today. There are probably some improvements that could be made too (some are discussed in this thread), but they should probably be iterated on rather than put through an RFC now. The only piece that may need some RFC work now-ish is black_box. My intuition is that we'll need to nail down what #2318 looks like and what the bench framework looks like before we can do that. But it might be worth opening an issue or a discuss thread now to collect any thoughts.

@rfcbot fcp close

@rfcbot
Collaborator

rfcbot commented Feb 1, 2018

Team member @nrc has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Feb 1, 2018
@gnzlbg
Contributor

gnzlbg commented Feb 2, 2018

@nrc

The only piece that may need some RFC work now-ish is black_box. My intuition is that we'll need to nail down what #2318 looks like and what the bench framework looks like before we can do that. But it might be worth opening an issue or a discuss thread now to collect any thoughts.

We already have an issue open for that here: #1484

I've started writing an RFC for it (based on the pull request to this one). I will post it on internals "soon" and link it here and on that issue, so that those interested in evolving it can give feedback and we can iterate on it before submitting.

@fitzgen
Member

fitzgen commented Feb 2, 2018

@rfcbot reviewed

4 similar comments
@japaric
Member

japaric commented Feb 3, 2018

@rfcbot reviewed

@wycats
Contributor

wycats commented Feb 5, 2018

@rfcbot reviewed

@michaelwoerister
Member

@rfcbot reviewed

@killercup
Member

@rfcbot reviewed

@Manishearth
Member Author

Closing; we're going pretty strongly for custom test frameworks.

@gnzlbg care to open that RFC for test::black_box?

@Manishearth Manishearth closed this Mar 5, 2018
@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Mar 5, 2018
@gnzlbg
Contributor

gnzlbg commented Mar 6, 2018

@Manishearth OK, the discussion in the tracking issue is frozen, so an RFC for those is the next step.

@frewsxcv
Member

For anyone who (like myself) discovered this pull request and wants a link to the black_box RFC: #2360
