
Benchmarking / cargo bench #2287

Closed

Conversation

Manishearth
Member

@Manishearth Manishearth commented Jan 11, 2018

@Centril Centril added the T-dev-tools Relevant to the development tools team, which will review and decide on the RFC. label Jan 11, 2018
@Manishearth Manishearth added the T-cargo Relevant to the Cargo team, which will review and decide on the RFC. label Jan 11, 2018
@Centril
Contributor

Centril commented Jan 11, 2018

Have you considered https://github.com/japaric/criterion.rs ?

@Manishearth
Member Author

I think the current API is pretty decent. We should probably have custom test framework support at some point (e.g. this would be great for cargo-fuzz), but given that this has been languishing with no progress for three years, I think it's better to stabilize the current API so that people can actually use it, and later try to build a custom test/benchmarking plugin system.

@Manishearth
Member Author

@Centril Yes. I think it does a bunch more things and for now I want to get the most basic stuff in stable. More functions can be added to the bencher in the future, or, ideally, we'll have a custom test framework system that we can plug Criterion into.
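For reference, the existing nightly interface this RFC proposes to stabilize looks roughly like this (a sketch of the status quo; the exact surface to stabilize is whatever the RFC text specifies):

#![feature(test)] // nightly-only at the time of this discussion
extern crate test;

use test::Bencher;

#[bench]
fn bench_sum(b: &mut Bencher) {
    // The closure passed to `iter` is run repeatedly; returning the result
    // lets the harness black-box it so the work is not optimized away.
    b.iter(|| (0..1000u64).sum::<u64>());
}

Such benchmarks are run with cargo bench.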

Contributor

@Centril Centril left a comment


Looks all good to me - I have a single nit =)

[reference-level-explanation]: #reference-level-explanation

The bencher reports the median value and deviation (difference between min and max).
Samples are winsorized, so extreme outliers get clamped.
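(Winsorizing means clamping extreme samples to percentile bounds rather than discarding them. A minimal sketch of the idea, not the libtest implementation, with an assumed percentile cutoff parameter:)

/// Clamp every sample into the [pct, 100 - pct] percentile range.
fn winsorize(samples: &mut [f64], pct: f64) {
    if samples.is_empty() {
        return;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let idx = |p: f64| ((p / 100.0) * (sorted.len() - 1) as f64).round() as usize;
    let (lo, hi) = (sorted[idx(pct)], sorted[idx(100.0 - pct)]);
    for s in samples.iter_mut() {
        *s = s.clamp(lo, hi);
    }
}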
Contributor

Might be a good idea to insert a link here: https://en.wikipedia.org/wiki/Winsorizing

Member Author

done, thanks!

@SimonSapin
Contributor

This RFC is written as if this is a new feature, which is a good description to have. But a "diff" from the current state of Nightly would also be good: are you proposing any change, or only stabilization of what’s already implemented today?

@Manishearth
Member Author

@SimonSapin oh, should have clarified. This proposes zero changes; however, it stabilizes only a part of the bencher API.

@alkis

alkis commented Jan 11, 2018

Thanks @Manishearth for starting this up. I think adding iter_n() to the bencher will get the current API to a good enough state for most tasks, and everything else can be done with enough gymnastics on top: type/value parametrization, etc. See the rationale for iter_n() here.

@Manishearth
Member Author

The intent here is to stabilize a minimal useful surface, kept minimal to maximize the probability of stabilization. I have no objection to stabilizing iter_n, but others might.

Do folks have concerns? If not, I'll add it.

@alkis

alkis commented Jan 11, 2018

I understand the need for a minimal surface and I am totally on board. iter_n cannot be implemented on top of the current API, though, so in that sense adding it is in line with a "minimal API surface" and not adding it leaves an "incomplete API surface" :-)

@Manishearth
Member Author

Manishearth commented Jan 11, 2018 via email

@alkis

alkis commented Jan 11, 2018

Let me try for a compelling argument.

My thesis is that, more often than not, iter_n() is what you want to use when benchmarking. If it counts for anything, we have both APIs at Google, and most benchmarks are written with iter_n() rather than iter() (in C++).

Here are a few examples from random code:

benchmarking insertion in a deque implementation

We want to benchmark deques of sizes that fit in different caches. When benchmarking insertion, we need to insert enough elements to target a given cache. Each call to iter() will have to insert N elements, where N depends on the cache we are targeting. This means the time per iter() is not comparable across invocations. If we use iter_n(), we report time per insertion, which makes times comparable across caches (and gives more insight into how our deque works).

benchmarking a parser

We benchmark across different data sets. Naturally, each data set is a different size. There is a lot more context in reporting time per byte instead of time to parse document X, Y, or Z. This makes the times comparable between data sets. In addition, it gives more insight: if, for example, the time per byte to parse document X is much larger than for document Y, that suggests there might be an optimization opportunity for documents that look like X.

benchmarking a boolean formula optimizer

Given a boolean formula, the optimizer reduces the number of clauses in it by removing redundant ones. Naturally, the number of clauses in the original formula affects how much time it takes to optimize. So again we want to benchmark using iter_n(), where N is the number of clauses in the expression.
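As an illustration, a hypothetical sketch of the deque case (iter_n does not exist in the current API; its name, signature, and per-unit reporting semantics here are assumptions based on this thread):

#![feature(test)]
extern crate test;

use std::collections::VecDeque;
use test::Bencher;

// Assumed working-set size targeting a particular cache level.
const N: u64 = 64 * 1024;

#[bench]
fn bench_deque_push_back(b: &mut Bencher) {
    // Hypothetical `iter_n`: the measured time is divided by `n`, so the
    // report is time per insertion rather than time per closure call.
    b.iter_n(N, || {
        let mut d = VecDeque::with_capacity(N as usize);
        for i in 0..N {
            d.push_back(i);
        }
        d
    });
}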

@Manishearth
Member Author

I don't quite get why this affects the parser case, but the other two are pretty compelling. I'm not sure these will be common enough, but I'll add it and see if others object.

@BurntSushi
Member

Thanks @Manishearth! I would love to see this stabilized. I'd like to lend my voice to say that what's there is already useful, and stabilizing that alone would be a win. I know I have at least been productively using it for years now. I look forward to improving my workflow of course, but I think this is a step in the right direction.

@Manishearth
Member Author

Actually, wait, iter_n seems to be an API that doesn't currently exist.

I'd rather use this RFC to stabilize the existing API with tweaks; things like iter_n can be added in further RFCs.

The reason being that if this RFC lands as-is, we can immediately start the stabilization process; however, for iter_n we would have to go through a couple of nightly cycles first.

@Manishearth
Member Author

However, if folks think we can immediately stabilize iter_n (I'm not sure what the procedure is; @BurntSushi would know), I'm open to adding it to the RFC.

@BurntSushi
Member

BurntSushi commented Jan 11, 2018

@Manishearth I believe you're right. Any new API generally needs to go through a couple of cycles as an unstable API first. With that said, I think it might be possible to add iter_n as an unstable API and then stabilize it later via normal FCP if it's simple enough, without necessarily needing to get it into this RFC.

(But I am not an expert on matters of policy, and others should check that I have this right.)


- Should stuff be in `std::test` or a partially-stabilized `libtest`?
- Should we stabilize any other `Bencher` methods (like `run_once`)?
- Stable mchine-readable output for this would be nice, but can be done in a separate RFC.

typo: mchine

@nagisa
Member

nagisa commented Jan 11, 2018

In hopes of not accidentally prohibiting custom benchmark/test engines, I'll write down all the components of the current benchmark engine and briefly describe potential pain points when integrating with such a future system.

  • #[bench] attribute -- would be collected and dispatched to the custom engine as per compiler flags, just like #[test]. No problems here.
  • The Bencher argument -- a potential pain point. The type comes from test::* and so cannot be swapped out easily, which custom crates will 100% want to do. Should we make Bencher a trait and either use fn function(b: impl Bencher) or fn function(b: &mut Bencher) (with dynamic dispatch)? (A sketch of the trait variant follows below.)
  • The black_box for values returned out of iter, and the black_box function -- seems like mostly a non-problem, but we could also put this into core::mem or some such, since it is applicable to more than just benchmarking.

This RFC does not say whether we should stabilise the functionality from within the test crate or move it into core. I feel like moving it to core would be more appropriate, especially since most of the test crate is perma-unstable and has nothing useful other than Bencher or black_box.
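A minimal sketch of the trait-object variant mentioned in the second bullet above (the trait name, method set, and signatures are assumptions, not anything the RFC proposes):

/// Hypothetical: what `Bencher` might look like as an interface that custom
/// frameworks could implement, instead of a concrete type in `test::*`.
pub trait Bencher {
    /// Run the closure repeatedly using the framework's own timing strategy.
    fn iter(&mut self, f: &mut dyn FnMut());
}

/// A benchmark written against the trait, using dynamic dispatch.
fn bench_collect(b: &mut dyn Bencher) {
    b.iter(&mut || {
        let v: Vec<u32> = (0..1_000).collect();
        assert_eq!(v.len(), 1_000); // keep the result observable
    });
}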

@udoprog

udoprog commented Jan 11, 2018

I'm glad to see this happening.

Should probably mention the move from test to std::test, or does this not count as a change?

What @nagisa just wrote is also what I'd like to see clarified in the RFC: make sure that we don't lock ourselves into a corner w.r.t. external benchers, or at least have a plan for how to integrate them in the future.

@steveklabnik
Member

I personally would much, much, much rather see ways to do custom test frameworks than stabilize yet another minimal framework in the language forever.

@Manishearth
Member Author

Manishearth commented Jan 11, 2018

I personally would much, much, much rather see ways to do custom test frameworks than stabilize yet another minimal framework in the language forever.

We've been saying this for years, and there has been no progress on this front. I think this is a pretty important framework to have.

I would rather design it such that it doesn't prohibit us adding custom frameworks in the future.

It's taken us more than a year to stabilize pluggable allocators, and we're stabilizing that pretty much as-is. This is despite it being something people (especially those doing integration with C/C++ apps, like Firefox) really wanted. There is no such significant ask for custom test frameworks right now, and I have nearly zero confidence that such a proposal will move forward at a reasonable pace. Benchmarking is something folks actually use significantly; let's stabilize that first.

@Manishearth
Member Author

The Bencher argument -- a potential pain point.

I mostly envision custom test frameworks as being able to choose what attribute to use, and being able to have custom args. It is not necessary that #[bench]-using benchers all have the same, cross-compatible format, as long as what we currently stabilize is possible to rewrite as a custom framework in the future.

For example, for cargo-fuzz we ideally need the ability for the function to take any arg of type T: Arbitrary. Benchers will each have their own bencher arg, which may be a different bencher from this one.
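A hypothetical illustration of that cargo-fuzz shape (the trait, impl, attribute, and target function below are invented for illustration; they are not an existing API):

/// Hypothetical trait: values constructible from fuzzer-provided bytes.
trait Arbitrary: Sized {
    fn arbitrary(data: &[u8]) -> Option<Self>;
}

impl Arbitrary for String {
    fn arbitrary(data: &[u8]) -> Option<Self> {
        String::from_utf8(data.to_vec()).ok()
    }
}

// What a fuzz target might look like under a custom test framework: an
// attribute such as `#[fuzz]` would generate the glue that builds the
// argument via `Arbitrary` and calls the function.
fn check_float_parse(input: String) {
    let _ = input.parse::<f64>();
}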

Should we make Bencher a trait and either use fn function(b: impl Bencher) or fn function(b: &mut Bencher) (with dynamic dispatch)?

I think this stabilizes the interface through which custom test frameworks can be accessed, which might be too much. Other benchers may wish to provide more fine-grained control, which can't be accessed through a generic bencher.

The black_box for values returned out of iter, and the black_box function -- seems like mostly a non-problem, but we could also put this into core::mem or some such, since it is applicable to more than just benchmarking.

Good point. Adding to unresolved questions.

@steveklabnik
Member

We've been saying this for years, and there has been no progress on this front. I think this is a pretty important framework to have.

I agree, but we should do it right rather than stabilize what we have. I know that the intention of this RFC is specifically to push back against this kind of thinking, and I agree with it in many cases, but not here.

I agree this is very important, which is why we shouldn't just stabilize whatever was written by some random contributors years ago and then not really touched since then.

@Manishearth
Member Author

I agree, but we should do it right rather than stabilize what we have.

Sure. I disagree that doing this right implies going for custom test frameworks. IMO the problem with our "perfect is the enemy of the good" tendency isn't that we try to get things right; it's that we try to get things right in the most general way possible, and that ends up stopping things in their tracks, because people have neither the bandwidth nor the drive to push for huge projects which address one use case they care about and a million use cases they don't.

Custom test frameworks are a project that's comparable in size to the simple string-based proc macro stabilization, and that one actually had a lot of folks who wanted it. And it still took forever to get that ball rolling.

If we want to get it right, we can also tweak the API, tweak the functionality, whatever. Folks mostly seem to be happy with what exists so I'm trying to keep it the same, but if there are strong reasons for there being a better API or set of functionality, I'm game. So far, there is @alkis's suggestion, which I'm considering, but that doesn't change the existing APIs, just adds one more, which I'd rather table to another RFC. But if we end up changing other things anyway I'd probably want to add it to this RFC.


Basically, I'm not convinced that we must go for custom test frameworks to get this right. Even if we do, I'd want us to provide some kind of bencher out of the box, so I feel like stabilizing a decent, minimal bencher, and making sure it can be made to work with custom test framework support when that happens (as a test plugin), is the best approach.

@Manishearth
Member Author

Added a better API for bencher, moved black_box to std::mem, and added iter_n

@gnzlbg
Contributor

gnzlbg commented Jan 19, 2018

@Manishearth we discussed this a bit on IRC.

This is how test::black_box is currently defined:

/// A function that is opaque to the optimizer, to allow benchmarks to
/// pretend to use outputs to assist in avoiding dead-code
/// elimination.
///
/// This function is a no-op, and does not even read from `dummy`.
pub fn black_box<T>(dummy: T) -> T {
    // we need to "use" the argument in some way LLVM can't
    // introspect.
    unsafe { asm!("" : : "r"(&dummy)) }
    dummy
}

I think we should make its definition, independently of the implementation, a bit more precise. What exactly does it mean to be "opaque to the optimizer"?

For example, Google's Benchmark framework defines two functions, black_box (they call it DoNotOptimize, but I'll use black_box here to avoid confusion) and clobber; this is how they would be defined in Rust (I ported them from here; maybe I screwed up):

/// Read/write barrier; flushes pending writes to global memory.
///
/// Memory managed by block scope objects must be "escaped" using 
/// `black_box` before it can be clobbered.
#[inline(always)]
fn clobber_memory() { 
  unsafe { asm!("" : : : "memory" : "volatile") };
}

/// Prevents a value or the result of an expression from being optimized away by 
/// the compiler adding as little overhead as possible.
///
/// It does not prevent optimizations on the expression generating `x` in any way: 
/// the expression might be removed entirely when the result is already known. It 
/// forces, however, the result of `x` to be stored in either memory or a register.
#[inline(always)]
fn black_box<T>(x: &T) {
    unsafe { 
        asm!("" // template
          : // output operands
          : "r"(x) // input operands
          // r: any general purpose register
          : "memory"    // clobbers all memory
          : "volatile"  // has side-effects
        );
   }
}

This is how you exercise Vec::push with test::black_box vs. Google's black_box:

pub fn test_black_box() {
    let mut v = Vec::with_capacity(1);
    v.push(42);
    test::black_box(v);
}

pub fn google_black_box() {
    let mut v = Vec::with_capacity(1);
    unsafe { black_box(&*v.as_ptr()) }; // Data to be clobbered
    v.push(42);
    clobber_memory(); // Force 42 to be written to memory.
}

If someone wants to play with these two and our current test::black_box, they can start from this rust.godbolt.org link and check the generated assembly. Looking at the assembly generated in the test::black_box case, it does not look like 42 is written to memory (but I am no expert; could somebody check?). The versions from Google Benchmark look like they generate slightly fewer instructions (1-2 fewer). Their docs are here (in the "preventing optimization" section).

The main differences between test::black_box and Google's black_box are that Google's:

  • is volatile (has side effects),
  • clobbers all memory,
  • is intended to have the function call disappear (it is #[inline(always)]),
  • does not move the argument into the function.

I think that clobber is a generally useful addition to mem.

In any case, we should define exactly what test::black_box does, and, depending on the outcome of that, we might need to add clobber as well, plus a "How do we teach this" section explaining how to use them together to write meaningful benchmarks.

@Manishearth
Member Author

@gnzlbg this sounds good. Could you open a PR against my RFC that expands the guide-level section with more examples and adds implementation details to the reference-level explanation? Let's see what others think of it.

Note that black_box currently returns the value passed to it as well, which may be useful to keep so that you can place an "optimization barrier" between two operations. For example, it lets you ensure that 400 is not constant-folded in pow(100, black_box(400)). This could also be achieved by giving it an &mut reference to the value, but that leads to worse-looking code IMO.
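A small sketch of that pass-through usage with the nightly API under discussion:

#![feature(test)]
extern crate test;

use test::{black_box, Bencher};

#[bench]
fn bench_powi(b: &mut Bencher) {
    b.iter(|| {
        // Because black_box returns its argument, the exponent cannot be
        // treated as a compile-time constant inside the measured code.
        black_box(100.0f64).powi(black_box(400))
    });
}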

@Manishearth
Member Author

Opened a supplementary PR with @gnzlbg's suggestions at Manishearth#1

I'll merge it in if folks seem to agree on it.

@clarfonthey
Contributor

Thinking again, perhaps iter and iter_n should be bench and bench_exact.

Additionally, I agree that startup/teardown should be excluded from the benchmark. Perhaps a trait could be used in place of a closure, like so:

trait BuildBench {
    type Bench: Bench;
    fn build_bench(&self) -> Self::Bench;
}
trait Bench {
    fn run(&mut self);
}

// Implemented for references to closures so that plain `Fn` closures can
// still be used, as with the existing API.
impl<'a, F, T> BuildBench for &'a F where F: Fn() -> T {
    type Bench = &'a F;
    fn build_bench(&self) -> Self::Bench {
        *self
    }
}
impl<'a, F, T> Bench for &'a F where F: Fn() -> T {
    fn run(&mut self) {
        std::mem::black_box(self());
    }
}

This would allow using closures like the existing implementation does, but also allow excluding setup and teardown from the measurement by calling build_bench and drop outside the benchmarked part.
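For instance, under this sketch a setup-heavy benchmark might look like the following (names invented for illustration); the untimed construction lives in build_bench, and only run is measured:

struct SortSetup;

struct SortBench {
    data: Vec<u32>,
}

impl BuildBench for SortSetup {
    type Bench = SortBench;
    fn build_bench(&self) -> Self::Bench {
        // Untimed setup: build the input outside the measured region.
        SortBench { data: (0..10_000u32).rev().collect() }
    }
}

impl Bench for SortBench {
    fn run(&mut self) {
        // Only this part would be timed by the framework.
        self.data.sort();
    }
}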

@petrochenkov petrochenkov added T-dev-tools Relevant to the development tools team, which will review and decide on the RFC. and removed T-dev-tools labels Jan 30, 2018
@hdevalence

How does this relate to #1484? There's some discussion there about reasons to provide an optimization barrier that don't involve benchmarking.

@nrc
Member

nrc commented Feb 1, 2018

We discussed this at the dev-tools meeting today. The consensus was that we prefer #2318 as an alternative - to be clear, that we would not stabilise bench in any form and would remove support from the compiler, instead providing a 'custom test framework' which does what bench does today. There are probably some improvements that could be made too (some are discussed in this thread), but they should probably be iterated on rather than put through an RFC now. The only piece that may need some RFC work now-ish is black_box. My intuition is that we'll need to nail down what #2318 looks like and what the bench framework looks like before we can do that. But it might be worth opening an issue or a discuss thread now to collect any thoughts.

@rfcbot fcp close

@rfcbot
Collaborator

rfcbot commented Feb 1, 2018

Team member @nrc has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Feb 1, 2018
@gnzlbg
Contributor

gnzlbg commented Feb 2, 2018

@nrc

The only piece that may need some RFC work now-ish is black_box. My intuition is that we'll need to nail down what #2318 looks like and what the bench framework looks like before we can do that. But it might be worth opening an issue or a discuss thread now to collect any thoughts.

We already have an issue open for that here: #1484

I've started writing an RFC for it (based on the pull request to this one). I will post it on internals "soon" and link it here and on that issue, so that those interested in evolving it can give feedback and we can iterate on it before submitting.

@fitzgen
Member

fitzgen commented Feb 2, 2018

@rfcbot reviewed

4 similar comments
@japaric
Member

japaric commented Feb 3, 2018

@rfcbot reviewed

@wycats
Contributor

wycats commented Feb 5, 2018

@rfcbot reviewed

@michaelwoerister
Member

@rfcbot reviewed

@killercup
Member

@rfcbot reviewed

@Manishearth
Member Author

Closing; we're going pretty strongly for custom test frameworks.

@gnzlbg care to open that RFC for test::black_box?

@Manishearth Manishearth closed this Mar 5, 2018
@rfcbot rfcbot removed the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Mar 5, 2018
@gnzlbg
Contributor

gnzlbg commented Mar 6, 2018

@Manishearth OK, the discussion in the tracking issue is frozen, so an RFC for those is the next step.

@frewsxcv
Member

For anyone who (like myself) discovered this pull request and wants a link to the black_box RFC: #2360
