Benchmarking / cargo bench #2287
Conversation
Have you considered https://github.com/japaric/criterion.rs ?
I think the current API is pretty decent. We should probably have custom test framework support at some point (e.g. this would be great for cargo-fuzz), but given that this has been languishing with no progress for three years, I think it's better to stabilize the current API so that people can actually use it, and later try to build a custom test/benchmarking plugin system.
@Centril Yes. I think it does a bunch more things, and for now I want to get the most basic stuff in stable. More functions can be added to the bencher in the future, or, ideally, we'll have a custom test framework system that we can plug Criterion into.
Looks all good to me - I have a single nit =)
text/0000-benchmarking.md (Outdated)

> [reference-level-explanation]: #reference-level-explanation
>
> The bencher reports the median value and deviation (difference between min and max).
> Samples are winsorized, so extreme outliers get clamped.
Might be a good idea to insert a link here: https://en.wikipedia.org/wiki/Winsorizing
done, thanks!
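For readers unfamiliar with the term, a minimal sketch of winsorizing is below; the 20% cutoff and the function shape are illustrative assumptions, not something the RFC specifies:

```rust
/// Winsorize a sample in place: values below the p-th percentile are raised
/// to it, and values above the (100-p)-th percentile are lowered to it.
fn winsorize(samples: &mut [f64], pct: f64) {
    samples.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = samples.len();
    let k = ((pct / 100.0) * n as f64) as usize;
    let (lo, hi) = (samples[k], samples[n - 1 - k]);
    for s in samples.iter_mut() {
        *s = s.clamp(lo, hi);
    }
}

fn main() {
    // One extreme outlier; after winsorizing it no longer dominates.
    let mut times = [0.9, 1.0, 1.05, 1.1, 40.0];
    winsorize(&mut times, 20.0);
    assert_eq!(times[4], 1.1); // 40.0 was clamped down to the 80th percentile
}
```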
Force-pushed from 00d1a55 to fe29ce5.
This RFC is written as if this is a new feature, which is a good description to have. But a "diff" from the current state of Nightly would also be good: are you proposing any change, or only stabilization of what’s already implemented today?
@SimonSapin oh, I should have clarified. This proposes zero changes; however, it stabilizes only a part of the bencher API.
Thanks @Manishearth for starting this up. I think adding `iter_n` would be worthwhile too.
The intent here is to stabilize a minimal useful surface, kept minimal to maximize the probability of stabilization. I have no objection to stabilizing `iter_n`, but others might. Do folks have concerns? I'll add it otherwise.
I understand the need for minimal surface and I am totally onboard. |
I don't see a compelling argument that it is useful for _most_ benchmarking tasks.
Let me try for a compelling argument. My thesis is that more often than not, `iter_n` is what you actually want. Here's a few examples from random code:

**Benchmarking insertion in a deque implementation.** We want to benchmark deques of sizes that fit in different caches. When benchmarking insertion we need to insert enough elements to target some cache, so each call to `iter_n` would insert exactly that many elements.

**Benchmarking a parser.** We benchmark across different data sets, and naturally each benchmark is of a different size. There is a lot more context when reporting time per byte instead of time to parse document X, Y, or Z; it makes the times comparable between data sets. In addition it gives more insight: if, for example, the time per byte to parse document X is much larger than for document Y, that suggests there might be an optimization opportunity for documents that look like X.

**Benchmarking a boolean formula optimizer.** Given a boolean formula, the optimizer reduces the number of clauses in it by removing redundant ones. Naturally the number of clauses in the original formula affects how much time it takes to optimize it, so again we want to benchmark using `iter_n` with the clause count.
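For concreteness, here is a sketch of how the deque case might look. `iter_n` is hypothetical: this signature, taking a count and a closure whose measured time is divided by that count, is an assumption, as is the 8192-element working set.

```rust
#![feature(test)]
extern crate test;

use std::collections::VecDeque;
use test::Bencher;

#[bench]
fn push_back_l1(b: &mut Bencher) {
    // Working-set size intended to fit in L1 cache; the number is illustrative.
    const N: u64 = 8 * 1024;
    // Hypothetical API: the bencher would divide the measured time by N,
    // reporting time per inserted element rather than time per closure call.
    b.iter_n(N, || {
        let mut deque = VecDeque::with_capacity(N as usize);
        for i in 0..N {
            deque.push_back(i);
        }
        test::black_box(&deque);
    });
}
```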
I don't quite get why this affects the parser case, but the other two are pretty compelling. I'm not clear that these will be common enough, but I'll add it and see if others object.
Thanks @Manishearth! I would love to see this stabilized. I'd like to lend my voice to say that what's there is already useful, and stabilizing that alone would be a win. I know I have at least been productively using it for years now. I look forward to improving my workflow of course, but I think this is a step in the right direction.
Actually, wait, I'd rather use this RFC to stabilize the existing API with tweaks; things like `iter_n` can come as follow-ups. The reason being that if this RFC lands as is, we can immediately start the stabilization process; however, for `iter_n` we would first need an unstable implementation period.
However, if folks think that we can immediately stabilize an `iter_n` addition as well, I'm open to that.
@Manishearth I believe you're right. Any new API generally needs to go through a couple cycles as an unstable API first. With that said, I think it might be possible to add `iter_n` as unstable in the meantime. (But I am not an expert on matters of policy, and others should check that I have this right.)
text/0000-benchmarking.md (Outdated)

> - Should stuff be in `std::test` or a partially-stabilized `libtest`?
> - Should we stabilize any other `Bencher` methods (like `run_once`)?
> - Stable mchine-readable output for this would be nice, but can be done in a separate RFC.
typo: mchine
In hopes of not accidentally prohibiting custom benchmark/test engines, I’ll write down all the components of the current benchmark engine and briefly describe potential pain points when integrating with such a future system. The RFC does not say whether we should stabilise this functionality from within the `test` crate.
I'm glad to see this happening. Should probably mention the move from the `test` crate. What @nagisa just wrote is also what I'd like to see clarified in the RFC: make sure that we don't lock ourselves into a corner w.r.t. external benchers, or at least have a plan for how to integrate them in the future.
I personally would much, much, much rather see ways to do custom test frameworks than to stabilize yet another minimal framework in the language forever.
We've been saying this for years, and there has been no progress on this front. I think this is a pretty important framework to have. I would rather design it such that it doesn't prohibit us adding custom frameworks in the future. It's taken us more than a year to stabilize pluggable allocators, and we're stabilizing that pretty much as-is. This is despite it being something people (especially those doing integration with C/C++ apps, like Firefox) really wanted. There is no such significant ask for custom test frameworks right now, and I have nearly zero confidence that such a proposal will move forward at a reasonable pace. Benchmarking is something folks actually use significantly; let's stabilize that first.
Force-pushed from fe29ce5 to 0d0e89c.
I mostly envision custom test frameworks as being able to choose what attribute to use, and being able to have custom args. It is not necessary that they go through this `Bencher` API. For example, for cargo-fuzz you'd want a different attribute and fuzzing-specific arguments, as sketched below.
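Purely as illustration of that flexibility (the `#[fuzz]` attribute and the harness signature here are assumptions, not anything specified by this RFC or by cargo-fuzz):

```rust
// A fuzzing framework could pick its own attribute and entry-point shape
// instead of reusing #[bench] and Bencher. Hypothetical syntax:
#[fuzz]
fn utf8_roundtrip(data: &[u8]) {
    // The framework, not a Bencher, decides how `data` is generated.
    if let Ok(s) = std::str::from_utf8(data) {
        assert_eq!(s.as_bytes(), data);
    }
}
```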
I think this stabilizes the interface through which custom test frameworks can be accessed, which might be too much. Other benchers may wish to provide more fine-grained control, which can't be expressed through a generic bencher.
Good point. Adding to unresolved questions.
I agree, but we should do it right rather than stabilize what we have. I know that the intention of this RFC is specifically to push back against this kind of thinking, and I agree with it in many cases, but not here. I agree this is very important, which is why we shouldn't just stabilize whatever was written by some random contributors years ago and then not really touched since then.
Sure. I disagree that doing this right implies going for custom test frameworks.

IMO the problem with our "perfect is the enemy of the good" issue isn't that we try to get things right, it's that we try to get things right in the most general way possible, and that ends up stopping things in their tracks because people have neither the bandwidth nor the drive to push for huge projects which address one use case they care about and a million use cases they don't. Custom test frameworks are a project comparable in size to the simple string-based proc macro stabilization, and that one actually had a lot of folks who wanted it. And it still took forever to get that ball rolling.

If we want to get it right, we can also tweak the API, tweak the functionality, whatever. Folks mostly seem to be happy with what exists, so I'm trying to keep it the same, but if there are strong reasons for a better API or set of functionality, I'm game. So far, there is @alkis's suggestion, which I'm considering, but that doesn't change the existing APIs, just adds one more, which I'd rather table to another RFC. But if we end up changing other things anyway, I'd probably want to add it to this RFC.

Basically, I'm not convinced that we must go for custom test frameworks to get this right. Even if we do, I'd want us to be providing some kind of bencher out of the box, so I feel like stabilizing a decent, minimal bencher, and making sure it can be made to work with the custom test framework support when it happens (as a test plugin), is the best.
Added a better API for bencher, moved black_box to
Force-pushed from 0d0e89c to ea4543f.
@Manishearth we discussed this a bit on IRC. This is how `test::black_box` is currently implemented:

```rust
/// A function that is opaque to the optimizer, to allow benchmarks to
/// pretend to use outputs to assist in avoiding dead-code
/// elimination.
///
/// This function is a no-op, and does not even read from `dummy`.
pub fn black_box<T>(dummy: T) -> T {
    // we need to "use" the argument in some way LLVM can't
    // introspect.
    unsafe { asm!("" : : "r"(&dummy)) }
    dummy
}
```

I think we should make its definition, independently of the implementation, a bit more precise. What does it mean, exactly, to be "opaque to the optimizer"? For example, Google's Benchmark framework defines two functions:

```rust
/// Read/write barrier; flushes pending writes to global memory.
///
/// Memory managed by block scope objects must be "escaped" using
/// `black_box` before it can be clobbered.
#[inline(always)]
fn clobber_memory() {
    unsafe { asm!("" : : : "memory" : "volatile") };
}

/// Prevents a value or the result of an expression from being optimized away by
/// the compiler, adding as little overhead as possible.
///
/// It does not prevent optimizations on the expression generating `x` in any way:
/// the expression might be removed entirely when the result is already known. It
/// forces, however, the result of `x` to be stored in either memory or a register.
#[inline(always)]
fn black_box<T>(x: &T) {
    unsafe {
        asm!("" // template
             : // output operands
             : "r"(x) // input operands
               // r: any general purpose register
             : "memory" // clobbers all memory
             : "volatile" // has side-effects
        );
    }
}
```

This is how you exercise each one:

```rust
pub fn test_black_box() {
    let mut v = Vec::with_capacity(1);
    v.push(42);
    test::black_box(v);
}

pub fn google_black_box() {
    let mut v = Vec::with_capacity(1);
    unsafe { black_box(&*v.as_ptr()) }; // Data to be clobbered
    v.push(42);
    clobber_memory(); // Force 42 to be written to memory.
}
```

If someone wants to play with these two and our current `black_box`, the snippets above should be a good starting point. The main differences between the two: Google's `black_box` takes a reference instead of consuming the value, and it is paired with `clobber_memory`, an explicit write barrier that forces pending writes to actually reach memory; our `black_box` takes its argument by value and has no equivalent barrier. I think that, in any case, we should define exactly what `black_box` guarantees.
@gnzlbg this sounds good. Could you open a PR to my RFC that changes the guide-level section with more examples and adds impl details to the reference-level explanation? Let's see what others think of it.
Opened a supplementary PR with @gnzlbg's suggestions at Manishearth#1. I'll merge it in if folks seem to agree on it.
Force-pushed from ea4543f to b37b88f.
Thinking again: I agree that startup/teardown should be excluded from the benchmark. Perhaps a trait could be used in place of a closure, like so:
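A minimal sketch of what such a trait could look like; the `Benchmark` name and the `setup`/`run`/`teardown` split are illustrative assumptions:

```rust
/// Hypothetical trait standing in for the closure currently passed to `iter`.
trait Benchmark {
    /// Runs before the timed section; excluded from measurement.
    fn setup(&mut self) {}
    /// The code actually being measured.
    fn run(&mut self);
    /// Runs after the timed section; excluded from measurement.
    fn teardown(&mut self) {}
}

/// Plain closures keep working, mirroring the existing `iter(|| ...)` style.
impl<F: FnMut()> Benchmark for F {
    fn run(&mut self) {
        self()
    }
}
```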
This would allow using closures like the existing implementation, but allow excluding parts from the benchmark by calling `setup` and `teardown` outside the timed section.
How does this relate to #1484? There's some discussion there about reasons to provide an optimization barrier that don't involve benchmarking.
We discussed this at the dev-tools meeting today. The consensus was that we prefer #2318 as an alternative - to be clear, that we would not stabilise the current benchmarking machinery as-is.

@rfcbot fcp close
Team member @nrc has proposed to close this. The next step is review by the rest of the tagged teams.
No concerns currently listed. Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up! See this document for info about what commands tagged team members can give me.
We already have an issue open for that here: #1484. I've started writing an RFC for it (based on the pull request to this one). Will post it on internals "soon" and link it here and on that issue, so that those interested in evolving it can give feedback and we can iterate on it before submitting it.
@rfcbot reviewed
Closing, we're going pretty strongly for custom test frameworks. @gnzlbg care to open that RFC for test::black_box?
@Manishearth ok, the discussion in the tracking issue is frozen, so an RFC for those is the next step.
For anyone who (like myself) discovered this pull request and wants a link to the custom test frameworks eRFC: #2318
Old tracking issue: rust-lang/rust#29553
Previously: https://internals.rust-lang.org/t/pre-rfc-stabilize-bench-bencher-and-black-box/4565
Previously: https://internals.rust-lang.org/t/bench-status/2122/11
Rendered