Add dontOptimizeAway to std.datetime.stopwatch #5416
std/datetime/stopwatch.d
Outdated
    long _ticksElapsed; // Total time that the StopWatch ran before it was stopped last.
}

private void doNotOptimizeAway(T)(auto ref T t) @trusted @nogc nothrow
I think we need to inline this in benchmark because otherwise we'll include getpid in the measurements.
Hmm, so do you want to stop & start the clock all the time?
std/datetime/stopwatch.d
Outdated
  sw.reset();
  foreach (_; 0 .. n)
-     fun[i]();
+     doNotOptimizeAway(fun[i]());
Please add a test where fun returns void; that tripped me up some time ago.
dffd36c to 99f1494
std/datetime/stopwatch.d
Outdated
{
    import core.thread : getpid;
    import core.stdc.stdio;
    if(getpid() == 1)
So this calls getpid on every loop iteration? This seems a bit silly. What about e.g. benchmarking small kernels, where the cost of a getpid call is very much non-negligible (even with user-space caching)? There might also be some cache effects due to the extra function invocation (although I suppose that if this greatly affects your measurements, they are not very meaningful anyway).
space after if :)
This will be nonportable platform-dependent code, of which getpid is the most conservative.
@wilzbach for dmd I think you may simply insert asm {} and dmd will get very conservative.
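For readers unfamiliar with the trick under review, here is a minimal sketch of it. It is written in C++ rather than D so it can be compared with the C++ benchmark libraries linked later in this thread. It assumes a POSIX system, and the body of the branch (printing one byte of the value) is an assumption, since the diff above only shows the `getpid() == 1` test. The names `do_not_optimize_away`, `sum_to` and `run_benchmark_body` are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <unistd.h>  // getpid (POSIX only)

// Sketch of the getpid trick. The compiler cannot know the process ID at
// compile time, so it must assume the branch may run and therefore cannot
// delete the computation of `value`. An ordinary program does not run as
// PID 1, so in practice nothing is printed.
template <class T>
void do_not_optimize_away(const T& value) {
    if (getpid() == 1) {
        // Assumed body: read one byte of the value through an opaque call.
        const void* p = &value;
        std::putchar(*static_cast<const char*>(p));
    }
}

// Toy workload whose result would otherwise be unused.
long sum_to(long n) {
    long total = 0;
    for (long i = 1; i <= n; ++i) total += i;
    return total;
}

// Runs the workload `iterations` times, keeping every result observable.
// Note the cost raised in the review: one getpid call per iteration.
long run_benchmark_body(int iterations) {
    long last = 0;
    for (int i = 0; i < iterations; ++i) {
        last = sum_to(1000);
        do_not_optimize_away(last);
    }
    return last;
}
```

The sketch makes the objection concrete: the extra call sits inside the timed loop, so for very small workloads it can dominate the measurement.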
I have the following idea for the unoptimizable test:
public __gshared void* _13f43be984760828c93138284d104611;
private void doNotOptimizeAway(T)(auto ref T t) @trusted @nogc nothrow
{
    if (&t == _13f43be984760828c93138284d104611) { ... }
}
Although undocumented and obfuscated, that global is technically public so the compiler must conservatively assume it may point anywhere within the address space, including the address of the parameter.
Works? cc @WalterBright @ibuclaw @klickverbot
Correx: the global should have type const void*
@andralex: I don't think that would work. Let's say the parameter is allocated on the stack. After inlining doNotOptimizeAway, the compiler might be able to prove that the address of the stack slot doesn't escape, so the global can't be equal to it.
@klickverbot Yah, but the stack is a free-for-all address space and I wonder whether the compiler acknowledges that. Meaning, do compilers assume a global may point to stack-allocated data (from a previous function) even though the current stack data does not escape?
@andralex: I think the compiler would be free to make the assumption that this doesn't occur in the C++ object model. As for what actually happens, I just checked with LDC and LLVM 3.9 doesn't seem to assume that the pointers can't alias on its own (without extra metadata from the frontend encoding the fact that it can't).
However, there is another, much bigger problem: Just calling your doNotOptimizeAway on the result of the computation won't guarantee that the actual computation isn't hoisted out of the benchmark loop. In fact, this is something LDC actually does on a few examples I tried.
This is why Chandler suggests clobber in addition to escape – doNotOptimizeAway on its own doesn't prevent code motion.
(I'm not sure how to go about implementing the latter in pure D, although I suppose if we had something like C++'s std::atomic_signal_fence, we might be able to reuse that. Scratch the part about atomic_signal_fence: whether this is guaranteed to work would again depend on the precise aliasing semantics.)
@klickverbot ah, clobber. We should add that too! Thanks!
|
Some inspiration for a minimal overhead solution: https://github.com/facebook/folly/blob/master/folly/Benchmark.h https://github.com/google/benchmark/blob/master/include/benchmark/benchmark_api.h Unfortunately, a solution like this would be compiler-dependent in D as well. |
|
To add to @klickverbot's suggestions, I highly recommend watching this talk on benchmarking. The key point is that in addition to preventing the compiler from deleting the function call altogether, we need to make sure we're not interfering with the measurements (by adding unrelated code) and we're not tampering with the compiler's optimization abilities. One of the techniques shown in the presentation is the following:
void escape(void* p)
{
    asm volatile("" : : "g"(p) : "memory");
}
void clobber()
{
    asm volatile("" : : : "memory");
}
And as @klickverbot says, we need an analogous solution in D that works with DMD, GDC and LDC. |
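To show how the two barriers are meant to be combined, here is a minimal C++ sketch of a timed loop that uses them. It assumes GCC or Clang (the extended asm syntax is not portable), and the helper `time_fill` and its workload are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// GCC/Clang extended asm, as quoted in the comment above.
static void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }
static void clobber() { asm volatile("" : : : "memory"); }

// Hypothetical helper: overwrites a vector `iterations` times and returns
// the elapsed time in nanoseconds.
long long time_fill(std::vector<int>& v, int iterations) {
    using clock = std::chrono::steady_clock;
    escape(v.data());  // the buffer's address may now be known to "other code"
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        for (auto& x : v) x = i;  // the stores we want to measure
        clobber();                // memory may be read here, so the stores must happen
    }
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Both barriers compile to no instructions, which is the appeal over a per-iteration function call: they constrain the optimizer without adding work to the measured loop.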
|
Why don't we just ask the compiler devs to add a new pragma to the language to solve this? |
|
|
tbh. you don't really need it for dmd :) |
|
On a more serious note: We'd need to control somewhat which optimizations we disable, |
I'd say that's the wrong question to ask. As demonstrated by the C++ benchmark library snippets (I can really recommend Chandler Carruth's talk for non-compiler writers interested in what's going on, by the way), it is perfectly possible to do what we need on both GCC and LLVM. The challenge is rather to appropriately nail down the semantics for such barrier/fake use instructions such that their implications are understandable, while being minimal and portable. |
|
We should add a module called At any rate, portable interfaces for non-portable things are what the stdlib is about, so I'm in favor of adding this. |
|
The barrier itself could be an intrinsic of druntime. Then |
Just aiming for the simplest thing - i.e. make it part of druntime. |
|
It seems like the only things blocking this is the I say we move forward with this. |
|
Maybe this is a stupid question, but is there a reason why it wouldn't work to just mark a function that you don't want optimized away with |
|
Can |
I have no idea. I never use lambdas for benchmarking, because I always end up benchmarking stuff that's too complicated for that to make sense. Function attributes can be used on lambdas though, so it would make sense if a pragma like that can be used on them, and if it can't, then I'd say that it makes good sense to create an enhancement request for it. But if I'm not mistaken, and |
|
Well, even if it did, the argument is that the standard library should do this for the user automatically, i.e. Phobos should make it hard for the user to get the wrong result in their benchmark. |
99f1494 to b9ae58c
2410a32 to 94337d0
… away its functions
94337d0 to 25c3ef4
|
BTW there's also a very good explanation of https://godbolt.org/g/WRFWNv (as you can see
Yeah, in theory future implementation could optimize
|
@wilzbach - Luckily we have attributes for asm statements. In gdc, all asm blocks prevent optimizations unless you mark it as |
|
Can we make this forward to a new set of druntime intrinsics or something along the lines instead? The current PR is (obviously) x86-only for LDC, and GDC just doesn't support it. We'd like to avoid having to patch Phobos downstream as much as possible. |
|
Btw, on a quick look it seems all these solutions are meant to prevent dead code elimination. However, is there any solution to prevent 'result caching' for
foreach (_; 0 .. n)
{
    escape(fun[i]());
    clobber();
}
Couldn't this still be rewritten to this:
auto tmp = fun[i]();
foreach (_; 0 .. n)
{
    escape(tmp);
    clobber();
}
? |
|
That's not true for gdc. If a function is strong pure (i.e. pure in the GCC backend sense), the GCC backend heavily optimizes calls to such functions including caching, merging calls, etc. (Of course the compiler must be able to 'see' two identical calls, but using inlining and maybe even lto compilers are quite good at that nowadays) |
|
Thinking about this some more, clobbering (all!) input arguments before calling the function should work for pure functions and even inlining problems. As the compiler no longer knows the exact values of arguments or whether these changed since the last iteration, it can not assume anything about the return value. So it has to always execute the complete function. (Or when inlining, the part of the function actually depending on the input value) @ibuclaw does the memory clobber ( However, |
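The idea of clobbering the inputs before each call can be sketched in C++ as follows. This is an illustration under assumptions, not a statement about what any D compiler guarantees: it relies on GCC/Clang extended asm, and `square_plus_one` and `run_loop` are hypothetical names.

```cpp
#include <cassert>

static void escape(void* p) { asm volatile("" : : "g"(p) : "memory"); }
static void clobber() { asm volatile("" : : : "memory"); }

// A pure function: without barriers, the call could be evaluated once and
// hoisted out of the loop.
static long square_plus_one(long x) { return x * x + 1; }

long run_loop(long input, int iterations) {
    long result = 0;
    escape(&input);   // input's address escapes, so a later clobber may change it
    escape(&result);  // result's address escapes, so its stores are observable
    for (int i = 0; i < iterations; ++i) {
        clobber();                        // compiler must reload `input` here
        result = square_plus_one(input);  // so the call cannot be hoisted
        clobber();                        // and `result` must be stored by here
    }
    return result;
}
```

Because the compiler can no longer prove that `input` is unchanged between iterations, it has to re-evaluate the function each time, which matches the reasoning in the comment above.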
|
Why was this merged? |
|
Because it was approved two weeks ago. |
|
...and because I didn't read carefully enough. @ibuclaw @klickverbot My apologies.
With this soft opening, I mistook it as idle musing rather than a verdict on the viability of the PR. Can I get away with blaming this on cross-Atlantic differences in communication style? |
I only see one approval. Two weeks and one year ago. :-) |
As mentioned on #5367, it would be great if the benchmarking function wouldn't allow compilers to optimize its to-be-tested functions away.
There are a couple of tricks to avoid this behavior, at least:
- __gshared
- doNotOptimizeAway trick (see below or Parameterized unittests and benchmarks #2995)
- @optStrategy("none") in LDC

I guess this might be a controversial change, so I am very interested to hear your opinions. Which strategy for avoiding the optimization do you prefer?