Add dontOptimizeAway to std.datetime.stopwatch #5416

wilzbach · 2017-05-22T17:38:28Z

As mentioned on #5367, it would be great if the benchmarking function wouldn't allow compilers to optimize its to-be-tested functions away.

There are a couple of tricks to avoid this behavior, at least:

assigning return value to __gshared
using the doNotOptimizeAway trick (see below or Parameterized unittests and benchmarks #2995)
@optStrategy("none) in LDC
...

I guess this might be a controversial change, so I am very interested to hear your opinions. What strategy to avoid the optimization do you prefer?

PetarKirov · 2017-05-22T17:56:15Z

std/datetime/stopwatch.d

    long _ticksElapsed;    // Total time that the StopWatch ran before it was stopped last.
 }

+private void doNotOptimizeAway(T)(auto ref T t) @trusted @nogc nothrow


I think we need to inline this in benchmark because otherwise we'll include getpid in the measurements.

Hmm, so do you want to stop & start the clock all the time?

burner · 2017-05-23T10:24:18Z

std/datetime/stopwatch.d

        sw.reset();
        foreach (_; 0 .. n)
-            fun[i]();
+            doNotOptimizeAway(fun[i]());


please add a test where fun returns void, that trip me up some time ago

dnadlinger · 2017-05-26T11:14:13Z

std/datetime/stopwatch.d

+{
+    import core.thread : getpid;
+    import core.stdc.stdio;
+    if(getpid() == 1)


So this calls getpid on every loop iteration? This seems a bit silly. What about e.g. benchmarking small kernels, where the cost of a getpid call is very much non-negligible (even with user-space caching)? There might also be some cache effects due to the extra function invocation (although I suppose that if this greatly affects your measurements, they are not very meaningful anyway).

space after if :)

This will be nonportable platform-dependent code, of which getpid is the most conservative.

@wilzbach for dmd I think you may simply insert asm {} and dmd will get very conservative.

I have the following idea for the unoptimizable test:

public __gshared void* _13f43be984760828c93138284d104611; private void doNotOptimizeAway(T)(auto ref T t) @trusted @nogc nothrow { if (&t == _13f43be984760828c93138284d104611) { ... } }

Although undocumented and obfuscated, that global is technically public so the compiler must conservatively assume it may point anywhere within the address space, including the address of the parameter.

Works? cc @WalterBright @ibuclaw @klickverbot

Correx: the global should have type const void*

@andralex: I don't think that would work. Let's say the parameter is allocated on the stack. After inlining doNotOptimizeAway, the compiler might be able to prove that the address of the stack slot doesn't escape, so the global can't be equal to it.

@klickverbot Yah, but the stack is a free-for-all address space and I wonder whether the compiler acknowledges that. Meaning, do compilers assume a global may point to stack-allocated data (from a previous function) even though the current stack data does not escape?

@andralex: I think the compiler would be free to make the assumption that this doesn't occur in the C++ object model. As for what actually happens, I just checked with LDC and LLVM 3.9 doesn't seem to assume that the pointers can't alias on its own (without extra metadata from the frontend encoding the fact that it can't).

However, there is another, much bigger problem: Just calling your doNotOptimizeAway on the result of the computation won't guarantee that the actual computation isn't hoisted out of the benchmark loop. In fact, this is something LDC actually does on a few examples I tried.

This is why Chandler suggests clobber in addition to escape – doNotOptimizeAway on its own doesn't prevent code motion.

(I'm not sure how to go about implementing the latter in pure D, ~~although I suppose if we had something like C++'s std::atomic_signal_fence, we might be able to reuse that~~. Scratch the part about atomic_signal_fence – whether this is guaranteed to work would again depend on the precise aliasing semantics.)

@klickverbot ah, clobber. We should add that too! Thanks!

dnadlinger · 2017-05-26T11:16:08Z

Some inspiration for a minimal overhead solution: https://github.com/facebook/folly/blob/master/folly/Benchmark.h https://github.com/google/benchmark/blob/master/include/benchmark/benchmark_api.h

Unfortunately, a solution like this would be compiler-dependent in D as well.

PetarKirov · 2017-05-27T12:24:29Z

To add to @klickverbot's suggestions, I highly recommend watching this talk on benchmarking. The key point is that in addition to preventing the compiler from deleting the function call altogether, we need to make sure we're not interfering the the measurements (by adding unrelated code) and we're not tempering with the compiler's optimization abilities.
Otherwise, we'll be measuring something different from the real world.

One of the techniques shown in presentation is the following:

void escape(void* p)
{
    asm volatile("" : : "g"(p) : "memory");
}

void clobber()
{
    asm volatile("" : : : "memory");
}

And as @klickverbot says, we need an analogous solution in D that works with DMD, GDC and LDC.

JackStouffer · 2017-05-27T15:42:28Z

Why don't we just ask the compiler devs to add a new pragma to the language to solve this?

wilzbach · 2017-05-27T15:51:23Z

Why don't we just ask the compiler devs to add a new pragma to the language to solve this?

CC @WalterBright @yebblies @UplinkCoder @ibuclaw

UplinkCoder · 2017-05-27T16:00:34Z

tbh. you don't really need it for dmd :)
doing an empty switch in a function will disable the inliner.

UplinkCoder · 2017-05-27T16:59:30Z

On a more serious note: We'd need to control somewhat which optimizations we disable,
Which is possible for DMD, but I don't know about LDC / GDC.
How much control do you guys have about the optimizer ?

dnadlinger · 2017-05-27T18:20:07Z

How much control do you guys have about the optimizer ?

I'd say that's the wrong question to ask.

As demonstrated by the C++ benchmark library snippets (I can really recommend Chandler Carruth's talk for non-compiler writers interested in what's going on, by the way), it is perfectly possible to do what we need on both GCC and LLVM.

The challenge is rather to appropriately nail down the semantics for such barrier/fake use instructions such that their implications are understandable, while being minimal and portable.

andralex · 2017-06-04T16:22:11Z

We should add a module called benchmark of which doNotOptimizeAway would be a primitive. One question is whether this should go in druntime instead of phobos. It is highly platform-dependent so that would indicate druntime as the best place.

At any rate, portable interfaces for non-portable things are what the stdlib is about, so I'm in favor of adding this.

ibuclaw · 2017-06-06T14:54:17Z

The barrier itself could be an intrinsic of druntime. Then doNotOptimizeAway can still remain in phobos, itself just a user of the intrinsic.

andralex · 2017-06-06T16:06:42Z

The barrier itself could be an intrinsic of druntime. Then doNotOptimizeAway can still remain in phobos, itself just a user of the intrinsic.

Just aiming for the simplest thing - i.e. make it part of druntime.

JackStouffer · 2018-03-07T14:17:07Z

It seems like the only things blocking this is the getpid problem and an issue of future implementation. D doesn't support any platforms atm where the getpid call would be expensive. Secondly, since this is a private function, there's no reason this can't be swapped out when a better option presents itself.

I say we move forward with this.

jmdavis · 2018-03-07T14:34:44Z

Maybe this is a stupid question, but is there a reason why it wouldn't work to just mark a function that you don't want optimized away with pragma(inline, false)? From what I've seen, that prevents the function from being optimized out.

JackStouffer · 2018-03-07T14:36:42Z

Can pragma(inline, false) be applied to lambda's? That's a very common use-case for benchmark.

jmdavis · 2018-03-07T14:45:34Z

Can pragma(inline, false) be applied to lambda's? That a very common use-case for benchmark.

I have no idea. I never use lambdas for benchmarking, because I always end up benchmarking stuff that's too complicated for that to make sense. Function attributes can be used on lambdas though, so it would make sense if a pragma like that can be used on them, and if it can't, then I'd say that it makes good sense to create an enhancement request for it. But if I'm not mistaken, and pragma(inline, false) actually does the job properly, it seems to me like a nice, simple solution to this problem.

JackStouffer · 2018-03-07T15:02:02Z

Well, even if it did, the argument is that the standard library should do this for the user automatically, i.e. Phobos should make it hard for the user to get the wrong result in their benchmark.

dlang-bot · 2018-03-11T21:36:02Z

Thanks for your pull request, @wilzbach!

Bugzilla references

Auto-close	Bugzilla	Severity	Description
✓	18593	enhancement	std.datetime.stopwatch.benchmark shouldn't optimize away its functions

… away its functions

wilzbach · 2018-03-11T21:53:03Z

BTW there's also a very good explanation of clobber, escape and doNotOptimizeAway on StackOverflow.
However, it seems like asm allows an even easier solution:

asm
{
    call [fun + i];
    ret;
}

https://godbolt.org/g/WRFWNv (as you can see pragma(inline) doesn't work)

an issue of future implementation.

Yeah, in theory future implementation could optimize asm away, but

there's no overhead with this change anymore
`benchmark at the moment doesn't guarantee that it avoids optimization
dmd will never get so smart
if ldc or gdc ever get so smart, they could special-case this function or provide a custom API/method to avoid optimization, but as far as I see it, it's a problem of the future.

ibuclaw · 2018-03-11T21:59:03Z

if ldc or gdc ever get so smart, they could special-case this function or provide a custom API/method to avoid optimization, but as far as I see it, it's a problem of the future.

@wilzbach - Luckily we have attributes for asm statements. In gdc, all asm blocks prevent optimizations unless you mark it as pure.

dnadlinger · 2018-03-11T22:04:08Z

Can we make this forward to a new set of druntime intrinsics or something along the lines instead? The current PR is (obviously) x86-only for LDC, and GDC just doesn't support it. We'd like to avoid having to patch Phobos downstream as much as possible.

jpf91 · 2018-03-11T22:12:10Z

Btw, on a quick look it seems all these solutions are meant to prevent dead code elimination. However, is there any solution to prevent 'result caching' for pure functions or inlined functions?

foreach (_; 0 .. n)
{
     escape(fun[i]());
     clobber();
}

Couldn't this still be rewritten to this:

auto tmp = fun[i]();
foreach (_; 0 .. n)
{
     escape(tmp);
     clobber();
}
?

jmdavis · 2018-03-11T22:18:15Z

However, is there any solution to prevent 'result caching' for pure functions or inlined functions?

pure is a non-issue so long as the function isn't called multiple times within the same statement. As I understand it, for better or worse, the results of a pure function are never really cached. The current statement may be adjusted such that the pure function is only called once, but once multiple statements are involved, the pure function will be called multiple times.

jpf91 · 2018-03-11T22:36:28Z

pure is a non-issue so long as the function isn't called multiple times within the same statement. As I understand it, for better or worse, the results of a pure function are never really cached.

That's not true for gdc. If a function is strong pure (i.e. pure in the GCC backend sense), the GCC backend heavily optimizes calls to such functions including caching, merging calls, etc. (Of course the compiler must be able to 'see' two identical calls, but using inlining and maybe even lto compilers are quite good at that nowadays)

jpf91 · 2018-03-12T07:39:45Z

Thinking about this some more, clobbering (all!) input arguments before calling the function should work for pure functions and even inlining problems. As the compiler no longer knows the exact values of arguments or whether these changed since the last iteration, it can not assume anything about the return value. So it has to always execute the complete function. (Or when inlining, the part of the function actually depending on the input value)

@ibuclaw does the memory clobber (asm volatile("" : : : "memory");) already take care of this, or do we explicitly have to 'write' the arguments in another asm block?

However, benchmark only uses functions which do not take any parameters. If these are pure (could also be inferred), then by definition any call is expected to return exactly the same value. So we'd need a different approach to defeat compiler optimizations there.

ibuclaw · 2018-06-13T19:54:20Z

Why was this merged?

n8sh · 2018-06-13T19:59:31Z

Because it was approved two weeks ago.

n8sh · 2018-06-13T20:17:01Z

...and because I didn't read carefully enough. @ibuclaw @klickverbot My apologies.

Can we make this forward to a new set of druntime intrinsics or something along the lines instead?

With this soft opening, I mistook it as idle musing rather than a verdict on the viability of the PR. Can I get away with blaming this on cross-Atlantic differences in communication style?

ibuclaw · 2018-06-13T20:58:14Z

Because it was approved two weeks ago.

I only see one approval. Two weeks and one year ago. :-)

wilzbach requested review from burner and jmdavis May 22, 2017 17:38

PetarKirov reviewed May 22, 2017

View reviewed changes

burner requested changes May 23, 2017

View reviewed changes

wilzbach force-pushed the dontoptimizeaway branch from dffd36c to 99f1494 Compare May 23, 2017 11:55

JackStouffer added the Severity:Enhancement label May 25, 2017

burner approved these changes May 26, 2017

View reviewed changes

dnadlinger reviewed May 26, 2017

View reviewed changes

wilzbach mentioned this pull request Jun 4, 2017

std.algorithm.searching: minmaxElement #4248

Closed

dlang-bot added the Merge:stalled label Sep 5, 2017

dlang-bot added Merge:Needs Rebase and removed Merge:Needs Rebase labels Jan 1, 2018

wilzbach mentioned this pull request Jan 29, 2018

Modified the behavior of std.stdio.LockingTextWriter.put #6092

Merged

wilzbach mentioned this pull request Feb 8, 2018

Issue 18398 - std.datetime.stopwatch documented examples could be better #6139

Merged

wilzbach force-pushed the dontoptimizeaway branch from 99f1494 to b9ae58c Compare March 11, 2018 21:36

dlang-bot removed the Merge:stalled label Mar 11, 2018

wilzbach force-pushed the dontoptimizeaway branch 2 times, most recently from 2410a32 to 94337d0 Compare March 11, 2018 21:44

Fix Issue 18593 - std.datetime.stopwatch.benchmark shouldn't optimize…

25c3ef4

… away its functions

wilzbach force-pushed the dontoptimizeaway branch from 94337d0 to 25c3ef4 Compare March 11, 2018 21:47

wilzbach mentioned this pull request Mar 11, 2018

Fix assertion failure in datetime.stopwatch when compiling with -O2 D-Programming-GDC/gdc#639

Merged

wilzbach mentioned this pull request Apr 2, 2018

Fix Issue 18704 - std/datetime/stopwatch.d(452) assertion failure on auto-tester #6404

Merged

n8sh added the Merge:auto-merge label Jun 13, 2018

dlang-bot merged commit e257ce1 into dlang:master Jun 13, 2018

n8sh mentioned this pull request Jun 13, 2018

Revert "Add dontOptimizeAway to std.datetime.stopwatch" #6582

Merged

wilzbach deleted the dontoptimizeaway branch August 22, 2018 14:57

Uh oh!

Add dontOptimizeAway to std.datetime.stopwatch #5416

Add dontOptimizeAway to std.datetime.stopwatch #5416

Uh oh!

Conversation

wilzbach commented May 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dnadlinger Jun 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dnadlinger Jun 8, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dnadlinger commented May 26, 2017

Uh oh!

PetarKirov commented May 27, 2017

Uh oh!

JackStouffer commented May 27, 2017

Uh oh!

wilzbach commented May 27, 2017

Uh oh!

UplinkCoder commented May 27, 2017

Uh oh!

UplinkCoder commented May 27, 2017

Uh oh!

dnadlinger commented May 27, 2017

Uh oh!

andralex commented Jun 4, 2017

Uh oh!

ibuclaw commented Jun 6, 2017

Uh oh!

andralex commented Jun 6, 2017

Uh oh!

JackStouffer commented Mar 7, 2018

Uh oh!

jmdavis commented Mar 7, 2018

Uh oh!

JackStouffer commented Mar 7, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jmdavis commented Mar 7, 2018

Uh oh!

JackStouffer commented Mar 7, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dlang-bot commented Mar 11, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bugzilla references

Uh oh!

wilzbach commented Mar 11, 2018

Uh oh!

ibuclaw commented Mar 11, 2018

Uh oh!

dnadlinger commented Mar 11, 2018

Uh oh!

jpf91 commented Mar 11, 2018

Uh oh!

wilzbach commented May 22, 2017 •

edited

Loading

dnadlinger Jun 8, 2017 •

edited

Loading

dnadlinger Jun 8, 2017 •

edited

Loading

JackStouffer commented Mar 7, 2018 •

edited

Loading

JackStouffer commented Mar 7, 2018 •

edited

Loading

dlang-bot commented Mar 11, 2018 •

edited

Loading