std.algorithm.searching: minmaxElement #4248

wilzbach · 2016-04-27T02:17:08Z

Follow-up to #4221.
An efficient combination of {min,max}Element.

The same algorithm exists in C++:

http://en.cppreference.com/w/cpp/algorithm/minmax_element
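For readers coming from the C++ side, here is a minimal sketch of what a single-pass minmaxElement can look like on an input range. It assumes `map`/`less` template parameters analogous to minElement/maxElement. The name `minmaxElementSketch` is hypothetical, and this is not the code in this PR. The key is evaluated once per element, the original elements are returned, and ties keep the first occurrence.

```d
import std.functional : binaryFun, unaryFun;
import std.range.primitives : empty, front, isInputRange, popFront;
import std.typecons : tuple;

/// Hypothetical single-pass sketch (not the PR's code): returns the original
/// elements with the smallest and largest mapped keys, evaluating `map` once
/// per element. Strict comparisons keep the first occurrence on ties.
auto minmaxElementSketch(alias map = "a", alias less = "a < b", R)(R r)
if (isInputRange!R)
{
    alias mapFun = unaryFun!map;
    alias lt = binaryFun!less;
    assert(!r.empty, "minmaxElementSketch requires a non-empty range");

    auto minElem = r.front;
    auto maxElem = minElem;
    auto minKey = mapFun(minElem);
    auto maxKey = minKey;
    for (r.popFront(); !r.empty; r.popFront())
    {
        auto e = r.front;
        auto k = mapFun(e); // one key evaluation per element
        if (lt(k, minKey)) { minElem = e; minKey = k; }
        if (lt(maxKey, k)) { maxElem = e; maxKey = k; }
    }
    return tuple(minElem, maxElem);
}

unittest
{
    import std.range : enumerate;
    // Same shape as the enumerate example in this thread: the (index, value) pairs survive.
    auto res = [3, 4, 5, 1, 2].enumerate.minmaxElementSketch!"a.value";
    assert(res[0].index == 3 && res[0].value == 1);
    assert(res[1].index == 2 && res[1].value == 5);
}
```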

9il · 2016-04-28T07:51:26Z

std/algorithm/searching.d

+    return minmaxElement!(map, selector)(r, seed);
+}
+
+private auto minmaxElement(alias map = "a", alias selector = "a < b", Range,


1. Why not use a common map instead?
2. A prototype with two seeds, one for max and one for min, looks more useful.
3. Why do we need this function? Is it faster than reduce!(min, max)?

EDIT: 3. map!(...).cache.reduce!(min, max)

wilzbach · 2016-04-28T09:00:49Z

@9il - the idea comes from #4221 and @nordlow and I do prefer .reduce(min, max) too.
As mentioned in #4221 the problem is that having a map function will yield only the mapped minima.

Thinking about it, it should be possible to templatize extremum, but that would add complications, and there are only two cases: min and max.

9il · 2016-04-28T09:06:22Z

> @9il - the idea comes from #4221 and @nordlow and I do prefer .reduce(min, max) too.
> As mentioned in #4221 the problem is that having a map function will yield only the mapped minima.

I really don't understand. Could you please give an example where minmaxElement would be faster/better?

DmitryOlshansky · 2016-04-28T09:10:40Z

> Could you please give an example where minmaxElement would be faster/better?

It all comes down to getting the original element, not the one mapped by a predicate.
For example, finding the minimum record by a sub-field yields the record with the minimum sub-field, not the minimum sub-field value.
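A small illustration of the record case, using the existing Phobos minElement/maxElement from std.algorithm.searching (added by #4221) next to map + reduce. `Person`, `youngest`, `oldest` and `youngestAge` are made-up names for the example.

```d
import std.algorithm.comparison : min;
import std.algorithm.iteration : map, reduce;
import std.algorithm.searching : maxElement, minElement;

struct Person { string name; int age; }

// minElement/maxElement compare by the mapped key but return the whole record.
Person youngest(Person[] people) { return people.minElement!"a.age"; }
Person oldest(Person[] people) { return people.maxElement!"a.age"; }

// Mapping first and reducing afterwards keeps only the key; the record is gone.
int youngestAge(Person[] people) { return people.map!"a.age".reduce!min; }
```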

wilzbach · 2016-04-28T09:13:06Z

> I really don't understand. Could you please give an example where minmaxElement would be faster/better?

It will be faster than calling extremum twice ;-)
BTW, for future readers: as explained in #4221, once the range is mapped it's not possible to get the original element back, e.g.

assert([3, 4, 5, 1, 2].enumerate.minmaxElement!"a.value" ==
                                      tuple(tuple(3, 1), tuple(2, 5)));

vs.

[3, 4, 5, 1, 2].enumerate.map!"a.value".reduce!(min, max) == tuple(1, 5);

9il · 2016-04-28T09:23:01Z

Why not this:

alias minValue = (a, b) => a.value < b.value ? a : b;
alias maxValue = (a, b) => a.value > b.value ? a : b;
assert([3, 4, 5, 1, 2].enumerate.reduce!(minValue, maxValue) ==  tuple(tuple(3, 1), tuple(2, 5)));

?

wilzbach · 2016-04-28T09:33:03Z

> Why not this:

Same reason as for #4221 applies (map function is evaluated twice), but let's focus the discussion on #4221.
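To make the "map function is evaluated repeatedly" point concrete, here is a sketch that counts key evaluations. `key` is a hypothetical stand-in for an expensive map function, and `minmaxViaReduce`/`minmaxCached` are made-up names. With no seed, reduce takes the first element as the seed and then makes 4 steps over the remaining elements, each calling 2 selectors that evaluate the key for both operands: 16 calls for five elements. The cached loop makes 5.

```d
import std.algorithm.iteration : reduce;

size_t keyCalls; // how often the (possibly expensive) key function ran

// Hypothetical stand-in for an expensive map function.
int key(int x) { ++keyCalls; return x; }

// 9il's suggestion: custom min/max selectors passed to reduce.
// Each selector evaluates the key for both of its operands.
alias minByKey = (a, b) => key(a) < key(b) ? a : b;
alias maxByKey = (a, b) => key(a) > key(b) ? a : b;

auto minmaxViaReduce(int[] r) { return r.reduce!(minByKey, maxByKey); }

// Single pass that caches the current best keys: one key call per element.
int[2] minmaxCached(int[] r)
{
    int lo = r[0], hi = r[0];
    int loKey = key(r[0]);
    int hiKey = loKey;
    foreach (e; r[1 .. $])
    {
        immutable k = key(e);
        if (k < loKey) { lo = e; loKey = k; }
        if (hiKey < k) { hi = e; hiKey = k; }
    }
    return [lo, hi];
}
```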

9il · 2016-04-28T13:21:27Z

Could you please add a prototype with 2 seeds?

wilzbach · 2016-04-28T13:34:29Z

> Could you please add a prototype with 2 seeds?

Done. I removed the one-seed prototype, as it no longer made sense to me.

DmitryOlshansky · 2016-04-29T08:23:20Z

Needs a changelog entry.

wilzbach · 2016-04-29T08:25:10Z

> Needs a changelog entry.

I know, already noted. I hope we can make the transition to the changelog folder soon. This needs @andralex's approval anyway ;-)

wilzbach · 2016-04-29T08:28:35Z

Please add the @andralex label.
Should I add the optimization for the no-op map function case (i.e. no mapping) to this function too?
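One way the no-op map case could be detected at compile time, assuming only the default string map "a" needs to be recognized. `isIdentityMap` and `minmaxWithMap` are hypothetical names, not Phobos API. When there is no mapping, the element is its own key, so the loop does not need to store separate keys.

```d
import std.functional : unaryFun;
import std.typecons : tuple;

/// Hypothetical trait (not Phobos API): true only for the default string map "a".
/// An equivalent lambda such as `a => a` is deliberately not detected.
template isIdentityMap(alias map)
{
    static if (is(typeof(map) : string))
        enum bool isIdentityMap = map == "a";
    else
        enum bool isIdentityMap = false;
}

/// Hypothetical helper: skips the stored keys entirely when there is no mapping.
auto minmaxWithMap(alias map = "a", T)(T[] r)
{
    assert(r.length > 0, "minmaxWithMap requires a non-empty array");
    T lo = r[0], hi = r[0];
    static if (isIdentityMap!map)
    {
        // No mapping: compare the elements directly.
        foreach (e; r[1 .. $])
        {
            if (e < lo) lo = e;
            if (hi < e) hi = e;
        }
    }
    else
    {
        alias mapFun = unaryFun!map;
        auto loKey = mapFun(r[0]);
        auto hiKey = loKey;
        foreach (e; r[1 .. $])
        {
            auto k = mapFun(e); // one key evaluation per element
            if (k < loKey) { lo = e; loKey = k; }
            if (hiKey < k) { hi = e; hiKey = k; }
        }
    }
    return tuple(lo, hi);
}
```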

JackStouffer · 2016-04-29T15:07:51Z

std/algorithm/searching.d

+}
+
+///
+//@safe pure unittest


Fix this and the stdio import

JackStouffer · 2016-04-29T15:12:05Z

I think this function is of limited value, but because there is an STL implementation, adding it will help people transitioning from C++.

LGTM sans comments

JackStouffer · 2016-04-29T15:15:14Z

std/algorithm/searching.d

+If the extreme element occurs multiple time, the first occurrence will be
+returned.
+This function is more efficient than calling both $(LREF minElement) and
+$(LREF maxElement).


Could go into a little more detail here. I think you should add something like:

This function is more efficient than calling both $(LREF minElement) and $(LREF maxElement) on one range because it requires only one scan of the range, whereas calling both requires two. Also, calling both $(LREF minElement) and $(LREF maxElement) on the same range would require it to be a forward range.

This would help people understand this function's benefits more clearly.

More precisely, we should provide guarantees similar to the C++ version. Per http://en.cppreference.com/w/cpp/algorithm/minmax_element: "At most max(floor(3/2(N−1)), 0) applications of the predicate, where N = std::distance(first, last)."
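Here is a quick count of where the C++ bound comes from. It assumes the usual pairwise scheme: seed min and max from the first element when N is odd, or from the first two elements (one comparison) when N is even. Each remaining pair then costs three comparisons: one inside the pair, one against the current minimum and one against the current maximum.

```latex
C(N) =
\begin{cases}
  3 \cdot \dfrac{N-1}{2} = \dfrac{3(N-1)}{2}, & N \ge 1 \text{ odd},\\[6pt]
  1 + 3 \cdot \dfrac{N-2}{2} = \dfrac{3N}{2} - 2 = \left\lfloor \dfrac{3(N-1)}{2} \right\rfloor, & N \ge 2 \text{ even},
\end{cases}
\qquad\text{hence}\qquad
C(N) = \left\lfloor \tfrac{3}{2}(N-1) \right\rfloor \le 2(N-1)
```

Here 2(N-1) is the cost of separate minElement and maxElement scans.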

andralex · 2017-03-05T21:17:29Z

Ah, cool, I'd forgotten about the algorithmic trick. But I don't seem to see you applying it. The pattern goes:

if (r[i] < r[i + 1])
{
    if (r[i] < r[min]) min = i;
    if (r[i + 1] > r[max]) max = i + 1;
}
else
{
    if (r[i + 1] < r[min]) min = i + 1;
    if (r[i] > r[max]) max = i;
}
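Below is a runnable sketch of the pairwise pattern above for random-access ranges. It also handles the odd/even seeding that the snippet leaves out. `minmaxIndexInPairs` is a hypothetical name. Like andralex's pattern, it works with indices. Ties follow C++ std::minmax_element (first smallest, last largest), which differs from the "first occurrence" wording in the PR's doc comment. The tests use a counting predicate to check the floor(3(N-1)/2) bound.

```d
import std.functional : binaryFun;
import std.range.primitives : hasLength, isRandomAccessRange;
import std.typecons : Tuple;

/// Hypothetical helper (not Phobos API): indices of the minimum and maximum
/// of a non-empty random-access range, using at most floor(3 * (n - 1) / 2)
/// calls of `less`. Ties: first minimum, last maximum (as in C++).
Tuple!(size_t, "minIndex", size_t, "maxIndex")
minmaxIndexInPairs(alias less = "a < b", R)(R r)
if (isRandomAccessRange!R && hasLength!R)
{
    alias lt = binaryFun!less;
    immutable n = r.length;
    assert(n > 0, "minmaxIndexInPairs requires a non-empty range");

    size_t lo = 0, hi = 0, i = 1;
    if (n % 2 == 0)
    {
        // Even length: seed min and max from the first pair (one comparison).
        if (lt(r[1], r[0])) lo = 1;
        else hi = 1;
        i = 2;
    }
    // The rest comes in pairs: 1 comparison inside the pair, 1 against min, 1 against max.
    for (; i + 1 < n; i += 2)
    {
        size_t small = i, large = i + 1;
        if (lt(r[i + 1], r[i])) { small = i + 1; large = i; }
        if (lt(r[small], r[lo])) lo = small;  // strict: keeps the first minimum
        if (!lt(r[large], r[hi])) hi = large; // non-strict: takes the last maximum
    }
    return typeof(return)(lo, hi);
}
```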

It's actually one of the FB interview questions :). So yes I approve the addition, but for the life of me I don't see where you implement the correct algorithm above.

wilzbach · 2017-05-05T15:06:16Z

> Ah, cool, I'd forgotten about the algorithmic trick. But I don't seem to see you applying it. The pattern goes:

Yeah I tried it (see benchmarks below) and the "stupid" way seems to be a lot faster.
FWIW, the C++ standard libraries of GCC and Clang use your pattern as well:

https://github.com/gcc-mirror/gcc/blob/gcc-7_1_0-release/libstdc%2B%2B-v3/include/bits/stl_algo.h#L3332
https://github.com/llvm-mirror/libcxx/blob/release_40/include/algorithm#L2662

I currently don't have time to dive more into it, but here are the benchmarks to compare the naive version and iteration in pairs:

RandomAccess

ldc -release -O5 -mcpu=native test.d && ./test
reduce!(min,max) = 7 secs, 722 ms, 265 μs, and 4 hnsecs
fold.minMax     = 5 secs, 770 ms, and 427 μs
minmaxElement   = 5 secs, 410 ms, and 31 μs
minmaxElementInPairs = 15 secs, 362 ms, 724 μs, and 3 hnsecs

InputRange

ldc -release -O5 -mcpu=native test.d && ./test
reduce!(min,max) = 7 secs, 215 ms, 601 μs, and 1 hnsec
fold.minMax     = 6 secs, 507 ms, 278 μs, and 5 hnsecs
minmaxElement   = 6 secs, 361 ms, 248 μs, and 8 hnsecs
minmaxElementInPairs = 12 secs, 344 ms, 548 μs, and 3 hnsecs

(updated code from above).

andralex · 2017-06-03T23:46:16Z

Heh, thanks @wilzbach. I've reproed your measurements. This is an interesting result. I'm not sure exactly what's going on yet. Take a look at https://godbolt.org/g/mXj3uW. There we have:

- minmaxElementNoMap: the brute-force version. From what I can see, ldc does some really awesome loop unrolling there.
- minmaxElementNoMap2: a cleverer version that saves on comparisons, based on the observation that if something is smaller than the smallest, it can't be greater than the greatest. So it does between n and 2 * n comparisons, depending on the input. I measured no performance difference on any input. I ascribe this to the fact that the comparisons are done in parallel. LDC again generates interesting code, though very different from the first!
- minmaxElementNoMapInPairs: the classic algorithm that does 3n/2 comparisons. It is indeed slower! LDC seems to generate much more conservative code.
- minmaxElementNoMapInPairs2: an improved version that has a tighter loop. It does improve the situation, but not by much.
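Here is a sketch of the minmaxElementNoMap2 idea as described above. This is a reconstruction, and the godbolt code may differ. `minmaxElseIf` is a hypothetical name. It counts its comparisons, which shows the n to 2n spread: ascending input pays two comparisons per element, and descending input pays one.

```d
import std.typecons : Tuple;

/// Hypothetical sketch: an element that is a new minimum cannot also be a new
/// maximum, so the second test is skipped. Also reports the comparison count.
Tuple!(int, "lo", int, "hi", size_t, "comparisons") minmaxElseIf(const int[] r)
{
    assert(r.length > 0, "minmaxElseIf requires a non-empty array");
    int lo = r[0], hi = r[0];
    size_t comparisons;
    foreach (e; r[1 .. $])
    {
        ++comparisons;
        if (e < lo) { lo = e; continue; } // new minimum: skip the max test
        ++comparisons;
        if (hi < e) hi = e;
    }
    return typeof(return)(lo, hi, comparisons);
}
```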

I'd say let's add minMaxElement with the guaranteed 3n/2 comparisons (each comparison may be arbitrarily expensive); otherwise it has no merit over reduce!(min, max). We may specialize it to take the brute-force approach for certain data types with the default comparison.

Compiler experts @ibuclaw @JohanEngelen @klickverbot please take a look!

FWIW dmd generates equally good/bad code for minmaxElementNoMap2 and minmaxElementNoMapInPairs2. Didn't try gdc yet.

dnadlinger · 2017-06-04T01:06:17Z

@andralex: Do you have a benchmark script for your experiments?

wilzbach · 2017-06-04T01:11:05Z

> @andralex: Do you have a benchmark script for your experiments?

@klickverbot: sorry that my link was so hidden at the end of the post. It's here:

https://gist.github.com/wilzbach/3407d80bfa757d46a3ac59a873d5f085

dnadlinger · 2017-06-04T01:36:02Z

@wilzbach: Thanks, but I was referring to Andrei's experiments in particular because I'm lazy. I guess I need to copy-paste over his code myself after all… ;)

andralex · 2017-06-04T03:25:08Z

@klickverbot pasted the mess here: https://dpaste.dzfl.pl/09fbcf17f932

JohanEngelen · 2017-06-04T08:00:46Z

@andralex For the InPairs implementations, shouldn't the loop advance in pairs (+2)? (and then some extra work for odd length)

andralex · 2017-06-04T13:34:59Z

@JohanEngelen I ignored the odd elements case, it has no bearing on measuring efficiency. Per lines 124 and 202, I advance in 2 increments and look at i - 1 and i in one pass. If you find any bug I'd be relieved! I'm getting results that are difficult to interpret.

andralex · 2017-06-04T13:49:57Z

std/algorithm/searching.d

+{
+    alias mapFun = unaryFun!map;
+    alias selectorFun = binaryFun!selector;
+


assert(!selector(maxSeed, minSeed));

andralex · 2017-06-04T13:52:14Z

std/algorithm/searching.d

+            {
+                maxElement = r[i];
+                maxElementMapped = mapElement;
+            }


So here we should use the 3n/2 algorithm, even if technically slower for < and int. One great thing to do would be to specialize for this (and a few other) cases.

andralex · 2017-06-04T13:54:11Z

std/algorithm/searching.d

+            MapType mapElement1 = mapFun(rawElement1);
+            r.popFront();
+            // check if the range had an uneven amount of elements and thus has ended
+            if (r.empty)


BUG: must return at the end of this if.
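For an input range, the length is not known up front. The odd case therefore only shows up when the range runs out in the middle of a pair, and that is where the review asks for a return. Here is a sketch of that shape under a hypothetical name, `minmaxPairsInput`. It is not the PR's actual code. It seeds from the first two elements, which keeps the count within floor(3(N-1)/2) for both parities.

```d
import std.functional : binaryFun;
import std.range.primitives : empty, front, isInputRange, popFront;
import std.typecons : tuple;

/// Hypothetical pairwise sketch for input ranges: returns (min, max) values.
/// Ties: first minimum, last maximum.
auto minmaxPairsInput(alias less = "a < b", R)(R r)
if (isInputRange!R)
{
    alias lt = binaryFun!less;
    assert(!r.empty, "minmaxPairsInput requires a non-empty range");
    auto lo = r.front;
    auto hi = lo;
    r.popFront();
    if (r.empty) return tuple(lo, hi);
    // Seed from the first two elements with a single comparison.
    if (lt(r.front, lo)) lo = r.front;
    else hi = r.front;
    r.popFront();
    while (!r.empty)
    {
        auto a = r.front;
        r.popFront();
        if (r.empty)
        {
            // Odd leftover element: at most two comparisons, then we must return.
            if (lt(a, lo)) lo = a;
            else if (!lt(a, hi)) hi = a;
            return tuple(lo, hi);
        }
        auto b = r.front;
        r.popFront();
        // Three comparisons per pair.
        if (lt(b, a)) { if (lt(b, lo)) lo = b; if (!lt(a, hi)) hi = a; }
        else          { if (lt(a, lo)) lo = a; if (!lt(b, hi)) hi = b; }
    }
    return tuple(lo, hi);
}
```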

andralex · 2017-06-04T14:05:06Z

@wilzbach one more thing - where did you see the getpid trick in doNotOptimizeAway? I've seen it elsewhere, too, but forgot where. Thx!

wilzbach · 2017-06-04T15:44:17Z

> @wilzbach one more thing - where did you see the getpid trick in doNotOptimizeAway? I've seen it elsewhere, too, but forgot where.

Ideally we get something like this trick into Phobos, so that users don't have to worry about it:

#5416
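For reference, here is a minimal sketch of the getpid idea as commonly used. This is a reconstruction, and the gist's exact code may differ. It is POSIX-only, and `doNotOptimizeAway`/`sumTo` are our own illustrative helpers. The optimizer cannot prove the branch is dead, so the value must actually be computed, yet an ordinary process is not PID 1 and prints nothing. The caveat is that a program running as PID 1, such as a container entrypoint, will print.

```d
import core.sys.posix.unistd : getpid;
import std.stdio : writeln;

/// Hypothetical helper: makes `value` observable without normally printing it.
void doNotOptimizeAway(T)(auto ref T value)
{
    if (getpid() == 1) // practically never true, but unknown at compile time
        writeln(value);
}

/// Example benchmark body whose result would otherwise be dead code.
long sumTo(long n)
{
    long s = 0;
    foreach (i; 0 .. n) s += i;
    return s;
}
```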

JohanEngelen · 2017-06-05T10:23:35Z

@andralex The code you linked to (#4248 (comment), https://godbolt.org/g/mXj3uW ) does not do the +2.

JohanEngelen · 2017-06-05T10:31:06Z

@andralex Did you try "caching" r.length for code like this:

for (size_t i = 0; i < r.length; i += 2)
{
    uint j = selectorFun(r[i], r[i + 1]);

in case the compiler cannot/does not deduce that selectorFun does not touch r.length?

andralex · 2017-06-05T17:19:32Z

@JohanEngelen tried that, makes no difference.

BUT! I found what seems to be an interesting performance bug. I stripped minmaxElementInPairsNoMap all the way down to this core loop:

    for (size_t i = 0; i < r.length; i += 2)
    {
    }

Even though it literally does nothing, it still takes more than twice as long as minmaxElementNoMap. If I change it to:

    for (size_t i = 0; i < r.length; i += 1)
    {
    }

So ldc does not generate good code for loops that advance in a non-unit increment. I think you'd improve the life of many if you looked into that!

JohanEngelen · 2017-06-05T18:59:37Z

@andralex That's because of integer overflow. If r.length == size_t.max (an odd number!), it's an infinite loop.
See https://godbolt.org/g/5RFNdt . With the early r.length == size_t.max return path, GDC is able to optimize it all out. LDC isn't. (I filed ldc-developers/ldc#2154.)

Edit: This kind of stuff is a lot of fun to work on @andralex ! Especially with such supertiny test cases. Hope you remember to file such things in our bugtracker ;-) ;-)
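Here is an illustration of the overflow point. With `i < len; i += 2`, a length of size_t.max (odd) would let i reach size_t.max - 1, wrap to 0 and never stop, so the compiler has to keep the loop. The two forms below are well-defined for every length. `pairsByHalf` and `pairsByPlusOne` are hypothetical names. Whether a given compiler then generates better code is a separate question, since andralex reports no improvement from the halved form.

```d
/// Visits pairs (2k, 2k + 1) by iterating k below len / 2:
/// k < len / 2 implies 2 * k + 1 < len, so nothing can wrap.
size_t pairsByHalf(size_t len)
{
    size_t count;
    foreach (k; 0 .. len / 2)
        ++count; // would process elements 2 * k and 2 * k + 1
    return count;
}

/// Same pairs with i + 1 < len as the condition: it implies i <= len - 2,
/// so i += 2 never exceeds len and cannot wrap.
size_t pairsByPlusOne(size_t len)
{
    size_t count;
    for (size_t i = 0; i + 1 < len; i += 2)
        ++count; // would process elements i and i + 1
    return count;
}
```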

andralex · 2017-06-06T18:36:10Z

@JohanEngelen fwiw I changed the implementation to use ++i for iteration up to r.length / 2 and used 2 * i and 2 * i + 1 as adjacent elements. Still no improvement.

@wilzbach so let's stay with the 3n/2 algo for this PR. Works?

wilzbach · 2018-06-06T16:43:11Z

I never needed this and lost interest in pursuing this PR. Sorry

wilzbach mentioned this pull request Apr 27, 2016

std.algorithm: {min,max}Element for a single range #4221

Merged

9il reviewed Apr 28, 2016

9il added the Review:Needs Decision label Apr 28, 2016

DmitryOlshansky removed the Review:Needs Decision label Apr 28, 2016

9il added the Review:Needs Decision label Apr 28, 2016

9il added ndslice and removed Review:Needs Decision ndslice labels Apr 28, 2016

wilzbach force-pushed the minmaxElement branch 2 times, most recently from b612065 to 3e208f4, April 28, 2016 13:33

wilzbach force-pushed the minmaxElement branch 3 times, most recently from 3c2bad9 to cf06f65, April 28, 2016 17:24

DmitryOlshansky added the @andralex label Apr 29, 2016

JackStouffer reviewed Apr 29, 2016

JackStouffer reviewed Apr 29, 2016

andralex removed the Merge:Blocked label Mar 5, 2017

wilzbach added the Review:Needs Work label Mar 5, 2017

std.algorithm.searching: minmaxElement

c902f96

wilzbach force-pushed the minmaxElement branch from cf06f65 to c902f96, May 5, 2017 14:52

andralex requested changes Jun 4, 2017


JohanEngelen mentioned this pull request Jun 5, 2017

Missed optimization opportunity by deduction problem on +=2 loop. ldc-developers/ldc#2154

Open

dlang-bot added the Merge:stalled label Sep 2, 2017

dlang-bot added Merge:Needs Rebase and removed Merge:Needs Rebase labels Jan 1, 2018

wilzbach added the Review:Orphaned label Jun 6, 2018 (The author of the PR is no longer available and this PR can be adopted by anyone.)

wilzbach closed this Jun 6, 2018

Uh oh!

std.algorithm.searching: minmaxElement #4248

std.algorithm.searching: minmaxElement #4248

Uh oh!

Conversation

wilzbach commented Apr 27, 2016

Uh oh!

9il Apr 28, 2016

Choose a reason for hiding this comment

Uh oh!

9il Apr 28, 2016

Choose a reason for hiding this comment

Uh oh!

wilzbach commented Apr 28, 2016

Uh oh!

9il commented Apr 28, 2016

Uh oh!

DmitryOlshansky commented Apr 28, 2016

Uh oh!

wilzbach commented Apr 28, 2016

Uh oh!

9il commented Apr 28, 2016

Uh oh!

wilzbach commented Apr 28, 2016

Uh oh!

9il commented Apr 28, 2016

Uh oh!

wilzbach commented Apr 28, 2016

Uh oh!

DmitryOlshansky commented Apr 29, 2016

Uh oh!

wilzbach commented Apr 29, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wilzbach commented Apr 29, 2016

Uh oh!

JackStouffer Apr 29, 2016

Choose a reason for hiding this comment

Uh oh!

JackStouffer commented Apr 29, 2016

Uh oh!

JackStouffer Apr 29, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

andralex Jun 4, 2017

Choose a reason for hiding this comment

Uh oh!

andralex commented Mar 5, 2017

Uh oh!

wilzbach commented May 5, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

RandomAccess

InputRange

Uh oh!

andralex commented Jun 3, 2017

Uh oh!

dnadlinger commented Jun 4, 2017

Uh oh!

wilzbach commented Jun 4, 2017

Uh oh!

dnadlinger commented Jun 4, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

andralex commented Jun 4, 2017

Uh oh!

JohanEngelen commented Jun 4, 2017

Uh oh!

andralex commented Jun 4, 2017

Uh oh!

andralex Jun 4, 2017

Choose a reason for hiding this comment

Uh oh!

andralex Jun 4, 2017

Choose a reason for hiding this comment

Uh oh!

andralex Jun 4, 2017

Choose a reason for hiding this comment

Uh oh!

andralex commented Jun 4, 2017

wilzbach commented Apr 29, 2016 •

edited

Loading

JackStouffer Apr 29, 2016 •

edited

Loading

wilzbach commented May 5, 2017 •

edited

Loading

dnadlinger commented Jun 4, 2017 •

edited

Loading

JohanEngelen commented Jun 5, 2017 •

edited

Loading

JohanEngelen commented Jun 5, 2017 •

edited

Loading

wilzbach commented Jun 6, 2018 •

edited

Loading