Skip to content

Conversation

@wilzbach
Copy link
Contributor

follow-up to #4221
An efficient combination of {min,max}Element.

Can be found in C++ as well:

http://en.cppreference.com/w/cpp/algorithm/minmax_element

Ping @nordlow.

return minmaxElement!(map, selector)(r, seed);
}

private auto minmaxElement(alias map = "a", alias selector = "a < b", Range,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why not to use common map instead?
  2. Prototype with 2 seeds, one for max and one for min looks more useful
  3. Why we need this function? is it faster then reduce!(min, max)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: 3. map!(...).cache.reduce!(min, max)

@wilzbach
Copy link
Contributor Author

@9il - the idea comes from #4221 and @nordlow and I do prefer .reduce(min, max) too.
As mentioned in #4221 the problem is that having a map function will yield only the mapped minima.

If I think about it should be possible to templatize extremum, but on the other hand it will yield more complications and there is only min, max.

@9il
Copy link
Member

9il commented Apr 28, 2016

@9il - the idea comes from #4221 and @nordlow and I do prefer .reduce(min, max) too.
As mentioned in #4221 the problem is that having a map function will yield only the mapped minima.

I really don't understand. Could you please give an example where minmaxElement would be faster/better?

@DmitryOlshansky
Copy link
Member

Could you please give an example where minmaxElement would be faster/better?

It all comes down to having the original element not the one mapped by a predicate.
So that e.g. finding minimum a record by sub-field yields the record which has minimum sub-field not the minimum subfield value.

@wilzbach
Copy link
Contributor Author

I really don't understand. Could you please give an example where minmaxElement would be faster/better?

It will be faster than calling extremum twice ;-)
Btw for future readers as explained at #4221, it's not possible to get the element before map back, e.g.

assert([3, 4, 5, 1, 2].enumerate.minmaxElement!"a.value" ==
                                      tuple(tuple(3, 1), tuple(2, 5)));

vs.

[3, 4, 5, 1, 2].map!"a.value".reduce!(min, max) == tuple(1, 5);

@9il
Copy link
Member

9il commented Apr 28, 2016

Why not this:

alias minValue = (a, b) => a.value < b.value ? a : b;
alias maxValue = (a, b) => a.value > b.value ? a : b;
assert([3, 4, 5, 1, 2].enumerate.reduce!(minValue, maxValue) ==  tuple(tuple(3, 1), tuple(2, 5)));

?

@wilzbach
Copy link
Contributor Author

Why not this:

Same reason as for #4221 applies (map function is evaluated twice), but let's focus the discussion on #4221.

@9il
Copy link
Member

9il commented Apr 28, 2016

could you please add prototype with 2 seeds?

@wilzbach wilzbach force-pushed the minmaxElement branch 2 times, most recently from b612065 to 3e208f4 Compare April 28, 2016 13:33
@wilzbach
Copy link
Contributor Author

could you please add prototype with 2 seeds?

done - I removed the one seed prototype as it didn't make sense to me anymore.

@wilzbach wilzbach force-pushed the minmaxElement branch 3 times, most recently from 3c2bad9 to cf06f65 Compare April 28, 2016 17:24
@DmitryOlshansky
Copy link
Member

Needs a changelog entry.

@wilzbach
Copy link
Contributor Author

wilzbach commented Apr 29, 2016

Needs a changelog entry.

I know, already noted - I hope that we can make the transition to the changelog folder soon. This needs the @andralex approval anyways ;-)

@wilzbach
Copy link
Contributor Author

  1. Please add @andralex label
  2. Should I add the optimization for the noop map function case (=no mapping) to this function too?

}

///
//@safe pure unittest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix this and the stdio import

@JackStouffer
Copy link
Contributor

I think this function is of limited value. But, because there is an STL implementation adding this will help people transitioning from C++.

LGTM sans comments

If the extreme element occurs multiple time, the first occurrence will be
returned.
This function is more efficient than calling both $(LREF minElement) and
$(LREF maxElement).
Copy link
Contributor

@JackStouffer JackStouffer Apr 29, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could go into a little more detail here. I think you should add something like

This function is more efficient than calling both $(LREF minElement) and $(LREF maxElement) for one range because this function only requires one scan of the range, whereas the former takes two. Also, calling both $(LREF minElement) and $(LREF maxElement) on the same range would require it to be a forward range.

This would help people understand this function's benefits more clearly.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More precisely, we should provide guarantees similar to the C++ version - per http://en.cppreference.com/w/cpp/algorithm/minmax_element: "At most max(floor(3/2(N−1)), 0) applications of the predicate, where N = std::distance(first, last)."

@andralex
Copy link
Member

andralex commented Mar 5, 2017

Ah, cool, I'd forgotten about the algorithmic trick. But I don't seem to see you applying it. The pattern goes:

if (r[i] < r[i + 1])
{
    if (r[i] < r[min]) min = i;
    if (r[i + 1] > r[max]) max = i + 1;
}
else
{
    if (r[i + 1] < r[min]) min = i + 1;
    if (r[i] > r[max]) max = i;
}

It's actually one of the FB interview questions :). So yes I approve the addition, but for the life of me I don't see where you implement the correct algorithm above.

@wilzbach
Copy link
Contributor Author

wilzbach commented May 5, 2017

Ah, cool, I'd forgotten about the algorithmic trick. But I don't seem to see you applying it. The pattern goes:

Yeah I tried it (see benchmarks below) and the "stupid" way seems to be a lot faster.
FWIW in C++ GCC and CLang your pattern is used as well:

https://github.com/gcc-mirror/gcc/blob/gcc-7_1_0-release/libstdc%2B%2B-v3/include/bits/stl_algo.h#L3332
https://github.com/llvm-mirror/libcxx/blob/release_40/include/algorithm#L2662

I currently don't have time to dive more into it, but here are the benchmarks to compare the naive version and iteration in pairs:

RandomAccess

ldc -release -O5 -mcpu=native test.d && ./test
reduce!(min,max) = 7 secs, 722 ms, 265 μs, and 4 hnsecs
fold.minMax     = 5 secs, 770 ms, and 427 μs
minmaxElement   = 5 secs, 410 ms, and 31 μs
minmaxElementInPairs = 15 secs, 362 ms, 724 μs, and 3 hnsecs

InputRange

>  ldc -release -O5 -mcpu=native test.d && ./test
reduce!(min,max) = 7 secs, 215 ms, 601 μs, and 1 hnsec
fold.minMax     = 6 secs, 507 ms, 278 μs, and 5 hnsecs
minmaxElement   = 6 secs, 361 ms, 248 μs, and 8 hnsecs
minmaxElementInPairs = 12 secs, 344 ms, 548 μs, and 3 hnsecs

(updated code from above).

@andralex
Copy link
Member

andralex commented Jun 3, 2017

Heh, thanks @wilzbach. I've reproed your measurements. This is an interesting result. I'm not sure exactly what's going on yet. Take a look at https://godbolt.org/g/mXj3uW. There we have:

  • minmaxElementNoMap the brute force version. From what I can see ldc does some really awesome loop unrolling there,
  • minmaxElementNoMap2 cleverer version that saves on comparisons under the assumption that if something is smaller than the smallest, it can't be greater than the greatest. So it does statistically between n and 2 * n comparisons. I measured no performance difference on any input. I ascribe this to the fact that the comparisons are done in parallel. LDC again generates interesting code, though very different from the first!
  • minmaxElementNoMapInPairs the classic algorithm that does 3n/2 comparisons. It is indeed slower! LDC seems to generate much more conservative code.
  • minmaxElementNoMapInPairs2 an improved version that has a tighter loop. It does improve the situation but not by much.

I'd say let's add minMaxElement with the guaranteed 3n/2 comparisons (which may be arbitrarily expensive); otherwise there is no merit to it over reduce!(min, max). We may specialize it for certain data types and the default comparison to take the brute force approach.

Compiler experts @ibuclaw @JohanEngelen @klickverbot please take a look!

FWIW dmd generates equally good/bad code for minmaxElementNoMap2 and minmaxElementNoMapInPairs2. Didn't try gdc yet.

@dnadlinger
Copy link
Contributor

@andralex: Do you have a benchmark script for your experiments?

@wilzbach
Copy link
Contributor Author

wilzbach commented Jun 4, 2017

@andralex: Do you have a benchmark script for your experiments?

@klickverbot: sorry that my link was so hidden at the end of the post. It's here:

https://gist.github.com/wilzbach/3407d80bfa757d46a3ac59a873d5f085

@dnadlinger
Copy link
Contributor

dnadlinger commented Jun 4, 2017

@wilzbach: Thanks, but I was referring to Andrei's experiments in particular because I'm lazy. I guess I need to copy-paste over his code myself after all… ;)

@andralex
Copy link
Member

andralex commented Jun 4, 2017

@klickverbot pasted the mess here: https://dpaste.dzfl.pl/09fbcf17f932

@JohanEngelen
Copy link
Contributor

@andralex For the InPairs implementations, shouldn't the loop advance in pairs (+2)? (and then some extra work for odd length)

@andralex
Copy link
Member

andralex commented Jun 4, 2017

@JohanEngelen I ignored the odd elements case, it has no bearing on measuring efficiency. Per lines 124 and 202, I advance in 2 increments and look at i - 1 and i in one pass. If you find any bug I'd be relieved! I'm getting results that are difficult to interpret.

{
alias mapFun = unaryFun!map;
alias selectorFun = binaryFun!selector;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

assert(!selector(maxSeed, minSeed));

{
maxElement = r[i];
maxElementMapped = mapElement;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So here we should use the 3n/2 algorithm, even if technically slower for < and int. One great thing to do would be to specialize for this (and a few other) cases.

MapType mapElement1 = mapFun(rawElement1);
r.popFront();
// check if the range had an uneven amount of elements and thus has ended
if (r.empty)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUG: must return at the end of this if.

@andralex
Copy link
Member

andralex commented Jun 4, 2017

@wilzbach one more thing - where did you see the getpid trick in doNotOptimizeAway? I've seen it elsewhere, too, but forgot where. Thx!

@wilzbach
Copy link
Contributor Author

wilzbach commented Jun 4, 2017

@wilzbach one more thing - where did you see the getpid trick in doNotOptimizeAway? I've seen it elsewhere, too, but forgot where.

Ideally we get something like this trick into Phobos, s.t. users don't have to worry about it:

#5416

@JohanEngelen
Copy link
Contributor

@andralex The code you linked to (#4248 (comment), https://godbolt.org/g/mXj3uW ) does not do the +2.

@JohanEngelen
Copy link
Contributor

JohanEngelen commented Jun 5, 2017

@andralex Did you trying "caching" r.length for code like this:

for (size_t i = 0; i < r.length; i += 2)
{
    uint j = selectorFun(r[i], r[i + 1]);

in case the compiler cannot/doesnot deduce that selectorFun is not touching r.length?

@andralex
Copy link
Member

andralex commented Jun 5, 2017

@JohanEngelen tried that, makes no difference.

BUT! I found what seems to be an interesting performance bug. I stripped minmaxElementInPairsNoMap all the way down to this core loop:

    for (size_t i = 0; i < r.length; i += 2)
    {
    }

Even if it literally does nothing, it still takes more than 2 times longer than minmaxElementNoMap. If I change it to:

    for (size_t i = 0; i < r.length; i += 1)
    {
    }

So ldc does not generate good code for loops that advance in a non-unit increment. I think you'd improve the life of many if you looked into that!

@JohanEngelen
Copy link
Contributor

JohanEngelen commented Jun 5, 2017

@andralex That's because of integer overflow. If r.length == size_t.max (an odd number!), it's an infinite loop.
See https://godbolt.org/g/5RFNdt . With the early r.length == size_t.max return path, GDC is able to optimize all out. LDC isn't. (I filed ldc-developers/ldc#2154 )

Edit: This kind of stuff is a lot of fun to work on @andralex ! Especially with such supertiny test cases. Hope you remember to file such things in our bugtracker ;-) ;-)

@andralex
Copy link
Member

andralex commented Jun 6, 2017

@JohanEngelen fwiw I changed the implementation to use ++i for iteration up to r.length / 2 and used 2 * i and 2 * i + 1 as adjacent elements. Still no improvement.

@wilzbach so let's stay with the 3n/2 algo for this PR. Works?

@wilzbach wilzbach added the Review:Orphaned The author of the PR is no longer available and this PR can be adopted by anyone. label Jun 6, 2018
@wilzbach
Copy link
Contributor Author

wilzbach commented Jun 6, 2018

I never needed this and lost interest in pursuing this PR. Sorry

@wilzbach wilzbach closed this Jun 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merge:stalled Review:Needs Work Review:Orphaned The author of the PR is no longer available and this PR can be adopted by anyone. Severity:Enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants