Fix Issue 9682 - group/filter should return a SortedRange when they are passed one #5712

RazvanN7 · 2017-08-30T10:35:53Z

No description provided.


dlang-bot · 2017-08-30T10:35:54Z

Thanks for your pull request, @RazvanN7! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.

Some tips to help speed things up:

smaller, focused PRs are easier to review than big ones
try not to mix up refactoring or style changes with bug fixes or feature enhancements
provide helpful commit messages explaining the rationale behind each change

Bear in mind that large or tricky changes may require multiple rounds of review and revision.

Please see CONTRIBUTING.md for more information.

Bugzilla references

Auto-close	Bugzilla	Description
✓	9682	Propagate range sortedness property throughout Phobos algorithms

JackStouffer · 2017-08-30T16:49:32Z

The most apparent problem with this is: why stop at these two functions? Why not add this special case to every function that returns its own range type?

I'd rather leave these sorts of special cases up to the user, especially since this exact use case is one of the reasons assumeSorted was added.

JackStouffer · 2017-08-30T16:56:38Z

The other problem with this type of code is that code like this

auto r = range_thats_sorted.assumeSorted;
auto b = r.func().func().func().assumeSorted;

vs the following with your changes

auto r = range_thats_sorted.assumeSorted;
auto b = r.func().func().func();

is that b in the first version will have a type of

SortedRange!(RangeType!(RangeType!(RangeType!(SortedRange!(R)))))

and in your code it will be

SortedRange!(RangeType!(SortedRange!(RangeType!(SortedRange!(RangeType!(SortedRange!(R)))))))
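To make the nesting concrete, here is a minimal sketch of the user-driven approach in current Phobos. The predicate isEven is a hypothetical stand-in for func. filter returns its own FilterResult type. Re-applying assumeSorted wraps that FilterResult, which still wraps the original SortedRange.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.range : SortedRange, assumeSorted;
import std.traits : isInstanceOf;

// Hypothetical predicate standing in for `func` in the chain above.
bool isEven(int x) { return x % 2 == 0; }

void main()
{
    auto r = [1, 2, 3, 4, 5, 6].assumeSorted;

    // filter returns its own FilterResult type, so the SortedRange wrapper is gone.
    auto f = r.filter!isEven;
    static assert(!isInstanceOf!(SortedRange, typeof(f)));

    // Restoring sortedness by hand wraps the FilterResult, which itself still
    // wraps the original SortedRange: roughly
    // SortedRange!(FilterResult!(isEven, SortedRange!(int[], "a < b")), "a < b").
    auto b = f.assumeSorted;
    static assert(isInstanceOf!(SortedRange, typeof(b)));
    static assert(is(typeof(b.release()) == typeof(f)));

    assert(b.equal([2, 4, 6]));
}
```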

RazvanN7 · 2017-08-31T08:56:04Z

@JackStouffer

I'd rather leave these sorts of special cases up to the user, especially since this exact use case is one of the reasons assumeSorted was added.

I think that most functions should preserve the type of range that is passed as a parameter, although I agree that adding tests for each type of range is not the way to go.

The other problem with this type of code is that code like this

Yes, that bothers me too. I tried to find a way to get the underlying range of a SortedRange (without using release()) so that only Group/FilterResult would be a SortedRange (I even asked about this on Slack), but couldn't find any solution, so I thought I'd just make a PR and see what other folks think about this.

Should we close this and the bug report?

JackStouffer · 2017-08-31T12:54:42Z

I think that most functions should preserve the type of range that is passed as a parameter, although I agree that adding tests for each type of range is not the way to go.

In practice I think this is impossible because you need to "overload" front et al. for custom behavior.

Should we close this and the bug report?

Yes, I would close that as won't fix. I believe this is one of those things that the user has to be in charge of.

Thanks for all of your work.

PetarKirov · 2017-09-01T09:29:01Z

@JackStouffer Propagating range sortedness is one of the biggest algorithmic optimizations that we can do in Phobos, so I strongly disagree that this area is not worth pursuing. This is an area where D can do much better than other languages, and we should pursue it to the full extent possible.
Also, we (Phobos devs) are in a much better position to know which algorithm or range adaptor preserves which property. Given a long UFCS chain like r.func1().func2().func3().func4().func5(), do you really expect the average Joe to know where to put .assumeSorted? This approach just doesn't scale for large projects. Propagating sortedness is just like propagating forward, bidirectional and random access. Many standard libraries aren't interested in doing this, but we are.

PetarKirov · 2017-09-01T09:52:46Z

On the other hand, I'm also not 100% sold on the idea of adding .assumeSorted inside Phobos functions, though admittedly it requires little effort and in principle I don't think long type names are a problem.

An approach like the one we use for infinite ranges would probably be worth pursuing.

auto someRangeAlgo(R)(R range)
{
    struct Result
    {
        R _range;

        static if (isStaticallyKnownToBeSorted!R)
        {
            enum isSorted = true;
            alias sortPredicate = getSortPredicate!R;

            // If applicable
            mixin SortedRangeAlgos!_range;
        }

        // front, popFront, empty, ...
    }

    return Result(range);
}

(Of course the whole static if block can be factored in a template mixin ;) )
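Below is a runnable sketch of that idea, under some assumptions. isStaticallyKnownToBeSorted, getSortPredicate, someRangeAlgo and recoverSorted are hypothetical names, and none of them exist in Phobos. SortedRangeAlgos is left out. The predicate of a Phobos SortedRange is read with std.traits.TemplateArgsOf.

The adaptor filters elements and advertises isSorted/sortPredicate only when its source is known to be sorted. A consumer can then re-wrap it without the caller writing .assumeSorted.

```d
import std.algorithm.comparison : equal;
import std.functional : binaryFun;
import std.range : SortedRange, assumeSorted;
import std.range.primitives : isInputRange;
import std.traits : TemplateArgsOf, isInstanceOf;

// Hypothetical trait (not part of Phobos): true for a Phobos SortedRange or
// for any range type that advertises `enum isSorted = true`.
template isStaticallyKnownToBeSorted(R)
{
    static if (isInstanceOf!(SortedRange, R))
        enum isStaticallyKnownToBeSorted = true;
    else static if (__traits(hasMember, R, "isSorted"))
        enum isStaticallyKnownToBeSorted = R.isSorted;
    else
        enum isStaticallyKnownToBeSorted = false;
}

// Hypothetical helper: the ordering predicate of a statically sorted range.
// For SortedRange!(Range, pred, ...) the predicate is the second template argument.
template getSortPredicate(R)
{
    static if (isInstanceOf!(SortedRange, R))
        alias getSortPredicate = binaryFun!(TemplateArgsOf!R[1]);
    else
        alias getSortPredicate = R.sortPredicate;
}

// Hypothetical adaptor shaped like the sketch above: it keeps the elements
// for which `pred` holds and advertises sortedness only when R is known sorted.
auto someRangeAlgo(alias pred, R)(R range)
if (isInputRange!R)
{
    struct Result
    {
        R _range;

        static if (isStaticallyKnownToBeSorted!R)
        {
            enum isSorted = true;
            alias sortPredicate = getSortPredicate!R;
        }

        this(R r)
        {
            _range = r;
            skipRejected();
        }

        // Advance past elements that the filtering predicate rejects.
        private void skipRejected()
        {
            while (!_range.empty && !pred(_range.front))
                _range.popFront();
        }

        @property bool empty() { return _range.empty; }
        @property auto front() { return _range.front; }
        void popFront()
        {
            _range.popFront();
            skipRejected();
        }
    }

    return Result(range);
}

// Hypothetical consumer: re-wraps a range that advertises sortedness so that
// SortedRange's members become available without a manual .assumeSorted.
auto recoverSorted(R)(R r)
{
    static if (isStaticallyKnownToBeSorted!R && !isInstanceOf!(SortedRange, R))
        return assumeSorted!(getSortPredicate!R)(r);
    else
        return r;
}

bool isEven(int x) { return x % 2 == 0; }

void main()
{
    auto sorted = [1, 2, 3, 4, 5, 6].assumeSorted;
    auto evens = sorted.someRangeAlgo!isEven;
    static assert(isStaticallyKnownToBeSorted!(typeof(evens)));

    auto plain = [6, 1, 4].someRangeAlgo!isEven;
    static assert(!isStaticallyKnownToBeSorted!(typeof(plain)));

    auto rewrapped = recoverSorted(evens);
    static assert(isInstanceOf!(SortedRange, typeof(rewrapped)));
    assert(rewrapped.equal([2, 4, 6]));
}
```

In this sketch, sortedness travels as static metadata on the adaptor's own type rather than as a SortedRange wrapper. That avoids stacking SortedRange!(...SortedRange!(...)) types. Recovering SortedRange's members is left to whichever consumer actually needs them.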

JackStouffer · 2017-09-01T15:29:36Z

@ZombineDev Been thinking about this some more. I think you're right that there is value in propagating sortedness, since there's a lot of room in Phobos for more sortedness optimizations.

I'll reopen this for further discussion. I'm still not really sold on the idea of doing things automatically, even if less experienced users won't get all of the benefits.

One other problem that any solution would need to address is impure predicates/lambdas in things like map. There's no guarantee that these functions aren't modifying the range down the line and making it unsorted.

I don't think long type names are a problem.

It's not just the names, you also balloon the size of the end result struct.

andralex · 2017-09-01T18:38:29Z

std/algorithm/iteration.d

-        return FilterResult!(unaryFun!predicate, Range)(range);
+        import std.range : SortedRange, assumeSorted;
+        static if (is(Range : SortedRange!TT, TT))
+            return assumeSorted(FilterResult!(unaryFun!predicate, Range)(range));


The sorting predicate should be propagated here, e.g. return FilterResult!(unaryFun!predicate, Range)(range).assumeSorted!TT;

andralex · 2017-09-01T18:39:59Z

std/algorithm/iteration.d

+        static if (is(Range : SortedRange!TT, TT))
+            return assumeSorted(FilterResult!(unaryFun!predicate, Range)(range));
+        else
+            return FilterResult!(unaryFun!predicate, Range)(range);


The code is repetitive here; it should be something like:

auto result = FilterResult!(unaryFun!predicate, Range)(range);
static if (is(Range : SortedRange!order, order))
    return result.assumeSorted!order;
else
    return result;
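Here is a minimal sketch of the suggested shape, with the predicate actually propagated. One detail matters: in SortedRange!(R, pred) the first template argument is the wrapped range type, so the ordering predicate has to be read from the second argument. The sketch assumes std.traits.TemplateArgsOf for that. filterKeepingOrder is a hypothetical name used only for illustration.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.range : SortedRange, assumeSorted;
import std.traits : TemplateArgsOf, isInstanceOf;

// Hypothetical wrapper (not in Phobos): build the filtered range once, then
// re-wrap it with the *input's* ordering predicate instead of the default.
auto filterKeepingOrder(alias predicate, Range)(Range range)
{
    auto result = range.filter!predicate;
    static if (isInstanceOf!(SortedRange, Range))
        // SortedRange!(R, pred, ...): argument 0 is the range type, 1 is pred.
        return result.assumeSorted!(TemplateArgsOf!Range[1]);
    else
        return result;
}

bool isOdd(int x) { return x % 2 != 0; }

void main()
{
    auto desc = [9, 8, 7, 4, 3].assumeSorted!"a > b";
    auto odds = desc.filterKeepingOrder!isOdd;
    static assert(isInstanceOf!(SortedRange, typeof(odds)));
    // The descending predicate was carried over rather than reset to "a < b".
    static assert(TemplateArgsOf!(typeof(odds))[1] == "a > b");
    assert(odds.equal([9, 7, 3]));

    auto plain = [3, 1, 2].filterKeepingOrder!isOdd;
    static assert(!isInstanceOf!(SortedRange, typeof(plain)));
}
```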

andralex · 2017-09-01T18:40:56Z

std/algorithm/iteration.d

-    return typeof(return)(r);
+    import std.range : SortedRange, assumeSorted;
+    static if (is(Range : SortedRange!TT, TT))
+        return assumeSorted(Group!(pred, Range)(r));


same as above

andralex · 2017-09-02T15:11:14Z

@ZombineDev @JackStouffer there are several current and future optional features of ranges. We currently fully support length as an "official" optional property, and dedicate code to supporting it. There is good reason for that - if lost, the property is virtually impossible to implement. In contrast, sortedness is easy to recover if lost by appending .assumeSorted to the resulting range. Assessing that the range remains sorted doesn't seem like a difficult task for the human programmer.

Nevertheless, there is value in investigating the cost/value of propagating sortedness. @JackStouffer could you please make two lists - one with algorithms/ranges that preserve sortedness, and one with those that don't? Then we should have a good image of what it takes to implement this policy across Phobos. Thanks!

JackStouffer · 2017-09-02T22:34:34Z

I'm away from my laptop ATM, I'll take a look on Monday

JackStouffer · 2017-09-03T02:16:50Z

Reflecting on this a bit more, I did not understand the full implications of @ZombineDev's comment.

In the chain

auto r = knownSorted.assumeSorted();

r.filter!(func).find!("a == thing");

It's not obvious that find knows about sorted ranges and takes advantage of them. I was only thinking about long function chains in which you would've appended assumeSorted to the end because you know that the functions didn't change the ordering.

The user not knowing or caring about which functions do or do not take advantage of sortedness information does present a problem: knowing when and when not to use assumeSorted.
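The following sketch shows what is at stake in such a chain. A SortedRange exposes ordering-based members such as contains and lowerBound (binary search). Once filter wraps it, the static type no longer records sortedness. Re-asserting with assumeSorted brings the SortedRange wrapper back, but lowerBound also needs slicing, which FilterResult does not provide.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.range : SortedRange, assumeSorted;
import std.traits : isInstanceOf;

void main()
{
    auto r = [1, 3, 5, 7, 9].assumeSorted;

    // SortedRange offers members that exploit the ordering (binary search).
    assert(r.contains(7));
    assert(r.lowerBound(5).equal([1, 3]));

    // After filter, the static type no longer says "sorted", so those
    // members are simply not there, even though the order is unchanged.
    auto f = r.filter!(x => x > 1);
    static assert(!__traits(hasMember, typeof(f), "lowerBound"));

    // Re-asserting by hand restores the SortedRange wrapper, but lowerBound
    // also needs slicing, which FilterResult does not provide.
    auto g = f.assumeSorted;
    static assert(isInstanceOf!(SortedRange, typeof(g)));
    static assert(!__traits(compiles, g.lowerBound(5)));
}
```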

However, the above problems still stand.

RazvanN7 · 2017-09-04T07:06:26Z

We should take into account the fact that sometimes the range may be sorted with one predicate and the function is called with a different predicate. For example: [5, 4, 2].assumeSorted!"a > b".find!"a < b"(4). In most cases you can probably safely propagate sortedness only when the predicates match (which is tricky because it requires comparing functions).

MetaLang · 2017-09-04T13:41:06Z

Currently it's not possible to compare predicates in the general case. For example,
is(typeof([0, 1, 2].sort!((a, b) => a > b)) == typeof([0, 1, 2].sort!((a, b) => a > b))) will return false because the compiler generates a unique lambda function for each. There's been talk of how to do it in the past but I don't think any of it was ever implemented.

One way to do it might be to add sort as a member function of SortedRange that returns an UnsortedRange type, thus disallowing any SortedRange-based operations (the member version of find has priority over the UFCS version). Then your example code would not incorrectly assume the range is sorted according to the second predicate.

andralex · 2017-09-04T13:54:16Z

There are two predicates involved in [5, 4, 2].assumeSorted!"a > b".find!"a < b"(4). The first one is the sortedness, and does not change. The second one is the find predicate, which is distinct and does not intervene in the sorting order. So we're fine there. There are, however, cases when predicate comparison is necessary - we don't have a solution to that yet.
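The example can be spelled out as a runnable sketch. The sorting predicate "a > b" lives in the SortedRange type. find's "a < b" is only applied element by element to locate the first element smaller than 4.

```d
import std.algorithm.comparison : equal;
import std.algorithm.searching : find;
import std.range : assumeSorted;

void main()
{
    // The sorting predicate ("a > b", descending) is part of the range's type.
    auto r = [5, 4, 2].assumeSorted!"a > b";

    // find's own predicate is applied element by element: it returns the
    // suffix starting at the first element e for which e < 4 holds.
    auto hit = r.find!"a < b"(4);
    assert(hit.equal([2]));

    // find returns the haystack's type, so the result is still a SortedRange
    // ordered by "a > b"; the search predicate did not touch the ordering.
    static assert(is(typeof(hit) == typeof(r)));
}
```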

RazvanN7 · 2017-09-04T14:05:44Z

@andralex

The second one is the find predicate, which is distinct and does not intervene in the sorting order. So we're fine there

I was trying to highlight the fact that, in order to benefit from sortedness or propagate it, we need to compare predicates. There is only one case in which find takes advantage of sortedness, and that is when the sort predicate is the default one [1]. In my opinion this is a weak implementation (it is politically correct to say that, since it is mine). A powerful one should take into account the different combinations of find and sort predicates, and this point applies to all the functions in Phobos that may benefit from or propagate sortedness.

[1] #4907

edi33416 · 2017-09-17T12:06:47Z

@andralex

There are, however, cases when predicate comparison is necessary - we don't have a solution to that yet.

What if, instead of generating a new lambda every time, we make the compiler generate a named function for the given predicate and argument types and memoize it? Then, when it encounters a predicate, it first checks the lookup table for the current predicate and argument types and either points to the existing (previously generated) function or creates a new one.

Just a thought, though I don't know if this covers all the cases, or how hard it would be to implement.

RazvanN7 · 2017-09-17T17:25:41Z

@edi33416 The problem with functions is that it is difficult to compare them for equality. How do you know for sure that f(x) is equal to g(x)? For simple predicates like "a < b" it is simple, but for more complex functions there is no trivial solution.

MetaLang · 2017-09-17T23:40:55Z

@edi33416 unfortunately it's not so simple. Any possible solution will probably be very complicated and full of special cases if we allow impure functions to be compared. Not only do you have to know about the function, but also the context it might be carrying with it. Maybe normalizing variable names to de Bruijn indices can help in most cases, but I think it'll be very hard to have a 100% solution. It's impossible in the general case because, ultimately, I think checking two lambdas for equality reduces to the halting problem (f == g iff for all n, f(n) == g(n), which would require running the functions, but we don't know if they'll terminate).

JackStouffer · 2017-09-18T16:52:33Z

@andralex Doing a quick scan of the docs for std.algorithm and std.range, here's what I've come up with.

Functions which preserve sortedness w/ pure lambdas

filter
all
find (findSplit + before/after)
cache
uniq
drop (+ variants)
enumerate (maybe, depends on how you define sortedness of tuples)
groupby (same problem)
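Here is a quick runtime spot check of a few items from this list: filter, uniq, drop, cache and find. It assumes plain int arrays and pure lambdas. It only confirms that the outputs stay ordered for this particular input and is not a static guarantee.

```d
import std.algorithm.iteration : cache, filter, uniq;
import std.algorithm.searching : find;
import std.algorithm.sorting : isSorted;
import std.range : drop;

void main()
{
    auto data = [1, 2, 2, 3, 5, 8, 8, 13];
    assert(data.isSorted);

    // Each adaptor only removes or re-exposes elements and never reorders them,
    // so a sorted input yields a sorted output under the same predicate.
    assert(data.filter!(x => x % 2 == 1).isSorted);
    assert(data.uniq.isSorted);
    assert(data.drop(3).isSorted);
    assert(data.cache.isSorted);
    assert(data.find(5).isSorted);
}
```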

edi33416 · 2017-09-19T14:30:52Z

@RazvanN7 @MetaLang Thank you for the explanation

MetaLang · 2017-09-20T03:18:16Z

There is another method of checking whether two functions are equal that I didn't consider, and it doesn't reduce to the halting problem (I think). If two functions generate the same code (variable names aside), we should consider them equal. This is a different notion of function equality (it cares about how they're implemented rather than what they produce) and I don't know how practical it is, but it may be worth thinking about as well.

andralex · 2017-09-20T05:30:55Z

@MetaLang this has been discussed before (I think in the forum). Two functions can be comparable by means of the unification algorithm https://en.wikipedia.org/wiki/Unification_(computer_science). The implementation should allow alpha renaming http://wiki.c2.com/?AlphaEquivalence.

Could you please add a bugzilla? Thanks!

andralex · 2017-11-06T15:35:51Z

Will not pursue the strategy of propagating sortedness in Phobos.

Fix Issue 9682 - group(SortedRange) ==> SortedRange, filter(SortedRange) ==> SortedRange

6dde17b

RazvanN7 requested review from JackStouffer, PetarKirov, andralex and wilzbach as code owners August 30, 2017 10:35

dlang-bot added the Severity:Enhancement label Aug 30, 2017

JackStouffer closed this Aug 31, 2017

JackStouffer reopened this Sep 1, 2017

JackStouffer added the Review:Needs Review label Sep 1, 2017

andralex reviewed Sep 1, 2017


JackStouffer self-assigned this Sep 2, 2017

andralex closed this Nov 6, 2017

7 participants