@RazvanN7
Collaborator

No description provided.

@dlang-bot
Contributor

dlang-bot commented Aug 30, 2017

Thanks for your pull request, @RazvanN7! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.

Some tips to help speed things up:

  • smaller, focused PRs are easier to review than big ones

  • try not to mix up refactoring or style changes with bug fixes or feature enhancements

  • provide helpful commit messages explaining the rationale behind each change

Bear in mind that large or tricky changes may require multiple rounds of review and revision.

Please see CONTRIBUTING.md for more information.

Bugzilla references

Auto-close Bugzilla Description
9682 Propagate range sortedness property throughout Phobos algorithms

@JackStouffer
Contributor

JackStouffer commented Aug 30, 2017

The most apparent problem with this is: why stop at these two functions? Why not add this special case to every function which returns its own range type?

I'd rather leave these sorts of special cases up to the user, especially since this exact use case is one of the reasons assumeSorted was added.

@JackStouffer
Contributor

JackStouffer commented Aug 30, 2017

The other problem with this type of code is that, given code like this

auto r = range_thats_sorted.assumeSorted;
auto b = r.func().func().func().assumeSorted;

vs the following with your changes

auto r = range_thats_sorted.assumeSorted;
auto b = r.func().func().func();

b in the first version will have a type of

SortedRange!(RangeType!(RangeType!(RangeType!(SortedRange!(R)))))

and in your code it will be

SortedRange!(RangeType!(SortedRange!(RangeType!(SortedRange!(RangeType!(SortedRange!(R)))))))
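
The nesting difference can be reproduced with today's Phobos. This is a minimal sketch, assuming filter as a stand-in for func and emulating the PR's behaviour by calling assumeSorted by hand after every step; the variable names manual and automatic are illustrative only.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.range : SortedRange, assumeSorted;
import std.traits : isInstanceOf;

void main()
{
    auto r = [1, 2, 3, 4, 5, 6].assumeSorted;

    // Manual style: sortedness is re-asserted once, at the end of the chain.
    auto manual = r.filter!(a => a > 1)
                   .filter!(a => a % 2 == 0)
                   .assumeSorted;

    // Emulating the PR: every step re-wraps its result in SortedRange.
    auto automatic = r.filter!(a => a > 1).assumeSorted
                      .filter!(a => a % 2 == 0).assumeSorted;

    // Both are SortedRange instances, but with differently nested types.
    static assert(isInstanceOf!(SortedRange, typeof(manual)));
    static assert(isInstanceOf!(SortedRange, typeof(automatic)));
    static assert(!is(typeof(manual) == typeof(automatic)));

    // The elements produced are the same either way.
    assert(manual.equal(automatic));

    // Print the full type names at compile time for comparison.
    pragma(msg, typeof(manual));
    pragma(msg, typeof(automatic));
}
```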

@RazvanN7
Collaborator Author

@JackStouffer

I'd rather leave these sorts of special cases up to the user, especially since this exact use case is one of the reasons assumeSorted was added.

I think that most functions should preserve the type of the range that is passed as a parameter, although I agree that adding tests for each type of range is not the way to go.

The other problem with this type of code is that code like this

Yes, that bothers me too. I tried to find a way to get the underlying range of a SortedRange (without using release()) so that only Group/FilterResult would be a SortedRange (I even asked about this on Slack), but couldn't find any solution, so I thought I'd just make a PR and see what other folks think about this.

Should we close this and the bug report?

@JackStouffer
Contributor

I think that most functions should preserve the type of the range that is passed as a parameter, although I agree that adding tests for each type of range is not the way to go.

In practice I think this is impossible because you need to "overload" front et al. for custom behavior.

Should we close this and the bug report?

Yes, I would close that as won't fix. I believe this is one of those things that the user has to be in charge of.

Thanks for all of your work.

@PetarKirov
Member

@JackStouffer Propagating range sortedness is one of the biggest algorithmic optimizations that we can do in Phobos, so I strongly disagree that this is not an area worth pursuing. This is an area where D can do much better than other languages, and we should try to pursue it to the full extent possible.
Also, we (Phobos devs) are in a much better position to know which algorithm or range adaptor preserves this or that property. Given a long UFCS chain like r.func1().func2().func3().func4().func5(), do you really expect the average Joe to know where to put .assumeSorted? This approach just doesn't scale for large projects. Propagating sortedness is just like propagating forward, bidirectional and random access. Many standard libraries aren't interested in doing this, but we are.

@PetarKirov
Member

On the other hand, I'm also not 100% sold on the idea of adding .assumeSorted inside Phobos functions, though admittedly it requires little effort and in principle I don't think long type names are a problem.

Probably an approach like the one we use for infinite ranges would be worth pursuing.

auto someRangeAlgo(R)(R range)
{
    struct Result
    {
        R _range;

        static if (isStaticallyKnownToBeSorted!R)
        {
            enum isSorted = true;
            alias sortPredicate = getSortPredicate!R;

            // If applicable
            mixin SortedRangeAlgos!_range;
        }

        // front, popFront, empty, ...
    }

    return Result(range);
}

(Of course the whole static if block can be factored in a template mixin ;) )
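
A self-contained version of that idea might look as follows. This is only a sketch under stated assumptions: isStaticallyKnownToBeSorted is a hypothetical trait (it does not exist in Phobos) that here recognises SortedRange or a compile-time isSorted member, and evens is a made-up order-preserving adaptor standing in for someRangeAlgo. The sort predicate and the SortedRangeAlgos mixin from the outline above are left out.

```d
import std.algorithm.comparison : equal;
import std.range : SortedRange, assumeSorted;
import std.range.primitives : empty, front, popFront, isInputRange;
import std.traits : isInstanceOf;

// Hypothetical trait, not part of Phobos: true for SortedRange and for any
// range type that advertises `enum isSorted = true`.
template isStaticallyKnownToBeSorted(R)
{
    static if (isInstanceOf!(SortedRange, R))
        enum isStaticallyKnownToBeSorted = true;
    else static if (is(typeof(R.isSorted) : bool))
        enum bool isStaticallyKnownToBeSorted = R.isSorted;
    else
        enum isStaticallyKnownToBeSorted = false;
}

// Hypothetical order-preserving adaptor: keeps only the even elements.
auto evens(R)(R range)
if (isInputRange!R)
{
    static struct Result
    {
        private R _range;

        // Advertise sortedness only when the source range advertises it.
        static if (isStaticallyKnownToBeSorted!R)
            enum isSorted = true;

        this(R r) { _range = r; skipOdd(); }

        private void skipOdd()
        {
            while (!_range.empty && _range.front % 2 != 0)
                _range.popFront();
        }

        @property bool empty() { return _range.empty; }
        @property auto front() { return _range.front; }
        void popFront() { _range.popFront(); skipOdd(); }
    }

    return Result(range);
}

void main()
{
    auto sorted = evens([1, 2, 3, 4].assumeSorted);
    auto plain = evens([3, 1, 4, 2]);

    // The property travels through the adaptor without wrapping it again.
    static assert(isStaticallyKnownToBeSorted!(typeof(sorted)));
    static assert(!isStaticallyKnownToBeSorted!(typeof(plain)));

    assert(sorted.equal([2, 4]));
    assert(plain.equal([4, 2]));
}
```

With this shape the result type stays Result!(...) rather than SortedRange!(Result!(...)), so chaining several adaptors does not add a SortedRange layer per step.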

@JackStouffer
Contributor

JackStouffer commented Sep 1, 2017

@ZombineDev Been thinking about this some more. I think you're right in that there is value in propagating sortedness when there's a lot of room in Phobos for more sortedness optimizations.

I'll reopen this for further discussions. I'm still not really sold on the idea of doing things automatically, even if less experienced users won't get all of the benefits.

One other problem that any solution would need to address is non-pure predicates/lambdas in things like map. There's no guarantee that these functions aren't modifying the range down the line and making it non sorted.
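
To make the concern concrete, here is a small sketch with a deliberately impure predicate. The predicate and the data are made up for illustration; the point is only that filter itself drops nothing here, yet the output is no longer sorted.

```d
import std.algorithm.iteration : filter;
import std.algorithm.sorting : isSorted;
import std.array : array;

void main()
{
    int[] data = [1, 2, 3, 4];
    assert(data.isSorted);

    // Impure predicate: accepts every element, but overwrites the last
    // element of the underlying array as a side effect.
    auto filtered = data.filter!((a) { data[$ - 1] = 0; return true; });

    // By the time the last element is reached it has been changed.
    assert(filtered.array == [1, 2, 3, 0]);
    assert(!filtered.array.isSorted);
}
```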

I don't think long type names are a problem.

It's not just the names, you also balloon the size of the end result struct.

return FilterResult!(unaryFun!predicate, Range)(range);
import std.range : SortedRange, assumeSorted;
static if (is(Range : SortedRange!TT, TT))
return assumeSorted(FilterResult!(unaryFun!predicate, Range)(range));
Member

The sorting predicate should be propagated here, e.g. return FilterResult!(unaryFun!predicate, Range)(range).assumeSorted!TT;

static if (is(Range : SortedRange!TT, TT))
return assumeSorted(FilterResult!(unaryFun!predicate, Range)(range));
else
return FilterResult!(unaryFun!predicate, Range)(range);
Member

Code is repetitive here, should be something like:

auto result = FilterResult!(unaryFun!predicate, Range)(range);
static if (is(Range : SortedRange!order, order))
    return result.assumeSorted!order;
else
    return result;

return typeof(return)(r);
import std.range : SortedRange, assumeSorted;
static if (is(Range : SortedRange!TT, TT))
return assumeSorted(Group!(pred, Range)(r));
Member

same as above

@andralex
Member

andralex commented Sep 2, 2017

@ZombineDev @JackStouffer there are several current and future optional features of ranges. We currently fully support length as an "official" optional property, and dedicate code to supporting it. There is good reason for that - if lost, the property is virtually impossible to implement. In contrast, sortedness is easy to recover if lost by appending .assumeSorted to the resulting range. Assessing that the range remains sorted doesn't seem like a difficult task for the human programmer.
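
For reference, recovering sortedness by hand looks roughly like this. A minimal sketch, assuming map with an order-preserving function as the step that loses the property; the names xs, doubled and sorted are illustrative.

```d
import std.algorithm.iteration : map;
import std.range : assumeSorted;

void main()
{
    int[] xs = [1, 2, 3, 4, 5];

    // map returns a plain MapResult: the type no longer says "sorted",
    // even though doubling preserves the order.
    auto doubled = xs.map!(a => a * 2);

    // The programmer vouches for the ordering and gets SortedRange back,
    // including its binary-search based contains.
    auto sorted = doubled.assumeSorted;
    assert(sorted.contains(6));
    assert(!sorted.contains(7));
}
```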

Nevertheless, there is value in investigating the cost/value of propagating sortedness. @JackStouffer could you please make two lists - one with algorithms/ranges that preserve sortedness, and one with those that don't? Then we should have a good image of what it takes to implement this policy across Phobos. Thanks!

@JackStouffer
Contributor

I'm away from my laptop ATM, I'll take a look on Monday

JackStouffer self-assigned this Sep 2, 2017
@JackStouffer
Contributor

JackStouffer commented Sep 3, 2017

Reflecting on this a bit more, I did not understand the full implications of @ZombineDev's comment.

In the chain

auto r = knownSorted.assumeSorted();

r.filter!(func).find!("a == thing");

It's not obvious that find knows about sorted ranges and takes advantage of them. I was only thinking about long function chains in which you would've appended assumeSorted to the end because you know that the functions didn't change the ordering.

The user not knowing/caring about which functions do or do not take advantage of sorted information does present a problem of knowing when and when not to use assumeSorted.

However, the above problems still stand.

@RazvanN7
Collaborator Author

RazvanN7 commented Sep 4, 2017

We should take into account the fact that sometimes the range may be sorted with one predicate and the function is called with a different predicate. For example: [5, 4, 2].assumeSorted!"a > b".find!"a < b"(4). In most cases you can probably propagate sortedness safely only when the predicates match (which is tricky because you need to do function comparison).

@MetaLang
Member

MetaLang commented Sep 4, 2017

Currently it's not possible to compare predicates in the general case. For example,
is(typeof([0, 1, 2].sort!((a, b) => a > b)) == typeof([0, 1, 2].sort!((a, b) => a > b))) will return false because the compiler generates a unique lambda function for each. There's been talk of how to do it in the past but I don't think any of it was ever implemented.
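
A short sketch of the contrast: predicates given as strings or as named functions are the same template argument each time, so the resulting types do compare equal, whereas two textually identical lambda literals are separate symbols. The lambda comparison is only printed, not asserted, since its result is the compiler-dependent part; the function name greater is made up for the example.

```d
import std.algorithm.sorting : sort;

// Named predicate, hypothetical and for illustration only.
bool greater(int a, int b) { return a > b; }

// Same string predicate twice: one template instance, one type.
static assert(is(typeof([0, 1, 2].sort!"a > b"()) ==
                 typeof([0, 1, 2].sort!"a > b"())));

// Same named function twice: again one type.
static assert(is(typeof([0, 1, 2].sort!greater()) ==
                 typeof([0, 1, 2].sort!greater())));

// Two separate lambda literals: print what the compiler decides.
pragma(msg, is(typeof([0, 1, 2].sort!((a, b) => a > b)()) ==
               typeof([0, 1, 2].sort!((a, b) => a > b)())));

void main() {}
```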

One way it might be done is to add sort as a member function to SortedRange that returns the type UnsortedRange, thus disallowing any SortedRange-based operations (the member version of find has priority over the UFCS version). Then your example code would not incorrectly assume the range is sorted according to the second predicate.

@andralex
Member

andralex commented Sep 4, 2017

There are two predicates involved in [5, 4, 2].assumeSorted!"a > b".find!"a < b"(4). The first one is the sortedness, and does not change. The second one is the find predicate, which is distinct and does not intervene in the sorting order. So we're fine there. There are, however, cases when predicate comparison is necessary - we don't have a solution to that yet.
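
The two roles can be seen by running the example. A minimal sketch, assuming the non-default find predicate takes the ordinary linear path: "a > b" only describes how the range is ordered, while "a < b" is evaluated as pred(element, needle) during the search.

```d
import std.algorithm.comparison : equal;
import std.algorithm.searching : find;
import std.range : assumeSorted;

void main()
{
    // Sortedness predicate: the range is in descending order.
    auto r = [5, 4, 2].assumeSorted!"a > b";

    // Find predicate: stop at the first element e with e < 4.
    auto hit = r.find!"a < b"(4);

    // 5 < 4 and 4 < 4 are false, 2 < 4 is true.
    assert(hit.equal([2]));
}
```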

@RazvanN7
Collaborator Author

RazvanN7 commented Sep 4, 2017

@andralex

The second one is the find predicate, which is distinct and does not intervene in the sorting order. So we're fine there

I was trying to highlight the fact that in order to benefit from sortedness or propagate it we need to do predicate comparison. There is one case in which find uses the sortedness to its advantage, and that is only when the find predicate is the default one [1]; this in my opinion is a weak implementation (it is politically correct to say that, since it is mine). A powerful one should take into account the different combinations of find and sort predicates, and this point is valid for all the functions in Phobos which may benefit from or propagate sortedness.

[1] #4907

@edi33416
Contributor

@andralex

There are, however, cases when predicate comparison is necessary - we don't have a solution to that yet.

What if, instead of generating a new lambda every time, we make the compiler generate a named function for the given predicate and argument types and memoize it? Now, when it encounters a predicate, we first check the lookup table for the current predicate and argument types and we either point to the existing (previously generated function) or create a new one.

Just a thought, though I don't know if this covers all the cases, or how hard it would be to implement.

@RazvanN7
Collaborator Author

@edi33416 The problem with functions is that it is difficult to compare them for equality. How do you know for sure that f(x) is equal to g(x)? For simple predicates like "a < b" it is simple, but for more complex functions there is no trivial solution.

@MetaLang
Member

MetaLang commented Sep 17, 2017

@edi33416 unfortunately it's not so simple. Any possible solution will probably be very complicated and full of special cases if we allow impure functions to be compared. Not only do you have to know about the function, but also the context that it might be carrying with it. Maybe normalizing variable names to de Bruijn indices can help in most cases, but I think it'll be very hard to have a 100% solution. It's impossible in the general case because ultimately, I think checking two lambdas for equality reduces to the halting problem (f == g iff forall n f(n) == g(n), which would require running the functions, but we don't know if they'll terminate).

@JackStouffer
Contributor

@andralex Doing a quick scan of the docs for std.algorithm and std.range, here's what I've come up with.

Functions which preserve sortedness w/ pure lambdas

  1. filter
  2. all
  3. find (findSplit + before/after)
  4. cache
  5. uniq
  6. drop (+ variants)
  7. enumerate (maybe, depends on how you define sortedness of tuples)
  8. groupby (same problem)
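
A quick sanity check for a few entries of the list, as a sketch: it only verifies one sample input with pure predicates and is not a proof that these functions preserve order in general.

```d
import std.algorithm.iteration : filter, uniq;
import std.algorithm.sorting : isSorted;
import std.range : drop;

void main()
{
    auto sorted = [1, 2, 2, 3, 5, 8];
    assert(sorted.isSorted);

    // Each of these only removes elements, so relative order is kept.
    assert(sorted.filter!(a => a % 2 == 1).isSorted);
    assert(sorted.uniq.isSorted);
    assert(sorted.drop(2).isSorted);
}
```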

@edi33416
Contributor

@RazvanN7 @MetaLang Thank you for the explanation

@MetaLang
Member

MetaLang commented Sep 20, 2017

There is another method of checking whether two functions are equal that I didn't consider, and it doesn't reduce to the halting problem (I think). If two functions generate the same code (variable names aside), we should consider them equal. This is a different notion of function equality (it cares about how they're implemented rather than what they produce) and I don't know how practical it is, but it may be worth thinking about as well.
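
As an illustration of "same code, variable names aside", here is a toy sketch that compares two lambda terms structurally after replacing bound names with de Bruijn indices. Everything in it (the Term, Var, Lam and App classes and the deBruijn and alphaEquivalent functions) is hypothetical and unrelated to how the compiler represents lambdas; it checks syntactic equality only, not whether two differently written functions compute the same result.

```d
import std.conv : to;

// Tiny untyped lambda-calculus AST, purely for illustration.
abstract class Term {}

final class Var : Term
{
    string name;
    this(string n) { name = n; }
}

final class Lam : Term
{
    string param;
    Term inner;
    this(string p, Term b) { param = p; inner = b; }
}

final class App : Term
{
    Term fn, arg;
    this(Term f, Term a) { fn = f; arg = a; }
}

// Render a term with bound variables replaced by their binder distance,
// so that the chosen parameter names no longer matter.
string deBruijn(Term t, string[] bound = null)
{
    if (auto v = cast(Var) t)
    {
        foreach_reverse (i, name; bound)
            if (name == v.name)
                return "#" ~ to!string(bound.length - 1 - i);
        return v.name; // free variable: keep its name
    }
    if (auto l = cast(Lam) t)
        return "(\\ " ~ deBruijn(l.inner, bound ~ l.param) ~ ")";
    auto a = cast(App) t;
    return "(" ~ deBruijn(a.fn, bound) ~ " " ~ deBruijn(a.arg, bound) ~ ")";
}

bool alphaEquivalent(Term x, Term y)
{
    return deBruijn(x) == deBruijn(y);
}

// Builds the term for (p, q) => l > r, treating ">" as a free symbol.
Term greaterThan(string p, string q, string l, string r)
{
    return new Lam(p, new Lam(q,
        new App(new App(new Var(">"), new Var(l)), new Var(r))));
}

void main()
{
    auto f = greaterThan("a", "b", "a", "b"); // (a, b) => a > b
    auto g = greaterThan("x", "y", "x", "y"); // (x, y) => x > y
    auto h = greaterThan("a", "b", "b", "a"); // (a, b) => b > a

    assert(alphaEquivalent(f, g));  // differ only in parameter names
    assert(!alphaEquivalent(f, h)); // arguments swapped
}
```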

@andralex
Member

@MetaLang this has been discussed before (I think in the forum). Two functions can be comparable by means of the unification algorithm https://en.wikipedia.org/wiki/Unification_(computer_science). The implementation should allow alpha renaming http://wiki.c2.com/?AlphaEquivalence.

Could you please add a bugzilla? Thanks!

@andralex
Member

andralex commented Nov 6, 2017

Will not pursue the strategy of propagating sortedness in phobos.

andralex closed this Nov 6, 2017
7 participants