Skip to content

Conversation

@RazvanN7
Copy link
Collaborator

Added functionality so that find can take advantage of SortedRange's functions when:

  1. A single element is searched in a SortedRange.
  2. A a range is searched in another range and both ranges are SortedRange with the same signature

@dlang-bot
Copy link
Contributor

Fix Bugzilla Description
5236 [patch] std.format.formattedRead/unformatValue does not support the raw reading of integer types
8829 std.algorithm.find fails to take advantage of SortedRange

@RazvanN7 RazvanN7 changed the title Issue 8828 Issue 8829 Nov 17, 2016
Copy link
Member

@andralex andralex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good going, thanks for this work.

// Works only for the default find predicate and any SortedRange predicate.
// 8829 enhancement
import std.range: SortedRange;
static if(is(typeof(haystack) : SortedRange!TT, TT) && isDefaultPred)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after if

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already have the type of haystack: static if (is(InputRange : SortedRange!TT, TT) && isDefaultPred)

static if(is(typeof(haystack) : SortedRange!TT, TT) && isDefaultPred)
{
auto partitions = haystack.trisect(needle);
if(partitions[1].length == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after if (throughout please)

import std.range: SortedRange;
static if(is(typeof(haystack) : SortedRange!TT, TT) && isDefaultPred)
{
auto partitions = haystack.trisect(needle);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trisect does a bit more work than lower/upperBound, could you use one of those instead? Probably lowerBound because you want to find the first occurrence.

// When it is found O(walklength(needle)) steps are performed.
// 8829 enhancement
import std.range;
static if(is(typeof(haystack) == typeof(needle))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is(R1 == R2)

// 8829 enhancement
import std.range;
static if(is(typeof(haystack) == typeof(needle))
&& is(typeof(haystack) : SortedRange!TT, TT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is(R1 : SortedRange!TT, TT)

auto needleFirstElem = needle[0];
auto partitions = haystack.trisect(needleFirstElem);
auto firstElemLen = partitions[1].length;
int count = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_t

auto firstElemLen = partitions[1].length;
int count = 0;

if(firstElemLen == 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace most of the code below with a call to mismatch: https://dlang.org/library/std/algorithm/comparison/mismatch.html

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made all the changes. Hope it's ok now

@RazvanN7 RazvanN7 changed the title Issue 8829 Issue 8829 - std.algorithm.find fails to take advantage of SortedRange Dec 5, 2016
Copy link
Member

@andralex andralex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I think these are the right algorithms to use. Please make a pass for the potential bugs. Thx!

static if (is(InputRange : SortedRange!TT, TT) && isDefaultPred)
{
auto lb = haystack.lowerBound(needle);
if (lb.length == 0 || lb.length == haystack.length || haystack[lb.length] != needle)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you have lb.length == 0 there? It means there is no element less than the needle in the haystack, so the haystack may actually start with the needle. No?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that case has eluded me. The condition was for the case when the needle has a value which is smaller then the smallest value in the range. I added an additional condition which rules out the case you mentioned (lb.length == 0 && haystack[0] != needle).


while (needle.front() == needleFirstElem)
{
auto elem = needle.front();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need elem?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't. I used it in a previous version and forgot about it.


auto m = mismatch(partitions[2], needle);

if (m[1] == haystack[$ .. $])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (m[1].empty)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


if (m[1] == haystack[$ .. $])
return haystack[partitions[0].length + partitions[1].length - count .. $];
if (m[1] != haystack[$ .. $])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (m[1].empty)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

++count;

if (count > firstElemLen)
return haystack[$ .. $];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is not covered, please add to the unittest

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

auto lb = haystack.lowerBound(needle);
if ((lb.length == 0 && haystack[0] != needle) || lb.length == haystack.length
|| haystack[lb.length] != needle)
return haystack[$ .. $];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Odd alignment - first, there's one character difference which is definitely a problem (all indentation levels should be a multiple of 4). Second, code is (approximately) at the same level as an unrelated expression continuation, which can't be right.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This expression is convoluted:

(lb.length == 0 && haystack[0] != needle) || lb.length == haystack.length
           || haystack[lb.length] != needle

Simplified:

(x == 0 && haystack[x] != needle) || x == y || haystack[x] != needle

Reordered:

x == y || (x == 0 && haystack[x] != needle) || haystack[x] != needle

And this exposes the redundancy:

x == y || haystack[x] != needle

In fact the more complicated condition had a bug - you're accessing haystack[0] without protection so you'll have a bounds check failure if the haystack is empty.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wilzbach can you add to the style checker that we detect all indents that are not a multiple of 4?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks. Indeed, the code is much cleaner now.


if (m[1].empty)
return haystack[partitions[0].length + partitions[1].length - count .. $];
if (!m[1].empty)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition is the complement of the previous one, which if passes will terminate the flow. Remove.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it because I had some problems in the past with static ifs (the code which followd my block was declared unreachable), but I solved now. Thanks

@andralex andralex merged commit 98a7c44 into dlang:master Dec 6, 2016
@nordlow
Copy link
Contributor

nordlow commented Dec 7, 2016

Here are some more algorithms that should be specialized: #3534

The algorithms isSorted and {min,max}Element should be the lowest hanging fruits.

@schuetzm
Copy link
Contributor

schuetzm commented Dec 9, 2016

This should have been squashed before pulling.

@andralex
Copy link
Member

andralex commented Dec 9, 2016

@schuetzm can we do it postmortem? cc @CyberShadow

@wilzbach
Copy link
Contributor

wilzbach commented Dec 9, 2016

@schuetzm can we do it postmortem? cc @CyberShadow

Nope rebasing the git history is no option, but maybe we can at least learn from these events? :)

@CyberShadow
Copy link
Member

No, we can't.

Don't worry about it. Most of the problems of not squishing commits are when some changes are done in one commit and then undone in another, as that creates noise for blame. Here it seems like most of the commits were fixups on the code added in the first commit, so although a neater history would've been nice, it's not a big deal.

@andralex
Copy link
Member

andralex commented Dec 9, 2016

OK thank you all!

@wilzbach
Copy link
Contributor

FYI this uses string comparisons to check whether a range isSorted:

    static if (is(typeof(pred == "a == b")))
        enum isDefaultPred = pred == "a == b";
    else
        enum isDefaultPred = false;

...

    static if (is(InputRange : SortedRange!TT, TT) && isDefaultPred)

So:

  • it won't work if the user uses a lambda and I thought we wanted to move away from the string lambda
  • it won't work if it's applied from another range function that doesn't modify the sortedness attribute (e.g. take)

I opened a bug for using string comparisons and another for propagating sortedness, s.t. it's not forgotten :/

@andralex
Copy link
Member

@wilzbach cool - the better solution is to use overloads that detect "no lambda has been passed"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants