Implement find_first/last, position_first/last #189

schuster · 2016-12-20T16:22:44Z

Fixes #151

This is my first PR to Rayon, so any and all feedback is welcome.

cuviper · 2016-12-20T21:10:10Z

It looks like you're missing some imports for the stable compilers, possibly related to ordering like #173.
(IOW, try adding use super::internal::*; before use super::*;)

cuviper

This looks really nice! I have only a few comments...

cuviper · 2016-12-21T07:40:47Z

src/par_iter/find_first_last.rs

+        // divide the range in half
+        let old_lower_bound = self.lower_bound.get();
+        let median = old_lower_bound + ((self.upper_bound - old_lower_bound) / 2);
+        self.lower_bound.set(median);


So, it's important to your algorithm that the result from split_off is always used for the left/lower side by whatever is calling it. Near as I can tell, that's always true in practice, but I don't think we've ever specified it must be. It's just generally convenient to call in order (consumer.split_off(), consumer) when splitting in two.

I don't think there's a problem here, but we need to be careful, and this deserves a comment at least.

Perhaps we can rename split_off() to split_left() or split_off_left() or split_left_off(), something like that.

Good point; I'll add a comment. +1 to renaming split_off().

cuviper · 2016-12-21T07:44:39Z

src/par_iter/find_first_last.rs

+            // before, depending on timing. This means more consumers will
+            // continue to run than necessary, but the reducer will still ensure
+            // the correct value is returned.
+            self.best_found.swap(self.boundary, Ordering::Relaxed);


I think we can do better than that with a compare_and_swap loop, repeating until you either get a successful swap or find that someone else swapped something "better" than you anyway.

Yes, seems like it'd be worth using a compare-exchange here. I'd probably use compare_exchange_weak, but that's a minor thing.

I was indeed wondering whether a loop like you describe would be better. I'll write that up and push it.

cuviper · 2016-12-21T07:47:08Z

src/par_iter/find_first_last.rs

+    fn full(&self) -> bool {
+        let best_found = self.best_found.load(Ordering::Relaxed);
+        match self.match_position {
+            MatchPosition::Leftmost => best_found < self.boundary,


For Leftmost only, isn't self.item.is_some() also sufficient?

For that matter, consume probably needs to make sure that Leftmost doesn't clobber any existing self.item. We treat full as a hint, but consume should ensure its own correctness when that's important. Maybe try some tests with many successive candidates that will hit the same folder, perhaps even find_first(|| true).

Wait, am I confused? Why is self.item() sufficient? Don't we want to also stop if some other folder (to our left) has found an item?

@nikomatsakis I think he's saying it's also sufficient, which is correct (and a case I didn't think about). If we've already found the item, we need to make sure we stop before we find some other item slightly to the right.

So @cuviper, I think you're right that that logic should be added to both full and consume.

Right, it's an additional case for stopping.

yes, ok. I agree (this seems to relate to having some tests that cover this case)

nikomatsakis · 2016-12-21T09:49:35Z

src/par_iter/find_first_last.rs

+use super::*;
+use super::len::*;
+
+// The consumer for find_first/find_last has fake indexes representing the lower


👍 to leaving some comments on how things work =)

nikomatsakis · 2016-12-21T09:50:03Z

src/par_iter/find_first_last.rs

+// consumer's position relative to other consumers. The purpose is to allow a
+// consumer to know it should stop consuming items when another consumer finds a
+// better match.
+


Total nit: I'd like a // here to indicate that these are two paragraphs in the same comment.

nikomatsakis · 2016-12-21T09:50:41Z

src/par_iter/find_first_last.rs

+// don't implement that for now. The only downside of the current approach is
+// that in some cases, iterators very close to each other will have the same
+// range and therefore not be able to stop processing if one of them finds a
+// better match than the others.


Hmm, can you elaborate here?

OK, I read the code now, so I understand, but I think an example would be great. Basically just saying: we start out with an iterator over 0..usize::max, then split this range in half, and so forth.

Also, should we use u64 just to get that much more resolution?

Also, I think we need a test for the case where we have an iterator that exceeds the resolution of the false indices. Or at least some careful comments explaining why it is correct. =)

Thanks, I'll work on that explanation.

I think AtomicU64 is not stable yet (tracked here). Would it be worth postponing merging this PR until it's stable? Of course, we should be able to make that change later, because the exact size of the integer used is private to this module.

I agree that a test would be useful to validate my hand-waving in the comments. Any thoughts on how to write that test? I don't yet know enough about the splitting logic to know how to force that case. One possibility is to start with a FindConsumer whose upper and lower bounds are the same, but that requires either making the FindConsumer struct public so it can be used in the test module, or writing a test directly in this module (which none of the rest of this project does).

I think AtomicU64 is not stable yet (tracked here). Would it be worth postponing merging this PR until it's stable? Of course, we should be able to make that change later, because the exact size of the integer used is private to this module.

Nah, just use a usize. It'll make it easier to write a test. :)

As for the test... well, you could... perhaps write something that drives the splitting by hand? i.e., take a u64 range, create a producer from it, and manually call split etc? The test can be internal to this module if it needs access to private state...

...at minimum some thorough comments for how that case is handled would be a good start.

nikomatsakis · 2016-12-21T09:54:41Z

src/par_iter/find_first_last.rs

+        // divide the range in half
+        let old_lower_bound = self.lower_bound.get();
+        let median = old_lower_bound + ((self.upper_bound - old_lower_bound) / 2);
+        self.lower_bound.set(median);


Perhaps we can rename split_off() to split_left() or split_off_left() or split_left_off(), something like that.

nikomatsakis · 2016-12-21T09:58:39Z

src/par_iter/find_first_last.rs

+            // before, depending on timing. This means more consumers will
+            // continue to run than necessary, but the reducer will still ensure
+            // the correct value is returned.
+            self.best_found.swap(self.boundary, Ordering::Relaxed);


Yes, seems like it'd be worth using a compare-exchange here. I'd probably use compare_exchange_weak, but that's a minor thing.

nikomatsakis · 2016-12-21T10:01:03Z

src/par_iter/find_first_last.rs

+    fn full(&self) -> bool {
+        let best_found = self.best_found.load(Ordering::Relaxed);
+        match self.match_position {
+            MatchPosition::Leftmost => best_found < self.boundary,


Wait, am I confused? Why is self.item() sufficient? Don't we want to also stop if some other folder (to our left) has found an item?

nikomatsakis · 2016-12-21T10:03:14Z

src/par_iter/mod.rs

+        where FIND_OP: Fn(&Self::Item) -> bool + Sync {
+        find_first_last::find_last(self, predicate)
+    }
+
    #[doc(hidden)]
    #[deprecated(note = "parallel `find` does not search in order -- use `find_any`")]


let's update this deprecated note to say: "use find_any, find_first, or find_last" -- we might also consider calling find_first(), not sure.

nikomatsakis · 2016-12-21T10:05:31Z

This is nice =)

schuster · 2016-12-23T16:31:54Z

Just pushed with updates to address all feedback.

schuster · 2016-12-23T17:12:54Z

Actually, I just found a bug in the range calculation. Stand by for a fix.

schuster · 2016-12-23T18:16:48Z

Fixed. Turned out to be a problem with the test itself, not the code.

nikomatsakis

These changes look great! I left a few nits.

nikomatsakis · 2016-12-29T12:33:06Z

src/par_iter/find_first_last.rs

+// (i.e. its length is 1), in which case the split returns two consumers with
+// the same range. In that case both consumers will continue to consume all
+// their data regardless of whether a better match is found, but the reducer
+// will still return the correct answer.


Great comment =) Very clear.

nikomatsakis · 2016-12-29T12:33:59Z

src/par_iter/find_first_last.rs

+                                           &first_found);
+
+    // split until we have an indivisible range
+    let bits_in_usize = usize::min_value().count_zeros();


This was actually the most direct way I could find to get the number of bits in a usize, believe it or not. Is there some other way that I missed?

nikomatsakis · 2016-12-29T12:36:57Z

src/par_iter/find_first_last.rs

+    let right_first_folder = right_first_folder.consume(2).consume(3);
+    assert_eq!(first_reducer.reduce(left_first_folder.complete(),
+                                    right_first_folder.complete()),
+               Some(0));


one nit -- while we are testing all this, maybe we can keep around also the consumer where we expect full() to be true? For example:

for i in 0..bits_in_usize - 1 { first_consumer.split_off(); } let second_folder = first_consumer.split_off().into_folder(); ... assert!(!right_first_folder.full()); assert!(second_folder.full());

The name second_folder is kind of weak though...but whatever =)

Do you mean that you want to see a consumer from outside of the exhausted range, so that we can check that it's full when the consumer from inside the range finds a match? That wasn't quite what your example code does (you'd have to move the creation of second_folder above the loop, or something like that), but I think that's what you're going for.

Yes, that's what I meant.

nikomatsakis · 2016-12-29T12:37:08Z

src/par_iter/find_first_last.rs

+               Some(0));
+
+    // same test, but for find_last
+    let last_found = AtomicUsize::new(0);


can you break this into a distinct #[test]?

also, as a style nit, at least in rayon I would tend to convert this into a directory module (i.e., find_first_last/mod.rs and put the tests in find_first_last/test.rs). But .... I don't really care about this. Maybe just move them to the end of the file or something.

(I would prefer not to have an inline mod test { ... } module, as many people prefer. I find managing the imports more annoying that way. I'm idiosyncratic like that I guess.)

nikomatsakis · 2016-12-29T12:43:48Z

src/par_iter/find_first_last.rs

+                                                                        self.boundary,
+                                                                        Ordering::Relaxed,
+                                                                        Ordering::Relaxed);
+                if exchange_result.is_err() {


Actually, compare_exchange_weak gives you back the value that you failed to swap in its Err variant. So you could rewrite this as:

let mut current = self.best_found.load(Ordering::Relaxed); loop { if better_position(current, self.boundary, self.match_position) { break; } match self.best_found.compare_exchange_weak(current, self.boundary, Relaxed, Relaxed) { Ok(_) => { self.item = Some(item); break; } Err(v) => current = v, } }

This seems mildly clearer (and more efficient) to me (I also prefer the loop+break formulation over the while loop; it's a nit, but I find it easier for me to parse what is happening).

Agreed, I also find your formulation easier to understand.

nikomatsakis · 2016-12-29T12:49:58Z

So one question is whether we should rename split_off() in this PR to reflect its new role. We can probably do it in a later PR.

schuster · 2016-12-31T16:40:26Z

A later PR makes sense to me, too - it's a logically separate change. I could submit that PR as soon as this one goes through.

nikomatsakis · 2016-12-31T20:26:05Z

Looks great thanks @schuster !

Rename split_off to split_off_left (as discussed in PR #189)

Implement find_first/last, position_first/last

2d017b2

Add extra import (related to rayon-rs#173)

febc7ed

cuviper requested changes Dec 21, 2016

View reviewed changes

nikomatsakis requested changes Dec 21, 2016

View reviewed changes

schuster added 5 commits December 22, 2016 10:48

Refactor full calculations for find_first/last

48157c4

Loop to set best_found for find_first/last

d586aaa

Update docs and comments for find_first/last

fd191d8

Avoid clobbering earlier results in find_first

ec9499e

Test for index resolution exhaustion in find_first/last

aca9f9f

Improve the same-range test for find_first/last

17d2652

nikomatsakis requested changes Dec 29, 2016

View reviewed changes

schuster added 4 commits December 30, 2016 13:03

Split same_range test into two tests

dec880d

Test fullness of find_first/last consumers outside exhausted range

5c359ab

Rewrite the loop to set best_found in find_first/last

efeee21

Move find_first/last tests to a separate file

e2977b1

nikomatsakis approved these changes Dec 31, 2016

View reviewed changes

nikomatsakis merged commit 4614458 into rayon-rs:master Dec 31, 2016

schuster added a commit to schuster/rayon that referenced this pull request Jan 2, 2017

Rename split_off to split_off_left (as discussed in PR rayon-rs#189)

d20472c

schuster added a commit to schuster/rayon that referenced this pull request Jan 3, 2017

Rename split_off to split_off_left (as discussed in PR rayon-rs#189)

54166df

nikomatsakis added a commit that referenced this pull request Jan 3, 2017

Merge pull request #198 from schuster/rename_split_off

32f9cce

Rename split_off to split_off_left (as discussed in PR #189)

cuviper mentioned this pull request Jan 16, 2019

Add a parallel equivalent to Iterator::find_map #607

Closed

Implement find_first/last, position_first/last #189

Implement find_first/last, position_first/last #189

Conversation

schuster commented Dec 20, 2016

cuviper commented Dec 20, 2016

cuviper left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikomatsakis commented Dec 21, 2016

schuster commented Dec 23, 2016

schuster commented Dec 23, 2016

schuster commented Dec 23, 2016

nikomatsakis left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nikomatsakis commented Dec 29, 2016

schuster commented Dec 31, 2016

nikomatsakis commented Dec 31, 2016