-
Notifications
You must be signed in to change notification settings - Fork 507
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FR: Parallel to sequential iterator #210
Comments
Hmm. It seems to me that you'd still need a buffer big enough for the whole iteration, as if you did a normal I wonder if futures (#193) can help with this? I'm thinking roughly that each time we split, it could be represented as some kind of |
In the worst case, it would devolve into just collecting into a vec. But usually, the first item will be finished early and so on. But you do need some sort of buffer. I'm not sure whether it makes sense to have one big buffer, or a linked list of buffers (one per split consumer). It depends on how much you care about being able to drop parts of the buffer early. In the worst case, you still need storage equal to the entire list of results, so it probably won't make much difference either way. |
Hmm, I think the right answer is perhaps a futures stream. That is, you could convert a par iter into a stream. Streams are the next thing that I want to figure out after #212 lands. |
But it'd be worth drilling a bit more into what you want. In particular, do you want this parallel computation to run asychronously -- that is, disconnected from any particular stack frame? Or is it part of some function? Reading your original request, it sounds like you want something like |
Yes, foreach is also ok. |
This would be very useful for me, too. My use-case: programs that read lines from a file (sequentially), perform an expensive computation independently on each line (parallelizable), and then must print out the results in exactly the same order. |
Why not sort the output later then? Assign an atomic number to each line being read, perform the computation and write that number to the output. Later, when data is processed all you need to do is to sort lines by that number. Much like map-reduce. Of course, that may not be convenient solution for you, so just in case. |
I'm not sure about jswrenn's case, but in my case, I want to access the first result as soon as it is ready, rather than having to wait until the entire vector is processed. |
In my case, I'm piping stdout to another process which reads input line-by-line. |
I would be interested in this too. I'd like to be able to get the results out, but order isn't important to me. I am using rayon to |
Would it work to implement this as a |
@rocallahan The current design requires a parallel |
FWIW I tried implementing parallel ordered output using a Rayon |
I think my use case would be very similar to that of @jswrenn, so just to put it in terms of code -- I have this: stdin_guard
.lines()
.filter_map(|result| match result {
Ok(line) => Some(expensive_cpu_bound_operation(&line)),
Err(_) => panic!("error reading line"),
})
.for_each(|line| {
writeln!(buffered_stdout, "{}", line).expect("error writing line");
}); And I would like to be able to do something like this: stdin_guard
.lines()
.par_iter() // switch to parallel iteration
.filter_map(|result| match result {
Ok(line) => Some(expensive_cpu_bound_operation(&line)),
Err(_) => panic!("error reading line"),
})
.seq_iter() // switch back
.for_each(|line| {
writeln!(buffered_stdout, "{}", line).expect("error writing line");
}); Which would make the operation as easy and straightforward as the first example in Rayon's README, which is very compelling :) Collecting + sorting is not really an option, as the input stream is potentially infinite (well, very long, in practice). |
This seems like it could be implemented by "seq_iter" simply creating a growable, circular queue that is then processed as the results of the "seq_iter" by the next function ("for_each" in this case). Ideally, "seq_iter" could take an initial size and/or a maximum size. If it grows to the maximum size, it would apply back-pressure on the parallel processing until it was drained sufficiently (high-water mark/low-water mark sort of thing). Of course, this would require that "seq_iter" has some mechanism provided by Rayon for applying back-pressure. This back-pressure would cause the number of worker threads to suspend until given the OK to proceed. Would something like that work for you? |
Yes, that sounds promising :) The only thing is, there's nothing in there guaranteeing the ordering of the items after Though now that I think of it, it may not be necessary in some applications, so carting around the ordering information and reconstructing it should probably be opt-in... |
The only useful way to still have ordering after the parallel processing is to:
Either of those cases will require much more space and be much less likely to gain much benefit overall from the parallel processing. The single circular, growable queue has the benefit of maximizing the concurrency of the parallel part while allowing the processing rate of the sequential part to slow the parallel processing down (if needed) to prevent too much memory usage. Above, option number 1 is equivalent to just collecting to a Sorted Map and then processing those results, so, if that is your requirement, just do that. Option 2 could, in some circumstances be useful, but, the likelihood of it using significant amounts of memory while requiring significant processing overhead makes it likely to be unuseful. |
Collecting all the results is unfortunately not an option for me :( I thought maybe it would be possible to wrap all of the items in Futures and pass these through in the same order while waiting for rayon's thread pool to do the work. In the But my mental model of how futures work is still hazy at best, maybe they simply don't allow you to do this, or the overhead would dwarf the gains of parallelization, or maybe this whole idea stinks of deadlock... |
Rayon works like this:
Given that, do you see how keeping things sorted just wouldn't work without significant overhead that would likely ruin the gains from parallelism? |
@gbutler69 Thanks for taking the time to spell this out! |
Is there a way to do this (non-parallel folding in my case) if I do not require the order of items to stay the same? |
Unordered, you could use a channel. Spawn a parallel |
Noting that if you try to do that naively, you'll run into the fact that |
Hi, everyone! I'm new to Rayon and also Rust. I made a piece of code which may solve this feature. ImplementationHere is the implementation. Everything is in lib.rs below. UsageAPI looks like the following. // Create a par iter
let par_iter = (10..20).into_par_iter().map(|x| x * 2);
// Convert to iter
let iter = par_iter.into_seq_iter(num_cpus::get());
// Iterate by normal for
for x in iter {
println!("item: {}", x);
} OutputHere is the output.
CI statusIf it's useful, I'd like Rayon to have this feature officially. |
OK, so as you commented, that work is based on this post: That idea is to feed a channel of enumerated values into a binary min-heap, and wait for the appropriate next index to come through. The biggest thing I fear with making this "official" is that it won't compose well within the pool. That is, if the sequentializing receiver is on a thread in the pool, it's a bad idea to let that block on the channel. Maybe we can make that use Not to say this is insurmountable, but we need to be careful. |
I think there are two concerns here depending on if you want order to be preserved or not. Both are valid use cases. For the non-preserving case I want to get items as they finish. I think it is a common use case to do something serially with the results of your computation. Basically I want a better, and blessed version of: fn into_iter<T>(iter: impl rayon::ParallelIterator<Item=T>) -> Iterator<Item=T> {
let (send, recv) = std::sync::mpsc::sync_channel();
thread::spawn(move || iter.for_each(|el| { let _ = send.send(el); });
recv.into_iter()
} I think the above is pretty good but it can be improved by avoiding the extra thread and canceling the rest of the parent iterator rather then computing the remaining elements then throwing them away. |
In case there are others who would like to see an example of printing the parallel iterator to a buffered stdout (unordered output) via a let json_lines = unimplemented!(); // a parallel iterator of strings
let (send, recv) = std::sync::mpsc::sync_channel(rayon::current_num_threads());
// Spawn a thread that is dedicated to printing to a buffered stdout
let thrd = std::thread::spawn(|| {
let stdout = io::stdout();
let lock = stdout.lock();
let mut writer = BufWriter::new(lock);
for json in recv.into_iter() {
let _ = writeln!(writer, "{}", json);
}
});
let send = json_lines.try_for_each_with(send, |s, x| s.send(x));
if let Err(e) = send {
eprintln!("Unable to send internal data: {:?}", e);
}
if let Err(e) = thrd.join() {
eprintln!("Unable to join internal thread: {:?}", e);
} |
Recently encountered this issue myself and discussed some workarounds that may be helpful to others encountering this limitation of Rayon. You can use I agree this feature would be very useful (e.g. for computing derivatives or convolving with kernels), but agree with above comments that we would want to implement this carefully in a way that is performant and composes well within the threadpool. |
In my case I had a iterator that is processed with a heavy operation in parallel, and then needs to be read in chunks in a sequential loop. So I basically wanted a let par_iter = walkdir::WalkDir::new("...pathname")
.into_iter()
.par_bridge()
.map(heavy_operation); // heavy operation in parallel
for e in CollChunks::new(par_iter, 1000, 1000) {
// collected into sequential chunks
println!("got chunk with {} items", e.len());
} and here's the implementation: struct CollChunks<T: Send> {
chunk_size: usize,
cur_chunk: Vec<T>,
receiver: std::sync::mpsc::IntoIter<T>
}
impl<T: Send + 'static> CollChunks<T> {
fn new<I>(par_iter: I, chunk_size: usize, queue_len: usize) -> Self
where
I: IntoParallelIterator<Item = T>,
<I as rayon::iter::IntoParallelIterator>::Iter: 'static,
{
let par_iter = par_iter.into_par_iter();
let (sender, receiver) = std::sync::mpsc::sync_channel(queue_len);
// send all the items in parallel
std::thread::spawn(move || {
par_iter.for_each(|ele| sender.send(ele).expect("receiver gone?"))
});
let coll = CollChunks {
receiver: receiver.into_iter(),
chunk_size,
cur_chunk: Vec::with_capacity(chunk_size),
};
coll
}
}
impl<T: Send> Iterator for CollChunks<T> {
type Item = Vec<T>;
fn next(&mut self) -> Option<Self::Item> {
// receive items until we have a full buffer or sender is done
while self.cur_chunk.len() < self.chunk_size {
match self.receiver.next() {
Some(e) => self.cur_chunk.push(e),
None => {
if self.cur_chunk.len() > 0 {
break; // last chunk
} else {
return None; // done
}
}
}
}
let mut new_vec = Vec::with_capacity(self.chunk_size);
std::mem::swap(&mut self.cur_chunk, &mut new_vec);
return Some(new_vec);
}
} No guarantees of course, but seems to work |
In case someone is looking here in need of parallel processing over iterator items preserving order (and without storing the data in the collection first), I've been working on a library that does it. Unlike rayon it's not work-stealing based, which comes with different tradeoffs: https://github.com/dpc/pariter |
I've also liked to converge parallel iterator results into sequential one. The implementation is similar to @phiresky. I've managed to avoid Additionally production from a parallel iterator starts only when the sequential iterator is pulled. Still, I've issues with this code. For example, I don't know how to check how it will behave when an issue while sending or receiving will happen. use core::option::Option;
use std::iter;
use crossbeam::channel;
use crossbeam::channel::{Receiver, Sender};
use rayon::iter::ParallelIterator;
pub trait IntoSequentialIteratorEx<'a, T: Sized>: Sized {
fn into_seq_iter(self) -> Box<dyn 'a + Iterator<Item=T>>;
}
impl<'a, T, PI> IntoSequentialIteratorEx<'a, T> for PI
where
T: 'a + Send,
PI: 'a + ParallelIterator<Item=T>,
{
fn into_seq_iter(self) -> Box<dyn 'a + Iterator<Item=T>> {
let (sender, receiver) = channel::unbounded();
Box::new(deferred_first_element(self, sender, receiver.clone())
.chain(deferred_remaining_elements(receiver)))
}
}
fn deferred_first_element<'a, T: 'a + Send, PI: 'a + ParallelIterator<Item=T>>(
par_iter: PI,
sender: Sender<T>,
receiver: Receiver<T>) -> Box<dyn 'a + Iterator<Item=T>>
{
let deferred = iter::once(Box::new(move || {
crossbeam::scope(|s| {
s.spawn(|_| {
par_iter.for_each(|element| {
sender.send(element).unwrap();
});
drop(sender);
});
}).unwrap();
receiver.recv().ok()
}) as Box<dyn FnOnce() -> Option<T>>);
Box::new(deferred
.map(|f| {
f()
})
.filter(Option::is_some)
.map(Option::unwrap))
}
fn deferred_remaining_elements<'a, T: 'a + Send>(receiver: Receiver<T>) -> Box<dyn 'a + Iterator<Item=T>> {
Box::new(
iter::repeat_with(move || {
receiver.recv().ok()
})
.take_while(Option::is_some)
.map(Option::unwrap))
}
#[cfg(test)]
mod tests {
use itertools::Itertools;
use rayon::iter::{IntoParallelIterator, ParallelBridge};
use super::*;
#[test]
fn does_noting_for_an_empty_iterator() {
// when
let result = Vec::<i32>::new().into_par_iter().into_seq_iter().collect_vec();
// then
assert_eq!(Vec::<i32>::new(), result);
}
#[test]
fn iterates_over_whole_iterator_range() {
// given
let elements = 120;
// when
let result = iter::repeat(12)
.take(elements)
.par_bridge()
.into_seq_iter()
.count();
// then
assert_eq!(elements, result);
}
} |
If you're only doing I/O and not doing any additional struct OutputToken;
let m = std::sync::Mutex::new(OutputToken);
get_some_par_iter()
.for_each(|thing| {
let _token = m.lock().expect("Output mutex was not poisoned");
print_thing(&thing);
}); |
I described various solutions to this problem here: #1070 |
Here is my solution: #1071 |
Now I think solution is https://crates.io/crates/pariter ( https://dpc.pw/adding-parallelism-to-your-rust-iterators ) |
Hello everyone 👋 I just release a(nother) small crate, named |
It would be nice if there was a way to convert a parallel to a sequential iterator. Fold/reduce can only be used in cases where the reduction is associative, which rules out e.g. performing I/O on every element in order, as the I/O is a global state. One possibility is to collect() into a vec, but then you have to wait until every element is computed before you can process the first element of the results.
The text was updated successfully, but these errors were encountered: