diff --git a/FAQ.md b/FAQ.md index abd25d404..e66185cca 100644 --- a/FAQ.md +++ b/FAQ.md @@ -1,51 +1,47 @@ # Rayon FAQ -This file is for general questions that don't fit into the README or -crate docs. +This file is for general questions that don't fit into the README or crate docs. ## How many threads will Rayon spawn? -By default, Rayon uses the same number of threads as the number of -CPUs available. Note that on systems with hyperthreading enabled this -equals the number of logical cores and not the physical ones. +By default, Rayon uses the same number of threads as the number of CPUs +available. Note that on systems with hyperthreading enabled this equals the +number of logical cores and not the physical ones. If you want to alter the number of threads spawned, you can set the -environmental variable `RAYON_NUM_THREADS` to the desired number of -threads or use the +environmental variable `RAYON_NUM_THREADS` to the desired number of threads or +use the [`ThreadPoolBuilder::build_global` function](https://docs.rs/rayon/*/rayon/struct.ThreadPoolBuilder.html#method.build_global) method. ## How does Rayon balance work between threads? -Behind the scenes, Rayon uses a technique called **work stealing** to -try and dynamically ascertain how much parallelism is available and -exploit it. The idea is very simple: we always have a pool of worker -threads available, waiting for some work to do. When you call `join` -the first time, we shift over into that pool of threads. But if you -call `join(a, b)` from a worker thread W, then W will place `b` into -its work queue, advertising that this is work that other worker -threads might help out with. W will then start executing `a`. - -While W is busy with `a`, other threads might come along and take `b` -from its queue. That is called *stealing* `b`. Once `a` is done, W -checks whether `b` was stolen by another thread and, if not, executes -`b` itself. 
If W runs out of jobs in its own queue, it will look -through the other threads' queues and try to steal work from them. - -This technique is not new. It was first introduced by the -[Cilk project][cilk], done at MIT in the late nineties. The name Rayon -is an homage to that work. +Behind the scenes, Rayon uses a technique called **work stealing** to try and +dynamically ascertain how much parallelism is available and exploit it. The idea +is very simple: we always have a pool of worker threads available, waiting for +some work to do. When you call `join` the first time, we shift over into that +pool of threads. But if you call `join(a, b)` from a worker thread W, then W +will place `b` into its work queue, advertising that this is work that other +worker threads might help out with. W will then start executing `a`. + +While W is busy with `a`, other threads might come along and take `b` from its +queue. That is called *stealing* `b`. Once `a` is done, W checks whether `b` was +stolen by another thread and, if not, executes `b` itself. If W runs out of jobs +in its own queue, it will look through the other threads' queues and try to +steal work from them. + +This technique is not new. It was first introduced by the [Cilk project][cilk], +done at MIT in the late nineties. The name Rayon is an homage to that work. [cilk]: http://supertech.csail.mit.edu/cilk/ ## What should I do if I use `Rc`, `Cell`, `RefCell` or other non-Send-and-Sync types? -There are a number of non-threadsafe types in the Rust standard library, -and if your code is using them, you will not be able to combine it -with Rayon. Similarly, even if you don't have such types, but you try -to have multiple closures mutating the same state, you will get -compilation errors; for example, this function won't work, because -both closures access `slice`: +There are a number of non-threadsafe types in the Rust standard library, and if +your code is using them, you will not be able to combine it with Rayon. 
+Similarly, even if you don't have such types, but you try to have multiple +closures mutating the same state, you will get compilation errors; for example, +this function won't work, because both closures access `slice`: ```rust /// Increment all values in slice. @@ -54,9 +50,9 @@ fn increment_all(slice: &mut [i32]) { } ``` -The correct way to resolve such errors will depend on the case. Some -cases are easy: for example, uses of [`Rc`] can typically be replaced -with [`Arc`], which is basically equivalent, but thread-safe. +The correct way to resolve such errors will depend on the case. Some cases are +easy: for example, uses of [`Rc`] can typically be replaced with [`Arc`], which +is basically equivalent, but thread-safe. Code that uses `Cell` or `RefCell`, however, can be somewhat more complicated. If you can refactor your code to avoid those types, that is often the best way @@ -66,34 +62,33 @@ equivalents: - `Cell` -- replacement: `AtomicUsize`, `AtomicBool`, etc - `RefCell` -- replacement: `RwLock`, or perhaps `Mutex` -However, you have to be wary! The parallel versions of these types -have different atomicity guarantees. For example, with a `Cell`, you -can increment a counter like so: +However, you have to be wary! The parallel versions of these types have +different atomicity guarantees. 
For example, with a `Cell`, you can increment a +counter like so: ```rust let value = counter.get(); counter.set(value + 1); ``` -But when you use the equivalent `AtomicUsize` methods, you are -actually introducing a potential race condition (not a data race, -technically, but it can be an awfully fine distinction): +But when you use the equivalent `AtomicUsize` methods, you are actually +introducing a potential race condition (not a data race, technically, but it can +be an awfully fine distinction): ```rust let value = tscounter.load(Ordering::SeqCst); tscounter.store(value + 1, Ordering::SeqCst); ``` -You can already see that the `AtomicUsize` API is a bit more complex, -as it requires you to specify an +You can already see that the `AtomicUsize` API is a bit more complex, as it +requires you to specify an [ordering](https://doc.rust-lang.org/std/sync/atomic/enum.Ordering.html). (I -won't go into the details on ordering here, but suffice to say that if -you don't know what an ordering is, and probably even if you do, you -should use `Ordering::SeqCst`.) The danger in this parallel version of -the counter is that other threads might be running at the same time -and they could cause our counter to get out of sync. For example, if -we have two threads, then they might both execute the "load" before -either has a chance to execute the "store": +won't go into the details on ordering here, but suffice to say that if you don't +know what an ordering is, and probably even if you do, you should use +`Ordering::SeqCst`.) The danger in this parallel version of the counter is that +other threads might be running at the same time and they could cause our counter +to get out of sync. 
For example, if we have two threads, then they might both +execute the "load" before either has a chance to execute the "store": ``` Thread 1 Thread 2 @@ -104,26 +99,23 @@ tscounter.store(value+1); tscounter.store(value+1); // tscounter = X+1 // tscounter = X+1 ``` -Now even though we've had two increments, we'll only increase the -counter by one! Even though we've got no data race, this is still -probably not the result we wanted. The problem here is that the `Cell` -API doesn't make clear the scope of a "transaction" -- that is, the -set of reads/writes that should occur atomically. In this case, we -probably wanted the get/set to occur together. - -In fact, when using the `Atomic` types, you very rarely want a plain -`load` or plain `store`. You probably want the more complex -operations. A counter, for example, would use `fetch_add` to -atomically load and increment the value in one step. Compare-and-swap -is another popular building block. - -A similar problem can arise when converting `RefCell` to `RwLock`, but -it is somewhat less likely, because the `RefCell` API does in fact -have a notion of a transaction: the scope of the handle returned by -`borrow` or `borrow_mut`. So if you convert each call to `borrow` to -`read` (and `borrow_mut` to `write`), things will mostly work fine in -a parallel setting, but there can still be changes in behavior. -Consider using a `handle: RefCell<Vec<i32>>` like: +Now even though we've had two increments, we'll only increase the counter by +one! Even though we've got no data race, this is still probably not the result +we wanted. The problem here is that the `Cell` API doesn't make clear the scope +of a "transaction" -- that is, the set of reads/writes that should occur +atomically. In this case, we probably wanted the get/set to occur together. + +In fact, when using the `Atomic` types, you very rarely want a plain `load` or +plain `store`. You probably want the more complex operations. 
A counter, for +example, would use `fetch_add` to atomically load and increment the value in one +step. Compare-and-swap is another popular building block. + +A similar problem can arise when converting `RefCell` to `RwLock`, but it is +somewhat less likely, because the `RefCell` API does in fact have a notion of a +transaction: the scope of the handle returned by `borrow` or `borrow_mut`. So if +you convert each call to `borrow` to `read` (and `borrow_mut` to `write`), +things will mostly work fine in a parallel setting, but there can still be +changes in behavior. Consider using a `handle: RefCell<Vec<i32>>` like: ```rust let len = handle.borrow().len(); @@ -133,13 +125,12 @@ for i in 0 .. len { } ``` -In sequential code, we know that this loop is safe. But if we convert -this to parallel code with an `RwLock`, we do not: this is because -another thread could come along and do -`handle.write().unwrap().pop()`, and thus change the length of the -vector. In fact, even in *sequential* code, using very small borrow -sections like this is an anti-pattern: you ought to be enclosing the -entire transaction together, like so: +In sequential code, we know that this loop is safe. But if we convert this to +parallel code with an `RwLock`, we do not: this is because another thread could +come along and do `handle.write().unwrap().pop()`, and thus change the length of +the vector. In fact, even in *sequential* code, using very small borrow sections +like this is an anti-pattern: you ought to be enclosing the entire transaction +together, like so: ```rust let vec = handle.borrow(); @@ -159,11 +150,10 @@ for data in vec { } ``` -There are several reasons to prefer one borrow over many. The most -obvious is that it is more efficient, since each borrow has to perform -some safety checks. But it's also more reliable: suppose we modified -the loop above to not just print things out, but also call into a -helper function: +There are several reasons to prefer one borrow over many. 
The most obvious is +that it is more efficient, since each borrow has to perform some safety checks. +But it's also more reliable: suppose we modified the loop above to not just +print things out, but also call into a helper function: ```rust let vec = handle.borrow(); @@ -172,8 +162,8 @@ for data in vec { } ``` -And now suppose, independently, this helper fn evolved and had to pop -something off of the vector: +And now suppose, independently, this helper fn evolved and had to pop something +off of the vector: ```rust fn helper(...) { @@ -181,36 +171,33 @@ fn helper(...) { } ``` -Under the old model, where we did lots of small borrows, this would -yield precisely the same error that we saw in parallel land using an -`RwLock`: the length would be out of sync and our indexing would fail -(note that in neither case would there be an actual *data race* and -hence there would never be undefined behavior). But now that we use a -single borrow, we'll see a borrow error instead, which is much easier -to diagnose, since it occurs at the point of the `borrow_mut`, rather -than downstream. Similarly, if we move to an `RwLock`, we'll find that -the code either deadlocks (if the write is on the same thread as the -read) or, if the write is on another thread, works just fine. Both of -these are preferable to random failures in my experience. +Under the old model, where we did lots of small borrows, this would yield +precisely the same error that we saw in parallel land using an `RwLock`: the +length would be out of sync and our indexing would fail (note that in neither +case would there be an actual *data race* and hence there would never be +undefined behavior). But now that we use a single borrow, we'll see a borrow +error instead, which is much easier to diagnose, since it occurs at the point of +the `borrow_mut`, rather than downstream. 
Similarly, if we move to an `RwLock`, +we'll find that the code either deadlocks (if the write is on the same thread as +the read) or, if the write is on another thread, works just fine. Both of these +are preferable to random failures in my experience. ## But wait, isn't Rust supposed to free me from this kind of thinking? -You might think that Rust is supposed to mean that you don't have to -think about atomicity at all. In fact, if you avoid interior -mutability (`Cell` and `RefCell` in a sequential setting, or -`AtomicUsize`, `RwLock`, `Mutex`, et al. in parallel code), then this -is true: the type system will basically guarantee that you don't have -to think about atomicity at all. But often there are times when you -WANT threads to interleave in the ways I showed above. - -Consider for example when you are conducting a search in parallel, say -to find the shortest route. To avoid fruitless search, you might want -to keep a cell with the shortest route you've found thus far. This -way, when you are searching down some path that's already longer than -this shortest route, you can just stop and avoid wasted effort. In -sequential land, you might model this "best result" as a shared value -like `Rc<Cell<usize>>` (here the `usize` represents the length of best -path found so far); in parallel land, you'd use a `Arc<AtomicUsize>`. +You might think that Rust is supposed to mean that you don't have to think about +atomicity at all. In fact, if you avoid interior mutability (`Cell` and +`RefCell` in a sequential setting, or `AtomicUsize`, `RwLock`, `Mutex`, et al. +in parallel code), then this is true: the type system will basically guarantee +that you don't have to think about atomicity at all. But often there are times +when you WANT threads to interleave in the ways I showed above. + +Consider for example when you are conducting a search in parallel, say to find +the shortest route. To avoid fruitless search, you might want to keep a cell +with the shortest route you've found thus far. 
This way, when you are searching +down some path that's already longer than this shortest route, you can just stop +and avoid wasted effort. In sequential land, you might model this "best result" +as a shared value like `Rc<Cell<usize>>` (here the `usize` represents the length +of best path found so far); in parallel land, you'd use a `Arc<AtomicUsize>`. ```rust fn search(path: &Path, cost_so_far: usize, best_cost: &AtomicUsize) { @@ -222,5 +209,5 @@ fn search(path: &Path, cost_so_far: usize, best_cost: &AtomicUsize) { } ``` -Now in this case, we really WANT to see results from other threads -interjected into our execution!
+Now in this case, we really WANT to see results from other threads interjected +into our execution!
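
As a rough illustration of the two atomic patterns the FAQ text describes, here is a minimal sketch that is not part of this patch and not taken from Rayon itself. It uses only the standard library (`std::thread::scope` stands in for Rayon's thread pool, which is an assumption made to keep the example dependency-free), and the helper names `count_in_parallel` and `cheapest` are hypothetical. The first helper shows a counter that uses `fetch_add` so that no increment is lost; the second shows a shared "best cost" that every thread may read at any time and that is only ever lowered, via `fetch_min`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical helper: each of `threads` threads increments a shared
/// counter `per_thread` times. `fetch_add` performs the load and the
/// store as one atomic step, so two threads cannot both read `X` and
/// both write `X + 1`.
fn count_in_parallel(threads: usize, per_thread: usize) -> usize {
    let counter = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    counter.fetch_add(1, Ordering::SeqCst);
                }
            });
        }
    });
    // All scoped threads have been joined by the time `scope` returns.
    counter.load(Ordering::SeqCst)
}

/// Hypothetical helper: one thread per candidate cost, all sharing a
/// "best cost so far". Each thread deliberately observes the results of
/// the others to skip useless work.
fn cheapest(candidates: &[usize]) -> usize {
    let best_cost = AtomicUsize::new(usize::MAX);
    thread::scope(|s| {
        for &cost in candidates {
            let best_cost = &best_cost;
            s.spawn(move || {
                // Pruning check: a plain `load` is fine here, because a
                // stale value only costs extra work, never correctness.
                if cost >= best_cost.load(Ordering::SeqCst) {
                    return;
                }
                // The update itself must be a single atomic step, so
                // that a lower value is never overwritten by a higher one.
                best_cost.fetch_min(cost, Ordering::SeqCst);
            });
        }
    });
    best_cost.load(Ordering::SeqCst)
}
```

With these definitions, `count_in_parallel(4, 1000)` always yields `4000`, whereas the load-then-store version could come up short; `cheapest(&[7, 3, 9, 5])` yields `3` regardless of how the threads interleave.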