diff --git a/README.md b/README.md index c1fa8763a..00874c523 100644 --- a/README.md +++ b/README.md @@ -35,6 +35,8 @@ Eio replaces existing concurrency libraries such as Lwt * [Running processes](#running-processes) * [Time](#time) * [Multicore Support](#multicore-support) + * [Domain Manager](#domain-manager) + * [Executor Pool](#executor-pool) * [Synchronisation Tools](#synchronisation-tools) * [Promises](#promises) * [Example: Concurrent Cache](#example-concurrent-cache) @@ -936,7 +938,12 @@ The mock backend provides a mock clock that advances automatically where there i OCaml allows a program to create multiple *domains* in which to run code, allowing multiple CPUs to be used at once. Fibers are scheduled cooperatively within a single domain, but fibers in different domains run in parallel. -This is useful to perform CPU-intensive operations quickly. +This is useful to perform CPU-intensive operations quickly +(though extra care needs to be taken when using multiple cores; see the [Multicore Guide](./doc/multicore.md) for details). + +### Domain Manager + +[Eio.Domain_manager][] provides a basic API for spawning domains. For example, let's say we have a CPU intensive task: ```ocaml @@ -950,7 +957,7 @@ let sum_to n = !total ``` -We can use [Eio.Domain_manager][] to run this in a separate domain: +We can use the domain manager to run this in a separate domain: ```ocaml let main ~domain_mgr = @@ -977,6 +984,10 @@ let main ~domain_mgr = - : unit = () ``` +
+ +
+ Notes: - `traceln` can be used safely from multiple domains. @@ -988,8 +999,78 @@ Notes: - `Domain_manager.run` waits for the domain to finish, but it allows other fibers to run while waiting. This is why we use `Fiber.both` to create multiple fibers. -For more information, see the [Multicore Guide](./doc/multicore.md). +### Executor Pool + +An [Eio.Executor_pool][] distributes jobs among a pool of domain workers. +Domains are reused and can execute multiple jobs concurrently. + +Each domain worker starts new jobs until the total `~weight` of its running jobs reaches `1.0`. +The `~weight` represents the expected proportion of a CPU core that the job will take up. +Jobs are queued up if they cannot be started immediately due to all domain workers being busy (`>= 1.0`). + +This is the recommended way of leveraging OCaml 5's multicore capabilities. + +Usually you will only want one pool for an entire application, so the pool is typically created when the application starts: + + +```ocaml +let () = + Eio_main.run @@ fun env -> + Switch.run @@ fun sw -> + let pool = + Eio.Executor_pool.create + ~sw (Eio.Stdenv.domain_mgr env) + ~domain_count:4 + in + main ~pool +``` + +The pool starts its domain workers immediately upon creation. + +The pool will not block our switch `sw` from completing; +when the switch finishes, all domain workers and running jobs are cancelled. +`~domain_count` is the number of domain workers to create. +The total number of domains should not exceed `Domain.recommended_domain_count` or the number of cores on your system. + +We can run the previous example using an Executor Pool like this: + +```ocaml +let main ~domain_mgr = + Switch.run @@ fun sw -> + let pool = + Eio.Executor_pool.create ~sw domain_mgr ~domain_count:4 + in + let test n = + traceln "sum 1..%d = %d" n + (Eio.Executor_pool.submit_exn pool ~weight:1.0 + (fun () -> sum_to n)) + in + Fiber.both + (fun () -> test 100000) + (fun () -> test 50000) +``` + + +```ocaml +# Eio_main.run @@ fun env -> + main ~domain_mgr:(Eio.Stdenv.domain_mgr env);; ++Starting CPU-intensive task... ++Starting CPU-intensive task... ++Finished ++sum 1..50000 = 1250025000 ++Finished ++sum 1..100000 = 5000050000 +- : unit = () +``` +`~weight` is the anticipated proportion of a CPU core used by the job. +In other words, the fraction of time actively spent executing OCaml code, not just waiting for I/O or system calls. +In the above code snippet we use `~weight:1.0` because the job is entirely CPU-bound: it never waits for I/O or other syscalls. +`~weight` must be `>= 0.0` and `<= 1.0`. +Example: given an IO-bound job that averages 2% of one CPU core, pass `~weight:0.02`. + +Each domain worker starts new jobs until the total `~weight` of its running jobs reaches `1.0`. + ## Synchronisation Tools Eio provides several sub-modules for communicating between fibers, @@ -1241,6 +1322,8 @@ The `Fiber.check ()` checks whether the worker itself has been cancelled, and ex It's not actually necessary in this case, because if we continue instead then the following `Stream.take` will perform the check anyway. +Note: in a real system, you would probably use [Eio.Executor_pool][] for this rather than making your own pool. + ### Mutexes and Semaphores Eio also provides `Mutex` and `Semaphore` sub-modules. @@ -1805,6 +1888,7 @@ Some background about the effects system can be found in: [Eio.Path]: https://ocaml-multicore.github.io/eio/eio/Eio/Path/index.html [Eio.Time]: https://ocaml-multicore.github.io/eio/eio/Eio/Time/index.html [Eio.Domain_manager]: https://ocaml-multicore.github.io/eio/eio/Eio/Domain_manager/index.html +[Eio.Executor_pool]: https://ocaml-multicore.github.io/eio/eio/Eio/Executor_pool/index.html [Eio.Promise]: https://ocaml-multicore.github.io/eio/eio/Eio/Promise/index.html [Eio.Stream]: https://ocaml-multicore.github.io/eio/eio/Eio/Stream/index.html [Eio_posix]: https://ocaml-multicore.github.io/eio/eio_posix/Eio_posix/index.html diff --git a/doc/multicore.md b/doc/multicore.md index 78ff50396..2536979e9 100644 --- a/doc/multicore.md +++ b/doc/multicore.md @@ -4,21 +4,22 @@ * [Introduction](#introduction) * [Problems with Multicore Programming](#problems-with-multicore-programming) - * [Optimisation 1: Caching](#optimisation-1-caching) - * [Optimisation 2: Out-of-Order Execution](#optimisation-2-out-of-order-execution) - * [Optimisation 3: Compiler Optimisations](#optimisation-3-compiler-optimisations) - * [Optimisation 4: Multiple Cores](#optimisation-4-multiple-cores) + * [Optimisation 1: Caching](#optimisation-1-caching) + * [Optimisation 2: Out-of-Order Execution](#optimisation-2-out-of-order-execution) + * [Optimisation 3: Compiler Optimisations](#optimisation-3-compiler-optimisations) + * [Optimisation 4: Multiple Cores](#optimisation-4-multiple-cores) * [The OCaml Memory Model](#the-ocaml-memory-model) - * [Atomic Locations](#atomic-locations) - * [Initialisation](#initialisation) -* [Guidelines](#guidelines) + * [Atomic Locations](#atomic-locations) + * [Initialisation](#initialisation) +* [Safety Guidelines](#safety-guidelines) +* [Performance Guidelines](#performance-guidelines) * [Further Reading](#further-reading) ## Introduction -OCaml 5.00 adds support for using multiple CPU cores in a single OCaml process. +OCaml 5.0 adds support for using multiple CPU cores in a single OCaml process. An OCaml process is made up of one or more *domains*, and the operating system can run each domain on a different core, so that they run in parallel. This can make programs run much faster, but also introduces new ways for programs to go wrong. @@ -446,7 +447,7 @@ So it will always see a correct list: - : unit = () ``` -## Guidelines +## Safety Guidelines It's important to understand the above to avoid writing incorrect code, but there are several general principles that avoid most problems: @@ -502,6 +503,28 @@ Finally, note that OCaml remains type-safe even with multiple domains. For example, accessing a `Queue` in parallel from multiple domains may result in a corrupted queue, but it won't cause a segfault. +## Performance Guidelines + +The following recommendations will help you extract as much performance as possible from your hardware: + +- There's a certain overhead associated with placing execution onto another domain, + but that overhead will be paid off quickly if your job takes at least a few milliseconds to complete. + Jobs that complete under 2-5ms may not be worth running on a separate domain. +- Similarly, jobs that are 100% I/O-bound may not be worth running on a separate domain. + The small initial overhead is simply never recouped. +- If your program never hits 100% CPU usage, it's unlikely that parallelizing it will improve performance. +- Try to avoid reading or writing to memory that's modified by other domains after the start of your job. + Ideally, your jobs shouldn't need to interact with other domains' "working data". + Aim to make your jobs as independent as possible. + If unavoidable, the [Saturn](https://github.com/ocaml-multicore/saturn) library offers a collection of efficient threadsafe data structures. +- It's often easier to design code to be multithreading friendly from the start + (by making longer, independent jobs) than by refactoring existing code. +- There's a cost associated with creating a domain, so try to use the same domains for longer periods of time. + `Eio.Executor_pool` takes care of this automatically. +- Obviously, reuse the same executor pool whenever possible! Don't recreate it over and over. +- Having a large number of domains active at the same time imposes additional overhead on + both the OS scheduler and the OCaml runtime, even if those domains are idle. + ## Further Reading - [OCaml Memory Model][] describes the full details of the memory model. diff --git a/doc/traces/Makefile b/doc/traces/Makefile index 540f1393f..ddc8a175e 100644 --- a/doc/traces/Makefile +++ b/doc/traces/Makefile @@ -1,4 +1,4 @@ -all: both-posix.svg cancel-posix.svg switch-mock.svg net-posix.svg +all: both-posix.svg cancel-posix.svg switch-mock.svg net-posix.svg multicore-posix.svg %.svg: %.fxt eio-trace render "$<" diff --git a/doc/traces/multicore-posix.fxt b/doc/traces/multicore-posix.fxt new file mode 100644 index 000000000..db04dd8c9 Binary files /dev/null and b/doc/traces/multicore-posix.fxt differ diff --git a/doc/traces/multicore-posix.svg b/doc/traces/multicore-posix.svg new file mode 100644 index 000000000..ec12675ad --- /dev/null +++ b/doc/traces/multicore-posix.svg @@ -0,0 +1,1490 @@ + +