Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lwt_domain: an interface to multicore parallelism #860

Merged
merged 22 commits into from
Nov 25, 2021

Conversation

Sudha247
Copy link
Contributor

Synopsis: The PR aims to add a module Lwt_domain that enables offloading computations to many CPU cores via Multicore's Domains.

Lwt_domain interface

This patch aims to add a new module Lwt_domain. The API is similar to Lwt_preemptive, the major difference being Lwt_preemptive runs compuatations in a new systhread and Lwt_domain computations in a new domain.

In effect, the Lwt_domain runs OCaml computations in parallel on multiple CPU cores, which is not the case with Lwt_preemptive. This is because system threads are run on a single CPU core despite being run concurrently. On the other hand, multicore domains run on the available cores.

Implementation

A pool of domains is created by the simple_init function, with the default number of domains being 4. The created domains will be alive and waiting for tasks till the time main loop is alive. The number of live domains can be adjusted via the set_bounds function.

Unlike Lwt_preemptive which creates new systhreads frequently, all live domains are reused here because repeatedly creating and shutting down new domains is an expensive process, which could be avoided.

When detach function is called it obtains a new worker from the pool of domains and the function is executed in that worker domain. After the execution is done, the worker domain will again be ready to take tasks.

A separate queue of jobs is maintained that are supposed to be executed on the main domain, being called from run_in_main.

Benchmark

A simple server that returns Fibonacci number of a given input was ported to run in parallel. The server and client code can be found here. The benchmark was run with 10 clients and 10 requests per client, it's time is shown below. Individual timings can be found in the blog post mentioned below.

Time graph of parallel Fibonacci server

image

Side note

Unix.fork is not compatible with multiple domains. Hence, the test-suite is modified to run sequentially and all the tests that run Unix.fork are run before the Lwt_domain test.


This module is intended to work with OCaml 5.0 with parallelism support. It can be used now with Multicore OCaml obtained from multicore-opam. Any suggestions/feedback welcome, benchmarks from potential applications of this module will be appreciated too!

Further reading

@Sudha247 Sudha247 marked this pull request as draft May 24, 2021 11:15
@raphael-proust
Copy link
Collaborator

Hello,

Thanks a lot for this PR! I'm looking forward to multicore-lwt!

I've just had a quick look. (I'll do a longer review later.) I have a few questions/comments:

  • You added Lwt_domain to the unix/ domain. Is there a reason for this? Is there a dependency from the multicore part of the runtime on the OS? Or is it by similarity with Lwt_preemptive that it ended up there? In general we want to put code in core/ if it doesn't rely on unix.

    Currently there's a call to lwt_unix, for a notification thing. Is it for convenience (the function with the right signature was there) or intrisically dependent on a unix feature?

  • We'll need to grow the test suite a bit just to buff up the coverage. But that's not urgent at all.

@avsm
Copy link
Collaborator

avsm commented May 25, 2021

Lwt_offload should be in core/ indeed -- it's not part of the Unix-specific part of multicore OCaml.

@Sudha247 it would be good to get more of your benchmarks into the test suite here. For example the fib run would be ideal to check that you get the same results and also provide an example.

I'm just discussing with caml-devel the right way to be able to release this with Lwt_offload as an optional module -- I'll update when I find consensus.

@Sudha247
Copy link
Contributor Author

Thanks for the comments @raphael-proust and @avsm!

Agree about moving this module to core. The notification call is just for convenience, I'll work on removing this part after which there won't be any dependency on lwt_unix, so then the module will be ready to be moved to core.

Having more tests for the module would certainly be useful, I'll add the fib example to the testsuite.

@Sudha247
Copy link
Contributor Author

Sudha247 commented Jun 4, 2021

Some updates on the latest commit are:

  • Ported the Lwt_domain module to domainslib Task pool rather than managing creating domains on the module itself. This reduces duplication with domainslib.
  • Added a couple more tests - a test for run_in_main and a shorter running version of the Fibonacci benchmark.

With the new update, there's a slight digression from the Lwt_preemptive module; since the process of creating and destroying domains is expensive, Lwt_domain creates a fixed number of worker domains onto which tasks will be offloaded to, so there's no notion of minimum and maximum number of domains.

Benchmarks

Some benchmark results with the updates are below:
Benchmark: Computes fib 42 100 times.

Time

Sequential version: 205.397s
Preemptive threads version: 210.665s
Parallel version (with Lwt_domain)

The results were obtained on an Intel Xeon Gold 5120 processor with 24 isolated cores.

Cores Time(s) Speedup
1 212.72 0.9655744641
2 106.34 1.931512131
4 53.15 3.864477893
8 27.64 7.431150507
12 19.13 10.73690538
16 14.89 13.79429147
20 10.64 19.30422932
24 11.02 18.63856624

image

Note: There's still a dependency on Lwt_unix due to calling notifications. I'm trying to replicate that nicely with Domainslib channels. Will update the PR and move the module to core when it's done.

src/unix/lwt_domain.mli Outdated Show resolved Hide resolved
src/unix/lwt_domain.ml Outdated Show resolved Hide resolved
src/unix/lwt_domain.mli Outdated Show resolved Hide resolved
@raphael-proust
Copy link
Collaborator

This module is very promising (pun intended).

I'll work on packaging: how to ship this module without introducing a hard dependence on OCaml.5.0. I think it'll just need to be a separate opam package lwt-domain or something of the sort.

@Sudha247
Copy link
Contributor Author

Sudha247 commented Sep 9, 2021

Thanks a lot for the comments @raphael-proust! I'm doing another pass to update documentation and remove obsolete parts in the PR. I'll think through some of these open-ended questions and respond to them.

@avsm
Copy link
Collaborator

avsm commented Sep 9, 2021

It looks like there is a real error here in the CI:

File "src/unix/lwt_domain.ml", line 24, characters 42-43:
24 | let pool = ref (T.setup_pool ~num_domains:0)
                                               ^
Error: The function applied to this argument has type
         num_additional_domains:int -> Domainslib__Task.pool

Is this expected @Sudha247?

@Sudha247
Copy link
Contributor Author

Needed to bump domainslib to the latest version and change the argument to ~num_additional_domains, it's fixed now.

src/unix/lwt_domain.ml Outdated Show resolved Hide resolved
src/unix/lwt_domain.mli Outdated Show resolved Hide resolved
@avsm
Copy link
Collaborator

avsm commented Sep 24, 2021

We need to define whether the Lwt.async exception hook is supported or not when on a different domain.

src/unix/lwt_domain.mli Outdated Show resolved Hide resolved
@raphael-proust
Copy link
Collaborator

I toyed around with replacing the Lwt_unix.condition signaling mechanism with a lower-level, no-Unix version, but I wasn't satisfied with the result. I think that we can move the Lwt_domain module outside of unix/ and into core/ but we might need

  • to add some new global data-structure in lwt.ml (similar to paused but for detached computation instead),
  • add some interaction in the loop of Lwt_main.run to check on those (again, similar to pause),
  • add some other mechanism in the js-of-ocaml lwt runtime.

In my opinion, this can wait: we should focus on getting something that works and we can later get it to work better.

@raphael-proust
Copy link
Collaborator

We need to define whether the Lwt.async exception hook is supported or not when on a different domain.

The documentation about the exception is explicit. But to make sure the situation re: the async exception hook is clear, we could add a paragraph that explains what happens if we do Lwt.dont_wait (fun () -> Lwt_domain.detach …) (fun exc -> …) and if we do Lwt.async (Lwt_domain.detach …).

We can add this later because it can be inferred from the current documentation.

Copy link
Contributor

@ygrek ygrek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't depend on any lwt internals and could be just a useful separate small library until ocaml 5 is released?

src/unix/lwt_domain.ml Outdated Show resolved Hide resolved
src/unix/lwt_domain.ml Outdated Show resolved Hide resolved
@smorimoto
Copy link
Member

After merging #894, you can rebase this PR to the latest master to actually test the multicore one with CI.

@Sudha247
Copy link
Contributor Author

After merging #894, you can rebase this PR to the latest master to actually test the multicore one with CI.

That's great! I've reabsed the PR, it now includes CI check for 4.12.0+domains.

Copy link
Contributor

@yawaramin yawaramin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions.

src/unix/lwt_domain.ml Outdated Show resolved Hide resolved
src/unix/lwt_domain.ml Outdated Show resolved Hide resolved
src/unix/lwt_domain.ml Outdated Show resolved Hide resolved
src/unix/lwt_domain.ml Outdated Show resolved Hide resolved
@yawaramin
Copy link
Contributor

Thinking about this design a bit more; I think we also need a convenience function:

val Lwt.domain : (* or, detach/blocking/… *)
  ?pool:Lwt_domain.pool ->
  ('a -> 'b) ->
  'a ->
  'b Lwt.t

In a typical application, we might want to run a blocking computation at various points in the code. It would become a hassle to set up a domain pool at the application’s top layer and thread that through to all lower layers which needed it. Lwt should set up a domain pool by default, and allow users to override it if necessary.

We can add some docs to the effect that ‘promises are for non-blocking I/O but we have a convenient way to represent blocking computations as promises’.

@Sudha247
Copy link
Contributor Author

Sudha247 commented Nov 2, 2021

It would become a hassle to set up a domain pool at the application’s top layer and thread that through to all lower layers which needed it.

It sure is a hassle, especially while modifying existing modules. This is why named pools were introduced in domainslib.0.3.1. Pools can be obtained at a later point with the help of an identifier string it is associated to, thereby eliminating the need to thread through the pool across layers.

Lwt should set up a domain pool by default, and allow users to override it if necessary.

We were doing this earlier. But it could turn out problematic inter-operating with multiple Multicore enabled libraries, in the sense that it gets harder tracking down all the live domain pools. Ideally in Multicore OCaml (and OCaml 5.0) programs; number of domains = number of cores, exceeding that will introduce overheads and could slow down the program. If at all we do end up with a default domain pool, I think the right place for that to reside would be at domainslib.

We can add some docs to the effect that ‘promises are for non-blocking I/O but we have a convenient way to represent blocking computations as promises’.

This sounds like a good idea. I think I'm not clear about how this convenience function would be useful, it looks like it's exposing Lwt_domain.detach to Lwt but I may be misreading here.

@Sudha247
Copy link
Contributor Author

Sudha247 commented Nov 2, 2021

Thanks to @raphael-proust lwt-domain is moved to its own package now.

@yawaramin
Copy link
Contributor

Thanks for the explanation. I am now understanding a couple of new things. Btw let me preface this comment by saying that, if this lwt_domain package were to be merged right now, in my humble opinion it is fine as it is. My suggestion is just to improve developer experience.

Pools can be obtained at a later point with the help of an identifier string it is associated to,

OK, that is convenient, and then my suggestion becomes:

val Lwt.domain : (* or, detach/blocking/… *)
  string ->
  ('a -> 'b) ->
  'a ->
  'b Lwt.t

But it could turn out problematic inter-operating with multiple Multicore enabled libraries

That's a bummer.

If at all we do end up with a default domain pool, I think the right place for that to reside would be at domainslib.

Fair enough.

I think I'm not clear about how this convenience function would be useful,

I am suggesting that in most cases users should not have to think about Lwt_domain, and just call Lwt.domain to 'magically' parallelize their code. E.g.:

(* main.ml *)

let main () =
  let _ = Lwt_domain.setup_pool ~name:"main" 3 in
  ...

let () = main ()

(* person.ml *)

type t = {
  id : int;
  name : string;
} [@@deriving of_yojson]

let of_yojson = Lwt.domain "main" of_yojson
(* val of_yojson : Yojson.Safe.t -> (t, string) Lwt_result.t *)

Now just by wrapping an existing CPU-bound function with Lwt.domain "main", we turn it into a parallelized function in the Lwt context. This convenience will be very beneficial.

@Sudha247 Sudha247 marked this pull request as ready for review November 8, 2021 08:38
Comment on lines 123 to +130
- run: opam install lwt_luv lwt_react lwt --deps-only --with-test
if: ${{ matrix.ppx == false }}

- run: opam exec -- make build
- run: opam install lwt_ppx --deps-only --with-test
if: ${{ matrix.ppx == true }}

- run: opam install lwt_domain --deps-only --with-test
if: ${{ matrix.domain == true }}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern (default action for all workflows, ppx action if ppx=true, domain action if domain=true) is repeated below for building and then for testing. It is somewhat verbose and not very scalable (at some point we might want to add a unix flag).

It might be better to use the opam-local-packages variable. Somewhat annoyingly, opam and dune expect the package list in different formats so it's not completely trivial. Still we could possibly simplify this file.

Opinions? Ideas?

ping @smorimoto who wrote this file.

lwt_domain.opam Outdated Show resolved Hide resolved
test/unix/main.ml Outdated Show resolved Hide resolved
@raphael-proust
Copy link
Collaborator

rebased on master to resolve merge conflict

@raphael-proust raphael-proust merged commit d6b223e into ocsigen:master Nov 25, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants