Commit fee6012

Merge pull request #76 from lyrm/refactoring
Refactoring to have a lockfree package
2 parents 1e7c41f + 67164e5 commit fee6012

39 files changed, 330 additions and 169 deletions

CHANGES.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ All notable changes to this project will be documented in this file.
 
 ## Not released
 
-- Rename data structures and package, add docs (@lyrm)
+- Add docs and rename/refactor to add a lockfree package (@lyrm)
 - Add STM tests for current data structures (@lyrm, @jmid)
 
 ## 0.3.1

README.md

Lines changed: 20 additions & 13 deletions
@@ -2,24 +2,31 @@
 
 ---
 
-A collection of parallelism-safe data structures for OCaml 5. It contains:
+This repository is a collection of parallelism-safe data structures for OCaml 5.
+They are contained in two packages:
 
-| Name | What is it ? | Sources |
-| -------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| [Treiber Stack](src/treiber_stack.mli) | A classic multi-producer multi-consumer stack, robust and flexible. Recommended starting point when needing LIFO structure | |
-| [Michael-Scott Queue](src/michael_scott_queue.mli) | A classic multi-producer multi-consumer queue, robust and flexible. Recommended starting point when needing FIFO structure. | [Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms](https://www.cs.rochester.edu/~scott/papers/1996_PODC_queues.pdf) |
-| [Chase-Lev Work-Stealing Deque](src/ws_deque.mli) | Single-producer, multi-consumer dynamic-size double-ended queue (deque). Ideal for throughput-focused scheduling using per-core work distribution. Note, `pop` and `steal` follow different ordering (respectively LIFO and FIFO) and have different linearization contraints. | [Dynamic circular work-stealing deque](https://dl.acm.org/doi/10.1145/1073970.1073974) and [Correct and efficient work-stealing for weak memory models](https://dl.acm.org/doi/abs/10.1145/2442516.2442524)) |
-| [SPSC Queue](src/spsc_queue.mli) | Simple single-producer single-consumer fixed-size queue. Thread-safe as long as at most one thread acts as producer and at most one as consumer at any single point in time. | |
-| [MPMC Relaxed Queue](src/mpmc_relaxed_queue.mli) | Multi-producer, multi-consumer, fixed-size relaxed queue. Optimised for high number of threads. Not strictly FIFO. Note, it exposes two interfaces: a lockfree and a non-lockfree (albeit more practical) one. See the mli for details. | |
-| [MPSC Queue](src/mpsc_queue.mli) | A multi-producer, single-consumer, thread-safe queue without support for cancellation. This makes a good data structure for a scheduler's run queue. It is used in [Eio](https://github.com/ocaml-multicore/eio). | It is a single consumer version of the queue described in [Implementing lock-free queues](https://people.cs.pitt.edu/~jacklange/teaching/cs2510-f12/papers/implementing_lock_free.pdf). |
+- `Saturn` that includes every data structures and should be used by default if
+  you just want parallelism-safe data structures..
+- `Saturn_lockfree` that includes only lock-free data structures.
+
+The available data structures are :
+
+| Names | Names in `Saturn` <br> (in `Saturn_lockfree`) | What is it ? | Sources |
+| ------------------------------------------------------------ | --------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| [Treiber stack](src_lockfree/treiber_stack.mli) | `Stack` (same) | A classic multi-producer multi-consumer stack, robust and flexible. Recommended starting point when needing a LIFO structure | |
+| [Michael-Scott queue](src_lockfree/michael_scott_queue.mli) | `Queue` (same) | A classic multi-producer multi-consumer queue, robust and flexible. Recommended starting point when needing a FIFO structure. | [Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms](https://www.cs.rochester.edu/~scott/papers/1996_PODC_queues.pdf) |
+| [Chase-Lev Work-Stealing Dequeue](src_lockfree/ws_deque.mli) | `Work_stealing_deque` (same) | Single-producer, multi-consumer dynamic-size double-ended queue (deque). Ideal for throughput-focused scheduling using per-core work distribution. Note, `pop` and `steal` follow different ordering (respectively LIFO and FIFO) and have different linearization contraints. | [Dynamic circular work-stealing deque](https://dl.acm.org/doi/10.1145/1073970.1073974) and [Correct and efficient work-stealing for weak memory models](https://dl.acm.org/doi/abs/10.1145/2442516.2442524)) |
+| [SPSC Queue](src_lockfree/spsc_queue.mli) | `Single_prod_single_`<br>`cons_queue` (same) | Simple single-producer single-consumer fixed-size queue. Thread-safe as long as at most one thread acts as producer and at most one as consumer at any single point in time. | |
+| [MPMC bounded relaxed queue](src/mpmc_relaxed_queue.mli) | `Relaxed queue` (same) | Multi-producer, multi-consumer, fixed-size relaxed queue. Optimised for high number of threads. Not strictly FIFO. Note, it exposes two interfaces: a lockfree and a non-lockfree (albeit more practical) one. See the mli for details. | |
+| [MPSC Queue](src_lockfree/mpsc_queue.mli) | `Single_consumer_queue` (same) | A multi-producer, single-consumer, thread-safe queue without support for cancellation. This makes a gooddata structure for a scheduler's run queue. It is used in [Eio](https://github.com/ocaml-multicore/eio). | It is a single consumer version of the queue described in [Implementing lock-free queues](https://people.cs.pitt.edu/~jacklange/teaching/cs2510-f12/papers/implementing_lock_free.pdf). |
 
 ## Usage
 
 `Saturn` can be installed from `opam`: `opam install saturn`. Sample usage of
-`Ws_deque` is illustrated below.
+`Work_stealing_deque` is illustrated below.
 
 ```ocaml
-module Ws_deque = Ws_deque.M
+module Ws_deque = Work_stealing_deque.M
 
 let q = Ws_deque.create ()
 
@@ -49,5 +56,5 @@ There is a number of benchmarks in `bench` directory. You can run them with
 
 ## Contributing
 
-Contributions of more parallelism-safe data structures appreciated! Please create
-issues/PRs to this repo.
+Contributions are appreciated! If you intend to add a new data structure, please
+read [this](CONTRIBUTING.md) before.
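
The README table notes that, on the work-stealing deque, `pop` and `steal` follow different orderings (LIFO and FIFO respectively). The sketch below is only a toy, single-threaded model of that ordering, assuming a plain list as storage. The `toy_deque` type and its functions are hypothetical names for illustration; this is not the Chase-Lev implementation and it is not safe for parallel use.

```ocaml
(* Toy model of owner/thief ordering. Single-threaded, NOT the Saturn code. *)
type 'a toy_deque = { mutable items : 'a list } (* oldest element first *)

let create () = { items = [] }

(* The owner pushes at the "new" end. *)
let push d x = d.items <- d.items @ [ x ]

(* The owner pops the newest element: LIFO. *)
let pop d =
  match List.rev d.items with
  | [] -> None
  | newest :: older_rev ->
      d.items <- List.rev older_rev;
      Some newest

(* A thief steals the oldest element: FIFO. *)
let steal d =
  match d.items with
  | [] -> None
  | oldest :: rest ->
      d.items <- rest;
      Some oldest
```

After pushing 1, 2 and 3 in that order, `pop` in this model returns the most recent element (3) while `steal` returns the oldest (1), which is the asymmetry the table describes.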

dune-project

Lines changed: 6 additions & 0 deletions
@@ -1,2 +1,8 @@
 (lang dune 3.0)
 (name saturn)
+(package
+ (name saturn)
+ (synopsis "Collection of parallelism-safe data structures for Multicore OCaml"))
+(package
+ (name saturn_lockfree)
+ (synopsis "Collection of lock-free data structures for Multicore OCaml"))

saturn.opam

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ depends: [
   "qcheck-stm" {with-test & >= "0.2"}
   "qcheck-alcotest" {with-test & >= "0.18.1"}
   "alcotest" {with-test & >= "1.6.0"}
-  "yojson" {>= "2.0.2"}
+  "yojson" {with-test &>= "2.0.2"}
   "dscheck" {with-test & >= "0.1.0"}
 ]
 available: arch != "x86_32" & arch != "arm32" & arch != "ppc64"

saturn_lockfree.opam

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+opam-version: "2.0"
+maintainer:"KC Sivaramakrishnan <sk826@cl.cam.ac.uk>"
+authors: ["KC Sivaramakrishnan <sk826@cl.cam.ac.uk>"]
+homepage: "https://github.com/ocaml-multicore/saturn"
+doc: "https://ocaml-multicore.github.io/saturn"
+synopsis: "Lock-free data structures for multicore OCaml"
+license: "ISC"
+dev-repo: "git+https://github.com/ocaml-multicore/saturn.git"
+bug-reports: "https://github.com/ocaml-multicore/saturn/issues"
+tags: []
+depends: [
+  "ocaml" {>= "4.12"}
+  "dune" {>= "3.0"}
+  "domain_shims" {>= "0.1.0"}
+  "qcheck" {with-test & >= "0.18.1"}
+  "qcheck-stm" {with-test & >= "0.2"}
+  "qcheck-alcotest" {with-test & >= "0.18.1"}
+  "alcotest" {with-test & >= "1.6.0"}
+  "yojson" {with-test &>= "2.0.2"}
+  "dscheck" {with-test & >= "0.1.0"}
+]
+available: arch != "x86_32" & arch != "arm32" & arch != "ppc64"
+depopts: []
+build: ["dune" "build" "-p" name "-j" jobs]

src/dune

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 (library
  (name saturn)
  (public_name saturn)
- (libraries domain_shims))
+ (libraries saturn_lockfree domain_shims))

src/mpmc_relaxed_queue.ml

Lines changed: 5 additions & 131 deletions
Original file line numberDiff line numberDiff line change
@@ -1,140 +1,14 @@
1-
(*
2-
# General idea
3-
4-
It is the easiest to explain the general idea on an array of infinite size.
5-
Let's start with that. Each element in such an array constitutes a single-use
6-
exchange slot. Enqueuer increments [tail] and treats prior value as index of
7-
its slot. Same for dequeuer and [head]. This effectively creates pairs
8-
(enqueuer, dequeuer) assigned to the same slot. Enqueuer leaves the value in
9-
the slot, dequer copies it out.
10-
11-
Enqueuer never fails. It always gets a brand-new slot and places item in it.
12-
Dequeuer, on the other hand, may witness an empty slot. That's because [head]
13-
may jump behind [tail]. Remember, indices are implemented blindy. For now,
14-
assume dequeuer simply spins on the empty slot until an item appears.
1+
include Lockfree.Relaxed_queue
152

16-
That's it. There's a few things flowing from this construction:
17-
* Slots are atomic. This is where paired enqueuer and dequeuer communicate.
18-
* [head] overshooting [tail] is a normal condition and that's good - we want
19-
to keep operations on [head] and [tail] independent.
20-
21-
# Finite array
22-
23-
Now, to make it work in real-world, simply treat finite array as circular,
24-
i.e. wrap around when reached the end. Slots are now re-used, so we need to be
25-
more careful.
26-
27-
Firstly, if there's too many items, enqueuer may witness a full slot. Let's assume
28-
enqueuer simply spins on full slot until some dequeuer appears and takes the old
29-
value.
30-
31-
Secondly, in the case of overlap, there can be more than 2 threads (1x enqueuer,
32-
1x dequeuer) assigned to a single slot (imagine 10 enqueuers spinning on an 8-slot
33-
array). In fact, it could be any number. Thus, all operations on slot have to use
34-
CAS to ensure that no item is overwrriten on store and no item is dequeued by two
35-
threads at once.
36-
37-
Above works okay in practise, and there is some relevant literature, e.g.
38-
(DOI: 10.1145/3437801.3441583) analyzed this particular design. There's also
39-
plenty older papers looking at similar approaches
40-
(e.g. DOI: 10.1145/2851141.2851168).
41-
42-
Note, this design may violate FIFO (on overlap). The risk can be minimized by
43-
ensuring size of array >> number of threads but it's never zero.
44-
(github.com/rigtorp/MPMCQueue has a nice way of fixing this, we could add it).
45-
46-
# Blocking (non-lockfree paths on full, empty)
47-
48-
Up until now [push] and [pop] were allowed to block indefinitely on empty and full
49-
queue. Overall, what can be done in those states?
50-
51-
1. Busy wait until able to finish.
52-
2. Rollback own index with CAS (unassign itself from slot).
53-
3. Move forward other index with CAS (assign itself to the same slot as opposite
54-
action).
55-
4. Mark slot as burned - dequeue only.
56-
57-
Which one then?
58-
59-
Let's optimize for stability, i.e. some reasonable latency that won't get much worse
60-
under heavy load. Busy wait is great because it does not cause any contention in the
61-
hotspots ([head], [tail]). Thus, start with busy wait (1). If queue is busy and
62-
moving fast, there is a fair chance that within, say, 30 spins, we'll manage to
63-
complete action without having to add contention elsewhere.
64-
65-
Once N busy-loops happen and nothing changes, we probably want to return even if its
66-
costs. (2), (3) both allow that. (2) doesn't add contention to the other index like
67-
(3) does. Say, there's a lot more dequeuers than enqueuers, if all dequeurs did (3),
68-
they would add a fair amount of contention to the [tail] index and slow the
69-
already-outnumbered enqueuers further. So, (2) > (3) for that reason.
70-
71-
However, with just (2), some dequeuers will struggle to return. If many dequeuers
72-
constatly try to pop an element and fail, they will form a chain.
73-
74-
tl hd
75-
| |
76-
[.]-[A]-[B]-[C]-..-[X]
77-
78-
For A to rollback, B has to rollback first. For B to rollback C has to rollback first.
79-
80-
[A] is likely to experience a large latency spike. In such a case, it is easier for [A]
81-
to do (3) rather than hope all other active dequeuers will unblock it at some point.
82-
Thus, it's worthwile also trying to do (3) periodically.
83-
84-
Thus, the current policy does (1) for a bit, then (1), (2) with periodic (3).
85-
86-
What about burned slots (4)?
87-
88-
It's present in the literature. Weakly I'm not a fan. If dequeuers are faster to remove
89-
items than enqueuers supply them, slots burned by dequeuers are going to make enqueuers
90-
do even more work.
91-
92-
# Resizing
93-
94-
The queue does not support resizing, but it can be simulated by wrapping it in a
95-
lockfree list.
96-
*)
97-
98-
type 'a t = {
99-
array : 'a Option.t Atomic.t Array.t;
100-
head : int Atomic.t;
101-
tail : int Atomic.t;
102-
mask : int;
103-
}
104-
105-
let create ~size_exponent () : 'a t =
106-
let size = 1 lsl size_exponent in
107-
let array = Array.init size (fun _ -> Atomic.make None) in
108-
let mask = size - 1 in
109-
let head = Atomic.make 0 in
110-
let tail = Atomic.make 0 in
111-
{ array; head; tail; mask }
3+
module Spin = struct
4+
let push = push
5+
let pop = pop
6+
end
1127

1138
(* [ccas] A slightly nicer CAS. Tries without taking microarch lock first. Use on indices. *)
1149
let ccas cell seen v =
11510
if Atomic.get cell != seen then false else Atomic.compare_and_set cell seen v
11611

117-
module Spin = struct
118-
let push { array; tail; mask; _ } item =
119-
let tail_val = Atomic.fetch_and_add tail 1 in
120-
let index = tail_val land mask in
121-
let cell = Array.get array index in
122-
while not (ccas cell None (Some item)) do
123-
Domain.cpu_relax ()
124-
done
125-
126-
let pop { array; head; mask; _ } =
127-
let head_val = Atomic.fetch_and_add head 1 in
128-
let index = head_val land mask in
129-
let cell = Array.get array index in
130-
let item = ref (Atomic.get cell) in
131-
while Option.is_none !item || not (ccas cell !item None) do
132-
Domain.cpu_relax ();
133-
item := Atomic.get cell
134-
done;
135-
Option.get !item
136-
end
137-
13812
module Not_lockfree = struct
13913
(* [spin_threshold] Number of times on spin on a slot before trying an exit strategy. *)
14014
let spin_threshold = 30
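
The design comment removed from this file explains that each operation takes a ticket by incrementing an index and that, on a finite array, the ticket wraps around to a slot that may still be occupied. The following is a minimal single-domain sketch of that mapping, using only the standard `Atomic` module. `slot_of_ticket` and `try_push` are hypothetical helpers introduced here for illustration; `ccas` has the same shape as the function kept in the diff. Unlike the removed `Spin.push`, `try_push` attempts the store once instead of spinning, so the outcome can be observed.

```ocaml
(* Sketch only: single-domain walk-through of the ticket-to-slot mapping. *)
let size_exponent = 3
let size = 1 lsl size_exponent (* 8 slots *)
let mask = size - 1

(* Hypothetical helper: wrap an ever-increasing ticket onto the circular array. *)
let slot_of_ticket ticket = ticket land mask

(* Same shape as [ccas] in the diff: read first, CAS only if the read matches. *)
let ccas cell seen v =
  if Atomic.get cell != seen then false else Atomic.compare_and_set cell seen v

let tail = Atomic.make 0
let slots = Array.init size (fun _ -> Atomic.make None)

(* Hypothetical helper: take the next ticket and try ONCE to fill its slot.
   Returns the slot index and whether the store succeeded. *)
let try_push item =
  let ticket = Atomic.fetch_and_add tail 1 in
  let index = slot_of_ticket ticket in
  (index, ccas slots.(index) None (Some item))
```

Pushing nine items into this eight-slot sketch sends the ninth ticket back to slot 0, where the store fails because the slot is still occupied. That is the "full slot" case on which the removed `Spin.push` would keep spinning until a dequeuer empties the slot.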

src/mpmc_relaxed_queue.mli

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 (**
-A multi-producer, multi-consumer, thread-safe, relaxed-FIFO queue.
+A multi-producer, multi-consumer, thread-safe, bounded relaxed-FIFO queue.
 
 It exposes two interfaces: [Spin] and [Not_lockfree]. [Spin] is lock-free
 formally, but the property is achieved in a fairly counterintuitive way -

src/saturn.ml

Lines changed: 6 additions & 6 deletions
@@ -26,10 +26,10 @@ Copyright (c) 2017, Nicolas ASSOUAD <nicolas.assouad@ens.fr>
 ########
 *)
 
-module Queue = Michael_scott_queue
-module Stack = Treiber_stack
-module Work_stealing_deque = Ws_deque
-module Single_prod_single_cons_queue = Spsc_queue
-module Single_consumer_queue = Mpsc_queue
+module Queue = Lockfree.Queue
+module Stack = Lockfree.Stack
+module Work_stealing_deque = Lockfree.Work_stealing_deque
+module Single_prod_single_cons_queue = Lockfree.Single_prod_single_cons_queue
+module Single_consumer_queue = Lockfree.Single_consumer_queue
 module Relaxed_queue = Mpmc_relaxed_queue
-module Backoff = Backoff
+module Backoff = Lockfree.Backoff
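
The aliases in `src/saturn.ml` re-export modules from the lock-free library instead of redefining them. A small sketch of what that pattern implies for callers, assuming a stand-in library: `Lockfree_stub` and `Facade` are hypothetical names, and the stub stack is a plain list reference that is neither lock-free nor safe for parallel use. Because a module alias does not create a new type, values built through one path can be used through the other.

```ocaml
(* Stand-in for the lock-free library. A plain list ref, NOT thread-safe. *)
module Lockfree_stub = struct
  module Stack = struct
    type 'a t = 'a list ref

    let create () : 'a t = ref []
    let push (s : 'a t) x = s := x :: !s

    let pop (s : 'a t) =
      match !s with
      | [] -> None
      | x :: rest ->
          s := rest;
          Some x
  end
end

(* Facade in the style of saturn.ml: an alias, not a copy. *)
module Facade = struct
  module Stack = Lockfree_stub.Stack
end
```

A stack created with `Lockfree_stub.Stack.create` can be passed to `Facade.Stack.push` and back, since both paths name the same module.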

src/saturn.mli

Lines changed: 6 additions & 7 deletions
@@ -30,13 +30,12 @@ Copyright (c) 2017, Nicolas ASSOUAD <nicolas.assouad@ens.fr>
 
 (** {1 Data structures} *)
 
-module Queue = Michael_scott_queue
-module Stack = Treiber_stack
-module Work_stealing_deque = Ws_deque
-module Single_prod_single_cons_queue = Spsc_queue
-module Single_consumer_queue = Mpsc_queue
+module Queue = Lockfree.Queue
+module Stack = Lockfree.Stack
+module Work_stealing_deque = Lockfree.Work_stealing_deque
+module Single_prod_single_cons_queue = Lockfree.Single_prod_single_cons_queue
+module Single_consumer_queue = Lockfree.Single_consumer_queue
 module Relaxed_queue = Mpmc_relaxed_queue
 
+module Backoff = Lockfree.Backoff
 (** {2 Other} *)
-
-module Backoff = Backoff
