Blocking mechanism with cached threads #2687

vasilmkd · 2021-12-24T15:23:43Z

Up to now, every thread that was spawned in order to respond to scala.concurrent.blocking simply replaced the blocked worker thread. The worked thread that got blocked died after running the blocking code.

This PR adds a mechanism for caching threads, similar to Executors.newCachedThreadPool, where the threads are kept around for one minute after they become inactive, in case more blocking work arrives. This saves the expensive process of spawning a new thread.

For now, the retention period is fixed and non-configurable, but I am open to making it user configurable in 3.4.0.

Furthemore, this PR makes a change in the IOFiber blocking section, where fibers can determine if they are running on the WSTP and instead of making an expensive shift to the blocking executor and another expensive shift back to the compute pool, the blocking actions are wrapped in scala.concurrent.blocking and executed in place, taking advantage of cached worker threads or spawning new ones if necessary.

I also added benchmarks and the improvements are pretty sweet.

series/3.3.x:

Benchmark                        (size)   Mode  Cnt     Score     Error  Units
BlockingBenchmark.blockThenCede   10000  thrpt   20     3.353 ±   0.081  ops/s
BlockingBenchmark.coarse          10000  thrpt   20  3580.270 ± 164.925  ops/s
BlockingBenchmark.fine            10000  thrpt   20     3.321 ±   0.184  ops/s
BlockingBenchmark.nested          10000  thrpt   20     3.345 ±   0.064  ops/s

This PR:

Benchmark                        (size)   Mode  Cnt     Score     Error  Units
BlockingBenchmark.blockThenCede   10000  thrpt   20     6.209 ±   0.055  ops/s
BlockingBenchmark.coarse          10000  thrpt   20  3468.893 ± 138.703  ops/s
BlockingBenchmark.fine            10000  thrpt   20   739.282 ±   5.803  ops/s
BlockingBenchmark.nested          10000  thrpt   20   716.012 ±   4.738  ops/s

vasilmkd · 2021-12-24T16:30:47Z

Some more benchmarks ran locally on my machine:

series/3.3.x:

BlockingBenchmark.blockThenCede   10000  thrpt   20    12.032 ±  0.250  ops/s
BlockingBenchmark.coarse          10000  thrpt   20  9337.834 ± 67.816  ops/s
BlockingBenchmark.fine            10000  thrpt   20    12.635 ±  0.055  ops/s
BlockingBenchmark.nested          10000  thrpt   20    12.254 ±  0.356  ops/s

This PR:

Benchmark                        (size)   Mode  Cnt      Score     Error  Units
BlockingBenchmark.blockThenCede   10000  thrpt   20     23.029 ±   0.249  ops/s
BlockingBenchmark.coarse          10000  thrpt   20  10085.972 ± 159.529  ops/s
BlockingBenchmark.fine            10000  thrpt   20   1734.625 ±  30.182  ops/s
BlockingBenchmark.nested          10000  thrpt   20   1624.172 ±  90.719  ops/s

vasilmkd · 2021-12-24T16:40:06Z

This PR does not affect interruptible. Those effects still go to the blocking pool. That code path might need a bit of massaging if we want to run it on the compute pool. I would attempt that in a separate PR.

vasilmkd · 2021-12-24T18:57:16Z

If anyone wants to take it for a spin, "io.vasilev" %% "cats-effect" % "3.3-27-baccedd".

This snapshot works with fs2, the tests pass without issues.

vasilmkd · 2021-12-25T12:26:27Z

After the latest round of changes:
Cloud instance:

Benchmark                        (size)   Mode  Cnt     Score     Error  Units
BlockingBenchmark.blockThenCede   10000  thrpt   20     6.229 ±   0.299  ops/s
BlockingBenchmark.coarse          10000  thrpt   20  3565.953 ± 105.111  ops/s
BlockingBenchmark.fine            10000  thrpt   20   758.839 ±  19.670  ops/s
BlockingBenchmark.nested          10000  thrpt   20   729.449 ±  12.753  ops/s

Local:

Benchmark                        (size)   Mode  Cnt      Score     Error  Units
BlockingBenchmark.blockThenCede   10000  thrpt   20     21.879 ±   0.202  ops/s
BlockingBenchmark.coarse          10000  thrpt   20  10181.248 ±  18.676  ops/s
BlockingBenchmark.fine            10000  thrpt   20   1795.412 ±  16.884  ops/s
BlockingBenchmark.nested          10000  thrpt   20   1673.573 ±   6.557  ops/s

vasilmkd · 2021-12-25T15:14:37Z

Latest snapshot:

"io.vasilev" %% "cats-effect" % "3.3-33-d14798d"

Edit: works with fs2.

djspiewak · 2021-12-29T05:43:30Z

This is supremely amazing. I'm going to take a closer review over the next few days but I really want to get this out soon!

djspiewak

This is honestly extremely clever. I'm a little concerned about the index bit, but aside from that it's amazing

djspiewak · 2021-12-30T00:10:30Z

core/jvm/src/main/scala/cats/effect/unsafe/WorkerThread.scala

@@ -66,6 +64,9 @@ private final class WorkerThread(
  import TracingConstants._
  import WorkStealingThreadPoolConstants._

+  // Index assigned by the `WorkStealingThreadPool` for identification purposes.
+  private[this] var _index: Int = idx


Doesn't this mean that threads can get "lost" in the cachedThreads set? Since the index could change after the thread is inserted, meaning that the comparator would fail to find the old value.

private[unsafe] val cachedThreads: ConcurrentSkipListSet[WorkerThread] = new ConcurrentSkipListSet(Comparator.comparingInt[WorkerThread](_.nameIndex))

val nameIndex: Int = pool.blockedWorkerThreadNamingIndex.incrementAndGet()

The threads in the cachedThreads skip list are ordered by this immutable val. The value of the nameIndex does not really matter, it can also be System.identityHashCode. But, we need a stable and unique value for debugging (think inspecting threads in JDK Mission Control or in a debugger) and so I just reused it for this purpose too. The ordering doesn't really matter either. I chose the skip list because it offers a log n remove operation instead of a linear one (the remove is necessary when a worker thread is woken up).

The only thing left to do is maybe rename nameIndex to something else.

index is mutable and only has a meaning inside our WSTP. It is an index for the threads in the core pool (only the N threads where N = Runtime#availableProcessors()) and is used to keep track of their data structures (local queues and synchronization primitives), while nameIndex is just used for keeping track of threads in debuggers (to avoid multiple physical threads sharing the same name and be confusing to users).

vasilmkd · 2021-12-30T00:56:33Z

I only force pushed to rebase the PR, I did not make any changes.

vasilmkd · 2021-12-30T00:58:46Z

Wait a second, I messed up signing my commits. I will force push again.

vasilmkd · 2021-12-30T01:03:56Z

Done, sorry for the inconvenience.

vasilmkd marked this pull request as draft December 25, 2021 09:58

vasilmkd marked this pull request as ready for review December 25, 2021 16:25

This was referenced Dec 29, 2021

Removed interruptible dependency on an explicit ExecutionContext #2699

Merged

Make IORuntime#blocking public #2685

Closed

djspiewak reviewed Dec 30, 2021

View reviewed changes

djspiewak previously approved these changes Dec 30, 2021

View reviewed changes

vasilmkd marked this pull request as draft December 30, 2021 00:58

vasilmkd added 13 commits December 30, 2021 02:02

Use the done signal when parking worker threads

53cac4c

Add blocking benchmarks

b2a2931

Make the fields of WorkerThread mutable

857f0e2

Add a queue for cached threads

ff599dc

Initial implementation of the cached blocking mechanism

40f0dec

Fix the resumption mechanism on shutdown

0c501aa

Blocking fast-path

21132bc

Use the thread prefix from the pool

ff1d6a9

Add a global naming index for better debugging of threads

50fe3ec

Address race conditions on wake up

170d096

Use a blocking queue for transferring data

40b6da9

Add an init method for more DRY

585cff3

Explain the caching mechanism in comments

8d00be3

vasilmkd dismissed djspiewak’s stale review via 8d00be3 December 30, 2021 01:03

vasilmkd marked this pull request as ready for review December 30, 2021 01:03

vasilmkd requested a review from djspiewak December 30, 2021 01:04

djspiewak approved these changes Dec 30, 2021

View reviewed changes

djspiewak merged commit af65602 into typelevel:series/3.3.x Dec 30, 2021

vasilmkd deleted the cached-blocking branch December 30, 2021 17:07

armanbilge mentioned this pull request May 2, 2023

Fix worker thread name index #3591

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Blocking mechanism with cached threads #2687

Blocking mechanism with cached threads #2687

vasilmkd commented Dec 24, 2021

vasilmkd commented Dec 24, 2021

vasilmkd commented Dec 24, 2021

vasilmkd commented Dec 24, 2021 •

edited

Loading

vasilmkd commented Dec 25, 2021 •

edited

Loading

vasilmkd commented Dec 25, 2021 •

edited

Loading

djspiewak commented Dec 29, 2021

djspiewak left a comment

djspiewak Dec 30, 2021

vasilmkd Dec 30, 2021

vasilmkd commented Dec 30, 2021

vasilmkd commented Dec 30, 2021

vasilmkd commented Dec 30, 2021

Blocking mechanism with cached threads #2687

Blocking mechanism with cached threads #2687

Conversation

vasilmkd commented Dec 24, 2021

vasilmkd commented Dec 24, 2021

vasilmkd commented Dec 24, 2021

vasilmkd commented Dec 24, 2021 • edited Loading

vasilmkd commented Dec 25, 2021 • edited Loading

vasilmkd commented Dec 25, 2021 • edited Loading

djspiewak commented Dec 29, 2021

djspiewak left a comment

Choose a reason for hiding this comment

djspiewak Dec 30, 2021

Choose a reason for hiding this comment

vasilmkd Dec 30, 2021

Choose a reason for hiding this comment

vasilmkd commented Dec 30, 2021

vasilmkd commented Dec 30, 2021

vasilmkd commented Dec 30, 2021

vasilmkd commented Dec 24, 2021 •

edited

Loading

vasilmkd commented Dec 25, 2021 •

edited

Loading

vasilmkd commented Dec 25, 2021 •

edited

Loading