
IOLocal propagation for unsafe access #3636

Merged
merged 39 commits into typelevel:series/3.x on Nov 22, 2024

Conversation

armanbilge
Member

Still needs quite a bit of work, but wanted to sketch the basic idea.

Goal: to expose a fiber's IOLocals as ThreadLocals within a side-effecting block, so that they can be accessed and modified in unsafe land.

Motivation: basically telemetry Java interop.

Constraints: to do this as safely as possible 😅
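
As a rough sketch of the idea from user land (the `unsafeThreadLocal` accessor name here is illustrative, not confirmed by this thread, and propagation would also need the opt-in flag discussed below):

```scala
import cats.effect.{IO, IOLocal}

// Sketch: a Java telemetry library reads and writes the fiber's local
// through a ThreadLocal view while we are inside a side-effecting block.
def traced(traceId: IOLocal[String])(javaSideEffect: Runnable): IO[Unit] =
  traceId.set("abc-123") *>
    IO.delay {
      val tl: ThreadLocal[String] = traceId.unsafeThreadLocal() // illustrative accessor
      val before = tl.get()     // sees "abc-123", the fiber's current value
      tl.set(before + "/child") // mutation flows back into the fiber's IOLocal
      javaSideEffect.run()
    }
```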

@armanbilge armanbilge marked this pull request as draft May 16, 2023 19:35
@@ -43,4 +43,6 @@ final class IOFiberConstants {
   static final byte CedeR = 6;
   static final byte AutoCedeR = 7;
   static final byte DoneR = 8;
+
+  static final boolean dumpLocals = Boolean.getBoolean("cats.effect.tracing.dumpLocals");
armanbilge (Member Author)

Bikesheddable configuration for opting-in. So the rest of us don't have to pay the penalty 😇

rossabaker (Member)

Is this specifically "tracing", even if that's the most obvious use case?

armanbilge (Member Author)

Oh woops, this was a very lazy copy-pasta. I copied it from the system properties we use to configure fiber tracing. We should rename it anyway; dumpLocals is not quite right, I think 😅


What about cats.effect.localContextPropagation, similar to Monix's monix.environment.localContextPropagation?

armanbilge (Member Author)

Thanks, I liked that! I went with cats.effect.ioLocalPropagation.
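
(For anyone trying a snapshot: a flag like this is just a JVM system property, so `java -Dcats.effect.ioLocalPropagation=true ...` enables it. The sbt wiring below is one illustrative way to do the same for `run`.)

```scala
// build.sbt (illustrative): fork and pass the opt-in flag to the forked JVM
Compile / run / fork := true
Compile / run / javaOptions += "-Dcats.effect.ioLocalPropagation=true"
```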

Comment on lines 253 to 257
var locals: IOLocals = null
if (dumpLocals) {
  locals = new IOLocals(localState)
  IOLocals.threadLocal.set(locals)
}
armanbilge (Member Author)

I did this just for delay, but I guess blocking and interruptible would want it too.
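
(For context, those are the three CE3 entry points that hand a side-effecting block to the runtime, so each would want the same ThreadLocal bracketing:)

```scala
import cats.effect.IO

// Each of these runs user code where a ThreadLocal view of the
// fiber's locals would need to be installed and torn down.
val onCompute: IO[Unit] = IO.delay(println("runs on the compute pool"))
val onBlocker: IO[Unit] = IO.blocking(Thread.sleep(10))          // shifted to the blocking pool
val interruptibly: IO[Unit] = IO.interruptible(Thread.sleep(10)) // blocking, but can be interrupted
```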

Comment on lines 4 to 5
// TODO handle defaults and lenses. all do-able, just needs refactoring ...
final class IOLocals private[effect] (private[this] var state: IOLocalState) {
armanbilge (Member Author)

Ok, so I was extremely lazy with implementing this thing. But the main idea of this wrapper API is that it should only give the user access to IOLocals that they know about, i.e. they should not be able to clear out other locals that happen to be present.
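
A minimal sketch of that access-by-key idea, with illustrative names and none of the PR's defaults/lenses handling:

```scala
import cats.effect.IOLocal

// The fiber's local state is conceptually a Map keyed by IOLocal instances.
// A wrapper that only exposes get/set by key cannot enumerate or clear
// locals you don't already hold a reference to.
final class IOLocalsSketch(private[this] var state: Map[IOLocal[_], Any]) {

  def get[A](key: IOLocal[A]): Option[A] =
    state.get(key).map(_.asInstanceOf[A])

  def set[A](key: IOLocal[A], value: A): Unit =
    state = state.updated(key, value)

  // Deliberately absent: keys, clearAll, or anything exposing other locals.
}
```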

Comment on lines 1011 to 1012
 val fiber = new IOFiber[A](
-  Map.empty,
+  if (IOFiberConstants.dumpLocals) unsafe.IOLocals.getState else Map.empty,
armanbilge (Member Author)

We can even go in the opposite direction for IO#unsafeRun* 😁

It's less clear if/how to do this for fibers started in a Dispatcher, since they should be inheriting locals from the fiber backing the Dispatcher.

rossabaker (Member) left a comment

Interesting. I will try giving a snapshot of this a shot over on the otel4s side.

rossabaker added a commit to typelevel/otel4s that referenced this pull request May 18, 2023
Comment on lines 19 to 20
 // defined in Java since Scala doesn't let us define static fields
-final class IOFiberConstants {
+public final class IOFiberConstants {
armanbilge (Member Author) May 21, 2023

This is not good. To avoid this we'll either have to replicate it at both the cats.effect and cats.effect.unsafe levels, or move the thread-local IOLocals accessors into cats.effect.

@kevin-lee

I apologise for jumping in and asking this question, but can it be used as an alternative to Local from Monix?
I need it specifically for logging, much like what's explained in the post "Better logging with Monix 3, part 1: MDC", but with cats-effect 3. This has been a blocker for my company's transition from cats-effect 2 with Monix to cats-effect 3. The issue has become increasingly critical as more and more libraries cease to support cats-effect 2, choosing to support only cats-effect 3 instead. So it would be great to see if there's any upcoming solution.

@armanbilge
Member Author

@kevin-lee no problem! At a glance, yes, this does look like an alternative/replacement to that Monix feature. Perhaps @alexandru can confirm :)

@armanbilge armanbilge changed the base branch from series/3.5.x to series/3.x June 27, 2023 00:28
@armanbilge armanbilge marked this pull request as ready for review June 27, 2023 00:29
@armanbilge armanbilge changed the title from "Proof-of-concept thread-local IOLocals" to "thread-local IOLocals" Jun 27, 2023
@kevin-lee

> @kevin-lee no problem! At a glance, yes, this does look like an alternative/replacement to that Monix feature. Perhaps @alexandru can confirm :)

@armanbilge Thank you. That's great!
It looks quite similar and looks like it can be used for the same purpose, but I can see the methods in IOLocals taking IOLocal[A] whereas the methods in Local from Monix don't. So I'm wondering if it can be used for the same purpose. I need to use it in a logging library like Logback, which doesn't have IO or any effect.
Yeah, it would be nice if @alexandru can confirm this.

@armanbilge
Member Author

armanbilge commented Jun 28, 2023

> but I can see the methods in IOLocals taking IOLocal[A] whereas the methods in Local from Monix don't. So I'm wondering if it can be used for the same purpose. I need to use it in a logging library like Logback, which doesn't have IO or any effect.

@kevin-lee the IOLocal[A] is just the "key". Once you have that key (and you can build one unsafely outside of the effect) then you don't need any more effects to read and write it.

For an example integration, see this PR which implements a Java SPI using this IOLocals API.
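
To make the "key" point concrete, here is a hedged sketch of an MDC-style setup: the key is created once, unsafely, outside any effect, and later read without effects through the thread-local view (accessor name illustrative, as above):

```scala
import cats.effect.{IO, IOLocal}
import cats.effect.unsafe.implicits.global

object Mdc {
  // The "key": constructed by unsafely running IOLocal's builder effect
  // once, e.g. at application startup, outside any IO program.
  val local: IOLocal[Map[String, String]] =
    IOLocal(Map.empty[String, String]).unsafeRunSync()
}

// A Logback-style appender running inside a fiber's side-effecting block
// could then read the current fiber's MDC without IO:
def currentMdc(): Map[String, String] =
  Mdc.local.unsafeThreadLocal().get() // illustrative accessor
```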

@kevin-lee

> @kevin-lee the IOLocal[A] is just the "key". Once you have that key (and you can build one unsafely outside of the effect) then you don't need any more effects to read and write it.
>
> For an example integration, see this PR which implements a Java SPI using this IOLocals API.

@armanbilge Oh... got it. It looks very promising then. Thank you!

@armanbilge armanbilge marked this pull request as draft June 4, 2024 20:56
@armanbilge armanbilge marked this pull request as ready for review June 5, 2024 00:34
@djspiewak
Member

Finished benchmarking. @armanbilge let's see your armchair performance reasoning explain this result. I'm at a loss.

Before

[info] Benchmark                                             (cpuTokens)   (size)   Mode  Cnt      Score     Error    Units
[info] DeepBindBenchmark.async                                       N/A    10000  thrpt   10   2786.655 ±   3.536    ops/s
[info] DeepBindBenchmark.delay                                       N/A    10000  thrpt   10   9613.496 ± 234.106    ops/s
[info] DeepBindBenchmark.pure                                        N/A    10000  thrpt   10  11168.222 ± 465.787    ops/s
[info] MapCallsBenchmark.batch120                                    N/A      N/A  thrpt   10    338.430 ±   3.116    ops/s
[info] MapCallsBenchmark.batch30                                     N/A      N/A  thrpt   10     86.709 ±   0.563    ops/s
[info] MapCallsBenchmark.one                                         N/A      N/A  thrpt   10      2.944 ±   0.017    ops/s
[info] MapStreamBenchmark.batch120                                   N/A      N/A  thrpt   10   5561.435 ±  14.217    ops/s
[info] MapStreamBenchmark.batch30                                    N/A      N/A  thrpt   10   2524.134 ±   2.286    ops/s
[info] MapStreamBenchmark.one                                        N/A      N/A  thrpt   10   3233.048 ±   4.027    ops/s
[info] ParallelBenchmark.parTraverse                               10000     1000  thrpt   10    886.610 ±   0.865    ops/s
[info] ParallelBenchmark.traverse                                  10000     1000  thrpt   10     70.460 ±   0.067    ops/s
[info] ShallowBindBenchmark.async                                    N/A    10000  thrpt   10   2031.729 ±   3.593    ops/s
[info] ShallowBindBenchmark.delay                                    N/A    10000  thrpt   10   9778.303 ±  62.426    ops/s
[info] ShallowBindBenchmark.pure                                     N/A    10000  thrpt   10  11973.770 ±  32.027    ops/s
[info] WorkStealingBenchmark.alloc                                   N/A  1000000  thrpt   10     14.152 ±   0.090  ops/min
[info] WorkStealingBenchmark.manyThreadsSchedulingBenchmark          N/A  1000000  thrpt   10     31.868 ±   5.072  ops/min
[info] WorkStealingBenchmark.runnableScheduling                      N/A  1000000  thrpt   10    870.238 ±   3.475  ops/min
[info] WorkStealingBenchmark.runnableSchedulingScalaGlobal           N/A  1000000  thrpt   10   2245.542 ±   7.645  ops/min
[info] WorkStealingBenchmark.scheduling                              N/A  1000000  thrpt   10     29.631 ±   2.092  ops/min

After

[info] Benchmark                                             (cpuTokens)   (size)   Mode  Cnt      Score     Error    Units
[info] DeepBindBenchmark.async                                       N/A    10000  thrpt   10   2814.793 ±   6.982    ops/s
[info] DeepBindBenchmark.delay                                       N/A    10000  thrpt   10   9589.880 ±  27.294    ops/s
[info] DeepBindBenchmark.pure                                        N/A    10000  thrpt   10  11377.434 ±  37.080    ops/s
[info] MapCallsBenchmark.batch120                                    N/A      N/A  thrpt   10    336.483 ±   2.160    ops/s
[info] MapCallsBenchmark.batch30                                     N/A      N/A  thrpt   10     85.833 ±   0.404    ops/s
[info] MapCallsBenchmark.one                                         N/A      N/A  thrpt   10      2.925 ±   0.007    ops/s
[info] MapStreamBenchmark.batch120                                   N/A      N/A  thrpt   10   5684.510 ±   8.954    ops/s
[info] MapStreamBenchmark.batch30                                    N/A      N/A  thrpt   10   2522.568 ±   7.291    ops/s
[info] MapStreamBenchmark.one                                        N/A      N/A  thrpt   10   3284.312 ±   2.895    ops/s
[info] ParallelBenchmark.parTraverse                               10000     1000  thrpt   10    891.486 ±   0.846    ops/s
[info] ParallelBenchmark.traverse                                  10000     1000  thrpt   10     70.303 ±   0.060    ops/s
[info] ShallowBindBenchmark.async                                    N/A    10000  thrpt   10   1985.424 ±   2.771    ops/s
[info] ShallowBindBenchmark.delay                                    N/A    10000  thrpt   10   9655.691 ± 182.608    ops/s
[info] ShallowBindBenchmark.pure                                     N/A    10000  thrpt   10  10075.531 ±  22.884    ops/s
[info] WorkStealingBenchmark.alloc                                   N/A  1000000  thrpt   10     14.044 ±   0.078  ops/min
[info] WorkStealingBenchmark.manyThreadsSchedulingBenchmark          N/A  1000000  thrpt   10     48.611 ±   1.642  ops/min
[info] WorkStealingBenchmark.runnableScheduling                      N/A  1000000  thrpt   10   2987.595 ±   5.299  ops/min
[info] WorkStealingBenchmark.runnableSchedulingScalaGlobal           N/A  1000000  thrpt   10   2249.480 ±  48.568  ops/min
[info] WorkStealingBenchmark.scheduling                              N/A  1000000  thrpt   10     52.240 ±   3.133  ops/min

@djspiewak
Member

Finally scraped together the time needed to shave a few dev environment yaks and test this PR with propagation enabled. The performance hit is about 25% on microbenchmarks involving delay or map, and (as expected) ~0% (within margin of error) on everything else. The bizarre acceleration of WSTP scheduling on non-parTraverse benchmarks remains consistent (notably, parTraverse itself was around 0%).

So I think the summary of this PR, from a performance standpoint, is the following:

  • When disabled, no meaningful impact.
    • Does seem to spookily affect the WSTP microbenchmarks in a way which resets the baseline
  • When enabled, 25% worse on delay/map microbenchmarks
    • No effect within margin of error on scheduling or traverse-y benchmarks
    • WSTP microbenchmark baseline effect remains and is of the same magnitude

So enabling this is analogous to enabling cached tracing, but of course would be cumulative with that flag as well, so technically turning both of them on would bop your microbenchmark scores by something like 55%. It's very unclear whether this type of performance impact matters in practice. We would probably need to run some end-to-end scale tests with propagation enabled in order to see.

Given that this is default-off and the impact when disabled is zero (outside of the still bizarre WSTP microbenchmark mirages), I'm comfortable merging this as-is and we can do some further testing. We should also probably document (or at least, write a ticket to document) the performance effects so it doesn't surprise anyone.

@armanbilge
Member Author

I added some documentation.

> The performance hit is about 25% on microbenchmarks involving delay or map, and (as expected) ~0% (within margin of error) on everything else.

Discussed on Discord, but to clarify: there is no reason to expect that delay or map should be impacted differently, because the propagation logic runs when the fiber is scheduled/descheduled, not for specific ops. (Although note that my initial implementation ran the propagation logic before/after delay blocks, in which case the performance discrepancy would be expected.)
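
A hedged pseudocode picture of where that work sits (conceptual only, not the actual IOFiber internals):

```scala
// Conceptual sketch: propagation brackets a fiber's time on a carrier
// thread, so delay, map, flatMap, etc. within one scheduling quantum all
// share a single copy-in/copy-out rather than paying per-op.
final class FiberSketch(var localState: Map[Any, Any], val runLoop: () => Unit)

object PropagationSketch {
  private val carrier = new ThreadLocal[Map[Any, Any]]

  def runOnThread(fiber: FiberSketch): Unit = {
    carrier.set(fiber.localState)             // copy in when the fiber is scheduled
    try fiber.runLoop()                       // every op in this quantum runs here
    finally fiber.localState = carrier.get()  // copy back when it is descheduled
  }
}
```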

@henricook

@djspiewak Please forgive my impetuousness - do you have a guesstimate of when this might be going in please?

djspiewak previously approved these changes Nov 21, 2024
@henricook

Oh-em-gee. Exciting.

@djspiewak
Member

@armanbilge Conflict resolution time!

@armanbilge armanbilge merged commit 8091026 into typelevel:series/3.x Nov 22, 2024
29 of 33 checks passed