Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OS Signal handling and fiber interruption #9321

Closed
mlhales opened this issue Nov 20, 2024 · 18 comments · Fixed by #9366
Closed

OS Signal handling and fiber interruption #9321

mlhales opened this issue Nov 20, 2024 · 18 comments · Fixed by #9366

Comments

@mlhales
Copy link

mlhales commented Nov 20, 2024

I've had several problems with this in the past, and it has returned again with the latest 2.1.12 version.

My understanding is when the JVM receives a signal from the OS, like a sigint or sigterm, that the ZIO runtime has subscribed to those, and attempts to interrupt the running fibers for an orderly shutdown. I have had problems in the past where fibers on attemptBlocking or fibers that were forkDaemon's will keep running, and prevent the application from exiting, requiring the JVM process to be forced to exit with a sigkill.

I have structured my application very carefully, to make sure that I don't do those things, and in ZIO 2.1.11, it shuts down cleanly (albeit with some ugly fiber messages to the console) with a 130 exit code upon sigint from IntelliJ. This is nice because when you use the stop button in IntelliJ, it shuts down cleanly. And even more importantly, when the application is deployed as a service on Windows using nssm, or as a systemd service on Linux, the default way of stopping the service, sending it a sigterm, will mean the service stops correctly and quickly.

I upgraded to ZIO 2.1.12 and this stopped working again. I've had it work correctly in the past, and also not, it seems to have been variable from version to version.

I made a minimal example that maybe somebody could turn in to a test? For me, this exits correctly with an exit code 130 upon a sigint from IntelliJ under ZIO 2.1.11, but does not exit and keeps running under ZIO 2.1.12 and requires a sigkill to exit with an exit code -1.


object Signals extends ZIOAppDefault:
  def run: ZIO[Any, Throwable, Unit] =
    for
      _     <- Console.printLine("Hello, Signals!")
      fiber <- Console.printLine("Hello, fiber!").schedule(Schedule.spaced(1.second)).fork
      _     <- fiber.join
    yield ()
@clarzte
Copy link
Collaborator

clarzte commented Nov 25, 2024

hey @mlhales on what kind of OS and JVM are you experiencing this on?
I've tried to reproduce it on OS X Ventura and Java 21, but I always get the 130 exit code on intellij and metals.

@mlhales
Copy link
Author

mlhales commented Nov 25, 2024

I'm on Windows 10, running Java 21 also, specifically the Amazon Coretto open jdk build. 21.0.3

@mlhales
Copy link
Author

mlhales commented Nov 25, 2024

I just tried a couple of different JDK versions as well, with the same results. I went back to 11.0.25.9.1, and forward to 21.0.5.11.1 the most recent, and got the same behavior again, correct exit with ZIO 2.1.11, and keeps running with ZIO 2.1.12

@mlhales
Copy link
Author

mlhales commented Nov 25, 2024

Also to be clear, in both cases the fiber dump is written to the console, it's just that it keeps running after the dump, still printing the "Hello, fiber!" message.

@clarzte
Copy link
Collaborator

clarzte commented Nov 28, 2024

Ok I can reproduce it on Windows!
I am hitting stack overflow error here cbf3cc8#diff-26a4df63126981982325c476d26ee568d4fda1ab32963969395b0d266a3513c9R114

Exception in thread "SIGINT handler" java.lang.StackOverflowError
	at zio.internal.Signal$SignalHandler$SunMiscSignalHandler.$anonfun$initInvocationHandler$1(Signal.scala:114)
	at jdk.proxy2/jdk.proxy2.$Proxy2.toString(Unknown Source)
	at java.base/java.lang.StringConcatHelper.stringOf(StringConcatHelper.java:453)
	at zio.internal.Signal$SignalHandler$SunMiscSignalHandler.$anonfun$initInvocationHandler$1(Signal.scala:114)
	at jdk.proxy2/jdk.proxy2.$Proxy2.toString(Unknown Source)
	at java.base/java.lang.StringConcatHelper.stringOf(StringConcatHelper.java:453)
	at zio.internal.Signal$SignalHandler$SunMiscSignalHandler.$anonfun$initInvocationHandler$1(Signal.scala:114)
	at jdk.proxy2/jdk.proxy2.$Proxy2.toString(Unknown Source)
	...

will try to dig more tomorrow

@jdegoes
Copy link
Member

jdegoes commented Nov 29, 2024

/bounty $250

Copy link

algora-pbc bot commented Nov 29, 2024

## 💎 $250 bounty • ZIO

### Steps to solve:
1. Start working: Comment /attempt #9321 with your implementation plan
2. Submit work: Create a pull request including /claim #9321 in the PR body to claim the bounty
3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to zio/zio!

Add a bountyShare on socials

Attempt Started (GMT+0) Solution
🟢 @Saturn225 Nov 29, 2024, 3:03:40 AM WIP
🟢 @clarzte Dec 2, 2024, 1:41:26 PM #9366

@Saturn225
Copy link
Contributor

Saturn225 commented Nov 29, 2024

/attempt #9321

Algora profile Completed bounties Tech Active attempts Options
@Saturn225    1 ZIO bounty
+ 2 bounties from 2 projects
Rust, Scala,
Cancel attempt

@Saturn225
Copy link
Contributor

@clarzte @mlhales Do you mind to test my PR solution? I have no access to Windows and it will be great to check out the PR ruling towards direction of solving this reproducer made by @clarzte
#9321 (comment)

@clarzte
Copy link
Collaborator

clarzte commented Dec 1, 2024

@Saturn225 it didn't fix it

@Saturn225
Copy link
Contributor

@clarzte Can you post repoducer to check it out to reproduce it on Windows/Linux?

I will checkout with my friends desktop and examine the logs and fix the things

Or else any detailed logs from the reproducer :)

@clarzte
Copy link
Collaborator

clarzte commented Dec 2, 2024

@Saturn225 I used the reproducer the OP provided.
The error I provided happened after I inserted the println in the method - I forgot to mention that.
The logs I get from the reproducer:

Hello, Signals!
Hello, fiber!
Hello, fiber!
Hello, fiber!

"zio-fiber-1209249671" (4s 117ms) waiting on #1939357491
	Status: Suspended((CooperativeYielding, FiberRoots), <no trace>)


"zio-fiber-1939357491" (4s 110ms) 
	Status: Suspended((Interruption, CooperativeYielding, FiberRoots), <no trace>)


"zio-fiber-1644448784" (4s 2ms) waiting on #575043367
	Status: Suspended((Interruption, CooperativeYielding, FiberRoots), zio.Reproducer.run(Reproducer.scala:11))
	at zio.Reproducer.run(Reproducer.scala:11)

"zio-fiber-575043367" (3s 899ms) 
	Status: Suspended((Interruption, CooperativeYielding, FiberRoots), zio.Reproducer.run(Reproducer.scala:9))
	at zio.Reproducer.run(Reproducer.scala:9)
Hello, fiber!
Hello, fiber!

Process finished with exit code -1

@clarzte
Copy link
Collaborator

clarzte commented Dec 2, 2024

/attempt #9321

Algora profile Completed bounties Tech Active attempts Options
@clarzte 1 ZIO bounty
Python, TypeScript,
Scala & more
Cancel attempt

Copy link

algora-pbc bot commented Dec 2, 2024

Note

The user @Saturn225 is already attempting to complete issue #9321 and claim the bounty. We recommend checking in on @Saturn225's progress, and potentially collaborating, before starting a new solution.

Copy link

algora-pbc bot commented Dec 2, 2024

💡 @clarzte submitted a pull request that claims the bounty. You can visit your bounty board to reward.

@mlhales
Copy link
Author

mlhales commented Dec 2, 2024

I think that this fixed the problem by introducing a bug =)

Before merging you should see if it does the same thing by not registering the SIGINT handler at all.

It looks to me like registering a signal handler for SIGINT, makes it so the JVM then does not run the shutdown hooks, if the SIGINT handler does not shut down the JVM. This lets the application decide if it wants to keep running or not.

Look at these examples in plain Scala. The first correctly runs the shutdown hook, and exits with a 130.

object Hook extends App:

  val runningThread = new Thread:
    override def run(): Unit =
      try
        while true do
          println("Running")
          sleep(1000)
      catch
        case _: InterruptedException =>
          println("Sleep Interrupted")
          Thread.currentThread().interrupt()

  val shutdownHookThread = new Thread:
    override def run(): Unit =
      println("Shutting down")
      runningThread.interrupt()
      runningThread.join()

  java.lang.Runtime.getRuntime.addShutdownHook(shutdownHookThread)

  runningThread.start()

This one, registers a SIGINT handler, and when run, the handler is called, but the shutdown hook is not, and the JVM keeps running.

object Hook extends App:

  val handler = new SignalHandler():
    override def handle(signal: Signal): Unit =
      println("Signal " + signal)

  val runningThread = new Thread:
    override def run(): Unit =
      try
        while true do
          println("Running")
          sleep(1000)
      catch
        case _: InterruptedException =>
          println("Sleep Interrupted")
          Thread.currentThread().interrupt()

  val shutdownHookThread = new Thread:
    override def run(): Unit =
      println("Shutting down")
      runningThread.interrupt()
      runningThread.join()

  Signal.handle(new Signal("INT"), handler);

  java.lang.Runtime.getRuntime.addShutdownHook(shutdownHookThread)

  runningThread.start()

So I think in the pull request, trying to register using the string "SIGINT" instead of "INT" makes it so the signal handler registration fails, but this fixes the issue, because the shutdown hook still runs.

Before, when the correct signal handler, registered with the "INT" string runs, then the signal handler is fired, which just dumps the fibers to the console, but the shutdown hook doesn't run.

So either we should just not register for SIGINT at all, or if we do, we need to do more than just dump the fibers, we need to actually request the runtime to shutdown.

@kyri-petrou
Copy link
Contributor

@mlhales I think you were meant to add this comment in #9366 instead. Do you mind adding it there so that it's visible to other users reviewing the PR?

Copy link

algora-pbc bot commented Dec 5, 2024

🎉🎈 @clarzte has been awarded $250! 🎈🎊

@kyri-petrou kyri-petrou linked a pull request Dec 6, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants