@Ngone51
Member

@Ngone51 Ngone51 commented Aug 27, 2025

What changes were proposed in this pull request?

This PR fixes UninterruptibleLock.isInterruptible to avoid duplicated interruption when the thread is already interrupted.

Why are the changes needed?

The "uninterruptible" semantics of UninterruptibleThread are broken (i.e., an UninterruptibleThread can be interrupted even while it is inside runUninterruptibly) after the fix in #50594. The problem is that the state of shouldInterruptThread becomes unsafe when multiple interrupts happen concurrently.

For example, suppose thread A interrupts the UninterruptibleThread ut before ut enters runUninterruptibly. Right after that, another thread B invokes ut.interrupt() and passes the uninterruptibleLock.isInterruptible check (because at this point shouldInterruptThread = uninterruptible = false). Before thread B invokes super.interrupt(), ut enters runUninterruptibly, passes through uninterruptibleLock.getAndSetUninterruptible, and sets uninterruptible = true. Thread ut then checks uninterruptibleLock.isInterruptPending, which returns false at this point (because shouldInterruptThread = Thread.interrupted = true, from thread A's interrupt) even though thread B is still in the middle of interrupting. As a result, the state of shouldInterruptThread becomes inconsistent between thread B and thread ut. Because uninterruptibleLock.isInterruptPending returned false, ut goes on to execute f. At the same time, thread B invokes super.interrupt(), so f can be interrupted.
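
The interleaving above can be replayed deterministically with a small single-threaded model. This is only a sketch and not Spark's code: the method bodies, the `interruptInFlight` flag, the `interruptStatus` stand-in for the JVM interrupt status, and the `guard` option (one possible reading of "avoid duplicated interruption when the thread is already interrupted") are all assumptions made for illustration. Only the names `uninterruptible`, `shouldInterruptThread`, `isInterruptible`, `getAndSetUninterruptible` and `isInterruptPending` come from the description.

```java
/**
 * Hypothetical single-threaded model of the interleaving described above.
 * It is NOT Spark's UninterruptibleThread: the method bodies, the
 * interruptInFlight flag and the guard are assumptions for illustration.
 */
public class InterruptRaceModel {
    boolean uninterruptible = false;       // named in the description
    boolean shouldInterruptThread = false; // named in the description
    boolean interruptInFlight = false;     // assumed: an interrupter passed the check
    boolean interruptStatus = false;       // stand-in for ut's JVM interrupt status
    final boolean guard;                   // assumed: skip interrupting an already interrupted ut

    public InterruptRaceModel(boolean guard) { this.guard = guard; }

    // Interrupter side: may the caller go on to call super.interrupt()?
    boolean isInterruptible() {
        if (guard && interruptStatus) return false; // ut is already interrupted
        shouldInterruptThread = uninterruptible;
        if (!shouldInterruptThread && !interruptInFlight) {
            interruptInFlight = true;
            return true;
        }
        return false;
    }

    // Models super.interrupt(): sets the status and ends the in-flight window.
    void superInterrupt() {
        interruptStatus = true;
        interruptInFlight = false;
    }

    // ut side: set the flag and return its previous value.
    boolean getAndSetUninterruptible(boolean value) {
        boolean previous = uninterruptible;
        uninterruptible = value;
        return previous;
    }

    // ut side: consume the interrupt status (like Thread.interrupted()) and record it.
    boolean isInterruptPending() {
        boolean wasInterrupted = interruptStatus;
        interruptStatus = false;
        shouldInterruptThread = wasInterrupted || shouldInterruptThread;
        return !shouldInterruptThread && interruptInFlight;
    }

    /** Replays the interleaving; returns true if f would observe an interrupt. */
    public static boolean fSeesInterrupt(boolean guard) {
        InterruptRaceModel ut = new InterruptRaceModel(guard);
        if (ut.isInterruptible()) ut.superInterrupt(); // thread A interrupts ut
        boolean bPassed = ut.isInterruptible();        // thread B reaches the check
        ut.getAndSetUninterruptible(true);             // ut enters runUninterruptibly
        boolean pending = ut.isInterruptPending();     // false here, so ut would start f
        if (bPassed) ut.superInterrupt();              // thread B interrupts while f runs
        return !pending && ut.uninterruptible && ut.interruptStatus;
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + fSeesInterrupt(false)); // without guard: true
        System.out.println("with guard:    " + fSeesInterrupt(true));  // with guard:    false
    }
}
```

In this model the unguarded run ends with the interrupt status set while `uninterruptible` is still true, which is the broken state the description reports. The model covers only this one interleaving and says nothing about other orderings.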

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually tested. The issue can be easily reproduced by running UninterruptibleThreadSuite.stress test 100 times in a row:

[info]   true did not equal false (UninterruptibleThreadSuite.scala:208)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at org.apache.spark.util.UninterruptibleThreadSuite.$anonfun$new$7(UninterruptibleThreadSuite.scala:208)
...

And the issue is gone after the fix.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the CORE label Aug 27, 2025
@Ngone51 Ngone51 requested review from cloud-fan and mridulm August 27, 2025 04:51
@Ngone51
Member Author

Ngone51 commented Aug 27, 2025

cc @mridulm @cloud-fan @vrozov

@cloud-fan
Contributor

Can the existing tests verify this fix?

@Ngone51
Member Author

Ngone51 commented Aug 27, 2025

Can the existing tests verify this fix?

@cloud-fan Yes. UninterruptibleThreadSuite.stress test fails easily if we run it 100 times in a row, but it no longer fails after the fix.

@vrozov
Member

vrozov commented Aug 27, 2025

@Ngone51 What is the assumption? If thread A or thread B interrupts ut before it calls uninterruptibleLock.getAndSetUninterruptible(true), ut can be interrupted.

@Ngone51
Member Author

Ngone51 commented Aug 28, 2025

@Ngone51 What is the assumption? If thread A or thread B interrupts ut before it calls uninterruptibleLock.getAndSetUninterruptible(true), ut can be interrupted.

@vrozov Thread ut is interrupted in this case, but it continues executing because no InterruptedException is triggered.
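
To illustrate the point about InterruptedException with plain JVM behavior: interrupting a thread only sets its interrupt status, and ordinary code keeps running until a blocking call such as `Thread.sleep` checks that status. The sketch below is a standalone demo; the class name `InterruptFlagDemo` and the method `observe` are made up for this example and are not part of Spark.

```java
public class InterruptFlagDemo {
    /**
     * Runs on a helper thread and returns three observations:
     * [0] the loop finished although the thread was interrupted,
     * [1] the interrupt status was still set afterwards,
     * [2] a later blocking call (Thread.sleep) threw InterruptedException.
     */
    public static boolean[] observe() {
        boolean[] seen = new boolean[3];
        Thread worker = new Thread(() -> {
            Thread.currentThread().interrupt();       // only sets the interrupt status
            long sum = 0;
            for (int i = 1; i <= 1000; i++) sum += i; // plain code keeps running
            seen[0] = (sum == 500500L);
            seen[1] = Thread.currentThread().isInterrupted();
            try {
                Thread.sleep(10);                     // blocking call checks the status
            } catch (InterruptedException e) {
                seen[2] = true;                       // only now is the exception raised
            }
        });
        worker.start();
        try {
            worker.join(); // join() makes the worker's writes visible to the caller
        } catch (InterruptedException e) {
            throw new IllegalStateException("caller was interrupted while waiting", e);
        }
        return seen;
    }

    public static void main(String[] args) {
        boolean[] seen = observe();
        System.out.println(seen[0] + " " + seen[1] + " " + seen[2]); // true true true
    }
}
```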

Member

@vrozov vrozov left a comment

LGTM

@cloud-fan
Contributor

@Ngone51 since this needs a manual test, have you verified the new fix locally?

@Ngone51
Member Author

Ngone51 commented Aug 28, 2025

@cloud-fan Yes, it still works.

@cloud-fan
Contributor

thanks, merging to master/4.0!

@cloud-fan cloud-fan closed this in 78871d7 Aug 29, 2025
cloud-fan pushed a commit that referenced this pull request Aug 29, 2025
…duplicated interrupt

Closes #52139 from Ngone51/fix-uninterruptiable.

Authored-by: Yi Wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 78871d7)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@Ngone51
Member Author

Ngone51 commented Aug 29, 2025

Thanks! @cloud-fan @vrozov

zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
…duplicated interrupt

Closes apache#52139 from Ngone51/fix-uninterruptiable.

Authored-by: Yi Wu <yi.wu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 271392a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>