
Conversation

@wangyum (Member) commented Apr 18, 2025

Rationale for this change

Switching from the original implementation (which used Channels.newChannel(out)) to writing directly via the OutputStream resolves the deadlock for the following reason:

In the original implementation, Channels.newChannel(out) wraps the OutputStream in a WritableByteChannelImpl, which guards every write with its own internal lock. When Spark's task interruption mechanism (the task reaper thread) attempts to interrupt or close the channel, it acquires locks in a different order than the executor thread that is writing data. Specifically:

  • The executor thread holds the DFSOutputStream lock and waits for the internal lock of WritableByteChannelImpl.

  • The task reaper thread holds the internal lock of WritableByteChannelImpl and waits for the DFSOutputStream lock (during hflush()).

This conflicting lock acquisition order results in a deadlock.

Writing directly to the OutputStream, without Channels.newChannel, eliminates the intermediate lock introduced by WritableByteChannelImpl. This removes the conflicting lock-acquisition order and thus resolves the deadlock.
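For illustration, here is a minimal, self-contained sketch of that lock-order inversion. The lock names are placeholders standing in for the DFSOutputStream monitor and the internal write lock of WritableByteChannelImpl; this is not the actual Parquet, HDFS, or JDK code.

public class LockOrderDeadlockSketch {
  // Placeholder monitors: stand-ins for the DFSOutputStream monitor and the
  // internal write lock of WritableByteChannelImpl.
  private static final Object streamLock = new Object();
  private static final Object channelLock = new Object();

  public static void main(String[] args) {
    Thread executor = new Thread(() -> {
      synchronized (streamLock) {          // executor: holds the stream lock while writing
        sleepQuietly(100);
        synchronized (channelLock) {       // ...then needs the channel lock -> blocks
          System.out.println("executor wrote data");
        }
      }
    }, "executor");

    Thread reaper = new Thread(() -> {
      synchronized (channelLock) {         // reaper: holds the channel lock while closing
        sleepQuietly(100);
        synchronized (streamLock) {        // ...then hflush() needs the stream lock -> blocks
          System.out.println("reaper closed the stream");
        }
      }
    }, "task-reaper");

    executor.start();
    reaper.start();                        // with these sleeps, both threads wait forever
  }

  private static void sleepQuietly(long millis) {
    try { Thread.sleep(millis); } catch (InterruptedException ignored) { }
  }
}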

What changes are included in this PR?

  • Removed the usage of Channels.newChannel(out) and WritableByteChannel.
  • Updated the writeAllTo method to write data directly to the OutputStream using ByteBuffer operations.
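A minimal sketch of what "write data directly to the OutputStream using ByteBuffer operations" can look like; the helper name writeBufferTo and the 8 KB copy buffer are illustrative assumptions, not the actual patched method:

import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

final class DirectWriteSketch {
  // Hypothetical helper: write the remaining bytes of a ByteBuffer straight to
  // the OutputStream, without wrapping it in Channels.newChannel(out).
  static void writeBufferTo(ByteBuffer buffer, OutputStream out) throws IOException {
    if (buffer.hasArray()) {
      // Heap buffer: write the backing array region in one call.
      out.write(buffer.array(), buffer.arrayOffset() + buffer.position(), buffer.remaining());
      buffer.position(buffer.limit());
    } else {
      // Direct buffer: copy through a small temporary array.
      byte[] tmp = new byte[8192];
      while (buffer.hasRemaining()) {
        int len = Math.min(tmp.length, buffer.remaining());
        buffer.get(tmp, 0, len);
        out.write(tmp, 0, len);
      }
    }
  }
}

The point of this shape is that the only lock involved is whatever the OutputStream itself takes; there is no extra channel-level lock for the task reaper thread to contend on.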

Are these changes tested?

Manual test.
There are no zombie (deadlocked) tasks after applying this patch:
[screenshot]

Are there any user-facing changes?

No.

@wangyum wangyum marked this pull request as draft April 18, 2025 02:56
@wangyum wangyum changed the title [WIP] [GH-3193] Avoid deadlock in writeAllTo by removing WritableByteChannel usage Apr 18, 2025
@wangyum wangyum marked this pull request as ready for review April 18, 2025 16:16
@wangyum (Member, Author) commented Apr 18, 2025

@gszadovszky @wgtmac Would you please take a look at this patch?

@vrozov (Member) commented Apr 21, 2025

The new implementation is uninterruptible compared to the previous implementation that uses WritableByteChannel.
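For context, here is a small standalone sketch of that difference (assumed typical JDK behavior, not Parquet code): a write that goes through Channels.newChannel(out) runs inside an interruptible channel, so interrupting the writing thread closes the channel and aborts the write, whereas a plain OutputStream.write ignores the interrupt flag and keeps writing.

import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

public class InterruptibleWriteSketch {
  public static void main(String[] args) throws Exception {
    // A do-nothing sink keeps the example self-contained (Java 11+).
    OutputStream sink = OutputStream.nullOutputStream();
    WritableByteChannel channel = Channels.newChannel(sink);

    Thread writer = new Thread(() -> {
      ByteBuffer buf = ByteBuffer.allocate(1024);
      try {
        while (true) {
          buf.clear();
          // Once the writer is interrupted, the channel is closed and this call
          // fails (typically with a ClosedByInterruptException).
          channel.write(buf);
        }
      } catch (Exception e) {
        System.out.println("channel write aborted: " + e);
      }
    }, "writer");

    writer.start();
    Thread.sleep(100);
    writer.interrupt(); // a plain OutputStream.write would not react to this
    writer.join();
  }
}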

@gszadovszky (Contributor) commented:

@wangyum, not sure I get the purpose of this fix. We have already closed the related Parquet issue saying that the bug is in Spark. See here. Am I misinterpreting something?

@wangyum (Member, Author) commented Apr 23, 2025

Thanks for the reply @gszadovszky.

Although it is a bug in Spark, Spark never triggered it until Parquet was upgraded to 1.15.1.
Parquet 1.15.1 fixes a critical CVE (CVE-2025-30065), which is prompting many users to consider upgrading, especially those using the parquet-avro module.
I hope the Parquet team can also address this issue so that users on Spark versions below 4.0 can use the new Parquet version without problems.

@vrozov (Member) commented Apr 23, 2025

@wangyum There is a fix on the Spark side, and if this is a critical CVE, IMO it would be better to ask the Spark community to backport the fix to 3.5.x so existing Spark users can benefit from the Parquet upgrade. Can you please try to cherry-pick and test it?

@wgtmac (Member) commented Apr 23, 2025

I agree with @vrozov and @gszadovszky that Spark, not Parquet, should fix this. Since the root cause is not in Parquet, other user dependencies may also trigger this bug, and a workaround on the Parquet side does not help there either.

@wangyum (Member, Author) commented Apr 23, 2025

@vrozov What should users do if they are still using Spark 3.0 - Spark 3.4? The Spark community no longer maintains these branches.

@pan3793 (Member) commented Apr 23, 2025

@wgtmac Is it possible to trigger patch releases for older Parquet branches, e.g. 1.13 (used by Spark 3.5) and 1.14?

@vrozov (Member) commented Apr 23, 2025

What should users do if they are still using Spark 3.0 - Spark 3.4? The Spark community no longer maintains these branches.

@wangyum Those still using Spark 3.0 - 3.4 should upgrade to Spark 3.5.x so they can continue to receive CVE and other fixes from the Spark community. Additionally, from the discussion on the Apache Spark side, the Parquet CVE-2025-30065 does not affect Spark, so the Spark community even considered not upgrading parquet-java to 1.15.1.

I also think that avoiding WritableByteChannel is not the correct generic fix. While it may work for Spark, it may break other consumers of parquet-java, as it converts an interruptible write into an uninterruptible one.

As a last resort, both Spark and Parquet are open source, so it is possible to recompile either one from your own fork.

@wangyum (Member, Author) commented Apr 23, 2025

it converts interruptible write to uninterruptible

Before Parquet 1.14.0, the write was also uninterruptible:

@Override
public void writeAllTo(OutputStream out) throws IOException {
  for (byte[] slab : slabs) {
    out.write(slab);
  }
}

@vrozov (Member) commented Apr 24, 2025

It is interruptible starting with 1.14, and going back to uninterruptible is not a good option, IMO. That would even impact Spark, as it would then have to write all the data instead of the I/O (write) being interrupted and aborted. If Spark needs this task to run without being interrupted, it should use runUninterruptibly.

@wangyum wangyum closed this Jun 23, 2025
@wangyum wangyum deleted the PARQUET-3193 branch June 23, 2025 01:40
