[SPARK-35868][CORE] Add fs.s3a.downgrade.syncable.exceptions if not set #33044
Conversation
cc @sunchao, @steveloughran

cc @gengliangwang for Apache Spark 3.2.0.
sunchao left a comment
LGTM (non-binding), thanks @dongjoon-hyun!
Thank you, @sunchao!
Kubernetes integration test starting

Kubernetes integration test status failure
gengliangwang left a comment
LGTM
Test build #140229 has finished for PR 33044 at commit
Thank you so much, @gengliangwang! The Python UT failures are unrelated.
lgtm2 |
thx. FWIW, given it's causing trouble, do you want this to be the default in the Hadoop default XML? It's there to stop people attempting to use S3 as a WAL for HBase or similar, but if applications have been treating it as a low-cost operation in general file IO, then we can just downgrade it broadly and rely on the hope that people don't do this.
What changes were proposed in this pull request?
This PR aims to set fs.s3a.downgrade.syncable.exceptions=true if it is not provided by the user.

Why are the changes needed?
Currently, the event log feature is broken with the Hadoop 3.2 profile due to UnsupportedOperationException, because HADOOP-17597 changed the default behavior to throw exceptions since Apache Hadoop 3.3.1. We know this happens because EventLogFileWriters uses hadoopDataStream.foreach(_.hflush()), but this PR aims to provide the same UX across Spark distributions with Hadoop 2/Hadoop 3 in Apache Spark 3.2.0.

Does this PR introduce any user-facing change?
Yes, this will recover the existing behavior.
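To illustrate the failure mode, here is a hypothetical sketch (not the actual S3A code; the class name and use of a plain Map for configuration are stand-ins for illustration) of the behavior HADOOP-17597 introduced: hflush() on an S3A output stream either throws or is downgraded to a warning, depending on fs.s3a.downgrade.syncable.exceptions.

```java
import java.util.Map;

// Stand-in for an S3A output stream; not the real Hadoop implementation.
class SyncableStream {
    private final boolean downgradeSyncables;

    SyncableStream(Map<String, String> conf) {
        // Per the PR description, since Hadoop 3.3.1 the default is to throw,
        // i.e. the downgrade flag defaults to "false".
        this.downgradeSyncables = Boolean.parseBoolean(
            conf.getOrDefault("fs.s3a.downgrade.syncable.exceptions", "false"));
    }

    void hflush() {
        if (!downgradeSyncables) {
            // Strict mode: S3 cannot honor Syncable semantics, so reject the call.
            throw new UnsupportedOperationException(
                "S3A streams are not Syncable; set "
                + "fs.s3a.downgrade.syncable.exceptions=true to downgrade to a warning");
        }
        // Downgraded mode: a real implementation would log a warning and flush.
        System.err.println("WARN: hflush() downgraded on S3A stream");
    }
}
```

With the flag unset, the hflush() call made by Spark's event log writer would hit the throwing branch, which is why this PR sets the flag to true by default.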
How was this patch tested?
Manual.
If the user provides the configuration explicitly, it will revert to the original behavior of throwing exceptions.
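For illustration, an explicit override could be passed via the standard spark.hadoop.* prefix, which Spark forwards to the Hadoop configuration (the application arguments here are placeholders):

```
# Restore the strict Hadoop 3.3.1 behavior explicitly:
spark-submit \
  --conf spark.hadoop.fs.s3a.downgrade.syncable.exceptions=false \
  ...
```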