[SPARK-22258][SQL] Writing empty dataset fails with ORC format #19477

dongjoon-hyun · 2017-10-12T02:43:43Z

What changes were proposed in this pull request?

Since SPARK-8501, Spark has not created an ORC file for empty datasets. However, SPARK-21669 tries to get the length of the written file at the end of each write task and fails with FileNotFoundException when no file exists. This regression affects 2.3.0 only. We should fix it and add a test case to prevent future regressions.

scala> Seq("str").toDS.limit(0).write.format("orc").save("/tmp/a")
17/10/11 19:28:59 ERROR Utils: Aborting task
java.io.FileNotFoundException: File file:/tmp/a/_temporary/0/_temporary/attempt_20171011192859_0000_m_000000_0/part-00000-aa56c3cf-ec35-48f1-bb73-23ad1480e917-c000.snappy.orc does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
	at org.apache.spark.sql.execution.datasources.BasicWriteTaskStatsTracker.getFileSize(BasicWriteStatsTracker.scala:60)
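
The failure mode above can be illustrated with a minimal sketch of a size lookup that tolerates a missing output file. This is not the patch itself: `FileSizeSketch` and `fileSizeIfExists` are hypothetical names, and the sketch uses `java.nio.file` in place of the Hadoop `FileSystem` API that `BasicWriteTaskStatsTracker` calls, so that it runs without extra dependencies.

```scala
import java.nio.file.{Files, NoSuchFileException, Path}

// Hypothetical helper, for illustration only (not Spark code).
object FileSizeSketch {
  /** Size of `path` in bytes, or None when the writer produced no file. */
  def fileSizeIfExists(path: Path): Option[Long] =
    try Some(Files.size(path))
    catch { case _: NoSuchFileException => None } // e.g. empty dataset, no file written

  def main(args: Array[String]): Unit = {
    val dir = Files.createTempDirectory("empty-write-sketch")
    // Simulates a task whose writer never created its output file.
    val missing = dir.resolve("part-00000.snappy.orc")
    // Simulates a task that did write some bytes.
    val written = Files.write(dir.resolve("part-00001.txt"), "str".getBytes("UTF-8"))

    println(fileSizeIfExists(missing)) // None
    println(fileSizeIfExists(written)) // Some(3)
  }
}
```

Returning an `Option` is only one possible design; the point is that the caller collecting write statistics would have to treat "no file" as a valid outcome rather than an error.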

How was this patch tested?

Pass the newly added test cases.

dongjoon-hyun · 2017-10-12T02:53:37Z

Hi, @gatorsmile and @cloud-fan.
This is a regression introduced by SPARK-21669 (Internal API for collecting metrics/stats during FileFormatWriter jobs) in Spark 2.3.0. Could you review this PR?

viirya · 2017-10-12T03:22:51Z

@dongjoon-hyun This is more or less a duplicate of #18979, although it approaches the issue from a different angle.

viirya · 2017-10-12T03:24:56Z

sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala

    }
  }
+
+  Seq("orc", "parquet", "csv", "json", "text").foreach { format =>


This test case seems worth merging in. cc @steveloughran Shall we include this test in #18979?

+1. Please, @steveloughran . :)

dongjoon-hyun · 2017-10-12T03:41:31Z

Wow, there is already a PR for that. Thank you for letting me know, @viirya! That works, then.

SparkQA · 2017-10-12T05:35:08Z

Test build #82654 has finished for PR 19477 at commit b545f28.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

This is going to create a merge conflict with this branch until I rebase it, which I'm about to do.

Change-Id: Ie2309066ad7892cb20155d9de8248c1682bba526

[SPARK-22258][SQL] Writing empty dataset fails with ORC format

b545f28

viirya reviewed Oct 12, 2017

dongjoon-hyun closed this Oct 12, 2017

dongjoon-hyun deleted the SPARK-22258 branch October 12, 2017 03:41

dongjoon-hyun mentioned this pull request Oct 25, 2017

[SPARK-15474][SQL] Write and read back non-emtpy schema with empty dataframe #19571

Closed
