

@dongjoon-hyun dongjoon-hyun commented Jun 17, 2016

What changes were proposed in this pull request?

This PR allows `emptyDataFrame.write`, since the user didn't specify any partition columns.

**Before**

```scala
scala> spark.emptyDataFrame.write.parquet("/tmp/t1")
org.apache.spark.sql.AnalysisException: Cannot use all columns for partition columns;
scala> spark.emptyDataFrame.write.csv("/tmp/t1")
org.apache.spark.sql.AnalysisException: Cannot use all columns for partition columns;
```

After this PR, no exception is thrown, and the created directory contains only one file, `_SUCCESS`, as expected.

How was this patch tested?

Pass the Jenkins tests including updated test cases.
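The relaxed check can be sketched as follows. This is a minimal Python sketch of the logic only (Spark's actual implementation is Scala, in `PartitioningUtils.validatePartitionColumn`); the function and argument names are illustrative, not Spark's API:

```python
# Minimal sketch of the relaxed validation (illustrative names, not Spark's API).
def validate_partition_columns(schema_columns, partition_columns):
    # Rejecting only a *non-empty* partition spec that covers every column
    # lets emptyDataFrame.write proceed: no partition columns were specified.
    if partition_columns and len(partition_columns) == len(schema_columns):
        raise ValueError("Cannot use all columns for partition columns")

validate_partition_columns([], [])             # empty DataFrame: now allowed
validate_partition_columns(["a", "b"], ["a"])  # partial partitioning: allowed
try:
    validate_partition_columns(["a", "b"], ["a", "b"])  # all columns: still rejected
except ValueError as e:
    print(e)  # Cannot use all columns for partition columns
```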

Member Author


Since `validatePartitionColumn` is used only by write-related classes, I added this description for clarification.
We can update the function description if the usage pattern changes in the future.

@dongjoon-hyun

Hi, @tdas.
This is the PR to handle the reported corner case.

@SparkQA

SparkQA commented Jun 17, 2016

Test build #60686 has finished for PR 13730 at commit da1017e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor


It's weird that a check like this is in `PartitioningUtils`. This check seems to have nothing to do with partitioning; it's basically that certain file formats do not support writing a DataFrame with no columns. Is there somewhere earlier where you can check this?

Member Author


Thank you for the review, @tdas.
Yes, indeed. This is beyond the scope of `PartitioningUtils`.
Actually, this logic is used by three classes: `PreWriteCheck`, `DataSource`, and `FileStreamSinkWriter`.
I'll try to move it.
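As a hedged illustration of the reviewer's suggestion, the check could live next to the file formats rather than in `PartitioningUtils`. This is a minimal Python sketch under that assumption; every name and the per-format minimums below are hypothetical, not Spark's actual API:

```python
# Hypothetical format-capability check (illustrative names only), placed near
# the file formats as the review suggests. Minimum data-column counts below
# are assumptions based on this discussion, not Spark's real metadata.
MIN_WRITE_COLUMNS = {"parquet": 0, "csv": 0, "text": 1}

def check_format_supports_schema(format_name, columns):
    # A single shared helper like this could replace the logic duplicated
    # across PreWriteCheck, DataSource, and FileStreamSinkWriter.
    minimum = MIN_WRITE_COLUMNS.get(format_name, 0)
    if len(columns) < minimum:
        raise ValueError(
            f"{format_name} data source cannot write a DataFrame "
            f"with {len(columns)} columns")

check_format_supports_schema("parquet", [])   # allowed: writes only _SUCCESS
try:
    check_format_supports_schema("text", [])  # rejected: text needs one column
except ValueError as e:
    print(e)
```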

@dongjoon-hyun

dongjoon-hyun commented Jun 17, 2016

Hi, @tdas.
At first glance, I thought this corner case needed to throw an exception.
But after considering it more carefully, I want to allow `emptyDataFrame.write` without exceptions.
That is a more natural way to handle this corner case, because the user didn't give any partition columns.
What do you think?

(Anyway, I will update this PR for further discussion.)

@SparkQA

SparkQA commented Jun 17, 2016

Test build #60704 has finished for PR 13730 at commit 5f1db7f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun

Hi, @tdas.
Could you review this PR again when you have some time?

@dongjoon-hyun

Hi, @rxin.
Could you review this PR?

@dongjoon-hyun

Oh, sorry. The master branch has changed.

@dongjoon-hyun

I will recheck this PR.

@dongjoon-hyun

Yep. The case still exists for Parquet/CSV, and I updated the test cases.
The previous text test case changes like the following, which looks legitimate.

```scala
scala> spark.emptyDataFrame.write.text("/tmp/t1")
org.apache.spark.sql.AnalysisException: Text data source supports only a single column, and you have 0 columns.;
```
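The text source's arity rule can be sketched as follows. This is a minimal Python sketch with illustrative names (the real check lives inside Spark's Scala text data source); only the error message is taken from the output above:

```python
# Sketch of the text data source's column-count check (illustrative names).
def verify_text_write_schema(columns):
    # Text files hold exactly one string column, so any other arity is rejected.
    if len(columns) != 1:
        raise ValueError(
            f"Text data source supports only a single column, "
            f"and you have {len(columns)} columns.")

try:
    verify_text_write_schema([])  # an empty DataFrame has 0 columns
except ValueError as e:
    print(e)  # Text data source supports only a single column, and you have 0 columns.
```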

@dongjoon-hyun

Hi, @tdas.
Could you review this PR when you have some time?
The current master still has this issue.

@SparkQA

SparkQA commented Jun 22, 2016

Test build #61014 has finished for PR 13730 at commit fd0b1cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun

Hi, @tdas.
Could you give me some advice on how to change this PR?

@dongjoon-hyun

Rebased.

@SparkQA

SparkQA commented Jun 24, 2016

Test build #61183 has finished for PR 13730 at commit 7bb64d7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 26, 2016

Test build #61254 has finished for PR 13730 at commit ba9a529.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun

Ping @tdas

@SparkQA

SparkQA commented Jun 28, 2016

Test build #61391 has started for PR 13730 at commit 7d38003.

@dongjoon-hyun

Retest this please.

@SparkQA

SparkQA commented Jun 28, 2016

Test build #61397 has finished for PR 13730 at commit 7d38003.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun

dongjoon-hyun commented Jun 28, 2016

Hi, @tdas.
Could you give me some advice on this issue?

@rxin

rxin commented Jun 29, 2016

Thanks - merging in master/2.0.

asfgit pushed a commit that referenced this pull request Jun 29, 2016
…throws non-intuitive exception

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #13730 from dongjoon-hyun/SPARK-16006.

(cherry picked from commit 9b1b3ae)
Signed-off-by: Reynold Xin <rxin@databricks.com>
@asfgit asfgit closed this in 9b1b3ae Jun 29, 2016
@dongjoon-hyun

Thank you for merging, @rxin.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-16006 branch July 20, 2016 07:41