Conversation

@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Sep 14, 2017

What changes were proposed in this pull request?

This PR includes #14471 to enable spark.sql.hive.convertMetastoreOrc for SPARK-19459 (ORC tables cannot be read when they contain char/varchar columns). All credit should go to @rajeshbalamohan .

For SPARK-19459, the padding is handled on read by the Hive-side HiveCharWritable via HiveBaseChar.java. So OrcFileFormat can handle this, unlike ParquetFileFormat. For Parquet, please see SPARK-21997.
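For readers unfamiliar with the Hive-side behavior referenced above, here is a minimal Java sketch of the read-side padding that HiveBaseChar-style logic performs for CHAR(n) columns. The class and method names are illustrative, not the actual Hive API:

```java
public class CharPadding {
    // Pad a stored CHAR(n) value with trailing spaces up to maxLength,
    // mirroring the read-path padding Hive applies for char columns.
    static String getPaddedValue(String value, int maxLength) {
        if (value.length() >= maxLength) {
            return value;
        }
        StringBuilder sb = new StringBuilder(maxLength);
        sb.append(value);
        for (int i = value.length(); i < maxLength; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A CHAR(6) column storing "ab" reads back as "ab" plus four spaces.
        System.out.println("[" + getPaddedValue("ab", 6) + "]");
    }
}
```

Because this padding lives on the Hive read path, a reader that bypasses it (as Spark's native path did here) returns unpadded values for char columns.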

How was this patch tested?

Pass the newly added test cases.

@gatorsmile
Member

I think this is not a fix. What is the root cause?

@SparkQA

SparkQA commented Sep 14, 2017

Test build #81789 has finished for PR 19235 at commit 57332c3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Contributor

vanzin commented Sep 14, 2017

Won't this cause Hive's parquet code to be used by default instead of Spark's? I'm not sure that's what we want.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Sep 14, 2017

Thank you for the review, @gatorsmile and @vanzin . Yes, this is not the direction Apache Spark is heading; it's slow.

Here, I wanted to raise the issue of the convertMetastoreXXX options, since convertMetastoreXXX does not seem to be tested completely for both ORC and Parquet. In particular, as you can see here, the 14 failures mean the test cases implicitly depend on that option. I think we need to test all those cases with both true and false. Otherwise, we need to add an explicit withSQLConf.
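The testing pattern suggested above (run the same assertion under both values of a config flag, rather than relying on the default) can be sketched as follows. Spark's real withSQLConf helper is Scala; this Java sketch with a hypothetical withConf helper only mimics its set-then-restore shape:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class BothConfigValues {
    static final Map<String, String> conf = new HashMap<>();

    // Run a check with the given config key temporarily set,
    // restoring the previous value afterwards.
    static boolean withConf(String key, String value, Supplier<Boolean> check) {
        String old = conf.get(key);
        conf.put(key, value);
        try {
            return check.get();
        } finally {
            if (old == null) conf.remove(key); else conf.put(key, old);
        }
    }

    public static void main(String[] args) {
        // Exercise the same assertion for both "false" and "true",
        // so the test no longer depends implicitly on the default.
        for (String value : new String[] {"false", "true"}) {
            boolean ok = withConf("spark.sql.hive.convertMetastoreOrc", value,
                () -> true /* placeholder for the real checkAnswer call */);
            System.out.println(value + ": " + ok);
        }
    }
}
```

Restoring the previous value in a finally block matters: otherwise one test's config leaks into the next, which is exactly the kind of implicit dependency the 14 failures revealed.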

@dongjoon-hyun
Member Author

dongjoon-hyun commented Sep 14, 2017

I'm not sure where I hit this before; perhaps it was in another JIRA issue. But I updated the 'WHERE' clause examples. Maybe there was a discussion about this difference.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-21997][SQL] Turn off spark.sql.hive.convertMetastoreParquet by default [SPARK-21997][SQL][WIP] Turn off spark.sql.hive.convertMetastoreParquet by default Sep 14, 2017
@vanzin
Contributor

vanzin commented Sep 14, 2017

I wanted to raise this issue of convertMetastoreXXX things since convertMetastoreXXX seems not to be tested completely

That's fine, but that's not what your change is proposing. If you look at the test failures, at least the one I looked at is because the wrong exception is being thrown, not because the feature is broken. That doesn't excuse the fact that it's not tested, but maybe the situation is not that dire.

But as Xiao has mentioned, you're not fixing the problem you describe in the change summary; you haven't even root-caused it. You just found out that using Hive classes to read the Parquet data works around the problem, but that's not an acceptable fix for the problem. It's just a workaround that those affected by the issue can use.

@dongjoon-hyun
Member Author

I agree with you. Okay. I'll refocus this.

@dongjoon-hyun
Member Author

dongjoon-hyun commented Sep 15, 2017

@gatorsmile and @vanzin .

I'm comparing with ORC now. Previously, ORC failed for a different reason; I'll make another PR for that. I found that #14471 is enough for ORC.

In the case of ORC, ORC itself handles truncation on write, and the padding is handled on read by the Hive-side HiveCharWritable via HiveBaseChar.java. In the case of Parquet, I guess the write side is the same, but there is no padding logic like HiveCharWritable in Spark.
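The write-side half of the round trip described above (truncation to the declared length before storing) can be sketched in a few lines. This is an illustration of the behavior, not the actual ORC writer code, and the method name is hypothetical:

```java
public class CharTruncate {
    // Write-path enforcement for CHAR(n)/VARCHAR(n): values longer
    // than the declared maximum are truncated before being stored.
    static String enforceMaxLength(String value, int maxLength) {
        return value.length() > maxLength
            ? value.substring(0, maxLength)
            : value;
    }

    public static void main(String[] args) {
        System.out.println(enforceMaxLength("abcdef", 4)); // truncated to 4 chars
        System.out.println(enforceMaxLength("ab", 4));     // short value stored as-is
    }
}
```

With truncation on write handled by the format itself, the only missing piece on Spark's native Parquet path is the read-side padding, which is why the char/varchar bug shows up there but not on the Hive path.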

@dongjoon-hyun
Member Author

Since this PR is invalid, I'll reuse this PR instead of creating a new one.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-21997][SQL][WIP] Turn off spark.sql.hive.convertMetastoreParquet by default [SPARK-14387][SPARK-19459][SQL] Enable Hive-1.x ORC compatibility with spark.sql.hive.convertMetastoreOrc Sep 15, 2017
Seq("false", "true").foreach { value =>
  withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> value) {
    checkAnswer(spark.table("hive_orc"), result)
    checkAnswer(spark.table("spark_orc"), result)
  }
}
Member Author


This is a test case for the ORC file format. Based on #14471, I'm enabling this.
For Parquet, I think we can proceed separately once ORC is finished.

@SparkQA

SparkQA commented Sep 15, 2017

Test build #81806 has finished for PR 19235 at commit c6d2c35.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-21997 branch March 23, 2018 04:06