[SPARK-14387][SPARK-19459][SQL] Enable Hive-1.x ORC compatibility with spark.sql.hive.convertMetastoreOrc #19235
Conversation
I think this is not a fix. What is the root cause?
Test build #81789 has finished for PR 19235 at commit
Won't this cause Hive's parquet code to be used by default instead of Spark's? I'm not sure that's what we want.
Thank you for the review, @gatorsmile and @vanzin. Yes, this is not the direction Apache Spark is heading; it's slow. Here, I wanted to raise this issue of
I'm not sure where I hit this before; it may have been in another JIRA issue. I've updated the 'WHERE' clause examples. There may have been a discussion on this difference.
That's fine, but that's not what your change is proposing. If you look at the test failures, at least the one I looked at is failing because the wrong exception is being thrown, not because the feature is broken. That doesn't excuse the fact that it's not tested, but maybe the situation is not that dire. But, as Xiao has mentioned, you're not fixing the problem you describe in the change summary; you haven't even root-caused it. You just found out that using Hive classes to read the Parquet data works around the problem. That's not an acceptable fix; it's just a workaround that those affected by the issue can use.
I agree with you. Okay, I'll refocus this.
…ive.convertMetastoreOrc
@gatorsmile and @vanzin, I'm comparing with ORC now. Previously, ORC failed for a different reason: ORC itself handles truncation on write, and the padding is handled on the Hive side by HiveCharWritable.
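A minimal sketch of the char/varchar semantics described above (the table and column names are hypothetical; this assumes a Hive-enabled `SparkSession` named `spark`):

```scala
// Hypothetical demo: VARCHAR(6) values are truncated to at most 6 characters
// on write, while CHAR(6) values are padded to exactly 6 characters on read
// (via Hive's HiveCharWritable / HiveBaseChar.java).
spark.sql("CREATE TABLE char_demo (c CHAR(6), v VARCHAR(6)) STORED AS ORC")
spark.sql("INSERT INTO char_demo VALUES ('a', 'b')")
// Expect c = "a     " (padded to length 6) and v = "b" (no padding).
spark.table("char_demo").show(truncate = false)
```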
Since this PR is invalid, I'll reuse it instead of creating a new one.
…with char/varchar
```scala
checkAnswer(spark.table("hive_orc"), result)
checkAnswer(spark.table("spark_orc"), result)
Seq("false", "true").foreach { value =>
  withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> value) {
```
This is a test case for the ORC file format. Based on #14471, I'm enabling this. For Parquet, I think we can proceed separately once ORC is finished.
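As a rough sketch, the overall test shape looks like the following. It assumes the standard helpers from Spark's Hive test suites (`withSQLConf`, `withTable`, `sql`, `checkAnswer`); the table name and data are illustrative:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveUtils

Seq("false", "true").foreach { value =>
  // Exercise both readers behind the same metastore table:
  // "false" uses the Hive SerDe path, "true" uses Spark's native ORC reader.
  withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> value) {
    withTable("hive_orc") {
      sql("CREATE TABLE hive_orc (c CHAR(10), v VARCHAR(10)) STORED AS ORC")
      sql("INSERT INTO hive_orc VALUES ('a', 'b')")
      // CHAR(10) should come back padded to ten characters in both modes.
      checkAnswer(spark.table("hive_orc"), Row("a" + " " * 9, "b"))
    }
  }
}
```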
Test build #81806 has finished for PR 19235 at commit
What changes were proposed in this pull request?
This PR includes #14471 to enable `spark.sql.hive.convertMetastoreOrc` for SPARK-19459 (ORC tables cannot be read when they contain char/varchar columns). All credit should go to @rajeshbalamohan.

For SPARK-19459, the padding is handled by the Hive-side `HiveCharWritable` via `HiveBaseChar.java` on read, so `ORCFileFormat` can handle this, unlike `ParquetFileFormat`. For Parquet, please see SPARK-21997.
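As a usage illustration (a minimal sketch; the table name is hypothetical and a Hive-enabled session is assumed):

```scala
// Read a metastore ORC table through Spark's native ORC reader instead of
// the Hive SerDe path; char/varchar padding is still applied on read.
spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
spark.table("orc_char_table").show()  // hypothetical table name
```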
How was this patch tested?

Pass the newly added test cases.