@liancheng
Contributor

This is a proper version of PR #7693 authored by @viirya

The reason why "CTAS with serde" fails is that the MetastoreRelation gets converted to a Parquet data source relation by default.

@liancheng changed the title from [SPARK-9378] [SQL] [HOTFIX] Fixes test case "CTAS with serde" to [SPARK-9378] [SQL] Fixes test case "CTAS with serde" on Jul 27, 2015
@liancheng
Contributor Author

Removed the "HOTFIX" tag, since this is actually not a newly introduced issue. Spark 1.4 behaves exactly the same. With a table created in Hive via:

CREATE TABLE x STORED AS PARQUET AS SELECT 1 AS key;

We have the following PySpark result in Spark 1.4:

In [5]: sqlContext.setConf('spark.sql.hive.convertMetastoreParquet', 'true')


In [6]: sqlContext.sql('desc extended x').show()
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|     key|      int|       |
+--------+---------+-------+



In [7]: sqlContext.setConf('spark.sql.hive.convertMetastoreParquet', 'false')


In [8]: sqlContext.sql('desc extended x').show()
+--------------------+--------------------+-------+
|            col_name|           data_type|comment|
+--------------------+--------------------+-------+
|                 key|                 int|   null|
|Detailed Table In...|Table(tableName:x...|       |
+--------------------+--------------------+-------+

So I'm pretty puzzled why this test case fails only occasionally. A possible explanation is that some test cases set spark.sql.hive.convertMetastoreParquet to false without properly restoring the original value. When such a test case is executed before "CTAS with serde", no failure occurs.
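To illustrate the suspected leak, here is a minimal sketch of a set-and-restore helper. It is not code from this PR: `FakeSQLContext` and `temporary_conf` are hypothetical names, and the stand-in class only mimics the `setConf`/`getConf` calls used in the session above so that the example runs without Spark.

```python
from contextlib import contextmanager

KEY = "spark.sql.hive.convertMetastoreParquet"


class FakeSQLContext:
    """Hypothetical stand-in for a SQLContext; it only stores string settings."""

    def __init__(self):
        # Assumed starting value, matching the "converted by default" behavior.
        self._conf = {KEY: "true"}

    def setConf(self, key, value):
        self._conf[key] = value

    def getConf(self, key, defaultValue=None):
        return self._conf.get(key, defaultValue)


@contextmanager
def temporary_conf(ctx, key, value):
    """Set `key` to `value` inside the block, then put the old value back."""
    original = ctx.getConf(key, None)
    ctx.setConf(key, value)
    try:
        yield ctx
    finally:
        # Restore even if the test body raised. A key that had no value
        # before is left as-is, since this sketch has no way to unset it.
        if original is not None:
            ctx.setConf(key, original)


ctx = FakeSQLContext()
with temporary_conf(ctx, KEY, "false"):
    inside = ctx.getConf(KEY)  # value seen by the test body
after = ctx.getConf(KEY)       # value seen by whichever test runs next
```

A test that changes the setting without a restore step like the `finally` clause leaves the changed value in place for every test that runs after it, which would make the outcome of "CTAS with serde" depend on execution order.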

cc @viirya

@liancheng
Contributor Author

BTW, to the committer who's going to merge this PR, please attribute this PR to @viirya.

@viirya
Member

viirya commented Jul 27, 2015

@liancheng Thanks for investigating this.

I think your explanation is reasonable. Actually, in the problematic test, we don't explicitly set or check the value of HiveContext.CONVERT_METASTORE_PARQUET. It might be affected by other tests, as you said, which would explain why it fails only occasionally. I don't think we can easily verify that, though. It is still interesting that it didn't cause problems before.

@liancheng
Contributor Author

@viirya Actually I did find a test case that simply sets HiveContext.CONVERT_METASTORE_PARQUET back to false. Before the old Parquet code was removed, this test case was executed twice, once with the Parquet data source enabled and once with it disabled. This might be the reason why we didn't notice this failure. But I still doubt...

@SparkQA

SparkQA commented Jul 27, 2015

Test build #38566 has finished for PR 7700 at commit 4413af0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Contributor

rxin commented Jul 27, 2015

I've merged this.

@asfgit asfgit closed this in 8e7d2be Jul 27, 2015
@liancheng liancheng deleted the spark-9378-fix-ctas-test branch July 27, 2015 23:37
@liancheng
Contributor Author

@viirya Sorry, it seems that @rxin didn't notice my comment above and still attributed this one to me :( I reassigned the JIRA ticket to you, though.

@viirya
Member

viirya commented Jul 28, 2015

@liancheng No problem. You already mentioned me in the PR description. Thanks. :-)
