@viirya
Member

viirya commented Jul 27, 2015

JIRA: https://issues.apache.org/jira/browse/SPARK-9378

Now that the old ParquetRelation has been completely removed from the codebase and ParquetRelation2 has become ParquetRelation, one test in org.apache.spark.sql.hive.execution.SQLQuerySuite that checks the schema stored by Hive fails, as observed in the recent test report for #7520. We should remove the test now.

@SparkQA

SparkQA commented Jul 27, 2015

Test build #38535 has finished for PR 7693 at commit d8b3c18.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Jul 27, 2015

cc @liancheng

@cloud-fan
Contributor

I saw the "CTAS with serde" test case fail in Jenkins, but sometimes it passes. Do you know why?

@viirya
Member Author

viirya commented Jul 27, 2015

Not sure. I suppose it is related to the removal of the old ParquetRelation code path. When I test locally, it never passes. Does this test pass for you locally?

@viirya
Member Author

viirya commented Jul 27, 2015

I think that before the old ParquetRelation code path was removed, the table in the "CTAS with serde" test,

sql(
  """CREATE TABLE ctas5
    | STORED AS parquet AS
    |   SELECT key, value
    |   FROM src
    |   ORDER BY key, value""".stripMargin)

seems to have been saved as a MetastoreRelation, so HiveCommandStrategy would run DescribeHiveTableCommand. I checked DescribeHiveTableCommand, and it appears to produce the correct result.

Now it takes a different path in HiveCommandStrategy, which causes this problem.
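
To make the two paths concrete, here is a toy Python model. It is not Spark code: the class names mirror the ones mentioned above, while `describe_extended`, `mentions_serde`, and the returned rows are made up for illustration. It only sketches why a check for serde details could pass on one path and fail on the other.

```python
from dataclasses import dataclass


@dataclass
class MetastoreRelation:
    """Stand-in for a table still resolved as a Hive metastore relation."""
    table: str


@dataclass
class ParquetRelation:
    """Stand-in for the same table after conversion to a Parquet data source relation."""
    table: str


def describe_extended(relation):
    """Hypothetical dispatcher: the output shape depends on the relation type."""
    columns = ["key", "value"]
    if isinstance(relation, MetastoreRelation):
        # Hive-style path: column rows plus storage details (illustrative strings only).
        return columns + [
            "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
            "MANAGED_TABLE",
        ]
    # Data source path: only the column rows are reported.
    return columns


def mentions_serde(rows):
    """True if any output row names the Parquet Hive SerDe."""
    return any("ParquetHiveSerDe" in row for row in rows)
```

In this model, a test that looks for the SerDe class name succeeds only while the table is still treated as a `MetastoreRelation`.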

@liancheng
Contributor

Reproduced this test failure locally. A proper fix for this issue can be:

    withSQLConf(HiveContext.CONVERT_METASTORE_PARQUET.key -> "false") {
      checkExistence(sql("DESC EXTENDED ctas5"), true,
        "name:key", "type:string", "name:value", "ctas5",
        "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
        "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat",
        "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
        "MANAGED_TABLE"
      )
    }

The reason is that the MetastoreRelation in DESC EXTENDED ctas5 is converted to a Parquet data source table.
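
The `withSQLConf` wrapper in the snippet above follows a set, run, restore pattern. Below is a minimal Python sketch of that pattern, assuming a plain dict stands in for the SQL configuration. `conf` and `with_sql_conf` are illustrative names, not Spark APIs, and the key string is what `HiveContext.CONVERT_METASTORE_PARQUET.key` is assumed to resolve to.

```python
from contextlib import contextmanager

# Assumed key string for HiveContext.CONVERT_METASTORE_PARQUET.
KEY = "spark.sql.hive.convertMetastoreParquet"

# Toy stand-in for the session configuration; conversion is on by default.
conf = {KEY: "true"}


@contextmanager
def with_sql_conf(overrides):
    """Temporarily apply the overrides, then restore the previous values."""
    missing = object()  # sentinel for keys that were not set before
    previous = {key: conf.get(key, missing) for key in overrides}
    conf.update(overrides)
    try:
        yield
    finally:
        for key, old in previous.items():
            if old is missing:
                conf.pop(key, None)
            else:
                conf[key] = old


with with_sql_conf({KEY: "false"}):
    inside = conf[KEY]  # conversion is disabled only within this block

after = conf[KEY]  # the previous value is back once the block exits
```

Scoping the override this way keeps the setting from leaking into other test cases, even if the body raises.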

@viirya I'm going to open a new PR to fix this issue since it is probably breaking the PR builder. Will attribute the new PR to you. Thanks for bringing up and investigating this issue!

@viirya
Member Author

viirya commented Jul 27, 2015

@liancheng Thanks.

viirya closed this Jul 27, 2015

@liancheng
Contributor

However, what confuses me is that the following PySpark snippet shows the same result under Spark 1.4:

In [4]: sqlContext.sql('create table x stored as parquet as select 1 as key').show()


In [5]: sqlContext.sql('desc extended x').show()
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|     key|      int|       |
+--------+---------+-------+

Anyway, I'm fixing this now. Why it only just started failing needs further investigation.

asfgit pushed a commit that referenced this pull request Jul 27, 2015
This is a proper version of PR #7693 authored by viirya

The reason why "CTAS with serde" fails is that the `MetastoreRelation` gets converted to a Parquet data source relation by default.

Author: Cheng Lian <lian@databricks.com>

Closes #7700 from liancheng/spark-9378-fix-ctas-test and squashes the following commits:

4413af0 [Cheng Lian] Fixes test case "CTAS with serde"

viirya deleted the remove_hive_parquet_schema_test branch December 27, 2023 18:32