Skip to content

Conversation

@yaooqinn
Copy link
Member

@yaooqinn yaooqinn commented Jan 19, 2020

What changes were proposed in this pull request?

After this commit d67b98e, we are able to create table or alter table with interval column types if the external catalog accepts which is varying the interval type's purpose for internal usage. With d67b98e 's original purpose it should only work from cast logic.

Instead of adding type checker for the interval type from commands to commands to work among catalogs, It much simpler to treat interval as an invalid data type but can be identified by cast only.

Why are the changes needed?

enhance interval internal usage purpose.

Does this PR introduce any user-facing change?

NO,
Additionally, this PR restores user behavior when using interval type to create/alter table schema, e.g. for hive catalog
for 2.4,

Caused by: org.apache.spark.sql.catalyst.parser.ParseException:
DataType calendarinterval is not supported.(line 1, pos 0)

for master after d67b98e

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Error: type expected at the position 0 of 'interval' but 'interval' is found.
  at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:862)

now with this pr, we restore the type checker in spark side.

How was this patch tested?

add more ut

@SparkQA
Copy link

SparkQA commented Jan 19, 2020

Test build #116994 has finished for PR 27277 at commit b0c3169.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 19, 2020

Test build #116995 has finished for PR 27277 at commit 46a96a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

This does reduce the scope of d67b98e to cast only, but we need more than just "keep behavior the same as 2.4", we need to hide interval type from external data sources/catalogs.

In 2.4, CREATE TABLE can still has interval type column, even if parser doesn't allow it. This is because we have a java API spark.catalog.createTable. Fortunately, this is OK in 2.4, as Hive catalog doesn't allow it and the CREATE TABLE command fails at the end.

In 3.0, the catalog becomes an API, and it's possible that we leak interval type column to external catalog implementations.

I think it's OK to allow interval type in the parser, as we allow it in spark.catalog.createTable already. But it's important to disallow creating table with interval type column like 2.4. We need to add a check in the analyzer.

checkAnswer(spark.internalCreateDataFrame(rdd, table.schema), Seq.empty)
}

test("CreateTable: invalid schema if has interval type") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also test CTAS and replace table?

@SparkQA
Copy link

SparkQA commented Jan 20, 2020

Test build #117089 has finished for PR 27277 at commit 2c29977.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 20, 2020

Test build #117102 has finished for PR 27277 at commit 4fc3f4b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 20, 2020

Test build #117115 has finished for PR 27277 at commit 9e5b209.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 20, 2020

Test build #117119 has finished for PR 27277 at commit a796d68.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 20, 2020

Test build #117126 has finished for PR 27277 at commit 18163ae.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • case class UnresolvedTableWithViewExists(view: ResolvedView) extends LeafNode
  • case class ResolvedView(identifier: Identifier, isTempView: Boolean) extends LeafNode
  • abstract class AlterTable extends Command
  • case class AlterTableAddColumns(
  • case class AlterTableAlterColumn(
  • case class AlterTableRenameColumn(
  • case class AlterTableDropColumns(
  • case class AlterTableSetProperties(
  • case class AlterTableUnsetProperties(
  • case class AlterTableSetLocation(

@cloud-fan cloud-fan closed this in 0388b7a Jan 21, 2020
@cloud-fan
Copy link
Contributor

thanks, merging to master!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants