Skip to content

Conversation

@ulysses-you
Copy link
Contributor

@ulysses-you ulysses-you commented Jul 19, 2019

What changes were proposed in this pull request?

More detail see 25085.
This PR is to discuss details that add exception when create field type NullType

Now, it's ok to create table like:

val schema = new StructType().add("c", NullType)
spark.catalog.createTable(
          tableName = "t",
          source = "json",
          schema = schema,
          options = Map.empty[String, String])

But it's not the spark idea

How was this patch tested?

UT

@ulysses-you
Copy link
Contributor Author

cc @cloud-fan

@maropu
Copy link
Member

maropu commented Jul 19, 2019

ok to test

@SparkQA
Copy link

SparkQA commented Jul 19, 2019

Test build #107883 has finished for PR 25198 at commit 791b3cb.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@botekchristophe
Copy link

Looks like a good idea to check this when creating a table with NullType

object CreateTableCheck extends Rule[LogicalPlan] {
override def apply(plan: LogicalPlan): LogicalPlan = {
plan match {
case ct: CreateTable if ct.tableDesc.schema.exists { f =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move it to the rule PreWriteCheck?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider that it's a little different between create table(DDL) and insert data (DML). Maybe split in two rules ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea we can add a DDLCheck rule. Please follow PreWriteCheck to make it a pure-checking rule.

Also don't forget to handle DS v2 CREATE TABLE as well, which has a different logical plan.

  • CreateV2Table
  • CreateTableAsSelect
  • ReplaceTable
  • ReplaceTableAsSelect

also cc @rdblue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will check this later

@SparkQA
Copy link

SparkQA commented Jul 23, 2019

Test build #108054 has finished for PR 25198 at commit fdaee89.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

failAnalysis("DataType NullType is not supported for create table")

// DataSourceStrategy will convert CreateTable to CreateDataSourceTableCommand before check
case cdstc: CreateDataSourceTableCommand if cdstc.table.schema.exists { f =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this isn't much code, but I think it would be better to refactor this exists and the failAnalysis call into a method so it isn't repeated several times.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your advise.

@SparkQA
Copy link

SparkQA commented Jul 25, 2019

Test build #108135 has finished for PR 25198 at commit 9791914.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 7, 2019

Test build #108758 has finished for PR 25198 at commit 0ba1242.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}

/**
* SPARK-28443: Spark sql add exception when create field type NullType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: SPARK-28443: fail the DDL command if it creates a table with null-type columns

def failAnalysis(msg: String): Unit = { throw new AnalysisException(msg) }

def throwWhenExistsNullType(schema: StructType): Unit = {
if (schema.exists(f => DataTypes.NullType.sameType(f.dataType))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: f.dataType == NullType

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about nested filed? Do other databases forbid it as well?

Copy link
Contributor Author

@ulysses-you ulysses-you Aug 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hive also forbid nested field with NullType

case ReplaceTable(_, _, tableSchema, _, _, _) =>
throwWhenExistsNullType(tableSchema)

case _ => // OK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CreateTableAsSelect, ReplaceTableAsSelect are missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spark allowed CreateTableAsSelect and ReplaceTableAsSelect with NullType, isn't it ?
E.g. create table test as select null as c1.

@SparkQA
Copy link

SparkQA commented Aug 8, 2019

Test build #108791 has finished for PR 25198 at commit 517a1e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

case CreateTable(tableDesc, _, _) =>
checkSchema(tableDesc.schema)

case CreateV2Table(_, _, tableSchema, _, _, _) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the rationale here is that, we don't want to create tables with null type field. It should be same for both CREATE TABLE and CTAS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, and also add AlterTable.

@SparkQA
Copy link

SparkQA commented Aug 8, 2019

Test build #108811 has finished for PR 25198 at commit 138ac5e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

test("SPARK-28443: fail the DDL command if it creates a table with null-type columns") {
withTable("t") {
val e = intercept[AnalysisException]{
sql(s"CREATE TABLE t as SELECT NULL AS c")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also test inner fields?

fields.foreach{field => throwWhenExistsNullType(field.dataType)}

case other if other == NullType =>
failAnalysis("DataType NullType is not supported for create table.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about "cannot create tables with null-type column/field"


def failAnalysis(msg: String): Unit = { throw new AnalysisException(msg) }

def throwWhenExistsNullType(dataType: DataType): Unit = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failNullType

}
}

test("SPARK-28443: fail the DDL command if it creates a table with null-type columns") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget that spark not support create table t as select null as c now. Remove this.

@SparkQA
Copy link

SparkQA commented Aug 8, 2019

Test build #108833 has finished for PR 25198 at commit 108b7f9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 9, 2019

Test build #108848 has finished for PR 25198 at commit fa22cf7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ulysses-you
Copy link
Contributor Author

ping @cloud-fan

@ulysses-you ulysses-you requested a review from rdblue September 30, 2019 04:06
@SparkQA
Copy link

SparkQA commented Sep 30, 2019

Test build #111595 has finished for PR 25198 at commit 6a1b4f5.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 30, 2019

Test build #111605 has finished for PR 25198 at commit 358e2a2.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions
Copy link

github-actions bot commented Jan 9, 2020

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants