Skip to content

Conversation

@linhongliu-db
Copy link
Contributor

@linhongliu-db linhongliu-db commented Jul 20, 2021

What changes were proposed in this pull request?

Change the NullType.simpleString to "void" to set "void" as the formal type name of NullType

Why are the changes needed?

This PR is intended to address the type name discussion in PR #28833. Here are the reasons:

  1. The type name of NullType is displayed everywhere, e.g. schema string, error message, document. Hence it's not possible to hide it from users, we have to choose a proper name
  2. The "void" is widely used as the type name of "NULL", e.g. Hive, pgSQL
  3. Changing to "void" can enable the round trip of toDDL/fromDDL for NullType. (i.e. make from_json(col, schema.toDDL)) work

Does this PR introduce any user-facing change?

Yes, the type name of "NULL" is changed from "null" to "void". for example:

scala> sql("select null as a, 1 as b").schema.catalogString
res5: String = struct<a:void,b:int>

How was this patch tested?

existing test cases

@linhongliu-db linhongliu-db changed the title [SPARK-36224] Use Void as the type name of NullType [SPARK-36224][SQL] Use Void as the type name of NullType Jul 20, 2021
@SparkQA
Copy link

SparkQA commented Jul 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45835/

@SparkQA
Copy link

SparkQA commented Jul 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45835/

@SparkQA
Copy link

SparkQA commented Jul 20, 2021

Test build #141320 has finished for PR 33437 at commit f805f40.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45918/

@SparkQA
Copy link

SparkQA commented Jul 21, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45918/

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me if tests pass

@HyukjinKwon
Copy link
Member

cc @cloud-fan, @dongjoon-hyun, @MaxGekk FYI

@SparkQA
Copy link

SparkQA commented Jul 21, 2021

Test build #141400 has finished for PR 33437 at commit f45cf55.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 21, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45949/

@SparkQA
Copy link

SparkQA commented Jul 21, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45949/

@SparkQA
Copy link

SparkQA commented Jul 21, 2021

Test build #141431 has finished for PR 33437 at commit 4f0a5c9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@linhongliu-db linhongliu-db force-pushed the SPARK-36224-void-type-name branch from 4f0a5c9 to e3e563c Compare July 22, 2021 02:59
@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45982/

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45982/

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45989/

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45989/

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Test build #141463 has finished for PR 33437 at commit e3e563c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 22, 2021

Test build #141471 has finished for PR 33437 at commit db290d9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@linhongliu-db
Copy link
Contributor Author

cc @cloud-fan, the jenkins tests passed.
#33437 (comment)

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the type name of "NULL" is changed from "null" to "void". for example:

@linhongliu-db If this PR introduces behavior change, should we mention this in the SQL migration guide?

@linhongliu-db
Copy link
Contributor Author

If this PR introduces behavior change, should we mention this in the SQL migration guide?

@MaxGekk We can add this to the migration guide. But I personally think it's not necessary because the change is very minor and I don't think changing the type name will break the user's existing workload. i.e. there is no migration needed.

@linhongliu-db linhongliu-db force-pushed the SPARK-36224-void-type-name branch from db290d9 to 7ac3c18 Compare July 29, 2021 07:41
@SparkQA
Copy link

SparkQA commented Jul 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46332/

@SparkQA
Copy link

SparkQA commented Jul 29, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46332/

@SparkQA
Copy link

SparkQA commented Jul 29, 2021

Test build #141819 has finished for PR 33437 at commit 7ac3c18.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

@linhongliu-db
Copy link
Contributor Author

we can also remove this line now

@cloud-fan looks like we can also remove HiveVoidType

@cloud-fan
Copy link
Contributor

looks like we can also remove HiveVoidType

Yea, let's remove it as well

@SparkQA
Copy link

SparkQA commented Jul 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46382/

@SparkQA
Copy link

SparkQA commented Jul 30, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46382/

@SparkQA
Copy link

SparkQA commented Jul 30, 2021

Test build #141873 has finished for PR 33437 at commit 911b5fc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 30, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46404/

@SparkQA
Copy link

SparkQA commented Jul 30, 2021

Test build #141895 has finished for PR 33437 at commit 4d5dcab.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 2, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/46447/

@SparkQA
Copy link

SparkQA commented Aug 2, 2021

Test build #141937 has finished for PR 33437 at commit eacf7ff.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

The DDL type string round trip is really a good point, and it's kind of a bug that from_json(col, schema.toDDL) may not work. I'm merging this to master/3.2, thanks!

@cloud-fan cloud-fan closed this in 2f70077 Aug 2, 2021
cloud-fan pushed a commit that referenced this pull request Aug 2, 2021
### What changes were proposed in this pull request?
Change the `NullType.simpleString` to "void" to set "void" as the formal type name of `NullType`

### Why are the changes needed?
This PR is intended to address the type name discussion in PR #28833. Here are the reasons:
1. The type name of NullType is displayed everywhere, e.g. schema string, error message, document. Hence it's not possible to hide it from users, we have to choose a proper name
2. The "void" is widely used as the type name of "NULL", e.g. Hive, pgSQL
3. Changing to "void" can enable the round trip of `toDDL`/`fromDDL` for NullType. (i.e. make `from_json(col, schema.toDDL)`) work

### Does this PR introduce _any_ user-facing change?
Yes, the type name of "NULL" is changed from "null" to "void". for example:
```
scala> sql("select null as a, 1 as b").schema.catalogString
res5: String = struct<a:void,b:int>
```

### How was this patch tested?
existing test cases

Closes #33437 from linhongliu-db/SPARK-36224-void-type-name.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 2f70077)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants