[SPARK-34320][SQL] Migrate ALTER TABLE DROP COLUMNS commands to use UnresolvedTable to resolve the identifier #32854
Conversation
@cloud-fan this is an alternative approach suggested in #32542 (comment). We can migrate the ALTER TABLE commands one by one via this approach. Let me know if you want to see more examples of this for other ALTER TABLE commands.
```scala
Seq(ResolveCommandsWithIfExists) ++
  postHocResolutionRules: _*),
Batch("Normalize Alter Table Commands", Once, ResolveAlterTableCommands),
Batch("Normalize Alter Table", Once, ResolveAlterTableChanges),
```
We can remove `ResolveAlterTableChanges` once all the ALTER TABLE commands are migrated to `ResolveAlterTableCommands`.
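The migration pattern discussed above can be sketched roughly as follows. This is a hedged illustration, not Spark's actual source: `resolveFieldName` is a hypothetical helper, and the class and field names are simplified from the PR.

```scala
// Illustrative sketch only: a post-hoc analyzer rule in the spirit of
// ResolveAlterTableCommands. Once the command's child has been resolved to a
// ResolvedTable, each UnresolvedFieldName can be checked against that table's
// schema, giving all ALTER TABLE commands the same lookup behavior.
object ResolveAlterTableCommandsSketch extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators {
    case a @ AlterTableDropColumns(table: ResolvedTable, cols) if !a.resolved =>
      // resolveFieldName is hypothetical: it would look up the column in
      // table.schema and raise an analysis error if it is missing.
      a.copy(columnsToDrop = cols.map(col => resolveFieldName(table, col)))
  }
}
```

The key point is that the rule only fires after the table identifier has already gone through the analyzer's shared resolution (temp view first, then catalog), which is what wrapping the identifier in `UnresolvedTable` buys.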
Kubernetes integration test starting

Kubernetes integration test status success

Test build #139619 has finished for PR 32854 at commit

Kubernetes integration test starting
```scala
val changes = keys.map(key => TableChange.removeProperty(key))
AlterTableExec(table.catalog, table.identifier, changes) :: Nil

case AlterTableDropColumns(table: ResolvedTable, cols) =>
```
Shall we add a base trait for all ALTER TABLE commands and define a `def changes: Seq[TableChange]`? Then we can handle all ALTER TABLE commands in one place.
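That suggestion could look roughly like the following. This is a hedged sketch, not the code later added to Spark: the trait and field names (`AlterTableCommand`, `columnsToDrop`, `FieldName`) are hypothetical stand-ins, and the `TableChange.deleteColumn` factory is assumed to take the column's name path.

```scala
// Sketch of the suggested refactoring: every ALTER TABLE logical plan
// extends one trait and reports its edits as catalog-agnostic TableChanges.
trait AlterTableCommand extends Command {
  def table: LogicalPlan
  def changes: Seq[TableChange]
}

// Example concrete command (simplified): DROP COLUMNS maps each resolved
// column to a deleteColumn change.
case class AlterTableDropColumns(
    table: LogicalPlan,
    columnsToDrop: Seq[FieldName]) extends AlterTableCommand {
  override def changes: Seq[TableChange] =
    columnsToDrop.map(col => TableChange.deleteColumn(col.path.toArray))
}
```

With this shape, the planner needs only one case (`case a: AlterTableCommand => AlterTableExec(catalog, ident, a.changes) :: Nil`) instead of one case per command.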
Kubernetes integration test status failure

Test build #139651 has finished for PR 32854 at commit

Kubernetes integration test starting

Kubernetes integration test status failure

Test build #140217 has finished for PR 32854 at commit
@imback82 how much work do you think is needed to migrate the other ALTER TABLE commands? The framework is built, and if it's not much work, let's migrate all of them in one PR. We can also do the migration one by one in follow-up PRs if it needs many code changes.
@cloud-fan It will be many code changes if we migrate all the remaining ALTER TABLE commands (4 remaining), since we need to refactor
thanks, merging to master!
…f the error message

### What changes were proposed in this pull request?

This is a followup PR for SPARK-34320 (#32854). That PR changed the error message of `ALTER TABLE` but `V2JDBCTest` didn't comply with the change.

### Why are the changes needed?

To fix the `v2.*JDBCSuite` failures:

```
[info] - SPARK-33034: ALTER TABLE ... add new columns (173 milliseconds)
[info] - SPARK-33034: ALTER TABLE ... drop column *** FAILED *** (126 milliseconds)
[info]   "Cannot delete missing field bad_column in postgresql.alt_table schema: root
[info]    |-- C2: string (nullable = true)
[info]   ; line 1 pos 0;
[info]   'AlterTableDropColumns [unresolvedfieldname(bad_column)]
[info]   +- ResolvedTable org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog7f4b7516, alt_table, JDBCTable(alt_table,StructType(StructField(C2,StringType,true)),org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions5842301d), [C2#1879]
[info]   " did not contain "Cannot delete missing field bad_column in alt_table schema" (V2JDBCTest.scala:106)
[info]   org.scalatest.exceptions.TestFailedException:
[info]   at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472)
[info]   at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471)
[info]   at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295)
[info]   at org.apache.spark.sql.jdbc.v2.V2JDBCTest.$anonfun$$init$$6(V2JDBCTest.scala:106)
[info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1461)
[info]   at org.apache.spark.sql.test.SQLTestUtilsBase.withTable(SQLTestUtils.scala:305)
[info]   at org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.withTable(DockerJDBCIntegrationSuite.scala:95)
[info]   at org.apache.spark.sql.jdbc.v2.V2JDBCTest.$anonfun$$init$$5(V2JDBCTest.scala:95)
[info]   ...
[info] - SPARK-33034: ALTER TABLE ... update column type (122 milliseconds)
[info] - SPARK-33034: ALTER TABLE ... rename column (93 milliseconds)
[info] - SPARK-33034: ALTER TABLE ... update column nullability (92 milliseconds)
[info] - CREATE TABLE with table comment (38 milliseconds)
[info] - CREATE TABLE with table property (52 milliseconds)
[info] MySQLIntegrationSuite:
[info] - Basic test (61 milliseconds)
[info] - Numeric types (67 milliseconds)
[info] - Date types (59 milliseconds)
[info] - String types (50 milliseconds)
[info] - Basic write test (216 milliseconds)
[info] - query JDBC option (64 milliseconds)
[info] Run completed in 19 minutes, 43 seconds.
[info] Total number of tests run: 89
[info] Suites: completed 14, aborted 0
[info] Tests: succeeded 84, failed 5, canceled 0, ignored 0, pending 0
[info] *** 5 TESTS FAILED ***
[error] Failed tests:
[error] 	org.apache.spark.sql.jdbc.v2.OracleIntegrationSuite
[error] 	org.apache.spark.sql.jdbc.v2.MsSqlServerIntegrationSuite
[error] 	org.apache.spark.sql.jdbc.v2.DB2IntegrationSuite
[error] 	org.apache.spark.sql.jdbc.v2.MySQLIntegrationSuite
[error] 	org.apache.spark.sql.jdbc.v2.PostgresIntegrationSuite
[error] (docker-integration-tests / Test / test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 1223 s (20:23), completed Jun 25, 2021 1:31:04 AM
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

docker-integration-tests on GA.

Closes #33074 from sarutak/followup-SPARK-34320.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.com>
### What changes were proposed in this pull request?

This PR proposes to migrate the `ALTER TABLE ... DROP COLUMNS` command to use `UnresolvedTable` as a `child` to resolve the table identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in the JIRA or the proposal doc.

### Why are the changes needed?

This is part of the effort to make the relation lookup behavior consistent: SPARK-29900.

### Does this PR introduce _any_ user-facing change?

After this PR, `ALTER TABLE ... DROP COLUMNS` commands will have a consistent resolution behavior.

### How was this patch tested?

Updated existing tests.
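The analysis flow described above can be illustrated roughly as follows. The class names follow the PR, but the exact constructor arguments are simplified and should be treated as an assumption, not Spark's precise API.

```scala
// Hedged illustration of before/after analysis for:
//   ALTER TABLE t DROP COLUMN c
//
// Parsed plan: the identifier is wrapped in UnresolvedTable, so the
// analyzer's shared lookup rules (temp view first, then catalog/table)
// decide what "t" refers to, for v1 and v2 alike:
//
//   AlterTableDropColumns(
//     UnresolvedTable(Seq("t"), "ALTER TABLE ... DROP COLUMNS"),
//     Seq(UnresolvedFieldName(Seq("c"))))
//
// Analyzed plan: UnresolvedTable has been replaced by ResolvedTable, and
// the field name has been validated against the table's schema:
//
//   AlterTableDropColumns(
//     ResolvedTable(catalog, identifier, table, output),
//     Seq(resolvedColumn))
```

Previously, each command did its own identifier lookup, which is why v1 and v2 could disagree on, for example, whether a temp view shadows a table of the same name.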