[SPARK-51834][SQL] Support end-to-end table constraint management #50631

gengliangwang · 2025-04-18T00:12:14Z

What changes were proposed in this pull request?

Support end-to-end table constraint management:

- Create a DSV2 table with constraints
- Replace a DSV2 table with constraints
- ALTER a DSV2 table to add a new constraint
- ALTER a DSV2 table to drop a constraint
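
For orientation, here is a rough sketch of what the four operations listed above might look like as SQL statements, held as Scala strings. The catalog, table, column, and constraint names are hypothetical, and the exact DDL syntax is an assumption rather than something stated in this PR description; a connector also has to support constraints for these statements to succeed.

```scala
object ConstraintDdlExamples {
  // Hypothetical identifier: `cat` is assumed to be a DSV2 catalog
  // whose tables accept constraints.
  val table = "cat.ns.orders"

  // Create a table with a named CHECK constraint (assumed syntax).
  val createTable: String =
    s"CREATE TABLE $table (id INT, qty INT, CONSTRAINT qty_positive CHECK (qty > 0))"

  // Replace the table, again declaring the constraint inline (assumed syntax).
  val replaceTable: String =
    s"REPLACE TABLE $table (id INT, qty INT, CONSTRAINT qty_positive CHECK (qty > 0))"

  // Add a new constraint to an existing table (assumed syntax).
  val addConstraint: String =
    s"ALTER TABLE $table ADD CONSTRAINT id_positive CHECK (id > 0)"

  // Drop a constraint by name (assumed syntax).
  val dropConstraint: String =
    s"ALTER TABLE $table DROP CONSTRAINT id_positive"

  val all: Seq[String] = Seq(createTable, replaceTable, addConstraint, dropConstraint)
}
```

Given a `SparkSession` named `spark`, each statement could be submitted with `spark.sql(...)`.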

Why are the changes needed?

Allow users to define and modify table constraints in connectors that support them.

Does this PR introduce any user-facing change?

No, it is for the DSV2 framework.

How was this patch tested?

New unit tests (UT)

Was this patch authored or co-authored using generative AI tooling?

No

gengliangwang · 2025-04-18T00:12:41Z

cc @aokolnychyi

gengliangwang · 2025-04-18T00:39:27Z

...talyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala

+    val constraint = tableConstraint.toV2Constraint(isCreateTable = false)
+    val validatedTableVersion = table match {
+      case t: ResolvedTable if constraint.enforced() =>
+        t.table.currentVersion()


Created a follow-up https://issues.apache.org/jira/browse/SPARK-51835 for testing the table version
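
To make the logic in the snippet above easier to follow, here is a minimal, self-contained sketch of the same decision: record the table version only when the table is resolved and the constraint is enforced. `TableSnapshot` and `Check` are hypothetical stand-ins, not Spark classes, and the sketch uses `Option` where the PR code returns `null`.

```scala
object ValidatedVersionSketch {
  // Hypothetical stand-ins for a resolved table and a constraint.
  final case class TableSnapshot(name: String, currentVersion: String)
  final case class Check(name: String, enforced: Boolean)

  // The version is captured only for a resolved table and an enforced
  // constraint; every other combination yields None.
  def validatedVersion(table: Option[TableSnapshot], constraint: Check): Option[String] =
    table.filter(_ => constraint.enforced).map(_.currentVersion)
}
```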

aokolnychyi · 2025-04-22T15:49:29Z

common/utils/src/main/resources/error/error-conditions.json

+    "message" : [
+      "The check constraint `<checkCondition>` is non-deterministic. Check constraints must only contain deterministic expressions."
+    ],
+    "sqlState" : "42621"


The error code seems consistent with DB2 and what we use for generated columns, +1.
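
As an illustration of the rule behind this error message, the sketch below rejects a check condition when any part of it is non-deterministic. The expression classes are toy stand-ins invented for this example (Spark's real `Expression` hierarchy is much richer), and `validate` is a hypothetical helper, not the code added in this PR.

```scala
object DeterminismCheck {
  // Toy expression tree, assumed for illustration only.
  sealed trait Expr { def deterministic: Boolean }
  final case class Column(name: String) extends Expr { val deterministic = true }
  final case class Lit(value: Int) extends Expr { val deterministic = true }
  final case class Rand() extends Expr { val deterministic = false }
  final case class GreaterThan(left: Expr, right: Expr) extends Expr {
    // A composite expression is deterministic only if all children are.
    val deterministic: Boolean = left.deterministic && right.deterministic
  }

  // Returns the error text for a non-deterministic condition, None otherwise.
  def validate(sqlText: String, condition: Expr): Option[String] =
    if (condition.deterministic) None
    else Some(s"The check constraint `$sqlText` is non-deterministic. " +
      "Check constraints must only contain deterministic expressions.")
}
```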

aokolnychyi · 2025-04-22T15:54:13Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableSpec.scala

@@ -18,11 +18,12 @@
 package org.apache.spark.sql.catalyst.analysis

 import org.apache.spark.SparkThrowable
-import org.apache.spark.sql.catalyst.expressions.{Expression, Literal}
+import org.apache.spark.sql.catalyst.expressions._


What is the agreement in the community on wildcard imports? Are they permitted after a given number of elements are imported directly?

As per https://github.com/databricks/scala-style-guide?tab=readme-ov-file#imports,
"Avoid using wildcard imports, unless you are importing more than 6 entities"

aokolnychyi · 2025-04-22T16:22:13Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveTableSpec.scala

+    Some(LocalRelation(attributeList))
+  }
+
+  private def analyzeConstraints(


Is there any other way to do this? Can we restructure the plan so that the analyzer naturally resolves these expressions? I like that we pivoted to DefaultValueExpression for default values, rather than using a custom analyzer.

I can't think of any other way.
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala#L335
This is similar to what we did for column default.

Let me think a bit.

Can we do something similar to what @cloud-fan did for OverwriteByExpression in SPARK-33412?

My worry is that we added DefaultValueExpression to eventually get rid of the custom analyzer and optimizer for default values. It would be great not to add more dependencies on it.

There are some differences here. In a CREATE TABLE statement:

- Column default values and options CANNOT reference columns.

- Constraints CAN reference columns.

Using a default analyzer with a dummy relation is simple, and it can include every analysis batch other than the main Resolution batch.
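
The difference described in this comment can be sketched with a toy resolver: because a constraint may reference columns, its expression has to be resolved against the columns declared in the same statement. Everything here (`ColumnDef`, the expression classes, `resolve`) is a hypothetical simplification and does not reflect the analyzer code in this PR.

```scala
object ConstraintResolution {
  // Toy stand-ins for column definitions and expressions; not Spark's classes.
  final case class ColumnDef(name: String, dataType: String)

  sealed trait Expr
  final case class UnresolvedColumn(name: String) extends Expr
  final case class ResolvedColumn(name: String, dataType: String) extends Expr
  final case class Lit(value: Int) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr

  // Resolve each column reference against the statement's own columns,
  // matching names case-insensitively; unknown columns produce an error.
  def resolve(expr: Expr, columns: Seq[ColumnDef]): Either[String, Expr] = expr match {
    case UnresolvedColumn(name) =>
      columns.find(_.name.equalsIgnoreCase(name))
        .map(c => ResolvedColumn(c.name, c.dataType))
        .toRight(s"Column `$name` is not defined in the table")
    case GreaterThan(l, r) =>
      for {
        rl <- resolve(l, columns)
        rr <- resolve(r, columns)
      } yield GreaterThan(rl, rr)
    case other => Right(other)
  }
}
```

A default value, by contrast, would be resolved with an empty column list in this model, so any column reference in it would fail.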

aokolnychyi · 2025-04-22T16:25:46Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala

+    val validateStatus = if (isCreateTable) {
+      Constraint.ValidationStatus.UNVALIDATED
+    } else {
+      Constraint.ValidationStatus.VALID


Is the idea here that we always validate existing data in ALTER?

Yes, for check constraints.

aokolnychyi · 2025-04-22T16:26:50Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala

@@ -112,6 +117,27 @@ case class CheckConstraint(
  with TableConstraint {
 // scalastyle:on line.size.limit

+  def toV2Constraint(isCreateTable: Boolean): Constraint = {


I wonder if the input param should be related to the validation status, rather than to whether it is create or alter. For instance, we can make validation optional in ALTER.

OK, how about we set every validation status to UNVALIDATED in this PR? Once we support enforcing check constraints, we can discuss this further.

Makes sense to me.

Updated. PTAL. I am keeping the parameter here since the enforcement will be added soon.
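
The two options discussed in this thread can be contrasted in a small sketch. The types are toy mirrors of the statuses named in the diff, not the real DSV2 `Constraint` API, and both function names are hypothetical.

```scala
object ValidationStatusSketch {
  // Toy mirror of the statuses mentioned in the thread; not the real DSV2 enum.
  sealed trait ValidationStatus
  case object Valid extends ValidationStatus
  case object Unvalidated extends ValidationStatus

  final case class V2Check(name: String, predicateSql: String, status: ValidationStatus)

  // Original proposal: the status depends on the kind of statement.
  def fromStatementKind(name: String, sql: String, isCreateTable: Boolean): V2Check =
    V2Check(name, sql, if (isCreateTable) Unvalidated else Valid)

  // Outcome agreed in the thread: always UNVALIDATED until enforcement is supported.
  def alwaysUnvalidated(name: String, sql: String): V2Check =
    V2Check(name, sql, Unvalidated)
}
```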

aokolnychyi · 2025-04-22T16:30:33Z

...talyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2AlterTableCommands.scala

+      case _ =>
+        null
+    }
+    Seq(TableChange.addConstraint(constraint, validatedTableVersion))


CHECK constraints must optionally validate existing data in ALTER.
Am I right that this PR doesn't do this? What would be our plan?

"must optionally validate"

Makes sense. Do you mean CHECK ... NOT ENFORCED?

ENFORCED/NOT ENFORCED impacts subsequent writes. I was referring to ALTER TABLE ... ADD CONSTRAINT that must scan the existing data.

Created a follow-up: https://issues.apache.org/jira/browse/SPARK-51905

Just for my understanding: Anton's comment was about how to validate the existing data in ALTER TABLE ... ADD CONSTRAINT. Is it addressed in this PR, @gengliangwang?

The above follow-up JIRA (SPARK-51905) is not about that, is it?

SPARK-51905 Disallow NOT ENFORCED CHECK constraint

Yeah, I think we need one more JIRA to add the scan capability to ALTER TABLE ... ADD CONSTRAINT.

Actually, @gengliangwang already created it: SPARK-51903.

Got it. Thank you, @aokolnychyi . Ya, SPARK-51903 is what I expected.

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala

sql/core/src/test/scala/org/apache/spark/sql/execution/command/CheckConstraintParseSuite.scala

viirya · 2025-04-28T07:35:24Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala

+  // Convert to a data source v2 constraint
+  def toV2Constraint(isCreateTable: Boolean): Constraint


For an API doc, could you clarify what isCreateTable is used for? Maybe add @param for it. Seems it is not even used in this PR.

Also the doc comment style is not consistent with other methods.

See comment in #50631 (comment)

To avoid confusion, I removed the parameter

viirya · 2025-04-28T07:40:26Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala

+  override def nullable: Boolean = true
+
+  override def dataType: DataType =
+    throw new SparkUnsupportedOperationException("CONSTRAINT_DOES_NOT_HAVE_DATA_TYPE")


Hmm, by default dataType is not supported, but nullable has a default value? Does that make sense?

Can you explain what the issue is here?

I'm not sure. Is there a case where you would use nullable without dataType? Should nullable also be unsupported by default?

Not a big deal, though. Just wondering about the reason.

Either way should be OK. I changed it to throw an exception.
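
The resolution of this thread, where both members fail instead of only `dataType`, can be sketched as follows. `ConstraintExpr` and `Check` are toy stand-ins, and the plain `UnsupportedOperationException` is used here in place of Spark's own exception class.

```scala
object ConstraintExpressionSketch {
  // Toy stand-in for a constraint expression that carries no value of its own.
  trait ConstraintExpr {
    def name: String
    // Neither property is meaningful for a constraint, so both fail fast.
    def dataType: String =
      throw new UnsupportedOperationException(s"Constraint $name does not have a data type")
    def nullable: Boolean =
      throw new UnsupportedOperationException(s"Constraint $name does not have nullability")
  }

  final case class Check(name: String, condition: String) extends ConstraintExpr
}
```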

gengliangwang · 2025-04-29T05:06:39Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala

+      nameParts: Seq[String],
+      allowTemp: Boolean,
+      columns: Seq[ColumnDefinition]): ResolvedIdentifier = {
+    val columnOutput = columns.map { col =>


@cloud-fan I made this new change to bypass 38c6ef4#diff-583171e935b2dc349378063a5841c5b98b30a2d57ac3743a9eccfe7bffcb8f2aR286
Does this look good to you?

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala

sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/CheckConstraintSuite.scala

aokolnychyi

I think the new approach that doesn't involve invoking the custom analyzer is a much better option!

common/utils/src/main/resources/error/error-conditions.json

gengliangwang · 2025-04-30T03:18:19Z

@aokolnychyi @cloud-fan @viirya @dongjoon-hyun Thanks for the review. Merging this one to master.

### What changes were proposed in this pull request?

Support end-to-end table constraint management:

- Create a DSV2 table with constraints
- Replace a DSV2 table with constraints
- ALTER a DSV2 table to add a new constraint
- ALTER a DSV2 table to drop a constraint

### Why are the changes needed?

Allow users to define and modify table constraints in connectors that support them.

### Does this PR introduce _any_ user-facing change?

No, it is for DSV2 framework.

### How was this patch tested?

New UT

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#50631 from gengliangwang/constraintE2E.

Authored-by: Gengliang Wang <gengliang@apache.org>
Signed-off-by: Gengliang Wang <gengliang@apache.org>

gengliangwang added 6 commits April 17, 2025 08:41

add asConstraint

f73411b

support create/replace table with constraint

259de54

implment alter table commands

3ab17fc

add tests

2fac017

add tests

76508df

rename

a6507e0

github-actions bot added the SQL label Apr 18, 2025

gengliangwang requested review from viirya and cloud-fan April 18, 2025 00:12

add replace table test cases

b78c4f0

gengliangwang commented Apr 18, 2025

fix formatting

d54a64f

aokolnychyi reviewed Apr 22, 2025

gengliangwang added 4 commits April 23, 2025 20:57

convert constraints to expressions

0e4b88a

refactor ResolvedIdentifier

aa2b99b

more refactor and fix compiling

f4932ab

add comments

d284204

cloud-fan reviewed Apr 24, 2025

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala (outdated, resolved)

cloud-fan reviewed Apr 24, 2025

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala (outdated, resolved)

cloud-fan reviewed Apr 24, 2025

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala (outdated, resolved)

gengliangwang added 2 commits April 24, 2025 00:56

simplify

722d0a1

change validation status of check

e7d98d9

cloud-fan reviewed Apr 28, 2025

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveCatalogs.scala (outdated, resolved)

cloud-fan reviewed Apr 28, 2025

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/constraints.scala (outdated, resolved)

cloud-fan reviewed Apr 28, 2025

sql/core/src/test/scala/org/apache/spark/sql/execution/command/CheckConstraintParseSuite.scala (outdated, resolved)

cloud-fan reviewed Apr 28, 2025

sql/core/src/test/scala/org/apache/spark/sql/execution/command/CheckConstraintParseSuite.scala (outdated, resolved)

cloud-fan approved these changes Apr 28, 2025

viirya reviewed Apr 28, 2025

gengliangwang added 2 commits April 28, 2025 10:35

address comments

5101f2e

address comment

a87d4eb

viirya approved these changes Apr 28, 2025

gengliangwang added 2 commits April 28, 2025 15:59

fix test and add tests

59551c8

recursively replace

8044c9b

gengliangwang commented Apr 29, 2025

dongjoon-hyun reviewed Apr 29, 2025

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (outdated, resolved)

dongjoon-hyun reviewed Apr 29, 2025

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala (outdated, resolved)

dongjoon-hyun reviewed Apr 29, 2025

sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/CheckConstraintSuite.scala (outdated, resolved)

aokolnychyi approved these changes Apr 29, 2025

aokolnychyi reviewed Apr 29, 2025

common/utils/src/main/resources/error/error-conditions.json (outdated, resolved)

address comments

a997c9b

gengliangwang closed this in fc1cb78 Apr 30, 2025
