@imback82
Contributor

What changes were proposed in this pull request?

This PR proposes to migrate the ALTER TABLE ... CHANGE COLUMN command to use UnresolvedTable as a child to resolve the table identifier. This allows consistent resolution rules (temp view first, etc.) to be applied to both v1 and v2 commands. More info about the consistent resolution rule proposal can be found in JIRA or the proposal doc.
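
To illustrate the "temp view first" rule mentioned above, here is a minimal, self-contained sketch. It does not use Spark's classes: `Session`, `lookup`, and `resolveTable` are hypothetical names, and the error strings are made up for the example. The assumption is that a command expecting a table reports an error when the name resolves to a temp view, rather than silently skipping to a catalog table with the same name.

```scala
object LookupSketch {
  sealed trait Relation
  final case class TempView(name: String) extends Relation
  final case class Table(name: String) extends Relation

  // Hypothetical stand-in for session state: names of temp views and catalog tables.
  final case class Session(tempViews: Set[String], tables: Set[String])

  // One lookup order shared by every command: temp view first, then catalog table.
  def lookup(session: Session, name: String): Option[Relation] =
    if (session.tempViews.contains(name)) Some(TempView(name))
    else if (session.tables.contains(name)) Some(Table(name))
    else None

  // A command that needs a table rejects a temp view instead of looking past it.
  def resolveTable(session: Session, name: String): Either[String, Table] =
    lookup(session, name) match {
      case Some(t: Table)    => Right(t)
      case Some(_: TempView) => Left(s"$name is a temp view, but a table is expected")
      case None              => Left(s"Table or view not found: $name")
    }
}
```

Because v1 and v2 commands would share the same `lookup`, a temp view named `t` shadows a table named `t` in both code paths.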

Why are the changes needed?

This is part of the effort to make the relation lookup behavior consistent: SPARK-29900.

Does this PR introduce any user-facing change?

After this PR, the ALTER TABLE ... CHANGE COLUMN command will have consistent resolution behavior.

How was this patch tested?

Updated existing tests.

@github-actions github-actions bot added the SQL label Jun 28, 2021
Comment on lines -533 to -554
update.newDataType match {
  case _: StructType =>
    alter.failAnalysis(s"Cannot update ${table.name} field $fieldName type: " +
      s"update a struct by updating its fields")
  case _: MapType =>
    alter.failAnalysis(s"Cannot update ${table.name} field $fieldName type: " +
      s"update a map by updating $fieldName.key or $fieldName.value")
  case _: ArrayType =>
    alter.failAnalysis(s"Cannot update ${table.name} field $fieldName type: " +
      s"update the element by updating $fieldName.element")
  case u: UserDefinedType[_] =>
    alter.failAnalysis(s"Cannot update ${table.name} field $fieldName type: " +
      s"update a UserDefinedType[${u.sql}] by updating its fields")
  case _: CalendarIntervalType | _: YearMonthIntervalType |
       _: DayTimeIntervalType =>
    alter.failAnalysis(s"Cannot update ${table.name} field $fieldName to " +
      s"interval type")
  case _ =>
    // update is okay
}

// We don't need to handle nested types here which shall fail before
Contributor Author

@imback82 imback82 Jun 28, 2021

These changes are moved below to checkAlterTableCommand.
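
For readers without the Spark source at hand, the moved check can be approximated with a self-contained sketch. The types below are toy stand-ins, not Spark's `DataType` classes; `checkNewType` is a hypothetical name; and the `UserDefinedType` case is left out for brevity. The error messages mirror the removed lines quoted above.

```scala
object TypeCheckSketch {
  // Toy stand-ins for Spark's DataType hierarchy (assumption: not the real classes).
  sealed trait DataType
  case object IntType extends DataType
  case object LongType extends DataType
  case object IntervalType extends DataType
  final case class StructType(fields: List[(String, DataType)]) extends DataType
  final case class MapType(key: DataType, value: DataType) extends DataType
  final case class ArrayType(element: DataType) extends DataType

  // Left(message) when the requested type may not be set directly, Right(()) otherwise.
  def checkNewType(table: String, fieldName: String, newType: DataType): Either[String, Unit] = {
    val prefix = s"Cannot update $table field $fieldName type: "
    newType match {
      case _: StructType => Left(prefix + "update a struct by updating its fields")
      case _: MapType    => Left(prefix + s"update a map by updating $fieldName.key or $fieldName.value")
      case _: ArrayType  => Left(prefix + s"update the element by updating $fieldName.element")
      case IntervalType  => Left(s"Cannot update $table field $fieldName to interval type")
      case _             => Right(()) // update is okay
    }
  }
}
```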

positionArgumentExists(
  updatePos.position(),
  parent,
  colsToAdd.getOrElse(parentName, Nil))
Contributor Author

I don't think we should consider colsToAdd, since the CHANGE COLUMN command doesn't add a new column. So, in this PR, I didn't take newly added columns into account. Please let me know if I understood this wrong.
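
A rough sketch of the simplified check, assuming only the parent's existing fields matter. `Position`, `First`, `After`, and `positionExists` here are toy definitions, not Spark's `ColumnPosition` API, and the exact-match name comparison is a simplification (the real code goes through a resolver that may be case-insensitive).

```scala
object PositionCheckSketch {
  sealed trait Position
  case object First extends Position
  final case class After(column: String) extends Position

  // parentFields: names of the fields that already exist in the parent struct.
  // No "columns to add" are consulted, since CHANGE COLUMN adds none.
  def positionExists(position: Position, parentFields: Seq[String]): Boolean =
    position match {
      case First         => true
      case After(column) => parentFields.contains(column)
    }
}
```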

}
field.get._2
}
def findParentStruct(fieldNames: Seq[String]): StructType = {
Contributor Author

findField and findParentStruct are existing functions in CheckAnalysis with a slight modification (e.g., not passing operation argument). The existing functions will be removed once all alter table commands are migrated.
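
As a rough picture of what these two helpers do, here is a simplified sketch over toy types. It only walks nested structs; map keys/values and array elements are not handled here, and the signatures (returning `Option` rather than failing analysis) are an assumption made for the example.

```scala
object FieldLookupSketch {
  // Toy schema types (assumption: not Spark's StructType/StructField).
  sealed trait DataType
  case object IntType extends DataType
  final case class StructType(fields: List[(String, DataType)]) extends DataType

  // Walk a multi-part name through nested structs and return the field's type.
  // An empty name resolves to the root schema itself.
  def findField(schema: StructType, fieldNames: Seq[String]): Option[DataType] =
    fieldNames.foldLeft(Option[DataType](schema)) {
      case (Some(StructType(fields)), name) => fields.find(_._1 == name).map(_._2)
      case _                                => None // tried to descend into a non-struct
    }

  // The parent of a.b.c is whatever a.b resolves to, provided it is a struct.
  def findParentStruct(schema: StructType, fieldNames: Seq[String]): Option[StructType] =
    findField(schema, fieldNames.dropRight(1)).collect { case s: StructType => s }
}
```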

@imback82
Contributor Author

cc @cloud-fan

}

/**
* The logical plan of the ALTER TABLE ... CHANGE COLUMN command.
Contributor

nit: let's put the SQL standard syntax in the doc: ALTER TABLE ... ALTER COLUMN

Contributor Author

Updated

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 620fde4 Jun 29, 2021
@dongjoon-hyun
Member

This seems to break the JDBC v2 suite. I made a follow-up. Could you review it, @imback82 and @cloud-fan?

@SparkQA

SparkQA commented Jun 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44892/

@SparkQA

SparkQA commented Jun 29, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/44892/

.map(dt => col.field.copy(dataType = dt))
.getOrElse(col.field)
val newDataType = a.dataType.get
newDataType match {
Contributor

Not related to this PR: I think there is a small bug here. If the data type is not changed, we shouldn't fail even if the new data type is struct/array/map. @imback82 can you help to fix it?

Contributor

Actually, this is OK. In the analyzer, we will set the newDataType to None if it's the same as the existing data type in the table.
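
The normalization described here can be sketched in one line. `effectiveNewType` is a hypothetical helper, not a function in Spark; it only shows why an unchanged struct/array/map type never reaches the type check.

```scala
object NoOpTypeSketch {
  // requested: the type named in the command, if any.
  // existing: the column's current type in the table.
  // A request for the type the column already has is treated as "no type change".
  def effectiveNewType[T](requested: Option[T], existing: T): Option[T] =
    requested.filter(_ != existing)
}
```

With this in place, a later check that only runs when the new type is defined is skipped for no-op type changes.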

resolveFieldNames(table.schema, u.name).getOrElse(u)
case u: UnresolvedFieldPosition => u.position match {
case after: After =>
resolveFieldNames(table.schema, u.fieldName.init :+ after.column())
Contributor

Do we need to put fieldName in UnresolvedFieldPosition? We can easily get it via AlterTableAlterColumn.column.name.
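
A small sketch of the point being made: the column that an AFTER clause refers to is a sibling of the column being altered, so its full path can be derived from the altered column's own name. `siblingPath` is a hypothetical helper that mirrors the `init :+ after.column()` expression in the snippet above.

```scala
object AfterPositionSketch {
  // columnName: multi-part name of the column being altered, e.g. Seq("point", "y").
  //             Assumed non-empty, since `init` throws on an empty sequence.
  // afterColumn: the sibling named in the AFTER clause.
  def siblingPath(columnName: Seq[String], afterColumn: String): Seq[String] =
    columnName.init :+ afterColumn
}
```

For example, altering `point.y` with `AFTER x` resolves the position against `point.x`, and no separate field name needs to be stored on the position node.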

cloud-fan added a commit that referenced this pull request Jul 5, 2021
### What changes were proposed in this pull request?

This is a followup of #33113, to do some code cleanup:
1. `UnresolvedFieldPosition` doesn't need to include the field name. We can get it through "context" (`AlterTableAlterColumn.column.name`).
2. Run `ResolveAlterTableCommands` in the main resolution batch, so that the column/field resolution is also unified between v1 and v2 commands (same error message).
3. Fail immediately in `ResolveAlterTableCommands` if we can't resolve the field, instead of waiting until `CheckAnalysis`. We don't expect other rules to resolve fields in ALTER TABLE commands, so failing immediately is simpler and we can remove duplicated code in `CheckAnalysis`.

### Why are the changes needed?

code simplification.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

existing tests

Closes #33213 from cloud-fan/follow.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan added a commit that referenced this pull request Jul 5, 2021
(cherry picked from commit 8b46e26)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>