-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-34302][SQL] Migrate ALTER TABLE ... CHANGE COLUMN command to use UnresolvedTable to resolve the identifier #33113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| update.newDataType match { | ||
| case _: StructType => | ||
| alter.failAnalysis(s"Cannot update ${table.name} field $fieldName type: " + | ||
| s"update a struct by updating its fields") | ||
| case _: MapType => | ||
| alter.failAnalysis(s"Cannot update ${table.name} field $fieldName type: " + | ||
| s"update a map by updating $fieldName.key or $fieldName.value") | ||
| case _: ArrayType => | ||
| alter.failAnalysis(s"Cannot update ${table.name} field $fieldName type: " + | ||
| s"update the element by updating $fieldName.element") | ||
| case u: UserDefinedType[_] => | ||
| alter.failAnalysis(s"Cannot update ${table.name} field $fieldName type: " + | ||
| s"update a UserDefinedType[${u.sql}] by updating its fields") | ||
| case _: CalendarIntervalType | _: YearMonthIntervalType | | ||
| _: DayTimeIntervalType => | ||
| alter.failAnalysis(s"Cannot update ${table.name} field $fieldName to " + | ||
| s"interval type") | ||
| case _ => | ||
| // update is okay | ||
| } | ||
|
|
||
| // We don't need to handle nested types here which shall fail before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes are moved below to checkAlterTableCommand.
| positionArgumentExists( | ||
| updatePos.position(), | ||
| parent, | ||
| colsToAdd.getOrElse(parentName, Nil)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should consider colsToAdd since the CHANGE COLUMN command doesn't add a new column. So, in this PR, I didn't consider new columns added. Please let me know if I understood this wrong.
| } | ||
| field.get._2 | ||
| } | ||
| def findParentStruct(fieldNames: Seq[String]): StructType = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
findField and findParentStruct are existing functions in CheckAnalysis with a slight modification (e.g., not passing operation argument). The existing functions will be removed once all alter table commands are migrated.
|
cc @cloud-fan |
| } | ||
|
|
||
| /** | ||
| * The logical plan of the ALTER TABLE ... CHANGE COLUMN command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's put the SQL standard syntax in the doc ALTER TABLE ... ALTER COLUMN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
|
thanks, merging to master! |
|
This seems to break JDBC v2 suite. I made a follow-up. Could you review that, @imback82 and @cloud-fan ? |
|
Kubernetes integration test starting |
|
Kubernetes integration test status success |
| .map(dt => col.field.copy(dataType = dt)) | ||
| .getOrElse(col.field) | ||
| val newDataType = a.dataType.get | ||
| newDataType match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR: I think there is a small bug here, if the data type is not changed, we shouldn't fail even if the new data type is struct/array/map. @imback82 can you help to fix it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is OK. In the analzer, we will set the newDataType to None if it's the same with the existing data type in the table.
| resolveFieldNames(table.schema, u.name).getOrElse(u) | ||
| case u: UnresolvedFieldPosition => u.position match { | ||
| case after: After => | ||
| resolveFieldNames(table.schema, u.fieldName.init :+ after.column()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to put fieldName in UnresolvedFieldPosition? We can easily get it via AlterTableAlterColumn.column.name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
### What changes were proposed in this pull request? This is a followup of #33113, to do some code cleanup: 1. `UnresolvedFieldPosition` doesn't need to include the field name. We can get it through "context" (`AlterTableAlterColumn.column.name`). 2. Run `ResolveAlterTableCommands` in the main resolution batch, so that the column/field resolution is also unified between v1 and v2 commands (same error message). 3. Fail immediately in `ResolveAlterTableCommands` if we can't resolve the field, instead of waiting until `CheckAnalysis`. We don't expect other rules to resolve fields in ALTER TABLE commands, so failing immediately is simpler and we can remove duplicated code in `CheckAnalysis`. ### Why are the changes needed? code simplification. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes #33213 from cloud-fan/follow. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? This is a followup of #33113, to do some code cleanup: 1. `UnresolvedFieldPosition` doesn't need to include the field name. We can get it through "context" (`AlterTableAlterColumn.column.name`). 2. Run `ResolveAlterTableCommands` in the main resolution batch, so that the column/field resolution is also unified between v1 and v2 commands (same error message). 3. Fail immediately in `ResolveAlterTableCommands` if we can't resolve the field, instead of waiting until `CheckAnalysis`. We don't expect other rules to resolve fields in ALTER TABLE commands, so failing immediately is simpler and we can remove duplicated code in `CheckAnalysis`. ### Why are the changes needed? code simplification. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes #33213 from cloud-fan/follow. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 8b46e26) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
This PR proposes to migrate the following
ALTER TABLE ... CHANGE COLUMNcommand to useUnresolvedTableas achildto resolve the table identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in JIRA or proposal doc.Why are the changes needed?
This is a part of effort to make the relation lookup behavior consistent: SPARK-29900.
Does this PR introduce any user-facing change?
After this PR, the above
ALTER TABLE ... CHANGE COLUMNcommands will have a consistent resolution behavior.How was this patch tested?
Updated existing tests.