[SPARK-43341][SQL] Patch StructType.toDDL not picking up on non-nullability of nested column #41016
cc @MaxGekk FYI
MaxGekk
left a comment
@BramBoog Could you rebase on the recent master, please.
Hmm yeah that was strange, I should have noticed that immediately :/. Seems GitHub had added a whole list of commits which already were on master to the diff. Changing the base to a different branch and back to master has fixed it though.
Hey guys, any chance to have the PR merged? Although not critical, it would simplify our schema tests a lot.
@zaza Yeah the way I see it the PR is ready, I've just been waiting for a review.
@BramBoog it has conflicts against the latest master branch. You would need to resolve the conflicts by running git fetch upstream && git rebase upstream/master
@HyukjinKwon Apologies, it took a while for me to find the time to get back to this. Resolved the conflicts, PR is up to date again.
What changes were proposed in this pull request?
When converting a `StructType` instance containing a nested `StructType` column, which in turn contains a column for which `nullable = false`, to a DDL string using `.toDDL`, the resulting DDL string does not include this non-nullability. For example:
[code example not preserved]
gives
[output not preserved]
This is because `StructType.toDDL` calls `StructField.toDDL` for its fields, which in turn calls `.sql` for its `dataType`. If `dataType` is a `StructType`, the call to `.sql` in turn calls `.sql` for all the nested fields, and this last method does not include the nullability of the field in its output. The proposed solution is therefore to have `StructField.toDDL` call `dataType.toDDL` for a `StructType`, since this will include information about nullability of nested columns.
To work around the different DDL formats of nested and non-nested structs (the former is wrapped in `"STRUCT<...>"` and uses `colName: dataType` for its fields instead of `colName dataType`), package-private nested-specific versions of `.toDDL` have been added for `StructType` and `StructField`.
Why are the changes needed?
Currently, converting a StructType schema to a DDL string does not pass information about nullability of nested columns. This leads to a loss of information, and means converting to DDL and then back could alter the StructType schema.
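To make the loss of information concrete, here is a small toy model of the call chain described above. It is only a sketch: `ToyType`, `ToyField`, `ToyStruct`, `toDDLOld`, `toDDLNew` and `nestedDDL` are hypothetical names invented for illustration and are not Spark's classes or the methods added by this PR. The "old" path delegates to `sql` for nested structs and drops nested nullability. The "new" path uses a nested-specific rendering that keeps it.

```scala
// Toy model only: hypothetical stand-ins, not Spark's StructType/StructField.
sealed trait ToyType { def sql: String }

case object ToyInt extends ToyType { val sql = "INT" }
case object ToyString extends ToyType { val sql = "STRING" }

final case class ToyField(name: String, dataType: ToyType, nullable: Boolean = true) {
  private def notNull: String = if (nullable) "" else " NOT NULL"

  // Nested rendering used by ToyStruct.sql: nullability is not emitted here.
  def sql: String = s"$name: ${dataType.sql}"

  // Models the behaviour described as the problem: always go through dataType.sql.
  def toDDLOld: String = s"$name ${dataType.sql}$notNull"

  // Models the proposed behaviour: structs use a nullability-aware nested form.
  def toDDLNew: String = dataType match {
    case st: ToyStruct => s"$name ${st.nestedDDL}$notNull"
    case other         => s"$name ${other.sql}$notNull"
  }

  // Nested variant: "name: type" plus NOT NULL where applicable.
  def nestedDDL: String = dataType match {
    case st: ToyStruct => s"$name: ${st.nestedDDL}$notNull"
    case other         => s"$name: ${other.sql}$notNull"
  }
}

final case class ToyStruct(fields: List[ToyField]) extends ToyType {
  def sql: String = fields.map(_.sql).mkString("STRUCT<", ", ", ">")
  def nestedDDL: String = fields.map(_.nestedDDL).mkString("STRUCT<", ", ", ">")
  def toDDLOld: String = fields.map(_.toDDLOld).mkString(",")
  def toDDLNew: String = fields.map(_.toDDLNew).mkString(",")
}

object ToyDemo {
  // Illustrative schema with a non-nullable column inside a nested struct.
  val schema: ToyStruct = ToyStruct(List(
    ToyField("key", ToyInt, nullable = false),
    ToyField("nested", ToyStruct(List(
      ToyField("nestedKey", ToyInt, nullable = false),
      ToyField("nestedValue", ToyString)
    )), nullable = false)
  ))

  def main(args: Array[String]): Unit = {
    println(schema.toDDLOld) // nestedKey loses its NOT NULL
    println(schema.toDDLNew) // nestedKey keeps its NOT NULL
  }
}
```

In this toy model the two renderings differ only in the `NOT NULL` after `nestedKey: INT`, which is the kind of difference the PR description refers to.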
Does this PR introduce any user-facing change?
Yes, given the example above, the output will become:
[output not preserved]
How was this patch tested?
In `StructTypeSuite`, the `nestedStruct` testing value has been modified to include a non-nullable nested column. The relevant unit tests have been changed accordingly.
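For readers who want to check the round-trip concern on their own Spark build, a check along the following lines might help. It is a sketch and not the test added to `StructTypeSuite`. The object name `DdlRoundTripCheck` and the field names are made up for illustration, and it assumes Spark SQL is on the classpath. Whether the comparison prints `true` depends on the Spark version in use, so no expected output is shown.

```scala
import org.apache.spark.sql.types._

object DdlRoundTripCheck {
  // Hypothetical schema for illustration; field names are not taken from the PR.
  val schema: StructType = StructType(Seq(
    StructField("key", IntegerType, nullable = false),
    StructField("nested", StructType(Seq(
      StructField("nestedKey", IntegerType, nullable = false),
      StructField("nestedValue", StringType, nullable = true)
    )), nullable = false)
  ))

  def main(args: Array[String]): Unit = {
    // Render the schema as DDL, then parse the DDL back into a StructType.
    val ddl = schema.toDDL
    val roundTripped = StructType.fromDDL(ddl)

    println(ddl)
    // If nested nullability is lost in the DDL string, the schemas will differ.
    println(s"round trip preserves schema: ${roundTripped == schema}")
  }
}
```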