Automatic type widening in INSERT #2785
Nice! This is going to be really convenient! I have a question and a small nit, but otherwise it LGTM
spark/src/test/scala/org/apache/spark/sql/delta/DeltaTypeWideningSchemaEvolutionSuite.scala
  case (ArrayType(s: StructType, sNull: Boolean), ArrayType(t: StructType, tNull: Boolean))
      if s != t && sNull == tNull =>
-   addCastsToArrayStructs(tblName, attr, s, t, sNull)
+   addCastsToArrayStructs(tblName, attr, s, t, sNull, allowTypeWidening)
  case (s: AtomicType, t: AtomicType)
What about structs that can be evolved using type evolution? Can this case be handled without addCastsToStructs?
Yes, you could check that the struct strictly contains only widening type changes: no new fields or other type differences, as those would require a cast.
In that case we don't currently modify the struct, just deconstruct it and reconstruct it as it was, which would lend itself to being optimized away by an optimizer rule (not sure if that happens though).
This isn't going to happen often though, since it means a type in that struct will actually be widened during this operation. I'd rather keep the code simpler and less error-prone than try to optimize for this case by comparing the input and output structs for new columns or non-widening type changes.
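For reference, the check discussed above could look roughly like the sketch below. It is not code from this PR: it uses a toy schema model instead of Spark's `StructType`, the names `isWidening` and `onlyWideningChanges` are hypothetical, and the widening order byte → short → int is assumed purely for illustration.

```scala
// Toy schema model for illustration only; these are NOT Spark's DataType classes.
sealed trait DType
case object ByteT extends DType
case object ShortT extends DType
case object IntT extends DType
case object StringT extends DType
final case class Field(name: String, dtype: DType)
final case class StructT(fields: List[Field]) extends DType

object WideningCheck {
  // Assumed widening order (hypothetical): byte < short < int.
  private val rank: Map[DType, Int] = Map(ByteT -> 0, ShortT -> 1, IntT -> 2)

  /** True if `from` can be widened to the strictly wider type `to`. */
  def isWidening(from: DType, to: DType): Boolean =
    (rank.get(from), rank.get(to)) match {
      case (Some(f), Some(t)) => f < t
      case _                  => false
    }

  /**
   * True if `source` differs from `target` only by widening type changes:
   * same fields in the same order, each type either equal or wider in the source.
   * New fields or any other type difference make this false (a cast is needed).
   */
  def onlyWideningChanges(source: DType, target: DType): Boolean =
    (source, target) match {
      case (StructT(s), StructT(t)) =>
        s.length == t.length && s.zip(t).forall { case (sf, tf) =>
          sf.name == tf.name && onlyWideningChanges(sf.dtype, tf.dtype)
        }
      case (s, t) => s == t || isWidening(from = t, to = s)
    }
}
```

As the reply notes, this extra comparison only pays off in the uncommon case where a struct field is actually widened, which is the argument for not adding it.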
5efbef6 to 3d08e11
oh this is amazing.
Which Delta project/connector is this regarding?
Description
This change is part of the type widening table feature.
Type widening feature request: #2622
Type Widening protocol RFC: #2624
It adds automatic type widening as part of schema evolution in INSERT. During resolution, when schema evolution and type widening are enabled, type differences between the input query and the target table are handled as follows:
ImplicitMetadataOperation (already implemented as part of MERGE support)
How was this patch tested?
DeltaTypeWideningAutomaticSuite to cover type evolution in INSERT
This PR introduces the following user-facing changes
The table feature is available in testing only, so there are no user-facing changes as of now.
When automatic schema evolution is enabled in INSERT and the source schema contains a type that is wider than the target schema:
With type widening disabled: the type in the target schema is not changed. A cast is added to the input to insert to match the expected target type.
With type widening enabled: the type in the target schema is updated to the wider source type.
After the INSERT operation, the target schema is key int, value int.
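The two behaviors described above can be modeled with a small, self-contained sketch. It is a toy model rather than the Delta implementation: `Column`, `InsertPlan`, and `InsertResolution.resolve` are hypothetical names, the widening order byte → short → int is assumed, and the starting target schema `key int, value short` is an assumption chosen so that the result matches the schema quoted above.

```scala
// Toy model for illustration only; not Delta's or Spark's actual classes.
sealed trait DType
case object ByteT extends DType
case object ShortT extends DType
case object IntT extends DType

final case class Column(name: String, dtype: DType)

// Result of resolving an INSERT: the target schema after the write, plus the
// casts that must be applied to input columns (column name -> type to cast to).
final case class InsertPlan(targetSchema: List[Column], inputCasts: Map[String, DType])

object InsertResolution {
  // Assumed widening order (hypothetical): byte < short < int.
  private val rank: Map[DType, Int] = Map(ByteT -> 0, ShortT -> 1, IntT -> 2)

  private def isWider(source: DType, target: DType): Boolean =
    rank(source) > rank(target)

  /** Resolves an INSERT with schema evolution enabled, matching columns by name. */
  def resolve(
      target: List[Column],
      source: List[Column],
      typeWideningEnabled: Boolean): InsertPlan = {
    val sourceTypes: Map[String, DType] = source.map(c => c.name -> c.dtype).toMap
    val resolved: List[(Column, Option[(String, DType)])] = target.map { col =>
      sourceTypes.get(col.name) match {
        // Type widening enabled: adopt the wider source type, no cast needed.
        case Some(srcType) if typeWideningEnabled && isWider(srcType, col.dtype) =>
          (col.copy(dtype = srcType), None)
        // Otherwise keep the target type and cast the input to it.
        case Some(srcType) if srcType != col.dtype =>
          (col, Some(col.name -> col.dtype))
        case _ =>
          (col, None)
      }
    }
    InsertPlan(resolved.map(_._1), resolved.flatMap(_._2).toMap)
  }
}
```

Calling `resolve` with a target of `key int, value short` and a source of `key int, value int` yields the target schema `key int, value int` when `typeWideningEnabled` is true. With it set to false, the target schema is unchanged and a cast of `value` to the target's short type is recorded instead.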