Automatic type widening in INSERT #2785

Merged 2 commits on Mar 25, 2024

Collaborator

@johanl-db johanl-db commented Mar 22, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This change is part of the type widening table feature.
Type widening feature request: #2622
Type Widening protocol RFC: #2624

It adds automatic type widening as part of schema evolution in INSERT. During resolution, when schema evolution and type widening are enabled, type differences between the input query and the target table are handled as follows:

  • If the type difference qualifies for automatic type evolution, the input type is left as is: the data is inserted with the new type and the table schema is updated in ImplicitMetadataOperation (already implemented as part of MERGE support).
  • If the type difference doesn't qualify for automatic type evolution, the current behavior is preserved: a cast is added from the input type to the existing target type.
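
The two cases above can be sketched for a single atomic column as follows. This is a minimal, self-contained model, not the code in this PR: the type objects are toy stand-ins for Spark's classes, `resolve`, `KeepInputType` and `CastTo` are hypothetical names, and the widening order (byte -> short -> int) is an assumption made for illustration.

```scala
object InsertResolutionSketch {
  // Toy stand-ins for Spark's atomic types; these are NOT the real Spark classes.
  sealed trait AtomicType
  case object ByteType extends AtomicType
  case object ShortType extends AtomicType
  case object IntegerType extends AtomicType
  case object StringType extends AtomicType

  // Hypothetical outcome of resolving one input column against the target column.
  sealed trait Resolved
  case class KeepInputType(tpe: AtomicType) extends Resolved
  case class CastTo(tpe: AtomicType) extends Resolved

  // Assumed widening order for illustration only: byte -> short -> int.
  private def rank(t: AtomicType): Option[Int] = t match {
    case ByteType    => Some(0)
    case ShortType   => Some(1)
    case IntegerType => Some(2)
    case _           => None
  }

  // True when `to` is strictly wider than `from` under the assumed order.
  def isWidening(from: AtomicType, to: AtomicType): Boolean =
    (rank(from), rank(to)) match {
      case (Some(f), Some(t)) => f < t
      case _                  => false
    }

  // `allowTypeWidening` stands for "schema evolution and type widening are both enabled".
  def resolve(input: AtomicType, target: AtomicType, allowTypeWidening: Boolean): Resolved =
    if (input == target) KeepInputType(input)
    else if (allowTypeWidening && isWidening(from = target, to = input)) KeepInputType(input)
    else CastTo(target) // existing behavior: cast the input to the target type
}
```

In this model, an int input against a short target keeps its type when widening is allowed and is cast to short otherwise. Any difference that is not a widening, such as a string input against an int target, always gets a cast.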

How was this patch tested?

  • Tests are added to DeltaTypeWideningAutomaticSuite to cover type evolution in INSERT

This PR introduces the following user-facing changes

The table feature is available in testing only; there are no user-facing changes as of now.

When automatic schema evolution is enabled in INSERT and the source schema contains a type that is wider than the target schema:

With type widening disabled: the type in the target schema is not changed. A cast is added to the input so that the inserted data matches the expected target type.

With type widening enabled: the type in the target schema is updated to the wider source type.

-- target: key int, value short
-- source: key int, value int
INSERT INTO target SELECT * FROM source

After the INSERT operation, the target schema is key int, value int.
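
The effect on the target schema can be modeled with a small sketch. It is not Delta's implementation: columns are plain (name, type name) pairs, `schemaAfterInsert` is a hypothetical helper, columns are matched by position with equal column counts assumed, and the widening order (byte -> short -> int) is an assumption for illustration. New columns added by schema evolution are out of scope here.

```scala
object SchemaAfterInsertSketch {
  // A column is modeled as (name, type name); a toy stand-in for a real table schema.
  type Schema = Seq[(String, String)]

  // Assumed widening order for illustration only.
  private val order = Seq("byte", "short", "int")

  // True when `source` is strictly wider than `target` under the assumed order.
  private def isWider(source: String, target: String): Boolean = {
    val s = order.indexOf(source)
    val t = order.indexOf(target)
    s >= 0 && t >= 0 && s > t
  }

  // Target schema after an INSERT with schema evolution enabled.
  // Columns are matched by position; both schemas are assumed to have the same length.
  def schemaAfterInsert(target: Schema, source: Schema, typeWideningEnabled: Boolean): Schema =
    target.zip(source).map { case ((name, targetType), (_, sourceType)) =>
      if (typeWideningEnabled && isWider(sourceType, targetType)) name -> sourceType // type is widened
      else name -> targetType // type unchanged; the input is cast to it instead
    }
}
```

With the target and source from the example above, the model returns key int, value int when type widening is enabled, and leaves the target at key int, value short when it is disabled.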

Collaborator

@tomvanbussel tomvanbussel left a comment

Nice! This is going to be really convenient! I have a question and a small nit, but otherwise it LGTM

  case (ArrayType(s: StructType, sNull: Boolean), ArrayType(t: StructType, tNull: Boolean))
      if s != t && sNull == tNull =>
-   addCastsToArrayStructs(tblName, attr, s, t, sNull)
+   addCastsToArrayStructs(tblName, attr, s, t, sNull, allowTypeWidening)
  case (s: AtomicType, t: AtomicType)

Collaborator

What about structs that can be evolved using type evolution? Can this case be handled without addCastsToStructs?

Collaborator Author

Yes, you could check that the struct contains strictly widening type changes and nothing else: no new fields or other type differences, as those would require a cast.
In that case we don't currently modify the struct, we just deconstruct it and reconstruct it as it was, which would lend itself to being optimized away by an optimizer rule (not sure if that happens though).

This isn't going to happen often though: it means a type in that struct will actually be widened during this operation. I'd rather keep the code simpler and less error-prone than try to optimize for this case by comparing the input and output structs for new columns or non-widening type changes.
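
For reference, the check discussed here (and deliberately left out of this PR) could look roughly like the sketch below. Everything in it is hypothetical: the types are toy stand-ins for Spark's `StructType` and friends, `onlyWideningChanges` is an invented name, and the single widening rule (short -> int) is an assumption for illustration.

```scala
object StructWideningCheckSketch {
  // Toy stand-ins for Spark's data types; these are NOT the real Spark classes.
  sealed trait DataType
  case object ShortType extends DataType
  case object IntegerType extends DataType
  case object StringType extends DataType
  case class StructField(name: String, dataType: DataType)
  case class StructType(fields: Seq[StructField]) extends DataType

  // Assumed widening rule for illustration only: short -> int.
  def isWidening(from: DataType, to: DataType): Boolean = (from, to) match {
    case (ShortType, IntegerType) => true
    case _                        => false
  }

  // True when `source` is identical to `target` or differs only by widening type changes:
  // same field names in the same order, no new fields, no other type differences.
  def onlyWideningChanges(source: StructType, target: StructType): Boolean =
    source.fields.length == target.fields.length &&
      source.fields.zip(target.fields).forall { case (s, t) =>
        s.name == t.name && ((s.dataType, t.dataType) match {
          case (ss: StructType, ts: StructType) => onlyWideningChanges(ss, ts) // recurse into nested structs
          case (sd, td)                         => sd == td || isWidening(from = td, to = sd)
        })
      }
}
```

A struct that passes this check would need no cast at all, which is the optimization being weighed against the extra code and the risk of getting the comparison wrong.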

@johanl-db johanl-db force-pushed the automatic-type-widening-in-insert branch from 5efbef6 to 3d08e11 Compare March 22, 2024 16:55
Contributor

@tdas tdas left a comment

oh this is amazing.

@tdas tdas merged commit a172276 into delta-io:master Mar 25, 2024
7 checks passed