feat(schema): Add VARIANT support to HoodieSchema #17751

voonhous · 2025-12-30T10:39:57Z

Describe the issue this Pull Request addresses

Introduce the new VARIANT schema type. It looks similar to a record/struct with a logical type associated with it. The implementation is in line with the Parquet spec.

Linked task: #17745

Note: No reader and writer support has been added yet. It will be added in a separate PR.

Summary and Changelog

Add VARIANT support to HoodieSchema
Add VARIANT type to HoodieSchemaType enum.

Impact

Support for Variant types in accordance with the Parquet spec.

Risk Level

Low

Documentation Update

None

Contributor's checklist

Read through contributor's guide
Enough context is provided in the sections above
Adequate tests were added if applicable

the-other-tim-brown · 2025-12-30T15:23:21Z

hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchemaType.java

        return DATE;
      } else if (logicalType == LogicalTypes.uuid()) {
        return UUID;
+      } else if (logicalType instanceof VariantLogicalType) {


Can we follow a similar pattern as the uuid where there is a singleton we can compare to instead of instanceof?

Makes sense, this limits the number of objects created for types. Will eagerly initialize a singleton.
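A minimal sketch of the pattern discussed in this thread: an eagerly initialized singleton that can be compared by reference instead of with instanceof. The classes below (LogicalType, VariantLogicalType, SchemaType, typeOf) are hypothetical stand-ins written for illustration; they are not the Avro or Hudi classes touched by this PR.

```java
import java.util.Objects;

final class VariantSingletonSketch {

  /** Hypothetical stand-in for a logical-type base class (not Avro's or Hudi's real class). */
  static class LogicalType {
    private final String name;

    LogicalType(String name) {
      this.name = Objects.requireNonNull(name);
    }

    String getName() {
      return name;
    }
  }

  /** Hypothetical variant logical type, reachable only through one eagerly created instance. */
  static final class VariantLogicalType extends LogicalType {
    // Created once when the class is initialized, so no further instances are allocated.
    private static final VariantLogicalType INSTANCE = new VariantLogicalType();

    private VariantLogicalType() {
      super("variant");
    }

    static VariantLogicalType variant() {
      return INSTANCE;
    }
  }

  /** Hypothetical, reduced version of a schema type enum. */
  enum SchemaType { VARIANT, OTHER }

  /** Maps a logical type to a schema type using a reference comparison against the singleton. */
  static SchemaType typeOf(LogicalType logicalType) {
    if (logicalType == VariantLogicalType.variant()) {
      return SchemaType.VARIANT;
    }
    return SchemaType.OTHER;
  }

  public static void main(String[] args) {
    System.out.println(typeOf(VariantLogicalType.variant())); // prints VARIANT
    System.out.println(typeOf(new LogicalType("date")));      // prints OTHER
  }
}
```

Because the constructor is private, the reference comparison in typeOf cannot miss a second instance of the variant type created elsewhere, which is what makes == a safe replacement for instanceof in this sketch.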

the-other-tim-brown · 2025-12-30T15:24:25Z

hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java

  }

+  /**
+   * Creates an unshredded Variant schema.


General question: can you have shredded and unshredded values in the same dataset? If so, it seems like the schema should be the same for both.

I don't quite understand this question. The implementation follows the Parquet schema spec.

Nonetheless, I would still like to understand what you're pushing towards.
Do you mean whether we can have an unshredded_typed_column and a shredded_typed_column in the same dataset?

Or are you saying that since shredded_variant typed columns can hold unshredded data, we should just maintain the shredded type?

Unshredded

optional group variant_unshredded (VARIANT) {
  required binary metadata;
  required binary value;
}

Shredded

optional group variant_shredded (VARIANT) {
  required binary metadata;
  optional binary value;
  optional int64 typed_value;
}

So, to use the shredded schema to represent unshredded data, we can just leave typed_value null and populate value?

I'm wondering if the column can have both shredded and unshredded in the same file.
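To make the question in this thread concrete, here is a minimal sketch, assuming a hypothetical in-memory row type that mirrors the shredded group above (required metadata, optional value, optional int64 typed_value). VariantRow, Populated and classify are illustrative names only, not Hudi or Parquet APIs, and the sketch does not decide how the BOTH or NEITHER cases should be interpreted; that is left to the Parquet spec.

```java
final class VariantShreddingSketch {

  /** Which of the two optional fields of the shredded group a row populates. */
  enum Populated { NEITHER, VALUE_ONLY, TYPED_VALUE_ONLY, BOTH }

  /** Hypothetical in-memory view of one row of the shredded group. */
  static final class VariantRow {
    final byte[] metadata;  // required binary metadata
    final byte[] value;     // optional binary value
    final Long typedValue;  // optional int64 typed_value

    VariantRow(byte[] metadata, byte[] value, Long typedValue) {
      if (metadata == null) {
        throw new IllegalArgumentException("metadata is required");
      }
      this.metadata = metadata;
      this.value = value;
      this.typedValue = typedValue;
    }
  }

  /** Reports which optional fields are set; it does not validate the row against any spec. */
  static Populated classify(VariantRow row) {
    boolean hasValue = row.value != null;
    boolean hasTyped = row.typedValue != null;
    if (hasValue && hasTyped) {
      return Populated.BOTH;
    }
    if (hasValue) {
      return Populated.VALUE_ONLY;
    }
    if (hasTyped) {
      return Populated.TYPED_VALUE_ONLY;
    }
    return Populated.NEITHER;
  }

  public static void main(String[] args) {
    byte[] metadata = new byte[] {1}; // placeholder bytes, not a real variant encoding

    // Same declared schema, two rows: one carries only value, the other only typed_value.
    VariantRow valueOnly = new VariantRow(metadata, new byte[] {42}, null);
    VariantRow typedOnly = new VariantRow(metadata, null, 7L);

    System.out.println(classify(valueOnly)); // prints VALUE_ONLY
    System.out.println(classify(typedOnly)); // prints TYPED_VALUE_ONLY
  }
}
```

Under these assumptions, both rows fit the single shredded schema, which is the situation the question about mixing shredded and unshredded values in one column is asking about.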

hudi-bot · 2025-12-30T15:56:50Z

CI report:

0fff72f Azure: FAILURE

Bot commands

@hudi-bot supports the following commands:

@hudi-bot run azure: re-run the last Azure build

cshuo · 2025-12-31T07:36:42Z

hudi-common/src/main/java/org/apache/hudi/common/schema/HoodieSchema.java

+   * @param typedValueSchema the schema for the typed_value field (can be null if typed_value is not needed)
+   * @return a new HoodieSchema.Variant representing a shredded variant
+   */
+  public static HoodieSchema.Variant createVariantShredded(HoodieSchema typedValueSchema) {


Do we need to include shredding information in the type/schema layer? IIUC, it's more of a read/write optimization mechanism, which can be inferred or fetched from configuration during reading or writing.

feat(schema): Add variant support to HoodieSchema

93ad5ca

voonhous requested review from bvaradar, rahil-c and the-other-tim-brown and removed request for the-other-tim-brown December 30, 2025 10:40

github-actions bot added the size:L PR with lines of changes in (300, 1000] label Dec 30, 2025

voonhous changed the title ~~feat(schema): Add variant support to HoodieSchema~~ feat(schema): Add VARIANT support to HoodieSchema Dec 30, 2025

the-other-tim-brown reviewed Dec 30, 2025

Make VariantLogicalType singleton

0fff72f

cshuo reviewed Dec 31, 2025
