description |
---|
Configure Delta tables to be read as Iceberg/Hudi tables using UniForm. |
Delta Universal Format (UniForm) allows you to read Delta tables with Iceberg and Hudi clients.
UniForm takes advantage of the fact that Delta Lake, Iceberg, and Hudi all consist of Parquet data files and a metadata layer. UniForm automatically generates Iceberg and Hudi metadata asynchronously, allowing Iceberg and Hudi clients to read Delta tables as if they were Iceberg or Hudi tables. You can expect negligible Delta write overhead when UniForm is enabled, as the metadata conversion and transaction occur asynchronously after the Delta commit.
A single copy of the data files provides access to clients of all formats.
To enable UniForm, you must fulfill the following requirements:
- The table must have column mapping enabled. See _.
- The Delta table must have a minReaderVersion >= 2 and minWriterVersion >= 7.
- Writes to the table must use Delta Lake 3.1 or above.
- Hive Metastore (HMS) must be configured as the catalog. See the HMS documentation for how to configure Apache Spark to use Hive Metastore.
- To enable UniForm Hudi, writes to the table must use Delta Lake 3.2 or above.
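The protocol and column mapping requirements above can be expressed as a simple pre-flight check. The following is a minimal sketch, not part of Delta Lake: the `TableProtocol` class and the `unmet_uniform_requirements` function are hypothetical names, and you would need to fill in the values from your own table metadata.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableProtocol:
    """Hypothetical container for the settings named in the requirements."""
    min_reader_version: int
    min_writer_version: int
    column_mapping_enabled: bool


def unmet_uniform_requirements(table: TableProtocol) -> List[str]:
    """Return the requirements the table does not meet (empty list if none)."""
    problems = []
    if not table.column_mapping_enabled:
        problems.append("column mapping is not enabled")
    if table.min_reader_version < 2:
        problems.append("minReaderVersion must be >= 2")
    if table.min_writer_version < 7:
        problems.append("minWriterVersion must be >= 7")
    return problems


# A table still on writer version 5 fails exactly one check.
print(unmet_uniform_requirements(TableProtocol(2, 5, True)))
```

The check covers only the table-level requirements; the Delta Lake version used for writes and the catalog configuration have to be verified separately.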
.. important::
Enabling Delta UniForm Iceberg requires the Delta table feature IcebergCompatV2, a write protocol feature. Only clients that support this table feature can write to enabled tables. You must use Delta Lake 3.1 or above to write to Delta tables with this feature enabled.
Enabling Delta UniForm Iceberg requires "delta-iceberg" to be provided to Spark shell: --packages io.delta:delta-iceberg_2.12:<version>
Enabling Delta UniForm Hudi requires "delta-hudi" to be provided to Spark shell: --packages io.delta:delta-hudi_2.12:<version>
The following table properties enable UniForm support for Iceberg.
'delta.enableIcebergCompatV2' = 'true'
'delta.universalFormat.enabledFormats' = 'iceberg'
The following table properties enable UniForm support for Hudi.
'delta.universalFormat.enabledFormats' = 'hudi'
The following table properties enable UniForm support for both.
'delta.enableIcebergCompatV2' = 'true'
'delta.universalFormat.enabledFormats' = 'iceberg,hudi'
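The three property combinations above follow one rule: the Iceberg compatibility flag is set only when Iceberg is among the enabled formats. As an illustrative sketch, the hypothetical helpers below (not a Delta Lake API) build the property set and render it as a TBLPROPERTIES clause.

```python
def uniform_table_properties(formats):
    """Build UniForm table properties for "iceberg", "hudi", or both."""
    requested = set(formats)
    unknown = requested - {"iceberg", "hudi"}
    if unknown or not requested:
        raise ValueError(f"expected 'iceberg' and/or 'hudi', got {sorted(requested)}")
    props = {}
    # Only Iceberg needs the additional compatibility table feature.
    if "iceberg" in requested:
        props["delta.enableIcebergCompatV2"] = "true"
    # Keep a stable order so the output matches the documented values.
    ordered = [name for name in ("iceberg", "hudi") if name in requested]
    props["delta.universalFormat.enabledFormats"] = ",".join(ordered)
    return props


def tblproperties_clause(props):
    """Render a dict of properties as a SQL TBLPROPERTIES clause."""
    pairs = ", ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"TBLPROPERTIES({pairs})"


print(tblproperties_clause(uniform_table_properties(["iceberg", "hudi"])))
```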
You must also enable column mapping to use UniForm. It is set automatically during table creation, as in the following example:
CREATE TABLE T(c1 INT) USING DELTA TBLPROPERTIES(
'delta.enableIcebergCompatV2' = 'true',
'delta.universalFormat.enabledFormats' = 'iceberg');
In Delta 3.3 and above, you can enable or upgrade UniForm Iceberg on an existing table using the following syntax:
ALTER TABLE table_name SET TBLPROPERTIES(
'delta.enableIcebergCompatV2' = 'true',
'delta.universalFormat.enabledFormats' = 'iceberg');
You can also use REORG to enable UniForm Iceberg and rewrite underlying data files, as in the following example:
REORG TABLE table_name APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2));
Use REORG if any of the following are true:
- Your table has deletion vectors enabled.
- You previously enabled the IcebergCompatV1 version of UniForm Iceberg.
- You need to read from Iceberg engines that don't support Hive-style Parquet files, such as Athena or Redshift.
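The choice between ALTER TABLE and REORG can be summarized as a small decision function. This is a sketch under stated assumptions: the function name and its boolean parameters are hypothetical, and the statements it returns are the ones shown above. The ALTER TABLE form assumes Delta 3.3 or above.

```python
def enable_iceberg_statement(
    table_name,
    deletion_vectors_enabled=False,
    had_iceberg_compat_v1=False,
    needs_non_hive_style_files=False,
):
    """Pick the SQL statement for enabling UniForm Iceberg on an existing table."""
    # Any one of the three documented conditions calls for REORG,
    # which also rewrites the underlying data files.
    if deletion_vectors_enabled or had_iceberg_compat_v1 or needs_non_hive_style_files:
        return (
            f"REORG TABLE {table_name} "
            "APPLY (UPGRADE UNIFORM(ICEBERG_COMPAT_VERSION=2))"
        )
    return (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES("
        "'delta.enableIcebergCompatV2' = 'true', "
        "'delta.universalFormat.enabledFormats' = 'iceberg')"
    )


print(enable_iceberg_statement("sales", deletion_vectors_enabled=True))
```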
You can enable UniForm Hudi on an existing table using the following syntax:
ALTER TABLE table_name SET TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'hudi');
.. note:: This syntax requires _ to be enabled on the table prior to running on Delta 3.1. This syntax also works to upgrade from IcebergCompatV1. It may rewrite existing files to make them Iceberg compatible, and it automatically disables and purges deletion vectors from the table.
.. important:: When you first enable UniForm, asynchronous metadata generation begins. This task must complete before external clients can query the table using Iceberg or Hudi. See _.
.. warning:: You can turn off UniForm by unsetting the delta.universalFormat.enabledFormats table property. You cannot turn off column mapping once enabled, and upgrades to reader and writer protocol versions cannot be undone.
See _.
Delta Lake triggers Iceberg/Hudi metadata generation asynchronously after a write transaction completes, using the same compute that completed the Delta transaction.
Iceberg/Hudi can have significantly higher write latencies than Delta Lake. Delta tables with frequent commits might bundle multiple Delta commits into a single Iceberg/Hudi commit.
Delta Lake ensures that only one metadata generation process per format is in progress at any time in a single cluster. Commits that would trigger a second concurrent metadata generation process successfully commit to Delta, but do not trigger asynchronous metadata generation. This prevents cascading latency for metadata generation for workloads with frequent commits (seconds to minutes between commits).
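To see why frequent commits end up bundled, consider a toy model of the behavior described above. This is an illustration only, not Delta Lake's implementation: it assumes a fixed conversion duration, a single format, and that each conversion covers every Delta version committed up to the commit that triggered it. The function name and inputs are hypothetical.

```python
def simulate_conversions(commit_times, conversion_seconds):
    """Group Delta versions by the metadata generation run that covers them.

    commit_times: ascending commit times in seconds; index = Delta version.
    conversion_seconds: assumed fixed duration of one metadata generation run.
    """
    conversions = []
    busy_until = float("-inf")
    last_converted = -1
    for version, commit_time in enumerate(commit_times):
        if commit_time < busy_until:
            # A run is already in progress: the commit succeeds in Delta
            # but does not trigger another metadata generation.
            continue
        # This commit triggers a run that also picks up skipped versions.
        conversions.append(list(range(last_converted + 1, version + 1)))
        last_converted = version
        busy_until = commit_time + conversion_seconds
    return conversions


# Six commits, with each conversion assumed to take 10 seconds.
print(simulate_conversions([0, 2, 4, 12, 13, 30], 10))
```

In this model, versions 1 and 2 arrive while the first run is still in progress, so they are only reflected once version 3 triggers the next run.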
UniForm adds the following properties to Iceberg/Hudi table metadata to track metadata generation status:
Table property | Description |
---|---|
converted_delta_version | The latest version of the Delta table for which metadata was successfully generated. |
converted_delta_timestamp | The timestamp of the latest Delta commit for which metadata was successfully generated. |
See documentation for your Iceberg/Hudi reader client for how to review table properties outside Delta Lake. For Apache Spark, you can see these properties using the following syntax:
SHOW TBLPROPERTIES <table-name>;
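Once you have the table properties, comparing converted_delta_version with the latest Delta version tells you how far metadata generation is behind. The sketch below assumes you have already collected the properties into a dictionary and obtained the latest Delta version yourself; the `conversion_lag` function is a hypothetical helper.

```python
def conversion_lag(latest_delta_version, table_properties):
    """Return how many Delta versions the generated metadata trails by.

    Returns None when converted_delta_version is absent, which is taken
    here to mean that metadata has not been generated yet.
    """
    raw = table_properties.get("converted_delta_version")
    if raw is None:
        return None
    # Table property values are strings, so convert before subtracting.
    return latest_delta_version - int(raw)


props = {"converted_delta_version": "10"}
print(conversion_lag(12, props))
```

A lag of zero means external clients see the same data as Delta clients; a positive lag means some recent commits are not yet visible to them.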
You can read UniForm tables as Iceberg tables in Apache Spark with the following steps:
- Start Apache Spark with Iceberg, and connect to the Hive Metastore used by UniForm. Refer to the Iceberg documentation for how to run Iceberg with Apache Spark and connect to a Hive Metastore.
- Use the SHOW TABLES command to see a list of available Iceberg tables in the catalog.
- Read an Iceberg table using standard SQL such as SELECT.
Some Iceberg clients allow you to register external Iceberg tables by providing a path to versioned metadata files. Each time UniForm converts a new version of the Delta table to Iceberg, it creates a new metadata JSON file.
Clients that use metadata JSON paths for configuring Iceberg include BigQuery. Refer to documentation for the Iceberg reader client for configuration details.
Delta Lake stores Iceberg metadata under the table directory, using the following pattern:
<table-path>/metadata/v<version-number>-uuid.metadata.json
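When a client needs the path of a specific metadata JSON file, you typically want the one with the highest version number. The following sketch picks it from a list of file names using the pattern above. It assumes that the highest version number identifies the newest metadata file and that you have already listed the metadata directory; the function name is hypothetical.

```python
import re

# Matches names such as v12-<uuid>.metadata.json and captures the version.
_METADATA_FILE = re.compile(r"^v(\d+)-.+\.metadata\.json$")


def latest_metadata_file(file_names):
    """Return the metadata JSON file name with the highest version, or None."""
    best_name = None
    best_version = -1
    for name in file_names:
        match = _METADATA_FILE.match(name)
        if match is None:
            continue  # Skip files that do not follow the pattern.
        version = int(match.group(1))
        # Compare numerically so that v10 sorts after v2.
        if version > best_version:
            best_version = version
            best_name = name
    return best_name


names = ["v1-aaa.metadata.json", "v10-bbb.metadata.json", "v2-ccc.metadata.json"]
print(latest_metadata_file(names))
```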
You can read UniForm tables as Hudi tables in Apache Spark with the following steps:
- See the Hudi documentation for how to run Hudi on Apache Spark.
- Read the UniForm table directory as a Hudi table:
spark.read.format("hudi").option("hoodie.metadata.enable", "true").load("PATH_TO_UNIFORM_TABLE_DIRECTORY")
Delta Lake, Iceberg, and Hudi all allow time travel queries using table versions or timestamps stored in table metadata.
Delta and Iceberg table versions do not align by either commit timestamp or version ID. Delta and Hudi commit timestamps do align, but their version IDs do not. To verify which version of a Delta table a given version of an Iceberg/Hudi table corresponds to, use the corresponding table properties set on the Iceberg/Hudi table. See _.
.. warning:: UniForm is read-only from an Iceberg and Hudi perspective. This cannot be enforced, however, because UniForm uses HMS as the Iceberg catalog and stores Hudi metadata on the file system. If any external writer (not Delta Lake) writes to this Iceberg/Hudi table, it may destroy your Delta table and cause data loss, because the Iceberg/Hudi writer may perform data cleanup or garbage collection that Delta is unaware of.
The following limitations exist:
- UniForm does not work on tables with deletion vectors enabled. See _.
- Delta tables with UniForm enabled do not support the VOID type.
- Iceberg/Hudi clients can only read from UniForm. Writes are not supported.
- Iceberg/Hudi reader clients might have individual limitations, regardless of UniForm. See documentation for your target client.
The following features work for Delta clients when UniForm is enabled, but do not have support in Iceberg:
- Change Data Feed
- Delta Sharing