@@ -61,7 +61,7 @@ public static String description(Object sparkConfigObject) {
".options(clientOpts) // any of the Hudi client opts can be passed in as well\n" +
".option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY(), \"_row_key\")\n" +
".option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY(), \"partition\")\n" +
- ".option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY(), \"timestamp\")\n" +
+ ".option(HoodieTableConfig.ORDERING_FIELDS(), \"timestamp\")\n" +
".option(HoodieWriteConfig.TABLE_NAME, tableName)\n" +
".mode(SaveMode.Append)\n" +
".save(basePath);\n" +
136 changes: 66 additions & 70 deletions website/docs/basic_configurations.md

Large diffs are not rendered by default.

137 changes: 67 additions & 70 deletions website/docs/configurations.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion website/docs/quick-start-guide.md
@@ -1240,7 +1240,7 @@ CREATE TABLE hudi_table (
driver STRING,
fare DOUBLE,
city STRING
- ) USING HUDI TBLPROPERTIES (preCombineField = 'ts')
+ ) USING HUDI TBLPROPERTIES (orderingFields = 'ts')
PARTITIONED BY (city);
```
</TabItem>
10 changes: 5 additions & 5 deletions website/docs/record_merger.md
@@ -128,11 +128,11 @@ For more details on the implementation, see [RFC 101](https://github.com/apache/

The record merge mode and optional record merge strategy ID and custom merge implementation classes can be specified using the below configs.

| Config Name | Default | Description |
|---|---|---|
- | hoodie.write.record.merge.mode | EVENT_TIME_ORDERING (when ordering field is set)<br />COMMIT_TIME_ORDERING (when ordering field is not set) | Determines the logic of merging different records with the same record key. Valid values: (1) `COMMIT_TIME_ORDERING`: use commit time to merge records, i.e., the record from later commit overwrites the earlier record with the same key. (2) `EVENT_TIME_ORDERING`: use event time as the ordering to merge records, i.e., the record with the larger event time overwrites the record with the smaller event time on the same key, regardless of commit time. The event time or preCombine field needs to be specified by the user. This is the default when an ordering field is configured. (3) `CUSTOM`: use custom merging logic specified by the user.<br />`Config Param: RECORD_MERGE_MODE`<br />`Since Version: 1.0.0` |
+ | hoodie.write.record.merge.mode | EVENT_TIME_ORDERING (when ordering field is set)<br />COMMIT_TIME_ORDERING (when ordering field is not set) | Determines the logic of merging different records with the same record key. Valid values: (1) `COMMIT_TIME_ORDERING`: use commit time to merge records, i.e., the record from later commit overwrites the earlier record with the same key. (2) `EVENT_TIME_ORDERING`: use event time as the ordering to merge records, i.e., the record with the larger event time overwrites the record with the smaller event time on the same key, regardless of commit time. The event time or ordering fields need to be specified by the user. This is the default when an ordering field is configured. (3) `CUSTOM`: use custom merging logic specified by the user.<br />`Config Param: RECORD_MERGE_MODE`<br />`Since Version: 1.0.0` |
| hoodie.write.record.merge.strategy.id | N/A (Optional) | ID of record merge strategy. Hudi will pick `HoodieRecordMerger` implementations from `hoodie.write.record.merge.custom.implementation.classes` that have the same merge strategy ID. When using custom merge logic, you need to specify both this config and `hoodie.write.record.merge.custom.implementation.classes`.<br />`Config Param: RECORD_MERGE_STRATEGY_ID`<br />`Since Version: 0.13.0`<br />`Alternative: hoodie.datasource.write.record.merger.strategy` (deprecated) |
| hoodie.write.record.merge.custom.implementation.classes | N/A (Optional) | List of `HoodieRecordMerger` implementations constituting Hudi's merging strategy based on the engine used. Hudi selects the first implementation from this list that matches the following criteria: (1) has the same merge strategy ID as specified in `hoodie.write.record.merge.strategy.id` (if provided), (2) is compatible with the execution engine (e.g., SPARK merger for Spark, FLINK merger for Flink, AVRO for Java/Hive). The order in the list matters - place your preferred implementation first. Engine-specific implementations (SPARK, FLINK) are more efficient as they avoid Avro serialization/deserialization overhead.<br />`Config Param: RECORD_MERGE_IMPL_CLASSES`<br />`Since Version: 0.13.0`<br />`Alternative: hoodie.datasource.write.record.merger.impls` (deprecated) |
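The difference between the two built-in merge modes can be sketched as follows. This is an illustrative Python model of the semantics described in the table above, not Hudi's actual implementation; the field names (`ts`, `commit_time`, `fare`) are assumptions for the example.

```python
# Sketch (not Hudi code): how the two built-in merge modes pick a winner
# between an existing record and an incoming record with the same key.

def merge(existing, incoming, mode, ordering_field="ts"):
    """Return the surviving record under the given merge mode."""
    if mode == "COMMIT_TIME_ORDERING":
        # The record from the later commit always wins, regardless of event time.
        return incoming if incoming["commit_time"] >= existing["commit_time"] else existing
    if mode == "EVENT_TIME_ORDERING":
        # The record with the larger ordering-field value wins, regardless of commit time.
        return incoming if incoming[ordering_field] >= existing[ordering_field] else existing
    raise ValueError("CUSTOM mode requires a user-provided HoodieRecordMerger")

existing = {"key": "r1", "ts": 1000, "commit_time": 2, "fare": 25.0}
# A late-arriving record: newer commit, but older event time.
late = {"key": "r1", "ts": 900, "commit_time": 3, "fare": 19.0}

print(merge(existing, late, "COMMIT_TIME_ORDERING")["fare"])  # 19.0 — newest commit wins
print(merge(existing, late, "EVENT_TIME_ORDERING")["fare"])   # 25.0 — larger ts wins
```

The late-arriving record illustrates why `EVENT_TIME_ORDERING` is the default when ordering fields are configured: it keeps the logically latest version even when writes arrive out of order.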

## Record Payloads (deprecated)

22 changes: 11 additions & 11 deletions website/docs/sql_ddl.md
@@ -77,7 +77,7 @@ should be specified as `PARTITIONED BY (dt, hh)`.

As discussed [here](quick-start-guide.md#keys), tables track each record in the table using a record key. Hudi auto-generated a highly compressed
key for each new record in the examples so far. If you want to use an existing field as the key, you can set the `primaryKey` option.
- Typically, this is also accompanied by configuring ordering fields (via `preCombineField` option) to deal with out-of-order data and potential
+ Typically, this is also accompanied by configuring ordering fields (via `orderingFields` option) to deal with out-of-order data and potential
duplicate records with the same key in the incoming writes.

:::note
@@ -86,7 +86,7 @@ this materializes a composite key of the two fields, which can be useful for exp
:::

Here is an example of creating a table using both options. Typically, a field that denotes the time of the event or
- fact, e.g., order creation time, event generation time etc., is used as the ordering field (via `preCombineField`). Hudi resolves multiple versions
+ fact, e.g., order creation time, event generation time etc., is used as the ordering field (via `orderingFields`). Hudi resolves multiple versions
of the same record by ordering based on this field when queries are run on the table.

```sql
@@ -99,7 +99,7 @@ CREATE TABLE IF NOT EXISTS hudi_table_keyed (
TBLPROPERTIES (
type = 'cow',
primaryKey = 'id',
- preCombineField = 'ts'
+ orderingFields = 'ts'
);
```

@@ -118,13 +118,13 @@ CREATE TABLE IF NOT EXISTS hudi_table_merge_mode (
TBLPROPERTIES (
type = 'mor',
primaryKey = 'id',
- precombineField = 'ts',
+ orderingFields = 'ts',
recordMergeMode = 'EVENT_TIME_ORDERING'
)
LOCATION 'file:///tmp/hudi_table_merge_mode/';
```

- With `EVENT_TIME_ORDERING`, the record with the larger event time (specified via `precombineField` ordering field) overwrites the record with the
+ With `EVENT_TIME_ORDERING`, the record with the larger event time (specified via `orderingFields`) overwrites the record with the
smaller event time on the same key, regardless of transaction's commit time. Users can set `CUSTOM` mode to provide their own
merge logic. With `CUSTOM` merge mode, you can provide a custom class that implements the merge logic. The interface
to implement is explained in detail [here](record_merger.md#custom).
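A common use of `CUSTOM` merge mode is partial updates. The real interface to implement is Hudi's `HoodieRecordMerger` (in Java); the sketch below only models the idea in Python, with illustrative field names, to show what a custom policy might decide:

```python
# Illustrative custom merge policy (not the Hudi API): take the incoming
# record, but keep the existing value for any field the incoming record
# left as None — i.e., a "partial update" merge.

def partial_update_merge(existing, incoming):
    merged = dict(existing)
    for field, value in incoming.items():
        if value is not None:
            merged[field] = value
    return merged

current  = {"id": 1, "name": "a1", "price": 20.0, "ts": 900}
incoming = {"id": 1, "name": None, "price": 25.0, "ts": 1000}

# name is carried over from the existing record; price and ts are updated.
print(partial_update_merge(current, incoming))
```

In Hudi, such logic would live in a `HoodieRecordMerger` implementation registered via the merge strategy ID and implementation-classes configs described above.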
@@ -139,7 +139,7 @@ CREATE TABLE IF NOT EXISTS hudi_table_merge_mode_custom (
TBLPROPERTIES (
type = 'mor',
primaryKey = 'id',
- precombineField = 'ts',
+ orderingFields = 'ts',
recordMergeMode = 'CUSTOM',
'hoodie.record.merge.strategy.id' = '<unique-uuid>'
)
@@ -177,7 +177,7 @@ CREATE TABLE hudi_table_ctas
USING hudi
TBLPROPERTIES (
type = 'cow',
- preCombineField = 'ts'
+ orderingFields = 'ts'
)
PARTITIONED BY (dt)
AS SELECT * FROM parquet_table;
@@ -196,7 +196,7 @@ CREATE TABLE hudi_table_ctas
USING hudi
TBLPROPERTIES (
type = 'cow',
- preCombineField = 'ts'
+ orderingFields = 'ts'
)
AS SELECT * FROM parquet_table;
```
@@ -579,10 +579,10 @@ Users can set table properties while creating a table. The important table prope
|------------------|--------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| type | cow | The table type to create. `type = 'cow'` creates a COPY-ON-WRITE table, while `type = 'mor'` creates a MERGE-ON-READ table. Same as `hoodie.datasource.write.table.type`. More details can be found [here](table_types.md) |
| primaryKey | uuid | The primary key field names of the table separated by commas. Same as `hoodie.datasource.write.recordkey.field`. If this config is ignored, hudi will auto-generate primary keys. If explicitly set, primary key generation will honor user configuration. |
- | preCombineField | | The ordering field(s) of the table. It is used for resolving the final version of the record among multiple versions. Generally, `event time` or another similar column will be used for ordering purposes. Hudi will be able to handle out-of-order data using the ordering field value. |
+ | orderingFields | | The ordering field(s) of the table. It is used for resolving the final version of the record among multiple versions. Generally, `event time` or another similar column will be used for ordering purposes. Hudi will be able to handle out-of-order data using the ordering field value. |

:::note
- `primaryKey`, `preCombineField`, and `type` and other properties are case-sensitive.
+ `primaryKey`, `orderingFields`, and `type` and other properties are case-sensitive.
:::

#### Passing Lock Providers for Concurrent Writers
@@ -833,7 +833,7 @@ WITH (
'connector' = 'hudi',
'path' = 'file:///tmp/hudi_table',
'table.type' = 'MERGE_ON_READ',
- 'precombine.field' = 'ts'
+ 'ordering.fields' = 'ts'
);
```

10 changes: 5 additions & 5 deletions website/docs/sql_dml.md
@@ -51,7 +51,7 @@ INSERT INTO hudi_cow_pt_tbl PARTITION(dt, hh) SELECT 1 AS id, 'a1' AS name, 1000
:::note Mapping to write operations
Hudi offers flexibility in choosing the underlying [write operation](write_operations.md) of a `INSERT INTO` statement using
the `hoodie.spark.sql.insert.into.operation` configuration. Possible options include *"bulk_insert"* (large inserts), *"insert"* (with small file management),
- and *"upsert"* (with deduplication/merging). If ordering fields are not set, *"insert"* is chosen as the default. For a table with ordering fields set (via `preCombineField`),
+ and *"upsert"* (with deduplication/merging). If ordering fields are not set, *"insert"* is chosen as the default. For a table with ordering fields set (via `orderingFields`),
*"upsert"* is chosen as the default operation.
:::
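The default-operation rule in the note above can be summarized in a few lines. This is a hypothetical helper, not a Hudi API; it just restates the documented behavior:

```python
# Sketch of the documented default for INSERT INTO: an explicit
# hoodie.spark.sql.insert.into.operation setting wins; otherwise tables
# with ordering fields default to "upsert", tables without them to "insert".

def default_insert_into_operation(table_props):
    explicit = table_props.get("hoodie.spark.sql.insert.into.operation")
    if explicit:
        return explicit  # user's choice always takes precedence
    return "upsert" if table_props.get("orderingFields") else "insert"

print(default_insert_into_operation({"orderingFields": "ts"}))  # upsert
print(default_insert_into_operation({}))                        # insert
```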

@@ -101,7 +101,7 @@ update hudi_cow_pt_tbl set ts = 1001 where name = 'a1';
```

:::info
- The `UPDATE` operation requires the specification of ordering fields (via `preCombineField`).
+ The `UPDATE` operation requires the specification of ordering fields (via `orderingFields`).
:::

### Merge Into
@@ -138,7 +138,7 @@ For a Hudi table with user configured primary keys, the join condition and the `

For a table where Hudi auto generates primary keys, the join condition in `MERGE INTO` can be on any arbitrary data columns.

- if the `hoodie.record.merge.mode` is set to `EVENT_TIME_ORDERING`, ordering fields (via `preCombineField`) are required to be set with value in the `UPDATE`/`INSERT` clause.
+ if the `hoodie.record.merge.mode` is set to `EVENT_TIME_ORDERING`, ordering fields (via `orderingFields`) are required to be set with value in the `UPDATE`/`INSERT` clause.

It is enforced that if the target table has primary key and partition key column, the source table counterparts must enforce the same data type accordingly. Plus, if the target table is configured with `hoodie.record.merge.mode` = `EVENT_TIME_ORDERING` where target table is expected to have valid ordering fields configuration, the source table counterpart must also have the same data type.
:::
@@ -148,7 +148,7 @@ Examples below
```sql
-- source table using hudi for testing merging into non-partitioned table
create table merge_source (id int, name string, price double, ts bigint) using hudi
- tblproperties (primaryKey = 'id', preCombineField = 'ts');
+ tblproperties (primaryKey = 'id', orderingFields = 'ts');
insert into merge_source values (1, "old_a1", 22.22, 900), (2, "new_a2", 33.33, 2000), (3, "new_a3", 44.44, 2000);

merge into hudi_mor_tbl as target
@@ -199,7 +199,7 @@ CREATE TABLE tableName (
TBLPROPERTIES (
type = 'mor',
primaryKey = 'id',
- preCombineField = '_ts'
+ orderingFields = '_ts'
)
LOCATION '/location/to/basePath';

6 changes: 3 additions & 3 deletions website/docs/sql_queries.md
@@ -210,7 +210,7 @@ CREATE TABLE IF NOT EXISTS hudi_table_merge_mode (
TBLPROPERTIES (
type = 'mor',
primaryKey = 'id',
- precombineField = 'ts',
+ orderingFields = 'ts',
recordMergeMode = 'EVENT_TIME_ORDERING'
)
LOCATION 'file:///tmp/hudi_table_merge_mode/';
@@ -225,7 +225,7 @@ INSERT INTO hudi_table_merge_mode VALUES (1, 'a1', 900, 20.0);
SELECT id, name, ts, price FROM hudi_table_merge_mode;
```

- With `EVENT_TIME_ORDERING`, the record with the larger event time (specified via `precombineField` ordering field) overwrites the record with the
+ With `EVENT_TIME_ORDERING`, the record with the larger event time (specified via `orderingFields`) overwrites the record with the
smaller event time on the same key, regardless of transaction time.

### Snapshot Query with Custom Merge Mode
@@ -244,7 +244,7 @@ CREATE TABLE IF NOT EXISTS hudi_table_merge_mode_custom (
TBLPROPERTIES (
type = 'mor',
primaryKey = 'id',
precombineField = 'ts',
orderingFields = 'ts',
recordMergeMode = 'CUSTOM',
'hoodie.datasource.write.payload.class' = 'org.apache.hudi.common.model.PartialUpdateAvroPayload'
)