From 7db70e2c18230a5569abfbed191d24c6861bd066 Mon Sep 17 00:00:00 2001 From: zhannngchen Date: Tue, 3 Dec 2024 09:20:30 +0800 Subject: [PATCH] [doc](update) modify the example for UPDATE usage (#1412) # Versions - [x] dev - [x] 3.0 - [x] 2.1 - [x] 2.0 # Languages - [x] Chinese - [x] English # relate case PR https://github.com/apache/doris/pull/44854 --- docs/data-operate/update/unique-update.md | 95 ++++++++++++---- .../data-operate/update/unique-update.md | 97 +++++++++++++---- .../data-operate/update/unique-update.md | 101 ++++++++++++----- .../data-operate/update/unique-update.md | 103 +++++++++++++----- .../data-operate/update/unique-update.md | 101 ++++++++++++----- .../data-operate/update/unique-update.md | 93 ++++++++++++---- .../data-operate/update/unique-update.md | 93 ++++++++++++---- .../data-operate/update/unique-update.md | 93 ++++++++++++---- 8 files changed, 592 insertions(+), 184 deletions(-) diff --git a/docs/data-operate/update/unique-update.md b/docs/data-operate/update/unique-update.md index 65fa1371356f9..aae397dfd4393 100644 --- a/docs/data-operate/update/unique-update.md +++ b/docs/data-operate/update/unique-update.md @@ -50,40 +50,91 @@ The performance of the UPDATE statement depends on the number of rows to be upda ## Example -Suppose there is an order table in Doris, where the order_id is the key column, and the order status and order amount are the Value columns. The data looks as follows: +Assuming in a financial risk control scenario, there is a transaction details table with the following structure: -| order_id | order_amount | order_status | -| -------- | ------------ | ------------ | -| 1 | 100 | Pending | +```sql +CREATE TABLE transaction_details ( + transaction_id BIGINT NOT NULL, -- Unique transaction ID + user_id BIGINT NOT NULL, -- User ID + transaction_date DATE NOT NULL, -- Transaction date + transaction_time DATETIME NOT NULL, -- Transaction time + transaction_amount DECIMAL(18, 2), -- Transaction amount + transaction_device STRING, -- Transaction device + transaction_region STRING, -- Transaction region + average_daily_amount DECIMAL(18, 2), -- Average daily transaction amount over the last 3 months + recent_transaction_count INT, -- Number of transactions in the last 7 days + has_dispute_history BOOLEAN, -- Whether there is a dispute history + risk_level STRING -- Risk level +) +UNIQUE KEY(transaction_id) +DISTRIBUTED BY HASH(transaction_id) BUCKETS 16 +PROPERTIES ( + "replication_num" = "3", -- Number of replicas, default is 3 + "enable_unique_key_merge_on_write" = "true" -- Enable MOW mode, support merge update +); +``` + +The following transaction data exists: ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | Pending | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | NULL | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | NULL | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | NULL | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | NULL | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | NULL | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` -Now, when a user clicks on the payment, the Doris system needs to update the order status of the order with ID '1' to 'To be shipped'. This requires using the UPDATE functionality. +Update the risk level of all transactions on a daily basis according to the following risk control rules: +1. If there is a dispute history, the risk is high. +2. If in a high-risk region, the risk is high. +3. If the transaction amount is abnormal (more than 5 times the daily average), the risk is high. +4. Frequent transactions in the last 7 days: + a. If the number of transactions > 50, the risk is high. + b. If the number of transactions is between 20 and 50, the risk is medium. +5. Transactions during non-working hours (2 AM to 4 AM), the risk is medium. +6. The default risk is low. ```sql -mysql> UPDATE test_order SET order_status = 'To be shipped' WHERE order_id = 1; -Query OK, 1 row affected (0.11 sec) -{'label':'update_20ae22daf0354fe0-b5aceeaaddc666c5', 'status':'VISIBLE', 'txnId':'33', 'queryId':'20ae22daf0354fe0-b5aceeaaddc666c5'} +UPDATE transaction_details +SET risk_level = CASE + -- Transactions with dispute history or in high-risk regions + WHEN has_dispute_history = TRUE THEN 'high' + WHEN transaction_region IN ('high_risk_region1', 'high_risk_region2') THEN 'high' + + -- Abnormal transaction amount + WHEN transaction_amount > 5 * average_daily_amount THEN 'high' + + -- High transaction frequency in the last 7 days + WHEN recent_transaction_count > 50 THEN 'high' + WHEN recent_transaction_count BETWEEN 20 AND 50 THEN 'medium' + + -- Transactions during non-working hours + WHEN HOUR(transaction_time) BETWEEN 2 AND 4 THEN 'medium' + + -- Default risk + ELSE 'low' +END +WHERE transaction_date = '2024-11-24'; ``` -The updated result is as follows: +The updated data is: ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | To be shipped| -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | low | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | medium | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | medium | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | high | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | high | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` ## More Details -For more detailed syntax on data updates, please refer to the [UPDATE](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/UPDATE) command manual. You can also enter `HELP UPDATE` in the MySQL client command line for more information and assistance. \ No newline at end of file +For more detailed syntax on data updates, please refer to the [UPDATE](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/UPDATE) command manual. You can also enter `HELP UPDATE` in the MySQL client command line for more information and assistance. diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/unique-update.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/unique-update.md index 276607527fbb2..2cfd390f59bfc 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/unique-update.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/data-operate/update/unique-update.md @@ -42,46 +42,97 @@ Update 语法在 Doris 中是一个同步语法,即 Update 语句执行成功 ### 性能 -Update 语句的性能和待更新的行数以及 condition 的检索效率密切相关。 +Update 语句的性能和待更新的行数以及查询条件的检索效率密切相关。 - 待更新的行数:待更新的行数越多,Update 语句的速度就会越慢。Update 更新比较合适偶发更新的场景,比如修改个别行的值。Update 并不适合大批量的修改数据。 -- condition 的检索效率:Update 实现原理是先将满足 condition 的行做读取处理,所以如果 condition 的检索效率高,则 Update 的速度也会快。condition 列最好能命中索引或者分区分桶裁剪,这样 Doris 就不需要扫全表,可以快速定位到需要更新的行,从而提升更新效率。强烈不推荐 condition 列中包含 value 列。 +- 查询条件的检索效率:Update 实现原理是先将满足查询条件的行做读取处理,所以如果查询条件的检索效率高,则 Update 的速度也会快。条件列最好能命中索引或者分区分桶裁剪,这样 Doris 就不需要扫全表,可以快速定位到需要更新的行,从而提升更新效率。强烈不推荐条件列中包含 value 列。 ## 使用示例 -假设 Doris 中存在一张订单表,其中订单 id 是 Key 列,订单状态,订单金额是 Value 列。数据状态如下: +假设在金融风控场景中,存在如下结构的交易明细表: -| 订单 id | 订单金额 | 订单状态 | -| ------ | -------- | -------- | -| 1 | 100 | 待付款 | +```sql +CREATE TABLE transaction_details ( + transaction_id BIGINT NOT NULL, -- 唯一交易编号 + user_id BIGINT NOT NULL, -- 用户编号 + transaction_date DATE NOT NULL, -- 交易日期 + transaction_time DATETIME NOT NULL, -- 交易时间 + transaction_amount DECIMAL(18, 2), -- 交易金额 + transaction_device STRING, -- 交易设备 + transaction_region STRING, -- 交易地区 + average_daily_amount DECIMAL(18, 2), -- 最近3个月日均交易金额 + recent_transaction_count INT, -- 最近7天交易次数 + has_dispute_history BOOLEAN, -- 是否有拒付记录 + risk_level STRING -- 风险等级 +) +UNIQUE KEY(transaction_id) +DISTRIBUTED BY HASH(transaction_id) BUCKETS 16 +PROPERTIES ( + "replication_num" = "3", -- 副本数量,默认3 + "enable_unique_key_merge_on_write" = "true" -- 启用MOW模式,支持合并更新 +); +``` + +存在如下交易数据: ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | 待付款 | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | NULL | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | NULL | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | NULL | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | NULL | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | NULL | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` -这时候,用户点击付款后,Doris 系统需要将订单 id 为 '1' 的订单状态变更为 '待发货',就需要用到 Update 功能。 +按照如下风控规则来更新每日所有交易记录的风险等级: +1. 有拒付记录,风险为 high。 +2. 在高风险地区,风险为 high。 +3. 交易金额异常(超过日均 5 倍),风险为 high。 +4. 最近 7 天交易频繁: + a. 交易次数 > 50,风险为 high。 + b. 交易次数在 20-50 之间,风险为 medium。 +5. 非工作时间交易(凌晨 2 点到 4 点),风险为 medium。 +6. 默认风险为 low。 ```sql -mysql> UPDATE test_order SET order_status = '待发货' WHERE order_id = 1; -Query OK, 1 row affected (0.11 sec) -{'label':'update_20ae22daf0354fe0-b5aceeaaddc666c5', 'status':'VISIBLE', 'txnId':'33', 'queryId':'20ae22daf0354fe0-b5aceeaaddc666c5'} +UPDATE transaction_details +SET risk_level = CASE + -- 有拒付记录或在高风险地区的交易 + WHEN has_dispute_history = TRUE THEN 'high' + WHEN transaction_region IN ('high_risk_region1', 'high_risk_region2') THEN 'high' + + -- 突然异常交易金额 + WHEN transaction_amount > 5 * average_daily_amount THEN 'high' + + -- 最近7天交易频率很高 + WHEN recent_transaction_count > 50 THEN 'high' + WHEN recent_transaction_count BETWEEN 20 AND 50 THEN 'medium' + + -- 非工作时间交易 + WHEN HOUR(transaction_time) BETWEEN 2 AND 4 THEN 'medium' + + -- 默认风险 + ELSE 'low' +END +WHERE transaction_date = '2024-11-24'; ``` -更新后结果如下 +更新之后的数据为 ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | 待发货 | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | low | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | medium | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | medium | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | high | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | high | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` ## 更多帮助 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/update/unique-update.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/update/unique-update.md index 0c83feeb66fa4..990592b048416 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/update/unique-update.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.0/data-operate/update/unique-update.md @@ -30,7 +30,7 @@ under the License. - 对满足某些条件的行,修改其取值 -- 这个适合少量数据,不频繁的更新 +- 适合少量数据,不频繁的更新 ## 基本原理 @@ -42,46 +42,97 @@ Update 语法在 Doris 中是一个同步语法,即 Update 语句执行成功 ### 性能 -Update 语句的性能和待更新的行数以及 condition 的检索效率密切相关。 +Update 语句的性能和待更新的行数以及查询条件的检索效率密切相关。 -- 待更新的行数:待更新的行数越多,Update 语句的速度就会越慢。Update 更新比较合适偶发更新的场景,比如修改个别行的值。Update 并不适合大批量的修改数据。 +- 待更新的行数:待更新的行数越多,Update 语句的速度就会越慢。Update 更新比较适合偶发更新的场景,比如修改个别行的值。Update 并不适合大批量的修改数据。 -- condition 的检索效率:Update 实现原理是先将满足 condition 的行做读取处理,所以如果 condition 的检索效率高,则 Update 的速度也会快。condition 列最好能命中索引或者分区分桶裁剪,这样 Doris 就不需要扫全表,可以快速定位到需要更新的行,从而提升更新效率。强烈不推荐 condition 列中包含 value 列。 +- 查询条件的检索效率:Update 实现原理是先将满足查询条件的行做读取处理,所以如果查询条件的检索效率高,则 Update 的速度也会快。条件列最好能命中索引或者分区分桶裁剪,这样 Doris 就不需要扫全表,可以快速定位到需要更新的行,从而提升更新效率。强烈不推荐条件列中包含 value 列。 ## 使用示例 -假设 Doris 中存在一张订单表,其中订单 id 是 Key 列,订单状态,订单金额是 Value 列。数据状态如下: +假设在金融风控场景中,存在如下结构的交易明细表: -| 订单 id | 订单金额 | 订单状态 | -| ------ | -------- | -------- | -| 1 | 100 | 待付款 | +```sql +CREATE TABLE transaction_details ( + transaction_id BIGINT NOT NULL, -- 唯一交易编号 + user_id BIGINT NOT NULL, -- 用户编号 + transaction_date DATE NOT NULL, -- 交易日期 + transaction_time DATETIME NOT NULL, -- 交易时间 + transaction_amount DECIMAL(18, 2), -- 交易金额 + transaction_device STRING, -- 交易设备 + transaction_region STRING, -- 交易地区 + average_daily_amount DECIMAL(18, 2), -- 最近3个月日均交易金额 + recent_transaction_count INT, -- 最近7天交易次数 + has_dispute_history BOOLEAN, -- 是否有拒付记录 + risk_level STRING -- 风险等级 +) +UNIQUE KEY(transaction_id) +DISTRIBUTED BY HASH(transaction_id) BUCKETS 16 +PROPERTIES ( + "replication_num" = "3", -- 副本数量,默认3 + "enable_unique_key_merge_on_write" = "true" -- 启用MOW模式,支持合并更新 +); +``` + +存在如下交易数据: ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | 待付款 | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | NULL | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | NULL | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | NULL | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | NULL | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | NULL | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` -这时候,用户点击付款后,Doris 系统需要将订单 id 为 '1' 的订单状态变更为 '待发货',就需要用到 Update 功能。 +按照如下风控规则来更新每日所有交易记录的风险等级: +1. 有拒付记录,风险为 high。 +2. 在高风险地区,风险为 high。 +3. 交易金额异常(超过日均 5 倍),风险为 high。 +4. 最近 7 天交易频繁: + a. 交易次数 > 50,风险为 high。 + b. 交易次数在 20-50 之间,风险为 medium。 +5. 非工作时间交易(凌晨 2 点到 4 点),风险为 medium。 +6. 默认风险为 low。 ```sql -mysql> UPDATE test_order SET order_status = '待发货' WHERE order_id = 1; -Query OK, 1 row affected (0.11 sec) -{'label':'update_20ae22daf0354fe0-b5aceeaaddc666c5', 'status':'VISIBLE', 'txnId':'33', 'queryId':'20ae22daf0354fe0-b5aceeaaddc666c5'} +UPDATE transaction_details +SET risk_level = CASE + -- 有拒付记录或在高风险地区的交易 + WHEN has_dispute_history = TRUE THEN 'high' + WHEN transaction_region IN ('high_risk_region1', 'high_risk_region2') THEN 'high' + + -- 突然异常交易金额 + WHEN transaction_amount > 5 * average_daily_amount THEN 'high' + + -- 最近7天交易频率很高 + WHEN recent_transaction_count > 50 THEN 'high' + WHEN recent_transaction_count BETWEEN 20 AND 50 THEN 'medium' + + -- 非工作时间交易 + WHEN HOUR(transaction_time) BETWEEN 2 AND 4 THEN 'medium' + + -- 默认风险 + ELSE 'low' +END +WHERE transaction_date = '2024-11-24'; ``` -更新后结果如下 +更新之后的数据为 ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | 待发货 | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | low | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | medium | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | medium | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | high | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | high | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` ## 更多帮助 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/unique-update.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/unique-update.md index 276607527fbb2..1160e937ce467 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/unique-update.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-2.1/data-operate/update/unique-update.md @@ -30,11 +30,11 @@ under the License. - 对满足某些条件的行,修改其取值 -- 这个适合少量数据,不频繁的更新 +- 适合少量数据,不频繁的更新 ## 基本原理 -利用查询引擎自身的 where 过滤逻辑,从待更新表中筛选出需要被更新的行。再利用 Unique 模型自带的 Value 列新数据替换旧数据的逻辑,将待更新的行变更后,再重新插入到表中,从而实现行级别更新。 +利用查询引擎自身的 where 过滤逻辑,从待更新表中筛选出需要被更新的行。再利用 Unique 模型自带的 Value 列新数据替换旧数据的逻辑,将待更新的行变更后重新插入到表中,从而实现行级别更新。 ### 同步 @@ -42,46 +42,97 @@ Update 语法在 Doris 中是一个同步语法,即 Update 语句执行成功 ### 性能 -Update 语句的性能和待更新的行数以及 condition 的检索效率密切相关。 +Update 语句的性能和待更新的行数以及条件的检索效率密切相关。 -- 待更新的行数:待更新的行数越多,Update 语句的速度就会越慢。Update 更新比较合适偶发更新的场景,比如修改个别行的值。Update 并不适合大批量的修改数据。 +- 待更新的行数:待更新的行数越多,Update 语句的速度就会越慢。Update 更新比较适合偶发更新的场景,比如修改个别行的值。Update 并不适合大批量的修改数据。 -- condition 的检索效率:Update 实现原理是先将满足 condition 的行做读取处理,所以如果 condition 的检索效率高,则 Update 的速度也会快。condition 列最好能命中索引或者分区分桶裁剪,这样 Doris 就不需要扫全表,可以快速定位到需要更新的行,从而提升更新效率。强烈不推荐 condition 列中包含 value 列。 +- 查询条件的检索效率:Update 实现原理是先将满足查询条件的行做读取处理,所以如果查询条件的检索效率高,则 Update 的速度也会快。条件列最好能命中索引或者分区分桶裁剪,这样 Doris 就不需要扫全表,可以快速定位到需要更新的行,从而提升更新效率。强烈不推荐条件列中包含 value 列。 ## 使用示例 -假设 Doris 中存在一张订单表,其中订单 id 是 Key 列,订单状态,订单金额是 Value 列。数据状态如下: +假设在金融风控场景中,存在如下结构的交易明细表: -| 订单 id | 订单金额 | 订单状态 | -| ------ | -------- | -------- | -| 1 | 100 | 待付款 | +```sql +CREATE TABLE transaction_details ( + transaction_id BIGINT NOT NULL, -- 唯一交易编号 + user_id BIGINT NOT NULL, -- 用户编号 + transaction_date DATE NOT NULL, -- 交易日期 + transaction_time DATETIME NOT NULL, -- 交易时间 + transaction_amount DECIMAL(18, 2), -- 交易金额 + transaction_device STRING, -- 交易设备 + transaction_region STRING, -- 交易地区 + average_daily_amount DECIMAL(18, 2), -- 最近3个月日均交易金额 + recent_transaction_count INT, -- 最近7天交易次数 + has_dispute_history BOOLEAN, -- 是否有拒付记录 + risk_level STRING -- 风险等级 +) +UNIQUE KEY(transaction_id) +DISTRIBUTED BY HASH(transaction_id) BUCKETS 16 +PROPERTIES ( + "replication_num" = "3", -- 副本数量,默认3 + "enable_unique_key_merge_on_write" = "true" -- 启用MOW模式,支持合并更新 +); +``` + +存在如下交易数据: ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | 待付款 | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | NULL | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | NULL | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | NULL | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | NULL | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | NULL | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` -这时候,用户点击付款后,Doris 系统需要将订单 id 为 '1' 的订单状态变更为 '待发货',就需要用到 Update 功能。 +按照如下风控规则来更新每日所有交易记录的风险等级: +1. 有拒付记录,风险为 high。 +2. 在高风险地区,风险为 high。 +3. 交易金额异常(超过日均 5 倍),风险为 high。 +4. 最近 7 天交易频繁: + a. 交易次数 > 50,风险为 high。 + b. 交易次数在 20-50 之间,风险为 medium。 +5. 非工作时间交易(凌晨 2 点到 4 点),风险为 medium。 +6. 默认风险为 low。 ```sql -mysql> UPDATE test_order SET order_status = '待发货' WHERE order_id = 1; -Query OK, 1 row affected (0.11 sec) -{'label':'update_20ae22daf0354fe0-b5aceeaaddc666c5', 'status':'VISIBLE', 'txnId':'33', 'queryId':'20ae22daf0354fe0-b5aceeaaddc666c5'} +UPDATE transaction_details +SET risk_level = CASE + -- 有拒付记录或在高风险地区的交易 + WHEN has_dispute_history = TRUE THEN 'high' + WHEN transaction_region IN ('high_risk_region1', 'high_risk_region2') THEN 'high' + + -- 突然异常交易金额 + WHEN transaction_amount > 5 * average_daily_amount THEN 'high' + + -- 最近7天交易频率很高 + WHEN recent_transaction_count > 50 THEN 'high' + WHEN recent_transaction_count BETWEEN 20 AND 50 THEN 'medium' + + -- 非工作时间交易 + WHEN HOUR(transaction_time) BETWEEN 2 AND 4 THEN 'medium' + + -- 默认风险 + ELSE 'low' +END +WHERE transaction_date = '2024-11-24'; ``` -更新后结果如下 +更新之后的数据为 ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | 待发货 | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | low | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | medium | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | medium | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | high | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | high | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` ## 更多帮助 diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/unique-update.md b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/unique-update.md index 276607527fbb2..914e7f7c809e7 100644 --- a/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/unique-update.md +++ b/i18n/zh-CN/docusaurus-plugin-content-docs/version-3.0/data-operate/update/unique-update.md @@ -30,7 +30,7 @@ under the License. - 对满足某些条件的行,修改其取值 -- 这个适合少量数据,不频繁的更新 +- 适合少量数据,不频繁的更新 ## 基本原理 @@ -42,46 +42,97 @@ Update 语法在 Doris 中是一个同步语法,即 Update 语句执行成功 ### 性能 -Update 语句的性能和待更新的行数以及 condition 的检索效率密切相关。 +Update 语句的性能和待更新的行数以及查询条件的检索效率密切相关。 -- 待更新的行数:待更新的行数越多,Update 语句的速度就会越慢。Update 更新比较合适偶发更新的场景,比如修改个别行的值。Update 并不适合大批量的修改数据。 +- 待更新的行数:待更新的行数越多,Update 语句的速度就会越慢。Update 更新比较适合偶发更新的场景,比如修改个别行的值。Update 并不适合大批量的修改数据。 -- condition 的检索效率:Update 实现原理是先将满足 condition 的行做读取处理,所以如果 condition 的检索效率高,则 Update 的速度也会快。condition 列最好能命中索引或者分区分桶裁剪,这样 Doris 就不需要扫全表,可以快速定位到需要更新的行,从而提升更新效率。强烈不推荐 condition 列中包含 value 列。 +- 查询条件的检索效率:Update 实现原理是先将满足查询条件的行做读取处理,所以如果查询条件的检索效率高,则 Update 的速度也会快。条件列最好能命中索引或者分区分桶裁剪,这样 Doris 就不需要扫全表,可以快速定位到需要更新的行,从而提升更新效率。强烈不推荐条件列中包含 value 列。 ## 使用示例 -假设 Doris 中存在一张订单表,其中订单 id 是 Key 列,订单状态,订单金额是 Value 列。数据状态如下: +假设在金融风控场景中,存在如下结构的交易明细表: -| 订单 id | 订单金额 | 订单状态 | -| ------ | -------- | -------- | -| 1 | 100 | 待付款 | +```sql +CREATE TABLE transaction_details ( + transaction_id BIGINT NOT NULL, -- 唯一交易编号 + user_id BIGINT NOT NULL, -- 用户编号 + transaction_date DATE NOT NULL, -- 交易日期 + transaction_time DATETIME NOT NULL, -- 交易时间 + transaction_amount DECIMAL(18, 2), -- 交易金额 + transaction_device STRING, -- 交易设备 + transaction_region STRING, -- 交易地区 + average_daily_amount DECIMAL(18, 2), -- 最近3个月日均交易金额 + recent_transaction_count INT, -- 最近7天交易次数 + has_dispute_history BOOLEAN, -- 是否有拒付记录 + risk_level STRING -- 风险等级 +) +UNIQUE KEY(transaction_id) +DISTRIBUTED BY HASH(transaction_id) BUCKETS 16 +PROPERTIES ( + "replication_num" = "3", -- 副本数量,默认3 + "enable_unique_key_merge_on_write" = "true" -- 启用MOW模式,支持合并更新 +); +``` + +存在如下交易数据: ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | 待付款 | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | NULL | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | NULL | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | NULL | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | NULL | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | NULL | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` -这时候,用户点击付款后,Doris 系统需要将订单 id 为 '1' 的订单状态变更为 '待发货',就需要用到 Update 功能。 +按照如下风控规则来更新每日所有交易记录的风险等级: +1. 有拒付记录,风险为 high。 +2. 在高风险地区,风险为 high。 +3. 交易金额异常(超过日均 5 倍),风险为 high。 +4. 最近 7 天交易频繁: + a. 交易次数 > 50,风险为 high。 + b. 交易次数在 20-50 之间,风险为 medium。 +5. 非工作时间交易(凌晨 2 点到 4 点),风险为 medium。 +6. 默认风险为 low。 ```sql -mysql> UPDATE test_order SET order_status = '待发货' WHERE order_id = 1; -Query OK, 1 row affected (0.11 sec) -{'label':'update_20ae22daf0354fe0-b5aceeaaddc666c5', 'status':'VISIBLE', 'txnId':'33', 'queryId':'20ae22daf0354fe0-b5aceeaaddc666c5'} +UPDATE transaction_details +SET risk_level = CASE + -- 有拒付记录或在高风险地区的交易 + WHEN has_dispute_history = TRUE THEN 'high' + WHEN transaction_region IN ('high_risk_region1', 'high_risk_region2') THEN 'high' + + -- 突然异常交易金额 + WHEN transaction_amount > 5 * average_daily_amount THEN 'high' + + -- 最近7天交易频率很高 + WHEN recent_transaction_count > 50 THEN 'high' + WHEN recent_transaction_count BETWEEN 20 AND 50 THEN 'medium' + + -- 非工作时间交易 + WHEN HOUR(transaction_time) BETWEEN 2 AND 4 THEN 'medium' + + -- 默认风险 + ELSE 'low' +END +WHERE transaction_date = '2024-11-24'; ``` -更新后结果如下 +更新之后的数据为 ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | 待发货 | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | low | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | medium | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | medium | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | high | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | high | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` ## 更多帮助 diff --git a/versioned_docs/version-2.0/data-operate/update/unique-update.md b/versioned_docs/version-2.0/data-operate/update/unique-update.md index 6bdb881003c1d..44554dc248067 100644 --- a/versioned_docs/version-2.0/data-operate/update/unique-update.md +++ b/versioned_docs/version-2.0/data-operate/update/unique-update.md @@ -50,38 +50,89 @@ The performance of the UPDATE statement depends on the number of rows to be upda ## Example -Suppose there is an order table in Doris, where the order_id is the key column, and the order status and order amount are the Value columns. The data looks as follows: +Assuming in a financial risk control scenario, there is a transaction details table with the following structure: -| order_id | order_amount | order_status | -| -------- | ------------ | ------------ | -| 1 | 100 | Pending | +```sql +CREATE TABLE transaction_details ( + transaction_id BIGINT NOT NULL, -- Unique transaction ID + user_id BIGINT NOT NULL, -- User ID + transaction_date DATE NOT NULL, -- Transaction date + transaction_time DATETIME NOT NULL, -- Transaction time + transaction_amount DECIMAL(18, 2), -- Transaction amount + transaction_device STRING, -- Transaction device + transaction_region STRING, -- Transaction region + average_daily_amount DECIMAL(18, 2), -- Average daily transaction amount over the last 3 months + recent_transaction_count INT, -- Number of transactions in the last 7 days + has_dispute_history BOOLEAN, -- Whether there is a dispute history + risk_level STRING -- Risk level +) +UNIQUE KEY(transaction_id) +DISTRIBUTED BY HASH(transaction_id) BUCKETS 16 +PROPERTIES ( + "replication_num" = "3", -- Number of replicas, default is 3 + "enable_unique_key_merge_on_write" = "true" -- Enable MOW mode, support merge update +); +``` + +The following transaction data exists: ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | Pending | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | NULL | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | NULL | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | NULL | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | NULL | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | NULL | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` -Now, when a user clicks on the payment, the Doris system needs to update the order status of the order with ID '1' to 'To be shipped'. This requires using the UPDATE functionality. +Update the risk level of all transactions on a daily basis according to the following risk control rules: +1. If there is a dispute history, the risk is high. +2. If in a high-risk region, the risk is high. +3. If the transaction amount is abnormal (more than 5 times the daily average), the risk is high. +4. Frequent transactions in the last 7 days: + a. If the number of transactions > 50, the risk is high. + b. If the number of transactions is between 20 and 50, the risk is medium. +5. Transactions during non-working hours (2 AM to 4 AM), the risk is medium. +6. The default risk is low. ```sql -mysql> UPDATE test_order SET order_status = 'To be shipped' WHERE order_id = 1; -Query OK, 1 row affected (0.11 sec) -{'label':'update_20ae22daf0354fe0-b5aceeaaddc666c5', 'status':'VISIBLE', 'txnId':'33', 'queryId':'20ae22daf0354fe0-b5aceeaaddc666c5'} +UPDATE transaction_details +SET risk_level = CASE + -- Transactions with dispute history or in high-risk regions + WHEN has_dispute_history = TRUE THEN 'high' + WHEN transaction_region IN ('high_risk_region1', 'high_risk_region2') THEN 'high' + + -- Abnormal transaction amount + WHEN transaction_amount > 5 * average_daily_amount THEN 'high' + + -- High transaction frequency in the last 7 days + WHEN recent_transaction_count > 50 THEN 'high' + WHEN recent_transaction_count BETWEEN 20 AND 50 THEN 'medium' + + -- Transactions during non-working hours + WHEN HOUR(transaction_time) BETWEEN 2 AND 4 THEN 'medium' + + -- Default risk + ELSE 'low' +END +WHERE transaction_date = '2024-11-24'; ``` -The updated result is as follows: +The updated data is: ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | To be shipped| -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | low | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | medium | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | medium | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | high | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | high | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` ## More Details diff --git a/versioned_docs/version-2.1/data-operate/update/unique-update.md b/versioned_docs/version-2.1/data-operate/update/unique-update.md index 65fa1371356f9..871251adcfb67 100644 --- a/versioned_docs/version-2.1/data-operate/update/unique-update.md +++ b/versioned_docs/version-2.1/data-operate/update/unique-update.md @@ -50,38 +50,89 @@ The performance of the UPDATE statement depends on the number of rows to be upda ## Example -Suppose there is an order table in Doris, where the order_id is the key column, and the order status and order amount are the Value columns. The data looks as follows: +Assuming in a financial risk control scenario, there is a transaction details table with the following structure: -| order_id | order_amount | order_status | -| -------- | ------------ | ------------ | -| 1 | 100 | Pending | +```sql +CREATE TABLE transaction_details ( + transaction_id BIGINT NOT NULL, -- Unique transaction ID + user_id BIGINT NOT NULL, -- User ID + transaction_date DATE NOT NULL, -- Transaction date + transaction_time DATETIME NOT NULL, -- Transaction time + transaction_amount DECIMAL(18, 2), -- Transaction amount + transaction_device STRING, -- Transaction device + transaction_region STRING, -- Transaction region + average_daily_amount DECIMAL(18, 2), -- Average daily transaction amount over the last 3 months + recent_transaction_count INT, -- Number of transactions in the last 7 days + has_dispute_history BOOLEAN, -- Whether there is a dispute history + risk_level STRING -- Risk level +) +UNIQUE KEY(transaction_id) +DISTRIBUTED BY HASH(transaction_id) BUCKETS 16 +PROPERTIES ( + "replication_num" = "3", -- Number of replicas, default is 3 + "enable_unique_key_merge_on_write" = "true" -- Enable MOW mode, support merge update +); +``` + +The following transaction data exists: ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | Pending | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | NULL | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | NULL | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | NULL | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | NULL | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | NULL | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` -Now, when a user clicks on the payment, the Doris system needs to update the order status of the order with ID '1' to 'To be shipped'. This requires using the UPDATE functionality. +Update the risk level of all transactions on a daily basis according to the following risk control rules: +1. If there is a dispute history, the risk is high. +2. If in a high-risk region, the risk is high. +3. If the transaction amount is abnormal (more than 5 times the daily average), the risk is high. +4. Frequent transactions in the last 7 days: + a. If the number of transactions > 50, the risk is high. + b. If the number of transactions is between 20 and 50, the risk is medium. +5. Transactions during non-working hours (2 AM to 4 AM), the risk is medium. +6. The default risk is low. ```sql -mysql> UPDATE test_order SET order_status = 'To be shipped' WHERE order_id = 1; -Query OK, 1 row affected (0.11 sec) -{'label':'update_20ae22daf0354fe0-b5aceeaaddc666c5', 'status':'VISIBLE', 'txnId':'33', 'queryId':'20ae22daf0354fe0-b5aceeaaddc666c5'} +UPDATE transaction_details +SET risk_level = CASE + -- Transactions with dispute history or in high-risk regions + WHEN has_dispute_history = TRUE THEN 'high' + WHEN transaction_region IN ('high_risk_region1', 'high_risk_region2') THEN 'high' + + -- Abnormal transaction amount + WHEN transaction_amount > 5 * average_daily_amount THEN 'high' + + -- High transaction frequency in the last 7 days + WHEN recent_transaction_count > 50 THEN 'high' + WHEN recent_transaction_count BETWEEN 20 AND 50 THEN 'medium' + + -- Transactions during non-working hours + WHEN HOUR(transaction_time) BETWEEN 2 AND 4 THEN 'medium' + + -- Default risk + ELSE 'low' +END +WHERE transaction_date = '2024-11-24'; ``` -The updated result is as follows: +The updated data is: ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | To be shipped| -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | low | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | medium | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | medium | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | high | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | high | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` ## More Details diff --git a/versioned_docs/version-3.0/data-operate/update/unique-update.md b/versioned_docs/version-3.0/data-operate/update/unique-update.md index 65fa1371356f9..871251adcfb67 100644 --- a/versioned_docs/version-3.0/data-operate/update/unique-update.md +++ b/versioned_docs/version-3.0/data-operate/update/unique-update.md @@ -50,38 +50,89 @@ The performance of the UPDATE statement depends on the number of rows to be upda ## Example -Suppose there is an order table in Doris, where the order_id is the key column, and the order status and order amount are the Value columns. The data looks as follows: +Assuming in a financial risk control scenario, there is a transaction details table with the following structure: -| order_id | order_amount | order_status | -| -------- | ------------ | ------------ | -| 1 | 100 | Pending | +```sql +CREATE TABLE transaction_details ( + transaction_id BIGINT NOT NULL, -- Unique transaction ID + user_id BIGINT NOT NULL, -- User ID + transaction_date DATE NOT NULL, -- Transaction date + transaction_time DATETIME NOT NULL, -- Transaction time + transaction_amount DECIMAL(18, 2), -- Transaction amount + transaction_device STRING, -- Transaction device + transaction_region STRING, -- Transaction region + average_daily_amount DECIMAL(18, 2), -- Average daily transaction amount over the last 3 months + recent_transaction_count INT, -- Number of transactions in the last 7 days + has_dispute_history BOOLEAN, -- Whether there is a dispute history + risk_level STRING -- Risk level +) +UNIQUE KEY(transaction_id) +DISTRIBUTED BY HASH(transaction_id) BUCKETS 16 +PROPERTIES ( + "replication_num" = "3", -- Number of replicas, default is 3 + "enable_unique_key_merge_on_write" = "true" -- Enable MOW mode, support merge update +); +``` + +The following transaction data exists: ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | Pending | -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | NULL | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | NULL | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | NULL | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | NULL | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | NULL | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` -Now, when a user clicks on the payment, the Doris system needs to update the order status of the order with ID '1' to 'To be shipped'. This requires using the UPDATE functionality. +Update the risk level of all transactions on a daily basis according to the following risk control rules: +1. If there is a dispute history, the risk is high. +2. If in a high-risk region, the risk is high. +3. If the transaction amount is abnormal (more than 5 times the daily average), the risk is high. +4. Frequent transactions in the last 7 days: + a. If the number of transactions > 50, the risk is high. + b. If the number of transactions is between 20 and 50, the risk is medium. +5. Transactions during non-working hours (2 AM to 4 AM), the risk is medium. +6. The default risk is low. ```sql -mysql> UPDATE test_order SET order_status = 'To be shipped' WHERE order_id = 1; -Query OK, 1 row affected (0.11 sec) -{'label':'update_20ae22daf0354fe0-b5aceeaaddc666c5', 'status':'VISIBLE', 'txnId':'33', 'queryId':'20ae22daf0354fe0-b5aceeaaddc666c5'} +UPDATE transaction_details +SET risk_level = CASE + -- Transactions with dispute history or in high-risk regions + WHEN has_dispute_history = TRUE THEN 'high' + WHEN transaction_region IN ('high_risk_region1', 'high_risk_region2') THEN 'high' + + -- Abnormal transaction amount + WHEN transaction_amount > 5 * average_daily_amount THEN 'high' + + -- High transaction frequency in the last 7 days + WHEN recent_transaction_count > 50 THEN 'high' + WHEN recent_transaction_count BETWEEN 20 AND 50 THEN 'medium' + + -- Transactions during non-working hours + WHEN HOUR(transaction_time) BETWEEN 2 AND 4 THEN 'medium' + + -- Default risk + ELSE 'low' +END +WHERE transaction_date = '2024-11-24'; ``` -The updated result is as follows: +The updated data is: ```sql -+----------+--------------+--------------+ -| order_id | order_amount | order_status | -+----------+--------------+--------------+ -| 1 | 100 | To be shipped| -+----------+--------------+--------------+ -1 row in set (0.01 sec) ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ +| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | low | +| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | medium | +| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | medium | +| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | high | +| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | high | ++----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+ ``` ## More Details