Skip to content

Commit

Permalink
[doc](update) modify the example for UPDATE usage (apache#1412)
Browse files Browse the repository at this point in the history
# Versions 

- [x] dev
- [x] 3.0
- [x] 2.1
- [x] 2.0

# Languages

- [x] Chinese
- [x] English

# relate case PR

apache/doris#44854
  • Loading branch information
zhannngchen authored and echo-hhj committed Jan 6, 2025
1 parent bc8f803 commit 7db70e2
Show file tree
Hide file tree
Showing 8 changed files with 592 additions and 184 deletions.
95 changes: 73 additions & 22 deletions docs/data-operate/update/unique-update.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,40 +50,91 @@ The performance of the UPDATE statement depends on the number of rows to be upda

## Example

Suppose there is an order table in Doris, where the order_id is the key column, and the order status and order amount are the Value columns. The data looks as follows:
Assuming in a financial risk control scenario, there is a transaction details table with the following structure:

| order_id | order_amount | order_status |
| -------- | ------------ | ------------ |
| 1 | 100 | Pending |
```sql
CREATE TABLE transaction_details (
transaction_id BIGINT NOT NULL, -- Unique transaction ID
user_id BIGINT NOT NULL, -- User ID
transaction_date DATE NOT NULL, -- Transaction date
transaction_time DATETIME NOT NULL, -- Transaction time
transaction_amount DECIMAL(18, 2), -- Transaction amount
transaction_device STRING, -- Transaction device
transaction_region STRING, -- Transaction region
average_daily_amount DECIMAL(18, 2), -- Average daily transaction amount over the last 3 months
recent_transaction_count INT, -- Number of transactions in the last 7 days
has_dispute_history BOOLEAN, -- Whether there is a dispute history
risk_level STRING -- Risk level
)
UNIQUE KEY(transaction_id)
DISTRIBUTED BY HASH(transaction_id) BUCKETS 16
PROPERTIES (
"replication_num" = "3", -- Number of replicas, default is 3
"enable_unique_key_merge_on_write" = "true" -- Enable MOW mode, support merge update
);
```

The following transaction data exists:

```sql
+----------+--------------+--------------+
| order_id | order_amount | order_status |
+----------+--------------+--------------+
| 1 | 100 | Pending |
+----------+--------------+--------------+
1 row in set (0.01 sec)
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level |
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | NULL |
| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | NULL |
| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | NULL |
| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | NULL |
| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | NULL |
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
```

Now, when a user clicks on the payment, the Doris system needs to update the order status of the order with ID '1' to 'To be shipped'. This requires using the UPDATE functionality.
Update the risk level of all transactions on a daily basis according to the following risk control rules:
1. If there is a dispute history, the risk is high.
2. If in a high-risk region, the risk is high.
3. If the transaction amount is abnormal (more than 5 times the daily average), the risk is high.
4. Frequent transactions in the last 7 days:
a. If the number of transactions > 50, the risk is high.
b. If the number of transactions is between 20 and 50, the risk is medium.
5. Transactions during non-working hours (2 AM to 4 AM), the risk is medium.
6. The default risk is low.

```sql
mysql> UPDATE test_order SET order_status = 'To be shipped' WHERE order_id = 1;
Query OK, 1 row affected (0.11 sec)
{'label':'update_20ae22daf0354fe0-b5aceeaaddc666c5', 'status':'VISIBLE', 'txnId':'33', 'queryId':'20ae22daf0354fe0-b5aceeaaddc666c5'}
UPDATE transaction_details
SET risk_level = CASE
-- Transactions with dispute history or in high-risk regions
WHEN has_dispute_history = TRUE THEN 'high'
WHEN transaction_region IN ('high_risk_region1', 'high_risk_region2') THEN 'high'

-- Abnormal transaction amount
WHEN transaction_amount > 5 * average_daily_amount THEN 'high'

-- High transaction frequency in the last 7 days
WHEN recent_transaction_count > 50 THEN 'high'
WHEN recent_transaction_count BETWEEN 20 AND 50 THEN 'medium'

-- Transactions during non-working hours
WHEN HOUR(transaction_time) BETWEEN 2 AND 4 THEN 'medium'

-- Default risk
ELSE 'low'
END
WHERE transaction_date = '2024-11-24';
```

The updated result is as follows:
The updated data is:

```sql
+----------+--------------+--------------+
| order_id | order_amount | order_status |
+----------+--------------+--------------+
| 1 | 100 | To be shipped|
+----------+--------------+--------------+
1 row in set (0.01 sec)
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level |
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | low |
| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | medium |
| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | medium |
| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | high |
| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | high |
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
```

## More Details

For more detailed syntax on data updates, please refer to the [UPDATE](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/UPDATE) command manual. You can also enter `HELP UPDATE` in the MySQL client command line for more information and assistance.
For more detailed syntax on data updates, please refer to the [UPDATE](../../sql-manual/sql-statements/Data-Manipulation-Statements/Manipulation/UPDATE) command manual. You can also enter `HELP UPDATE` in the MySQL client command line for more information and assistance.
Original file line number Diff line number Diff line change
Expand Up @@ -42,46 +42,97 @@ Update 语法在 Doris 中是一个同步语法,即 Update 语句执行成功

### 性能

Update 语句的性能和待更新的行数以及 condition 的检索效率密切相关
Update 语句的性能和待更新的行数以及查询条件的检索效率密切相关

- 待更新的行数:待更新的行数越多,Update 语句的速度就会越慢。Update 更新比较合适偶发更新的场景,比如修改个别行的值。Update 并不适合大批量的修改数据。

- condition 的检索效率:Update 实现原理是先将满足 condition 的行做读取处理,所以如果 condition 的检索效率高,则 Update 的速度也会快。condition 列最好能命中索引或者分区分桶裁剪,这样 Doris 就不需要扫全表,可以快速定位到需要更新的行,从而提升更新效率。强烈不推荐 condition 列中包含 value 列。
- 查询条件的检索效率:Update 实现原理是先将满足查询条件的行做读取处理,所以如果查询条件的检索效率高,则 Update 的速度也会快。条件列最好能命中索引或者分区分桶裁剪,这样 Doris 就不需要扫全表,可以快速定位到需要更新的行,从而提升更新效率。强烈不推荐条件列中包含 value 列。

## 使用示例

假设 Doris 中存在一张订单表,其中订单 id 是 Key 列,订单状态,订单金额是 Value 列。数据状态如下
假设在金融风控场景中,存在如下结构的交易明细表

| 订单 id | 订单金额 | 订单状态 |
| ------ | -------- | -------- |
| 1 | 100 | 待付款 |
```sql
CREATE TABLE transaction_details (
transaction_id BIGINT NOT NULL, -- 唯一交易编号
user_id BIGINT NOT NULL, -- 用户编号
transaction_date DATE NOT NULL, -- 交易日期
transaction_time DATETIME NOT NULL, -- 交易时间
transaction_amount DECIMAL(18, 2), -- 交易金额
transaction_device STRING, -- 交易设备
transaction_region STRING, -- 交易地区
average_daily_amount DECIMAL(18, 2), -- 最近3个月日均交易金额
recent_transaction_count INT, -- 最近7天交易次数
has_dispute_history BOOLEAN, -- 是否有拒付记录
risk_level STRING -- 风险等级
)
UNIQUE KEY(transaction_id)
DISTRIBUTED BY HASH(transaction_id) BUCKETS 16
PROPERTIES (
"replication_num" = "3", -- 副本数量,默认3
"enable_unique_key_merge_on_write" = "true" -- 启用MOW模式,支持合并更新
);
```

存在如下交易数据:

```sql
+----------+--------------+--------------+
| order_id | order_amount | order_status |
+----------+--------------+--------------+
| 1 | 100 | 待付款 |
+----------+--------------+--------------+
1 row in set (0.01 sec)
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level |
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | NULL |
| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | NULL |
| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | NULL |
| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | NULL |
| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | NULL |
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
```

这时候,用户点击付款后,Doris 系统需要将订单 id 为 '1' 的订单状态变更为 '待发货',就需要用到 Update 功能。
按照如下风控规则来更新每日所有交易记录的风险等级:
1. 有拒付记录,风险为 high。
2. 在高风险地区,风险为 high。
3. 交易金额异常(超过日均 5 倍),风险为 high。
4. 最近 7 天交易频繁:
a. 交易次数 > 50,风险为 high。
b. 交易次数在 20-50 之间,风险为 medium。
5. 非工作时间交易(凌晨 2 点到 4 点),风险为 medium。
6. 默认风险为 low。

```sql
mysql> UPDATE test_order SET order_status = '待发货' WHERE order_id = 1;
Query OK, 1 row affected (0.11 sec)
{'label':'update_20ae22daf0354fe0-b5aceeaaddc666c5', 'status':'VISIBLE', 'txnId':'33', 'queryId':'20ae22daf0354fe0-b5aceeaaddc666c5'}
UPDATE transaction_details
SET risk_level = CASE
-- 有拒付记录或在高风险地区的交易
WHEN has_dispute_history = TRUE THEN 'high'
WHEN transaction_region IN ('high_risk_region1', 'high_risk_region2') THEN 'high'

-- 突然异常交易金额
WHEN transaction_amount > 5 * average_daily_amount THEN 'high'

-- 最近7天交易频率很高
WHEN recent_transaction_count > 50 THEN 'high'
WHEN recent_transaction_count BETWEEN 20 AND 50 THEN 'medium'

-- 非工作时间交易
WHEN HOUR(transaction_time) BETWEEN 2 AND 4 THEN 'medium'

-- 默认风险
ELSE 'low'
END
WHERE transaction_date = '2024-11-24';
```

更新后结果如下
更新之后的数据为

```sql
+----------+--------------+--------------+
| order_id | order_amount | order_status |
+----------+--------------+--------------+
| 1 | 100 | 待发货 |
+----------+--------------+--------------+
1 row in set (0.01 sec)
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
| transaction_id | user_id | transaction_date | transaction_time | transaction_amount | transaction_device | transaction_region | average_daily_amount | recent_transaction_count | has_dispute_history | risk_level |
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
| 1001 | 5001 | 2024-11-24 | 2024-11-24 14:30:00 | 100.00 | iPhone 12 | New York | 100.00 | 10 | 0 | low |
| 1002 | 5002 | 2024-11-24 | 2024-11-24 03:30:00 | 120.00 | iPhone 12 | New York | 100.00 | 15 | 0 | medium |
| 1003 | 5003 | 2024-11-24 | 2024-11-24 10:00:00 | 150.00 | Samsung S21 | Los Angeles | 100.00 | 30 | 0 | medium |
| 1004 | 5004 | 2024-11-24 | 2024-11-24 16:00:00 | 300.00 | MacBook Pro | high_risk_region1 | 200.00 | 5 | 0 | high |
| 1005 | 5005 | 2024-11-24 | 2024-11-24 11:00:00 | 1100.00 | iPad Pro | Chicago | 200.00 | 10 | 0 | high |
+----------------+---------+------------------+---------------------+--------------------+--------------------+--------------------+----------------------+--------------------------+---------------------+------------+
```

## 更多帮助
Expand Down
Loading

0 comments on commit 7db70e2

Please sign in to comment.