From 56452b14ade16180ea53cb1114bb855910bee04a Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Wed, 6 Jul 2022 16:27:39 +0800 Subject: [PATCH 01/29] add foreign key design doc Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 400 ++++++++++++++++++++++++++ 1 file changed, 400 insertions(+) create mode 100644 docs/design/2022-06-22-foreign-key.md diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md new file mode 100644 index 0000000000000..2d849457daeb5 --- /dev/null +++ b/docs/design/2022-06-22-foreign-key.md @@ -0,0 +1,400 @@ +# Foreign Key Design Doc + +- Author(s): [crazycs520](https://github.com/crazycs520) +- Tracking Issue: https://github.com/pingcap/tidb/issues/18209 + +## DDL Technical Design + +### Table Information Changes +Table's foreign key information will be stored in `model.TableInfo`: + +```go +// TableInfo provides meta data describing a DB table. +type TableInfo struct { + ... + ForeignKeys []*FKInfo `json:"fk_info"` + CitedForeignKeys []*CitedFKInfo `json:"cited_fk_info"` + ... +} + +// FKInfo provides meta data describing a foreign key constraint. +type FKInfo struct { + ID int64 `json:"id"` + Name CIStr `json:"fk_name"` + RefSchema CIStr `json:"ref_schema"` + RefTable CIStr `json:"ref_table"` + RefCols []CIStr `json:"ref_cols"` + Cols []CIStr `json:"cols"` + OnDelete int `json:"on_delete"` + OnUpdate int `json:"on_update"` + State SchemaState `json:"state"` +} + +// CitedFKInfo provides the cited foreign key in the child table. +type CitedFKInfo struct { + Cols []CIStr `json:"cols"` + ChildSchema CIStr `json:"child_schema"` + ChildTable CIStr `json:"child_table"` + ChildFKIndex CIStr `json:"child_fk_index"` +} +``` + +Struct `FKInfo` uses for child table to record the referenced parent table. +Struct `CitedFKInfo` uses for parent table to record the child table which referenced me. + +### Create Table with Foreign Key + +Create a table with foreign key, check following condition when build DDL job and DDL owner received DDL job(aka Double-Check): +- whether the user have `REFERENCES` privilege to the foreign key references table. +- Corresponding columns in the foreign key and the referenced key must have similar data types. The size and sign of fixed precision types such as INTEGER and DECIMAL must be the same. The length of string types need not be the same. For nonbinary (character) string columns, the character set and collation must be the same. +- Supports foreign key references between one column and another within a table. (A column cannot have a foreign key reference to itself.) +- Require indexes on foreign keys and referenced keys so that foreign key checks can be fast and not require a table scan. The size and sign of fixed precision types such as INTEGER and DECIMAL must be the same. The length of string types need not be the same. For nonbinary (character) string columns, the character set and collation must be the same. +- Index prefixes on foreign key columns are not supported. Consequently, BLOB and TEXT columns cannot be included in a foreign key because indexes on those columns must always include a prefix length. +- Does not currently support foreign keys for tables with user-defined partitioning. This includes both parent and child tables. +- A foreign key constraint cannot reference a virtual generated column, but stored generated column is ok. + +DDL owner handle create table job step is: + +- Option-1: +``` +1. None -> Public. +- create a new table. +- update parent table info. +- **notify multi-table's schema change in 1 schema version**. +``` +This will also need to to related change when TiDB to do `loadInfoSchema`. + +- Option-2: +``` +1. None -> Write Only(whatever): Update Parent Table info +2. Write Only -> Done: Create Table +``` + +### Alter Table Add Foreign Key. + +Not supported at this time. + +### Drop Table + +If `foreign_key_checks` is `ON`, then drop the table which has foreign key references will be rejected. + +```sql +> drop table t1; +(3730, "Cannot drop table 't1' referenced by a foreign key constraint 't2_ibfk_1' on table 't2'.") +``` + +### Drop Index + +Drop index which used by foreign key will be rejected. + +```sql +> set @@foreign_key_checks=0; +Query OK, 0 rows affected +> alter table t2 drop index fk; +(1553, "Cannot drop index 'fk': needed in a foreign key constraint") +``` + +### Rename Table/Column + +Rename table which has foreign key references, should also need to update the related child table info. + +```sql +create table t1 (id int key,a int, index(a)); +create table t2 (id int key,a int, foreign key fk(a) references t1(id) ON DELETE CASCADE); +rename table t1 to t11; +alter table t11 change column id id1 int; +show create table t2\G +***************************[ 1. row ]*************************** + Table | t2 +Create Table | CREATE TABLE `t2` ( + `id` int NOT NULL, + `a` int DEFAULT NULL, + PRIMARY KEY (`id`), + KEY `fk` (`a`), + CONSTRAINT `t2_ibfk_1` FOREIGN KEY (`a`) REFERENCES `t11` (`id1`) ON DELETE CASCADE + ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci +``` + +### Modify Column + +Modify column which used by foreign key will be rejected. + +```sql +> alter table t1 change column id id1 bigint; +(3780, "Referencing column 'a' and referenced column 'id1' in foreign key constraint 't2_ibfk_1' are incompatible.") +``` + +MySQL modify column problem: https://www.percona.com/blog/2019/06/04/ddl-queries-foreign-key-columns-MySQL-pxc/ + +What if the user really need to modify column type, such as from `INT` to `BIGINT`. Maybe we can offer an variable such as `alter-foreign-keys-method=auto`, +then when user modify the column type, TiDB will auto modify the related foreign key column's type. For easy implementation and reduce risk, maybe only support modify column type which doesn't need to reorg table row data. + +## DML Technical Design + +### DML On Child Table + +On Child Table Insert Or Update, need to Find FK column value whether exist in Parent table: + +1. Get parent table info by table name. +2. Get related fk index of parent table. +3. tiny optimize, check fk column value exist in parent table cache(map[string(index_key)]struct). +3. Get related row in parent, there are a few different implementations, I prefer option-c. +- option-a. use SQL string and use `ExecRestrictedSQL` API to check row exist. + - drawback: + - Need convert `Datum` to string when construct query SQL string. is there any risk? + - In bulk-insert/update situation, the construct SQL string maybe very huge, may have some risk. + - performance is bad. +- option-b. manual construct a index reader to check. + - drawback: + - there is some complexity, but acceptable? +- option-c. Construct index key and then use snapshot `Iter` and `Seek` API to scan. + - `Iter` default scan batch size is 256, need to set 1. + - Need manual decode index key by index schema. +4. compact column value to make sure exist. +5. put column value into parent fk column value cache. + +check order should check unique/primary key constrain first: + +```sql +test> create table t1 (id int key,a int, index(a)); +test> create table t2 (id int key,a int, foreign key fk(a) references t1(id) ON DELETE CASCADE); +test> insert into t1 values (1, 1); +test> insert into t2 values (1, 1); +test> insert into t2 values (1, 2); +(1062, "Duplicate entry '1' for key 't2.PRIMARY'") +test> insert ignore into t2 values (1, 2); +Query OK, 0 rows affected +``` + +Load data should also do foreign key check, but report warning instead error: + +```sql +create table t1 (id int key,a int, index(a)); +create table t2 (id int key,a int, foreign key fk(a) references t1(id) ON DELETE CASCADE); + +test> load data local infile 'data.csv' into table t2 FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'; +Query OK, 0 rows affected +test> show warnings; ++---------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Level | Code | Message | ++---------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Warning | 1452 | Cannot add or update a child row: a foreign key constraint fails (`test`.`t2`, CONSTRAINT `t2_ibfk_1` FOREIGN KEY (`a`) REFERENCES `t1` (`id`) ON DELETE CASCADE) | +| Warning | 1452 | Cannot add or update a child row: a foreign key constraint fails (`test`.`t2`, CONSTRAINT `t2_ibfk_1` FOREIGN KEY (`a`) REFERENCES `t1` (`id`) ON DELETE CASCADE) | ++---------+------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +``` + +`data.csv` is: +``` +1,1 +2,2 +``` + +### DML On Parent Table + +On Child Table Insert Or Update: + +1. check related child table row exist. +2. modify related child table row by referential action: +- `CASCADE`: update/delete related child table row. +- `SET NULL`: set related child row's foreign key columns value to NULL. +- `RESTRICT`, `NO ACTION`: If related row doesn't exit in child table, reject update/delete parent table. +- `SET DEFAULT`: just like `RESTRICT`. + +modify related child table row by following step: +1. get child table info by name(in parent table info). +2. get the child table fk index's column info. +3. build update executor to update child table rows. + +cascade modification test case: + +```sql +drop table if exists t3,t2,t1; +create table t1 (id int key,a int, index(a)); +create table t2 (id int key,a int, foreign key fk(a) references t1(id) ON DELETE CASCADE); +create table t3 (id int key,a int, foreign key fk(a) references t2(id) ON DELETE CASCADE); +insert into t1 values (1,1); +insert into t2 values (2,1); +insert into t3 values (3,2); +delete from t1 where id = 1; -- both t1, t2, t3 rows are deleted. +``` + +following is a MySQL test case about `SET DEFAULT`: + +```sql +MySQL>create table t1 (a int,b int, index(a,b)) ; +Query OK, 0 rows affected +Time: 0.022s +MySQL>create table t (a int, b int, foreign key fk_a(a) references test.t1(a) ON DELETE SET DEFAULT); +Query OK, 0 rows affected +Time: 0.019s +MySQL>insert into t1 values (1,1); +Query OK, 1 row affected +Time: 0.003s +MySQL>insert into t values (1,1); +Query OK, 1 row affected +Time: 0.006s +MySQL>delete from t1 where a=1; +(1451, 'Cannot delete or update a parent row: a foreign key constraint fails (`test`.`t`, CONSTRAINT `t_ibfk_1` FOREIGN KEY (`a`) REFERENCES `t1` (`a`))') +MySQL>select version(); ++-----------+ +| version() | ++-----------+ +| 8.0.29 | ++-----------+ +``` + +### Issue need to be discussed + +#### Affect Row + +related article: https://www.percona.com/blog/hidden-cost-of-foreign-key-constraints-in-MySQL/ + +Here is a MySQL example: + +prepare: +```sql +create table t1 (id int key,a int, index(a)); +create table t2 (id int key,a int, foreign key fk(a) references t1(id) ON DELETE CASCADE); +insert into t1 values (1, 1); +insert into t2 values (1, 1); +``` + +Then delete on parent table: +```sql +> delete from t1 where id=1; +Query OK, 1 row affected +``` + +As you can see, in the query result, the affected row is 1, but actually should be 2, since the related row in t2 is also been deleted. + +This is a MySQL issue, do we need to be compatible with it? + +#### DML Execution plan + +Here is a MySQL example: + +prepare: +```sql +create table t1 (id int key,a int, index(a)); +create table t2 (id int key,a int, foreign key fk(a) references t1(id) ON DELETE CASCADE); +insert into t1 values (1, 1); +insert into t2 values (1, 1); +``` + +```sql +> explain delete from t1 where id = 1; ++----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+ +| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | ++----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+ +| 1 | DELETE | t1 | | range | PRIMARY | PRIMARY | 4 | const | 1 | 100.0 | Using where | ++----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+ +``` + +From the plan, you can't see any information about the foreign key constrain which need to delete the related row in child table `t2`. + +I think this is a MySQL issue, do we need to be compatible with it, or make TiDB plan better, at least when we meet some slow query, we can know maybe it is caused by modify related row in child table. + +### Some special case + +#### Self-Referencing Tables + +For example, table `employee` has a column `manager_id` references to `employee.id`. + +```sql +create table employee (id int key,manager_id int, foreign key fk(manager_id) references employee(id) ON DELETE CASCADE); +insert into employee values (1,1); +insert into employee values (2,1); + +test> delete from employee where id=1; +Query OK, 1 row affected +test> select * from employee; ++----+---+ +| id | a | ++----+---+ +0 rows in set +``` + +A case of self-reference and cyclical dependencies: + +```sql +test> create table t (id int key,a int, foreign key fk_a(a) references t(id) ON DELETE CASCADE, foreign key fk_id(id) references t(a) ON DELETE CASCADE); +Query OK, 0 rows affected +Time: 0.045s +test> insert into t values (1,1); +(1452, 'Cannot add or update a child row: a foreign key constraint fails (`test`.`t`, CONSTRAINT `t_ibfk_2` FOREIGN KEY (`id`) REFERENCES `t` (`a`) ON DELETE CASCADE)') +``` + +#### Cyclical Dependencies + +```sql +create table t1 (id int key,a int, index(a)); +create table t2 (id int key,a int, foreign key fk(a) references t1(id) ON DELETE CASCADE); +insert into t1 values (1,1); +ALTER TABLE t1 ADD foreign key fk(a) references t2(id) ON DELETE CASCADE; +(1452, 'Cannot add or update a child row: a foreign key constraint fails (`test`.`#sql-298_8`, CONSTRAINT `t1_ibfk_1` FOREIGN KEY (`a`) REFERENCES `t2` (`id`) ON DELETE CASCADE)') +``` + +```sql +set @@foreign_key_checks=0; +create table t1 (id int key,a int, foreign key fk(a) references t2(id) ON DELETE CASCADE); +create table t2 (id int key,a int, foreign key fk(a) references t1(id) ON DELETE CASCADE); +insert into t1 values (1, 2); +insert into t2 values (2, 1); +set @@foreign_key_checks=1; -- add test case without this. +delete from t1 where id=1; +test> select * from t2; ++----+---+ +| id | a | ++----+---+ +0 rows in set +Time: 0.004s +test> select * from t1; ++----+---+ +| id | a | ++----+---+ +0 rows in set +``` + +#### MATCH FULL or MATCH SIMPLE + +This definition is from [CRDB](https://www.cockroachlabs.com/docs/v22.1/foreign-key.html#match-composite-foreign-keys-with-match-simple-and-match-full). MySQL doesn't mention it, here is a MySQL test case: + +```sql +create table t1 (i int, a int,b int, index(a,b)) ; +create table t (a int, b int, foreign key fk_a(a,b) references test.t1(a,b)); + +test> insert into t values (null,1); +Query OK, 1 row affected +test> insert into t values (null,null); +Query OK, 1 row affected +test> insert into t values (1,null); +Query OK, 1 row affected +``` + +I think keep this behavior consistent with MySQL is better. + + +## Other Technical Design + +### How to check foreign key integrity? + +How MySQL to do this? Look like MySQL doesn't provide any method, but the user can use stored procedure to do this, see: https://stackoverflow.com/questions/2250775/force-innodb-to-recheck-foreign-keys-on-a-table-tables + +Maybe We can use following syntax to check foreign key integrity: + +```sql +ADMIN CHECK FOREIGN KEY [table_name] [foreign_key_name] +``` + +## Impact + +### Impact of data replication + +todo + +### reference + +- [MySQL FOREIGN KEY Constraints Document](https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html#foreign-key-adding) +- [3 Common Foreign Key Mistakes (And How to Avoid Them)](https://www.cockroachlabs.com/blog/common-foreign-key-mistakes/) +- [Hidden Cost of Foreign Key Constraints in MySQL](https://www.percona.com/blog/hidden-cost-of-foreign-key-constraints-in-MySQL/) +- [DDL Queries on Foreign Key Columns in MySQL/PXC](https://www.percona.com/blog/2019/06/04/ddl-queries-foreign-key-columns-mysql-pxc/) From c810eba90193c091b2c3d723f05e958a38e4e200 Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Thu, 7 Jul 2022 14:34:43 +0800 Subject: [PATCH 02/29] update doc Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 80 +++++++++++++++++++++++++++ 1 file changed, 80 insertions(+) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 2d849457daeb5..495e58be8fe39 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -294,6 +294,86 @@ From the plan, you can't see any information about the foreign key constrain whi I think this is a MySQL issue, do we need to be compatible with it, or make TiDB plan better, at least when we meet some slow query, we can know maybe it is caused by modify related row in child table. +##### CockroachDB DML Execution Plan + +```sql +CREATE TABLE customers_2 ( + id INT PRIMARY KEY + ); + +CREATE TABLE orders_2 ( + id INT PRIMARY KEY, + customer_id INT REFERENCES customers_2(id) ON UPDATE CASCADE ON DELETE CASCADE +); + +INSERT INTO customers_2 VALUES (1), (2), (3); +INSERT INTO orders_2 VALUES (100,1), (101,2), (102,3), (103,1); +``` + +```sql +> explain analyze UPDATE customers_2 SET id = 23 WHERE id = 1; + info +-------------------------------------------------- + planning time: 494µs + execution time: 5ms + distribution: local + vectorized: true + rows read from KV: 6 (170 B) + cumulative time spent in KV: 978µs + maximum memory usage: 100 KiB + network usage: 0 B (0 messages) + regions: us-east1 + + • root + │ + ├── • update + │ │ nodes: n1 + │ │ regions: us-east1 + │ │ actual row count: 1 + │ │ table: customers_2 + │ │ set: id + │ │ + │ └── • buffer + │ │ label: buffer 1 + │ │ + │ └── • render + │ │ nodes: n1 + │ │ regions: us-east1 + │ │ actual row count: 1 + │ │ KV rows read: 1 + │ │ KV bytes read: 27 B + │ │ + │ └── • scan + │ nodes: n1 + │ regions: us-east1 + │ actual row count: 1 + │ KV rows read: 1 + │ KV bytes read: 27 B + │ missing stats + │ table: customers_2@primary + │ spans: [/1 - /1] + │ locking strength: for update + │ + └── • fk-cascade + fk: fk_customer_id_ref_customers_2 + input: buffer 1 +``` + +##### PostgreSQL DML Execution Plan + +```sql +postgres=# explain analyze UPDATE customers_2 SET id = 20 WHERE id = 23; + QUERY PLAN +------------------------------------------------------------------------------------------------------------------------------------- + Update on customers_2 (cost=0.15..8.17 rows=1 width=10) (actual time=0.039..0.039 rows=0 loops=1) + -> Index Scan using customers_2_pkey on customers_2 (cost=0.15..8.17 rows=1 width=10) (actual time=0.016..0.016 rows=1 loops=1) + Index Cond: (id = 23) + Planning Time: 0.057 ms + Trigger for constraint orders_2_customer_id_fkey on customers_2: time=0.045 calls=1 + Trigger for constraint orders_2_customer_id_fkey on orders_2: time=0.023 calls=2 + Execution Time: 0.129 ms +``` + ### Some special case #### Self-Referencing Tables From 727635ae231b8cb05a7849e6923821c6aa10cf2b Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Mon, 11 Jul 2022 16:15:09 +0800 Subject: [PATCH 03/29] update Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 105 +++++++++++++++++++++----- 1 file changed, 85 insertions(+), 20 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 495e58be8fe39..e0107f9a444a1 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -12,8 +12,8 @@ Table's foreign key information will be stored in `model.TableInfo`: // TableInfo provides meta data describing a DB table. type TableInfo struct { ... - ForeignKeys []*FKInfo `json:"fk_info"` - CitedForeignKeys []*CitedFKInfo `json:"cited_fk_info"` + ForeignKeys []*FKInfo `json:"fk_info"` + ReferredForeignKeys []*ReferredFKInfo `json:"cited_fk_info"` ... } @@ -21,6 +21,7 @@ type TableInfo struct { type FKInfo struct { ID int64 `json:"id"` Name CIStr `json:"fk_name"` + Enable bool `json:"enable"` RefSchema CIStr `json:"ref_schema"` RefTable CIStr `json:"ref_table"` RefCols []CIStr `json:"ref_cols"` @@ -30,8 +31,8 @@ type FKInfo struct { State SchemaState `json:"state"` } -// CitedFKInfo provides the cited foreign key in the child table. -type CitedFKInfo struct { +// ReferredFKInfo provides the referred foreign key in the child table. +type ReferredFKInfo struct { Cols []CIStr `json:"cols"` ChildSchema CIStr `json:"child_schema"` ChildTable CIStr `json:"child_table"` @@ -39,12 +40,36 @@ type CitedFKInfo struct { } ``` -Struct `FKInfo` uses for child table to record the referenced parent table. -Struct `CitedFKInfo` uses for parent table to record the child table which referenced me. +Struct `FKInfo` uses for child table to record the referenced parent table. Struct `FKInfo` has existed for a long time, I just added some fields. +Struct `ReferredFKInfo` uses for referenced table to record the child table which referenced me. ### Create Table with Foreign Key +#### Build TableInfo +When build `TableInfo`, auto create an index for foreign key columns if there is no index cover foreign key columns. Here is an example: + +```sql +mysql>create table t (id int key,a int, foreign key fk(a) references t(id)); +Query OK, 0 rows affected +mysql>show create table t\G +***************************[ 1. row ]*************************** +Table | t +Create Table | CREATE TABLE `t` ( + `id` int NOT NULL, + `a` int DEFAULT NULL, + PRIMARY KEY (`id`), + KEY `fk` (`a`), + CONSTRAINT `t_ibfk_1` FOREIGN KEY (`a`) REFERENCES `t` (`id`) +) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci +1 row in set +``` + +As you can see, the index `fk` is auto create for foreign key usage. + +#### Validate + Create a table with foreign key, check following condition when build DDL job and DDL owner received DDL job(aka Double-Check): + - whether the user have `REFERENCES` privilege to the foreign key references table. - Corresponding columns in the foreign key and the referenced key must have similar data types. The size and sign of fixed precision types such as INTEGER and DECIMAL must be the same. The length of string types need not be the same. For nonbinary (character) string columns, the character set and collation must be the same. - Supports foreign key references between one column and another within a table. (A column cannot have a foreign key reference to itself.) @@ -53,26 +78,66 @@ Create a table with foreign key, check following condition when build DDL job an - Does not currently support foreign keys for tables with user-defined partitioning. This includes both parent and child tables. - A foreign key constraint cannot reference a virtual generated column, but stored generated column is ok. -DDL owner handle create table job step is: +#### Handle In DDL Owner -- Option-1: -``` -1. None -> Public. -- create a new table. -- update parent table info. -- **notify multi-table's schema change in 1 schema version**. -``` -This will also need to to related change when TiDB to do `loadInfoSchema`. +When DDL Owner handle create table job, DDL owner need to create a new table, and update the reference tables information. -- Option-2: -``` -1. None -> Write Only(whatever): Update Parent Table info -2. Write Only -> Done: Create Table +At the same point in time, there may be two versions of the schema in the TiDB cluster, so we can't create new table and +update all reference tables in one schema version, since this may break foreign key constrain, such as delete reference table +without foreign key constrain check in child table. + +```sql +-- In TiDB-1 and Schema Version is 1 +insert into t_has_foreign_key values (1, 1); + +-- In TiDB-0 and Schema Version is 0 +delete from t_reference where id = 1; --Since doesn't know foreign key information in old version, so doesn't do foreign key constrain check. ``` +So, when create a table with foreign key, we need multi-schema version change: + +1. None -> Write Only: Create table with state is `write-only`, and update all reference tables info. +2. Write Only -> Done: Update the new created table state to `public`. + +In step-1, we need update some table info in one schema-version. Technically, we can implement is since we already support `ActionCreateTables` DDL job. + ### Alter Table Add Foreign Key. -Not supported at this time. +Here is an example: +```sql +create table t1 (id int key,a int, index(a)); +create table t2 (id int key,a int); +alter table t2 add foreign key fk(a) references t1(id) ON DELETE CASCADE; +``` + +Just like create table, we should validate first, and return error if the conditions for creating foreign keys are not met, and also need to do double-check. + +When build `TableInfo`, we need to auto create an index for foreign key columns if there is no index cover foreign key columns. +And this is divides the problem into two cases: +- Case-1: No need to auto create index, and only add foreign key constrain. +- Case-2: Need auto create index for foreign key + +#### Case-1: Only add foreign key constrain + +The DDL owner handle add foreign key constrain step is: + +1. None -> Write Only: add foreign key constrain which state is `write-only` into table and update the reference table info. +3. Write Only - Write Reorg: check all row in the table whether has related foreign key exists in reference table, we can use following SQL to check: + ```sql + select count(*) from t2 where t2.a not in (select id from t1); + ``` + The expected result is `0`, otherwise, an error is returned and cancel the ddl job. +4. Write Reorg -> Public: update the foreign key constrain state to `public`. + +DML should also check the foreign key constrain which state is `write-only` + +#### Case-2: Auto create index for foreign key and add foreign key constrain + +As TiDB support multi-schema change now, we can split this into 2 sub-ddl job. +- Add Index DDL job +- Add Foreign Key Constrain DDL job + +We should do add index DDL job first, after index ddl job in `write-reorg`, then start to do add foreign key constrain ddl job. ### Drop Table From 8295eaf76978cdc665626cb482bb9b401f173d8c Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Mon, 11 Jul 2022 16:19:22 +0800 Subject: [PATCH 04/29] update Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 16 ++++------------ 1 file changed, 4 insertions(+), 12 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index e0107f9a444a1..0b76454b8f4eb 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -203,18 +203,10 @@ On Child Table Insert Or Update, need to Find FK column value whether exist in P 1. Get parent table info by table name. 2. Get related fk index of parent table. 3. tiny optimize, check fk column value exist in parent table cache(map[string(index_key)]struct). -3. Get related row in parent, there are a few different implementations, I prefer option-c. -- option-a. use SQL string and use `ExecRestrictedSQL` API to check row exist. - - drawback: - - Need convert `Datum` to string when construct query SQL string. is there any risk? - - In bulk-insert/update situation, the construct SQL string maybe very huge, may have some risk. - - performance is bad. -- option-b. manual construct a index reader to check. - - drawback: - - there is some complexity, but acceptable? -- option-c. Construct index key and then use snapshot `Iter` and `Seek` API to scan. - - `Iter` default scan batch size is 256, need to set 1. - - Need manual decode index key by index schema. +3. Get related row in parent. + - Construct index key and then use snapshot `Iter` and `Seek` API to scan. If the index is unique and only contain + foreign key columns, use snapshot `Get` API. + - `Iter` default scan batch size is 256, need to set 2 to avoid read unnecessary data. 4. compact column value to make sure exist. 5. put column value into parent fk column value cache. From 7755392601ca9fc7292e1ebfdaa6c34e26c080ce Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Tue, 12 Jul 2022 11:01:13 +0800 Subject: [PATCH 05/29] update doc Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 146 ++++++++++++++------------ 1 file changed, 79 insertions(+), 67 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 0b76454b8f4eb..80dc03b5137f5 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -3,6 +3,10 @@ - Author(s): [crazycs520](https://github.com/crazycs520) - Tracking Issue: https://github.com/pingcap/tidb/issues/18209 +## Abstract + +This proposes an implementation of supporting foreign key constrain. + ## DDL Technical Design ### Table Information Changes @@ -41,11 +45,13 @@ type ReferredFKInfo struct { ``` Struct `FKInfo` uses for child table to record the referenced parent table. Struct `FKInfo` has existed for a long time, I just added some fields. + - `Enable`: uses to distinguish between old and new versions. Struct `ReferredFKInfo` uses for referenced table to record the child table which referenced me. ### Create Table with Foreign Key #### Build TableInfo + When build `TableInfo`, auto create an index for foreign key columns if there is no index cover foreign key columns. Here is an example: ```sql @@ -101,9 +107,10 @@ So, when create a table with foreign key, we need multi-schema version change: In step-1, we need update some table info in one schema-version. Technically, we can implement is since we already support `ActionCreateTables` DDL job. -### Alter Table Add Foreign Key. +### Alter Table Add Foreign Key Here is an example: + ```sql create table t1 (id int key,a int, index(a)); create table t2 (id int key,a int); @@ -137,7 +144,7 @@ As TiDB support multi-schema change now, we can split this into 2 sub-ddl job. - Add Index DDL job - Add Foreign Key Constrain DDL job -We should do add index DDL job first, after index ddl job in `write-reorg`, then start to do add foreign key constrain ddl job. +We should do add index DDL job first, after index ddl job finish `write-reorg` and ready for public, then start to do add foreign key constrain ddl job. ### Drop Table @@ -153,7 +160,7 @@ If `foreign_key_checks` is `ON`, then drop the table which has foreign key refer Drop index which used by foreign key will be rejected. ```sql -> set @@foreign_key_checks=0; +> set @@foreign_key_checks=0; -- Even disable foreign_key_checks, you still can't drop the index which used for foreign key constrain. Query OK, 0 rows affected > alter table t2 drop index fk; (1553, "Cannot drop index 'fk': needed in a foreign key constraint") @@ -262,44 +269,6 @@ modify related child table row by following step: 2. get the child table fk index's column info. 3. build update executor to update child table rows. -cascade modification test case: - -```sql -drop table if exists t3,t2,t1; -create table t1 (id int key,a int, index(a)); -create table t2 (id int key,a int, foreign key fk(a) references t1(id) ON DELETE CASCADE); -create table t3 (id int key,a int, foreign key fk(a) references t2(id) ON DELETE CASCADE); -insert into t1 values (1,1); -insert into t2 values (2,1); -insert into t3 values (3,2); -delete from t1 where id = 1; -- both t1, t2, t3 rows are deleted. -``` - -following is a MySQL test case about `SET DEFAULT`: - -```sql -MySQL>create table t1 (a int,b int, index(a,b)) ; -Query OK, 0 rows affected -Time: 0.022s -MySQL>create table t (a int, b int, foreign key fk_a(a) references test.t1(a) ON DELETE SET DEFAULT); -Query OK, 0 rows affected -Time: 0.019s -MySQL>insert into t1 values (1,1); -Query OK, 1 row affected -Time: 0.003s -MySQL>insert into t values (1,1); -Query OK, 1 row affected -Time: 0.006s -MySQL>delete from t1 where a=1; -(1451, 'Cannot delete or update a parent row: a foreign key constraint fails (`test`.`t`, CONSTRAINT `t_ibfk_1` FOREIGN KEY (`a`) REFERENCES `t1` (`a`))') -MySQL>select version(); -+-----------+ -| version() | -+-----------+ -| 8.0.29 | -+-----------+ -``` - ### Issue need to be discussed #### Affect Row @@ -349,7 +318,7 @@ insert into t2 values (1, 1); From the plan, you can't see any information about the foreign key constrain which need to delete the related row in child table `t2`. -I think this is a MySQL issue, do we need to be compatible with it, or make TiDB plan better, at least when we meet some slow query, we can know maybe it is caused by modify related row in child table. +I think this is a MySQL issue, should we make TiDB plan better, at least when we meet some slow query, we can know maybe it is caused by modify related row in child table. ##### CockroachDB DML Execution Plan @@ -431,9 +400,71 @@ postgres=# explain analyze UPDATE customers_2 SET id = 20 WHERE id = 23; Execution Time: 0.129 ms ``` -### Some special case +## Other Technical Design + +### How to check foreign key integrity? + +How MySQL to do this? Look like MySQL doesn't provide any method, but the user can use stored procedure to do this, see: https://stackoverflow.com/questions/2250775/force-innodb-to-recheck-foreign-keys-on-a-table-tables -#### Self-Referencing Tables +Maybe We can use following syntax to check foreign key integrity: + +```sql +ADMIN CHECK FOREIGN KEY [table_name] [foreign_key_name] +``` + +which implemention is use following SQL to check: + +```sql +select count(*) from t where t.a not in (select id from t_refer); +``` + +## Impact + +### Impact of data replication + +todo + +## Test Case + +cascade modification test case: + +```sql +drop table if exists t3,t2,t1; +create table t1 (id int key,a int, index(a)); +create table t2 (id int key,a int, foreign key fk(a) references t1(id) ON DELETE CASCADE); +create table t3 (id int key,a int, foreign key fk(a) references t2(id) ON DELETE CASCADE); +insert into t1 values (1,1); +insert into t2 values (2,1); +insert into t3 values (3,2); +delete from t1 where id = 1; -- both t1, t2, t3 rows are deleted. +``` + +following is a MySQL test case about `SET DEFAULT`: + +```sql +MySQL>create table t1 (a int,b int, index(a,b)) ; +Query OK, 0 rows affected +Time: 0.022s +MySQL>create table t (a int, b int, foreign key fk_a(a) references test.t1(a) ON DELETE SET DEFAULT); +Query OK, 0 rows affected +Time: 0.019s +MySQL>insert into t1 values (1,1); +Query OK, 1 row affected +Time: 0.003s +MySQL>insert into t values (1,1); +Query OK, 1 row affected +Time: 0.006s +MySQL>delete from t1 where a=1; +(1451, 'Cannot delete or update a parent row: a foreign key constraint fails (`test`.`t`, CONSTRAINT `t_ibfk_1` FOREIGN KEY (`a`) REFERENCES `t1` (`a`))') +MySQL>select version(); ++-----------+ +| version() | ++-----------+ +| 8.0.29 | ++-----------+ +``` + +### Self-Referencing Tables For example, table `employee` has a column `manager_id` references to `employee.id`. @@ -461,7 +492,7 @@ test> insert into t values (1,1); (1452, 'Cannot add or update a child row: a foreign key constraint fails (`test`.`t`, CONSTRAINT `t_ibfk_2` FOREIGN KEY (`id`) REFERENCES `t` (`a`) ON DELETE CASCADE)') ``` -#### Cyclical Dependencies +### Cyclical Dependencies ```sql create table t1 (id int key,a int, index(a)); @@ -492,10 +523,12 @@ test> select * from t1; 0 rows in set ``` -#### MATCH FULL or MATCH SIMPLE +### MATCH FULL or MATCH SIMPLE This definition is from [CRDB](https://www.cockroachlabs.com/docs/v22.1/foreign-key.html#match-composite-foreign-keys-with-match-simple-and-match-full). MySQL doesn't mention it, here is a MySQL test case: +Here is an MySQL example: + ```sql create table t1 (i int, a int,b int, index(a,b)) ; create table t (a int, b int, foreign key fk_a(a,b) references test.t1(a,b)); @@ -508,27 +541,6 @@ test> insert into t values (1,null); Query OK, 1 row affected ``` -I think keep this behavior consistent with MySQL is better. - - -## Other Technical Design - -### How to check foreign key integrity? - -How MySQL to do this? Look like MySQL doesn't provide any method, but the user can use stored procedure to do this, see: https://stackoverflow.com/questions/2250775/force-innodb-to-recheck-foreign-keys-on-a-table-tables - -Maybe We can use following syntax to check foreign key integrity: - -```sql -ADMIN CHECK FOREIGN KEY [table_name] [foreign_key_name] -``` - -## Impact - -### Impact of data replication - -todo - ### reference - [MySQL FOREIGN KEY Constraints Document](https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html#foreign-key-adding) From 865a70478e65c3511bee549b81676228c307280a Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Tue, 12 Jul 2022 13:07:26 +0800 Subject: [PATCH 06/29] fix typo Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 80dc03b5137f5..6bfa9428dca29 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -81,7 +81,7 @@ Create a table with foreign key, check following condition when build DDL job an - Supports foreign key references between one column and another within a table. (A column cannot have a foreign key reference to itself.) - Require indexes on foreign keys and referenced keys so that foreign key checks can be fast and not require a table scan. The size and sign of fixed precision types such as INTEGER and DECIMAL must be the same. The length of string types need not be the same. For nonbinary (character) string columns, the character set and collation must be the same. - Index prefixes on foreign key columns are not supported. Consequently, BLOB and TEXT columns cannot be included in a foreign key because indexes on those columns must always include a prefix length. -- Does not currently support foreign keys for tables with user-defined partitioning. This includes both parent and child tables. +- Does not currently support foreign keys for tables with user-defined partitioning. This includes both reference and child tables. - A foreign key constraint cannot reference a virtual generated column, but stored generated column is ok. #### Handle In DDL Owner @@ -205,17 +205,17 @@ then when user modify the column type, TiDB will auto modify the related foreign ### DML On Child Table -On Child Table Insert Or Update, need to Find FK column value whether exist in Parent table: +On Child Table Insert Or Update, need to Find FK column value whether exist in reference table: -1. Get parent table info by table name. -2. Get related fk index of parent table. -3. tiny optimize, check fk column value exist in parent table cache(map[string(index_key)]struct). -3. Get related row in parent. +1. Get reference table info by table name. +2. Get related fk index of reference table. +3. tiny optimize, check fk column value exist in reference table cache(map[string(index_key)]struct). +3. Get related row in reference. - Construct index key and then use snapshot `Iter` and `Seek` API to scan. If the index is unique and only contain foreign key columns, use snapshot `Get` API. - `Iter` default scan batch size is 256, need to set 2 to avoid read unnecessary data. 4. compact column value to make sure exist. -5. put column value into parent fk column value cache. +5. put column value into reference fk column value cache. check order should check unique/primary key constrain first: @@ -253,19 +253,19 @@ test> show warnings; 2,2 ``` -### DML On Parent Table +### DML On reference Table -On Child Table Insert Or Update: +On reference Table Insert Or Update: 1. check related child table row exist. 2. modify related child table row by referential action: - `CASCADE`: update/delete related child table row. - `SET NULL`: set related child row's foreign key columns value to NULL. -- `RESTRICT`, `NO ACTION`: If related row doesn't exit in child table, reject update/delete parent table. +- `RESTRICT`, `NO ACTION`: If related row exist in child table, reject update/delete reference table. - `SET DEFAULT`: just like `RESTRICT`. modify related child table row by following step: -1. get child table info by name(in parent table info). +1. get child table info by name(in reference table info). 2. get the child table fk index's column info. 3. build update executor to update child table rows. @@ -285,7 +285,7 @@ insert into t1 values (1, 1); insert into t2 values (1, 1); ``` -Then delete on parent table: +Then delete on reference table: ```sql > delete from t1 where id=1; Query OK, 1 row affected From cc65dfe7f2d598313f8cc1092df02951b1a32533 Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Tue, 12 Jul 2022 13:09:25 +0800 Subject: [PATCH 07/29] refine Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 6bfa9428dca29..762a839d43b62 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -257,8 +257,7 @@ test> show warnings; On reference Table Insert Or Update: -1. check related child table row exist. -2. modify related child table row by referential action: +1. modify related child table row by referential action: - `CASCADE`: update/delete related child table row. - `SET NULL`: set related child row's foreign key columns value to NULL. - `RESTRICT`, `NO ACTION`: If related row exist in child table, reject update/delete reference table. From 2664f28fc3fe0d0f3d94b67c8c1a5ae1d9d77c3c Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Mon, 18 Jul 2022 10:54:18 +0800 Subject: [PATCH 08/29] update lock Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 60 +++++++++++++++++++++------ 1 file changed, 48 insertions(+), 12 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 762a839d43b62..d7b1740e2a83c 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -25,7 +25,6 @@ type TableInfo struct { type FKInfo struct { ID int64 `json:"id"` Name CIStr `json:"fk_name"` - Enable bool `json:"enable"` RefSchema CIStr `json:"ref_schema"` RefTable CIStr `json:"ref_table"` RefCols []CIStr `json:"ref_cols"` @@ -33,6 +32,7 @@ type FKInfo struct { OnDelete int `json:"on_delete"` OnUpdate int `json:"on_update"` State SchemaState `json:"state"` + Version bool `json:"version"` } // ReferredFKInfo provides the referred foreign key in the child table. @@ -45,7 +45,7 @@ type ReferredFKInfo struct { ``` Struct `FKInfo` uses for child table to record the referenced parent table. Struct `FKInfo` has existed for a long time, I just added some fields. - - `Enable`: uses to distinguish between old and new versions. + - `Version`: uses to distinguish between old and new versions. Struct `ReferredFKInfo` uses for referenced table to record the child table which referenced me. ### Create Table with Foreign Key @@ -215,21 +215,42 @@ On Child Table Insert Or Update, need to Find FK column value whether exist in r foreign key columns, use snapshot `Get` API. - `Iter` default scan batch size is 256, need to set 2 to avoid read unnecessary data. 4. compact column value to make sure exist. -5. put column value into reference fk column value cache. +5. If relate row exist in reference table, also need to add lock in the related row. +6. put column value into reference fk column value cache. -check order should check unique/primary key constrain first: +#### Lock + +Let's see an example in MySQL first: +prepare: ```sql -test> create table t1 (id int key,a int, index(a)); -test> create table t2 (id int key,a int, foreign key fk(a) references t1(id) ON DELETE CASCADE); -test> insert into t1 values (1, 1); -test> insert into t2 values (1, 1); -test> insert into t2 values (1, 2); -(1062, "Duplicate entry '1' for key 't2.PRIMARY'") -test> insert ignore into t2 values (1, 2); -Query OK, 0 rows affected +create table t1 (id int key,a int, b int, unique index(a, b, id)); +create table t2 (id int key,a int, b int, index (a,b,id), foreign key fk(a, b) references t1(a, b)); +insert into t1 values (-1, 1, 1); ``` +Then, execute following SQL in 2 session: + +| Session 1 | Session 2 | +| -------------------------------- | ------------------------------------------- | +| Begin; | | +| insert into t2 values (1, 1, 1); | | +| | delete from t1; -- Blocked by wait lock | +| Commit | | +| | ERROR: Cannot delete or update a parent row | + +So we need to add lock in reference table when insert/update child table. + +##### In Pessimistic Transaction + +When TiDB add pessimistic locks, if relate row exist in reference table, also need to add lock in the related row. + +##### In Optimistic Transaction + +todo: + +#### DML Load data + Load data should also do foreign key check, but report warning instead error: ```sql @@ -522,6 +543,21 @@ test> select * from t1; 0 rows in set ``` +### Check order + +check order should check unique/primary key constrain first: + +```sql +test> create table t1 (id int key,a int, index(a)); +test> create table t2 (id int key,a int, foreign key fk(a) references t1(id) ON DELETE CASCADE); +test> insert into t1 values (1, 1); +test> insert into t2 values (1, 1); +test> insert into t2 values (1, 2); +(1062, "Duplicate entry '1' for key 't2.PRIMARY'") +test> insert ignore into t2 values (1, 2); +Query OK, 0 rows affected +``` + ### MATCH FULL or MATCH SIMPLE This definition is from [CRDB](https://www.cockroachlabs.com/docs/v22.1/foreign-key.html#match-composite-foreign-keys-with-match-simple-and-match-full). MySQL doesn't mention it, here is a MySQL test case: From a23b1ad5e65ddef0f3f0fa0fd9cd903994e7a569 Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Mon, 18 Jul 2022 15:49:37 +0800 Subject: [PATCH 09/29] update Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index d7b1740e2a83c..34c64a2d1e88a 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -247,7 +247,7 @@ When TiDB add pessimistic locks, if relate row exist in reference table, also ne ##### In Optimistic Transaction -todo: +Just like `SELECT FOR UPDATE` statement, need to use `doLockKeys` to lock the related row in the reference table. #### DML Load data From 9ac4ed83a24af5e31919f1bfdc1bf76a54a2f96c Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Mon, 8 Aug 2022 19:11:31 +0800 Subject: [PATCH 10/29] add pseudocodeOC Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 155 +++++++++++++++++++++++--- 1 file changed, 139 insertions(+), 16 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 34c64a2d1e88a..b14656c556906 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -18,30 +18,34 @@ type TableInfo struct { ... ForeignKeys []*FKInfo `json:"fk_info"` ReferredForeignKeys []*ReferredFKInfo `json:"cited_fk_info"` + // MaxFKIndexID uses to allocate foreign key ID. Or use MaxConstraintID instead. + MaxFKIndexID int64 `json:"max_fk_idx_id"` ... } // FKInfo provides meta data describing a foreign key constraint. type FKInfo struct { - ID int64 `json:"id"` - Name CIStr `json:"fk_name"` - RefSchema CIStr `json:"ref_schema"` - RefTable CIStr `json:"ref_table"` - RefCols []CIStr `json:"ref_cols"` - Cols []CIStr `json:"cols"` - OnDelete int `json:"on_delete"` - OnUpdate int `json:"on_update"` - State SchemaState `json:"state"` - Version bool `json:"version"` + ID int64 `json:"id"` + Name CIStr `json:"fk_name"` + RefSchemaID CIStr `json:"ref_schema_id"` + RefTableID CIStr `json:"ref_table_id"` + RefCols []CIStr `json:"ref_cols"` + Cols []CIStr `json:"cols"` + OnDelete int `json:"on_delete"` + OnUpdate int `json:"on_update"` + State SchemaState `json:"state"` + // Deprecated + RefTable CIStr `json:"ref_table"` } // ReferredFKInfo provides the referred foreign key in the child table. type ReferredFKInfo struct { - Cols []CIStr `json:"cols"` - ChildSchema CIStr `json:"child_schema"` - ChildTable CIStr `json:"child_table"` - ChildFKIndex CIStr `json:"child_fk_index"` + Cols []CIStr `json:"cols"` + ChildSchemaID int64 `json:"child_schema_id"` + ChildTableID int64 `json:"child_table_id"` + ChildFKIndex CIStr `json:"child_fk_index"` } + ``` Struct `FKInfo` uses for child table to record the referenced parent table. Struct `FKInfo` has existed for a long time, I just added some fields. @@ -166,9 +170,9 @@ Query OK, 0 rows affected (1553, "Cannot drop index 'fk': needed in a foreign key constraint") ``` -### Rename Table/Column +### Rename Column -Rename table which has foreign key references, should also need to update the related child table info. +Rename column which has foreign key or references, should also need to update the related child/parent table info. ```sql create table t1 (id int key,a int, index(a)); @@ -187,6 +191,10 @@ Create Table | CREATE TABLE `t2` ( ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci ``` +### Truncate Table + +Truncate table which has foreign key or references, should also need to update the related child/parent table info. + ### Modify Column Modify column which used by foreign key will be rejected. @@ -289,6 +297,121 @@ modify related child table row by following step: 2. get the child table fk index's column info. 3. build update executor to update child table rows. +There is an pseudocode about `DELETE` on reference table: + +1. Pseudocode in build plan stage. + +```go +type Delete struct { + ... + // table-id => foreign key triggers. + FKTriggers map[int64][]*ForeignKeyTrigger +} + +type ForeignKeyTrigger struct { + Tp FKTriggerType + OnModifyReferredTable *OnModifyReferredTableFKInfo + OnModifyChildTable *OnModifyChildTableFKInfo +} + +type OnModifyReferredTableFKInfo struct { + ReferredFK *model.ReferredFKInfo + ChildTable table.Table + FK *model.FKInfo +} + +const ( + FKTriggerOnDeleteReferredTable FKTriggerType = 1 + FKTriggerOnUpdateReferredTable FKTriggerType = 2 + FKTriggerOnInsertOrUpdateChildTable FKTriggerType = 3 +) + +func (b *PlanBuilder) buildDelete(ctx context.Context, ds *ast.DeleteStmt) (Plan, error) { + .... + del.FKTriggers = buildOnDeleteForeignKeyTrigger() +} +``` + +2. Pseudocode in build executor. + +```go +type DeleteExec struct { + fkTriggerExecs map[int64][]*ForeignKeyTriggerExec +} + +type ForeignKeyTriggerExec struct { + fkTrigger *plannercore.ForeignKeyTrigger + colsOffsets []int + fkValues [][]types.Datum +} + +func (b *executorBuilder) buildDelete(v *plannercore.Delete) Executor { + ... + deleteExec.fkTriggerExecs = b.buildTblID2ForeignKeyTriggerExecs() +} +``` + +3. Pseudocode in `Delete` execution. + +```go +func (e *DeleteExec) removeRow(...){ + fkTriggerExecs := e.fkTriggerExecs[t.Meta().ID] + sc := ctx.GetSessionVars().StmtCtx + for _, fkt := range fkTriggerExecs { + err = fkt.addRowNeedToTrigger(sc, row) + } +} + +func (fkt *ForeignKeyTriggerExec) addRowNeedToTrigger(sc *stmtctx.StatementContext, row []types.Datum) error { + vals, err := fetchFKValues(row, fkt.colsOffsets) + if err != nil || hasNullValue(vals) { + return err + } + fkt.fkValues = append(fkt.fkValues, vals) + return nil +} +``` + +4. Pseudocode after `Delete` execution. + +```go +func (a *ExecStmt) Exec(){ + e, err := a.buildExecutor() + e.Open() + f handled, result, err := a.handleNoDelay(ctx, e, isPessimistic); handled { + err = a.handleForeignKeyTrigger(ctx, e, isPessimistic) + return result, err + } +} + +func (a *ExecStmt) handleForeignKeyTrigger(ctx context.Context, e Executor, isPessimistic bool) error { + fkTriggerExecs := e.GetForeignKeyTriggerExecs() + for i := 0; i < len(fkTriggerExecs); i++{ + fkt := fkTriggerExecs[i] + e, err := fkt.buildExecutor(ctx) + if err != nil { + return err + } + if e == nil { + continue + } + if err := e.Open(ctx); err != nil { + terror.Call(e.Close) + return err + } + _, _, err = a.handleNoDelay(ctx, e, isPessimistic) + if err != nil { + return err + } + err = a.handleForeignKeyTrigger(ctx, e, isPessimistic) + if err != nil { + return err + } + } + return nil +} +``` + ### Issue need to be discussed #### Affect Row From 73f8af3d18d8a96642d019b2681a2232a0d7a185 Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Tue, 9 Aug 2022 15:14:48 +0800 Subject: [PATCH 11/29] add plan example Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index b14656c556906..a77050b05f3ac 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -461,7 +461,20 @@ insert into t2 values (1, 1); From the plan, you can't see any information about the foreign key constrain which need to delete the related row in child table `t2`. -I think this is a MySQL issue, should we make TiDB plan better, at least when we meet some slow query, we can know maybe it is caused by modify related row in child table. +I think this is a MySQL issue, should we make TiDB plan better, at least when we meet some slow query, we can know maybe it is caused by modify related row in child table. + +Here is an example plan with foreign key: + +```sql +> explain delete from t1 where id = 1; ++--------------------------+---------+------+---------------+---------------+ +| id | estRows | task | access object | operator info | ++--------------------------+---------+------+---------------+---------------+ +| Delete_4 | N/A | root | | N/A | +| └─Point_Get_6 | 1.00 | root | table:t1 | handle:1 | +| └─Foreign_Key_Trigger | N/A | root | table:t2 | cascade | ++--------------------------+---------+------+---------------+---------------+ +``` ##### CockroachDB DML Execution Plan From cdf34f437d6efc757710a240f132b967834d5437 Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Tue, 9 Aug 2022 15:34:24 +0800 Subject: [PATCH 12/29] fix typo Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index a77050b05f3ac..5675f2059272b 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -284,7 +284,7 @@ test> show warnings; ### DML On reference Table -On reference Table Insert Or Update: +On reference Table Delete Or Update: 1. modify related child table row by referential action: - `CASCADE`: update/delete related child table row. From 224a5044dcdc6b214ddd136e3b539409b6aa9bce Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Wed, 10 Aug 2022 11:12:39 +0800 Subject: [PATCH 13/29] refine doc Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 36 +++++++++++++-------------- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 5675f2059272b..a7b6099715018 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -5,7 +5,7 @@ ## Abstract -This proposes an implementation of supporting foreign key constrain. +This proposes an implementation of supporting foreign key constraint. ## DDL Technical Design @@ -15,12 +15,12 @@ Table's foreign key information will be stored in `model.TableInfo`: ```go // TableInfo provides meta data describing a DB table. type TableInfo struct { - ... - ForeignKeys []*FKInfo `json:"fk_info"` - ReferredForeignKeys []*ReferredFKInfo `json:"cited_fk_info"` - // MaxFKIndexID uses to allocate foreign key ID. Or use MaxConstraintID instead. + ... + ForeignKeys []*FKInfo `json:"fk_info"` + ReferredForeignKeys []*ReferredFKInfo `json:"referred_fk_info"` + // MaxFKIndexID uses to allocate foreign key ID. Or use MaxConstraintID instead. MaxFKIndexID int64 `json:"max_fk_idx_id"` - ... + ... } // FKInfo provides meta data describing a foreign key constraint. @@ -50,18 +50,18 @@ type ReferredFKInfo struct { Struct `FKInfo` uses for child table to record the referenced parent table. Struct `FKInfo` has existed for a long time, I just added some fields. - `Version`: uses to distinguish between old and new versions. -Struct `ReferredFKInfo` uses for referenced table to record the child table which referenced me. +Struct `ReferredFKInfo` is used to record the tables that are referencing the current table. ### Create Table with Foreign Key #### Build TableInfo -When build `TableInfo`, auto create an index for foreign key columns if there is no index cover foreign key columns. Here is an example: +When build `TableInfo`, an index for the foreign key columns is created automatically if there is no index covering the foreign key columns. Here is an example: ```sql -mysql>create table t (id int key,a int, foreign key fk(a) references t(id)); +mysql> create table t (id int key, a int, foreign key fk(a) references t(id)); Query OK, 0 rows affected -mysql>show create table t\G +mysql> show create table t\G ***************************[ 1. row ]*************************** Table | t Create Table | CREATE TABLE `t` ( @@ -74,26 +74,26 @@ Create Table | CREATE TABLE `t` ( 1 row in set ``` -As you can see, the index `fk` is auto create for foreign key usage. +As you can see, the index `fk` is created automatically for the foreign key. #### Validate -Create a table with foreign key, check following condition when build DDL job and DDL owner received DDL job(aka Double-Check): +Create a table with foreign key, check the following conditions when a DDL job is built and the DDL owner received a DDL job(aka Double-Check): - whether the user have `REFERENCES` privilege to the foreign key references table. - Corresponding columns in the foreign key and the referenced key must have similar data types. The size and sign of fixed precision types such as INTEGER and DECIMAL must be the same. The length of string types need not be the same. For nonbinary (character) string columns, the character set and collation must be the same. - Supports foreign key references between one column and another within a table. (A column cannot have a foreign key reference to itself.) -- Require indexes on foreign keys and referenced keys so that foreign key checks can be fast and not require a table scan. The size and sign of fixed precision types such as INTEGER and DECIMAL must be the same. The length of string types need not be the same. For nonbinary (character) string columns, the character set and collation must be the same. -- Index prefixes on foreign key columns are not supported. Consequently, BLOB and TEXT columns cannot be included in a foreign key because indexes on those columns must always include a prefix length. +- Indexes on foreign keys and referenced keys are required, because the foreign key check can be fast and do not require a table scan. The size and sign of fixed precision types such as INTEGER and DECIMAL must be the same. The length of string types need not be the same. For nonbinary (character) string columns, the character set and collation must be the same. +- Prefixed indexes on foreign key columns are not supported. Consequently, BLOB and TEXT columns cannot be included in a foreign key because indexes on those columns must always include a prefix length. - Does not currently support foreign keys for tables with user-defined partitioning. This includes both reference and child tables. -- A foreign key constraint cannot reference a virtual generated column, but stored generated column is ok. +- A foreign key constraint cannot reference any virtual generated columns, but the stored generated columns are fine. #### Handle In DDL Owner When DDL Owner handle create table job, DDL owner need to create a new table, and update the reference tables information. At the same point in time, there may be two versions of the schema in the TiDB cluster, so we can't create new table and -update all reference tables in one schema version, since this may break foreign key constrain, such as delete reference table +update all reference tables in one schema version, since this may break foreign key constraint, such as delete reference table without foreign key constrain check in child table. ```sql @@ -109,7 +109,7 @@ So, when create a table with foreign key, we need multi-schema version change: 1. None -> Write Only: Create table with state is `write-only`, and update all reference tables info. 2. Write Only -> Done: Update the new created table state to `public`. -In step-1, we need update some table info in one schema-version. Technically, we can implement is since we already support `ActionCreateTables` DDL job. +In step-1, we need update some table info in one schema-version. Technically, we can implement it since we already support `ActionCreateTables` DDL job. ### Alter Table Add Foreign Key @@ -125,7 +125,7 @@ Just like create table, we should validate first, and return error if the condit When build `TableInfo`, we need to auto create an index for foreign key columns if there is no index cover foreign key columns. And this is divides the problem into two cases: -- Case-1: No need to auto create index, and only add foreign key constrain. +- Case-1: No need to create index automatically, and only add foreign key constraint. - Case-2: Need auto create index for foreign key #### Case-1: Only add foreign key constrain From ea2deca6ca02bfd22a9bcb8749f16a99546b5b81 Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Tue, 16 Aug 2022 17:56:19 +0800 Subject: [PATCH 14/29] tiny update ddl meta design Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 34 ++++++++++++++++++++------- 1 file changed, 25 insertions(+), 9 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index a7b6099715018..e5b71e2d88e49 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -18,8 +18,8 @@ type TableInfo struct { ... ForeignKeys []*FKInfo `json:"fk_info"` ReferredForeignKeys []*ReferredFKInfo `json:"referred_fk_info"` - // MaxFKIndexID uses to allocate foreign key ID. Or use MaxConstraintID instead. - MaxFKIndexID int64 `json:"max_fk_idx_id"` + // MaxFKIndexID uses to allocate foreign key ID. + MaxForeignKeyID int64 `json:"max_fk_id"` ... } @@ -27,23 +27,22 @@ type TableInfo struct { type FKInfo struct { ID int64 `json:"id"` Name CIStr `json:"fk_name"` - RefSchemaID CIStr `json:"ref_schema_id"` - RefTableID CIStr `json:"ref_table_id"` + RefSchema CIStr `json:"ref_schema"` + RefTable CIStr `json:"ref_table"` RefCols []CIStr `json:"ref_cols"` Cols []CIStr `json:"cols"` OnDelete int `json:"on_delete"` OnUpdate int `json:"on_update"` State SchemaState `json:"state"` - // Deprecated - RefTable CIStr `json:"ref_table"` + Version int `json:"version"` } // ReferredFKInfo provides the referred foreign key in the child table. type ReferredFKInfo struct { Cols []CIStr `json:"cols"` - ChildSchemaID int64 `json:"child_schema_id"` - ChildTableID int64 `json:"child_table_id"` - ChildFKIndex CIStr `json:"child_fk_index"` + ChildSchema CIStr `json:"child_schema"` + ChildTable CIStr `json:"child_table"` + ChildFKName CIStr `json:"child_fk"` } ``` @@ -52,6 +51,23 @@ Struct `FKInfo` uses for child table to record the referenced parent table. Stru - `Version`: uses to distinguish between old and new versions. Struct `ReferredFKInfo` is used to record the tables that are referencing the current table. +Why `FKInfo` and `ReferredFKInfo` record the table/schema name instead of table/schema id? Because we may don't know the table/schema id when build `FKInfo`. Here is an example: + +```sql +>set @@foreign_key_checks=0; +Query OK, 0 rows affected +>create table t2 (a int key, foreign key fk(a) references t1(id)); +Query OK, 0 rows affected +>create table t1 (id int key); +Query OK, 0 rows affected +>set @@foreign_key_checks=1; +Query OK, 0 rows affected +>insert into t2 values (1); +(1452, 'Cannot add or update a child row: a foreign key constraint fails (`test`.`t2`, CONSTRAINT `t2_ibfk_1` FOREIGN KEY (`a`) REFERENCES `t1` (`id`))') +``` + +As you can see, the table `t2` is refer to table `t1`, and when create table `t2`, the table `t1` still not be created, so we can't know the id of table `t1`. + ### Create Table with Foreign Key #### Build TableInfo From 8dd2bf788ed2b6c802fc737976f534f93fcae8f5 Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Wed, 17 Aug 2022 13:25:28 +0800 Subject: [PATCH 15/29] update ddl create table design Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 32 ++++++++++++++++++++++++--- 1 file changed, 29 insertions(+), 3 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index e5b71e2d88e49..63f55fc8939d1 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -122,10 +122,36 @@ delete from t_reference where id = 1; --Since doesn't know foreign key informati So, when create a table with foreign key, we need multi-schema version change: -1. None -> Write Only: Create table with state is `write-only`, and update all reference tables info. -2. Write Only -> Done: Update the new created table state to `public`. +1. None -> Write Only: Create table with state is `write-only`. +2. Write Only -> Done: Update the created table state to `public`. -In step-1, we need update some table info in one schema-version. Technically, we can implement it since we already support `ActionCreateTables` DDL job. +#### Maintain ReferredFKInfo + +Why need to maintain `ReferredFKInfo` in reference table? When execute `UPDATE`/`DELETE` in reference table, we need the `ReferredFKInfo` of reference table to do foreign key check/cascade. + +How to maintain `ReferredFKInfo` in reference table? When we create table with foreign key, we didn't add `ReferredFKInfo` into reference table, because the reference table may not have been created yet, +when `foreign_key_checks` variable value is `OFF`, the user can create child table before reference table. + +We decided to maintain `ReferredFKInfo` while TiDB loading schema. At first, `infoSchema` will record all table's `ReferredFKInfo`: + +```go +type infoSchema struct { + // referredForeignKeyMap records all table's ReferredFKInfo. + // referredSchemaAndTableName => child SchemaAndTableAndForeignKeyName => *model.ReferredFKInfo + referredForeignKeyMap map[SchemaAndTableName]map[SchemaAndTableAndForeignKeyName]*model.ReferredFKInfo +} +``` + +Function `applyTableUpdate` uses `applyDropTable` to drop the old table, uses `applyCreateTable` to create the new table. + +In function `applyDropTable`, we will delete the table's foreign key information from `infoSchema.referredForeignKeyMap`. + +In function `applyCreateTable`, we will add the table's foreign key information into `infoSchema.referredForeignKeyMap` first, +then get the table's `ReferredFKInfo` by schema name and table name, then store the `ReferredFKInfo` into `TableInfo.ReferredForeignKeys`. + +Then `applyTableUpdate` will also need to reload the old/new table's referred table information, also uses `applyDropTable` to drop the old reference table, use `applyCreateTable` to create new reference table. + +That's all. ### Alter Table Add Foreign Key From 3f6439256445e08c9e58bd121588e37088c56c7c Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Wed, 17 Aug 2022 13:32:58 +0800 Subject: [PATCH 16/29] fix typo Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 63f55fc8939d1..73c47d030fa4e 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -106,7 +106,7 @@ Create a table with foreign key, check the following conditions when a DDL job i #### Handle In DDL Owner -When DDL Owner handle create table job, DDL owner need to create a new table, and update the reference tables information. +When DDL Owner handle create table job, DDL owner need to create a new table. At the same point in time, there may be two versions of the schema in the TiDB cluster, so we can't create new table and update all reference tables in one schema version, since this may break foreign key constraint, such as delete reference table From 8eb66ba3ffb9671368b9278868230984b0794f0b Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Mon, 5 Sep 2022 10:56:24 +0800 Subject: [PATCH 17/29] update design Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 34 +++++++++++++++++---------- 1 file changed, 22 insertions(+), 12 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 73c47d030fa4e..c1eedabdaace5 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -17,7 +17,6 @@ Table's foreign key information will be stored in `model.TableInfo`: type TableInfo struct { ... ForeignKeys []*FKInfo `json:"fk_info"` - ReferredForeignKeys []*ReferredFKInfo `json:"referred_fk_info"` // MaxFKIndexID uses to allocate foreign key ID. MaxForeignKeyID int64 `json:"max_fk_id"` ... @@ -36,22 +35,12 @@ type FKInfo struct { State SchemaState `json:"state"` Version int `json:"version"` } - -// ReferredFKInfo provides the referred foreign key in the child table. -type ReferredFKInfo struct { - Cols []CIStr `json:"cols"` - ChildSchema CIStr `json:"child_schema"` - ChildTable CIStr `json:"child_table"` - ChildFKName CIStr `json:"child_fk"` -} - ``` Struct `FKInfo` uses for child table to record the referenced parent table. Struct `FKInfo` has existed for a long time, I just added some fields. - `Version`: uses to distinguish between old and new versions. -Struct `ReferredFKInfo` is used to record the tables that are referencing the current table. -Why `FKInfo` and `ReferredFKInfo` record the table/schema name instead of table/schema id? Because we may don't know the table/schema id when build `FKInfo`. Here is an example: +Why `FKInfo` record the table/schema name instead of table/schema id? Because we may don't know the table/schema id when build `FKInfo`. Here is an example: ```sql >set @@foreign_key_checks=0; @@ -134,6 +123,16 @@ when `foreign_key_checks` variable value is `OFF`, the user can create child tab We decided to maintain `ReferredFKInfo` while TiDB loading schema. At first, `infoSchema` will record all table's `ReferredFKInfo`: +```sql +// ReferredFKInfo provides the referred foreign key in the child table. +type ReferredFKInfo struct { + Cols []CIStr `json:"cols"` + ChildSchema CIStr `json:"child_schema"` + ChildTable CIStr `json:"child_table"` + ChildFKName CIStr `json:"child_fk"` +} +``` + ```go type infoSchema struct { // referredForeignKeyMap records all table's ReferredFKInfo. @@ -456,6 +455,17 @@ func (a *ExecStmt) handleForeignKeyTrigger(ctx context.Context, e Executor, isPe ### Issue need to be discussed +#### Impact when upgrading + +Create table with foreign key requires a multi-step state change(none -> write-only -> public), +when the table's state changes from write-only to public, infoSchema need to drop the old table +which state is write-only, otherwise, infoSchema.sortedTablesBuckets will contain 2 table both +have the same ID, but one state is write-only, another table's state is public, it's unexpected. + +Warning, this change will break the compatibility if execute create table with foreign key DDL when upgrading TiDB, +since old-version TiDB doesn't know to delete the old table, Since the cluster-index feature also has similar problem, +we chose to prevent DDL execution during the upgrade process to avoid this issue. + #### Affect Row related article: https://www.percona.com/blog/hidden-cost-of-foreign-key-constraints-in-MySQL/ From 9e5e127cdf82685946b191c8cf177b1d2eb71e54 Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Wed, 7 Sep 2022 12:39:38 +0800 Subject: [PATCH 18/29] update design doc Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index c1eedabdaace5..0aaef12ab4208 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -298,6 +298,12 @@ When TiDB add pessimistic locks, if relate row exist in reference table, also ne Just like `SELECT FOR UPDATE` statement, need to use `doLockKeys` to lock the related row in the reference table. +##### Issue + +TiDB current only support `lock for update`(aka write-lock, such as `select for update`), doesn't support `lock for share`(aka read-lock, such as `select for share`). + +So far we have to add `lock for update` in reference table when insert/update child table, then the performance will be poor. After TiDB support `lock for share`, we should use `lock for share` instead. + #### DML Load data Load data should also do foreign key check, but report warning instead error: From bd1fab77061295c5a65a698681fcb40852f48fab Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Tue, 20 Sep 2022 17:41:36 +0800 Subject: [PATCH 19/29] add drop database check Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 0aaef12ab4208..416958099f1fd 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -200,6 +200,15 @@ If `foreign_key_checks` is `ON`, then drop the table which has foreign key refer (3730, "Cannot drop table 't1' referenced by a foreign key constraint 't2_ibfk_1' on table 't2'.") ``` +### Drop Database + +If `foreign_key_checks` is `ON`, then drop the database which has foreign key references by other database will be rejected. + +```sql +> drop database test; +(3730, "Cannot drop table 't1' referenced by a foreign key constraint 't2_ibfk_1' on table 't2'.") +``` + ### Drop Index Drop index which used by foreign key will be rejected. From 0006030669188b6eb30e3b696086ad809664375d Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Mon, 17 Oct 2022 17:57:36 +0800 Subject: [PATCH 20/29] update design doc Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 416958099f1fd..fcd59a72595d9 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -173,7 +173,7 @@ And this is divides the problem into two cases: The DDL owner handle add foreign key constrain step is: -1. None -> Write Only: add foreign key constrain which state is `write-only` into table and update the reference table info. +1. None -> Write Only: add foreign key constrain which state is `write-only` into table. 3. Write Only - Write Reorg: check all row in the table whether has related foreign key exists in reference table, we can use following SQL to check: ```sql select count(*) from t2 where t2.a not in (select id from t1); @@ -181,7 +181,28 @@ The DDL owner handle add foreign key constrain step is: The expected result is `0`, otherwise, an error is returned and cancel the ddl job. 4. Write Reorg -> Public: update the foreign key constrain state to `public`. -DML should also check the foreign key constrain which state is `write-only` +A problem is, How the DML treat the foreign key with on delete/update cascade behaviour which state is non-public? +Here is an example: + +```sql +create table t1 (id int key,a int, index(a)); +create table t2 (id int key,a int, index(a)); +insert into t1 values (1,1); +insert into t2 values (1,1); +alter table t2 add constraint fk_1 foreign key (a) references t1(id) ON DELETE CASCADE; +``` + +The schema change of foreign key `fk_1` is from `None` -> `Write-Only` -> `Write-Reorg` -> `Public`。 +When the foreign key `fk_1` in `Write-Only` state, a DML request has come to be processed: + +```sql +delete from t1 where id = 1; +``` + +Then, TiDB shouldn't do cascade delete for foreign key `fk_1` in state `Write-Only`, since the `Add Foreign Key` DDL job maybe +failed in `Write-Reorg` state and rollback the DDL job. But it is hard to rollback the cascade deleted executed before. + +So, when execute DML with `non-publick` foreign key, TiDB will do foreign key constraint check instead of foreign key cascade behaviour. #### Case-2: Auto create index for foreign key and add foreign key constrain From 8c4290f6cdf587f8056cd9873c2d9d982d63f7af Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Fri, 4 Nov 2022 11:11:15 +0800 Subject: [PATCH 21/29] update Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 144 +++----------------------- 1 file changed, 13 insertions(+), 131 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index fcd59a72595d9..57e8a39edac8a 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -176,9 +176,9 @@ The DDL owner handle add foreign key constrain step is: 1. None -> Write Only: add foreign key constrain which state is `write-only` into table. 3. Write Only - Write Reorg: check all row in the table whether has related foreign key exists in reference table, we can use following SQL to check: ```sql - select count(*) from t2 where t2.a not in (select id from t1); + select 1 from t2 where t2.a is not null and t2.a not in (select id from t1) limit 1; ``` - The expected result is `0`, otherwise, an error is returned and cancel the ddl job. + The expected result is `empty`, otherwise, an error is returned and cancel the ddl job. 4. Write Reorg -> Public: update the foreign key constrain state to `public`. A problem is, How the DML treat the foreign key with on delete/update cascade behaviour which state is non-public? @@ -374,134 +374,8 @@ modify related child table row by following step: 2. get the child table fk index's column info. 3. build update executor to update child table rows. -There is an pseudocode about `DELETE` on reference table: - -1. Pseudocode in build plan stage. - -```go -type Delete struct { - ... - // table-id => foreign key triggers. - FKTriggers map[int64][]*ForeignKeyTrigger -} - -type ForeignKeyTrigger struct { - Tp FKTriggerType - OnModifyReferredTable *OnModifyReferredTableFKInfo - OnModifyChildTable *OnModifyChildTableFKInfo -} - -type OnModifyReferredTableFKInfo struct { - ReferredFK *model.ReferredFKInfo - ChildTable table.Table - FK *model.FKInfo -} - -const ( - FKTriggerOnDeleteReferredTable FKTriggerType = 1 - FKTriggerOnUpdateReferredTable FKTriggerType = 2 - FKTriggerOnInsertOrUpdateChildTable FKTriggerType = 3 -) - -func (b *PlanBuilder) buildDelete(ctx context.Context, ds *ast.DeleteStmt) (Plan, error) { - .... - del.FKTriggers = buildOnDeleteForeignKeyTrigger() -} -``` - -2. Pseudocode in build executor. - -```go -type DeleteExec struct { - fkTriggerExecs map[int64][]*ForeignKeyTriggerExec -} - -type ForeignKeyTriggerExec struct { - fkTrigger *plannercore.ForeignKeyTrigger - colsOffsets []int - fkValues [][]types.Datum -} - -func (b *executorBuilder) buildDelete(v *plannercore.Delete) Executor { - ... - deleteExec.fkTriggerExecs = b.buildTblID2ForeignKeyTriggerExecs() -} -``` - -3. Pseudocode in `Delete` execution. - -```go -func (e *DeleteExec) removeRow(...){ - fkTriggerExecs := e.fkTriggerExecs[t.Meta().ID] - sc := ctx.GetSessionVars().StmtCtx - for _, fkt := range fkTriggerExecs { - err = fkt.addRowNeedToTrigger(sc, row) - } -} - -func (fkt *ForeignKeyTriggerExec) addRowNeedToTrigger(sc *stmtctx.StatementContext, row []types.Datum) error { - vals, err := fetchFKValues(row, fkt.colsOffsets) - if err != nil || hasNullValue(vals) { - return err - } - fkt.fkValues = append(fkt.fkValues, vals) - return nil -} -``` - -4. Pseudocode after `Delete` execution. - -```go -func (a *ExecStmt) Exec(){ - e, err := a.buildExecutor() - e.Open() - f handled, result, err := a.handleNoDelay(ctx, e, isPessimistic); handled { - err = a.handleForeignKeyTrigger(ctx, e, isPessimistic) - return result, err - } -} - -func (a *ExecStmt) handleForeignKeyTrigger(ctx context.Context, e Executor, isPessimistic bool) error { - fkTriggerExecs := e.GetForeignKeyTriggerExecs() - for i := 0; i < len(fkTriggerExecs); i++{ - fkt := fkTriggerExecs[i] - e, err := fkt.buildExecutor(ctx) - if err != nil { - return err - } - if e == nil { - continue - } - if err := e.Open(ctx); err != nil { - terror.Call(e.Close) - return err - } - _, _, err = a.handleNoDelay(ctx, e, isPessimistic) - if err != nil { - return err - } - err = a.handleForeignKeyTrigger(ctx, e, isPessimistic) - if err != nil { - return err - } - } - return nil -} -``` - ### Issue need to be discussed -#### Impact when upgrading - -Create table with foreign key requires a multi-step state change(none -> write-only -> public), -when the table's state changes from write-only to public, infoSchema need to drop the old table -which state is write-only, otherwise, infoSchema.sortedTablesBuckets will contain 2 table both -have the same ID, but one state is write-only, another table's state is public, it's unexpected. - -Warning, this change will break the compatibility if execute create table with foreign key DDL when upgrading TiDB, -since old-version TiDB doesn't know to delete the old table, Since the cluster-index feature also has similar problem, -we chose to prevent DDL execution during the upgrade process to avoid this issue. - #### Affect Row related article: https://www.percona.com/blog/hidden-cost-of-foreign-key-constraints-in-MySQL/ @@ -524,7 +398,7 @@ Query OK, 1 row affected As you can see, in the query result, the affected row is 1, but actually should be 2, since the related row in t2 is also been deleted. -This is a MySQL issue, do we need to be compatible with it? +This is a MySQL behaviour, We can be compatible with it. #### DML Execution plan @@ -560,7 +434,7 @@ Here is an example plan with foreign key: +--------------------------+---------+------+---------------+---------------+ | Delete_4 | N/A | root | | N/A | | └─Point_Get_6 | 1.00 | root | table:t1 | handle:1 | -| └─Foreign_Key_Trigger | N/A | root | table:t2 | cascade | +| └─Foreign_Key_Cascade | N/A | root | table:t2 | fk: fk_id | +--------------------------+---------+------+---------------+---------------+ ``` @@ -662,11 +536,19 @@ which implemention is use following SQL to check: select count(*) from t where t.a not in (select id from t_refer); ``` +Also check the foreign key is valid: +- the foreign key state should be public. +- the reference whether exist. +- whether has index for the foreign key to use. + ## Impact ### Impact of data replication -todo +Test sync table with foreign key: +- TiCDC +- DM +- BR ## Test Case From dc6a75898fa2841c6390f5868d8bd5be4e68cd87 Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Fri, 2 Dec 2022 12:43:39 +0800 Subject: [PATCH 22/29] update Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 88 ++++++++++++++++----------- 1 file changed, 52 insertions(+), 36 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 57e8a39edac8a..0c0b61815ada0 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -38,7 +38,7 @@ type FKInfo struct { ``` Struct `FKInfo` uses for child table to record the referenced parent table. Struct `FKInfo` has existed for a long time, I just added some fields. - - `Version`: uses to distinguish between old and new versions. + - `Version`: uses to distinguish between old and new versions. The new version value is 1, the old version value is 0. Why `FKInfo` record the table/schema name instead of table/schema id? Because we may don't know the table/schema id when build `FKInfo`. Here is an example: @@ -429,13 +429,28 @@ Here is an example plan with foreign key: ```sql > explain delete from t1 where id = 1; -+--------------------------+---------+------+---------------+---------------+ -| id | estRows | task | access object | operator info | -+--------------------------+---------+------+---------------+---------------+ -| Delete_4 | N/A | root | | N/A | -| └─Point_Get_6 | 1.00 | root | table:t1 | handle:1 | -| └─Foreign_Key_Cascade | N/A | root | table:t2 | fk: fk_id | -+--------------------------+---------+------+---------------+---------------+ ++-------------------------+---------+------+---------------+-----------------------------------+ +| id | estRows | task | access object | operator info | ++-------------------------+---------+------+---------------+-----------------------------------+ +| Delete_2 | N/A | root | | N/A | +| ├─Point_Get_1 | 1.00 | root | table:t1 | handle:1 | +| └─Foreign_Key_Cascade_3 | 0.00 | root | table:t2 | foreign_key:fk, on_delete:CASCADE | ++-------------------------+---------+------+---------------+-----------------------------------+ +``` + +And the `explain analyze` will show the foreign key cascade child plan: + +```sql +> explain analyze delete from t1 where id = 1; ++-------------------------+---------+---------+------+---------------+----------------------------------------------------------+-----------------------------------+-----------+------+ +| id | estRows | actRows | task | access object | execution info | operator info | memory | disk | ++-------------------------+---------+---------+------+---------------+----------------------------------------------------------+-----------------------------------+-----------+------+ +| Delete_2 | N/A | 0 | root | | time:109.5µs, loops:1 | N/A | 380 Bytes | N/A | +| ├─Point_Get_1 | 1.00 | 1 | root | table:t1 | time:62.7µs, loops:2, Get:{num_rpc:1, total_time:26.4µs} | handle:1 | N/A | N/A | +| └─Foreign_Key_Cascade_3 | 0.00 | 0 | root | table:t2 | total:322.1µs, foreign_keys:1 | foreign_key:fk, on_delete:CASCADE | N/A | N/A | +| └─Delete_7 | N/A | 0 | root | | time:23.5µs, loops:1 | N/A | 129 Bytes | N/A | +| └─Point_Get_9 | 1.00 | 1 | root | table:t2 | time:12.6µs, loops:2, Get:{num_rpc:1, total_time:4.21µs} | handle:1 | N/A | N/A | ++-------------------------+---------+---------+------+---------------+----------------------------------------------------------+-----------------------------------+-----------+------+ ``` ##### CockroachDB DML Execution Plan @@ -443,13 +458,11 @@ Here is an example plan with foreign key: ```sql CREATE TABLE customers_2 ( id INT PRIMARY KEY - ); - +); CREATE TABLE orders_2 ( - id INT PRIMARY KEY, - customer_id INT REFERENCES customers_2(id) ON UPDATE CASCADE ON DELETE CASCADE + id INT PRIMARY KEY, + customer_id INT REFERENCES customers_2(id) ON UPDATE CASCADE ON DELETE CASCADE ); - INSERT INTO customers_2 VALUES (1), (2), (3); INSERT INTO orders_2 VALUES (100,1), (101,2), (102,3), (103,1); ``` @@ -518,41 +531,44 @@ postgres=# explain analyze UPDATE customers_2 SET id = 20 WHERE id = 23; Execution Time: 0.129 ms ``` -## Other Technical Design - -### How to check foreign key integrity? +## Compatibility -How MySQL to do this? Look like MySQL doesn't provide any method, but the user can use stored procedure to do this, see: https://stackoverflow.com/questions/2250775/force-innodb-to-recheck-foreign-keys-on-a-table-tables +Since the old version TiDB already support foreign key syntax, but doesn't support it, after upgrade TiDB to latest version, the foreign key created before won't take effect. Only foreign keys created in the new version actually take effect. -Maybe We can use following syntax to check foreign key integrity: +You can use `SHOW CREATE TABLE` to see whethere a foreign key is take effect, the old version foreign key will have a comment `/* FOREIGN KEY INVALID */` to indicate it is invalid, the new version foreign key doesn't have this comment. ```sql -ADMIN CHECK FOREIGN KEY [table_name] [foreign_key_name] +> show create table t2\G + ***************************[ 1. row ]*************************** + Table | t2 +Create Table | CREATE TABLE `t2` ( + `id` int(11) NOT NULL, + PRIMARY KEY (`id`) /*T![clustered_index] CLUSTERED */, + CONSTRAINT `fk` FOREIGN KEY (`id`) REFERENCES `test`.`t1` (`id`) ON DELETE CASCADE ON UPDATE CASCADE /* FOREIGN KEY INVALID */ + ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ``` -which implemention is use following SQL to check: +## Impact -```sql -select count(*) from t where t.a not in (select id from t_refer); -``` +### Impact of data replication -Also check the foreign key is valid: -- the foreign key state should be public. -- the reference whether exist. -- whether has index for the foreign key to use. +#### TiCDC -## Impact +When sync table data to downstream TiDB, TiCDC should `set @@foreign_key_checks=0` in downstream TiDB. -### Impact of data replication +#### DM + +When do full synchronization, DM should `set @@foreign_key_checks=0` in downstream TiDB. + +When do incremental synchronization, DM should set `foreign_key_checks` session variable according to MySQL binlog. + +#### BR -Test sync table with foreign key: -- TiCDC -- DM -- BR +When sync table data to downstream TiDB, TiCDC should `set @@foreign_key_checks=0` in downstream TiDB. ## Test Case -cascade modification test case: +Cascade modification test case: ```sql drop table if exists t3,t2,t1; @@ -565,7 +581,7 @@ insert into t3 values (3,2); delete from t1 where id = 1; -- both t1, t2, t3 rows are deleted. ``` -following is a MySQL test case about `SET DEFAULT`: +Following is a MySQL test case about `SET DEFAULT`, as you can see, MySQL actualy doesn't support `SET DEFAULT`, the behaviour is just like `RESTRICT` ```sql MySQL>create table t1 (a int,b int, index(a,b)) ; @@ -682,7 +698,7 @@ test> insert into t values (1,null); Query OK, 1 row affected ``` -### reference +## reference - [MySQL FOREIGN KEY Constraints Document](https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html#foreign-key-adding) - [3 Common Foreign Key Mistakes (And How to Avoid Them)](https://www.cockroachlabs.com/blog/common-foreign-key-mistakes/) From d2f7d6f4583a38e2b6af481e15603f862a4c8d22 Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Mon, 12 Dec 2022 15:12:05 +0800 Subject: [PATCH 23/29] update doc Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 0c0b61815ada0..8fdc64fe14998 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -206,11 +206,11 @@ So, when execute DML with `non-publick` foreign key, TiDB will do foreign key co #### Case-2: Auto create index for foreign key and add foreign key constrain -As TiDB support multi-schema change now, we can split this into 2 sub-ddl job. +As TiDB support multi-schema change now, we create a `ActionMultiSchemaChange` job and contains following 2 sub-ddl job. - Add Index DDL job - Add Foreign Key Constrain DDL job -We should do add index DDL job first, after index ddl job finish `write-reorg` and ready for public, then start to do add foreign key constrain ddl job. +When TiDB add foreign key ddl job meet error, TiDB will rollback the `ActionMultiSchemaChange` job and the 2 sub-ddl job will also be rollback. ### Drop Table From 2e51dabaf858d8d8f5fc39de86baf822264f42ed Mon Sep 17 00:00:00 2001 From: crazycs Date: Thu, 15 Dec 2022 14:23:05 +0800 Subject: [PATCH 24/29] Update docs/design/2022-06-22-foreign-key.md Co-authored-by: tangenta --- docs/design/2022-06-22-foreign-key.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 8fdc64fe14998..fc24fc9e089a4 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -112,7 +112,7 @@ delete from t_reference where id = 1; --Since doesn't know foreign key informati So, when create a table with foreign key, we need multi-schema version change: 1. None -> Write Only: Create table with state is `write-only`. -2. Write Only -> Done: Update the created table state to `public`. +2. Write Only -> Public: Update the created table state to `public`. #### Maintain ReferredFKInfo From addf7eb69162d95187384db7cb5cfa9bebe3aafc Mon Sep 17 00:00:00 2001 From: crazycs Date: Thu, 15 Dec 2022 14:24:00 +0800 Subject: [PATCH 25/29] Update docs/design/2022-06-22-foreign-key.md Co-authored-by: tangenta --- docs/design/2022-06-22-foreign-key.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index fc24fc9e089a4..f4e6cc75707c3 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -88,7 +88,7 @@ Create a table with foreign key, check the following conditions when a DDL job i - whether the user have `REFERENCES` privilege to the foreign key references table. - Corresponding columns in the foreign key and the referenced key must have similar data types. The size and sign of fixed precision types such as INTEGER and DECIMAL must be the same. The length of string types need not be the same. For nonbinary (character) string columns, the character set and collation must be the same. - Supports foreign key references between one column and another within a table. (A column cannot have a foreign key reference to itself.) -- Indexes on foreign keys and referenced keys are required, because the foreign key check can be fast and do not require a table scan. The size and sign of fixed precision types such as INTEGER and DECIMAL must be the same. The length of string types need not be the same. For nonbinary (character) string columns, the character set and collation must be the same. +- Indexes on foreign keys and referenced keys are required, because the foreign key check can be fast and do not require a table scan. - Prefixed indexes on foreign key columns are not supported. Consequently, BLOB and TEXT columns cannot be included in a foreign key because indexes on those columns must always include a prefix length. - Does not currently support foreign keys for tables with user-defined partitioning. This includes both reference and child tables. - A foreign key constraint cannot reference any virtual generated columns, but the stored generated columns are fine. From 2bdc88f53e202da6478434764daad3d4cf85d4e7 Mon Sep 17 00:00:00 2001 From: crazycs Date: Thu, 15 Dec 2022 14:26:18 +0800 Subject: [PATCH 26/29] Update docs/design/2022-06-22-foreign-key.md Co-authored-by: tangenta --- docs/design/2022-06-22-foreign-key.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index f4e6cc75707c3..bfa63e7d81c2d 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -202,7 +202,7 @@ delete from t1 where id = 1; Then, TiDB shouldn't do cascade delete for foreign key `fk_1` in state `Write-Only`, since the `Add Foreign Key` DDL job maybe failed in `Write-Reorg` state and rollback the DDL job. But it is hard to rollback the cascade deleted executed before. -So, when execute DML with `non-publick` foreign key, TiDB will do foreign key constraint check instead of foreign key cascade behaviour. +So, when execute DML with `non-public` foreign key, TiDB will do foreign key constraint check instead of foreign key cascade behaviour. #### Case-2: Auto create index for foreign key and add foreign key constrain From d3f40ce01b7b91798c510f7cf121cccfbc39f943 Mon Sep 17 00:00:00 2001 From: crazycs Date: Thu, 15 Dec 2022 14:26:36 +0800 Subject: [PATCH 27/29] Update docs/design/2022-06-22-foreign-key.md Co-authored-by: tangenta --- docs/design/2022-06-22-foreign-key.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index bfa63e7d81c2d..4f3a9fcd3e7b6 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -174,7 +174,7 @@ And this is divides the problem into two cases: The DDL owner handle add foreign key constrain step is: 1. None -> Write Only: add foreign key constrain which state is `write-only` into table. -3. Write Only - Write Reorg: check all row in the table whether has related foreign key exists in reference table, we can use following SQL to check: +3. Write Only -> Write Reorg: check all row in the table whether has related foreign key exists in reference table, we can use following SQL to check: ```sql select 1 from t2 where t2.a is not null and t2.a not in (select id from t1) limit 1; ``` From 42710052aaa2034f768b09d34301c26b343f7fac Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Thu, 15 Dec 2022 14:48:43 +0800 Subject: [PATCH 28/29] refine gramma Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 126 +++++++++++++------------- 1 file changed, 63 insertions(+), 63 deletions(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index f4e6cc75707c3..76e6696548741 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -5,12 +5,12 @@ ## Abstract -This proposes an implementation of supporting foreign key constraint. +This proposes an implementation of supporting foreign key constraints. ## DDL Technical Design ### Table Information Changes -Table's foreign key information will be stored in `model.TableInfo`: +The table's foreign key information will be stored in `model.TableInfo`: ```go // TableInfo provides meta data describing a DB table. @@ -37,10 +37,10 @@ type FKInfo struct { } ``` -Struct `FKInfo` uses for child table to record the referenced parent table. Struct `FKInfo` has existed for a long time, I just added some fields. - - `Version`: uses to distinguish between old and new versions. The new version value is 1, the old version value is 0. +Struct `FKInfo` uses for the child table to record the referenced parent table. Struct `FKInfo` has existed for a long time, I just added some fields. +- `Version`: uses to distinguish between old and new versions. The new version value is 1, the old version value is 0. -Why `FKInfo` record the table/schema name instead of table/schema id? Because we may don't know the table/schema id when build `FKInfo`. Here is an example: +Why `FKInfo` record the table/schema name instead of table/schema id? Because we may don't know the table/schema id when building `FKInfo`. Here is an example: ```sql >set @@foreign_key_checks=0; @@ -55,13 +55,13 @@ Query OK, 0 rows affected (1452, 'Cannot add or update a child row: a foreign key constraint fails (`test`.`t2`, CONSTRAINT `t2_ibfk_1` FOREIGN KEY (`a`) REFERENCES `t1` (`id`))') ``` -As you can see, the table `t2` is refer to table `t1`, and when create table `t2`, the table `t1` still not be created, so we can't know the id of table `t1`. +As you can see, table `t2` refers to table `t1`, and when creating table `t2`, table `t1` still not be created, so we can't know the id of table `t1`. ### Create Table with Foreign Key #### Build TableInfo -When build `TableInfo`, an index for the foreign key columns is created automatically if there is no index covering the foreign key columns. Here is an example: +When building `TableInfo`, an index for the foreign key columns is created automatically if there is no index covering the foreign key columns. Here is an example: ```sql mysql> create table t (id int key, a int, foreign key fk(a) references t(id)); @@ -85,21 +85,21 @@ As you can see, the index `fk` is created automatically for the foreign key. Create a table with foreign key, check the following conditions when a DDL job is built and the DDL owner received a DDL job(aka Double-Check): -- whether the user have `REFERENCES` privilege to the foreign key references table. +- whether the user has `REFERENCES` privilege to the foreign key references table. - Corresponding columns in the foreign key and the referenced key must have similar data types. The size and sign of fixed precision types such as INTEGER and DECIMAL must be the same. The length of string types need not be the same. For nonbinary (character) string columns, the character set and collation must be the same. - Supports foreign key references between one column and another within a table. (A column cannot have a foreign key reference to itself.) - Indexes on foreign keys and referenced keys are required, because the foreign key check can be fast and do not require a table scan. - Prefixed indexes on foreign key columns are not supported. Consequently, BLOB and TEXT columns cannot be included in a foreign key because indexes on those columns must always include a prefix length. -- Does not currently support foreign keys for tables with user-defined partitioning. This includes both reference and child tables. +- Does not currently support foreign keys for the partition table. This includes both reference and child tables. - A foreign key constraint cannot reference any virtual generated columns, but the stored generated columns are fine. #### Handle In DDL Owner -When DDL Owner handle create table job, DDL owner need to create a new table. +When the DDL Owner handle create table job, the DDL owner needs to create a new table. -At the same point in time, there may be two versions of the schema in the TiDB cluster, so we can't create new table and -update all reference tables in one schema version, since this may break foreign key constraint, such as delete reference table -without foreign key constrain check in child table. +At the same point in time, there may be two versions of the schema in the TiDB cluster, so we can't create a new table and +update all reference tables in one schema version, since this may break foreign key constraints, such as delete reference table +without foreign key constraint check in the child table. ```sql -- In TiDB-1 and Schema Version is 1 @@ -109,17 +109,17 @@ insert into t_has_foreign_key values (1, 1); delete from t_reference where id = 1; --Since doesn't know foreign key information in old version, so doesn't do foreign key constrain check. ``` -So, when create a table with foreign key, we need multi-schema version change: +So, when creating a table with foreign key, we need multi-schema version change: -1. None -> Write Only: Create table with state is `write-only`. +1. None -> Write Only: Create table with the state is `write-only`. 2. Write Only -> Public: Update the created table state to `public`. #### Maintain ReferredFKInfo -Why need to maintain `ReferredFKInfo` in reference table? When execute `UPDATE`/`DELETE` in reference table, we need the `ReferredFKInfo` of reference table to do foreign key check/cascade. +Why need to maintain `ReferredFKInfo` in the reference table? When executing `UPDATE`/`DELETE` in the reference table, we need the `ReferredFKInfo` of the reference table to do a foreign key check/cascade. -How to maintain `ReferredFKInfo` in reference table? When we create table with foreign key, we didn't add `ReferredFKInfo` into reference table, because the reference table may not have been created yet, -when `foreign_key_checks` variable value is `OFF`, the user can create child table before reference table. +How to maintain `ReferredFKInfo` in the reference table? When we create table with foreign key, we didn't add `ReferredFKInfo` into the reference table, because the reference table may not have been created yet, +when `foreign_key_checks` variable value is `OFF`, the user can create a child table before the reference table. We decided to maintain `ReferredFKInfo` while TiDB loading schema. At first, `infoSchema` will record all table's `ReferredFKInfo`: @@ -143,9 +143,9 @@ type infoSchema struct { Function `applyTableUpdate` uses `applyDropTable` to drop the old table, uses `applyCreateTable` to create the new table. -In function `applyDropTable`, we will delete the table's foreign key information from `infoSchema.referredForeignKeyMap`. +In the function `applyDropTable`, we will delete the table's foreign key information from `infoSchema.referredForeignKeyMap`. -In function `applyCreateTable`, we will add the table's foreign key information into `infoSchema.referredForeignKeyMap` first, +In the function `applyCreateTable`, we will add the table's foreign key information into `infoSchema.referredForeignKeyMap` first, then get the table's `ReferredFKInfo` by schema name and table name, then store the `ReferredFKInfo` into `TableInfo.ReferredForeignKeys`. Then `applyTableUpdate` will also need to reload the old/new table's referred table information, also uses `applyDropTable` to drop the old reference table, use `applyCreateTable` to create new reference table. @@ -162,26 +162,26 @@ create table t2 (id int key,a int); alter table t2 add foreign key fk(a) references t1(id) ON DELETE CASCADE; ``` -Just like create table, we should validate first, and return error if the conditions for creating foreign keys are not met, and also need to do double-check. +Just like create table, we should validate first, and return an error if the conditions for creating foreign keys are not met, and also need to double-check. -When build `TableInfo`, we need to auto create an index for foreign key columns if there is no index cover foreign key columns. -And this is divides the problem into two cases: -- Case-1: No need to create index automatically, and only add foreign key constraint. -- Case-2: Need auto create index for foreign key +When building `TableInfo`, we need to auto-create an index for foreign key columns if there are no index cover foreign key columns. +And this divides the problem into two cases: +- Case-1: No need to create an index automatically, and only add the foreign key constraint. +- Case-2: Need auto-create index for foreign key #### Case-1: Only add foreign key constrain -The DDL owner handle add foreign key constrain step is: +The DDL owner handle adds foreign key constrain step is: -1. None -> Write Only: add foreign key constrain which state is `write-only` into table. -3. Write Only - Write Reorg: check all row in the table whether has related foreign key exists in reference table, we can use following SQL to check: +1. None -> Write Only: add foreign key constraint which state is `write-only` into the table. +3. Write Only - Write Reorg: check all rows in the table whether has related foreign key exists in the reference table, we can use the following SQL to check: ```sql select 1 from t2 where t2.a is not null and t2.a not in (select id from t1) limit 1; ``` - The expected result is `empty`, otherwise, an error is returned and cancel the ddl job. -4. Write Reorg -> Public: update the foreign key constrain state to `public`. + The expected result is `empty`, otherwise, an error is returned and cancels the DDL job. +4. Write Reorg -> Public: update the foreign key constraint state to `public`. -A problem is, How the DML treat the foreign key with on delete/update cascade behaviour which state is non-public? +A problem is, How the DML treat the foreign key on delete/update cascade behaviour in which state is non-public? Here is an example: ```sql @@ -202,15 +202,15 @@ delete from t1 where id = 1; Then, TiDB shouldn't do cascade delete for foreign key `fk_1` in state `Write-Only`, since the `Add Foreign Key` DDL job maybe failed in `Write-Reorg` state and rollback the DDL job. But it is hard to rollback the cascade deleted executed before. -So, when execute DML with `non-publick` foreign key, TiDB will do foreign key constraint check instead of foreign key cascade behaviour. +So, when executing DML with `non-public` foreign key, TiDB will do a foreign key constraint check instead of foreign key cascade behaviour. -#### Case-2: Auto create index for foreign key and add foreign key constrain +#### Case-2: Auto-create index for foreign key and add foreign key constrain -As TiDB support multi-schema change now, we create a `ActionMultiSchemaChange` job and contains following 2 sub-ddl job. +As TiDB support multi-schema change now, we create an `ActionMultiSchemaChange` job that contains the following 2 sub-ddl job. - Add Index DDL job -- Add Foreign Key Constrain DDL job +- Add Foreign Key Constraint DDL job -When TiDB add foreign key ddl job meet error, TiDB will rollback the `ActionMultiSchemaChange` job and the 2 sub-ddl job will also be rollback. +When TiDB adds foreign key DDL job meet error, TiDB will rollback the `ActionMultiSchemaChange` job and the 2 sub-ddl job will also be rollback. ### Drop Table @@ -223,7 +223,7 @@ If `foreign_key_checks` is `ON`, then drop the table which has foreign key refer ### Drop Database -If `foreign_key_checks` is `ON`, then drop the database which has foreign key references by other database will be rejected. +If `foreign_key_checks` is `ON`, then drop the database which has foreign key references by another database will be rejected. ```sql > drop database test; @@ -232,10 +232,10 @@ If `foreign_key_checks` is `ON`, then drop the database which has foreign key re ### Drop Index -Drop index which used by foreign key will be rejected. +Drop index used by the foreign key will be rejected. ```sql -> set @@foreign_key_checks=0; -- Even disable foreign_key_checks, you still can't drop the index which used for foreign key constrain. +> set @@foreign_key_checks=0; -- Even disable foreign_key_checks, you still can't drop the index used for foreign key constrain. Query OK, 0 rows affected > alter table t2 drop index fk; (1553, "Cannot drop index 'fk': needed in a foreign key constraint") @@ -243,7 +243,7 @@ Query OK, 0 rows affected ### Rename Column -Rename column which has foreign key or references, should also need to update the related child/parent table info. +Rename column which has foreign keys or references should also need to update the related child/parent table info. ```sql create table t1 (id int key,a int, index(a)); @@ -264,11 +264,11 @@ Create Table | CREATE TABLE `t2` ( ### Truncate Table -Truncate table which has foreign key or references, should also need to update the related child/parent table info. +A truncate table which has foreign keys or references should also need to update the related child/parent table info. ### Modify Column -Modify column which used by foreign key will be rejected. +Modify column which used by the foreign key will be rejected. ```sql > alter table t1 change column id id1 bigint; @@ -277,21 +277,21 @@ Modify column which used by foreign key will be rejected. MySQL modify column problem: https://www.percona.com/blog/2019/06/04/ddl-queries-foreign-key-columns-MySQL-pxc/ -What if the user really need to modify column type, such as from `INT` to `BIGINT`. Maybe we can offer an variable such as `alter-foreign-keys-method=auto`, -then when user modify the column type, TiDB will auto modify the related foreign key column's type. For easy implementation and reduce risk, maybe only support modify column type which doesn't need to reorg table row data. +What if the user really needs to modify column type, such as from `INT` to `BIGINT`. Maybe we can offer a variable such as `alter-foreign-keys-method=auto`, +then when the user modifies the column type, TiDB will auto-modify the related foreign key column's type. For easy implementation and to reduce risk, maybe only support modifying column type which doesn't need to reorg table row data. ## DML Technical Design ### DML On Child Table -On Child Table Insert Or Update, need to Find FK column value whether exist in reference table: +On Child Table Insert Or Update, need to Find FK column value that exist in the reference table: 1. Get reference table info by table name. -2. Get related fk index of reference table. -3. tiny optimize, check fk column value exist in reference table cache(map[string(index_key)]struct). +2. Get the related fk index of the reference table. +3. tiny optimize, check fk column value exists in reference table cache(map[string(index_key)]struct). 3. Get related row in reference. - - Construct index key and then use snapshot `Iter` and `Seek` API to scan. If the index is unique and only contain - foreign key columns, use snapshot `Get` API. +- Construct index key and then use snapshot `Iter` and `Seek` API to scan. If the index is unique and only contain + foreign key columns, use the snapshot `Get` API. - `Iter` default scan batch size is 256, need to set 2 to avoid read unnecessary data. 4. compact column value to make sure exist. 5. If relate row exist in reference table, also need to add lock in the related row. @@ -308,7 +308,7 @@ create table t2 (id int key,a int, b int, index (a,b,id), foreign key fk(a, b) r insert into t1 values (-1, 1, 1); ``` -Then, execute following SQL in 2 session: +Then, execute the following SQL in 2 sessions: | Session 1 | Session 2 | | -------------------------------- | ------------------------------------------- | @@ -318,11 +318,11 @@ Then, execute following SQL in 2 session: | Commit | | | | ERROR: Cannot delete or update a parent row | -So we need to add lock in reference table when insert/update child table. +So we need to add lock in the reference table when insert/update child table. ##### In Pessimistic Transaction -When TiDB add pessimistic locks, if relate row exist in reference table, also need to add lock in the related row. +When TiDB add pessimistic locks, if related row exists in the reference table, also needs to add lock in the related row. ##### In Optimistic Transaction @@ -330,13 +330,13 @@ Just like `SELECT FOR UPDATE` statement, need to use `doLockKeys` to lock the re ##### Issue -TiDB current only support `lock for update`(aka write-lock, such as `select for update`), doesn't support `lock for share`(aka read-lock, such as `select for share`). +TiDB currently only support `lock for update`(aka write-lock, such as `select for update`), and doesn't support `lock for share`(aka read-lock, such as `select for share`). -So far we have to add `lock for update` in reference table when insert/update child table, then the performance will be poor. After TiDB support `lock for share`, we should use `lock for share` instead. +So far we have to add `lock for update` in the reference table when insert/update child table, then the performance will be poor. After TiDB support `lock for share`, we should use `lock for share` instead. #### DML Load data -Load data should also do foreign key check, but report warning instead error: +Load data should also do foreign key check, but report a warning instead error: ```sql create table t1 (id int key,a int, index(a)); @@ -366,10 +366,10 @@ On reference Table Delete Or Update: 1. modify related child table row by referential action: - `CASCADE`: update/delete related child table row. - `SET NULL`: set related child row's foreign key columns value to NULL. -- `RESTRICT`, `NO ACTION`: If related row exist in child table, reject update/delete reference table. +- `RESTRICT`, `NO ACTION`: If related row exist in child table, reject update/delete the reference table. - `SET DEFAULT`: just like `RESTRICT`. -modify related child table row by following step: +modify related child table rows by the following step: 1. get child table info by name(in reference table info). 2. get the child table fk index's column info. 3. build update executor to update child table rows. @@ -421,9 +421,9 @@ insert into t2 values (1, 1); +----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+ ``` -From the plan, you can't see any information about the foreign key constrain which need to delete the related row in child table `t2`. +From the plan, you can't see any information about the foreign key constraint which needs to delete the related row in child table `t2`. -I think this is a MySQL issue, should we make TiDB plan better, at least when we meet some slow query, we can know maybe it is caused by modify related row in child table. +I think this is a MySQL issue, should we make TiDB plan better, at least when we meet some slow query, we can know maybe it is caused by modify related rows in the child table. Here is an example plan with foreign key: @@ -554,17 +554,17 @@ Create Table | CREATE TABLE `t2` ( #### TiCDC -When sync table data to downstream TiDB, TiCDC should `set @@foreign_key_checks=0` in downstream TiDB. +When syncing table data to downstream TiDB, TiCDC should `set @@foreign_key_checks=0` in downstream TiDB. #### DM -When do full synchronization, DM should `set @@foreign_key_checks=0` in downstream TiDB. +When doing full synchronization, DM should `set @@foreign_key_checks=0` in downstream TiDB. -When do incremental synchronization, DM should set `foreign_key_checks` session variable according to MySQL binlog. +When doing incremental synchronization, DM should set `foreign_key_checks` session variable according to MySQL binlog. #### BR -When sync table data to downstream TiDB, TiCDC should `set @@foreign_key_checks=0` in downstream TiDB. +When syncing table data to downstream TiDB, BR should `set @@foreign_key_checks=0` in downstream TiDB. ## Test Case From d1d14bc252bef9c35baee45fc85cd052578899ba Mon Sep 17 00:00:00 2001 From: crazycs520 Date: Thu, 15 Dec 2022 15:50:27 +0800 Subject: [PATCH 29/29] refine Signed-off-by: crazycs520 --- docs/design/2022-06-22-foreign-key.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/design/2022-06-22-foreign-key.md b/docs/design/2022-06-22-foreign-key.md index 2ac04b8f15342..5c6b32d9474e0 100644 --- a/docs/design/2022-06-22-foreign-key.md +++ b/docs/design/2022-06-22-foreign-key.md @@ -566,7 +566,7 @@ When doing incremental synchronization, DM should set `foreign_key_checks` sessi #### BR -When syncing table data to downstream TiDB, BR should `set @@foreign_key_checks=0` in downstream TiDB. +When restore data into TiDB, BR should `set @@foreign_key_checks=0` in TiDB. ## Test Case