ddl: fix invalid index on multi-layer virtual columns #11260

tangenta · 2019-07-15T12:37:26Z

What problem does this PR solve?

What is changed and how it works?

When a generated column depends on another generated column, like c -> b -> a, the old implementation fails to backfill the existing row data when creating an index on c.

makeupDecodeColMap() builds a decodeColMap that holds the information needed to decode each column:
column ID -> table.Column & generated expression (GenExpr).
During the index backfill phase, decodeColMap is used to fetch row values.

For example,

drop table if exists t;
create table t (a int, b int as (a + 1), c int as (b + 1));
insert into t (a) values (1);
create index idx on t (c);

When creating an index on c, the old implementation of makeupDecodeColMap() builds a decodeColMap like

column ID	table.Column, GenExpr
3	c, b + 1
2	b, nil

without considering whether b is a generated column.
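
To make the failure concrete, here is a minimal Go sketch of why the old map cannot produce a value for c. It is a toy model, not TiDB code: the types genExpr and decodeCol and the functions evalCol and oldDecodeColMap are hypothetical, a generated expression is reduced to "one column plus a constant", and the sketch assumes that virtual column values are not stored in the row.

```go
package main

import "fmt"

// genExpr is a toy stand-in for a generated expression:
// value = (value of column dep) + addend. Hypothetical type.
type genExpr struct {
	dep    int64 // ID of the column the expression reads
	addend int64 // constant added to that column
}

// decodeCol mirrors one decodeColMap entry: a column plus its GenExpr.
type decodeCol struct {
	name string
	gen  *genExpr // nil means "read the value from the stored row"
}

// evalCol computes the value of column id from the stored row values.
// The generated expression is evaluated directly against the stored row,
// so every column it reads must be present there.
func evalCol(id int64, m map[int64]decodeCol, stored map[int64]int64) (int64, bool) {
	col, ok := m[id]
	if !ok {
		return 0, false
	}
	if col.gen == nil {
		v, found := stored[id]
		return v, found
	}
	base, found := stored[col.gen.dep]
	if !found {
		return 0, false
	}
	return base + col.gen.addend, true
}

// oldDecodeColMap reproduces the map from the description:
// c carries "b + 1", and b is treated as a plain stored column.
func oldDecodeColMap() map[int64]decodeCol {
	return map[int64]decodeCol{
		3: {name: "c", gen: &genExpr{dep: 2, addend: 1}},
		2: {name: "b", gen: nil},
	}
}

func main() {
	stored := map[int64]int64{1: 1} // only a = 1 is stored (assumption)
	_, ok := evalCol(3, oldDecodeColMap(), stored)
	fmt.Println("c resolvable with old map:", ok) // prints false
}
```

In this model, c only becomes computable once its expression refers to the stored column a directly.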

In this PR, building a decodeColMap takes the following three steps:

1. Build a "full" map that contains all the columns mentioned in an index, including the columns it depends on directly and indirectly.
2. Substitute every generated expression.Column in the map with its own generated expression, until no GenExpr field of any value in the map refers to a generated column.
3. Delete all unused columns from the map. Here "unused" means virtual and not included in the indexed columns.

Step 2 substitutes generated columns in ascending column-ID order, which ensures that no unresolved generated column remains.

The map building process in the above example:
Step 1,

column ID	table.Column, GenExpr
3	c, b + 1
2	b, a + 1
1	a, nil

Step 2,

column ID	table.Column, GenExpr
3	c, (a + 1) + 1
2	b, (a + 1)
1	a, nil

Step 3,

column ID	table.Column, GenExpr
3	c, (a + 1) + 1
1	a, nil
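
The three steps can be sketched in Go as follows. This is a toy model under stated assumptions, not the code in this PR: the types genExpr and column and the function buildDecodeColMap are hypothetical, a generated expression is reduced to "one column plus a constant", and every generated column is treated as virtual.

```go
package main

import (
	"fmt"
	"sort"
)

// genExpr is a toy generated expression: (value of column dep) + addend.
type genExpr struct {
	dep    int64
	addend int64
}

// column is a toy decodeColMap value: a column and its GenExpr (nil if base).
type column struct {
	name string
	gen  *genExpr
}

// buildDecodeColMap follows the three steps from the description.
func buildDecodeColMap(tblCols map[int64]column, indexed []int64) map[int64]column {
	// Step 1: collect the indexed columns and everything they depend on.
	full := make(map[int64]column)
	var collect func(id int64)
	collect = func(id int64) {
		if _, seen := full[id]; seen {
			return
		}
		col := tblCols[id]
		if col.gen != nil {
			g := *col.gen // copy, so the table metadata is not mutated
			col.gen = &g
			collect(g.dep)
		}
		full[id] = col
	}
	for _, id := range indexed {
		collect(id)
	}

	// Step 2: substitute in ascending column-ID order, so that a dependency
	// is already resolved when a later column refers to it.
	ids := make([]int64, 0, len(full))
	for id := range full {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	for _, id := range ids {
		col := full[id]
		if col.gen == nil {
			continue
		}
		if dep := full[col.gen.dep]; dep.gen != nil {
			col.gen.addend += dep.gen.addend
			col.gen.dep = dep.gen.dep
		}
	}

	// Step 3: drop generated columns that are not indexed themselves.
	isIndexed := make(map[int64]bool, len(indexed))
	for _, id := range indexed {
		isIndexed[id] = true
	}
	for id, col := range full {
		if col.gen != nil && !isIndexed[id] {
			delete(full, id)
		}
	}
	return full
}

func main() {
	tblCols := map[int64]column{
		1: {name: "a"},
		2: {name: "b", gen: &genExpr{dep: 1, addend: 1}},
		3: {name: "c", gen: &genExpr{dep: 2, addend: 1}},
	}
	m := buildDecodeColMap(tblCols, []int64{3})
	for _, id := range []int64{1, 2, 3} {
		col, ok := m[id]
		if !ok {
			continue
		}
		if col.gen == nil {
			fmt.Printf("%d\t%s, nil\n", id, col.name)
		} else {
			fmt.Printf("%d\t%s, col#%d + %d\n", id, col.name, col.gen.dep, col.gen.addend)
		}
	}
}
```

For the example table, the result keeps only a and c, and c reads column 1 plus 2, which corresponds to (a + 1) + 1 in the Step 3 table.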

Check List

Tests

Unit test
Integration test

Code changes

Has exported function/method change

Side effects

Possible performance regression

Related changes

Need to cherry-pick to the release branch

codecov · 2019-07-15T12:42:36Z

Codecov Report

Merging #11260 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #11260   +/-   ##
===========================================
  Coverage   81.4063%   81.4063%           
===========================================
  Files           423        423           
  Lines         90816      90816           
===========================================
  Hits          73930      73930           
  Misses        11573      11573           
  Partials       5313       5313

tangenta · 2019-07-16T07:10:29Z

/run-integration-tests


tangenta · 2019-07-18T07:58:36Z

@crazycs520 @bb7133 @zimulala PTAL~

bb7133 · 2019-07-19T03:12:46Z

Excellent PR description! @tangenta


bb7133 · 2019-07-19T04:21:48Z

util/rowDecoder/decoder.go

+		}
+		return v
+	case *expression.ScalarFunction:
+		if v.FuncName.L == ast.Cast {


Why do we handle CAST in this special way? I can't see the reason from expression.ColumnSubstitute, nor from here.

Actually, I am not sure. Removing them does NOT fail the integration test.


crazycs520 · 2019-07-19T04:58:19Z

util/rowDecoder/decoder.go

+// SubstituteGenColsInDecodeColMap substitutes generated columns in every expression in decodeColMap
+// with non-generated one by looking up decodeColMap.
+func SubstituteGenColsInDecodeColMap(decodeColMap map[int64]Column) {
+	// Sort columns by ID in ascending order.


Please add a comment explaining why you need to sort here. I think you are relying on an assumption.

@crazycs520 I think line 150 does explain it. Do you have any suggestions?

The reason is here: (https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html)

A generated column definition can refer to other generated columns, but only those occurring earlier in the table definition. A generated column definition can refer to any base (nongenerated) column in the table whether its definition occurs earlier or later.

Is it possible that the order of column ID is not the order of columns shown in the table definition? Using column ID is a bit risky here. @crazycs520 @tangenta

@bb7133 You are right. I have found a test case that violates this assumption. I think
sorting by table.Column.Offset is the correct choice.

Good job, would you please show the case here?

create table t (a int, c int as (a + 1)); alter table t add column b int as (a + 1) after a;

Here the IDs of columns a, b, and c are 1, 3, and 2, respectively.
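
A small Go sketch of the mismatch in this case: sorting by ID and sorting by offset give different orders. The type colInfo and the helpers exampleColumns and orderedNames are hypothetical, and the offsets 0, 1, 2 are an assumption inferred from the DDL above (b is added "after a"); they are not taken from TiDB metadata.

```go
package main

import (
	"fmt"
	"sort"
)

// colInfo is a hypothetical, reduced view of a column:
// its ID and its position (offset) in the table definition.
type colInfo struct {
	Name   string
	ID     int64
	Offset int
}

// exampleColumns models the table after the ALTER TABLE statement.
func exampleColumns() []colInfo {
	return []colInfo{
		{Name: "a", ID: 1, Offset: 0},
		{Name: "c", ID: 2, Offset: 2},
		{Name: "b", ID: 3, Offset: 1},
	}
}

// orderedNames returns the column names sorted by the given comparison,
// leaving the input slice untouched.
func orderedNames(cols []colInfo, less func(x, y colInfo) bool) []string {
	sorted := append([]colInfo(nil), cols...)
	sort.Slice(sorted, func(i, j int) bool { return less(sorted[i], sorted[j]) })
	names := make([]string, len(sorted))
	for i, c := range sorted {
		names[i] = c.Name
	}
	return names
}

func main() {
	cols := exampleColumns()
	byID := orderedNames(cols, func(x, y colInfo) bool { return x.ID < y.ID })
	byOffset := orderedNames(cols, func(x, y colInfo) bool { return x.Offset < y.Offset })
	fmt.Println(byID)     // [a c b]
	fmt.Println(byOffset) // [a b c]
}
```

Because the MySQL rule quoted above is stated in terms of the table definition order, the offset order is the one that guarantees a generated column comes after the generated columns it refers to.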

It looks like you need to fix bug #11365 first.


bb7133

LGTM


winkyao

LGTM

crazycs520

LGTM
Note: #11365 needs to be fixed later. @AilinKid

crazycs520 · 2019-07-24T02:52:33Z

/run-all-tests

crazycs520 · 2019-07-24T02:53:40Z

/run-all-tests

qw4990 · 2019-07-24T03:02:12Z

@tangenta Please fix the CI problem.

tangenta · 2019-07-24T03:10:33Z

[2019-07-24T02:56:21.102Z] ----------------------------------------------------------------------
[2019-07-24T02:56:21.102Z] FAIL: builtin_time_test.go:1222: testEvaluatorSuite.TestCurrentTime
[2019-07-24T02:56:21.102Z] 
[2019-07-24T02:56:21.102Z] builtin_time_test.go:1242:
[2019-07-24T02:56:21.102Z]     c.Assert(n.String(), GreaterEqual, last.Format(tfStr))
[2019-07-24T02:56:21.102Z] ... compare_one string = "10:56:15.950"
[2019-07-24T02:56:21.102Z] ... compare_two string = "10:56:16"

/run-unit-test

sre-bot · 2019-07-24T03:20:38Z

cherry pick to release-3.0 failed

sre-bot · 2019-07-24T03:23:52Z

cherry pick to release-2.1 failed

tangenta added 2 commits July 15, 2019 20:00

ddl: fix invalid index on virtual columns in dependency chain

bffe3c5

fix var-naming

fb6dd42

crazycs520 added component/DDL-need-LGT3 type/bugfix This PR fixes a bug. labels Jul 15, 2019

fix multiple generated columns in admin check

cd690d7

tangenta removed the status/WIP label Jul 16, 2019

Merge branch 'master' into index-on-multi-virtual-col

2e6b5fe

crazycs520 reviewed Jul 16, 2019

util/rowDecoder/decoder.go (outdated, resolved)

move import statement to a right place

83e1c9f

tangenta added the needs-cherry-pick-3.0 label Jul 16, 2019

Add comment about substitution generated column

ec4a648

qw4990 self-requested a review July 18, 2019 05:43

bb7133 added the needs-cherry-pick-2.1 label Jul 19, 2019

bb7133 reviewed Jul 19, 2019

crazycs520 reviewed Jul 19, 2019

add more tests and update comments

6e9d9e5

tangenta added the status/WIP label Jul 19, 2019

tangenta added 2 commits July 19, 2019 23:32

use column offset to determine position

98e3a65

Merge remote-tracking branch 'upstream/master' into index-on-multi-vi…

a3e81a9

…rtual-col

tangenta removed the status/WIP label Jul 19, 2019

bb7133 reviewed Jul 21, 2019

tangenta added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 22, 2019

tangenta changed the title from "ddl: fix invalid index on virtual columns in dependency chain" to "ddl: fix invalid index on multi-layer virtual columns" Jul 22, 2019

winkyao reviewed Jul 22, 2019

ddl/index.go (outdated, resolved)

move copy logic

18833f0

tangenta added 4 commits July 22, 2019 21:37

Merge remote-tracking branch 'upstream/master' into index-on-multi-vi…

b680dc1

…rtual-col

add comment

d0213e0

fix errorcheck

10f004c

add virtual column check to avoid unnecessary substitutions

4756c34

tangenta requested review from bb7133, winkyao and crazycs520 July 23, 2019 04:01

winkyao reviewed Jul 24, 2019

winkyao added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jul 24, 2019

crazycs520 approved these changes Jul 24, 2019

Merge branch 'master' into index-on-multi-virtual-col

567fa27

Merge branch 'master' into index-on-multi-virtual-col

c9a0282

qw4990 added status/all tests passed status/LGT3 The PR has already had 3 LGTM. and removed status/LGT2 Indicates that a PR has LGTM 2. labels Jul 24, 2019

tangenta merged commit 05f66f4 into pingcap:master Jul 24, 2019

tangenta added a commit to tangenta/tidb that referenced this pull request Jul 27, 2019

ddl: fix invalid index on multi-layer virtual columns (pingcap#11260)

f43a4de

tangenta mentioned this pull request Jul 27, 2019

ddl: fix invalid index on multi-layer virtual columns (#11260) #11475

Merged

tangenta added a commit to tangenta/tidb that referenced this pull request Jul 31, 2019

ddl: fix invalid index on multi-layer virtual columns (pingcap#11260)

6ec8dea

tangenta mentioned this pull request Jul 31, 2019

ddl: fix invalid index on multi-layer virtual columns (#11260) #11538

Merged

sre-bot pushed a commit that referenced this pull request Aug 2, 2019

ddl: fix invalid index on multi-layer virtual columns (#11260) (#11538)

2db2616

sre-bot pushed a commit that referenced this pull request Aug 5, 2019

ddl: fix invalid index on multi-layer virtual columns (#11260) (#11475)

a91e3a4

you06 added the sig/sql-infra SIG: SQL Infra label Mar 4, 2020
