ddl: fix invalid index on multi-layer virtual columns #11260

Merged
tangenta merged 18 commits into pingcap:master
Jul 24, 2019

Contributor

@tangenta tangenta commented Jul 15, 2019

What problem does this PR solve?

Fix #11139.

What is changed and how it works?

When a generated column depends on another generated column, like c -> b -> a, the old implementation fails to backfill the existing row data when creating an index on c.

makeupDecodeColMap() builds a decodeColMap that records, for each column involved, the mapping
column ID -> table.Column & generated expression.
During the index-value backfill phase, decodeColMap is used to fetch the row values.

For example,

drop table if exists t;
create table t (a int, b int as (a + 1), c int as (b + 1));
insert into t (a) values (1);
create index idx on t (c);

When creating an index on c, the old implementation of makeupDecodeColMap() builds a decodeColMap like

column ID   table.Column   GenExpr
3           c              b + 1
2           b              nil

without considering whether b is a generated column.

In this PR, building a decodeColMap takes the following three steps:

  1. Build a "full" map that contains every column the index depends on, both directly and indirectly.
  2. Substitute each generated expression.Column in the map with its defining expression, until no GenExpr field of any value in the map refers to a generated column.
  3. Delete all unused columns from the map. Here "unused" means virtual and not among the indexed columns.

Step 2 substitutes generated columns in ascending column-ID order, which ensures that no unresolved generated column remains.

The map building process for the above example:

Step 1:

column ID   table.Column   GenExpr
3           c              b + 1
2           b              a + 1
1           a              nil

Step 2:

column ID   table.Column   GenExpr
3           c              (a + 1) + 1
2           b              (a + 1)
1           a              nil

Step 3:

column ID   table.Column   GenExpr
3           c              (a + 1) + 1
1           a              nil
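
The three steps can be sketched in Go roughly as follows. This is a toy model under stated assumptions, not the code from this PR: the `expr` and `column` types and the functions `buildDecodeColMap`, `substitute`, and `exampleTable` are hypothetical stand-ins for TiDB's expression and table.Column types and for the real decoder logic, and every generated column is treated as virtual. The sketch sorts by column ID, as this description does; the review discussion further down proposes sorting by table.Column.Offset instead.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// expr is a toy expression tree (hypothetical, not TiDB's expression package):
// a sum when left != nil, a column reference when col != "", else an integer literal.
type expr struct {
	left, right *expr
	col         string
	lit         int
}

func (e *expr) String() string {
	switch {
	case e == nil:
		return "nil"
	case e.left != nil:
		return "(" + e.left.String() + " + " + e.right.String() + ")"
	case e.col != "":
		return e.col
	}
	return strconv.Itoa(e.lit)
}

// column is a toy stand-in for one decodeColMap value.
type column struct {
	id   int64
	name string
	gen  *expr // nil for a base (non-generated) column
}

// substitute returns e with each reference to a generated column replaced by that column's expression.
func substitute(e *expr, byName map[string]*column) *expr {
	if e.left != nil {
		return &expr{left: substitute(e.left, byName), right: substitute(e.right, byName)}
	}
	if c := byName[e.col]; c != nil && c.gen != nil {
		return c.gen
	}
	return e
}

func buildDecodeColMap(table map[string]*column, indexed map[string]bool) map[int64]*column {
	// Step 1: collect the indexed columns plus everything they depend on, directly or indirectly.
	m := make(map[int64]*column)
	byName := make(map[string]*column)
	var walk func(e *expr)
	add := func(name string) {
		if byName[name] == nil {
			cp := *table[name] // copy, so the table definition is never mutated
			m[cp.id], byName[name] = &cp, &cp
			walk(cp.gen)
		}
	}
	walk = func(e *expr) {
		switch {
		case e == nil:
		case e.left != nil:
			walk(e.left)
			walk(e.right)
		case e.col != "":
			add(e.col)
		}
	}
	for name := range indexed {
		add(name)
	}
	// Step 2: resolve in ascending ID order, so referenced columns are already resolved.
	ids := make([]int64, 0, len(m))
	for id := range m {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	for _, id := range ids {
		if c := m[id]; c.gen != nil {
			c.gen = substitute(c.gen, byName)
		}
	}
	// Step 3: drop generated columns that are not themselves indexed.
	for id, c := range m {
		if c.gen != nil && !indexed[c.name] {
			delete(m, id)
		}
	}
	return m
}

// exampleTable models: create table t (a int, b int as (a + 1), c int as (b + 1)).
func exampleTable() map[string]*column {
	one := &expr{lit: 1}
	return map[string]*column{
		"a": {id: 1, name: "a"},
		"b": {id: 2, name: "b", gen: &expr{left: &expr{col: "a"}, right: one}},
		"c": {id: 3, name: "c", gen: &expr{left: &expr{col: "b"}, right: one}},
	}
}

func main() {
	m := buildDecodeColMap(exampleTable(), map[string]bool{"c": true})
	for _, id := range []int64{3, 2, 1} {
		if c, ok := m[id]; ok {
			fmt.Println(c.id, c.name, c.gen.String())
		}
	}
}
```

For an index on c, the resulting map keeps only c (with its expression rewritten in terms of a) and the base column a, which corresponds to the Step 3 table.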

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change

Side effects

  • Possible performance regression

Related changes

  • Need to cherry-pick to the release branch

codecov bot commented Jul 15, 2019

Codecov Report

Merging #11260 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #11260   +/-   ##
===========================================
  Coverage   81.4063%   81.4063%           
===========================================
  Files           423        423           
  Lines         90816      90816           
===========================================
  Hits          73930      73930           
  Misses        11573      11573           
  Partials       5313       5313

@tangenta
Contributor Author

/run-integration-tests

@qw4990 qw4990 self-requested a review July 18, 2019 05:43
@tangenta
Contributor Author

@crazycs520 @bb7133 @zimulala PTAL~

@bb7133
Member

bb7133 commented Jul 19, 2019

Excellent PR description! @tangenta

util/rowDecoder/decoder.go (outdated, resolved)
util/rowDecoder/decoder.go (outdated, resolved)
util/rowDecoder/decoder.go (outdated, resolved)
ddl/db_integration_test.go (resolved)
	}
	return v
case *expression.ScalarFunction:
	if v.FuncName.L == ast.Cast {
Member

Why do we handle CAST in this special way? I can't see the reason in expression.ColumnSubstitute, nor here.

Contributor Author

@tangenta tangenta Jul 19, 2019

Actually, I am not sure. Removing this special handling does NOT fail the integration tests.

util/rowDecoder/decoder.go (resolved)
// SubstituteGenColsInDecodeColMap substitutes generated columns in every expression in decodeColMap
// with non-generated one by looking up decodeColMap.
func SubstituteGenColsInDecodeColMap(decodeColMap map[int64]Column) {
	// Sort columns by ID in ascending order.
Contributor

Please add a comment explaining why you need to sort here. I think you are making an assumption.

Contributor Author

@crazycs520 I think line 150 does explain it. Do you have any suggestions?

Member

The reason is here: (https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html)

A generated column definition can refer to other generated columns, but only those occurring earlier in the table definition. A generated column definition can refer to any base (nongenerated) column in the table whether its definition occurs earlier or later.

Is it possible that the order of column ID is not the order of columns shown in the table definition? Using column ID is a bit risky here. @crazycs520 @tangenta

Contributor Author

@bb7133 You are right. I have found a test case that violates this assumption. I think sorting by table.Column.Offset is the correct choice.

Member

> @bb7133 You are right. I have found a test case that violates this assumption. I think sorting by table.Column.Offset is the correct choice.

Good job. Would you please show the case here?

Contributor Author

@tangenta tangenta Jul 22, 2019

create table t (a int, c int as (a + 1));
alter table t add column b int as (a + 1) after a;

Here the IDs of columns a, b, and c are 1, 3, and 2, respectively.

Contributor

It looks like you need to fix bug #11365 first.

Member

@bb7133 bb7133 left a comment

LGTM

@tangenta tangenta added the status/LGT1 label Jul 22, 2019
@tangenta tangenta changed the title from "ddl: fix invalid index on virtual columns in dependency chain" to "ddl: fix invalid index on multi-layer virtual columns" Jul 22, 2019
ddl/index.go (outdated, resolved)
Contributor

@winkyao winkyao left a comment

LGTM

@winkyao winkyao added the status/LGT2 label and removed the status/LGT1 label Jul 24, 2019
Contributor

@crazycs520 crazycs520 left a comment

LGTM
Note: #11365 needs to be fixed later. @AilinKid

@crazycs520
Contributor

/run-all-tests

@crazycs520
Contributor

/run-all-tests

@qw4990
Contributor

qw4990 commented Jul 24, 2019

@tangenta Please fix the CI problem.

@tangenta
Contributor Author

[2019-07-24T02:56:21.102Z] ----------------------------------------------------------------------
[2019-07-24T02:56:21.102Z] FAIL: builtin_time_test.go:1222: testEvaluatorSuite.TestCurrentTime
[2019-07-24T02:56:21.102Z] 
[2019-07-24T02:56:21.102Z] builtin_time_test.go:1242:
[2019-07-24T02:56:21.102Z]     c.Assert(n.String(), GreaterEqual, last.Format(tfStr))
[2019-07-24T02:56:21.102Z] ... compare_one string = "10:56:15.950"
[2019-07-24T02:56:21.102Z] ... compare_two string = "10:56:16"

/run-unit-test

@qw4990 qw4990 added the status/all tests passed and status/LGT3 labels and removed the status/LGT2 label Jul 24, 2019
@tangenta tangenta merged commit 05f66f4 into pingcap:master Jul 24, 2019
@sre-bot
Contributor

sre-bot commented Jul 24, 2019

cherry pick to release-3.0 failed

@sre-bot
Contributor

sre-bot commented Jul 24, 2019

cherry pick to release-2.1 failed

Labels
sig/sql-infra, status/LGT3, type/bugfix
Development

Successfully merging this pull request may close these issues.

should fill index data after adding an index on a virtual generated column
7 participants