planner, expression: take NullFlag into consideration when optimize the `int non-const` <cmp > `non-int const` #20040

TszKitLo40 · 2020-09-16T12:20:51Z

What problem does this PR solve?

Issue Number: close #16679

Problem Summary:

What is changed and how it works?

What's Changed:
Fix bug when fetching null data.
How it Works:
Check the NotNullFlag of the not-int constant.

Check List

Tests

Unit test
Integration test
Manual test (same steps in the issue)

Release note

Fix incorrectly fetching null data

XuHuaiyu · 2020-09-16T14:09:43Z

The test failed because the optimization rule is changed, some plan will change according to it.
You can run ./run-tests.sh -r all in cmd/explaintest to run the test and record the result to get the correct test result.

mmyj · 2020-09-16T15:03:19Z

planner/core/expression_rewriter_test.go

@@ -260,3 +260,19 @@ func (s *testExpressionRewriterSuite) TestPatternLikeToExpression(c *C) {
 tk.MustQuery("select 0 like '0';").Check(testkit.Rows("1"))
 tk.MustQuery("select 0.00 like '0.00';").Check(testkit.Rows("1"))
 }
+
+func (s *testExpressionRewriterSuite) TestFetchNullData(c *C) {


TestFetchNullData -> TestIssue16679, and please add a comment // TestIssue16679 contains tests for https://github.com/pingcap/tidb/issues/16679.

TszKitLo40 · 2020-09-17T02:39:47Z

The test failed because the optimization rule is changed, some plan will change according to it.
You can run ./run-tests.sh -r all in cmd/explaintest to run the test and record the result to get the correct test result.

Thanks for your comments, I have fixed the tests.

mmyj · 2020-09-17T04:12:35Z

planner/core/integration_test.go

@@ -698,7 +698,7 @@ func (s *testIntegrationSuite) TestPartitionPruningForInExpr(c *C) {

 tk.MustExec("use test")
 tk.MustExec("drop table if exists t")
- tk.MustExec("create table t(a int(11), b int) partition by range (a) (partition p0 values less than (4), partition p1 values less than(10), partition p2 values less than maxvalue);")
+ tk.MustExec("create table t(a int(11) not null, b int) partition by range (a) (partition p0 values less than (4), partition p1 values less than(10), partition p2 values less than maxvalue);")


should we add a new test for the column a without not null?

I have added a new test for it.

mmyj · 2020-09-17T04:14:15Z

expression/builtin_compare_test.go

@@ -67,6 +67,25 @@ func (s *testEvaluatorSuite) TestCompareFunctionWithRefine(c *C) {
 {"123456789123456789123456789.12345 < a", "0"},
 {"-123456789123456789123456789.12345 < a", "1"},
 {"'aaaa'=a", "eq(0, a)"},
+ //{"1.5071004017670217e-01=a", "eq(1.5071004017670217e-01, a)"},


why comment here?

mmyj · 2020-09-17T04:15:37Z

expression/builtin_compare.go

@@ -1295,7 +1295,7 @@ func (c *compareFunctionClass) refineArgs(ctx sessionctx.Context, args []Express
 isExceptional, finalArg0, finalArg1 := false, args[0], args[1]
 isPositiveInfinite, isNegativeInfinite := false, false
 // int non-constant [cmp] non-int constant
- if arg0IsInt && !arg0IsCon && !arg1IsInt && arg1IsCon {
+ if mysql.HasNotNullFlag(args[0].GetType().Flag) && arg0IsInt && !arg0IsCon && !arg1IsInt && arg1IsCon {


mysql.HasNotNullFlag(args[0].GetType().Flag) -> mysql.HasNotNullFlag(arg0Type.Flag)

mmyj · 2020-09-17T04:15:44Z

expression/builtin_compare.go

@@ -1314,7 +1314,7 @@ func (c *compareFunctionClass) refineArgs(ctx sessionctx.Context, args []Express
 }
 }
 // non-int constant [cmp] int non-constant
- if arg1IsInt && !arg1IsCon && !arg0IsInt && arg0IsCon {
+ if mysql.HasNotNullFlag(args[1].GetType().Flag) && arg1IsInt && !arg1IsCon && !arg0IsInt && arg0IsCon {


ti-challenge-bot · 2020-09-23T12:36:24Z

There is no reward for this challenge pull request, so you can request a reward from @XuHuaiyu.

More

Tip : About reward you can refs to reward-command.

Warning: None

mmyj · 2020-09-25T02:16:10Z

PTAL @XuHuaiyu

XuHuaiyu · 2020-09-25T02:17:10Z

/reward 300

ti-challenge-bot · 2020-09-25T02:17:15Z

Reward success.

mmyj · 2020-09-25T03:04:17Z

LGTM

XuHuaiyu · 2020-09-27T07:03:45Z

/run-all-tests

XuHuaiyu · 2020-09-27T08:16:02Z

@TszKitLo40
I think we should not forbid refineArgs for all the scenarios, otherwise, it may cause bad performance for the queries on the partitioned table.
e.g.

-- table t with range partition [p0 (1~10)], [p1 (11~20)]
select * from t where par_col in (9);

Before this commit, we only scan the p0 partition. But after this commit, we'll trigger a full table scan, it's unacceptable.

A naive rule for whether we need to forbid refineArgs can be defined as the following:

If it's an expression in SelectionExec, and this comparison function is not wrapped with other expressions, we should not forbid the refineArgs even if the NotNullFlag is not set.
For others, we forbid refineArgs when NotNullFlag is not set.

xuyifangreeneyes · 2021-01-07T08:18:08Z

@TszKitLo40 is there any update?

TszKitLo40 · 2021-01-07T08:21:37Z

@TszKitLo40 is there any update?
No, I have no idea how to know whether the value is null in planner phase. @xuyifangreeneyes Can you give me some advice?

xuyifangreeneyes · 2021-01-08T04:51:54Z

@TszKitLo40 is there any update?
No, I have no idea how to know whether the value is null in planner phase. @xuyifangreeneyes Can you give me some advice?

We cannot know whether the value is null in planner phase. But as @XuHuaiyu said, we can allow refineArgs in some certain scenarios to avoid bad performance. For example,

create table t0 (a int);
explain select * from t0 where a = 3.4;

Here is the result if we allow refineArgs for a = 3.4

tidb> explain select * from t0 where a = 3.4;
+-------------+---------+------+---------------+---------------+
| id          | estRows | task | access object | operator info |
+-------------+---------+------+---------------+---------------+
| TableDual_6 | 0.00    | root |               | rows:0        |
+-------------+---------+------+---------------+---------------+
1 row in set (0.00 sec)

And here is the result if we forbid it.

tidb> explain select * from t0 where a = 3.4;
+-------------------------+----------+-----------+---------------+--------------------------------+
| id                      | estRows  | task      | access object | operator info                  |
+-------------------------+----------+-----------+---------------+--------------------------------+
| TableReader_7           | 8000.00  | root      |               | data:Selection_6               |
| └─Selection_6           | 8000.00  | cop[tikv] |               | eq(cast(test.t0.a), 3.4)       |
|   └─TableFullScan_5     | 10000.00 | cop[tikv] | table:t0      | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+---------------+--------------------------------+
3 rows in set (0.00 sec)

Notice that the where condition a = 3.4 is either false or null, so no row would be selected.

Hence, one improvement we can achieve is that when the comparison function is the where condition in select statement, even if NotNullFlag is not set, we try refineArgs. If the result of refineArgs is constantly false, refineArgs can be allowed since the comparison function is null if the column value is null and the where condition won't be satisfied no matter it is evaluated as false or null.

TszKitLo40 · 2021-01-08T10:03:11Z

@TszKitLo40 is there any update?
No, I have no idea how to know whether the value is null in planner phase. @xuyifangreeneyes Can you give me some advice?

We cannot know whether the value is null in planner phase. But as @XuHuaiyu said, we can allow refineArgs in some certain scenarios to avoid bad performance. For example,
create table t0 (a int);
explain select * from t0 where a = 3.4;
Here is the result if we allow refineArgs for a = 3.4
tidb> explain select * from t0 where a = 3.4;
+-------------+---------+------+---------------+---------------+
| id          | estRows | task | access object | operator info |
+-------------+---------+------+---------------+---------------+
| TableDual_6 | 0.00    | root |               | rows:0        |
+-------------+---------+------+---------------+---------------+
1 row in set (0.00 sec)
And here is the result if we forbid it.
tidb> explain select * from t0 where a = 3.4;
+-------------------------+----------+-----------+---------------+--------------------------------+
| id                      | estRows  | task      | access object | operator info                  |
+-------------------------+----------+-----------+---------------+--------------------------------+
| TableReader_7           | 8000.00  | root      |               | data:Selection_6               |
| └─Selection_6           | 8000.00  | cop[tikv] |               | eq(cast(test.t0.a), 3.4)       |
|   └─TableFullScan_5     | 10000.00 | cop[tikv] | table:t0      | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+---------------+--------------------------------+
3 rows in set (0.00 sec)
Notice that the where condition a = 3.4 is either false or null, so no row would be selected.

Hence, one improvement we can achieve is that when the comparison function is the where condition in select statement, even if NotNullFlag is not set, we try refineArgs. If the result of refineArgs is constantly false, refineArgs can be allowed since the comparison function is null if the column value is null and the where condition won't be satisfied no matter it is evaluated as false or null.

But it is hard to define whether refineArgs should be allowed.

ti-chi-bot · 2021-02-23T04:23:07Z

@TszKitLo40: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

TszKitLo40 added 2 commits September 16, 2020 20:10

Add unit tests and check null flag.

696b07f

Merge branch 'master' into incorrectly-fetching-null-data

7672e8d

TszKitLo40 requested review from a team as code owners September 16, 2020 12:20

TszKitLo40 requested review from XuHuaiyu and removed request for a team September 16, 2020 12:20

ti-srebot added the contribution This PR is from a community contributor. label Sep 16, 2020

github-actions bot added the component/expression label Sep 16, 2020

Fix check_dev

7c42b0b

mmyj reviewed Sep 16, 2020

View reviewed changes

youwithouto mentioned this pull request Sep 16, 2020

expression: Add column nullability checking before "refine args" #20044

Closed

TszKitLo40 added 2 commits September 17, 2020 10:08

Fix check_dev

9f6a5bb

fix check_dev2

9bc7e78

mmyj requested changes Sep 17, 2020

View reviewed changes

TszKitLo40 added 10 commits September 17, 2020 13:33

nitpick

584d6fe

Fix json

066ae3d

Fix dev

19a17be

Fix checkdev2

805035c

Fix dev

c4c10ab

Merge branch 'master' into incorrectly-fetching-null-data

5cd0fcc

Fix check_dev

2723c10

Merge branch 'master' into incorrectly-fetching-null-data

de318ad

Merge branch 'master' into incorrectly-fetching-null-data

df75dc1

Nitpick for performance.

5bd2260

ti-challenge-bot bot added the rewarded label Sep 25, 2020

ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 25, 2020

Merge branch 'master' into incorrectly-fetching-null-data

5b43a49

XuHuaiyu changed the title ~~executor: Fix incorrectly fetching null data~~ planner, expression: take NullFlag into consideration when optimize the int non-const <cmp > non-int const Sep 27, 2020

XuHuaiyu added the needs-cherry-pick-4.0 label Sep 27, 2020

ichn-hu mentioned this pull request Nov 3, 2020

Welcome to contribute #20804

Closed

ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 23, 2021

TszKitLo40 closed this Mar 17, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

planner, expression: take NullFlag into consideration when optimize the `int non-const` <cmp > `non-int const` #20040

planner, expression: take NullFlag into consideration when optimize the `int non-const` <cmp > `non-int const` #20040

TszKitLo40 commented Sep 16, 2020

XuHuaiyu commented Sep 16, 2020

mmyj Sep 16, 2020

TszKitLo40 commented Sep 17, 2020

mmyj Sep 17, 2020

TszKitLo40 Sep 17, 2020

mmyj Sep 17, 2020

mmyj Sep 17, 2020

mmyj Sep 17, 2020

ti-challenge-bot bot commented Sep 23, 2020

mmyj commented Sep 25, 2020

XuHuaiyu commented Sep 25, 2020

ti-challenge-bot bot commented Sep 25, 2020

mmyj commented Sep 25, 2020

XuHuaiyu commented Sep 27, 2020

XuHuaiyu commented Sep 27, 2020

xuyifangreeneyes commented Jan 7, 2021

TszKitLo40 commented Jan 7, 2021 •

edited

Loading

xuyifangreeneyes commented Jan 8, 2021

TszKitLo40 commented Jan 8, 2021

ti-chi-bot commented Feb 23, 2021

planner, expression: take NullFlag into consideration when optimize the int non-const <cmp > non-int const #20040

planner, expression: take NullFlag into consideration when optimize the int non-const <cmp > non-int const #20040

Conversation

TszKitLo40 commented Sep 16, 2020

What problem does this PR solve?

What is changed and how it works?

Check List

Release note

XuHuaiyu commented Sep 16, 2020

mmyj Sep 16, 2020

Choose a reason for hiding this comment

TszKitLo40 commented Sep 17, 2020

mmyj Sep 17, 2020

Choose a reason for hiding this comment

TszKitLo40 Sep 17, 2020

Choose a reason for hiding this comment

mmyj Sep 17, 2020

Choose a reason for hiding this comment

mmyj Sep 17, 2020

Choose a reason for hiding this comment

mmyj Sep 17, 2020

Choose a reason for hiding this comment

ti-challenge-bot bot commented Sep 23, 2020

mmyj commented Sep 25, 2020

XuHuaiyu commented Sep 25, 2020

ti-challenge-bot bot commented Sep 25, 2020

mmyj commented Sep 25, 2020

XuHuaiyu commented Sep 27, 2020

XuHuaiyu commented Sep 27, 2020

xuyifangreeneyes commented Jan 7, 2021

TszKitLo40 commented Jan 7, 2021 • edited Loading

xuyifangreeneyes commented Jan 8, 2021

TszKitLo40 commented Jan 8, 2021

ti-chi-bot commented Feb 23, 2021

planner, expression: take NullFlag into consideration when optimize the `int non-const` <cmp > `non-int const` #20040

planner, expression: take NullFlag into consideration when optimize the `int non-const` <cmp > `non-int const` #20040

TszKitLo40 commented Jan 7, 2021 •

edited

Loading