Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

planner, expression: take NullFlag into consideration when optimize the int non-const <cmp > non-int const #20040

Closed

Conversation

TszKitLo40
Copy link
Contributor

What problem does this PR solve?

Issue Number: close #16679

Problem Summary:

What is changed and how it works?

What's Changed:
Fix bug when fetching null data.
How it Works:
Check the NotNullFlag of the not-int constant.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (same steps in the issue)

Release note

  • Fix incorrectly fetching null data

@TszKitLo40 TszKitLo40 requested review from a team as code owners September 16, 2020 12:20
@TszKitLo40 TszKitLo40 requested review from XuHuaiyu and removed request for a team September 16, 2020 12:20
@ti-srebot ti-srebot added the contribution This PR is from a community contributor. label Sep 16, 2020
@XuHuaiyu
Copy link
Contributor

The test failed because the optimization rule is changed, some plan will change according to it.
You can run ./run-tests.sh -r all in cmd/explaintest to run the test and record the result to get the correct test result.

@@ -260,3 +260,19 @@ func (s *testExpressionRewriterSuite) TestPatternLikeToExpression(c *C) {
tk.MustQuery("select 0 like '0';").Check(testkit.Rows("1"))
tk.MustQuery("select 0.00 like '0.00';").Check(testkit.Rows("1"))
}

func (s *testExpressionRewriterSuite) TestFetchNullData(c *C) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestFetchNullData -> TestIssue16679, and please add a comment // TestIssue16679 contains tests for https://github.com/pingcap/tidb/issues/16679.

@TszKitLo40
Copy link
Contributor Author

The test failed because the optimization rule is changed, some plan will change according to it.
You can run ./run-tests.sh -r all in cmd/explaintest to run the test and record the result to get the correct test result.

Thanks for your comments, I have fixed the tests.

@@ -698,7 +698,7 @@ func (s *testIntegrationSuite) TestPartitionPruningForInExpr(c *C) {

tk.MustExec("use test")
tk.MustExec("drop table if exists t")
tk.MustExec("create table t(a int(11), b int) partition by range (a) (partition p0 values less than (4), partition p1 values less than(10), partition p2 values less than maxvalue);")
tk.MustExec("create table t(a int(11) not null, b int) partition by range (a) (partition p0 values less than (4), partition p1 values less than(10), partition p2 values less than maxvalue);")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a new test for the column a without not null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a new test for it.

@@ -67,6 +67,25 @@ func (s *testEvaluatorSuite) TestCompareFunctionWithRefine(c *C) {
{"123456789123456789123456789.12345 < a", "0"},
{"-123456789123456789123456789.12345 < a", "1"},
{"'aaaa'=a", "eq(0, a)"},
//{"1.5071004017670217e-01=a", "eq(1.5071004017670217e-01, a)"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why comment here?

@@ -1295,7 +1295,7 @@ func (c *compareFunctionClass) refineArgs(ctx sessionctx.Context, args []Express
isExceptional, finalArg0, finalArg1 := false, args[0], args[1]
isPositiveInfinite, isNegativeInfinite := false, false
// int non-constant [cmp] non-int constant
if arg0IsInt && !arg0IsCon && !arg1IsInt && arg1IsCon {
if mysql.HasNotNullFlag(args[0].GetType().Flag) && arg0IsInt && !arg0IsCon && !arg1IsInt && arg1IsCon {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mysql.HasNotNullFlag(args[0].GetType().Flag) -> mysql.HasNotNullFlag(arg0Type.Flag)

@@ -1314,7 +1314,7 @@ func (c *compareFunctionClass) refineArgs(ctx sessionctx.Context, args []Express
}
}
// non-int constant [cmp] int non-constant
if arg1IsInt && !arg1IsCon && !arg0IsInt && arg0IsCon {
if mysql.HasNotNullFlag(args[1].GetType().Flag) && arg1IsInt && !arg1IsCon && !arg0IsInt && arg0IsCon {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@ti-challenge-bot
Copy link

There is no reward for this challenge pull request, so you can request a reward from @XuHuaiyu.

More

Tip : About reward you can refs to reward-command.

Warning: None

@mmyj
Copy link
Member

mmyj commented Sep 25, 2020

PTAL @XuHuaiyu

@XuHuaiyu
Copy link
Contributor

/reward 300

@ti-challenge-bot
Copy link

Reward success.

@mmyj
Copy link
Member

mmyj commented Sep 25, 2020

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 25, 2020
@XuHuaiyu
Copy link
Contributor

/run-all-tests

@XuHuaiyu XuHuaiyu changed the title executor: Fix incorrectly fetching null data planner, expression: take NullFlag into consideration when optimize the int non-const <cmp > non-int const Sep 27, 2020
@XuHuaiyu
Copy link
Contributor

@TszKitLo40
I think we should not forbid refineArgs for all the scenarios, otherwise, it may cause bad performance for the queries on the partitioned table.
e.g.

-- table t with range partition [p0 (1~10)], [p1 (11~20)]
select * from t where par_col in (9);

Before this commit, we only scan the p0 partition. But after this commit, we'll trigger a full table scan, it's unacceptable.

A naive rule for whether we need to forbid refineArgs can be defined as the following:

  1. If it's an expression in SelectionExec, and this comparison function is not wrapped with other expressions, we should not forbid the refineArgs even if the NotNullFlag is not set.
  2. For others, we forbid refineArgs when NotNullFlag is not set.

@ichn-hu ichn-hu mentioned this pull request Nov 3, 2020
@xuyifangreeneyes
Copy link
Contributor

@TszKitLo40 is there any update?

@TszKitLo40
Copy link
Contributor Author

TszKitLo40 commented Jan 7, 2021

@TszKitLo40 is there any update?
No, I have no idea how to know whether the value is null in planner phase. @xuyifangreeneyes Can you give me some advice?

@xuyifangreeneyes
Copy link
Contributor

@TszKitLo40 is there any update?
No, I have no idea how to know whether the value is null in planner phase. @xuyifangreeneyes Can you give me some advice?

We cannot know whether the value is null in planner phase. But as @XuHuaiyu said, we can allow refineArgs in some certain scenarios to avoid bad performance. For example,

create table t0 (a int);
explain select * from t0 where a = 3.4;

Here is the result if we allow refineArgs for a = 3.4

tidb> explain select * from t0 where a = 3.4;
+-------------+---------+------+---------------+---------------+
| id          | estRows | task | access object | operator info |
+-------------+---------+------+---------------+---------------+
| TableDual_6 | 0.00    | root |               | rows:0        |
+-------------+---------+------+---------------+---------------+
1 row in set (0.00 sec)

And here is the result if we forbid it.

tidb> explain select * from t0 where a = 3.4;
+-------------------------+----------+-----------+---------------+--------------------------------+
| id                      | estRows  | task      | access object | operator info                  |
+-------------------------+----------+-----------+---------------+--------------------------------+
| TableReader_7           | 8000.00  | root      |               | data:Selection_6               |
| └─Selection_6           | 8000.00  | cop[tikv] |               | eq(cast(test.t0.a), 3.4)       |
|   └─TableFullScan_5     | 10000.00 | cop[tikv] | table:t0      | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+---------------+--------------------------------+
3 rows in set (0.00 sec)

Notice that the where condition a = 3.4 is either false or null, so no row would be selected.

Hence, one improvement we can achieve is that when the comparison function is the where condition in select statement, even if NotNullFlag is not set, we try refineArgs. If the result of refineArgs is constantly false, refineArgs can be allowed since the comparison function is null if the column value is null and the where condition won't be satisfied no matter it is evaluated as false or null.

@TszKitLo40
Copy link
Contributor Author

@TszKitLo40 is there any update?
No, I have no idea how to know whether the value is null in planner phase. @xuyifangreeneyes Can you give me some advice?

We cannot know whether the value is null in planner phase. But as @XuHuaiyu said, we can allow refineArgs in some certain scenarios to avoid bad performance. For example,

create table t0 (a int);
explain select * from t0 where a = 3.4;

Here is the result if we allow refineArgs for a = 3.4

tidb> explain select * from t0 where a = 3.4;
+-------------+---------+------+---------------+---------------+
| id          | estRows | task | access object | operator info |
+-------------+---------+------+---------------+---------------+
| TableDual_6 | 0.00    | root |               | rows:0        |
+-------------+---------+------+---------------+---------------+
1 row in set (0.00 sec)

And here is the result if we forbid it.

tidb> explain select * from t0 where a = 3.4;
+-------------------------+----------+-----------+---------------+--------------------------------+
| id                      | estRows  | task      | access object | operator info                  |
+-------------------------+----------+-----------+---------------+--------------------------------+
| TableReader_7           | 8000.00  | root      |               | data:Selection_6               |
| └─Selection_6           | 8000.00  | cop[tikv] |               | eq(cast(test.t0.a), 3.4)       |
|   └─TableFullScan_5     | 10000.00 | cop[tikv] | table:t0      | keep order:false, stats:pseudo |
+-------------------------+----------+-----------+---------------+--------------------------------+
3 rows in set (0.00 sec)

Notice that the where condition a = 3.4 is either false or null, so no row would be selected.

Hence, one improvement we can achieve is that when the comparison function is the where condition in select statement, even if NotNullFlag is not set, we try refineArgs. If the result of refineArgs is constantly false, refineArgs can be allowed since the comparison function is null if the column value is null and the where condition won't be satisfied no matter it is evaluated as false or null.

But it is hard to define whether refineArgs should be allowed.

@ti-chi-bot ti-chi-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 23, 2021
@ti-chi-bot
Copy link
Member

@TszKitLo40: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@TszKitLo40 TszKitLo40 closed this Mar 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/expression contribution This PR is from a community contributor. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. status/LGT1 Indicates that a PR has LGTM 1.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

incorrectly fetch null data
6 participants