execution: refine precision of cast as decimal in agg func #30805

Merged
Changes from 24 commits
34 commits
bf4c7c7
refine precision of cast as decimal in agg func
dragonly Dec 16, 2021
a101ff5
fix for unsigned
dragonly Dec 16, 2021
4996c92
fix typo
dragonly Dec 16, 2021
f2dfa13
fix: only modify args decimal precision when function is sum
dragonly Dec 17, 2021
55c6523
fix plan test cases
dragonly Dec 17, 2021
0661403
fix plan test cases
dragonly Dec 17, 2021
cd0d33a
fix plan test cases
dragonly Dec 17, 2021
1c5f59d
fix explaintest
dragonly Dec 17, 2021
999fffa
fix explaintest
dragonly Dec 17, 2021
cebe292
fix explaintest & plan test cases
dragonly Dec 17, 2021
520647f
fix: accomodate float decimal
dragonly Dec 17, 2021
f2639b4
./run-tests.sh -r explain_easy
dragonly Dec 17, 2021
9581de0
fix ut
dragonly Dec 17, 2021
4ee8129
fix explain_complex
dragonly Dec 17, 2021
b1c2b1b
Apply suggestions from code review
dragonly Dec 17, 2021
f8f77da
Update expression/aggregation/base_func.go
dragonly Dec 20, 2021
b0f94ec
./run-tests.sh -r explain_complex_stats
dragonly Dec 20, 2021
eb60493
fix compile
dragonly Dec 20, 2021
471bd09
address reviewer's comments
dragonly Dec 20, 2021
19a80ad
improve code
dragonly Dec 20, 2021
69b4cad
improve comment
dragonly Dec 20, 2021
1cf8e97
add explaintest for sum & avg
dragonly Dec 20, 2021
bc9dca9
restore mysql.opt_rule_blacklist
dragonly Dec 20, 2021
ee5e055
./run-tests.sh -r explain_easy
dragonly Dec 20, 2021
9948be7
improve comment
dragonly Dec 21, 2021
7a7d943
Update expression/aggregation/base_func.go
dragonly Dec 22, 2021
d1cad1e
fix explaintest
dragonly Dec 22, 2021
55d3bdc
fix explaintest
dragonly Dec 22, 2021
7b538e0
fix all explaintest
dragonly Dec 22, 2021
459caec
fix ut
dragonly Dec 22, 2021
df374b1
refactor
dragonly Dec 22, 2021
cd8f8f8
Merge branch 'master' into dragonly/agg-cast-decimal-precision
hawkingrei Dec 23, 2021
9a5c03c
Merge branch 'master' into dragonly/agg-cast-decimal-precision
ti-chi-bot Dec 23, 2021
425a3dc
Merge branch 'master' into dragonly/agg-cast-decimal-precision
ti-chi-bot Dec 23, 2021
118 changes: 118 additions & 0 deletions cmd/explaintest/r/explain_easy.result
@@ -846,3 +846,121 @@ Projection 8000.00 root Column#4, Column#4, Column#5
└─HashAgg 8000.00 cop[tikv] group by:test.t.b, funcs:sum(test.t.a)->Column#13, funcs:count(test.t.a)->Column#14
└─TableFullScan 10000.00 cop[tikv] table:t keep order:false, stats:pseudo
drop table if exists t;
create table t(a tinyint, b smallint, c mediumint, d int, e bigint);
insert into mysql.opt_rule_blacklist VALUES("aggregation_push_down");
admin reload opt_rule_blacklist;

explain format = 'brief' select sum(t1.a) from t t1 join t t2 on t1.a=t2.a;
id estRows task access object operator info
StreamAgg 1.00 root funcs:sum(Column#14)->Column#13
└─Projection 12487.50 root cast(test.t.a, decimal(25,0) BINARY)->Column#14
  └─HashJoin 12487.50 root inner join, equal:[eq(test.t.a, test.t.a)]
    ├─TableReader(Build) 9990.00 root data:Selection
    │ └─Selection 9990.00 cop[tikv] not(isnull(test.t.a))
    │   └─TableFullScan 10000.00 cop[tikv] table:t2 keep order:false, stats:pseudo
    └─TableReader(Probe) 9990.00 root data:Selection
      └─Selection 9990.00 cop[tikv] not(isnull(test.t.a))
        └─TableFullScan 10000.00 cop[tikv] table:t1 keep order:false, stats:pseudo
explain format = 'brief' select sum(t1.b) from t t1 join t t2 on t1.b=t2.b;
id estRows task access object operator info
StreamAgg 1.00 root funcs:sum(Column#14)->Column#13
└─Projection 12487.50 root cast(test.t.b, decimal(27,0) BINARY)->Column#14
  └─HashJoin 12487.50 root inner join, equal:[eq(test.t.b, test.t.b)]
    ├─TableReader(Build) 9990.00 root data:Selection
    │ └─Selection 9990.00 cop[tikv] not(isnull(test.t.b))
    │   └─TableFullScan 10000.00 cop[tikv] table:t2 keep order:false, stats:pseudo
    └─TableReader(Probe) 9990.00 root data:Selection
      └─Selection 9990.00 cop[tikv] not(isnull(test.t.b))
        └─TableFullScan 10000.00 cop[tikv] table:t1 keep order:false, stats:pseudo
explain format = 'brief' select sum(t1.c) from t t1 join t t2 on t1.c=t2.c;
id estRows task access object operator info
StreamAgg 1.00 root funcs:sum(Column#14)->Column#13
└─Projection 12487.50 root cast(test.t.c, decimal(30,0) BINARY)->Column#14
  └─HashJoin 12487.50 root inner join, equal:[eq(test.t.c, test.t.c)]
    ├─TableReader(Build) 9990.00 root data:Selection
    │ └─Selection 9990.00 cop[tikv] not(isnull(test.t.c))
    │   └─TableFullScan 10000.00 cop[tikv] table:t2 keep order:false, stats:pseudo
    └─TableReader(Probe) 9990.00 root data:Selection
      └─Selection 9990.00 cop[tikv] not(isnull(test.t.c))
        └─TableFullScan 10000.00 cop[tikv] table:t1 keep order:false, stats:pseudo
explain format = 'brief' select sum(t1.d) from t t1 join t t2 on t1.d=t2.d;
id estRows task access object operator info
StreamAgg 1.00 root funcs:sum(Column#14)->Column#13
└─Projection 12487.50 root cast(test.t.d, decimal(32,0) BINARY)->Column#14
  └─HashJoin 12487.50 root inner join, equal:[eq(test.t.d, test.t.d)]
    ├─TableReader(Build) 9990.00 root data:Selection
    │ └─Selection 9990.00 cop[tikv] not(isnull(test.t.d))
    │   └─TableFullScan 10000.00 cop[tikv] table:t2 keep order:false, stats:pseudo
    └─TableReader(Probe) 9990.00 root data:Selection
      └─Selection 9990.00 cop[tikv] not(isnull(test.t.d))
        └─TableFullScan 10000.00 cop[tikv] table:t1 keep order:false, stats:pseudo
explain format = 'brief' select sum(t1.e) from t t1 join t t2 on t1.e=t2.e;
id estRows task access object operator info
StreamAgg 1.00 root funcs:sum(Column#14)->Column#13
└─Projection 12487.50 root cast(test.t.e, decimal(41,0) BINARY)->Column#14
  └─HashJoin 12487.50 root inner join, equal:[eq(test.t.e, test.t.e)]
    ├─TableReader(Build) 9990.00 root data:Selection
    │ └─Selection 9990.00 cop[tikv] not(isnull(test.t.e))
    │   └─TableFullScan 10000.00 cop[tikv] table:t2 keep order:false, stats:pseudo
    └─TableReader(Probe) 9990.00 root data:Selection
      └─Selection 9990.00 cop[tikv] not(isnull(test.t.e))
        └─TableFullScan 10000.00 cop[tikv] table:t1 keep order:false, stats:pseudo
explain format = 'brief' select avg(t1.a) from t t1 join t t2 on t1.a=t2.a;
id estRows task access object operator info
StreamAgg 1.00 root funcs:avg(Column#14)->Column#13
└─Projection 12487.50 root cast(test.t.a, decimal(8,4) BINARY)->Column#14
  └─HashJoin 12487.50 root inner join, equal:[eq(test.t.a, test.t.a)]
    ├─TableReader(Build) 9990.00 root data:Selection
    │ └─Selection 9990.00 cop[tikv] not(isnull(test.t.a))
    │   └─TableFullScan 10000.00 cop[tikv] table:t2 keep order:false, stats:pseudo
    └─TableReader(Probe) 9990.00 root data:Selection
      └─Selection 9990.00 cop[tikv] not(isnull(test.t.a))
        └─TableFullScan 10000.00 cop[tikv] table:t1 keep order:false, stats:pseudo
explain format = 'brief' select avg(t1.b) from t t1 join t t2 on t1.b=t2.b;
id estRows task access object operator info
StreamAgg 1.00 root funcs:avg(Column#14)->Column#13
└─Projection 12487.50 root cast(test.t.b, decimal(10,4) BINARY)->Column#14
  └─HashJoin 12487.50 root inner join, equal:[eq(test.t.b, test.t.b)]
    ├─TableReader(Build) 9990.00 root data:Selection
    │ └─Selection 9990.00 cop[tikv] not(isnull(test.t.b))
    │   └─TableFullScan 10000.00 cop[tikv] table:t2 keep order:false, stats:pseudo
    └─TableReader(Probe) 9990.00 root data:Selection
      └─Selection 9990.00 cop[tikv] not(isnull(test.t.b))
        └─TableFullScan 10000.00 cop[tikv] table:t1 keep order:false, stats:pseudo
explain format = 'brief' select avg(t1.c) from t t1 join t t2 on t1.c=t2.c;
id estRows task access object operator info
StreamAgg 1.00 root funcs:avg(Column#14)->Column#13
└─Projection 12487.50 root cast(test.t.c, decimal(13,4) BINARY)->Column#14
  └─HashJoin 12487.50 root inner join, equal:[eq(test.t.c, test.t.c)]
    ├─TableReader(Build) 9990.00 root data:Selection
    │ └─Selection 9990.00 cop[tikv] not(isnull(test.t.c))
    │   └─TableFullScan 10000.00 cop[tikv] table:t2 keep order:false, stats:pseudo
    └─TableReader(Probe) 9990.00 root data:Selection
      └─Selection 9990.00 cop[tikv] not(isnull(test.t.c))
        └─TableFullScan 10000.00 cop[tikv] table:t1 keep order:false, stats:pseudo
explain format = 'brief' select avg(t1.d) from t t1 join t t2 on t1.d=t2.d;
id estRows task access object operator info
StreamAgg 1.00 root funcs:avg(Column#14)->Column#13
└─Projection 12487.50 root cast(test.t.d, decimal(15,4) BINARY)->Column#14
  └─HashJoin 12487.50 root inner join, equal:[eq(test.t.d, test.t.d)]
    ├─TableReader(Build) 9990.00 root data:Selection
    │ └─Selection 9990.00 cop[tikv] not(isnull(test.t.d))
    │   └─TableFullScan 10000.00 cop[tikv] table:t2 keep order:false, stats:pseudo
    └─TableReader(Probe) 9990.00 root data:Selection
      └─Selection 9990.00 cop[tikv] not(isnull(test.t.d))
        └─TableFullScan 10000.00 cop[tikv] table:t1 keep order:false, stats:pseudo
explain format = 'brief' select avg(t1.e) from t t1 join t t2 on t1.e=t2.e;
id estRows task access object operator info
StreamAgg 1.00 root funcs:avg(Column#14)->Column#13
└─Projection 12487.50 root cast(test.t.e, decimal(24,4) BINARY)->Column#14
  └─HashJoin 12487.50 root inner join, equal:[eq(test.t.e, test.t.e)]
    ├─TableReader(Build) 9990.00 root data:Selection
    │ └─Selection 9990.00 cop[tikv] not(isnull(test.t.e))
    │   └─TableFullScan 10000.00 cop[tikv] table:t2 keep order:false, stats:pseudo
    └─TableReader(Probe) 9990.00 root data:Selection
      └─Selection 9990.00 cop[tikv] not(isnull(test.t.e))
        └─TableFullScan 10000.00 cop[tikv] table:t1 keep order:false, stats:pseudo
drop table if exists t;
delete from mysql.opt_rule_blacklist where name="aggregation_push_down";
admin reload opt_rule_blacklist;
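
The cast widths in the plans above follow a regular pattern. The sketch below reproduces them under two assumptions that are not stated in this diff: the columns use MySQL's default display widths for signed integers (4, 6, 9, 11, 20), and the offsets of 21 digits for sum and 4 digits for avg are inferred from the numbers rather than read from the TiDB source. The names `capWidth`, `sumCastWidth` and `avgCastWidth` are hypothetical.

```go
package main

import "fmt"

// maxDecimalWidth is the largest precision MySQL allows for DECIMAL.
const maxDecimalWidth = 65

// capWidth adds extra digits to a display width and clamps the result.
func capWidth(width, extra int) int {
	if w := width + extra; w < maxDecimalWidth {
		return w
	}
	return maxDecimalWidth
}

// The offsets 21 and 4 are inferred from the plans (an assumption),
// not copied from the TiDB source.
func sumCastWidth(displayWidth int) int { return capWidth(displayWidth, 21) }
func avgCastWidth(displayWidth int) int { return capWidth(displayWidth, 4) }

func main() {
	cols := []struct {
		name  string
		width int // assumed default display width of the signed type
	}{
		{"tinyint", 4}, {"smallint", 6}, {"mediumint", 9}, {"int", 11}, {"bigint", 20},
	}
	for _, c := range cols {
		fmt.Printf("%-9s sum: decimal(%d,0)  avg: decimal(%d,4)\n",
			c.name, sumCastWidth(c.width), avgCastWidth(c.width))
	}
}
```

If these assumptions hold, the sum widths shown in this result file are the unrefined return-type widths, which is the starting point the refinement in base_func.go is meant to narrow.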

19 changes: 19 additions & 0 deletions cmd/explaintest/t/explain_easy.test
@@ -218,3 +218,22 @@ explain format = 'brief' select count(a) from t group by b order by (select coun
explain format = 'brief' select (select sum(count(a))) from t;
explain format = 'brief' select sum(a), (select sum(a)), count(a) from t group by b order by (select count(a));
drop table if exists t;

# lower precision for cast to decimal for integer type variables in sum function
create table t(a tinyint, b smallint, c mediumint, d int, e bigint);
insert into mysql.opt_rule_blacklist VALUES("aggregation_push_down");
admin reload opt_rule_blacklist;
explain format = 'brief' select sum(t1.a) from t t1 join t t2 on t1.a=t2.a;
explain format = 'brief' select sum(t1.b) from t t1 join t t2 on t1.b=t2.b;
explain format = 'brief' select sum(t1.c) from t t1 join t t2 on t1.c=t2.c;
explain format = 'brief' select sum(t1.d) from t t1 join t t2 on t1.d=t2.d;
explain format = 'brief' select sum(t1.e) from t t1 join t t2 on t1.e=t2.e;
# note that avg will be converted to count and sum, and .decimal field will be non-zero
explain format = 'brief' select avg(t1.a) from t t1 join t t2 on t1.a=t2.a;
explain format = 'brief' select avg(t1.b) from t t1 join t t2 on t1.b=t2.b;
explain format = 'brief' select avg(t1.c) from t t1 join t t2 on t1.c=t2.c;
explain format = 'brief' select avg(t1.d) from t t1 join t t2 on t1.d=t2.d;
explain format = 'brief' select avg(t1.e) from t t1 join t t2 on t1.e=t2.e;
drop table if exists t;
delete from mysql.opt_rule_blacklist where name="aggregation_push_down";
admin reload opt_rule_blacklist;
29 changes: 28 additions & 1 deletion expression/aggregation/base_func.go
@@ -178,7 +178,7 @@ func (a *baseFuncDesc) typeInfer4ApproxPercentile(ctx sessionctx.Context) error
	return nil
}

-// typeInfer4Sum should returns a "decimal", otherwise it returns a "double".
+// typeInfer4Sum should return a "decimal", otherwise it returns a "double".
// Because child returns integer or decimal type.
func (a *baseFuncDesc) typeInfer4Sum(ctx sessionctx.Context) {
	switch a.Args[0].GetType().Tp {
@@ -421,6 +421,7 @@ func (a *baseFuncDesc) WrapCastForAggArgs(ctx sessionctx.Context) {
		if a.Args[i].GetType().Tp == mysql.TypeNull {
			continue
		}
		tpOld := a.Args[i].GetType().Tp
		a.Args[i] = castFunc(ctx, a.Args[i])
		if a.Name != ast.AggFuncAvg && a.Name != ast.AggFuncSum {
			continue
@@ -443,5 +444,31 @@ func (a *baseFuncDesc) WrapCastForAggArgs(ctx sessionctx.Context) {
		originTp := a.Args[i].GetType().Tp
		*(a.Args[i].GetType()) = *(a.RetTp)
		a.Args[i].GetType().Tp = originTp
		// refine each mysql integer type to the needed decimal precision for sum
Contributor

Besides minimalDecimalLenForHoldingInteger, how about extracting L447-L454 into another function, e.g. adjustDecimalLenForSumInteger?

Contributor Author

This logic is tightly coupled with the surrounding code (the loop index i and the if-true-then-modify logic), so an extracted function would need many inputs and outputs and would not be a clean abstraction.

Contributor

func adjustDecimalLenForSumInteger(ft *FieldType, tpOld byte) {
    if types.IsTypeInteger(tpOld) && ft.Tp == mysql.TypeNewDecimal {
        if flen, err := minimalDecimalLenForHoldingInteger(tpOld); err == nil {
            ft.Flen = mathutil.Min(ft.Flen, flen+ft.Decimal)
        }
    }
}

Contributor

Then the code here becomes:

if a.Name == ast.AggFuncSum {
    adjustDecimalLenForSumInteger(a.Args[i].GetType(), tpOld)
}

An explicit function can be more readable for readers who don't care about the details.

		if a.Name == ast.AggFuncSum && types.IsTypeInteger(tpOld) {
			if flen, err := minimalDecimalLenForHoldingInteger(tpOld); err == nil {
				// avg could be split into sum and count, so we should take the `.Decimal` field into account
				a.Args[i].GetType().Flen = mathutil.Min(a.Args[i].GetType().Flen, flen+a.Args[i].GetType().Decimal)
Contributor

also explain the mathutil.Min?

Contributor Author

Just integer min. It was already in this file before this PR.

dragonly marked this conversation as resolved.
			}
		}
	}
}
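
The refinement discussed in this thread can be sketched in isolation. This is a minimal illustration, not the PR's code: `fieldType` is a hypothetical stand-in for the `Flen` and `Decimal` fields of the real field type, and `refineFlen` is a made-up name. It caps the total width at the integer's digit count plus the fractional digits, and never widens a type that is already narrower.

```go
package main

import "fmt"

// fieldType is a stand-in for the two fields that matter here;
// it is not the real TiDB type.
type fieldType struct {
	Flen    int // total number of decimal digits
	Decimal int // digits after the decimal point
}

// refineFlen returns ft with Flen reduced to intDigits+Decimal when that
// is smaller than the current Flen; otherwise ft is returned unchanged.
func refineFlen(ft fieldType, intDigits int) fieldType {
	if need := intDigits + ft.Decimal; need < ft.Flen {
		ft.Flen = need
	}
	return ft
}

func main() {
	const tinyintDigits = 3 // digits needed for a TINYINT value

	// A sum argument with no fractional digits.
	fmt.Println(refineFlen(fieldType{Flen: 25, Decimal: 0}, tinyintDigits))
	// An argument carrying 4 fractional digits, as when avg is split.
	fmt.Println(refineFlen(fieldType{Flen: 8, Decimal: 4}, tinyintDigits))
	// Already narrower than needed, so left alone.
	fmt.Println(refineFlen(fieldType{Flen: 2, Decimal: 0}, tinyintDigits))
}
```

Taking the minimum rather than assigning the new width directly is what keeps the third case unchanged, which appears to be the role of `mathutil.Min` in the code under review.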

func minimalDecimalLenForHoldingInteger(tp byte) (int, error) {
	switch tp {
	case mysql.TypeTiny:
		return 3, nil
	case mysql.TypeShort:
		return 5, nil
	case mysql.TypeInt24:
		return 8, nil
	case mysql.TypeLong:
		return 10, nil
	case mysql.TypeLonglong:
		return 20, nil
	case mysql.TypeYear:
		return 4, nil
	default:
		return -1, errors.Errorf("Invalid type: %v", tp)
	}
}
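
The constants returned above appear to match the number of base-10 digits in the largest UNSIGNED value of each type (and 2155, the largest YEAR). A small sketch that checks this arithmetic; the table and the helper `decimalDigits` are illustrative names, not part of the PR:

```go
package main

import (
	"fmt"
	"strconv"
)

// maxUnsigned lists the largest value each MySQL integer type can store
// when declared UNSIGNED. The type names are used only as labels.
var maxUnsigned = []struct {
	name string
	max  uint64
}{
	{"TINYINT", 1<<8 - 1},
	{"SMALLINT", 1<<16 - 1},
	{"MEDIUMINT", 1<<24 - 1},
	{"INT", 1<<32 - 1},
	{"BIGINT", 1<<64 - 1},
	{"YEAR", 2155},
}

// decimalDigits returns how many base-10 digits are needed to write v.
func decimalDigits(v uint64) int {
	return len(strconv.FormatUint(v, 10))
}

func main() {
	for _, t := range maxUnsigned {
		fmt.Printf("%-9s max=%d digits=%d\n", t.name, t.max, decimalDigits(t.max))
	}
}
```

Sizing for the unsigned maximum also covers the signed range, since the signed extremes never have more digits than the unsigned maximum of the same type.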
2 changes: 1 addition & 1 deletion expression/builtin.go
@@ -114,7 +114,7 @@ func newBaseBuiltinFunc(ctx sessionctx.Context, funcName string, args []Expressi

// newBaseBuiltinFuncWithTp creates a built-in function signature with specified types of arguments and the return type of the function.
// argTps indicates the types of the args, retType indicates the return type of the built-in function.
-// Every built-in function needs determined argTps and retType when we create it.
+// Every built-in function needs to be determined argTps and retType when we create it.
func newBaseBuiltinFuncWithTp(ctx sessionctx.Context, funcName string, args []Expression, retType types.EvalType, argTps ...types.EvalType) (bf baseBuiltinFunc, err error) {
	if len(args) != len(argTps) {
		panic("unexpected length of args and argTps")
10 changes: 5 additions & 5 deletions parser/mysql/type.go
@@ -16,15 +16,15 @@ package mysql
// MySQL type information.
const (
	TypeUnspecified byte = 0
-	TypeTiny byte = 1
-	TypeShort byte = 2
-	TypeLong byte = 3
+	TypeTiny byte = 1 // TINYINT
+	TypeShort byte = 2 // SMALLINT
+	TypeLong byte = 3 // INT
	TypeFloat byte = 4
	TypeDouble byte = 5
	TypeNull byte = 6
	TypeTimestamp byte = 7
-	TypeLonglong byte = 8
-	TypeInt24 byte = 9
+	TypeLonglong byte = 8 // BIGINT
+	TypeInt24 byte = 9 // MEDIUMINT
	TypeDate byte = 10
	/* TypeDuration original name was TypeTime, renamed to TypeDuration to resolve the conflict with Go type Time.*/
	TypeDuration byte = 11
2 changes: 1 addition & 1 deletion planner/core/preprocess.go
@@ -103,7 +103,7 @@ func TryAddExtraLimit(ctx sessionctx.Context, node ast.StmtNode) ast.StmtNode {
	return node
}

-// Preprocess resolves table names of the node, and checks some statements validation.
+// Preprocess resolves table names of the node, and checks some statements' validation.
// preprocessReturn used to extract the infoschema for the tableName and the timestamp from the asof clause.
func Preprocess(ctx sessionctx.Context, node ast.Node, preprocessOpt ...PreprocessOpt) error {
	v := preprocessor{ctx: ctx, tableAliasInJoin: make([]map[string]interface{}, 0), withName: make(map[string]interface{})}